This repository contains my work for Project 1 of the Machine Learning course at EPFL.
The project is a Kaggle competition modelled on the Higgs Boson Machine Learning Challenge (2014). My team finished in 3rd place out of 117 teams.
This file explains the organisation and purpose of the Python scripts. For more information about the implementation, see the PDF report and the commented code.
First, place `train.csv` and `test.csv` in a `data` folder at the root of the project.
Contains three cost functions (sketched below):
- `calculate_mse`: mean squared error
- `calculate_mae`: mean absolute error
- `compute_loss_neg_log_likelihood`: negative log-likelihood
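A minimal sketch of what these functions typically compute; the exact signatures in the repo may differ, and the residual argument `e = y - tx @ w` is an assumption:

```python
import numpy as np

def calculate_mse(e):
    # Mean squared error from the residual vector e = y - tx @ w.
    return np.mean(e ** 2) / 2

def calculate_mae(e):
    # Mean absolute error from the residual vector.
    return np.mean(np.abs(e))

def compute_loss_neg_log_likelihood(y, tx, w):
    # Negative log-likelihood for logistic regression with labels y in {0, 1}.
    pred = 1.0 / (1.0 + np.exp(-tx @ w))  # sigmoid of the linear scores
    return -np.sum(y * np.log(pred) + (1 - y) * np.log(1 - pred))
```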
Contains helper functions for cross-validation:
- `build_k_indices`: builds the index groups for k-fold cross-validation (a sketch follows)
- `cross_validation_visualization`: plots the accuracy obtained for each value of the regularization parameter lambda
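A sketch of how `build_k_indices` is commonly written; the shuffling and the return shape are assumptions:

```python
import numpy as np

def build_k_indices(y, k_fold, seed):
    # Randomly split the row indices into k_fold groups of equal size.
    num_rows = y.shape[0]
    interval = num_rows // k_fold
    np.random.seed(seed)
    indices = np.random.permutation(num_rows)
    return np.array([indices[k * interval:(k + 1) * interval] for k in range(k_fold)])
```

Each fold then serves once as the validation set while the remaining folds form the training set.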
Contains the data-processing functions and the utilities needed by the regression methods:
- `standardize`, `build_poly`, `add_constant_column`, `na`, `impute_data` and `process_data`: the preprocessing functions; see the report for an explanation of each.
- `compute_gradient`: computes the gradient for gradient descent and stochastic gradient descent (sketched below)
- `batch_iter`: generates a mini-batch iterator over a dataset (sketched below)
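A minimal sketch of the two utilities, assuming the MSE loss and NumPy arrays (the signatures in the repo may differ):

```python
import numpy as np

def compute_gradient(y, tx, w):
    # Gradient of the MSE loss at w; also used per mini-batch for SGD.
    e = y - tx @ w
    return -tx.T @ e / len(y)

def batch_iter(y, tx, batch_size, num_batches=1, shuffle=True):
    # Yield (mini_y, mini_tx) pairs, optionally after shuffling the rows.
    n = len(y)
    idx = np.random.permutation(n) if shuffle else np.arange(n)
    for b in range(num_batches):
        start, end = b * batch_size, min((b + 1) * batch_size, n)
        if start < end:
            yield y[idx[start:end]], tx[idx[start:end]]
```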
Contains the functions used to load the data and to generate a CSV submission file.
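For illustration, a sketch of what the submission writer typically looks like; the name `create_csv_submission` follows the course starter kit and may differ in this repo:

```python
import csv

def create_csv_submission(ids, y_pred, name):
    # Write a Kaggle submission file with the two expected columns: Id, Prediction.
    with open(name, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Id", "Prediction"])
        writer.writeheader()
        for i, p in zip(ids, y_pred):
            writer.writerow({"Id": int(i), "Prediction": int(p)})
```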
Contains the six regression methods required for this project:
- `least_squares_gd`: linear regression using gradient descent
- `least_squares_sgd`: linear regression using stochastic gradient descent
- `least_squares`: least squares regression using the normal equations (sketched below)
- `ridge_regression`: ridge regression using the normal equations (sketched below)
- `logistic_regression`: logistic regression using stochastic gradient descent
- `reg_logistic_regression`: regularized logistic regression
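As an illustration, a minimal sketch of the two closed-form methods; returning a `(w, loss)` pair and scaling the penalty by `2 * N * lambda` follow the course's usual convention and are assumptions about this repo:

```python
import numpy as np

def least_squares(y, tx):
    # Solve the normal equations (X^T X) w = X^T y.
    w = np.linalg.solve(tx.T @ tx, tx.T @ y)
    e = y - tx @ w
    return w, np.mean(e ** 2) / 2

def ridge_regression(y, tx, lambda_):
    # Solve the regularized normal equations (X^T X + 2 N lambda I) w = X^T y.
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)
    w = np.linalg.solve(a, tx.T @ y)
    e = y - tx @ w
    return w, np.mean(e ** 2) / 2
```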
Script that reproduces the exact CSV file submitted to Kaggle.
Jupyter notebook used for experiments during the project.