
Machine Learning



LABS

8. Ensemble Models (II)

Prediction of a categorical variable with 2 levels

We have collected gene expression levels for 4654 genes on 97 early-stage breast cancer samples. After surgical removal of the tumour, some patients unfortunately relapsed within 5 years (label=+1), while others did not (label=0).
The goal of the lab is to improve the prediction of relapse from gene expression levels using ensemble methods.

1. Boosting
a. Compute boosting predictions for the relapse dataset and compare the ROC curve with those of other classifiers. Try to improve the result by tuning the algorithm parameters.
b. Print a list of the most important variables for prediction.
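
A minimal sketch of steps 1a and 1b, assuming scikit-learn is available. The relapse data is not loaded here: `make_classification` produces a synthetic stand-in with 97 samples and only 200 features (an assumption to keep the run short), so the printed numbers say nothing about the real dataset. The parameter values are arbitrary starting points, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the relapse data (assumption): 97 samples, binary label.
X, y = make_classification(n_samples=97, n_features=200, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# AdaBoost with scikit-learn's default base learner (a depth-1 decision tree).
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)

# Scores for the positive class, then the ROC points and the area under the curve.
scores = ada.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print(f"AdaBoost test AUC: {auc:.3f}")

# Step b: the ten features with the largest importance.
importances = ada.feature_importances_
top = np.argsort(importances)[::-1][:10]
for j in top:
    print(f"feature {j}: importance {importances[j]:.4f}")
```

The arrays `fpr` and `tpr` can be plotted on the same axes as the ROC curves of the classifiers from earlier labs. `n_estimators` and `learning_rate` are the natural parameters to vary first.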


2. Gradient boosting
a. Compute gradient boosting predictions for the relapse dataset and compare the ROC curve with those of other classifiers. Try to improve the result by tuning the algorithm parameters.
b. Print a list of the most important variables for prediction.
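
One possible way to approach steps 2a and 2b, again on a synthetic stand-in (an assumption, not the relapse data). The parameter search uses `GridSearchCV` with cross-validated AUC. The grid below is deliberately tiny and purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the relapse data (assumption): 97 samples, binary label.
X, y = make_classification(n_samples=97, n_features=200, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Small illustrative grid; real tuning would explore wider ranges.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [1, 2],
}
search = GridSearchCV(
    GradientBoostingClassifier(subsample=0.8, random_state=0),
    param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)

# Evaluate the refitted best model on the held-out samples.
gbm = search.best_estimator_
auc = roc_auc_score(y_test, gbm.predict_proba(X_test)[:, 1])
print(f"Gradient boosting test AUC: {auc:.3f}")

# Step b: impurity-based importances of the best model, largest first.
importances = gbm.feature_importances_
top = np.argsort(importances)[::-1][:10]
for j in top:
    print(f"feature {j}: importance {importances[j]:.4f}")
```

With only 97 samples, the cross-validated AUC varies a lot from one split to another, so small differences between parameter settings should be read with caution.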

3. eXtreme gradient boosting
a. Compute eXtreme gradient boosting (XGBoost) predictions for the relapse dataset and compare the ROC curve with those of other classifiers. Try to improve the result by tuning the algorithm parameters.
b. Print a list of the most important variables for prediction.
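
A hedged sketch of steps 3a and 3b, assuming the `xgboost` Python package and its scikit-learn wrapper `XGBClassifier`. If `xgboost` is not installed, the code falls back to scikit-learn's `GradientBoostingClassifier`, which is only a stand-in and not XGBoost itself. The data is synthetic (an assumption), and the random forest is just one example of an "other classifier" to compare against.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier
    booster = XGBClassifier(n_estimators=200, learning_rate=0.05, max_depth=2,
                            subsample=0.8, colsample_bytree=0.5, random_state=0)
    booster_name = "XGBoost"
except ImportError:
    # Stand-in only: this is plain gradient boosting, not XGBoost.
    from sklearn.ensemble import GradientBoostingClassifier
    booster = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                         max_depth=2, subsample=0.8,
                                         max_features=0.5, random_state=0)
    booster_name = "GradientBoosting (stand-in)"

# Synthetic stand-in for the relapse data (assumption): 97 samples, binary label.
X, y = make_classification(n_samples=97, n_features=200, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    booster_name: booster,
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and keep its ROC points and AUC for comparison.
curves = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    curves[name] = (fpr, tpr, roc_auc_score(y_test, scores))
    print(f"{name}: test AUC = {curves[name][2]:.3f}")

# Step b: the ten most important features of the boosted model.
importances = booster.feature_importances_
top = np.argsort(importances)[::-1][:10]
for j in top:
    print(f"feature {j}: importance {importances[j]:.4f}")
```

`colsample_bytree` and `subsample` are worth tuning here: with far more genes than samples, sampling features and rows for each tree is one way to limit overfitting.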


PART II - Adapt the previous code to predict the fat percentage in the cookies.
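
Since the fat percentage is a continuous target, the classifiers become regressors and the ROC curve is replaced by a regression metric. The sketch below assumes scikit-learn and uses `make_regression` as a synthetic stand-in for the cookies data (the sample and feature counts are arbitrary assumptions). RMSE and R² are one reasonable choice of metrics, not necessarily the ones used in the course solutions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cookies data (assumption): continuous target.
X, y = make_regression(n_samples=80, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Regression counterparts of the boosting classifiers used in Part I.
models = {
    "AdaBoost": AdaBoostRegressor(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=2, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    r2 = float(r2_score(y_test, pred))
    results[name] = (rmse, r2)
    print(f"{name}: RMSE = {rmse:.2f}, R2 = {r2:.3f}")

# Most important predictors of the gradient boosting model.
importances = models["GradientBoosting"].feature_importances_
top = np.argsort(importances)[::-1][:10]
print("top features:", top.tolist())
```

The XGBoost model can be adapted in the same way by replacing `XGBClassifier` with `XGBRegressor`.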


Codes: CancerRelapse_EnsembleMethods_2.py, CancerRelapse_EnsembleMethods_2.R, Cookies_EnsembleMethods_2.py, Cookies_EnsembleMethods_2.R