
Machine Learning



LABS

7. Ensemble Models (I)

Prediction of a categorical variable with 2 levels

We have collected gene expression levels for 4654 genes from 97 early-stage breast cancer samples. After surgical removal of the tumour, some patients unfortunately relapsed within 5 years (label=+1), while others did not (label=0).
The goal of the lab is to improve the prediction of relapse from gene expression levels using ensemble methods.

1. Bagging
a. Compute bagging predictions for the relapse dataset and compare the ROC curve with that of a single tree.
R users: Try different values for mfinal and maxdepth
Python users: Try different values for n_estimators
b. Print a list of the most important variables for prediction
Idea: use the k-nearest neighbours (k-NN) method based on these variables.
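
Python users: the steps of exercise 1 can be sketched as follows. This is a minimal sketch and not the reference solution: it assumes scikit-learn, replaces the relapse data with a synthetic stand-in of 97 samples (500 columns instead of 4654, to keep it fast), and uses a hypothetical 70/30 train/test split. Since BaggingClassifier does not expose variable importances directly, the sketch averages the importances of the individual trees, which is one possible choice among others.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the relapse dataset (97 samples, binary label).
# The real data has 4654 gene expression columns; 500 keeps the sketch fast.
X, y = make_classification(n_samples=97, n_features=500, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Baseline: a single decision tree.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_scores = tree.predict_proba(X_test)[:, 1]

# Bagging: the default base learner of BaggingClassifier is a decision tree.
# Try different values for n_estimators here.
bag = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
bag_scores = bag.predict_proba(X_test)[:, 1]

# ROC curves (false positive rate, true positive rate) and their AUC.
fpr_tree, tpr_tree, _ = roc_curve(y_test, tree_scores)
fpr_bag, tpr_bag, _ = roc_curve(y_test, bag_scores)
print("AUC single tree:", roc_auc_score(y_test, tree_scores))
print("AUC bagging:    ", roc_auc_score(y_test, bag_scores))

# Variable importance: average the impurity-based importances over all trees.
importances = np.mean([t.feature_importances_ for t in bag.estimators_], axis=0)
top_idx = np.argsort(importances)[::-1][:10]
print("Most important variables (column indices):", top_idx)

# Idea from the exercise: k-NN restricted to the selected variables.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, top_idx], y_train)
knn_scores = knn.predict_proba(X_test[:, top_idx])[:, 1]
print("AUC k-NN on top variables:", roc_auc_score(y_test, knn_scores))
```

The pairs (fpr_tree, tpr_tree) and (fpr_bag, tpr_bag) can be passed to any plotting library to draw the two ROC curves on the same axes. The variables are selected on the training set only, so the k-NN score on the test set is not biased by the selection.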

2. Random Forest
a. Compute Random Forest predictions for the relapse dataset and compare the ROC curve with that of a single tree.
R users: Try different values for mtry and maxdepth
Python users: Try different values for n_estimators and max_depth
b. Print a list of the most important variables for prediction
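
Python users: exercise 2 can be approached along the same lines. The sketch below is again an illustration under assumptions, not the reference solution: it assumes scikit-learn, a synthetic stand-in for the relapse data (97 samples, 500 columns instead of 4654), and a hypothetical 70/30 train/test split. The grids of values for n_estimators and max_depth are arbitrary examples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the relapse dataset (97 samples, binary label).
X, y = make_classification(n_samples=97, n_features=500, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Try different values for n_estimators and max_depth (example grids only).
results = {}
for n_estimators in (50, 200):
    for max_depth in (2, None):  # None lets each tree grow until its leaves are pure
        rf = RandomForestClassifier(n_estimators=n_estimators,
                                    max_depth=max_depth,
                                    random_state=0).fit(X_train, y_train)
        scores = rf.predict_proba(X_test)[:, 1]
        results[(n_estimators, max_depth)] = roc_auc_score(y_test, scores)

for params, auc in results.items():
    print("n_estimators=%s, max_depth=%s: AUC=%.3f" % (params[0], params[1], auc))

# Refit one forest and list its most important variables.
final_rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
rf_importances = final_rf.feature_importances_
rf_top_idx = np.argsort(rf_importances)[::-1][:10]
print("Most important variables (column indices):", rf_top_idx)
```

In scikit-learn, the max_features parameter plays the role of mtry in R: it sets how many variables are considered at each split. With only 97 samples, the AUC values depend strongly on the split, so cross-validation would give a more stable comparison.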


Codes: CancerRelapse_EnsembleMethods_1.py, CancerRelapse_EnsembleMethods_1.R, Cookies_EnsembleMethods_1.py, Cookies_EnsembleMethods_1.R