Machine Learning


3. Linear model (I)

In this lab, we study a cookies data set analysed by near-infrared (NIR) spectroscopy. The goal is to predict the fat percentage of the biscuits. NIR spectroscopy offers a practical alternative to time-consuming wet chemical methods and chromatographic techniques. FT-NIR is non-destructive and requires no sample preparation or hazardous chemicals, which makes it quick and reliable for quantitative and qualitative analysis. NIR is well suited to rapid raw material identification and is also capable of accurate multi-component quantitative analysis.

Fit a linear regression model on the data after dimension reduction.
Use cross-validation to compare models.

Examples: Ozone_LinearModel_1.py, Ozone_LinearModel_1.R

1. Download the cookies dataset
2. Because of the large dimension of the data (p=28, n=700), it is not possible to compute beta = (X^T X)^{-1} X^T Y, where X is the matrix of explanatory variables (plus a column of ones) and Y is the variable to explain. When there are more explanatory variables than observations, X^T X is singular and has no inverse.
a. Find the 10 variables with the highest correlation with the fat percentage variable and fit a linear model based on these variables.
b. Compute a PCA and fit a linear model on the selected components.
3. Use cross-validation to compare both models/approaches.
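
The steps above can be sketched in Python with scikit-learn. This is only an illustrative outline, not the content of the provided starter files: the data here are synthetic (the array sizes, the 3 latent factors and the noise level are assumptions standing in for the real cookies spectra), and the choice of 5 principal components and 5 folds is arbitrary. Variable selection is placed inside the pipeline so that it is redone on each training fold, which keeps the cross-validation estimate honest.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the spectra (assumption: NOT the real cookies data).
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 40, 700
latent = rng.normal(size=(n_samples, 3))          # a few hidden factors drive the spectra
loadings = rng.normal(size=(3, n_wavelengths))
X = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_wavelengths))
y = 2.0 * latent[:, 0] - latent[:, 1] + 0.1 * rng.normal(size=n_samples)

# Step 2: with more columns than rows, X (and hence X^T X, which has the
# same rank) is rank deficient, so the normal equations cannot be inverted.
X1 = np.column_stack([np.ones(n_samples), X])
rank = np.linalg.matrix_rank(X1)
print(f"design matrix has {X1.shape[1]} columns but rank {rank}")

# Step 2a: keep the 10 variables most correlated with y, then least squares.
# The f_regression score is an increasing function of the squared Pearson
# correlation, so the top 10 scores are the 10 highest |correlations|.
model_corr = make_pipeline(
    SelectKBest(score_func=f_regression, k=10),
    LinearRegression(),
)

# Step 2b: project on principal components, then least squares.
# n_components=5 is an arbitrary choice for this sketch.
model_pca = make_pipeline(PCA(n_components=5), LinearRegression())

# Step 3: compare both approaches with the same cross-validation folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in [("top-10 correlation", model_corr), ("PCA", model_pca)]:
    mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    scores[name] = mse.mean()
    print(f"{name}: mean CV MSE = {scores[name]:.4f}")
```

On the real data, the number of retained components (and possibly the number of selected variables) could itself be chosen by cross-validation rather than fixed in advance.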

Code: Ozone_LinearModel_1.py, Ozone_LinearModel_1.R, Cookies_LinearModel_1_ToStart.py, Cookies_LinearModel_1_ToStart.R