Machine Learning
LABS
3. Linear model (I)
Context
In this lab, we study a cookies dataset analysed by near-infrared (NIR) spectroscopy. Our goal is to predict the fat percentage of the biscuits.
Near-infrared spectroscopy offers a practical alternative to time-consuming wet chemical methods and chromatographic techniques. Fourier-transform NIR (FT-NIR) is non-destructive and requires no sample preparation or hazardous chemicals, which makes it quick and reliable for quantitative and qualitative analysis. NIR is well suited to rapid raw material identification and is also capable of accurate multi-component quantitative analysis.
Goals
Fit a linear regression model on the data after dimension reduction
Use cross-validation to compare models
Examples: Ozone_LinearModel_1.py, Ozone_LinearModel_1.R
Questions
1. Download the cookies dataset
2. Because of the high dimension of the data (p=28, n=700), it is not possible to compute beta = (X^T X)^{-1} X^T Y,
where X is the matrix of explanatory variables (plus a column of ones) and Y is the variable to explain.
a. Find the 10 variables with the highest correlation with the fat percentage variable
and fit a linear model based on these variables.
b. Compute a PCA of the explanatory variables
and fit a linear model on the selected components.
3. Use cross-validation to compare both models/approaches.
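One possible way to organise questions 2 and 3 is sketched below with scikit-learn. It is only an illustration, not the course solution: the data are a synthetic stand-in (the real cookies file and its column names are not reproduced here), the number of PCA components (5) and the 5-fold split are arbitrary assumptions, and `SelectKBest` with `f_regression` is used as a stand-in for "highest correlation", since its score is a monotone function of the squared correlation between each variable and the target. Both reduction steps sit inside a pipeline so that they are refitted on each training fold.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the spectra (assumption): few samples, many
# strongly correlated variables driven by three latent factors.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
latent = rng.normal(size=(n_samples, 3))
loadings = rng.normal(size=(3, n_features))
X = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))
y = latent @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n_samples)

# With more columns than rows, rank(X^T X) = rank(X) <= n_samples,
# so the n_features x n_features matrix X^T X cannot be inverted.
rank = np.linalg.matrix_rank(X)
print(f"rank of X: {rank}, number of variables: {n_features}")

# Approach (a): keep the 10 variables most correlated with y, then fit OLS.
model_corr = make_pipeline(
    SelectKBest(score_func=f_regression, k=10),
    LinearRegression(),
)

# Approach (b): project onto the first principal components, then fit OLS.
model_pca = make_pipeline(
    PCA(n_components=5),  # number of components is an arbitrary choice here
    LinearRegression(),
)

# Question 3: compare both approaches with the same cross-validation folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in [("top-10 correlation", model_corr), ("PCA", model_pca)]:
    neg_mse = cross_val_score(
        model, X, y, cv=cv, scoring="neg_mean_squared_error"
    )
    scores[name] = -neg_mse.mean()  # mean squared error averaged over folds
    print(f"{name}: cross-validated MSE = {scores[name]:.4f}")
```

To adapt the sketch to the lab, replace the synthetic `X` and `y` with the spectra and the fat percentage loaded from the cookies dataset, and treat the number of components as something to tune rather than a fixed value.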
Codes:
Ozone_LinearModel_1.py,
Ozone_LinearModel_1.R,
Cookies_LinearModel_1_ToStart.py,
Cookies_LinearModel_1_ToStart.R