### Machine Learning

#### LABS

**3. Linear model (I)**
Context

In this lab, we study a cookies data set analysed via near-infrared (NIR) spectroscopy. Our goal is to predict the fat percentage of the biscuits.
Near-infrared spectroscopy offers a practical alternative to time-consuming wet-chemistry methods and chromatographic techniques. FT-NIR is non-destructive and requires no sample preparation or hazardous chemicals, which makes it quick and reliable for both quantitative and qualitative analysis. NIR is well suited to rapid raw-material identification and is also capable of accurate multi-component quantitative analysis.

Goals

Fit a linear regression model to the data after dimension reduction
Use cross-validation to compare models

Examples: Ozone_LinearModel_1.py, Ozone_LinearModel_1.R

Questions

1. Download the cookies dataset

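2. Because of the high dimension of the data (p=700, n=28), it is not possible to compute the least-squares estimator $\hat{\beta} = (X^T X)^{-1} X^T Y$,
where $X$ is the $n \times (p+1)$ matrix of explanatory variables (plus a column of ones) and $Y$ is the variable to explain. When $p \geq n$, the $(p+1) \times (p+1)$ matrix $X^T X$ has rank at most $n$ and is therefore singular.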
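
To see why the estimator cannot be computed when there are more variables than observations, here is a minimal sketch on synthetic data. The sizes `n` and `p` and the random matrix are illustrative assumptions, not the cookies data; the only point is that the rank of the matrix to invert is capped by the number of observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 200  # illustrative sizes only (assumption), chosen so that p > n

# Design matrix with a leading column of ones, shape n x (p + 1)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# rank(X^T X) equals rank(X), which cannot exceed n
rank = np.linalg.matrix_rank(X)
gram_size = X.shape[1]  # X^T X is (p + 1) x (p + 1)

print(f"X^T X is {gram_size} x {gram_size} but has rank {rank}")
print("invertible:", rank == gram_size)
```

Since the rank is smaller than the size of the matrix, the inverse does not exist, which is what motivates the two dimension-reduction approaches in this question.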

a. Find the 10 variables with the highest correlation with the fat percentage variable
and fit a linear model based on these variables
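
One possible way to approach this step is sketched below with NumPy only. The data are synthetic stand-ins, the helper names `top_correlated` and `fit_ols` are hypothetical, and ranking by absolute correlation is an assumption about what "highest correlation" should mean (a strongly negative correlation is also informative).

```python
import numpy as np

def top_correlated(X, y, k=10):
    """Indices of the k columns of X with the largest absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Synthetic stand-in for the spectra (assumption): only columns 5 and 17 drive y
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 5] - 1.5 * X[:, 17] + 0.1 * rng.normal(size=n)

selected = top_correlated(X, y, k=10)
beta = fit_ols(X[:, selected], y)

print("selected columns:", sorted(selected.tolist()))
print("number of coefficients (intercept included):", beta.shape[0])
```

With only 10 retained variables, the reduced design matrix has 11 columns, so the least-squares problem is well posed as long as there are more than 11 observations.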

b. Compute a PCA
and fit a linear model on the selected components.
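
A minimal sketch of this approach, again on synthetic data: the PCA is computed through an SVD of the centered matrix, and the regression is fitted on the first `k` component scores. The value `k = 5`, the helper names and the simulated low-rank data are assumptions made for illustration; how many components to keep is part of the exercise.

```python
import numpy as np

def pca_fit(X, n_components):
    """Column means and the first principal axes (stored as columns)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, Vt[:n_components].T, explained[:n_components]

def pca_transform(X, mean, axes):
    """Project centered data onto the principal axes."""
    return (X - mean) @ axes

# Synthetic stand-in (assumption): 3 latent factors observed through 200 noisy variables
rng = np.random.default_rng(0)
n, p, k = 30, 200, 5
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.05 * rng.normal(size=n)

mean, axes, explained = pca_fit(X, k)
scores = pca_transform(X, mean, axes)

# Linear model on the component scores, intercept first
A = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = 1 - np.sum((y - A @ beta) ** 2) / np.sum((y - y.mean()) ** 2)

print("explained variance ratios:", np.round(explained, 3))
print("in-sample R^2:", round(float(r2), 3))
```

The in-sample R² is optimistic by construction, so it should not be used to choose between approaches; that is the role of cross-validation.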

3. Use cross-validation to compare both models/approaches.
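
The comparison could be sketched as follows, using a hand-written K-fold loop in NumPy. Everything here is illustrative: the data are synthetic, the function names are hypothetical, and the choices of 5 folds, 10 variables and 5 components are assumptions. The one design point worth keeping is that variable selection and PCA are redone inside each training fold, so the held-out observations never influence the dimension reduction.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(X, beta):
    return beta[0] + X @ beta[1:]

def corr_pipeline(X_tr, y_tr, X_te, k=10):
    """Keep the k columns most correlated with y (on training data only), then OLS."""
    Xc = X_tr - X_tr.mean(axis=0)
    yc = y_tr - y_tr.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    idx = np.argsort(-np.abs(corr))[:k]
    return ols_predict(X_te[:, idx], ols_fit(X_tr[:, idx], y_tr))

def pca_pipeline(X_tr, y_tr, X_te, k=5):
    """Project on the first k principal axes (from training data only), then OLS."""
    mean = X_tr.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_tr - mean, full_matrices=False)
    axes = Vt[:k].T
    beta = ols_fit((X_tr - mean) @ axes, y_tr)
    return ols_predict((X_te - mean) @ axes, beta)

def cv_mse(pipeline, X, y, n_folds=5, seed=0):
    """Mean squared prediction error averaged over the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = pipeline(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in for the spectra (assumption)
rng = np.random.default_rng(1)
n, p = 30, 200
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.05 * rng.normal(size=n)

# The same seed gives both pipelines identical folds, so the errors are comparable
mse_corr = cv_mse(corr_pipeline, X, y)
mse_pca = cv_mse(pca_pipeline, X, y)
print("CV MSE, correlation selection:", mse_corr)
print("CV MSE, PCA regression:", mse_pca)
```

The approach with the lower cross-validated error is preferred; the same loop can also be reused to choose the number of variables or components.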

Code:

Ozone_LinearModel_1.py,

Ozone_LinearModel_1.R,

Cookies_LinearModel_1_ToStart.py,

Cookies_LinearModel_1_ToStart.R