LAB2: SPARSITY-BASED LEARNING

This lab addresses the problem of feature selection within the framework of sparsity-based regularization, in particular using elastic net regularization.

Follow the instructions below. Think hard before you call the instructors! 

At the end of your work, send an email to the instructors answering the questions raised below and briefly commenting on the results obtained in the Part II analysis.

Download the zip file l1l2.zip

Toy problem

We will consider synthetic data generated according to a given probability distribution and affected by noise. We consider a regression problem where the target function is linear, the input is randomly sampled, and the output is affected by noise. The number of points, the number of dimensions, and the number of relevant features can be predefined.
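The data-generation recipe above can be sketched as follows. This is not the toolbox's own generator but an analogous illustration in Python/NumPy; the function name `make_sparse_regression` and all parameter defaults are my own assumptions.

```python
import numpy as np

def make_sparse_regression(n=50, p=100, s=5, noise=0.1, seed=0):
    """Generate n noisy samples of a linear target in p dimensions,
    where only the first s features carry non-zero coefficients."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))                # randomly sampled inputs
    beta = np.zeros(p)
    beta[:s] = rng.uniform(1.0, 2.0, size=s)       # the s relevant features
    y = X @ beta + noise * rng.standard_normal(n)  # linear output + noise
    return X, y, beta

X, y, beta = make_sparse_regression()
```

Varying n, p, s, and noise here mirrors the knobs exposed by the toy-problem generator.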

Part I - warm up

Run the file l1l2_gui and the GUI will start.
Have a look at the various components.

Back in the MATLAB shell, have a look at the contents of the directory "PROXIMAL_TOOLBOXES/L1L2_TOOLBOX". There you will find, among others, the files l1l2_algorithm (used for variable selection), l1l2_kcv (used for model selection with KCV or LOO), and l1l2_pred (for prediction on a test set).

Finally, you may want to have a look at the file demo_l1l2.m for a complete example of analysis.

Part II - analysis

Carry out the following experiments either with the GUI, when possible, by personalizing the file demo_l1l2.m, or by writing appropriate scripts.

(1) Prediction: Considering elastic net regularization, observe how the training and test errors change

* when we change (increase or decrease) the regularization parameter lambda
* when we change (increase or decrease) the correlation parameter epsilon
* when the training set size grows (try various choices of n in [10:....] as long as MATLAB supports you!)
* when the amount of noise on the generated data grows (the test set is generated with the same parameters as the training set)

Change one parameter at a time!
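One way to organize the lambda sweep in experiment (1) is sketched below. This uses scikit-learn's ElasticNet rather than the l1l2 toolbox, and its (alpha, l1_ratio) parameterization differs from the toolbox's (lambda, epsilon), so treat it only as a qualitative analogue; the data, the lambda grid, and l1_ratio=0.9 are my own assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

# Synthetic data: sparse linear target plus Gaussian noise.
rng = np.random.default_rng(0)
n, p, s, noise = 40, 60, 5, 0.1
X = rng.standard_normal((2 * n, p))
beta = np.zeros(p)
beta[:s] = 1.5
y = X @ beta + noise * rng.standard_normal(2 * n)
Xtr, ytr, Xts, yts = X[:n], y[:n], X[n:], y[n:]

# Sweep the regularization parameter, keeping everything else fixed.
lambdas = np.logspace(-3, 0, 10)
train_err, test_err = [], []
for lam in lambdas:
    model = ElasticNet(alpha=lam, l1_ratio=0.9, max_iter=50000).fit(Xtr, ytr)
    train_err.append(mean_squared_error(ytr, model.predict(Xtr)))
    test_err.append(mean_squared_error(yts, model.predict(Xts)))
```

Plotting train_err and test_err against lambdas (e.g. on a log x-axis) makes the under/over-fitting trade-off visible; the same loop structure works for sweeping n or the noise level.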

(2) Selection: Considering elastic net regularization, observe how the number and values of the non-zero coefficients in the solution change

* when we change (increase or decrease) the regularization parameter lambda
* when we change (increase or decrease) the correlation parameter epsilon
* when the training set size grows (try various choices of n in [10:....] as long as MATLAB supports you!)
* when the amount of noise on the generated data grows
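For experiment (2), the quantity to track is the support of the solution. A minimal sketch, again using scikit-learn's ElasticNet as a stand-in for the toolbox (the lambda grid and l1_ratio value are my own assumptions):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Sparse linear target: only the first s coefficients are non-zero.
rng = np.random.default_rng(0)
n, p, s = 40, 60, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.5
y = X @ beta + 0.1 * rng.standard_normal(n)

# Count non-zero coefficients as the regularization parameter grows.
nonzeros = []
for lam in [0.01, 0.1, 1.0]:
    w = ElasticNet(alpha=lam, l1_ratio=0.9, max_iter=50000).fit(X, y).coef_
    nonzeros.append(int(np.count_nonzero(w)))
```

Inspecting `w` itself (not just its support) shows how the surviving coefficients shrink as the penalty grows.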

(3) Large p and small n: Perform experiments similar to those above, changing p (the dimension of the points), n (the number of training points), and s (the number of relevant variables):

* set p << n and s < n
* set p >> n and s > n
* set p >> n and s < n
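The regimes above can be explored with one parameterized helper. The sketch below is a hedged illustration in Python/scikit-learn (the helper name `run_experiment`, the fixed lambda, and the reported quantities are my own choices, not part of the lab's toolbox):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def run_experiment(n, p, s, noise=0.1, lam=0.1, seed=0):
    """One (n, p, s) configuration: fit an elastic net and report how many
    of the s relevant features were selected, and how many spurious ones."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 1.5
    y = X @ beta + noise * rng.standard_normal(n)
    w = ElasticNet(alpha=lam, l1_ratio=0.9, max_iter=50000).fit(X, y).coef_
    selected = np.flatnonzero(w)
    hits = int(np.sum(selected < s))          # relevant features recovered
    false_pos = len(selected) - hits          # irrelevant features selected
    return hits, false_pos

# e.g. p >> n with s < n, the classical sparse-recovery regime
hits, false_pos = run_experiment(n=50, p=500, s=5)
```

Repeating the call with the other (n, p, s) combinations listed above shows when sparse recovery succeeds and when it breaks down.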