LAB2: SPARSITY-BASED LEARNING

This lab addresses the problem of feature selection within the framework of sparsity-based regularization, in particular using elastic net regularization.

Follow the instructions below. Think hard before you call the instructors!

At the end of your work, send an email to the instructors answering the questions raised below and briefly commenting on the results obtained in Part II - analysis.

Toy problem

We will consider synthetic data generated according to a given probability distribution. The task is a regression problem in which the target function is linear: the inputs are randomly sampled and the outputs are affected by noise. The number of points, the number of dimensions, and the number of relevant features can be set in advance.
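
As a rough illustration of the kind of data described above, the following Python sketch generates a sparse linear regression problem. It is independent of the lab toolbox: the function name make_sparse_regression, the standard Gaussian inputs, the unit-valued relevant coefficients placed in the first s positions, and the Gaussian noise are all assumptions made for illustration, not the generator used by the GUI.

```python
import random

def make_sparse_regression(n, p, s, noise_std, seed=0):
    """Return (X, y, beta) for a noisy linear model with s relevant features.

    Assumptions (not taken from the lab toolbox): inputs and noise are
    Gaussian, and the s relevant coefficients are all equal to 1.
    """
    if not 0 <= s <= p:
        raise ValueError("s must satisfy 0 <= s <= p")
    rng = random.Random(seed)
    beta = [1.0] * s + [0.0] * (p - s)   # only the first s features matter
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + noise_std * rng.gauss(0.0, 1.0)
         for row in X]
    return X, y, beta

# Hypothetical usage: 20 points, 10 dimensions, 3 relevant features.
X, y, beta = make_sparse_regression(n=20, p=10, s=3, noise_std=0.1)
print(len(X), len(X[0]), sum(b != 0 for b in beta))
```

Fixing the seed keeps the data identical across runs, which helps when only one parameter is changed at a time.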

Part I - warm up

Run the file l1l2_gui and the GUI will start.
Have a look at the various components.

• Generate a training set with the default parameters
• Press the "run" button to start a training phase with the selected L1_par and L2_par parameters and perform testing
• Change the values of L1_par and L2_par and look at the test error and the number of selected variables
• First set L2_par=0 and vary L1_par, trying to obtain a sparser or denser solution. What do you notice?
• Repeat the experiment with L2_par>0. In what ways do the test error and the number of selected features vary?
• Now select KCV for L1_par tuning and observe the KCV error curve.
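
The behaviour probed in the steps above (more exact zeros as L1_par grows, more shrinkage as L2_par grows) can be reproduced with a minimal Python sketch of elastic net regularization solved by iterative soft thresholding. This is not the toolbox's l1l2_algorithm: the function names, the scaling of the objective, and the step size based on the Frobenius norm are assumptions chosen for simplicity.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values with |v| <= t become exactly 0."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def elastic_net_ista(X, y, l1_par, l2_par, n_iter=500):
    """Minimize (1/(2n))*||y - X b||^2 + l1_par*||b||_1 + (l2_par/2)*||b||^2.

    The scaling of the three terms is an assumption and may differ from
    the convention used by the lab toolbox.
    """
    n, p = len(X), len(X[0])
    # Upper bound on the Lipschitz constant of the smooth part: the squared
    # Frobenius norm of X bounds the largest eigenvalue of X^T X.
    lipschitz = sum(x * x for row in X for x in row) / n + l2_par
    step = 1.0 / lipschitz
    b = [0.0] * p
    for _ in range(n_iter):
        residual = [sum(x * bj for x, bj in zip(row, b)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n + l2_par * b[j]
                for j in range(p)]
        # Gradient step on the smooth part, then soft thresholding for the L1 term.
        b = [soft_threshold(bj - step * g, step * l1_par)
             for bj, g in zip(b, grad)]
    return b

# Tiny example with a 2x2 identity design, where the minimizer is known in
# closed form: b_j = soft_threshold(y_j, 2*l1_par) / (1 + 2*l2_par).
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
print(elastic_net_ista(X, y, l1_par=0.5, l2_par=0.0))
print(elastic_net_ista(X, y, l1_par=0.5, l2_par=0.5))
```

In this small example the L1 term sets the second coefficient exactly to zero, while adding the L2 term shrinks the surviving coefficient further without changing which coefficients are zero.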
Back in the MATLAB shell, have a look at the contents of the directory "PROXIMAL_TOOLBOXES/L1L2_TOOLBOX". There you will find, among others, the files l1l2_algorithm (used for variable selection), l1l2_kcv (used for model selection with KCV or LOO), and l1l2_pred (for prediction on a test set).
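
Model selection by K-fold cross-validation (KCV), with leave-one-out (LOO) as the special case K = n, can be sketched in a few lines of Python. This is an illustration only and does not reproduce the interface of l1l2_kcv: the names kfold_indices and kcv_error_curve are hypothetical, and the one-dimensional ridge-style model in the usage example is a stand-in chosen so that the block stays self-contained.

```python
def kfold_indices(n, k):
    """Split 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def kcv_error_curve(X, y, params, fit, predict, k):
    """Return the mean held-out squared error for each parameter value."""
    folds = kfold_indices(len(y), k)
    curve = []
    for par in params:
        total = 0.0
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], par)
            total += sum((predict(model, X[i]) - y[i]) ** 2 for i in held_out)
        curve.append(total / len(y))
    return curve

# Stand-in model (an assumption for illustration): a single slope shrunk
# by a ridge-style penalty, slope = sum(x*y) / (sum(x^2) + n*par).
def fit_slope(X, y, par):
    return sum(x * t for x, t in zip(X, y)) / (sum(x * x for x in X) + len(y) * par)

def predict_slope(slope, x):
    return slope * x

X = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [2.0 * x for x in X]                 # noiseless data with true slope 2
params = [0.0, 0.1, 1.0]
curve = kcv_error_curve(X, y, params, fit_slope, predict_slope, k=3)
best = params[curve.index(min(curve))]   # parameter with the lowest KCV error
print(curve, best)
```

The folds here are contiguous for simplicity; in practice the points are usually permuted at random before splitting. Calling kcv_error_curve with k equal to the number of points gives the LOO error curve.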

Finally, you may want to have a look at the file demo_l1l2.m for a complete example of analysis.

Part II - analysis

Carry out the following experiments with the GUI where possible, by customizing the file demo_l1l2.m, or by writing appropriate scripts.

(1) Prediction: Considering elastic net regularization, observe how the training and test errors change

* when we change (increase or decrease) the regularization parameter lambda
* when we change (increase or decrease) the correlation parameter epsilon
* when the training set size grows (try various choices of n in [10:....] as long as MATLAB supports you!)
* when the amount of noise on the generated data grows (the test set is generated with the same parameters as the training set)

Change one parameter at a time!
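
For experiment (1), a script along the following lines sweeps the regularization parameter while holding everything else fixed, and reports training and test error. It is a self-contained Python sketch, not the toolbox code: it uses cyclic coordinate descent rather than the toolbox solver, and the data model, the objective scaling, and the mapping of lambda and epsilon to the L1 and L2 penalty weights are assumptions.

```python
import random

def make_data(n, p, s, noise_std, rng):
    """Gaussian inputs, first s coefficients equal to 1 (assumed data model)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(row[:s]) + noise_std * rng.gauss(0.0, 1.0) for row in X]
    return X, y

def soft(v, t):
    """Soft thresholding: shrink v toward zero by t."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def enet_cd(X, y, lam, eps, sweeps=100):
    """Coordinate descent for (1/(2n))||y-Xb||^2 + lam*||b||_1 + (eps/2)*||b||^2."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    sq = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    r = list(y)                      # residual y - X b, kept up to date
    for _ in range(sweeps):
        for j in range(p):
            if sq[j] + eps == 0.0:
                continue             # all-zero column with no L2 term
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * ri for c, ri in zip(cols[j], r)) / n + sq[j] * b[j]
            new = soft(rho, lam) / (sq[j] + eps)
            if new != b[j]:
                delta = new - b[j]
                r = [ri - delta * c for ri, c in zip(r, cols[j])]
                b[j] = new
    return b

def mse(X, y, b):
    """Mean squared prediction error of coefficients b on (X, y)."""
    return sum((sum(x * bj for x, bj in zip(row, b)) - yi) ** 2
               for row, yi in zip(X, y)) / len(y)

rng = random.Random(0)
X_tr, y_tr = make_data(40, 15, 3, 0.5, rng)
X_te, y_te = make_data(200, 15, 3, 0.5, rng)   # same parameters as training
for lam in [0.0, 0.01, 0.1, 0.5, 2.0]:         # only lambda changes
    b = enet_cd(X_tr, y_tr, lam, eps=0.0)
    print(lam, round(mse(X_tr, y_tr, b), 3), round(mse(X_te, y_te, b), 3))
```

The same loop can be reused for the other items by iterating over eps, the training set size, or noise_std instead, keeping the remaining arguments fixed.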

(2) Selection: Considering elastic net regularization, observe how the number and values of the nonzero coefficients in the solution change

* when we change (increase or decrease) the regularization parameter lambda
* when we change (increase or decrease) the correlation parameter epsilon
* when the training set size grows (try various choices of n in [10:....] as long as MATLAB supports you!)
* when the amount of noise on the generated data grows
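
For experiment (2), the quantities of interest can be read off a coefficient vector with two small helpers. The Python sketch below is an illustration under assumptions: the names selected_support and support_recovery, the tolerance used to decide that a coefficient is nonzero, and the example vector are all hypothetical and not part of the toolbox.

```python
def selected_support(beta, tol=1e-8):
    """Indices of coefficients treated as nonzero (|b| > tol, an assumed cutoff)."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]

def support_recovery(selected, relevant):
    """Return (precision, recall) of the selected indices against the relevant ones."""
    sel, rel = set(selected), set(relevant)
    hits = len(sel & rel)
    precision = hits / len(sel) if sel else 1.0
    recall = hits / len(rel) if rel else 1.0
    return precision, recall

# Hypothetical estimate for a problem whose relevant features are 0, 1, 2.
beta_hat = [0.9, 0.0, -1.1, 1e-12, 0.3]
chosen = selected_support(beta_hat)
print(len(chosen), [beta_hat[j] for j in chosen])
print(support_recovery(chosen, relevant=[0, 1, 2]))
```

Tracking both the count of selected features and how many of them are truly relevant makes it easier to tell a sparser solution from a more accurate one.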

(3) Large p and small n: Perform experiments similar to those above, changing p (dimension of the points), n (number of training points), and s (number of relevant variables)

* set p<<n (since s cannot exceed p, this implies s<n)
* set p>>n and s>n
* set p>>n and s<n