Lab 1 - BINARY CLASSIFICATION AND MODEL SELECTION

This lab addresses binary classification and model selection on synthetic data.
The aim of the lab is to play with the libraries and to get a practical grasp of what we have discussed in class. Follow the instructions below.
Think hard before you call the instructors!

At the end of your work, send an email to the instructors answering the questions raised below in Part II.

Download the file spectral_filters.zip.

Part I - warm up

Run the file gui_filter.m and a GUI will start. Have a look at the various components.

With the data simulation option, generate a dataset of type "linear" [press "load data" to generate].
Observe the generated data [the button "plot training/plot test" toggles between the training and test sets].
Choose the "regularized least squares" filter and the "linear" kernel.
Have a look at the parameter selection part and the various KCV options; the regularization parameter "t" can either be selected by KCV or set to a fixed value.
Press the "run" button to perform training and classification; observe the plot of the KCV error and the balance between training and test errors. Also have a look at the plot area on the left, where a separating function has appeared [again, the button "plot training/plot test" switches between the two sets].

Back in the MATLAB shell, have a look at the contents of the directory "spectral_reg_toolbox". There you will find, among others, the code for the commands "learn" (used for training), "pattrec" (used for testing), and "kcv" (used for model selection on the training set).

Finally, you may want to have a look at the contents of the directory "dataset_scripts", and in particular at the file "create_dataset", which
will allow you to generate synthetic data of different types.

Part II - analysis

Carry out the following experiments, either using the GUI where possible or writing appropriate scripts.

(1) Generate data of "Linear" type. Using linear RLS, observe how the training and test errors change as

the regularization parameter changes (increases or decreases)
the training set size grows (try various choices of n in [10:....] for as long as MATLAB supports you!)
the amount of noise in the generated data grows

(run training and testing for various choices of the suggested parameters)
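
For those who prefer a script to the GUI, the experiment in (1) can be sketched in plain MATLAB as below. This is only an illustration under stated assumptions: it does not call the toolbox commands (whose signatures are not reproduced here), the data generator is a hypothetical stand-in for the "Linear" dataset, and the scaling lambda*n of the regularizer is one common convention that may differ from the toolbox's parameter "t".

```matlab
% Sketch only: plain MATLAB, independent of spectral_reg_toolbox.
rng(0);                          % fix the seed so runs are repeatable
n = 100; nTest = 1000; d = 2;    % hypothetical sizes
noise = 0.5;                     % hypothetical noise level
wTrue = [1; -1];                 % hypothetical true separating direction

% Labels are the sign of a noisy linear function of the inputs.
Xtr = randn(n, d);
ytr = sign(Xtr * wTrue + noise * randn(n, 1));
Xts = randn(nTest, d);
yts = sign(Xts * wTrue + noise * randn(nTest, 1));

lambdas  = [1e-4 1e-2 1 100];
trainErr = zeros(size(lambdas));
testErr  = zeros(size(lambdas));
for i = 1:numel(lambdas)
    % Linear RLS solution: w = (X'X + lambda*n*I) \ X'y
    w = (Xtr' * Xtr + lambdas(i) * n * eye(d)) \ (Xtr' * ytr);
    trainErr(i) = mean(sign(Xtr * w) ~= ytr);   % fraction misclassified
    testErr(i)  = mean(sign(Xts * w) ~= yts);
    fprintf('lambda = %g: train error %.3f, test error %.3f\n', ...
            lambdas(i), trainErr(i), testErr(i));
end
```

Changing n or noise at the top and rerunning gives the other two comparisons asked for in (1).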

(2) Leaving all the other parameters fixed, choose an appropriate range [lambda_min:lambda_step:lambda_max] and plot the training error and the test error for each lambda. Then use the KCV option to select the optimal lambda and see how it relates to that plot.
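
A possible script for (2) is sketched below. It is not the toolbox's "kcv" command: the cross-validation loop is a hand-written K-fold procedure (assumed here to be what KCV refers to), the data generator and all settings (n, d, noise, K, the logarithmic lambda grid) are hypothetical, and a logarithmic grid is used instead of a linear [lambda_min:lambda_step:lambda_max] range only because it covers several orders of magnitude.

```matlab
% Sketch only: plain MATLAB, independent of spectral_reg_toolbox.
rng(0);
n = 40; nTest = 1000; d = 20; noise = 0.5;   % hypothetical settings
wTrue = ones(d, 1) / sqrt(d);                % hypothetical true direction
Xtr = randn(n, d);     ytr = sign(Xtr * wTrue + noise * randn(n, 1));
Xts = randn(nTest, d); yts = sign(Xts * wTrue + noise * randn(nTest, 1));

% Misclassification rate of linear RLS fitted on (X, y), evaluated on (Xe, ye).
rlsErr = @(X, y, Xe, ye, lam) mean(sign(Xe * ...
    ((X' * X + lam * size(X, 1) * eye(size(X, 2))) \ (X' * y))) ~= ye);

lambdas = logspace(-4, 2, 13);
K = 5;                                  % number of folds
fold = mod(randperm(n), K) + 1;         % random fold label in 1..K per point

trainErr = zeros(size(lambdas));
testErr  = zeros(size(lambdas));
cvErr    = zeros(size(lambdas));
for i = 1:numel(lambdas)
    lam = lambdas(i);
    trainErr(i) = rlsErr(Xtr, ytr, Xtr, ytr, lam);
    testErr(i)  = rlsErr(Xtr, ytr, Xts, yts, lam);
    for k = 1:K
        va = (fold == k);               % validation points of fold k
        cvErr(i) = cvErr(i) + ...
            rlsErr(Xtr(~va, :), ytr(~va), Xtr(va, :), ytr(va), lam) / K;
    end
end

[~, best] = min(cvErr);                 % first lambda with lowest CV error
fprintf('lambda selected by K-fold CV: %g\n', lambdas(best));

semilogx(lambdas, trainErr, 'o-', lambdas, testErr, 's-', lambdas, cvErr, 'x--');
legend('training error', 'test error', 'K-fold CV error');
xlabel('lambda'); ylabel('misclassification rate');
```

Comparing the selected lambda with the minimum of the test error curve shows how well cross-validation on the training set alone tracks the test behaviour.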

(3) Leaving all the other parameters fixed, choose an appropriate range [n_min:n_step:n_max] and plot the training and test errors as a function of n (what do you observe as n goes to infinity?)
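
The sweep over n in (3) might look like the following sketch. Again it is self-contained rather than toolbox-based, and the data generator, the fixed lambda, and the list of sizes are hypothetical choices; nested training sets (the first n points of one pool) are used so that only the sample size changes between runs.

```matlab
% Sketch only: plain MATLAB, independent of spectral_reg_toolbox.
rng(0);
d = 2; noise = 0.5; lambda = 1e-2;      % hypothetical fixed settings
wTrue = [1; -1];                        % hypothetical true direction
nList = [10 20 50 100 200 500 1000];    % hypothetical training set sizes
nMax  = max(nList); nTest = 2000;

% One pool of training points; each run uses its first n rows.
Xpool = randn(nMax, d);
ypool = sign(Xpool * wTrue + noise * randn(nMax, 1));
Xts   = randn(nTest, d);
yts   = sign(Xts * wTrue + noise * randn(nTest, 1));

trainErr = zeros(size(nList));
testErr  = zeros(size(nList));
for i = 1:numel(nList)
    n = nList(i);
    X = Xpool(1:n, :);
    y = ypool(1:n);
    w = (X' * X + lambda * n * eye(d)) \ (X' * y);   % linear RLS
    trainErr(i) = mean(sign(X * w) ~= y);
    testErr(i)  = mean(sign(Xts * w) ~= yts);
end

semilogx(nList, trainErr, 'o-', nList, testErr, 's-');
legend('training error', 'test error');
xlabel('n (training set size)'); ylabel('misclassification rate');
```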

(4) Consider Gaussian RLS and perform parameter tuning in this case: this time, together with lambda, you will have to choose an appropriate sigma

try some (sigma, lambda) pairs and compare the resulting (training_error, test_error)
fix lambda and observe the effect of changing sigma
fix sigma and observe the effect of changing lambda
do you notice any overfitting/oversmoothing effect (and if so, when)?
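
A small grid over (sigma, lambda) for Gaussian RLS can be sketched as follows. The assumptions are stated plainly: the circular dataset is a hypothetical nonlinear example, the kernel is parametrized as exp(-||x-z||^2 / (2*sigma^2)), which may not match the toolbox's convention for sigma, and the regularizer is scaled as lambda*n.

```matlab
% Sketch only: plain MATLAB, independent of spectral_reg_toolbox.
rng(0);
n = 100; nTest = 1000;                  % hypothetical sizes

% Hypothetical nonlinear data: +1 outside a circle of radius 1.5, -1 inside.
Xtr = 4 * rand(n, 2) - 2;
ytr = sign(sum(Xtr.^2, 2) - 1.5^2);
Xts = 4 * rand(nTest, 2) - 2;
yts = sign(sum(Xts.^2, 2) - 1.5^2);
flip = rand(n, 1) < 0.05;               % 5% label noise, training set only
ytr(flip) = -ytr(flip);

% Pairwise squared distances and the Gaussian kernel (assumed convention).
sqDist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
gaussK = @(A, B, sigma) exp(-sqDist(A, B) / (2 * sigma^2));

sigmas  = [0.05 0.5 5];
lambdas = [1e-6 1e-2 10];
trainErr = zeros(numel(sigmas), numel(lambdas));
testErr  = zeros(numel(sigmas), numel(lambdas));
for i = 1:numel(sigmas)
    Ktr = gaussK(Xtr, Xtr, sigmas(i));  % n x n training kernel matrix
    Kts = gaussK(Xts, Xtr, sigmas(i));  % nTest x n test kernel matrix
    for j = 1:numel(lambdas)
        c = (Ktr + lambdas(j) * n * eye(n)) \ ytr;   % kernel RLS coefficients
        trainErr(i, j) = mean(sign(Ktr * c) ~= ytr);
        testErr(i, j)  = mean(sign(Kts * c) ~= yts);
        fprintf('sigma = %g, lambda = %g: train %.3f, test %.3f\n', ...
                sigmas(i), lambdas(j), trainErr(i, j), testErr(i, j));
    end
end
```

Reading the printed table along a row (sigma fixed) or a column (lambda fixed) corresponds to the two "fix one, vary the other" items above.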

(5) If you still have some time, compare RLS with the nu-method on a kernel of your choice (Gaussian is better; why?)

tune the parameters with KCV
compare the time needed to obtain a solution (see the MATLAB commands tic and toc)
compare the training and test errors
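
The timing pattern with tic and toc is sketched below. Note the substitution: the nu-method itself is not implemented here (use the toolbox for that); a plain Landweber iteration on the kernel system is used as a simpler iterative stand-in, only to show how a direct solve and an iterative scheme can be timed side by side. The dataset, sigma, lambda, and the number of iterations are hypothetical.

```matlab
% Sketch only: plain MATLAB, independent of spectral_reg_toolbox.
rng(0);
n = 300; sigma = 0.5; lambda = 1e-3; nIter = 200;   % hypothetical settings

% Hypothetical nonlinear data: +1 outside a circle of radius 1.5, -1 inside.
X = 4 * rand(n, 2) - 2;
y = sign(sum(X.^2, 2) - 1.5^2);

% Gaussian kernel matrix, assumed convention exp(-||x-z||^2 / (2*sigma^2)).
sqDist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
K = exp(-sqDist(X, X) / (2 * sigma^2));

% Direct solve: kernel RLS.
tic;
cRLS = (K + lambda * n * eye(n)) \ y;
tRLS = toc;

% Iterative stand-in: Landweber iteration c <- c + tau * (y - K*c).
tic;
tau = 1 / norm(K);          % step size; must stay below 2/norm(K)
cLW = zeros(n, 1);
for t = 1:nIter
    cLW = cLW + tau * (y - K * cLW);
end
tLW = toc;

errRLS = mean(sign(K * cRLS) ~= y);     % training errors of both solutions
errLW  = mean(sign(K * cLW)  ~= y);
fprintf('RLS:       %.4f s, training error %.3f\n', tRLS, errRLS);
fprintf('Landweber: %.4f s, training error %.3f\n', tLW, errLW);
```

For an iterative filter the number of iterations plays the role of the regularization parameter, so the timing depends on how many iterations are run.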