**Lab 1 - BINARY CLASSIFICATION AND MODEL SELECTION**

This lab addresses binary classification and model selection on synthetic data.

The aim of the lab is to play with the libraries and to get a practical grasp of what we have discussed in class. Follow the instructions below.

Think hard before you call the instructors!

Download the file spectral_filters.zip.

Run the file gui_filter.m and a GUI will start. Have a look at its various components.

- With the data simulation option, generate a dataset of type "linear" [press "load data" to generate].
- Observe the generated data [the button "plot training/plot test" lets you toggle between the training and test set].
- Choose the "regularized least squares" filter and the "linear" kernel.
- Have a look at the parameter selection part and the various KCV options; to choose the regularization parameter "t" you can either use KCV or set a fixed value.
- Press the button "run" to perform training and classification; observe the plot of the KCV error and the balance between training and test errors. Also have a look at the plot area on the left, where a separation function has appeared [again, the button "plot training/plot test" allows you to switch between the two].

Finally, you may want to have a look at the contents of the directory "dataset_scripts", and in particular at the file "create_dataset", which allows you to generate synthetic data of different types.
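If you prefer scripting over the GUI, a "linear"-type dataset can also be generated in plain MATLAB along these lines. This is only a sketch: it does not call the toolbox's create_dataset (whose exact signature may differ), and the separating direction `w_true` and the noise model (random label flips) are illustrative choices.

```matlab
% Sketch: a two-dimensional binary dataset separated by a hyperplane,
% with a fraction of the labels flipped to simulate noise.
n = 100;                          % number of points
noise = 0.05;                     % fraction of flipped labels
X = 2*rand(n, 2) - 1;             % points uniform in [-1,1]^2
w_true = [1; -1];                 % hypothetical separating direction
Y = sign(X * w_true);             % labels in {-1, +1}
Y(Y == 0) = 1;                    % break ties, just in case
flip = rand(n, 1) < noise;        % pick a random subset of labels...
Y(flip) = -Y(flip);               % ...and flip them
scatter(X(:,1), X(:,2), 25, Y, 'filled');   % quick look at the data
```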

**Part II - Analysis**

Carry out the following experiments, either using the GUI where possible, or by writing appropriate scripts.

(1) Generate data of "linear" type. Considering linear RLS, observe how the training and test errors change as

- the regularization parameter increases or decreases
- the training set size grows (try various choices of n in [10:....], as long as MATLAB supports you!)
- the amount of noise on the generated data grows

(run training and test for various choices of the suggested parameters)
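As a starting point for the scripted version of experiment (1), here is a minimal sketch of linear RLS in closed form. It assumes the convention w = (X'X + n*lambda*I)^{-1} X'Y for the regularized least squares solution (the toolbox may scale lambda differently), and generates its own toy data so it runs stand-alone.

```matlab
% Sketch: train/test error of linear RLS on noisy linearly separable data.
n = 100; d = 2; noise = 0.05; lambda = 0.01;
w_true = [1; -1];                                 % illustrative direction
Xtr = 2*rand(n,d) - 1;  Ytr = sign(Xtr*w_true);   % training set
Xts = 2*rand(n,d) - 1;  Yts = sign(Xts*w_true);   % test set (clean)
flip = rand(n,1) < noise;  Ytr(flip) = -Ytr(flip);  % noisy training labels
w = (Xtr'*Xtr + n*lambda*eye(d)) \ (Xtr'*Ytr);    % RLS solution
err_tr = mean(sign(Xtr*w) ~= Ytr);                % training 0-1 error
err_ts = mean(sign(Xts*w) ~= Yts);                % test 0-1 error
fprintf('train %.3f  test %.3f\n', err_tr, err_ts);
```

Rerunning this while varying `lambda`, `n`, and `noise` reproduces the three bullet points above.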

(2) Leaving all the other parameters fixed, choose an appropriate range [lambda_min:lambda_step:lambda_max] and plot the training and test errors for each lambda. Use the KCV option to select the optimal lambda and see how it relates to the previous plot.
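A scripted sweep over lambda might look like the sketch below (same RLS convention as before; a log-spaced grid is often more informative than a linear [lambda_min:lambda_step:lambda_max], but either works). The data-generation choices are illustrative.

```matlab
% Sketch: training and test error as a function of lambda, linear RLS.
d = 2;  w_true = [1; -1];
Xtr = 2*rand(100,d) - 1;  Ytr = sign(Xtr*w_true);
flip = rand(100,1) < 0.1;  Ytr(flip) = -Ytr(flip);   % 10% label noise
Xts = 2*rand(100,d) - 1;  Yts = sign(Xts*w_true);
n = size(Xtr,1);
lambdas = logspace(-6, 1, 30);
err_tr = zeros(size(lambdas));  err_ts = zeros(size(lambdas));
for i = 1:numel(lambdas)
    w = (Xtr'*Xtr + n*lambdas(i)*eye(d)) \ (Xtr'*Ytr);
    err_tr(i) = mean(sign(Xtr*w) ~= Ytr);
    err_ts(i) = mean(sign(Xts*w) ~= Yts);
end
semilogx(lambdas, err_tr, '-o', lambdas, err_ts, '-s');
legend('training error', 'test error');  xlabel('\lambda');
```

The lambda selected by KCV should land near the minimum of the test-error curve.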

(3) Leaving all the other parameters fixed, choose an appropriate range [n_min:n_step:n_max] and plot the training and test errors (what do you observe as n goes to infinity?)
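The same loop structure gives a learning curve over n. A sketch, again with illustrative data and a fixed lambda; with noisy labels, you should see both curves approach the noise level as n grows.

```matlab
% Sketch: learning curve -- train/test error as the training set grows.
noise = 0.1;  lambda = 0.01;  d = 2;  w_true = [1; -1];
ns = 10:20:410;
err_tr = zeros(size(ns));  err_ts = zeros(size(ns));
Xts = 2*rand(1000,d) - 1;  Yts = sign(Xts*w_true);   % large fixed test set
for i = 1:numel(ns)
    n = ns(i);
    Xtr = 2*rand(n,d) - 1;  Ytr = sign(Xtr*w_true);
    flip = rand(n,1) < noise;  Ytr(flip) = -Ytr(flip);
    w = (Xtr'*Xtr + n*lambda*eye(d)) \ (Xtr'*Ytr);
    err_tr(i) = mean(sign(Xtr*w) ~= Ytr);
    err_ts(i) = mean(sign(Xts*w) ~= Yts);
end
plot(ns, err_tr, '-o', ns, err_ts, '-s');
legend('training error', 'test error');  xlabel('n');
```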

(4) Consider Gaussian RLS and perform parameter tuning in this case -- this time, together with lambda, you'll have to choose an appropriate sigma

- try some (sigma, lambda) pairs and compare the obtained (training_error, test_error)
- fix lambda and observe the effect of changing sigma
- fix sigma and observe the effect of changing lambda
- do you notice (and if so, when) any overfitting/oversmoothing effect?
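For the scripted version, the kernel form of RLS is needed: with Gaussian kernel K(xi,xj) = exp(-||xi - xj||^2 / (2*sigma^2)), the coefficients solve (K + n*lambda*I)c = Y under one standard convention (again, the toolbox's scaling may differ). The sketch below tries a small (sigma, lambda) grid on toy nonlinear data; very small sigma/lambda should show overfitting (low training, high test error), very large ones oversmoothing (both high).

```matlab
% Sketch: Gaussian-kernel RLS over a small (sigma, lambda) grid.
sqdist = @(A,B) bsxfun(@plus, sum(A.^2,2), sum(B.^2,2)') - 2*A*B';
% toy data: labels from a nonlinear rule, so a linear kernel underfits
Xtr = 2*rand(100,2) - 1;  Ytr = sign(sum(Xtr.^2,2) - 0.5);
Xts = 2*rand(100,2) - 1;  Yts = sign(sum(Xts.^2,2) - 0.5);
n = size(Xtr,1);
for sigma = [0.05 0.5 2]
  for lambda = [1e-6 1e-3 1]
    K   = exp(-sqdist(Xtr,Xtr)/(2*sigma^2));    % training kernel matrix
    c   = (K + n*lambda*eye(n)) \ Ytr;          % kernel RLS coefficients
    Kts = exp(-sqdist(Xts,Xtr)/(2*sigma^2));    % test-vs-train kernel
    fprintf('sigma=%4.2f lambda=%g  train %.3f  test %.3f\n', sigma, ...
      lambda, mean(sign(K*c) ~= Ytr), mean(sign(Kts*c) ~= Yts));
  end
end
```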

(5) If you still have some time, compare RLS with the nu-method on a kernel of your choice (Gaussian is better -- why?)

- tune the parameters with KCV
- compare the time needed to obtain a solution (see the MATLAB commands tic and toc)
- compare the training and test errors
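The timing pattern with tic/toc is the same regardless of the method: start the clock, run training, read the elapsed time. A sketch, here timing the single linear solve of kernel RLS (wrap the toolbox's nu-method training call the same way to compare):

```matlab
% Sketch: timing a training run with tic/toc.
n = 500;  lambda = 1e-3;
X = randn(n, 2);  Y = sign(X(:,1));
sqd = @(A,B) bsxfun(@plus, sum(A.^2,2), sum(B.^2,2)') - 2*A*B';
K = exp(-sqd(X,X)/2);              % Gaussian kernel, sigma = 1
tic;
c = (K + n*lambda*eye(n)) \ Y;     % RLS: a single n-by-n linear solve
t_rls = toc;
fprintf('RLS training time: %.4f s\n', t_rls);
% Time the nu-method identically: tic; <nu-method training>; t_nu = toc;
```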