Lab 2 - SPECTRAL FILTERS AND MULTI-CLASS CLASSIFICATION

This lab covers binary and multi-class classification and model selection on synthetic and real data, focusing on the role and the properties of spectral filters.

Follow the instructions below. Think hard before you call the instructors!

Download the file releaseLab2: it includes all the code you need!

Overture: warm up

Run the file gui_filter.m and a GUI will start. Have a look at the various components.

Interlude: The Geek Part

Back in the MATLAB shell, have a look at the content of the directory "spectral_reg_toolbox". There you will find, among others, the code for the commands "learn" (used for training), "patt_rec" (used for testing) and "kcv" (used for model selection on the training set).

For more information about the parameters and the usage of those scripts, type:

help learn

help patt_rec

help kcv

Finally, you may want to have a look at the content of the directory "dataset_scripts", and in particular at the file "create_dataset", which allows you to generate synthetic data of different types.

Allegro con brio: Analysis

Carry out the following experiments using the GUI.

(1) Generate data of "Spiral" type. Consider three algorithms, namely RLS, Truncated SVD and nu-method. Observe how the training and test errors change as the suggested parameters vary.

Run training and test for various choices of the suggested parameters.
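To make the idea of a spectral filter concrete, here is a minimal sketch for two of the three algorithms (RLS and Truncated SVD), written with base MATLAB functions only. It does not use the toolbox; the toy data, the linear kernel, the value of lambda and the convention of thresholding the eigenvalues of K/n are all assumptions made for illustration, and the toolbox may parametrize the filters differently. The nu-method is iterative and is not sketched here.

```matlab
% Toy data (illustrative only): 5 points in 2-D with labels in {-1, +1}.
X = [0 0; 1 0; 0 1; 1 1; 2 1];
y = [-1; -1; 1; 1; 1];
n = size(X, 1);

K = X * X';                      % linear kernel matrix (n-by-n, symmetric PSD)
[U, S] = eig((K + K') / 2);      % symmetrize to guard against round-off
s = diag(S) / n;                 % eigenvalues of K/n

lambda = 0.1;                    % regularization parameter (arbitrary choice)

% RLS (Tikhonov) filter: shrinks every spectral component smoothly.
g_rls = 1 ./ (s + lambda);

% Truncated SVD filter: inverts components above the threshold, drops the rest.
keep = s >= lambda;
g_tsvd = zeros(size(s));
g_tsvd(keep) = 1 ./ s(keep);

% Coefficients obtained by applying each filter to the spectrum of K/n.
alpha_rls  = U * (g_rls  .* (U' * y)) / n;
alpha_tsvd = U * (g_tsvd .* (U' * y)) / n;

% Sanity check: the RLS filter reproduces the usual closed-form solution.
alpha_direct = (K + n * lambda * eye(n)) \ y;
disp(norm(alpha_rls - alpha_direct));
```

Changing lambda changes how many components Truncated SVD keeps and how strongly RLS shrinks them, which is the effect to look for in the GUI plots.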

(2) Leaving all the other parameters fixed, use the KCV (k-fold cross-validation) option to select the optimal model and see how it relates to the previous plot. Choose an appropriate range for the regularization parameter and the number of values in it, and plot the training error and the test error for each value.
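The selection procedure behind k-fold cross-validation can be sketched as follows. This is not the toolbox "kcv" command (type 'help kcv' for its actual usage); it is a hand-written illustration using RLS with a linear kernel, and the toy data, the number of folds and the grid of candidate values are assumptions.

```matlab
% Toy 1-D binary problem (illustrative data only).
X = (-1:0.1:0.9)';                 % 20 points
y = sign(X + 0.05);                % labels in {-1, +1}
n = numel(y);

k = 5;                             % number of folds (assumed)
lambdas = logspace(-4, 0, 5);      % candidate regularization parameters (assumed)

idx  = randperm(n);                % random permutation of the points
fold = mod(0:n-1, k) + 1;          % fold label of each shuffled position

cv_err = zeros(size(lambdas));
for j = 1:numel(lambdas)
    for f = 1:k
        va = idx(fold == f);       % validation indices for this fold
        tr = idx(fold ~= f);       % training indices for this fold
        m  = numel(tr);
        Ktr = X(tr) * X(tr)';      % linear kernel on the training points
        alpha = (Ktr + m * lambdas(j) * eye(m)) \ y(tr);
        y_pred = sign((X(va) * X(tr)') * alpha);
        % Accumulate the average misclassification rate over the folds.
        cv_err(j) = cv_err(j) + mean(y_pred ~= y(va)) / k;
    end
end

[~, best] = min(cv_err);           % candidate with the lowest validation error
lambda_best = lambdas(best);
```

Plotting cv_err against lambdas on a logarithmic axis gives a curve that can be compared with the training and test error plots from the GUI.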

(3) Leaving all the other parameters fixed, choose an appropriate range [n_min:n_step:n_max] and plot the training and test error: what do you observe as n goes to infinity? How do the different regularization parameters affect the learning process? What are the main differences between the methods in terms of regularization?

Crescendo: Advanced Analysis

Carry out the following experiments using either the GUI or the command line interface. In this part you have to focus more on the effects of regularization and on the correct choice of sigma.

(4) Use the Gaussian kernel and perform parameter tuning: this time, together with the regularization parameter, you will also have to choose an appropriate sigma (the kernel width).
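To see why sigma matters, the sketch below builds a Gaussian kernel matrix by hand for three values of sigma. It assumes the common convention K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); the toolbox may scale sigma differently, so check its documentation. The data points and the sigma values are made up for illustration.

```matlab
% Three illustrative points in 2-D.
X = [0 0; 1 0; 0 2];

% Pairwise squared Euclidean distances, without any toolbox function.
sq = sum(X.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');
D2 = max(D2, 0);                          % clip tiny negative round-off values

gauss = @(sigma) exp(-D2 / (2 * sigma^2));

K_small = gauss(0.1);   % very small sigma: K is close to the identity
K_mid   = gauss(1);     % intermediate sigma
K_large = gauss(10);    % very large sigma: all entries are close to 1

disp(K_small); disp(K_mid); disp(K_large);
```

With a very small sigma every point looks similar only to itself, which tends to favour overfitting; with a very large sigma all points look alike and the model becomes too rigid. A suitable sigma lies between these extremes and has to be tuned jointly with the regularization parameter.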

(5) Compare RLS with nu-method on a kernel of your choice:

Finale: Challenge

The challenge consists of a learning task on a real dataset, namely "USPS". This dataset contains a number of images of handwritten digits. The problem is to train "the best classifiers" that are able to discriminate between the digits "3", "8" and "0".

Have a look at the script "demo_lab2.m". This script contains a code snippet to perform a multi-class classification task using the previously presented MATLAB scripts (see "Interlude").

You should understand what the scripts are supposed to do, and train the classifiers in order to perform a One vs. All classification for all the combinations of the digits "3", "8" and "0".
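As a reminder of what One vs. All means in practice, here is a minimal sketch of the label encoding and of the final decision rule. It is not taken from "demo_lab2.m": the label vector and the score matrix are invented for illustration, and in the real task the scores would be the outputs of your three trained binary classifiers.

```matlab
classes = [3 8 0];                 % the three digits of the challenge
labels  = [3; 8; 0; 8; 3];         % made-up training labels

% One column per class: +1 for "this digit", -1 for "any other digit".
% Column c is the target vector for training the c-th binary classifier.
Y = 2 * bsxfun(@eq, labels, classes) - 1;

% Hypothetical real-valued outputs of the three classifiers on 3 test points
% (rows are test points, columns follow the order in "classes").
scores = [ 0.9 -0.2 -0.7;
          -0.1  0.4 -0.3;
          -0.8 -0.5  0.6];

% Decision rule: assign each point to the class with the largest score.
[~, winner] = max(scores, [], 2);
predicted = classes(winner)';

disp(Y);
disp(predicted);
```

Taking the maximum of the real-valued scores, rather than of their signs, resolves the cases where more than one classifier (or none) claims the point.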

Once the classifiers are trained, the model must be exported in a matrix file by means of the "save_challenge_2.m" script (to see how to use it please try the command 'help save_challenge_2').

By the end of the challenge session you should submit the result of your script using the link: http://www.dropitto.me/regmet with password regmet2013. The result file is a MATLAB matrix file named name-surname.mat. The results will be presented during the next class. The score of the challenge is based on the accuracy of the classifiers on an independently sampled test set.