Lab 2 - SPECTRAL FILTERS AND MULTI-CLASS CLASSIFICATION
This lab covers binary and multiclass classification and model selection on synthetic as well as real data, focusing on the role and properties of spectral filters.
Follow the instructions below. Think hard before you call the instructors!
Download file releaseLab2 - this file includes all the code you need!
Overture: warm up
Run the file gui_filter.m and a GUI will start. Have a look at the various components.
With the data simulation option, generate a dataset of type "spiral" [press "load data" to generate].
Observe the generated data [the "plot training"/"plot test" buttons let you toggle between the training and test sets].
Choose the "Truncated SVD" filter and the "Gaussian" kernel [be sure the "autosigma" checkbox is checked].
Have a look at the parameter selection part and the various KCV options for choosing the regularization parameter "t" [you can either use KCV or set a fixed value].
Press the button "run" to perform training and classification.
For more information about the parameters and the usage of these scripts, type:
help learn
help patt_rec
help kcv
Finally, you may want to have a look at the contents of the directory "dataset_scripts", and in particular at the file "create_dataset", which allows you to generate synthetic data of different types.
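As a quick command-line warm-up, you can generate and inspect a dataset directly. The call below is only a sketch: the exact signature of create_dataset is an assumption, so check "help create_dataset" before relying on it.

```matlab
% Hypothetical usage sketch: the actual signature of create_dataset may
% differ from what is assumed here -- check "help create_dataset" first.
n = 100;                                       % number of points
noise = 0.05;                                  % assumed: fraction of flipped labels
[X, Y] = create_dataset(n, 'spiral', noise);   % assumed signature
scatter(X(:,1), X(:,2), 20, Y, 'filled');      % inspect the two classes
```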
Allegro con brio: Analysis
Carry out the following experiments using the GUI.
(1) Generate data of "Spiral" type. Consider three algorithms, namely RLS, Truncated SVD and the NU-method. Observe how the training and test errors change as:
we change (increase or decrease) the cardinality of the training set. For instance: [10, 100, 1000, as long as MATLAB supports you!]
we change (increase or decrease) the amount of regularization for each algorithm. For instance: "fixed value".
we increase the fraction of wrong labels in the generated data. For instance: [0.01, 0.02, 0.05, 0.1]
Run training and test for various choices of the suggested parameters.
(2) Leaving all the other parameters fixed, use the KCV option to select the optimal model and see how it relates to the previous plot. Choose an appropriate range for the regularization parameters and their number, and plot the training error and the test error for each regularization parameter.
(3) Leaving all the other parameters fixed, choose an appropriate range [n_min:n_step:n_max] and plot the training and test errors: what do you observe as n goes to infinity? How do the different regularization parameters affect the learning process? What are the main differences in terms of regularization between the methods?
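The n-sweep in (3) can also be scripted. The self-contained sketch below uses plain linear RLS on a toy two-class Gaussian problem instead of the toolbox routines (in the lab, build the same loop around create_dataset, learn and patt_rec); it illustrates the expected decrease of the test error as n grows.

```matlab
% Self-contained n-sweep sketch: linear RLS on toy two-class Gaussian data.
% This stands in for the toolbox pipeline; it is not the lab's own code.
rng(1);
lambda  = 1e-2;
n_range = [10 50 100 500 1000];        % points per class
te_err  = zeros(size(n_range));
% fixed test set: two Gaussian clouds with labels +1 / -1, plus a bias column
nte = 1000;
Xte = [randn(nte,2)+1; randn(nte,2)-1];  Xte = [Xte ones(2*nte,1)];
Yte = [ones(nte,1); -ones(nte,1)];
for i = 1:numel(n_range)
    n   = n_range(i);
    Xtr = [randn(n,2)+1; randn(n,2)-1];  Xtr = [Xtr ones(2*n,1)];
    Ytr = [ones(n,1); -ones(n,1)];
    w   = (Xtr'*Xtr + lambda*2*n*eye(3)) \ (Xtr'*Ytr);   % RLS solution
    te_err(i) = mean(sign(Xte*w) ~= Yte);
end
plot(n_range, te_err, '-o'); xlabel('n per class'); ylabel('test error');
```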
Crescendo: Advanced Analysis
Carry out the following experiments either using the GUI or the command-line interface. In this part, focus on the effects of the regularization and on the correct choice of sigma.
(4) Use the Gaussian kernel and perform parameter tuning - this time, together with the regularization parameter, you will have to choose an appropriate sigma:
try several (sigma, reg. par.) pairs and compare the resulting (training error, test error);
fix the regularization parameter and observe the effect of changing sigma;
fix sigma and observe the effect of changing lambda;
do you notice any overfitting/oversmoothing effects (and if so, when)?
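To explore the sigma/lambda trade-off from the command line, the underlying kernel RLS computation can be written out directly. This is only the math behind the method, not the toolbox interface (learn/patt_rec do this for you in the lab); the toy data and parameter values are illustrative.

```matlab
% Sketch of kernel RLS with a Gaussian kernel on toy two-class data,
% for exploring the effect of sigma and lambda outside the GUI.
rng(2);
n   = 200;
Xtr = [randn(n/2,2)+1; randn(n/2,2)-1];
Ytr = [ones(n/2,1); -ones(n/2,1)];
sigma = 0.5;  lambda = 1e-3;
sq = sum(Xtr.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2*(Xtr*Xtr');   % pairwise squared distances
K  = exp(-D2 / (2*sigma^2));                  % Gaussian kernel matrix
c  = (K + lambda*n*eye(n)) \ Ytr;             % RLS coefficients
tr_err = mean(sign(K*c) ~= Ytr);
% small sigma + small lambda -> training error near 0 (overfitting risk);
% large sigma or large lambda -> smoother solution, higher training error.
```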
(5) Compare RLS with the nu-method on a kernel of your choice:
tune the parameters with KCV
compare the time needed to obtain a solution
compare the training and test errors
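The timing comparison can be done with tic/toc around each training call. Since the toolbox calls are GUI/script-specific, the sketch below times a direct RLS solve against plain Landweber iteration, used here only as a simple stand-in for the nu-method (which is its accelerated variant); with the toolbox, wrap the calls to "learn" in tic/toc the same way.

```matlab
% Timing sketch: one-shot RLS solve vs. a simple iterative scheme.
% Landweber iteration stands in for the nu-method here; data is synthetic.
rng(3);
n = 800;
G = randn(n);  K = (G*G')/n;                 % synthetic PSD "kernel" matrix
Y = sign(randn(n,1));
lambda = 1e-2;
tic;  c_rls = (K + lambda*n*eye(n)) \ Y;  t_direct = toc;
tau = 1/normest(K);                          % step size below 2/||K||
c = zeros(n,1);
tic;
for k = 1:100                                % iteration count acts as regularization
    c = c + tau*(Y - K*c);
end
t_iter = toc;
fprintf('direct: %.3fs   iterative: %.3fs\n', t_direct, t_iter);
```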
Finale: Challenge
The challenge consists of a learning task on a real dataset, namely "USPS". This dataset contains a number of handwritten digit images. The problem is to train "the best classifiers" able to discriminate between the digits "3", "8" and "0".
Have a look at the script "demo_lab2.m". This script contains a code snippet to perform a multi-class classification task using the previously presented MATLAB scripts (see "Interlude").
You should understand what the scripts are supposed to do, and train the classifiers to perform One vs. All classification for all combinations of the digits "3", "8" and "0".
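The One vs. All scheme itself is simple: train one binary classifier per digit (that digit against the rest) and predict the class with the highest score. The self-contained sketch below illustrates it with linear RLS on toy clusters standing in for USPS; in the challenge, replace the toy data and the inline RLS with the USPS digits and the toolbox routines shown in demo_lab2.m.

```matlab
% Illustrative One vs. All scheme with linear RLS on toy data.
% Toy clusters stand in for the USPS digits; not the lab's own pipeline.
rng(4);
classes = [3 8 0];                      % the challenge digits
mu = [0 0; 3 0; 0 3];                   % one toy cluster centre per digit
n = 60;
Xtr = [];  Ytr = [];
for c = 1:3
    Xtr = [Xtr; randn(n,2) + repmat(mu(c,:), n, 1)];
    Ytr = [Ytr; repmat(classes(c), n, 1)];
end
Xtr = [Xtr ones(size(Xtr,1),1)];        % bias column
lambda = 1e-2;  N = size(Xtr,1);  d = size(Xtr,2);
W = zeros(d, numel(classes));
for c = 1:numel(classes)
    yc = 2*(Ytr == classes(c)) - 1;     % +1 for digit c, -1 for the rest
    W(:,c) = (Xtr'*Xtr + lambda*N*eye(d)) \ (Xtr'*yc);   % binary RLS
end
[~, idx]  = max(Xtr*W, [], 2);          % highest score wins
Ypred     = classes(idx)';
train_acc = mean(Ypred == Ytr);
```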
Once the classifiers are trained, the model must be exported to a matrix file by means of the "save_challenge_2.m" script (to see how to use it, try the command 'help save_challenge_2').
By the end of the challenge session, you should submit the result of your script via the link http://www.dropitto.me/regmet with password regmet2013. The result file is a MATLAB matrix file named name-surname.mat. The results will be presented during the next class. The challenge score is based on the accuracy of the classifiers on an independently sampled test set.