Lab 1 - BINARY CLASSIFICATION AND MODEL SELECTION
This lab addresses binary classification and model selection on synthetic data. The aim of the lab is to play with the libraries and to get a practical grasp of what we have discussed in class. Follow the instructions below. Think hard before you call the instructors! At the end of your work, send an email to the instructors answering the questions raised below in Part II.
Download the file spectral_filters.zip.
Part I - warm up
Run the file gui_filter.m and a GUI will start. Have a look at the various components.
- With the data simulation option, generate a dataset of type "linear" [press "load data" to generate].
- Observe the generated data [the "plot training/plot test" button lets you toggle between the training and test sets].
- Choose the "regularized least squares" filter and the "linear" kernel.
- Have a look at the parameter selection part and the various options of KCV; to choose the regularization parameter "t" you can either use KCV or set a fixed value.
- Press the "run" button to perform training and classification; observe the plot of the KCV error and the balance between training and test errors. Also have a look at the plot area on the left, where a separation function has appeared [again, the "plot training/plot test" button lets you switch between the two].
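For intuition, here is a minimal sketch of what the separation plot shows when the kernel is linear: the classifier is f(x) = w'*x and the plotted curve is its zero level set. This is an illustration, not the GUI's actual plotting code; it assumes 2-D data X with labels y and a weight vector w produced by training (a concrete way to compute w appears in Part II).

    % Decision boundary of a linear classifier f(x) = w'*x in 2-D:
    % the line where w(1)*x1 + w(2)*x2 = 0.
    scatter(X(:,1), X(:,2), 25, y, 'filled'); hold on;
    xs = linspace(min(X(:,1)), max(X(:,1)), 100);
    plot(xs, -(w(1) / w(2)) * xs, 'k-');   % assumes w(2) ~= 0
    hold off;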
Back in the MATLAB shell, have a look at the contents of the directory "spectral_reg_toolbox". There you will find, among others, the code for the commands "learn" (used for training), "pattrec" (used for testing), and "kcv" (used for model selection on the training set).
Finally, you may want to have a look at the contents of the directory "dataset_scripts", and in particular at the file "create_dataset", which will allow you to generate synthetic data of different types.
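Since this handout does not reproduce the exact interface of "create_dataset", here is a from-scratch sketch of what a "linear"-type dataset amounts to: two classes separated by a random hyperplane, with a fraction of labels flipped as noise. The names n, d, p_flip, and w_true are illustrative, not toolbox parameters.

    n      = 100;     % number of points
    d      = 2;       % input dimension (2-D so it can be plotted)
    p_flip = 0.05;    % fraction of flipped labels (the noise knob)

    w_true = randn(d, 1);           % normal of the separating hyperplane
    X      = 2 * rand(n, d) - 1;    % points uniform in [-1, 1]^d
    y      = sign(X * w_true);      % noiseless labels in {-1, +1}
    flip   = rand(n, 1) < p_flip;   % pick a random subset of labels...
    y(flip) = -y(flip);             % ...and flip them

A test set (Xte, yte) can be generated the same way from the same w_true.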
Part II - analysis
Carry out the following experiments, either using the GUI, when possible, or writing appropriate scripts.
(1) Generate data of "Linear" type. Considering linear-RLS, observe how the training and test errors change as
- we change (increase or decrease) the regularization parameter;
- the training set size grows (try various choices of n in [10:....] for as long as MATLAB supports you!);
- the amount of noise on the generated data grows.
(Run training and testing for various choices of the suggested parameters; a minimal from-scratch sketch follows.)
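The sketch below implements one linear-RLS training/testing round; it shows what the toolbox computes, not its actual code, and it assumes data (X, y) and (Xte, yte) as in the sketch from Part I.

    lambda = 0.1;                       % the regularization parameter to vary
    n = size(X, 1);

    % Linear RLS: w = argmin (1/n)*||X*w - y||^2 + lambda*||w||^2
    w = (X' * X + n * lambda * eye(size(X, 2))) \ (X' * y);

    train_err = mean(sign(X   * w) ~= y);    % 0-1 loss on the training set
    test_err  = mean(sign(Xte * w) ~= yte);  % 0-1 loss on the test set
    fprintf('lambda = %g: train %.3f, test %.3f\n', lambda, train_err, test_err);

Rerunning this for different lambda, n, and noise levels reproduces the three comparisons above.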
(2) Leaving all the other parameters fixed, choose an appropriate range [lambda_min:lambda_step:lambda_max] and plot the training error and the test error for each lambda. Use the KCV option to select the optimal lambda and see how it relates to the previous plot.
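If you prefer a script to the GUI here, the following sketch sweeps lambda on a logarithmic grid, computes training/test errors, and hand-rolls a K-fold cross validation instead of calling the toolbox's "kcv" (whose exact signature is not assumed here).

    lambdas = logspace(-6, 1, 30);
    K = 5;
    n = size(X, 1);
    folds = mod(randperm(n), K) + 1;          % random assignment to K folds

    tr_err = zeros(size(lambdas)); te_err = tr_err; cv_err = tr_err;
    for i = 1:numel(lambdas)
        % Errors of the model trained on the full training set.
        w = (X'*X + n*lambdas(i)*eye(size(X,2))) \ (X'*y);
        tr_err(i) = mean(sign(X   * w) ~= y);
        te_err(i) = mean(sign(Xte * w) ~= yte);

        % K-fold cross-validation error for this lambda.
        errs = zeros(K, 1);
        for k = 1:K
            tr = folds ~= k;  ntr = sum(tr);
            w_k = (X(tr,:)'*X(tr,:) + ntr*lambdas(i)*eye(size(X,2))) ...
                  \ (X(tr,:)'*y(tr));
            errs(k) = mean(sign(X(~tr,:) * w_k) ~= y(~tr));
        end
        cv_err(i) = mean(errs);
    end

    [~, best] = min(cv_err);                  % lambda selected by KCV
    semilogx(lambdas, tr_err, lambdas, te_err, lambdas, cv_err); hold on;
    plot(lambdas(best), cv_err(best), 'r*', 'MarkerSize', 12); hold off;
    legend('train', 'test', 'KCV'); xlabel('\lambda'); ylabel('0-1 error');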
(3) Leaving all the other parameters fixed, choose an appropriate range [n_min:n_step:n_max] and plot the training and test errors (what do you observe as n goes to infinity?).
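A sketch of the corresponding loop over n, reusing the data-generation recipe from Part I (d, w_true, p_flip, Xte, yte as defined there; all names illustrative):

    ns = 10:10:500;                     % an illustrative range for n
    lambda = 0.01;
    tr_err = zeros(size(ns)); te_err = tr_err;
    for i = 1:numel(ns)
        n = ns(i);
        X = 2*rand(n, d) - 1;           % fresh training set of size n
        y = sign(X * w_true);
        flip = rand(n, 1) < p_flip;  y(flip) = -y(flip);
        w = (X'*X + n*lambda*eye(d)) \ (X'*y);
        tr_err(i) = mean(sign(X   * w) ~= y);
        te_err(i) = mean(sign(Xte * w) ~= yte);
    end
    plot(ns, tr_err, ns, te_err); legend('train', 'test');
    xlabel('n'); ylabel('0-1 error');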
(4) Consider gaussian-RLS and perform parameter tuning in this case -- this time, together with lambda, you'll have to choose an appropriate sigma (a grid-scan sketch follows the list).
- Try a few (sigma, lambda) pairs and compare the resulting (training_error, test_error).
- Fix lambda and observe the effect of changing sigma.
- Fix sigma and observe the effect of changing lambda.
- Do you notice any overfitting/oversmoothing effect (and if so, when)?
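Here the model is kernel RLS, so the object to tune over is a (sigma, lambda) grid. A minimal sketch, again of the underlying computation rather than the toolbox code (the squared-distance helper is hand-written to stay self-contained):

    % Gaussian kernel matrix between row-wise point sets A and B.
    sqd   = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2*A*B';
    gauss = @(A, B, s) exp(-sqd(A, B) / (2*s^2));

    sigmas  = logspace(-2, 1, 10);
    lambdas = logspace(-6, 0, 10);
    n = size(X, 1);

    te_err = zeros(numel(sigmas), numel(lambdas));
    for i = 1:numel(sigmas)
        Ktr = gauss(X,   X, sigmas(i));
        Kte = gauss(Xte, X, sigmas(i));
        for j = 1:numel(lambdas)
            c = (Ktr + n*lambdas(j)*eye(n)) \ y;   % kernel RLS coefficients
            te_err(i, j) = mean(sign(Kte * c) ~= yte);
        end
    end
    imagesc(log10(lambdas), log10(sigmas), te_err); colorbar;
    xlabel('log_{10} \lambda'); ylabel('log_{10} \sigma');

Small sigma and small lambda push towards overfitting (a wiggly separation function, low training error, high test error); large values push towards oversmoothing.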
(5) If you still have some time, compare RLS with the nu-method on a kernel of your choice (gaussian is better -- why?).
- Tune the parameters with KCV.
- Compare the time needed to obtain a solution (see the MATLAB commands tic and toc, and the sketch below).
- Compare the training and test errors.
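For the timing comparison, wrapping each solver in tic/toc is enough. The nu-method's exact update coefficients are not reproduced here; the sketch below instead times a plain Landweber (gradient) iteration as a stand-in, which shares the relevant trait: it replaces one direct linear solve with many cheap matrix-vector products, the iteration count playing the role of 1/lambda.

    n = size(X, 1);
    Ktr = gauss(X, X, 1.0);           % Gaussian kernel from the sketch above

    tic;                              % RLS: one direct linear solve
    c_rls = (Ktr + n*1e-3*eye(n)) \ y;
    t_direct = toc;

    tic;                              % iterative filter: T cheap updates
    T   = 200;                        % iteration count = regularization knob
    tau = 1 / norm(Ktr);              % step size below 2/lambda_max(Ktr)
    c   = zeros(n, 1);
    for t = 1:T
        c = c + tau * (y - Ktr * c);  % Landweber/Richardson step
    end
    t_iter = toc;
    fprintf('direct: %.4f s, iterative: %.4f s\n', t_direct, t_iter);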