Lab 4

In this lab we play a bit more with real data: digital images.

Download the content of this zip file

In it you will find some data (a selection from the MNIST dataset of handwritten digits) and some scripts. You will also find a directory "code", identical to the one you should already have, containing the filter methods.

Follow the instructions below.

The data and the problem

We will address a problem of image classification. The data provided include images of all digits (0 to 9). The data have already been transformed into data matrices (one per class) by unwrapping the content of each image (its pixel values) into a simple feature vector:

A B C
D E F  ->  A B C D E F G H I
G H I
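The unwrapping can be sketched in a few lines. This is only an illustration on a toy 3x3 image, assuming the pixels are read row by row as in the diagram; the provided matrices may have been built in a different order, and the variable names are made up for the example.

```matlab
% Toy 3x3 "image": the numbers 1..9 play the role of pixels A..I.
img = [1 2 3;
       4 5 6;
       7 8 9];

% MATLAB stores matrices column by column, so transpose first
% in order to read the pixels row by row.
x = reshape(img', 1, []);          % 1x9 row vector: 1 2 3 4 5 6 7 8 9

% Stacking one such row per image gives the data matrix of a class.
img2 = magic(3);                   % a second toy image
X = [x; reshape(img2', 1, [])];    % 2 images x 9 features
```

Without the transpose, reshape would return the pixels column by column (A D G B E H ...), which is an equally valid feature vector as long as every image is unwrapped the same way.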

You can play with the data in different ways, building different kinds of binary classification problems: e.g. considering data belonging to two different digits (1 vs 7), or a set of positive examples (0) and a set of negative examples (1 to 9). We suggest you try at least one problem of each type and compare the performance.
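As a rough sketch, the two types of problem could be assembled as follows. The cell array Xdigit and the random numbers in it are placeholders invented for the example, standing in for the per-class data matrices; the labels +1 and -1 are a common convention and an assumption here.

```matlab
% Placeholders for the per-class data matrices: cell k holds digit k-1,
% one image per row (random numbers here, sizes are made up).
d = 784;                           % a 28x28 MNIST image unwrapped
Xdigit = cell(1, 10);
for k = 1:10
    Xdigit{k} = rand(20, d);       % 20 fake images of digit k-1
end

% Type 1: one digit against another (1 vs 7).
Xa = [Xdigit{2}; Xdigit{8}];       % digit 1 is cell 2, digit 7 is cell 8
ya = [ones(size(Xdigit{2}, 1), 1); -ones(size(Xdigit{8}, 1), 1)];

% Type 2: one digit against all the others (0 vs 1..9).
Xneg = vertcat(Xdigit{2:10});
Xb = [Xdigit{1}; Xneg];
yb = [ones(size(Xdigit{1}, 1), 1); -ones(size(Xneg, 1), 1)];
```

Note that the second problem is unbalanced: with classes of equal size, nine examples out of ten are negative, so a classifier that always answers -1 already reaches about 90% accuracy. Keep this in mind when comparing the two problems.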

Classification will be performed by means of filter methods. An appropriate choice of kernel may help, in particular to capture correlations between neighbouring elements.

Part I - warm up

To get started, follow the commands suggested in the script commands.m

Instead of running "commands" from the shell, copy and paste its content into the shell one command at a time, so that you understand what you are doing.

You may run into memory problems; if so, call us!

see script commands.m


Part II - analysis

There are various things you may try; here is a list:

Choose one or two filter functions (e.g., nu-method and RLS)

Compute a training set and a test set of a given size
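One simple way to obtain the two sets is a random split. The sketch below uses placeholder data and made-up sizes; commands.m may do this differently.

```matlab
n = 200;  d = 784;
X = rand(n, d);                    % placeholder data, one image per row
y = 2 * (rand(n, 1) > 0.5) - 1;    % placeholder labels in {-1, +1}

ntrain = 120;                      % size of the training set (arbitrary)
perm = randperm(n);                % shuffle so both sets mix the classes
itr = perm(1:ntrain);              % indices of the training examples
its = perm(ntrain+1:end);          % indices of the test examples

Xtr = X(itr, :);  ytr = y(itr);
Xts = X(its, :);  yts = y(its);
```

Shuffling matters when the data matrix is built by stacking the classes one after the other: taking the first rows as training set would otherwise leave one class out.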

(1) With a linear learning machine, consider the problem of tuning the parameter lambda.
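A possible way to tune lambda is to hold out part of the training data and pick the value with the lowest validation error. The sketch below does this for RLS with a linear kernel on toy data. It is written from scratch, it does not use the functions in "code", and the scaling n*lambda of the regularizer is an assumption that may differ from the convention used there.

```matlab
n = 100;  d = 50;
X = [randn(n, d) + 0.3; randn(n, d) - 0.3];   % two shifted clouds (toy data)
y = [ones(n, 1); -ones(n, 1)];

perm = randperm(2 * n);
itr = perm(1:n);                   % training half
iva = perm(n+1:end);               % validation half

K   = X(itr, :) * X(itr, :)';      % linear kernel on the training set
Kva = X(iva, :) * X(itr, :)';      % kernel between validation and training

lambdas = logspace(-6, 2, 9);      % candidate values on a log scale
err = zeros(size(lambdas));
for i = 1:numel(lambdas)
    c = (K + n * lambdas(i) * eye(n)) \ y(itr);   % RLS (Tikhonov) solution
    err(i) = mean(sign(Kva * c) ~= y(iva));       % validation error rate
end

[~, best] = min(err);
lambda_best = lambdas(best);
```

Plotting err against lambdas with semilogx(lambdas, err) gives a quick picture of how sensitive the classifier is to the choice. K-fold cross validation is a less noisy alternative to a single hold-out split.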

(2) With a Gaussian learning machine, consider the problem of choosing the right sigma. You may try the function autosigma (have a look at its code). Once you have done so, we should discuss what you got.

see function code/autosigma
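To get a feeling for what sigma does, it may help to build a Gaussian kernel matrix by hand. The sketch below uses toy data and sets sigma to the median distance between pairs of points. This is a common rule of thumb and an assumption made for the example, not necessarily what autosigma does, so do read its code.

```matlab
n = 60;  d = 20;
X = randn(n, d);                   % toy data, one example per row

% Squared Euclidean distances between all pairs of rows.
sq = sum(X.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');
D2 = max(D2, 0);                   % clip tiny negatives due to round-off

% Rule of thumb (an assumption, see above): median pairwise distance.
mask = triu(true(n), 1);           % each pair of distinct points once
sigma = sqrt(median(D2(mask)));

K = exp(-D2 / (2 * sigma^2));      % Gaussian kernel matrix
```

If sigma is much smaller than the typical distance, K gets close to the identity (each point is similar only to itself); if it is much larger, K gets close to a matrix of ones. A reasonable sigma lies somewhere between these two extremes.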

(3) For some reasonable pair (lambda, sigma), train and test a classifier. Check that you have not overfit by comparing the performance on the training set with the performance on the test set. Then change lambda or sigma until you do overfit.
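The comparison between training and test error can be sketched as follows, with Gaussian RLS written from scratch on toy data. The two (sigma, lambda) pairs are arbitrary values chosen for illustration, and the helper names sqdist and gauss are made up; with the real data you will have to find suitable values yourself.

```matlab
n = 80;  d = 10;
% Toy problem: the label depends on the first feature, plus noise.
Xtr = randn(n, d);  ytr = sign(Xtr(:, 1) + 0.5 * randn(n, 1));
Xts = randn(n, d);  yts = sign(Xts(:, 1) + 0.5 * randn(n, 1));

% Squared distances between rows of A and rows of B, and Gaussian kernel.
sqdist = @(A, B) max(bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B'), 0);
gauss  = @(A, B, s) exp(-sqdist(A, B) / (2 * s^2));

settings = [3.0, 1e-2;             % sigma, lambda: a moderate choice
            0.3, 1e-8];            % tiny sigma and lambda: expect overfitting
etr = zeros(1, 2);  ets = zeros(1, 2);
for i = 1:size(settings, 1)
    s = settings(i, 1);  lam = settings(i, 2);
    Ktr = gauss(Xtr, Xtr, s);
    c = (Ktr + n * lam * eye(n)) \ ytr;                 % RLS coefficients
    etr(i) = mean(sign(Ktr * c) ~= ytr);                % training error
    ets(i) = mean(sign(gauss(Xts, Xtr, s) * c) ~= yts); % test error
    fprintf('sigma=%g lambda=%g  train err=%.2f  test err=%.2f\n', ...
            s, lam, etr(i), ets(i));
end
```

A training error close to zero together with a much larger test error is the typical sign of overfitting: with a very small sigma the kernel matrix is almost the identity, so the classifier memorizes the training labels.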

(4) Use TSVD to get a feeling for the dimensionality of the problem. Evaluate the performance obtained by using the first two components only.
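As an illustration of the idea, TSVD can be written directly in terms of the eigendecomposition of the kernel matrix: keep the k largest eigenvalues, invert them, and discard the rest. The sketch below does this with a linear kernel on toy data. It is an assumption about how the filter can be implemented, not a copy of the code in "code".

```matlab
n = 100;  d = 30;
Xtr = [randn(n/2, d) + 0.5; randn(n/2, d) - 0.5];   % toy training set
ytr = [ones(n/2, 1); -ones(n/2, 1)];
Xts = [randn(n/2, d) + 0.5; randn(n/2, d) - 0.5];   % toy test set
yts = ytr;

K = Xtr * Xtr';                          % linear kernel matrix
[U, S] = eig((K + K') / 2);              % symmetrize against round-off
[s, order] = sort(diag(S), 'descend');   % largest eigenvalues first
U = U(:, order);

k = 2;                                   % number of components kept
Uk = U(:, 1:k);
c = Uk * ((Uk' * ytr) ./ s(1:k));        % TSVD solution on k components

err_ts = mean(sign((Xts * Xtr') * c) ~= yts);   % test error rate
frac = sum(s(1:k)) / sum(s);             % share of the spectrum kept
```

Repeating the computation for increasing k, and plotting the sorted eigenvalues s, shows how many components are really needed before the test error stops improving.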