Seminar Details

Date 15-7-2004
Time 15:00
Room/Location 322, conference room, 3rd floor, DISI
Title Noisy Text Categorization
Speaker Dr. Alessandro Vinciarelli
Affiliation IDIAP Research Institute, Rue du Simplon 4, 1920 Martigny (Switzerland), vincia@idiap.ch
Link http://www.idiap.ch/~vincia
Abstract This work presents categorization experiments performed on noisy texts. Here, noisy means any text obtained through an error-prone extraction process from media other than digital text (e.g. transcriptions of speech recordings produced by a recognition system). The performance of a categorization system on clean and noisy versions of the same documents is compared, with Word Error Rates between ~10 and ~50 percent for the noisy versions. The noisy texts are obtained through handwriting recognition and simulated optical character recognition. The results show that the performance loss is acceptable, and that it is especially low for Recall values below 60 percent. New measures of extraction-process performance, which allow a better explanation of the categorization results, are proposed.
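
For readers unfamiliar with the Word Error Rate mentioned in the abstract, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a reference text and a recognizer output, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the sample sentences are assumptions made for illustration; the abstract does not say how the speaker's experiments computed the measure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and not taken from the seminar abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical clean text and a noisy version with one substitution
# ("cat" -> "bat") and one deletion (the second "the").
clean = "the cat sat on the mat"
noisy = "the bat sat on mat"
wer = word_error_rate(clean, noisy)
```

In this example two of the six reference words are affected, so the rate is 2/6, roughly 33 percent, which falls inside the ~10 to ~50 percent range quoted in the abstract. Because insertions are counted, the measure can exceed 100 percent when the recognizer output is much longer than the reference.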
