322, conference room, 3rd floor, DISI
Noisy Text Categorization
Dr. Alessandro Vinciarelli
IDIAP Research Institute, Rue du Simplon 4, 1920 Martigny (Switzerland), firstname.lastname@example.org
This work presents categorization experiments performed on
noisy texts. By noisy we mean any text obtained through an
error-prone extraction process from media other than digital
text (e.g., transcriptions of speech recordings produced by a
recognition system).
The performance of a categorization system is compared on the
clean and noisy versions of the same documents, where the noisy
versions have a Word Error Rate (WER) between about 10 and about
50 percent.
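For reference, the Word Error Rate is conventionally defined as the word-level edit distance between a reference text and its noisy version, normalized by the reference length: WER = (S + D + I) / N, where S, D and I count substituted, deleted and inserted words and N is the number of reference words. The Python sketch below computes it with the standard Levenshtein dynamic program. The function name `word_error_rate` is hypothetical, and the abstract does not say exactly how its WER figures were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N, i.e. the word-level
    Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` gives 2/6, about 0.33 (one substitution and one deletion). Because insertions are counted, WER can exceed 1.0 for very noisy output.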
The noisy texts are obtained through Handwriting Recognition
and simulation of Optical Character Recognition.
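The abstract does not describe how Optical Character Recognition (OCR) output was simulated. As a purely illustrative assumption, one simple noise model substitutes each non-space character independently with a random letter at a chosen character error rate p. The Python sketch below implements that model; the function `simulate_ocr_noise` and its parameters are hypothetical and are not the author's procedure.

```python
import random
import string
from typing import Optional


def simulate_ocr_noise(text: str, char_error_rate: float,
                       seed: Optional[int] = None) -> str:
    """Hypothetical noise model: each non-space character is replaced,
    with probability char_error_rate, by a different lowercase letter."""
    if not 0.0 <= char_error_rate <= 1.0:
        raise ValueError("char_error_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded generator for reproducible noise
    out = []
    for ch in text:
        if not ch.isspace() and rng.random() < char_error_rate:
            # exclude the original character so every corruption is a real substitution
            out.append(rng.choice([c for c in string.ascii_lowercase if c != ch]))
        else:
            out.append(ch)  # spaces (and uncorrupted characters) are kept
    return "".join(out)
```

Since spaces are never touched, word boundaries survive, and under this independence assumption a word of L characters comes through intact with probability (1 - p)^L. With p = 0.05, for instance, a 6-letter word is corrupted with probability 1 - 0.95^6, about 0.26, which illustrates how modest character-level noise can translate into a substantial word-level error rate.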
The results show that the performance loss is acceptable,
and especially low for Recall values below 60 percent.
New measures of the extraction process performance,
allowing a better explanation of the categorization results,