Abstract
In this talk, we propose a new and efficient method for video text extraction
based on the Laplacian operator. The maximum gradient difference value is
computed for each pixel in the Laplacian-filtered image. K-means clustering is
then used to classify all the pixels into two clusters: text and non-text. For
each candidate text region, the corresponding region in the Sobel edge map of
the input image undergoes projection profile analysis to determine the
boundaries of text lines. Finally, we employ heuristics based on geometrical
properties to eliminate false positives.
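To make the pipeline concrete, the sketch below computes the maximum gradient difference on the Laplacian-filtered image, separates pixels into the two clusters with K-means, and runs a projection profile analysis over a Sobel edge map of a candidate region. It is a minimal sketch assuming OpenCV and scikit-learn; the window size and the thresholds are illustrative choices, not the parameters used in the talk.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans


def text_pixel_mask(gray, window=21):
    """Label pixels as text / non-text from the Laplacian-filtered image.

    Sketch only: the 1xN window and the use of OpenCV/scikit-learn are
    illustrative assumptions.
    """
    # Laplacian filtering emphasises the transitions around text strokes.
    lap = cv2.Laplacian(gray.astype(np.float64), cv2.CV_64F)

    # Maximum gradient difference: max minus min of the Laplacian values
    # inside a horizontal window centred on each pixel.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (window, 1))
    mgd = cv2.dilate(lap, kernel) - cv2.erode(lap, kernel)

    # K-means with k=2 on the MGD values; the cluster with the larger
    # mean MGD is taken as the text cluster.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(mgd.reshape(-1, 1))
    labels = labels.reshape(mgd.shape)
    text_cluster = np.argmax([mgd[labels == c].mean() for c in (0, 1)])
    return labels == text_cluster


def text_line_boundaries(region_gray, min_fill=0.1):
    """Row boundaries of text lines inside one candidate region, from a
    horizontal projection profile of its Sobel edge map.

    The edge threshold and the fill ratio are illustrative assumptions.
    """
    sobel = np.abs(cv2.Sobel(region_gray, cv2.CV_64F, 1, 0))
    edges = (sobel > sobel.mean()).astype(np.uint8)
    profile = edges.sum(axis=1) / edges.shape[1]   # edge density per row
    dense_rows = profile > min_fill
    # Text line boundaries are the transitions between sparse and dense rows.
    return np.flatnonzero(np.diff(dense_rows.astype(np.int8)) != 0) + 1
```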
This method works well for horizontal text lines in video images. However, it
fails to handle non-horizontal text lines, which are quite common in scene
images. Therefore, we make use of connected component analysis and
skeletonization to extend the Laplacian method. Each candidate text region is
classified as simple or complex depending on the number of end points and
intersection points of its skeleton. Complex connected components, which
contain multiple text lines of different directions, are then split into
smaller simple components. Finally, we use new features such as the principal
axis, the medial axis and the straightness of the components to eliminate
false positives. Experimental results show that the method works well for
text lines of different contrasts, fonts, directions and backgrounds.
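As an illustration of the simple/complex decision, the sketch below skeletonizes a candidate component and counts its end points (skeleton pixels with one neighbour) and intersection points (three or more neighbours). It assumes scikit-image for skeletonization; the neighbour-count rule and the threshold for "simple" are illustrative assumptions, not the exact criteria of the method.

```python
import numpy as np
from skimage.morphology import skeletonize


def classify_component(component_mask):
    """Return 'simple' or 'complex' for one connected component, based on
    the end points and intersection points of its skeleton.

    Sketch only: thresholds are illustrative assumptions.
    """
    skel = skeletonize(component_mask.astype(bool))

    # Count the 8-connected skeleton neighbours of every pixel by summing
    # shifted copies of the zero-padded skeleton.
    padded = np.pad(skel, 1).astype(np.uint8)
    neighbours = sum(
        np.roll(np.roll(padded, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )[1:-1, 1:-1]

    end_points = np.count_nonzero(skel & (neighbours == 1))
    intersections = np.count_nonzero(skel & (neighbours >= 3))

    # A single straight text line gives a skeleton with two end points and
    # no intersections; richer skeletons are treated as complex and split.
    return "simple" if end_points <= 2 and intersections == 0 else "complex"
```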