Abstract
Intelligent Web agents and Web information agents need structured data
from disparate Web sources for decision making. Wrapper technology is used
to extract structured data from unstructured or poorly structured Web
pages whose content changes continually. In this talk I will give a survey
of the Lixto approach to Web data extraction. Lixto assists a user in
semi-automatically creating wrapper programs by providing a fully visual
and interactive user interface. Lixto wrappers are able to extract deeply
nested XML data structures from HTML pages. Visual user operations on
example pages are directly translated into logical conditions and rules of a
declarative logic-based language. Basic features of this system will be
demonstrated and theoretical results about its expressive power will be
discussed. Time permitting, I will also cover some more advanced
features of the system and some industrial applications. Papers on Lixto
are available at www.lixto.com (downloads section). This talk describes
joint work with Robert Baumgartner, Sergio Flesca, Marcus Herzog, and
Christoph Koch.