|
OVERVIEW
Patterns represent huge quantities of heterogeneous data in a compact and
semantically rich way. Because of these characteristics, dedicated systems
are required for pattern management, so that patterns, possibly with a
user-defined structure, can be modeled and manipulated efficiently and
effectively.
PSYCHO is a customizable system for generating, representing, and
manipulating heterogeneous, possibly user-defined patterns. More precisely,
it allows the user to:
- use standard pattern types or define new ones
- generate or import patterns, represented according to existing standards
- manipulate possibly heterogeneous patterns in an integrated environment
PATTERN MODEL
PSYCHO relies on a pattern model developed in the context of the European
Project IST/FET Working Group (IST-2001-33058) Patterns for Next-Generation
Database Systems (PANDA, 1/12/2001 - 31/5/2003). The following schema shows
the basic components of the model.
The PSYCHO logical model is based on three main concepts: pattern type,
pattern, and class.
- Pattern type. A pattern type gives a formal description of the pattern
structure. It is a record with six elements: (i) the pattern name n;
(ii) the structure schema s, which defines the structure of the patterns
that are instances of the pattern type; (iii) the source schema d, which
describes the dataset from which the instances of the pattern type are
constructed; (iv) the measure schema m, a tuple describing the measures
that quantify how well the pattern represents the source data; (v) the
formula f, which carries the semantics of the pattern. f is a
constraint-based formula describing, possibly in an approximate way, the
relation between the data represented by the pattern and the pattern
structure. Inside f, attributes are interpreted as free variables ranging
over the components of either the source or the pattern space; (vi) the
validity period schema v, which defines the schema of the temporal
validity interval associated with each instance of the pattern type.
Pattern types can also be hierarchically organized.
- Pattern. Patterns are instances of a specific pattern type. Thus, they
are record values with identifiers, containing the proper instantiation of
the corresponding schema elements in the pattern type. In a pattern, the
formula component is obtained from the one in the pattern type by
instantiating each attribute appearing in s with the corresponding value,
and letting the attributes appearing in d range over the source space.
Note that the data source represents the overall dataset the pattern is
related to, whereas the formula represents, in an intensional and possibly
approximate way, the subset of data represented by the pattern. The
extensional set of data exactly represented by the pattern can, when
needed, be stored in the system as a sort of metadata.
- Class. A class is a set of semantically related patterns and constitutes
the key concept in defining a pattern query language. A class is defined
for a given pattern type.
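To make the three concepts more concrete, the following is a minimal Java sketch of how a pattern type, a pattern, and a class could be represented. It is only an illustration under our own assumptions: all type, field, and method names (PatternType, Pattern, PatternClass, PatternModelDemo, the toy "Interval" type, and the data source name) are hypothetical and are not taken from the PSYCHO implementation. The formula is modeled as a predicate over the structure values and a source tuple, so that binding the structure values leaves the source attributes free, as described above.

```java
import java.time.LocalDate;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// All names below are hypothetical; they illustrate the model, not PSYCHO's API.

/** Pattern type: the six-element record (n, s, d, m, f, v). */
record PatternType(
        String name,                          // n: pattern name
        Map<String, String> structureSchema,  // s: attribute -> type name
        Map<String, String> sourceSchema,     // d: attribute -> type name
        Map<String, String> measureSchema,    // m: measure -> type name
        BiPredicate<Map<String, Object>, Map<String, Object>> formula, // f(s-values, d-tuple)
        String validitySchema) {}             // v: schema of the validity interval

/** Pattern: an identified instance of a pattern type. */
record Pattern(
        long id,
        PatternType type,
        Map<String, Object> structure,  // values for the attributes of s
        String dataSource,              // reference to the overall dataset
        Map<String, Double> measures,   // values for the measures in m
        LocalDate validFrom,
        LocalDate validTo) {

    /** f with the attributes of s bound to values; the attributes of d stay free. */
    Predicate<Map<String, Object>> formula() {
        return tuple -> type.formula().test(structure, tuple);
    }
}

/** Class: a set of semantically related patterns of one pattern type. */
final class PatternClass {
    private final PatternType type;
    private final Set<Pattern> members = new LinkedHashSet<>();

    PatternClass(PatternType type) { this.type = type; }

    void insert(Pattern p) {
        if (!p.type().equals(type)) {
            throw new IllegalArgumentException("pattern is not of type " + type.name());
        }
        members.add(p);
    }

    boolean remove(Pattern p) { return members.remove(p); }

    Set<Pattern> members() { return Set.copyOf(members); }
}

final class PatternModelDemo {
    private static double num(Map<String, Object> m, String key) {
        return ((Number) m.get(key)).doubleValue();
    }

    /** A toy one-dimensional "interval" pattern: |x - center| <= radius. */
    static Pattern example() {
        PatternType interval = new PatternType(
                "Interval",
                Map.of("center", "double", "radius", "double"),
                Map.of("x", "double"),
                Map.of("coverage", "double"),
                (s, t) -> Math.abs(num(t, "x") - num(s, "center")) <= num(s, "radius"),
                "closed date interval");
        return new Pattern(1L, interval,
                Map.of("center", 10.0, "radius", 2.0),
                "hypothetical_source_table",
                Map.of("coverage", 0.8),
                LocalDate.of(2003, 1, 1), LocalDate.of(2003, 12, 31));
    }
}
```

In this sketch, PatternModelDemo.example().formula() yields a predicate that can be evaluated on any source tuple, for instance Map.of("x", 11.0), which is one possible reading of the formula as an intensional description of the represented data.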
Each pattern type can be associated with one or more mining functions, used
to extract patterns of that type from a given data source, and one or more
measure functions, used to compute the measures associated with a certain
pattern over a given data source.
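The distinction between mining and measure functions can be sketched as two small Java interfaces. This is an illustrative assumption, not the way PSYCHO defines them: the interface names MiningFunction and MeasureFunction, the Support and FrequentItems classes, and the choice of single frequent items as the pattern are all hypothetical. The miner is a deliberately naive frequent-item extractor, used here only because it fits in a few lines.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical interfaces; D is the data source type, P the pattern type.
interface MiningFunction<D, P> {
    List<P> mine(D source);                 // extract patterns from a data source
}

interface MeasureFunction<D, P> {
    double measure(P pattern, D source);    // compute one measure of a pattern
}

/** Measure function: fraction of transactions that contain the item (support). */
final class Support implements MeasureFunction<List<Set<String>>, String> {
    @Override
    public double measure(String item, List<Set<String>> transactions) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long hits = transactions.stream().filter(t -> t.contains(item)).count();
        return (double) hits / transactions.size();
    }
}

/** Mining function: all single items whose support reaches a threshold. */
final class FrequentItems implements MiningFunction<List<Set<String>>, String> {
    private final double minSupport;
    private final Support support = new Support();

    FrequentItems(double minSupport) { this.minSupport = minSupport; }

    @Override
    public List<String> mine(List<Set<String>> transactions) {
        Set<String> items = new TreeSet<>();          // sorted, duplicate-free candidates
        transactions.forEach(items::addAll);
        return items.stream()
                .filter(i -> support.measure(i, transactions) >= minSupport)
                .collect(Collectors.toList());
    }
}
```

Keeping the measure function separate from the mining function means the same measure can be recomputed later for an already stored pattern against a data source that has changed, which is the kind of use the synchronization operations suggest.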
PATTERN LANGUAGES
Three languages for pattern management are supported by PSYCHO:
- Pattern Definition Language (PDL): PDL is used to define new pattern
types, classes, and the mining and measure functions used for pattern
extraction and synchronization.
- Pattern Manipulation Language (PML): PML is used to perform operations
such as insertion, extraction, deletion, update, and synchronization of
patterns. Moreover, it allows the user to insert a pattern into, or remove
it from, a given pattern class.
- Pattern Query Language (PQL): PQL allows the user to query the PBMS
(Pattern Base Management System) in order to retrieve patterns, possibly
using the formula, and to correlate them with the data they represent
(cross-over queries).
ARCHITECTURE
The following is the basic PSYCHO architecture, annotated with the
technologies used.
The architecture is composed of three distinct layers:
- Physical layer. The physical layer contains both the Pattern Base and the
Data Source. The Pattern Base stores pattern types, patterns, and classes;
the Data Source stores all raw data from which patterns have been
extracted. It is in general distributed, and various types of repositories
(relational, XML, etc.) can be considered.
- Middle layer. The middle layer, which we call the PBMS Engine, is the
kernel of the system and supports all functionalities for pattern
manipulation and retrieval. The PBMS Engine and the Pattern Base represent
the core of the PSYCHO prototype.
- External layer. The external layer corresponds to a set of user
interfaces from which the user can send requests to the engine and
import/export data in other formats.
The communication between the PBMS Engine and the physical layer is
performed in Java. To be more flexible in the implementation of the
external layer modules, communication between the PBMS Engine and the
external layer is established using sockets and implementing requests as
serializable objects. The result is a completely distributed architecture,
where the pattern base, source data, PBMS Engine, and external modules can
reside on different hosts.
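The socket-based exchange of serializable requests can be sketched with standard Java object streams. The sketch below is an assumption about the general technique only: the classes PatternRequest, PatternResponse, EngineStub, ExternalClient, and LoopbackDemo, as well as their fields and the reply text, are hypothetical and do not reflect the actual PSYCHO protocol.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical request and response objects exchanged between the layers.
final class PatternRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String language;   // e.g. "PDL", "PML" or "PQL"
    final String statement;  // the statement text, opaque to the transport
    PatternRequest(String language, String statement) {
        this.language = language;
        this.statement = statement;
    }
}

final class PatternResponse implements Serializable {
    private static final long serialVersionUID = 1L;
    final String message;
    PatternResponse(String message) { this.message = message; }
}

/** Stand-in for the engine side: accepts one connection and answers one request. */
final class EngineStub {
    static void serveOnce(ServerSocket server) throws IOException, ClassNotFoundException {
        try (Socket socket = server.accept()) {
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            out.flush(); // push the stream header so the peer's input stream can open
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            PatternRequest request = (PatternRequest) in.readObject();
            out.writeObject(new PatternResponse("received " + request.language + " request"));
            out.flush();
        }
    }
}

/** Stand-in for an external-layer module: sends one request and waits for the reply. */
final class ExternalClient {
    static PatternResponse send(String host, int port, PatternRequest request)
            throws IOException, ClassNotFoundException {
        try (Socket socket = new Socket(host, port)) {
            ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
            out.flush();
            ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
            out.writeObject(request);
            out.flush();
            return (PatternResponse) in.readObject();
        }
    }
}

/** Runs both ends in one JVM on an ephemeral port, for demonstration only. */
final class LoopbackDemo {
    static String roundTrip(PatternRequest request) {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread engine = new Thread(() -> {
                try {
                    EngineStub.serveOnce(server);
                } catch (IOException | ClassNotFoundException e) {
                    throw new IllegalStateException(e);
                }
            });
            engine.start();
            PatternResponse reply =
                    ExternalClient.send("localhost", server.getLocalPort(), request);
            engine.join();
            return reply.message;
        } catch (IOException | ClassNotFoundException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the request is an ordinary serializable object, an external module only needs the host and port of the engine, which is consistent with the components residing on different hosts. A real deployment would also need to restrict which classes may be deserialized from untrusted peers.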
What's going on ...
We are currently following two distinct research directions:
· Moving towards an open-source PBMS
The two main steps in this direction are the following:
- From PSYCHO to PSYCHOlight:
PSYCHOlight is primarily aimed at making PSYCHO independent of SICStus
Prolog. Thus, its basic architecture is the following:
- From PSYCHOlight to PSYCHOfree:
PSYCHOfree is primarily aimed at making PSYCHOlight independent of
proprietary DBMS technology (i.e., Oracle). PSYCHOfree exploits the
open-source DBMS PostgreSQL. Thus, its basic architecture is the following:
· Providing an interoperable solution for pattern management
Similarly to the interoperability support that JDBC provides for DBMSs, we
propose an interoperability solution for PBMSs that relies on a Java API,
which we call JPBC (Java Pattern Base Connectivity); see the Documents
section for technical documents concerning JPBC.
|