PSYCHO Home Page

 

Overview

People

Publications

Related work

Documents

Software

 

 

PSYCHO: a Prototype System for Pattern Management

 

OVERVIEW

Patterns represent huge quantities of heterogeneous data in a compact and semantically rich way. Because of these characteristics, dedicated systems are required for pattern management, in order to model and manipulate patterns, possibly with a user-defined structure, efficiently and effectively. PSYCHO is a customizable system for generating, representing, and manipulating heterogeneous, possibly user-defined, patterns. More precisely, it allows the user to:

  • use standard pattern types or define new ones
  • generate or import patterns, represented according to existing standards
  • manipulate possibly heterogeneous patterns under an integrated environment

PATTERN MODEL

PSYCHO relies on a pattern model developed in the context of the European Project IST/FET Working Group (IST-2001-33058) Patterns for Next-Generation Database Systems (PANDA, 1/12/2001 to 31/5/2003). The following diagram shows the basic components of the model.



PSYCHO model



The PSYCHO logical model is based on three main concepts: pattern type, pattern, and class.

  • Pattern type. A pattern type gives a formal description of the pattern structure. It is a record with six elements: (i) the pattern name n; (ii) the structure schema s, which defines the structure of the patterns that are instances of the pattern type; (iii) the source schema d, which describes the dataset from which the instances of the pattern type being defined are constructed; (iv) the measure schema m, a tuple describing the measures that quantify the quality of the source data representation achieved by the pattern; (v) the formula f, which carries the semantics of the pattern. The formula f is a constraint-based formula describing, possibly in an approximate way, the relation between the data represented by the pattern and the pattern structure. Inside f, attributes are interpreted as free variables ranging over the components of either the source space or the pattern space; (vi) the validity period schema v, defining the schema of the temporal validity interval associated with each instance of the pattern type. Pattern types can also be organized hierarchically.
  • Pattern. Patterns are instances of a specific pattern type. Thus, they are record values with identifiers, containing the proper instantiation of the corresponding schema elements of the pattern type. In a pattern, the formula component is obtained from the formula of the pattern type by instantiating each attribute appearing in s with the corresponding value, while the attributes appearing in d range over the source space. Note that the data source represents the overall dataset the pattern is related to, whereas the formula represents, in an intensional and possibly approximate way, the subset of data represented by the pattern. When needed, the extensional set of data exactly represented by the pattern can be stored in the system as a form of metadata.
  • Class. A class is a set of semantically related patterns and constitutes the key concept in defining a pattern query language. A class is defined for a given pattern type.
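
The three concepts above can be sketched as plain Java data types. This is a minimal illustration under our own assumptions, not PSYCHO's internal representation: schemas are reduced to lists of attribute names, the formula is kept as text and instantiated by simple textual substitution, and all type, field, and method names (PatternType, PatternInstance, PatternClass, formula()) are hypothetical. The formula() method mirrors the instantiation step described for patterns: the attributes in s receive their values while the attributes in d stay free. Java 16 or later is required for records.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

public class PatternModelSketch {

    // Hypothetical pattern type: the record (n, s, d, m, f, v). Schemas are
    // simplified to lists of attribute names; the formula f is plain text.
    public record PatternType(String name,              // n
                              List<String> structure,   // s
                              List<String> source,      // d
                              List<String> measures,    // m
                              String formula,           // f
                              String validity) {}       // v

    // Hypothetical pattern: an identified instance of a pattern type.
    public record PatternInstance(String id,
                                  PatternType type,
                                  Map<String, String> structure,  // a value for each attribute in s
                                  String source,                  // reference to the data source
                                  Map<String, Double> measures,   // a value for each measure in m
                                  LocalDate validFrom,            // validity interval (v)
                                  LocalDate validTo) {

        // Instantiated formula: whole-word occurrences of the attributes in s are
        // replaced by their values; attributes in d are left free.
        public String formula() {
            String f = type.formula();
            for (String attr : type.structure()) {
                String regex = "\\b" + java.util.regex.Pattern.quote(attr) + "\\b";
                f = f.replaceAll(regex, Matcher.quoteReplacement(structure.get(attr)));
            }
            return f;
        }
    }

    // Hypothetical class: a set of semantically related patterns of one pattern type.
    public static final class PatternClass {
        private final String name;
        private final PatternType type;
        private final List<PatternInstance> members = new ArrayList<>();

        public PatternClass(String name, PatternType type) {
            this.name = name;
            this.type = type;
        }

        public String name() { return name; }

        // Only patterns of the class's pattern type may be inserted.
        public void insert(PatternInstance p) {
            if (!p.type().equals(type)) {
                throw new IllegalArgumentException(p.id() + " is not of type " + type.name());
            }
            members.add(p);
        }

        public boolean remove(PatternInstance p) { return members.remove(p); }

        public List<PatternInstance> members() { return List.copyOf(members); }
    }

    public static void main(String[] args) {
        PatternType interval = new PatternType("Interval",
                List.of("lo", "hi"), List.of("x"), List.of("coverage"),
                "x >= lo AND x <= hi", "[start, end]");
        PatternInstance p = new PatternInstance("p1", interval,
                Map.of("lo", "10", "hi", "20"), "sales.amount", Map.of("coverage", 0.8),
                LocalDate.of(2003, 1, 1), LocalDate.of(2003, 12, 31));
        PatternClass ranges = new PatternClass("SalesRanges", interval);
        ranges.insert(p);
        System.out.println(p.formula());  // prints: x >= 10 AND x <= 20
    }
}
```

Inserting a pattern of a different type into the class is rejected, reflecting the fact that a class is defined for a given pattern type.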

Each pattern type can be associated with one or more mining functions, used to extract patterns of that type from a given data source, and one or more measure functions, used to compute the measures associated with a certain pattern over a given data source.
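
As a hedged illustration of how a pattern type can be paired with a mining function and a measure function, the sketch below uses a hypothetical one-dimensional Interval pattern type. The mining function (equal-width binning) and the measure (coverage, the fraction of source values falling inside the interval) are our own choices for the example, as are the MiningFunction and MeasureFunction interfaces; the page does not describe PSYCHO's actual functions or their signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class MiningMeasureSketch {

    // Hypothetical interfaces for the two kinds of functions associated with a pattern type.
    public interface MiningFunction<P> { List<P> mine(double[] source); }
    public interface MeasureFunction<P> { double measure(P pattern, double[] source); }

    // Structure of a hypothetical "Interval" pattern type: the closed range [lo, hi].
    public record Interval(double lo, double hi) {
        public boolean contains(double x) { return x >= lo && x <= hi; }
    }

    // Mining function: equal-width binning of the source into k intervals over [min, max].
    public static List<Interval> mineEqualWidth(double[] source, int k) {
        if (source.length == 0 || k <= 0) {
            throw new IllegalArgumentException("need a non-empty source and k > 0");
        }
        double min = Arrays.stream(source).min().getAsDouble();
        double max = Arrays.stream(source).max().getAsDouble();
        double width = (max - min) / k;
        List<Interval> patterns = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            double lo = min + i * width;
            double hi = (i == k - 1) ? max : lo + width;  // last bin ends exactly at max
            patterns.add(new Interval(lo, hi));
        }
        return patterns;
    }

    // Measure function: fraction of the source values that fall inside the pattern.
    public static double coverage(Interval p, double[] source) {
        long inside = Arrays.stream(source).filter(p::contains).count();
        return (double) inside / source.length;
    }

    public static void main(String[] args) {
        double[] source = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        MiningFunction<Interval> miner = data -> mineEqualWidth(data, 2);
        MeasureFunction<Interval> measure = MiningMeasureSketch::coverage;
        for (Interval p : miner.mine(source)) {
            System.out.printf(Locale.ROOT, "[%.1f, %.1f] coverage=%.2f%n",
                    p.lo(), p.hi(), measure.measure(p, source));
        }
    }
}
```

For the source values 1 to 10 and two bins, the miner returns [1.0, 5.5] and [5.5, 10.0], each with coverage 0.50. Because the intervals are closed, a value lying exactly on a shared boundary would be counted by both patterns.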

PATTERN LANGUAGES

PSYCHO supports three languages for pattern management:

  • Pattern Definition Language (PDL): PDL is used to define new pattern types and classes, as well as the mining and measure functions used for pattern extraction and synchronization.
  • Pattern Manipulation Language (PML): PML is used to perform operations such as pattern insertion, extraction, deletion, update, and synchronization. Moreover, it allows the user to insert a pattern into, or remove it from, a given pattern class.
  • Pattern Query Language (PQL): PQL allows the user to query the PBMS (Pattern Base Management System) to retrieve patterns, possibly by exploiting their formulas, and to correlate them with the data they represent (cross-over queries).
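
PQL syntax is not given on this page, so the following is not PQL. It is a Java sketch, under stated assumptions, of what a cross-over query computes: patterns are first selected on a measure and then paired with the source tuples that satisfy their formulas. Modeling an instantiated formula as a java.util.function.Predicate is an assumption of this sketch (PSYCHO formulas are constraint-based), and the Sale and SalePattern types are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CrossOverQuerySketch {

    // Hypothetical source tuple.
    public record Sale(String region, double amount) {}

    // A pattern reduced to what the query needs: an id, one measure, and its
    // instantiated formula, evaluated here as a predicate over source tuples.
    public record SalePattern(String id, double confidence, Predicate<Sale> formula) {}

    // Cross-over query: keep the patterns whose measure reaches the threshold and
    // pair each one with the source tuples that satisfy its formula.
    public static Map<String, List<Sale>> crossOver(List<SalePattern> patterns,
                                                    List<Sale> source,
                                                    double minConfidence) {
        Map<String, List<Sale>> result = new LinkedHashMap<>();
        for (SalePattern p : patterns) {
            if (p.confidence() >= minConfidence) {
                result.put(p.id(), source.stream().filter(p.formula()).collect(Collectors.toList()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sale> source = List.of(new Sale("north", 120), new Sale("south", 40), new Sale("north", 15));
        List<SalePattern> patterns = List.of(
                new SalePattern("bigNorth", 0.9, s -> s.region().equals("north") && s.amount() >= 100),
                new SalePattern("small", 0.4, s -> s.amount() < 50));
        System.out.println(crossOver(patterns, source, 0.5));
    }
}
```

With a threshold of 0.5 only bigNorth is kept, and it is paired with the single north sale of amount 120.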

ARCHITECTURE

The following diagram shows the basic PSYCHO architecture, together with the technologies used.



PSYCHO Architecture



The architecture is composed of three distinct layers:

  • Physical layer. The physical layer contains both the Pattern Base and the Data Source. The Pattern Base stores pattern types, patterns, and classes; the Data Source stores all the raw data from which patterns have been extracted. The Data Source is, in general, distributed, and various types of repositories (relational, XML, etc.) can be used.
  • Middle layer. The middle layer, which we call the PBMS Engine, is the kernel of the system and supports all functionality for pattern manipulation and retrieval. The PBMS Engine and the Pattern Base form the core of the PSYCHO prototype.
  • External layer. The external layer corresponds to a set of user interfaces from which the user can send requests to the engine and import/export data in other formats.

 

Communication between the PBMS Engine and the physical layer is implemented in Java. To allow greater flexibility in implementing the external layer modules, the PBMS Engine and the external layer communicate through sockets, with requests implemented as serializable objects. The result is a fully distributed architecture, in which the pattern base, source data, PBMS Engine, and external modules can reside on different hosts.
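
The socket-based exchange between the external layer and the PBMS Engine can be sketched as follows. This is a minimal sketch assuming a one-request-per-connection protocol; the Request and Response classes are hypothetical, since the actual request classes used by PSYCHO are not described here. Both sides run in one JVM over the loopback interface only to keep the example self-contained; in the distributed setting described above they would run on different hosts.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketRequestSketch {

    // Hypothetical serializable objects exchanged between the external layer and the engine.
    public record Request(String language, String statement) implements Serializable {}
    public record Response(boolean ok, String message) implements Serializable {}

    // Engine side: accept one connection, read one Request, reply with one Response.
    static void serveOnce(ServerSocket server) throws IOException, ClassNotFoundException {
        try (Socket socket = server.accept();
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.flush();  // send the stream header before waiting for the peer's header
            try (ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {
                Request req = (Request) in.readObject();
                out.writeObject(new Response(true, "received " + req.language() + " request"));
                out.flush();
            }
        }
    }

    // External-layer side: send a Request and block until the Response arrives.
    static Response send(int port, Request req) throws IOException, ClassNotFoundException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.flush();
            try (ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {
                out.writeObject(req);
                out.flush();
                return (Response) in.readObject();
            }
        }
    }

    // Runs engine and client in one JVM; port 0 lets the OS pick a free port.
    public static Response roundTrip(Request req) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread engine = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException | ClassNotFoundException e) {
                    throw new RuntimeException(e);
                }
            });
            engine.start();
            Response response = send(server.getLocalPort(), req);
            engine.join();
            return response;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Request("PQL", "<query text>")).message());
    }
}
```

Each side creates and flushes its ObjectOutputStream before creating its ObjectInputStream, because the ObjectInputStream constructor blocks until it has read the peer's stream header; opening the input streams first on both ends would deadlock.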

What's going on ...

We are currently following two distinct research directions:

  • Moving towards an open-source PBMS

The two main steps in this direction are the following:

    • From PSYCHO to PSYCHOlight

PSYCHOlight is primarily aimed at making PSYCHO independent of SICStus Prolog. Its basic architecture is the following:

PSYCHOlight architecture

    • From PSYCHOlight to PSYCHOfree

PSYCHOfree is primarily aimed at making PSYCHOlight independent of proprietary DBMS technology (i.e., Oracle); to this end, it relies on the open-source DBMS PostgreSQL. Its basic architecture is the following:

PSYCHOfree architecture

  • Providing an interoperable solution for pattern management

Just as JDBC provides interoperability support for DBMSs, we propose an interoperability solution for PBMSs based on a Java API that we call JPBC (Java Pattern Base Connectivity). See the Documents section for technical documents concerning JPBC.
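
To illustrate the JDBC analogy, the sketch below transposes JDBC's driver-registry idea (java.sql.DriverManager hands a connection URL to the registered drivers and uses one that can handle it) to pattern bases. Every interface, class, pattern type name, and URL scheme here (PatternBaseConnection, PatternBaseDriver, PatternBaseManager, InMemoryDriver, pbms:memory:) is hypothetical and does not reproduce the JPBC API, which is specified in the technical documents.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectivitySketch {

    // Hypothetical PBMS-independent connection interface used by client code.
    public interface PatternBaseConnection extends AutoCloseable {
        List<String> patternTypes();
        @Override void close();
    }

    // Hypothetical driver interface: each PBMS would supply its own implementation.
    public interface PatternBaseDriver {
        boolean acceptsURL(String url);
        PatternBaseConnection connect(String url);
    }

    // Hypothetical manager: returns a connection from the first registered driver
    // that accepts the URL.
    public static final class PatternBaseManager {
        private static final List<PatternBaseDriver> drivers = new ArrayList<>();

        public static void register(PatternBaseDriver driver) { drivers.add(driver); }

        public static PatternBaseConnection getConnection(String url) {
            for (PatternBaseDriver d : drivers) {
                if (d.acceptsURL(url)) {
                    return d.connect(url);
                }
            }
            throw new IllegalArgumentException("no driver for " + url);
        }
    }

    // Toy in-memory driver standing in for a real PBMS.
    public static final class InMemoryDriver implements PatternBaseDriver {
        public boolean acceptsURL(String url) { return url.startsWith("pbms:memory:"); }

        public PatternBaseConnection connect(String url) {
            return new PatternBaseConnection() {
                public List<String> patternTypes() { return List.of("AssociationRule", "Cluster"); }
                public void close() {}
            };
        }
    }

    public static void main(String[] args) {
        PatternBaseManager.register(new InMemoryDriver());
        try (PatternBaseConnection c = PatternBaseManager.getConnection("pbms:memory:demo")) {
            System.out.println(c.patternTypes());
        }
    }
}
```

Client code depends only on the interfaces, so switching to another PBMS would mean registering a different driver and changing the URL, which is the kind of interoperability the JDBC analogy points to.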

 
