Date 21-3-2006
Time 12:00
Room/Location Conference room, DISI, room 322, 3rd floor
Title Abstraction Networks for Controlled Terminologies
Speaker Dr. Michael Halper, Professor, Department of Mathematics & Computer Science
Affiliation Department of Mathematics & Computer Science, Kean University, Union, NJ 07083-0411, USA
Abstract A controlled terminology is a structure that houses knowledge from some domain, such as biomedicine, in the form of concepts, subsumption (IS-A) links, and semantic relationships. Controlled terminologies have been variously referred to as vocabularies, ontologies, or terminological knowledge bases. Among the primary benefits of such systems are their support for information sharing and integration, decision support, and ad hoc querying of domain knowledge. Controlled terminologies have found widespread acceptance and usage within the biomedical community. Examples include the National Cancer Institute Thesaurus (NCIT), developed as part of NCI's Enterprise Vocabulary Services project, and the Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT), developed through a joint effort of the College of American Pathologists and the UK's National Health Service. While controlled terminologies have proven to be invaluable resources, they tend to be very large and complex. For example, NCIT currently comprises over 42,000 concepts, while SNOMED CT has more than 360,000 concepts. This size and complexity pose serious problems for users and maintenance personnel alike. In particular, quality assurance can be difficult. In this talk, I present automated techniques for partitioning a controlled terminology into smaller groups of concepts based on relationship patterns and subsumption groupings. Two abstraction networks, called the area taxonomy and the p-area taxonomy, are derived from the partitions. The high-level views afforded by these abstraction networks form the basis for systematic auditing. For example, the taxonomies tend to highlight concept errors that manifest themselves as irregularities at the abstract level. The kinds of errors include conceptual ambiguity, omission of concepts, and concept misclassification. The partitioning and auditing methodologies are demonstrated on one of NCIT's top-level hierarchies. Errors discovered during the auditing process are presented.
