Connecting pieces of informations from heterogeneous sources sharing the same domain is an open challenge in Semantic Web, Big Data and business communities. The main problem in this research area is to bridge the expressiveness gap between relational databases and ontologies. In general, an ontology is more expressive and captures more semantic information behind data than a relational database does. On the other side, databases are the most common used persistent storage system and they grant benefits such as security and data integrity but they need to be managed by expert users. The problem is quite significant above all when enterprise or corporate ontologies are used to share infomations coming from different databases and where a more efficient data management is auspicable for interoperability purposes. The main motivations on this thesis are related to the database access via ontology, as in the OBDA (Ontology Based Data Access) scenario, wich provides a formal specification of the domain close to the human’s view, while technical details of the database are hidden from end-user, and also the persistent storageof ontologies in databases for facilitating search and retrieval, keeping the benefits of database management systems. In these cases the assertion component (A-Box) is usually stored into a database, and terminological one (T-Box) is mantained in an ontology. So it is more necessary to align schemas than matching instances. The term alignment can be used to define the whole process comprising the mapping process between two existent heterogeneous sources, such as ontology and relational database, and the trasformation process from a representation to the other one, such as ontology-to-database and database-to-ontology. Defining mappings manually is an hard task expecially for large and complex data representations and existing methodologies fail in loosing some contents and several elements are left unaligned. In this thesis are discussed various aspects of the alignment in all these senses. The presented techniques are based on a probabilistic approach that fits well on the uncertain alignment process, where are involved two different representations with a different level of expressiveness. In the methodology ontologies and databases are described in terms of Ontology Web Language (OWL) and Entity-Relationship Diagram (ERD) lexical descriptions. So, the ontologies are represented by a set of OWL axioms while a properly defined Context-Free Grammar (CFG) is used to represent ERDs (Entity-Relationship Diagrams) as a set of sentences. Both the OWL → ERD transformation and the mapping rely on HMMs (Hidden Markov Models) to estimate the most likely sequence of ERD symbols observing OWL symbols. In the model definition OWL constructs are the observable states, while the ERD symbols are the hidden states. The tools developed, one for OWL → ERD transformation purpose, called OMEGA (Ontology → Markov → ERD Generator Application) and one for mapping OWL and ERD, called HOwErd (HMM OWL-ERD) own their own GUI interface for showing the alignment results. Finally, HOwErd is compared with the most widespread tools in the reference literature.

Anastasio, F.Probabilistic techniques for bridging the semantic gap in schema alignment.

Probabilistic techniques for bridging the semantic gap in schema alignment

ANASTASIO, Francesca

Abstract

Connecting pieces of informations from heterogeneous sources sharing the same domain is an open challenge in Semantic Web, Big Data and business communities. The main problem in this research area is to bridge the expressiveness gap between relational databases and ontologies. In general, an ontology is more expressive and captures more semantic information behind data than a relational database does. On the other side, databases are the most common used persistent storage system and they grant benefits such as security and data integrity but they need to be managed by expert users. The problem is quite significant above all when enterprise or corporate ontologies are used to share infomations coming from different databases and where a more efficient data management is auspicable for interoperability purposes. The main motivations on this thesis are related to the database access via ontology, as in the OBDA (Ontology Based Data Access) scenario, wich provides a formal specification of the domain close to the human’s view, while technical details of the database are hidden from end-user, and also the persistent storageof ontologies in databases for facilitating search and retrieval, keeping the benefits of database management systems. In these cases the assertion component (A-Box) is usually stored into a database, and terminological one (T-Box) is mantained in an ontology. So it is more necessary to align schemas than matching instances. The term alignment can be used to define the whole process comprising the mapping process between two existent heterogeneous sources, such as ontology and relational database, and the trasformation process from a representation to the other one, such as ontology-to-database and database-to-ontology. Defining mappings manually is an hard task expecially for large and complex data representations and existing methodologies fail in loosing some contents and several elements are left unaligned. In this thesis are discussed various aspects of the alignment in all these senses. The presented techniques are based on a probabilistic approach that fits well on the uncertain alignment process, where are involved two different representations with a different level of expressiveness. In the methodology ontologies and databases are described in terms of Ontology Web Language (OWL) and Entity-Relationship Diagram (ERD) lexical descriptions. So, the ontologies are represented by a set of OWL axioms while a properly defined Context-Free Grammar (CFG) is used to represent ERDs (Entity-Relationship Diagrams) as a set of sentences. Both the OWL → ERD transformation and the mapping rely on HMMs (Hidden Markov Models) to estimate the most likely sequence of ERD symbols observing OWL symbols. In the model definition OWL constructs are the observable states, while the ERD symbols are the hidden states. The tools developed, one for OWL → ERD transformation purpose, called OMEGA (Ontology → Markov → ERD Generator Application) and one for mapping OWL and ERD, called HOwErd (HMM OWL-ERD) own their own GUI interface for showing the alignment results. Finally, HOwErd is compared with the most widespread tools in the reference literature.
Data Integration
Schema Matching
Hidden Markov Model
OWL Ontology
Entity-Relation Diagram
Database
Anastasio, F.Probabilistic techniques for bridging the semantic gap in schema alignment.
File in questo prodotto:
File Dimensione Formato  
Anastasio Francesca - PhdThesis.pdf

accesso aperto

Descrizione: Tesi di dottorato di Anastasio Francesca - Probabilistic techniques for bridging the semantic gap in schema alignment
Dimensione 5.09 MB
Formato Adobe PDF
5.09 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/163384
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact