In recent years many attempts have been made to design sets of rules for extracting semantic meaning from plain text and achieving annotation, but very few approaches make extensive use of grammars. Current systems focus mainly on extracting the semantic role of the entities described in the text. This approach has limitations: in such applications the semantic role is conceived merely as the meaning of the entities involved, without considering their context. For example, current semantic annotators often mark a date entity without annotating what kind of date it is (a birth date, a book publication date, and so on). Moreover, these systems use ontologies that have been developed specifically for the system's purposes and have reduced portability. Extensive use of both linguistic resources and semantic representations of the domain is needed in this scenario: the semantic representation of the domain addresses the semantic interpretation of the context, while NLP tools can help to solve linguistic problems related to semantic annotation, such as synonymy, ambiguity, and co-reference. This work proposes a novel framework, inspired by Cognitive Linguistics theories, that addresses the problems outlined above. In particular, our work is based on Construction Grammar (CxG). CxG defines a "construction" as a form-meaning pair. We use RDF triples in the domain ontology as the "semantic seeds" to build constructions. A set of rules based on linguistic typology has been designed to infer semantics and syntax from the semantic seed and to combine them as the poles of constructions. A hierarchy of rules that infers syntactic patterns for either single words or sentences using WordNet and FrameNet has been designed to overcome the limitations of expressing the syntactic poles solely with the terms stated in the ontology.
As a consequence, semantic annotation of plain text is achieved by computing all possible syntactic forms for the same meaning during the analysis of document corpora. The proposed framework has been applied to the semantic annotation of Wikipedia pages; the result is a system for the automatic generation of Semantic Web wiki content from standard Wikipedia pages, which offers a possible solution to the major challenge of turning existing wiki sources into semantic wikis.
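
The idea of using an RDF triple as a semantic seed for form-meaning pairs can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Construction`, `build_constructions`, `annotate`, and `LEXICAL_VARIANTS` are hypothetical, the property names and sample phrases are invented for illustration, and a small hand-written table stands in for the WordNet and FrameNet lookups that the framework uses to infer syntactic patterns.

```python
import re
from dataclasses import dataclass

# Assumption: a hand-written table stands in for WordNet/FrameNet lookups.
# It maps an ontology property to surface phrases that can express it.
LEXICAL_VARIANTS = {
    "birthDate": ["was born on", "date of birth is"],
    "publicationDate": ["was published on", "appeared on"],
}

# Assumption: dates look like "3 March 1950"; real text needs a richer pattern.
DATE = r"(?P<date>\d{1,2} [A-Z][a-z]+ \d{4})"


@dataclass(frozen=True)
class Construction:
    """A form-meaning pair: a syntactic pole and a semantic pole."""
    form: str       # syntactic pole, here a regular expression
    meaning: tuple  # semantic pole, here the seed triple (subject, property, object)


def build_constructions(seed):
    """Derive one construction per known surface phrase of the seed's property."""
    _subject, prop, _object = seed
    return [
        Construction(form=rf"{re.escape(phrase)} {DATE}", meaning=seed)
        for phrase in LEXICAL_VARIANTS.get(prop, [])
    ]


def annotate(text, constructions):
    """Return (date, property) pairs in text order, so each date carries its kind."""
    hits = []
    for construction in constructions:
        for match in re.finditer(construction.form, text):
            hits.append((match.start(), match.group("date"), construction.meaning[1]))
    return [(date, prop) for _start, date, prop in sorted(hits)]


if __name__ == "__main__":
    seeds = [("Person", "birthDate", "Date"), ("Book", "publicationDate", "Date")]
    constructions = [c for seed in seeds for c in build_constructions(seed)]
    sample = "The author was born on 3 March 1950. The novel was published on 12 May 1987."
    print(annotate(sample, constructions))
```

In this sketch each date is annotated with the property of the seed triple whose form matched, which is the kind of context-aware annotation (birth date versus publication date) that the abstract says current annotators lack.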

Pipitone, A., Pirrone, R. (2012). Cognitive Linguistics as the Underlying Framework for Semantic Annotation. In Proc. of the 6th International Conference on Semantic Computing (ICSC 2012) (pp. 52-59). Los Alamitos, CA: IEEE Computer Society [10.1109/ICSC.2012.48].

Cognitive Linguistics as the Underlying Framework for Semantic Annotation

PIPITONE, Arianna; PIRRONE, Roberto
2012-01-01

Sector ING-INF/05 - Information Processing Systems
Sep-2012
6th International Conference on Semantic Computing (ICSC 2012)
Palermo, Italy
Sep. 19-21, 2012
2012
8
Proceedings (conference proceedings)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/76927