2nd STLab Workshop 2012
When: 13/03/2012, from 10:00 to 17:00
Where: Piaget Room, via San Martino della Battaglia 44, Rome
Info and subscription: email@example.com
This workshop aims to present the research program of STLab, the results achieved so far, and related work from other groups with which STLab collaborates. The program includes presentations by STLab students and an introductory keynote.
Welcome and Introduction
- (10:30-11:15) Prof. Fabio Massimo Zanzotto, Università di Roma "Tor Vergata"
Machine Learning for Recognizing Textual Entailment
Recognizing Textual Entailment (RTE) gives an empirical definition to a traditional and important problem in Natural Language Processing. This talk introduces the task as defined in the RTE challenges and describes how machine learning models have been applied to it. We analyze the limits of simple approaches and describe the feature spaces of first-order rules that we defined to overcome these limits.
Fabio Massimo Zanzotto is an associate professor at the University of Rome "Tor Vergata". He has worked on building models for robust syntactic parsing and for knowledge acquisition from corpora. In recent years, he has worked on textual entailment recognition models, mainly exploring the application of supervised machine learning models to learn inference rules from annotated examples. He participated in all the PASCAL and NIST Recognizing Textual Entailment challenges with two different systems that explore different ways of applying supervised learning algorithms to the Textual Entailment recognition problem.
- (11:15-11:45) Coffee break
Students' presentations (20-minute presentation + 10-minute discussion)
- (11:45-12:15) Silvio Peroni (Ph.D. Student), Università di Bologna
Semantic Publishing: issues, solutions and new trends in scholarly publishing within the Semantic Web era
This work is concerned with the growing relationship between two distinct multidisciplinary research fields, Semantic Web technologies and scholarly publishing, which in this context converge into one precise research topic: Semantic Publishing. In the spirit of the original aim of Semantic Publishing, i.e. the improvement of scientific communication by means of semantic technologies, this thesis proposes theories, formalisms and applications for opening up semantic publishing to an effective interaction between scholarly documents (e.g., journal articles) and their related semantic and formal descriptions. The main aim of my work is to increase users' comprehension of documents and to allow document enrichment, discovery and linkage to document-related resources and contexts, such as other articles and raw scientific data. To achieve these goals, I investigate and propose solutions for three of the main issues that semantic publishing promises to address: the need for tools that link document text to a formal representation of its meaning, the lack of complete metadata schemas for describing documents according to the publishing vocabulary, and the absence of effective user interfaces for easily acting on semantic publishing models and theories.
- (12:15-12:45) Francesco Draicchio (Master Student), STLab-ISTC-CNR
FRED: A System for Frame-driven extraction of linked data and ontologies from text
Ontology engineering has become a field of great interest due to the increasing need for domain-specific knowledge bases, which can boost the use of Semantic Web technology both in everyday life and, for instance, in semantic solutions for large organizations. With state-of-the-art ontology design tools, building such knowledge resources requires a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task, even more so when the task consists of modelling knowledge at web scale. FRED is a novel and flexible methodology for automatically learning ontologies from text, lightening the human workload required for conceptualizing domain-specific knowledge and populating the extracted schema with real data, hence speeding up the whole process. Here, natural language processing (NLP) plays a fundamental role. The main goal of using such technologies is to automatically identify facts in natural language and extract relations among recognized entities to produce linked data, with which knowledge bases can be created or extended. The long-term goal of this study is to collect as much structured semantic information as possible from the web of crowdsourced data, in order to identify the cognitive patterns that humans use to organize knowledge, i.e. Knowledge Patterns.
- (12:45-13:15) Andrea Nuzzolese (Ph.D. Student), STLab-ISTC-CNR / Università di Bologna
Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and Usage
Linked Data provides the basis for the Semantic Web as an empirical science. For the first time in the history of the Semantic Web, there is the opportunity to perform experiments on large and heterogeneous data sets created by large communities of practice. As an empirical science, the Semantic Web should refer to clear research objects. We think that the research objects of the empirical Semantic Web are Knowledge Patterns (KPs). KPs derive from frames as described by Minsky and Fillmore and are small, well-connected units of meaning, which are task-based, well-grounded, and cognitively sound. A clear research direction in the empirical Semantic Web is to investigate methods that make KPs emerge, which directly refers to KP extraction over the Semantic Web. So far, we have outlined two possible approaches for KP extraction. (i) The identification and extraction of KPs from foundational ontologies (e.g. DOLCE), frames (e.g. FrameNet), and thesauri. This direction will be referred to as the top-down approach. The top-down approach should be the means for investigating methods for reusing any formal or semi-formal structure and validating it as a KP. In this direction, an initial result we have obtained is the extraction of KPs from FrameNet frames. (ii) The recognition and discovery of KPs directly from data, e.g., relational databases and Linked Data. This research direction will be referred to as the bottom-up approach. Unlike the top-down approach, it should be the means for investigating methods that allow the discovery of invariances in data and their validation as KPs. In this direction, results have been obtained by analyzing Wikipedia links, i.e., wikilinks. In order to evaluate KP effectiveness in user-oriented tasks, we have developed Aemoo, a tool that implements KP-based strategies that allow users to perform exploratory search on Wikipedia.
- (13:15-14:30) Lunch on site
- (14:30-15:00) Enrico Daga (Ph.D. Student), SI-CNR / Open University / STLab-ISTC-CNR
Towards a theoretical foundation for the harmonization of linked data
While the benefit of reusing linked data is evident, there is a lack of instruments to prove that, given a set of data sources, a task and a mash-up process, the resulting graph is the best possible. Can we define a theory for building plans for the integration of linked data whose accuracy is verifiable with respect to the needs of a particular task?
- (15:00-15:30) Alessandro Adamou (Ph.D. Student), STLab-ISTC-CNR / Università di Bologna
Managing multiple virtual ontology networks to scale down reasoning processes
I will present a software architecture that allows the concurrent management of multiple virtual ontology networks on top of a massive, sparse knowledge base. This architecture can combine privileged ontology collectors with volatile sessions, into which historical or snapshot data in semantic form can be pushed and fed to knowledge management applications. It can therefore be used to scale down a single knowledge base to a size that reasoners can manage. A reference implementation exists as one of the core knowledge management components of a service platform for content management incubated at Apache. I will illustrate a range of concrete use cases where this approach is being applied, with special attention to its potential impact on interaction with knowledge. I will also present some early results from the adoption of this implementation by small and medium enterprises in the domains of digital libraries and content management.
- (15:30-16:00) Alberto Musetti (Research developer), STLab-ISTC-CNR
Apache Stanbol: an Open Source Framework for Semantic Content Management Systems
Apache Stanbol is an open source software framework that provides a set of reusable components for semantic content management: any CMS platform can be plugged into Stanbol and inherit the capability of extracting, managing, and storing knowledge from content. After describing Stanbol's main components, I will present a CMS use case that can be addressed with Stanbol and that illustrates the strengths and weaknesses of the framework.
- (16:00-17:00) Discussion and Closing