The objective is to open up deep web sources for entity retrieval by identifying the entity types a web source provides, by allowing natural (text search) access to structured data, and by combining entities from diverse deep web sources. WP 7 aims to make it possible to share and search information without requiring the service provider (WCC) to crawl all data.
Recent research in entity retrieval has produced effective entity ranking when the task is well defined, as in expert search, or when the data is well organized, as in Wikipedia. Well-defined and well-organized data is increasingly available on the web, the prime example being the deep web. The deep web is a large part of the web that cannot be accessed by crawlers: mostly dynamic web pages that are returned in response to a web form. The objectives are met in four steps:
- Personal information sharing evaluates a prototype content management and search system based on WCC's matching system ELISE at UT.
- Deep web entity probing identifies which entity types a deep web service provides by issuing probe queries. The challenge is to identify exactly which entity types a web service specializes in, although general types, for instance “persons,” are of interest as well.
- Database natural abstraction layers open up a (deep) web service by returning dynamic pages in response to text queries or natural language questions. The approach combines closed-domain and open-domain question answering to identify the question patterns that a deep web source can answer and to automatically translate questions into web form queries.
- Entity search aggregation will combine deep web information from several sources in a unified search result. This work will focus on the use of standards like OpenSearch and on extensions to support deep web entity retrieval. Where results are not standardized, we will investigate wrapper induction techniques for information extraction.
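As an illustration of the aggregation step, the following is a minimal sketch of how ranked entity lists from several sources might be merged into one result list. It is not the method the work package commits to: the round-robin interleaving, the name-based duplicate check, the function name `merge_round_robin`, and the dictionary fields `name` and `source` are all assumptions made for the example.

```python
from itertools import zip_longest


def merge_round_robin(result_lists, key=lambda e: e["name"].casefold()):
    """Interleave ranked result lists, keeping the first copy of each entity.

    result_lists: one ranked list of entity dicts per source (hypothetical format).
    key: function that maps an entity to a value used to detect duplicates.
    """
    merged = []
    seen = set()
    # zip_longest yields the rank-1 results of every source, then rank-2, etc.,
    # padding exhausted sources with None.
    for tier in zip_longest(*result_lists):
        for entity in tier:
            if entity is None:
                continue
            k = key(entity)
            if k in seen:
                continue  # the same entity was already returned by another source
            seen.add(k)
            merged.append(entity)
    return merged


# Two made-up sources with one overlapping entity.
source_a = [{"name": "Ada Lovelace", "source": "a"},
            {"name": "Alan Turing", "source": "a"}]
source_b = [{"name": "alan turing", "source": "b"},
            {"name": "Grace Hopper", "source": "b"}]

merged = merge_round_robin([source_a, source_b])
```

Round-robin merging ignores the relevance scores of the individual sources; a real aggregator would likely need score normalization and a more robust entity-matching step than case-insensitive name comparison.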
Application: personal information management based on WCC's ELISE system; distributed information retrieval.
Description of work
- Task 7.1: personal information sharing (UT, WCC);
- Task 7.2: deep web entity probing (UT, WCC);
- Task 7.3: database natural abstraction layers (UT, WCC);
- Task 7.4: entity search aggregation (UT, WCC).
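The probing idea behind Task 7.2 can be sketched as follows, under strong simplifying assumptions. Each candidate entity type gets a small set of probe queries, and a source is scored by the fraction of probes that return at least one result. The `search` callable stands in for submitting the source's web form, and the function name, probe sets, and `fake_source` are hypothetical.

```python
def probe_entity_types(search, probes):
    """Return {entity_type: hit_rate} for one source.

    search: callable taking a query string and returning a list of results
            (a stand-in for submitting the source's web form).
    probes: dict mapping an entity type to a list of probe queries.
    """
    scores = {}
    for entity_type, queries in probes.items():
        # Count the probes for which the source returned at least one result.
        hits = sum(1 for q in queries if search(q))
        scores[entity_type] = hits / len(queries) if queries else 0.0
    return scores


def fake_source(query):
    """Made-up source that only knows about two persons."""
    people = {"ada lovelace", "alan turing"}
    return [query] if query.lower() in people else []


probes = {
    "persons": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "cities": ["Enschede", "Paris"],
}
scores = probe_entity_types(fake_source, probes)
```

A hit rate of this kind only indicates that a source responds to a type of query; deciding which types a source truly specializes in would require inspecting the returned results as well.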