To improve entity ranking by shifting entity detection from indexing time to query time. To create an open source solution for entity ranking in speech sources (with WP8).
Existing entity ranking solutions depend on a predefined (and usually short) list of entity types that are detected at indexing time (e.g., person, location, organization and other). The resulting semantic annotation (or tagging) of the indexed documents could in principle improve results for information needs involving people, locations or organizations. However, these generic entity types are often a poor match for the information need at hand. One approach to this problem is to predict which information needs must be supported and to build task-dependent parsers (taggers) accordingly. This WP also investigates a novel alternative, called entity refinement, which shifts (part of) the entity detection step from indexing time to search time. The advantage would be that the tagging can be specialized to the information need at hand, using all the additional information known at query time. The challenge is to tackle two problems: run-time efficiency and the availability of training data.
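To make the idea of entity refinement more concrete, the following is a minimal sketch. It assumes that the index stores only coarse index-time tags (person, location, organization, other) together with a small context window for each mention. At query time, these tags are narrowed down using the query itself. `Mention` and `refine` are hypothetical names, and the context-overlap score is a simple stand-in for a trained, query-specific tagger. It illustrates the principle only and is not the method this WP will develop.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A candidate entity mention stored at indexing time (hypothetical schema)."""
    doc_id: str
    text: str                  # surface form of the entity
    coarse_type: str           # index-time tag: PERSON, LOCATION, ORGANIZATION or OTHER
    context: tuple[str, ...]   # tokens around the mention


def refine(mentions, coarse_type, query_terms):
    """Query-time refinement (toy stand-in for a query-specific tagger).

    Keeps only mentions with the requested coarse index-time type whose
    context overlaps the query, then ranks entities by summed overlap.
    """
    query = {t.lower() for t in query_terms}
    scores = Counter()
    for m in mentions:
        if m.coarse_type != coarse_type:
            continue  # the coarse tag from indexing time acts as a cheap filter
        overlap = sum(1 for tok in m.context if tok.lower() in query)
        if overlap > 0:
            scores[m.text] += overlap  # aggregate evidence per entity
    return scores.most_common()


# Usage: find PERSON entities matching the need "Dutch football coach".
mentions = [
    Mention("d1", "Louis van Gaal", "PERSON", ("coach", "of", "the", "Dutch", "team")),
    Mention("d2", "Louis van Gaal", "PERSON", ("football", "manager")),
    Mention("d3", "Rembrandt", "PERSON", ("Dutch", "painter")),
    Mention("d4", "Amsterdam", "LOCATION", ("Dutch", "capital")),
]
ranked = refine(mentions, "PERSON", ["Dutch", "football", "coach"])
print(ranked)
```

The coarse index-time tag is retained as a cheap pre-filter, so the more expensive query-specific step only runs on a subset of the mentions. This reflects the run-time efficiency concern raised above.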
Entity ranking on speech sources, where the most representative features for entity detection are likely to be out-of-vocabulary words.
Description of work
- Task 6.1: incrementally trained task-dependent taggers (Textkernel);
- Task 6.2: entity refinement (CWI);
- Task 6.3: entity ranking on speech sources; apply entity refinement to SHoUT automatic speech recognition (CWI, with WP8);
- Task 6.4: dynamic aspect extraction;
- Task 6.5: evaluation; participation in TREC web entity ranking (CWI, Teezir);
- Task 6.6: scalability (CWI, with P20).