To locate and delimit spoken entity references (named entities in a broad sense, including for example quotes) in very large quantities of unstructured audiovisual content. To accumulate structural, semantic and supra-segmental information that can be used to identify spoken entities in context. To optimize speech and audio tools for the task characteristics (focus on entities, large data volumes). To identify strategies for exploring and cross-linking spoken entities in audiovisual content.
Locating and delimiting spoken entities in context is challenging. Being able to identify these important information markers in AV content allows fine-grained, faceted access, which in turn significantly improves the exploitability of rich AV content. In very large and expanding data sets, named-entity occurrences are hard to predict and tend to be out-of-vocabulary. As named entities often have a foreign or historic origin, their pronunciation may diverge from the pronunciation predicted by pronunciation models for the target language. Connecting progress in spoken term detection with trends in Information Retrieval (e.g., Entity Ranking) is an important next step. By exploiting the various types of information available in the audio track, spoken entity location could go beyond pinpointing words or word groups, allowing a broader interpretation of entities that includes speakers, quotes or affect categories.
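To make the out-of-vocabulary problem concrete, the following is a minimal sketch (with a hypothetical vocabulary and entity list, not project data) of how entity names drawn from metadata could be checked against an ASR system's recognition vocabulary, flagging the words a recognizer could never output verbatim:

```python
def find_oov_entities(entities, vocabulary):
    """Return (entity, missing_words) pairs for entity names containing
    words that are absent from the ASR recognition vocabulary."""
    vocab = {w.lower() for w in vocabulary}
    oov = []
    for entity in entities:
        missing = [w for w in entity.split() if w.lower() not in vocab]
        if missing:
            oov.append((entity, missing))
    return oov

# Hypothetical example: a tiny recognition vocabulary and some entity names.
vocabulary = ["the", "minister", "visited", "amsterdam", "and", "sound"]
entities = ["Amsterdam", "Pim Fortuyn", "Sound and Vision"]
for entity, missing in find_oov_entities(entities, vocabulary):
    print(entity, "-> OOV words:", missing)
```

Entities flagged this way would need OOV recovery strategies (e.g., subword search or vocabulary expansion) rather than plain 1-best transcript search.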
Search in multimedia archives
Description of work
In collaboration with Sound and Vision and PCM, a demonstrator will be developed that shows the added value of multimedia entity exploitation on the basis of large amounts of textual news and AV data (radio and television).
- Task 8.1 Baseline audio and speech processing development;
- Task 8.2 Entity occurrence prediction from metadata and context;
- Task 8.3 Combined ASR Search modeling (1-best, term spotting, lattices, hybrid approaches, OOV recovery);
- Task 8.4 Entities in context;
- Task 8.5 Integration.
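To illustrate the combined ASR search modeling in Task 8.3, the sketch below contrasts 1-best search with term spotting over a word lattice, using a simplified, hypothetical arc representation (start time, end time, word, posterior probability); real lattices and scoring would be considerably richer:

```python
from collections import namedtuple

# Simplified lattice arc: a word hypothesis with a time span and posterior.
Arc = namedtuple("Arc", "start_time end_time word posterior")

def spot_term(lattice, term, threshold=0.3):
    """Term spotting over lattice arcs: return time-stamped hits whose
    posterior exceeds a threshold, including hypotheses that lost to a
    competing word on the 1-best path."""
    term = term.lower()
    return [a for a in lattice
            if a.word.lower() == term and a.posterior >= threshold]

# Hypothetical lattice fragment with two competing hypotheses at 1.1-1.8 s.
lattice = [
    Arc(0.0, 0.4, "the", 0.9),
    Arc(0.4, 1.1, "minister", 0.8),
    Arc(1.1, 1.8, "Amsterdam", 0.45),  # not on the 1-best path...
    Arc(1.1, 1.8, "Rotterdam", 0.50),  # ...which prefers this word
]
hits = spot_term(lattice, "amsterdam")
print(hits)
```

A 1-best search of this fragment would return "Rotterdam" and miss the query entirely; searching the lattice recovers the "Amsterdam" hypothesis, which is the motivation for the lattice and hybrid approaches listed above.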