VSO: Video Search Optimization Using Multimodal Cues

SEALINCMedia (Socially-enriched access to linked cultural media)

Objectives: to develop robust and efficient techniques that optimize the retrieval of multimedia content items in socially-enriched archives based on multimodal information resources.

Optimization is approached by seeking to understand the user’s intent, i.e., the user’s specific purpose or motivation. The foundation of the work package is the creation of intent models. We will optimize each of the three phases of the search workflow (query, indexing, retrieval functions) and integrate improvements that exploit social and multimodal information in an intent-informed manner.

Recently reported initial results on optimizing the retrieval of multimedia content items using multimodal cues show great potential to considerably improve retrieval performance. Beyond a limited number of initial experiments, however, not much is known about the true potential of this methodology. There is surprisingly little information about

  1. the actual intent behind search and its relationship to optimizing the retrieval of multimedia content items, and
  2. the potential of social network information in this context.

Intent models encode the categories of intent with which users approach video search. We will develop models for capturing intent, together with a set of relevance criteria related to these categories. The criteria encode what it means to a user, given a particular intent, for a certain item to be relevant. We will develop query refinement and expansion methods that are specific to the predicted intent and optimize the query.
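Intent-specific query expansion could be sketched as follows. This is a minimal illustration only: the intent categories ("learn", "experience", "locate") and the expansion vocabularies are hypothetical assumptions, not the project's actual models.

```python
# Hypothetical sketch of intent-specific query expansion.
# Intent categories and expansion terms below are illustrative
# assumptions, not the project's actual intent model.

INTENT_EXPANSIONS = {
    "learn":      ["tutorial", "explained", "lecture"],
    "experience": ["full", "live", "HD"],
    "locate":     ["original", "official", "source"],
}

def expand_query(query: str, predicted_intent: str) -> str:
    """Append terms tied to the predicted intent to the user's query."""
    extra = INTENT_EXPANSIONS.get(predicted_intent, [])
    return " ".join([query] + extra)
```

A query such as "guitar solo" with predicted intent "learn" would thus be expanded with tutorial-oriented terms before retrieval.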

Optimizing indexing will involve developing a new set of indexing features that takes into account the possible types of intent as well as information present in the social community, making it possible to respond to user intent. We aim to revise the set of visual concepts as well as the kinds of temporal patterns that are extracted from the video and added to the index.
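One way to picture such an index entry is a record grouping visual concepts, temporal patterns, and social-community signals per video. The field names and structure here are assumptions for illustration, not the work package's actual index schema.

```python
# Illustrative sketch of an intent-aware index entry that combines
# visual concepts, temporal patterns, and social-community signals.
# Field names are assumptions, not the project's actual schema.

def build_index_entry(video_id, visual_concepts, temporal_patterns, social_tags):
    """Merge multimodal features for one video into a single index record."""
    return {
        "id": video_id,
        "visual": sorted(set(visual_concepts)),    # deduplicated concept labels
        "temporal": list(temporal_patterns),       # e.g. detected shot/motion patterns
        "social": sorted(set(social_tags)),        # tags from the surrounding community
    }
```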

Optimizing retrieval functions will involve developing “early fusion” strategies (merge indexing features from the various sources first, then generate a single results list) and “late fusion” strategies (generate multiple results lists, then exploit the various information sources to select, merge, or refine them via re-ranking). Besides user intent, these information sources include audio and visual information as well as information derived from the social community in which the video is embedded.
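The two fusion strategies can be contrasted in a short sketch. The scoring function and the reciprocal-rank weighting in the late-fusion combiner are illustrative assumptions, not the methods the work package will ultimately use.

```python
# Minimal sketch contrasting early and late fusion.
# The scoring function and weights are illustrative assumptions.

def early_fusion(feature_sets, score):
    """Merge features from all sources into one representation, then rank once."""
    merged = {k: v for feats in feature_sets for k, v in feats.items()}
    return score(merged)

def late_fusion(ranked_lists, weights):
    """Combine per-source result lists by weighted reciprocal-rank re-ranking."""
    scores = {}
    for ranking, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking):
            # A document high in a heavily weighted list contributes more.
            scores[doc] = scores.get(doc, 0.0) + w / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

In late fusion, for example, a visual ranking weighted 2.0 would dominate a social ranking weighted 1.0 when the two disagree on the top result.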

WP Leader: