To develop trainable, adaptable Dutch-language information extraction technology for named entity recognition, event detection, and time identification, as key components of the project's Semantic Search target. The technology has a broad-coverage “default” mode and retrains dynamically for new domains when confronted with new (clusters of) news or user-generated data.
So far, most information extraction technologies have been developed with broad coverage in mind. However, in dynamic domains such as news, new texts describe many new events and introduce a relatively large number of new entities. Adapting to this high degree of novelty is therefore a prerequisite for successfully deploying information extraction technology. Adaptation in turn requires integration with active time management (including the detection of time expressions) and with dynamic semantic networks of world knowledge such as Wikipedia.
Application: Knowledge enrichment (in news and user-generated data)
- Task 1.1: dynamic entity recognition (UvT, ANP).
- Task 1.2: dynamic time expression detection and normalization (UvT, ANP).
- Task 1.3: dynamic event detection (UvT).
- Task 1.4: integration (UvT, ANP).