Objective: The goal of this project is to extend the Krimp data mining algorithms to efficiently mine patterns in large Time-Trail Vault data warehouses, a task also known as trajectory mining.
The key subjective criteria are geographic cohesion and semantic meaningfulness (as denoted by the attributes); for example, patterns that describe geo-temporally unrelated events are deemed unimportant. Mining constraints are used to enforce these subjective, application-specific criteria. The objective criteria, based on the Minimum Description Length (MDL) principle, aim to ensure that the data analyst is confronted only with statistically significant results.
Krimp has a proven track record of selecting the statistically meaningful item sets from the large set of frequent item sets. Moreover, the resulting Code Table is known to provide a very precise characterization of the underlying data distribution.
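
To make the role of the Code Table concrete, the following is a minimal sketch of how a Krimp-style code table can be used to encode a transaction database and how the encoded size is measured. It assumes the code table is already given in cover order and contains all singleton item sets. It computes only the size of the data given the code table and leaves out the size of the code table itself, which the full MDL score also counts. The function names `cover` and `encoded_size` and the toy data are illustrative assumptions, not part of any existing Krimp implementation.

```python
from collections import Counter
from math import log2


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping item sets.

    `code_table` is assumed to be a list of frozensets, already sorted in
    cover order, that contains every singleton item set.
    """
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    if remaining:
        raise ValueError(f"items not covered by the code table: {remaining}")
    return used


def encoded_size(database, code_table):
    """Size in bits of the database encoded with the code table.

    Each item set gets a code of length -log2(usage / total usage), so
    frequently used item sets receive short codes.
    """
    covers = [cover(t, code_table) for t in database]
    usage = Counter(itemset for c in covers for itemset in c)
    total = sum(usage.values())
    code_length = {s: -log2(n / total) for s, n in usage.items()}
    return sum(code_length[s] for c in covers for s in c)


# Hypothetical toy data: four transactions over the items a, b, c.
database = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
singletons = [frozenset("a"), frozenset("b"), frozenset("c")]
with_pattern = [frozenset("ab")] + singletons

print(encoded_size(database, singletons))
print(encoded_size(database, with_pattern))
```

In this toy example, adding the item set {a, b} to the code table shortens the encoding, which is the signal Krimp uses to decide that an item set is worth keeping.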
The three main challenges of this project are:
- Krimp has to be generalized to spatiotemporal data, which includes the non-trivial generalization to real-valued data;
- user-defined constraints have to be integrated with the MDL-based framework, so that together they provide the technical means to derive meaningful patterns; and
- the efficiency and scalability of the developed algorithms are to be shown on seismic data provided by KNMI and managed by MonetDB (WP-2, WP-4, WP-7, WP-8).
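
As an illustration of what a user-defined constraint on geographic cohesion might look like, the sketch below accepts a candidate pattern only if the events supporting it lie close together in space and time. Everything here is an assumption made for illustration: the `Event` record, the use of planar coordinates in kilometres, the bounding-box diagonal as the measure of spatial extent, and the threshold parameters. The project itself does not prescribe this form of constraint.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Event:
    """Hypothetical spatiotemporal event with planar coordinates."""
    x_km: float
    y_km: float
    t_hours: float


def is_cohesive(events, max_extent_km, max_span_hours):
    """Return True if the events are close together in space and time.

    Spatial extent is measured as the diagonal of the bounding box of the
    events; temporal span is the gap between the first and last event.
    """
    if not events:
        return False
    xs = [e.x_km for e in events]
    ys = [e.y_km for e in events]
    ts = [e.t_hours for e in events]
    extent = hypot(max(xs) - min(xs), max(ys) - min(ys))
    span = max(ts) - min(ts)
    return extent <= max_extent_km and span <= max_span_hours


# Two nearby events pass the constraint; adding a distant one fails it.
nearby = [Event(0.0, 0.0, 0.0), Event(3.0, 4.0, 1.0)]
scattered = nearby + [Event(300.0, 400.0, 1.5)]
print(is_cohesive(nearby, max_extent_km=10.0, max_span_hours=2.0))
print(is_cohesive(scattered, max_extent_km=10.0, max_span_hours=2.0))
```

A predicate of this kind could be applied to candidate patterns before they are considered for the code table, so that the MDL-based selection only ranks patterns that already satisfy the application criteria.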
The relevance of the discovered patterns is, of course, a driving force in the algorithm development alongside the technical means; joint research with WP-2, WP-3 & WP-4 is therefore integral to this project. That is, the algorithms will be tested and demonstrated on data sets provided by TomTom and integrated by WP-2 & WP-4 in the Time-Trail Vault.