The ICT challenge of e-Biobanking is to manage a process of scientific discovery that increasingly relies on the analysis of large volumes of highly complex, heterogeneous data. Modern technology allows experiments to produce ever larger volumes of scientific data. In contrast to conventional databases, the challenge lies not only in the number of records but also in the complex combination of different data types (e.g. images, measurements and results, survey data and sensor data) from disparate sources. This information explosion creates a need for better analysis and interpretation of data in order to make accurate predictions from it.
Having large amounts of data has both advantages and drawbacks. On the positive side, massive data may amplify the inferential power of algorithms that have been shown to be successful on modest-sized data sets; the challenge is to develop the theoretical principles needed to scale inference and learning algorithms to a massive scale. On the negative side, massive data may amplify the errors that are inherent in any inferential algorithm, producing false positives: relationships that seem causal but are in fact coincidental. This noise in the data may become overwhelming, obscuring the structures of interest. Several challenges follow. Challenges in data management: How can the vast amounts of data be managed efficiently and securely? How should high-dimensional, small-sample-size data be handled? How can the heterogeneity of the data be exploited in the analysis? Challenges in discovery: How can cause-effect relations be learned robustly from complex data? Challenges in accessibility: How can the data analysis tools be made usable for scientists?
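The false-positive risk in high-dimensional, small-sample-size data can be illustrated with a small simulation. The following is a minimal sketch, not part of the project's own tooling: the function names, the sample size of 20, and the correlation threshold of 0.5 are all assumptions chosen for illustration. It draws an outcome and a set of features that are independent by construction, then counts how many features nonetheless appear strongly correlated with the outcome.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_samples, n_features, threshold=0.5, seed=0):
    """Count pure-noise features whose |correlation| with a pure-noise
    outcome exceeds `threshold` (illustrative value, an assumption)."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        # Each feature is drawn independently of the outcome,
        # so any strong correlation found here is coincidental.
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for n_features in (10, 100, 1000, 10000):
        hits = count_spurious(n_samples=20, n_features=n_features)
        print(f"{n_features:>6} noise features -> {hits} spurious hits")
```

With the sample size held fixed, the number of coincidental "relationships" grows roughly in proportion to the number of features examined, which is why larger and more heterogeneous data sets call for more careful inference rather than less.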
Advanced computational methods that combine statistics with optimization are essential tools for interpreting these data. We view building the tools needed to support the process of scientific exploration as the core target of our research in the COMMIT project.
The inspiration comes from three main applications in which ICT plays a key role: (1) the detection of cancer in human tissue images, namely breast cancer and sarcoma; (2) the generation of knowledge from large human cohort studies, e.g. emotions from movies, heart disease, genome alterations and ageing processes; and (3) bridging the gap between medical users and advanced ICT resources, e.g. user front-ends for biomedical research and biomedical experiments on distributed infrastructures.