What do climate modeling, astrophysics, and medicine have in common? Big Data. Big Data requires ‘big’ computers: too big to be a single machine, and instead built from many computers tied together. Managing all this data and making sense of it puts huge demands on the performance, scalability, and energy efficiency of the infrastructure. First, these high capacity demands are in many cases addressed by adding one or more types of accelerators, such as Graphics Processing Units (GPUs), to speed up calculations, leading to much more heterogeneous systems. Second, virtualization of compute, storage, and network resources increases the flexibility to structure and arrange these resources, an approach known as cloud computing. Virtualization techniques are well known in scientific computing, but applying cloud computing to scientific computing still poses significant challenges. Third, in recent years industry has found ways to adopt and commercialize cloud computing, which leads to different usage models for computer resources. Tools for scientific computing will have to take into account that computer resources will be spread over a multitude of different parties, computing environments, and interfaces.
The ICT challenge of this project is to ease the management of highly complex scientific computing infrastructures by effectively shielding the user from their low-level complexity. The project will investigate how to design a programmable e-Science architecture, describing the infrastructure components and optimizing them for typical usage scenarios. It will also investigate efficient methods to program data-intensive applications on heterogeneous systems and to build workflow-based collaborative problem-solving environments. By solving these challenging ICT problems around resource usage and optimization, this project will enable much easier access to e-Infrastructures, despite their growing complexity.
We primarily target our research at the scientific community. We work together with researchers from climate modeling, astrophysics, and medicine, who have given us much valuable feedback; this collaboration helps scientists bridge the gap to demanding e-Science applications.
The joint research with P23 on distributed reasoning resulted in a cum laude PhD thesis and a nationally recognized NWO-VENI award for Jacopo Urbani. The research on climate modelling resulted in an Enlighten Your Research Global award for an international team led by Prof. Henk Dijkstra (Utrecht University) and Dr. Frank Seinstra (Netherlands eScience Center). Another highlight is our two publications in the main conference of SC’13, on Exploring Portfolio Scheduling for Long-term Execution of Scientific Workloads in IaaS Clouds and on Scalable Virtual Machine Deployment Using VM Image Caches. Furthermore, the project leader received the Euro-Par Achievement Award in appreciation of his outstanding and sustained contributions to parallel processing in the Netherlands and beyond, including his research on parallel programming environments and his work on the DAS infrastructure.
Biggest results so far
BTWorld: A Large-scale Experiment in Time-Based Analytics
These days, large amounts of data are collected about the operation of many important systems, such as traffic systems and the financial system. Extracting meaningful information from this data is very challenging: big data must be processed in time and without error. At TU Delft, for the last four years, we have been collecting data about BitTorrent, a system used by hundreds of millions of people worldwide for sharing videos and other files. For example, musicians use it to distribute their work and software developers to distribute open source software.
ICT science question: despite a large number of empirical and theoretical studies, observing the state of global information networks remains a grand challenge. The main question we set out to answer was how to reliably analyze large-scale, time-based datasets through different types of queries.
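As a minimal illustration of the kind of time-based query involved (the actual BTWorld dataset and schema are not shown here, so the record layout and field names below are hypothetical), this Python sketch buckets timestamped tracker samples into hourly intervals and counts the distinct swarms active in each:

```python
from collections import defaultdict

# Hypothetical samples: (timestamp_seconds, tracker, swarm, seeders, leechers)
samples = [
    (0,    "t1", "s1", 10, 5),
    (0,    "t1", "s2",  3, 7),
    (3600, "t1", "s1", 12, 4),
    (3600, "t2", "s3",  1, 1),
]

def active_swarms_per_hour(samples):
    """Time-based query: count distinct active swarms per hourly bucket."""
    buckets = defaultdict(set)
    for ts, tracker, swarm, seeders, leechers in samples:
        buckets[ts // 3600].add((tracker, swarm))
    return {hour: len(swarms) for hour, swarms in sorted(buckets.items())}

print(active_swarms_per_hour(samples))  # → {0: 2, 1: 2}
```

In the real system such per-interval aggregations run as MapReduce-style jobs over terabytes of observations rather than an in-memory list, but the grouping-by-time-bucket structure of the query is the same.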
Involved COMMIT/partners: TU Delft.
Predicting the earth’s climate with Graphics Processing Units
In order to predict the earth’s climate, we need to understand the interaction between the atmosphere (air) and the oceans (water). Only at a resolution smaller than two kilometres are essential physical phenomena, such as ocean eddies, resolved in the ocean models. We develop ways in which climate modellers can use the enormous computing power that they need for high-resolution, long-running simulations. As high-resolution climate models require great computational power, we use Graphics Processing Units (GPUs) to perform the computations.
ICT science question: how to optimize data transfers between hosts and GPUs? Real programs contain dozens of kernels, i.e. small programs that run on the GPU and each perform part of the computation. The computation time of these individual kernels can often be optimized and reduced to virtually zero. At that point the data transfers to and between all these GPU kernels become the next bottleneck. The problem is that there are many different mechanisms for these transfers, and the best mechanism depends on the details of the algorithm. To solve this problem, we have developed a generic performance model that greatly helps in deciding which mechanism is optimal, thus avoiding the need to implement and measure all alternatives.
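To give a flavour of such a performance model (this is a simplified sketch with made-up parameters, not the project's actual model), the following Python fragment predicts the pipeline time of synchronous copies, where each kernel waits for its input transfer, against asynchronous copies that overlap with kernel execution, and picks the cheaper mechanism without running either on real hardware:

```python
# Hypothetical parameters: a 6 GB/s PCIe link with 10 us latency,
# and 20 kernels each computing for 1 ms on a 4 MB input block.
PCIE_BANDWIDTH = 6e9   # bytes/s
PCIE_LATENCY = 10e-6   # s
N_KERNELS = 20
T_KERNEL = 1e-3        # s per kernel
BLOCK_BYTES = 4e6      # input size per kernel

def t_transfer(nbytes):
    """Predicted time of one host-to-device copy: latency + size/bandwidth."""
    return PCIE_LATENCY + nbytes / PCIE_BANDWIDTH

def t_sequential(n, t_kernel, t_copy):
    """Synchronous copies: every kernel waits for its own transfer."""
    return n * (t_copy + t_kernel)

def t_overlapped(n, t_kernel, t_copy):
    """Asynchronous copies in streams: after the first transfer,
    copy and compute overlap, so the slower stage dominates."""
    return t_copy + (n - 1) * max(t_kernel, t_copy) + t_kernel

t_copy = t_transfer(BLOCK_BYTES)
times = {
    "sequential": t_sequential(N_KERNELS, T_KERNEL, t_copy),
    "overlapped": t_overlapped(N_KERNELS, T_KERNEL, t_copy),
}
best = min(times, key=times.get)
print(best, times)  # for these parameters, "overlapped" wins
```

The value of such a model is exactly the point made above: by plugging in measured bandwidth, latency, and kernel times, the modeller can rank the transfer mechanisms analytically instead of implementing and benchmarking every alternative.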
Involved COMMIT/partners: IMAU, eScience Center, VU Amsterdam.