Big Data Analysis Algorithms

Domain algorithm development and engineering  

In many of the domains, efficient algorithms will be essential to achieving the required performance. To a large extent, algorithms work in the domains will therefore focus on algorithm engineering, that is, on applying efficient algorithmic techniques in real applications. For example, consortium researchers have worked extensively on algorithms for efficient movement of data over the internet, including algorithms for reliable transfer of data over unreliable connections using techniques such as loss-resilient encoding and distributed computing. Consortium researchers have also worked extensively on so-called I/O-efficient algorithms, which are designed to minimize transfers of blocks of data between fast main memory and slow external storage such as disk. Since I/O is often the bottleneck when handling Big Data that does not fit in main memory, I/O-efficient algorithms can yield tremendous speedups.
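
As an illustration of the I/O-efficient approach, the following is a minimal sketch of external merge sort, a standard technique of this kind. It is not taken from the consortium's own work. The function name external_sort and the parameter run_size are hypothetical, and the sketch assumes that the number of sorted runs is small enough for all of them to be merged in a single pass.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Yield integers from `values` in sorted order, holding at most
    `run_size` of them in memory at once (hypothetical illustration)."""
    run_paths = []

    def flush(run):
        # Sort one memory-sized run and write it to disk sequentially.
        run.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)

    # Phase 1: cut the input into sorted runs stored on disk.
    run = []
    for v in values:
        run.append(v)
        if len(run) >= run_size:
            flush(run)
            run = []
    if run:
        flush(run)

    # Phase 2: merge all runs in one pass, reading each file sequentially.
    # Assumption: few enough runs that all files can be open together.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Both phases read and write data sequentially, so each access to slow storage can move a whole block of useful items rather than a single item at a random position.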

The domain work will also require the development of new I/O-efficient algorithms, and new algorithmic challenges will undoubtedly emerge during the partnership. For example, the processing of truly big terrain data in the societal data domain will rely on many recent results on I/O-efficient terrain data processing, but it will also require new I/O-efficient algorithms, such as algorithms for determining which parts of an analysis result (such as a flood risk estimate) need to be updated when the underlying terrain data is updated.
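
To make the update problem concrete, the following is a small in-memory sketch under strong simplifying assumptions of our own: each terrain cell drains to at most one neighboring cell, and a change at a cell can only alter results at that cell and at cells downstream of it. The function affected_cells and the downstream mapping are hypothetical, and the sketch is not itself I/O-efficient. It only shows which cells would need recomputation.

```python
def affected_cells(downstream, changed):
    """Return the changed cells plus every cell downstream of them.

    `downstream` maps each cell to the cell it drains into, or to None
    for an outlet (hypothetical single-flow-direction model).
    """
    affected = set()
    for cell in changed:
        # Follow the drainage path; stop at an outlet or at a cell that
        # was already marked (which also guards against cycles).
        while cell is not None and cell not in affected:
            affected.add(cell)
            cell = downstream.get(cell)
    return affected


# Cells e -> d -> c and a -> b -> c, with c as the outlet.
flow = {"a": "b", "b": "c", "c": None, "d": "c", "e": "d"}
to_update = affected_cells(flow, ["d"])
```

In this example only d and c need to be recomputed, while a, b, and e keep their previous results.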

Integrating algorithmic techniques with other core areas 

Several challenges in the domains will require integrating algorithmic results with techniques from the other core research areas of machine learning and visual analytics. For example, in the societal data domain we will consider how to efficiently produce a hydrologic correction layer, that is, a layer containing features (such as bridges and culverts) that need to be added to or removed from a terrain model in order to facilitate realistic water flow simulation.

Currently, producing such a layer is a highly manual process that often requires local knowledge about water flow. Since missing corrections often lead to unnatural local flow patterns, and since many corrections have already been produced, we believe that a combination of efficient algorithms for flow computation, learning methods for finding unnatural flow patterns, and visual analytics to guide the process will allow us to significantly improve the production of hydrologic corrections.
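
As a rough illustration of what an unnatural local flow pattern can look like in the data, the sketch below flags interior cells of an elevation grid that have no lower neighbor, so that water reaching them cannot flow onward. This simple rule is a stand-in of our own for the learning methods mentioned above, not the proposed method, and the function name interior_sinks and the grid layout are hypothetical.

```python
def interior_sinks(elev):
    """Return (row, col) of interior cells with no strictly lower
    8-neighbor in the elevation grid `elev` (a list of equal-length rows)."""
    rows, cols = len(elev), len(elev[0])
    sinks = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbors = [
                elev[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            # Water cannot leave a cell whose neighbors are all at least as high.
            if min(neighbors) >= elev[r][c]:
                sinks.append((r, c))
    return sinks


# A small terrain with a pit at (1, 1) and an outlet in the corner.
terrain = [
    [9, 9, 9, 9],
    [9, 1, 5, 9],
    [9, 5, 6, 9],
    [9, 9, 9, 0],
]
pits = interior_sinks(terrain)
```

A pit such as the one at (1, 1) can arise, for instance, where an embankment blocks flow that in reality passes through a culvert. Cells flagged this way would be candidates for inspection rather than confirmed corrections.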


Our main hypothesis is that algorithmic techniques and algorithm engineering can help achieve the efficiency required for effective Big Data analysis tools, both in the other core areas and in the domains. Research questions will therefore center on engineering algorithmic techniques in the domains and on integrating them with techniques from the other core areas.