
Annual Report 2014

Enterprise Platform and Integration Concepts

Research Group of Prof. Dr. Hasso Plattner

Hasso Plattner Institute
August-Bebel-Str. 88

14482 Potsdam

http://epic.hpi.de

Annual Report | 2014

Enterprise Platform and Integration Concepts | Hasso Plattner Institute

Contact

Dr. Matthias Uflacker
Hasso Plattner Institute
August-Bebel-Str. 88
14482 Potsdam, Germany
Tel.: +49 (331) 5509-566
E-Mail: [email protected]



Table of Contents

1. OUR TEAM

2. RESEARCH AREAS AND SELECTED PROJECTS
   2.1 IN-MEMORY DATA MANAGEMENT FOR ENTERPRISE SYSTEMS
   2.2 TOOLS AND METHODS FOR ENTERPRISE SYSTEMS DESIGN AND ENGINEERING
   2.3 IN-MEMORY DATA MANAGEMENT FOR LIFE SCIENCES

3. SUPERVISED MASTER'S THESES

4. COMPLETED PH.D. DISSERTATIONS

5. PUBLICATIONS
   5.1 BOOKS
   5.2 BOOK CHAPTERS
   5.3 JOURNAL ARTICLES
   5.4 CONFERENCE CHAPTERS
   5.5 WORKSHOP ARTICLES

6. TEACHING
   6.1 SUMMER TERM 2014
   6.2 WINTER TERM 2014/2015
   6.3 ME310 GLOBAL TEAM-BASED PRODUCT INNOVATION & ENGINEERING
   6.4 OPENHPI

7. EVENTS, SPEECHES, AND PRESENTATIONS

8. INDUSTRY PARTNERSHIPS

9. ACADEMIC PARTNERSHIPS


1. OUR TEAM

Chair
Prof. Dr. h.c. mult. Hasso Plattner

Chair Representative
Dr. Matthias Uflacker

Assistant
Andrea Lange

PostDoc Researchers

Dr. Mariana Neves: Natural Language Processing in In-Memory Databases
Dr. Matthieu-P. Schapranow: In-Memory Data Management for Life Science Applications

Research Assistants

Martin Boissier: Data Tiering and Access-Based Data Partitioning
Lars Butzmann: Business Simulations using In-Memory Databases
Martin Faust: Indices for In-Memory Column Stores
Cindy Fähnrich: Application of In-Memory Database Technology to Population-Specific Genome Data Analysis
Franziska Häger: Design Thinking in Software Development Processes
Stefan Klauck: Generic What-If Analyses Using In-Memory Column Stores
Thomas Kowark: Query-Level Replication of Software Repository Analyses
Martin Lorenz: The Impact of Column-Orientation on the Quality of Class Inheritance Mapping
Carsten Meyer: Dynamic Data Tiering for Mixed-Workload In-Memory Databases
Stephan Müller: Aggregates Caching for Enterprise Applications
Keven Richly: Geo-Spatial Analyses on In-Memory Column Stores
David Schwalb: Leveraging Non-Volatile Memory Technologies for In-Memory Column Stores
Christian Schwarz: Predictive Analytics on In-Memory Databases
Ralf Teusner: Teaching Software Development in Massive Open Online Courses
Arian Treffer: Omniscient Debugging in Database Applications

Student Assistants

Aechtner, Sten; Berning, Tim; Brauer, Janos; Franke, Alexander; Hopstock, Michael; Horschig, Siegfried; Jankrift, Marcel; Klauck, Stefan; Kotschenreuther, Leo; Matthies, Christoph; Rehbein, Cornelia; Bärtig, Andrea; Bock, Cornelius; Enderlein, Jonas; Frick, Jakob; Höroldt, Carolin; Ihrke, Sebastian; Kastius, Alexander; Kohlhagen, Marco; Lehmann, Sven; Marten, Jannik; Reißaus, Benjamin; Benson, Lawrence; Bothe, Max; Flemming, Pedro; Hesse, Hubert; Horschig, Friedrich; Jacoby, Janusch; Keller, Marvin; Koßmann, Jan; Liedke, Franz; Nack, Tobias; Ruhrländer, Rui Paulo; Schmidt, Christopher; Wolff, Felix; Matysik, Jan-Tobias; Schulze, Alexander; Dreseler, Markus; Schumann, David; Wacke, Markus; Illi, Cornelius



2. RESEARCH AREAS AND SELECTED PROJECTS

Our research activities focus on the principles of in-memory data management for enterprise systems and the integration of different software systems to meet customer requirements. This involves studying the conceptual and technological aspects of systems for data management and business process support. In customer-centered business software development, the focus is on the users. Developing solutions tailored to user needs in a timely manner requires well-designed tools and methods for enterprise system design and engineering. We apply our findings to real-world scenarios and showcase future enterprise applications by developing and evaluating functional prototypes in close collaboration with our industry partners. One particular domain of interest is the application of in-memory data management in life sciences and eHealth systems.

2.1 In-Memory Data Management for Enterprise Systems

The traditional market division into online transaction processing (OLTP) and online analytical processing (OLAP) has been justified by the different workloads of the two kinds of systems. While OLTP workloads are characterized by a mix of reads and writes of a few rows at a time, OLAP applications are characterized by complex read queries with joins and large sequential scans spanning few columns but many rows of the database. These two workloads are typically addressed by separate systems: transaction processing systems on the one hand, and business intelligence or data warehousing systems on the other. Our research investigates the reunification of enterprise architectures, uniting transactional and analytical systems to significantly reduce application complexity and data redundancy, simplify IT landscapes, and enable real-time reporting on transactional data. The following figure outlines our proposed system architecture.

It is a common belief that a columnar data layout is not well suited for transactional processing and should mainly be used for analytical processing. We postulate that this is not the case: column-based system architectures can even be superior for transactional business processing if a data layout without transaction-maintained aggregates is chosen. By dropping all transaction-maintained aggregates, indices, and other redundant data structures, we can significantly simplify data entry transactions, since there is no need to update additional summarization tables or secondary indices. Consequently, the only steps necessary to book a vendor invoice are the inserts of the accounting document header and its line items, as depicted in the following graphic.

Although a single insert into a column store generally takes longer, the reduced complexity eliminates most of the work during data entry and yields significant performance advantages on the simplified data schema. As a result, data entry actually becomes faster on an in-memory column store. We measured the runtime of both transactions in a production setting and found the simplified data entry transaction on an in-memory column store to be 2.5 times faster than the classic data entry transaction on a disk-based row store.
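To make the difference between the two booking transactions concrete, here is a minimal Python sketch in which in-process lists stand in for tables. The class names, field names, and accounts are hypothetical; the sketch only contrasts the work on the write path (inserts only versus inserts plus aggregate updates) and does not model column-store internals or reproduce the measured speedup.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InsertOnlyLedger:
    """Simplified schema: only document headers and line items are stored."""
    headers: list = field(default_factory=list)
    items: list = field(default_factory=list)

    def post_invoice(self, doc_id, vendor, lines):
        # Booking = one header insert plus one insert per line item.
        # No summary tables or secondary indices have to be maintained.
        self.headers.append({"doc_id": doc_id, "vendor": vendor})
        for account, amount in lines:
            self.items.append({"doc_id": doc_id, "account": account, "amount": amount})

    def balance(self, account):
        # Totals are aggregated on the fly from the line items
        # (in a column store: a scan over the account and amount columns).
        return sum(i["amount"] for i in self.items if i["account"] == account)


@dataclass
class AggregateMaintainingLedger(InsertOnlyLedger):
    """Classic schema: every booking also updates a redundant totals table."""
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def post_invoice(self, doc_id, vendor, lines):
        super().post_invoice(doc_id, vendor, lines)
        for account, amount in lines:
            self.totals[account] += amount  # one extra update per line item

    def balance(self, account):
        # Read the pre-computed aggregate instead of scanning line items.
        return self.totals[account]
```

Both ledgers report the same balances; the insert-only variant simply moves the aggregation from the write path to read time.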

In summary, our research investigates the impact of a redundancy-free, column-based architecture without transaction-maintained aggregates on the way enterprise applications are built. This includes a dramatic simplification of applications, a reduced data footprint, and advanced partitioning techniques based on classifying data as actual or historical.

Selected Publications in this Research Area in 2014

• Hasso Plattner, Martin Faust, Stephan Müller, David Schwalb, Matthias Uflacker, Johannes Wust: The Impact of Columnar In-Memory Databases on Enterprise Systems, VLDB, 2014

• Hasso Plattner: A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases, Second Edition, ISBN: 978-3-642-55269-4, 2014

• Stephan Müller, Lars Butzmann, Stefan Klauck, Hasso Plattner: Materialized View Maintenance Leveraging In-Memory Data Structures, International Journal On Advances in Software, vol. 7, no. 3 & 4, 2014

• Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Boese, Hasso Plattner: Parallel Join Executions in RAMCloud, CloudDB - In conjunction with ICDE 2014, 2014



Project Highlight – Bachelor Project 2013/14: Enterprise Workload Analysis for Hot and Cold Data Classification

In this project, students addressed the classification of data into hot and cold by analyzing production workload traces. The goal of the project was to gain a better understanding of data relevance in a realistic system. To this end, the ERP system of a large global company was traced for several days, resulting in 50 GB of raw query logs. A framework called EWA (Explorative Workload Analyzer) was implemented to visualize the massive workload traces in a meaningful way. With the help of this framework, we could not only interactively analyze queries and find access patterns in a production workload, but also execute a full replay on a copy of the production system. This replay allowed us to determine, for every single tuple in the database, when and how often it had been accessed. Ultimately, this enabled us to analyze which tuples were most relevant and to answer questions such as ‘Does the date of a tuple correlate with its relevance?’. An SAPUI5 front end was developed to allow easy exploration of workload characteristics based on the raw trace data, which consists of billions of tuples.
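The per-tuple analysis enabled by the replay can be illustrated with a small Python sketch. It assumes the replay yields (tuple_id, access_time) pairs; the hot/cold rule used here (a minimum access count combined with a recency window) and its thresholds are illustrative assumptions, not the criteria used in the EWA project.

```python
from collections import Counter
from datetime import datetime, timedelta


def classify_tuples(trace, now, min_accesses=2, recency=timedelta(days=1)):
    """Label each accessed tuple 'hot' or 'cold' from a replayed access trace.

    trace: iterable of (tuple_id, access_time) pairs.
    min_accesses and recency are hypothetical, illustrative thresholds.
    """
    counts = Counter()
    last_access = {}
    for tuple_id, ts in trace:
        counts[tuple_id] += 1
        # Keep the most recent access time seen for this tuple.
        last_access[tuple_id] = max(ts, last_access.get(tuple_id, ts))

    labels = {}
    for tuple_id, n in counts.items():
        recent = now - last_access[tuple_id] <= recency
        labels[tuple_id] = "hot" if n >= min_accesses and recent else "cold"
    return labels
```

Tuples that never appear in the trace receive no label at all and would naturally be treated as cold.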


Project Highlight – Master Project 2014/15: HANA Load Simulator

The HANA Load Simulator is a tool that generates a realistic enterprise workload of thousands of concurrent users and executes that workload on different database configurations simultaneously. A dashboard monitors several performance indicators of each database, including data footprint, transaction latencies, throughput, and overall CPU utilization. The dashboard can also be used to configure workload parameters such as OLTP and OLAP query frequencies or the ratio of queries on actual versus historical data. This provides a simple, interactive tool to assess key performance characteristics of different database setups (e.g., single-node vs. multi-node) side by side and in real time.




Using the Load Simulator, we compared (a) a single HANA node with (b) a multi-node setup consisting of a master node (actual data only), one replica of the master node for running OLAP transactions, and a cold node for historical data. Both setups have the same total number of cores and amount of main memory. The workload consists of three types of transactions: invoice postings, read-only transactions including OLTP queries, and OLAP transactions including read-heavy analytical queries. With the partitioning into actual and historical data and the replication of the actual data, we observed the following improvements (with 90% of OLAP transactions and 100% of OLTP transactions accessing only actual data, and 1% of queries being analytical):

Improved performance:

• Transactional processing improves even without the replica, due to the smaller data set. With the replica activated, the multi-node setup is faster by a factor of ~4 for mixed workloads.

• The more strongly the workload is skewed towards actual-only data, the more the partitioned system outperforms the traditional setup.

• When analytical users are added to the system, a replica of the actual master node significantly lowers the latency of OLTP transactions compared to the single-node HANA setup, due to better load distribution.

Reduced costs:

• Historical data can be purged and compressed more effectively, which reduces the main memory footprint.

• Overall system costs decrease because smaller servers can be deployed, avoiding the disproportionately high prices of large server systems.

The screen shows OLTP latency, OLAP latency, and CPU load for both setups with 10,000 transactional users and 500 analytical users. The single-node setup on the left shows higher OLTP latencies and significantly higher OLAP latencies, both violating the SLA (service level agreement) thresholds. The partitioned setup on the right shows significantly better latencies while using only half of its CPU resources.
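The division of labor in the partitioned setup can be summarized as a routing rule. The following Python sketch is a hypothetical illustration: the node names and the Query fields are assumptions, and the Load Simulator's actual dispatching logic is not described in this report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Query:
    kind: str                       # hypothetical: "write", "oltp_read", or "olap"
    needs_historical: bool = False  # does the query touch historical data?


def route(query: Query) -> List[str]:
    """Return the nodes that serve a query in the partitioned setup (sketch)."""
    if query.kind == "write":
        return ["master"]           # new postings only touch actual data
    # Analytical reads go to the replica to keep load off the master.
    primary = "replica" if query.kind == "olap" else "master"
    if query.needs_historical:
        return [primary, "cold"]    # combine actual and historical partitions
    return [primary]
```

Under the workload mix above, most queries never reach the cold node, which is one reason the partitioned setup benefits from skew towards actual data.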



2.1.1 SSICLOPS – Scalable and Secure Infrastructures for Cloud Operations

Starting in February 2015, our research group, together with the research group of Prof. Polze and a consortium of 12 academic and industry partners from 7 countries, will participate in a three-year collaboration project titled SSICLOPS – Scalable and Secure Infrastructures for Cloud Operations. The project is funded by the European Commission under the Horizon 2020 program. SSICLOPS focuses on techniques for managing federated private cloud infrastructures, in particular cloud networking techniques within software-defined data centers and across wide-area networks. SSICLOPS will empower enterprises to create and operate high-performance private cloud infrastructures that can scale flexibly through federation with other private clouds without compromising their service level and security requirements. SSICLOPS federation will support the efficient integration of clouds regardless of whether they are geographically collocated or spread out and whether they belong to the same or different administrative entities or jurisdictions. In all cases, SSICLOPS will enforce legal and security constraints and minimize overall resource consumption. In such a federation, individual enterprises will be able to dynamically scale their private cloud services in and out. This allows each federation member to maximize the utilization of its own infrastructure while minimizing its need for excess capacity. The project will design, implement, demonstrate, and evaluate three specific use cases: a cloud-based in-memory database, the analysis of physics experiment data, and the prototypical extension of network stacks for a telecom provider.

Partners and countries of the SSICLOPS consortium.



2.1.2 In-Memory Research Laboratory

The In-Memory Research Laboratory supports all research activities on main memory, multi-core, and coprocessor technology in our research group. The lab offers physical and virtual resources that form a solid foundation for experiments and teaching activities around in-memory databases and enterprise applications. We currently maintain a pool of 50 high-end servers of different generations and one petabyte of permanent storage. In 2014, our team made several improvements to the manageability and availability of these resources. We rolled out a solution for the automatic configuration, installation, and maintenance of physical and virtual servers. User accounts are now managed globally, reducing the time until a system is initially available to a minimum. Resources for a more flexible testing environment have been increased as well. In November 2014, we installed an SGI UV300 for HANA, a system with 240 enterprise-level CPU cores, 12 TB of main memory, and 50 TB of high-performance permanent storage. The machine was integrated into the existing landscape and ready to use within one day. It is used in several research projects related to NUMA, in-memory databases, and predictive applications.

In-‐Memory Research Laboratory: Bringing the SGI UV300 for HANA into service.

2.2 Tools and Methods for Enterprise Systems Design and Engineering

We consider the balance of technological, business, and human factors to be the driver of innovation. Software development often lacks the required emphasis on human values, even though a shift towards more user-centricity is noticeable in many companies. Therefore, we focus on the influence of the human element in software engineering. We observe, analyze, and understand how individuals, teams, and organizations work and how tools and processes can support them in creating better software with an improved user experience. We also consider in-memory technology to be a driver for more efficient application programming, intelligent tools, and user-friendly software, ultimately influencing the way we will design, develop, and operate business applications in the future.

2.2.1 Data- and Performance-Aware Development of Business Applications on SAP HANA

The Bachelor project “Modern Computer-aided Software Engineering” tested novel concepts to ease the creation and debugging of enterprise applications built on an in-memory database. Students built a prototype as an extension of a web-based development environment that integrates database logic and application code in a single view. While writing and modifying code, developers get immediate, detailed information about queries, such as query plans, estimated performance measures, and result set sizes. These estimations help to spot potential performance bottlenecks early in the development process and thus prevent cost-intensive changes later on. Using sampling and clustering approaches, the performance estimates are available within fractions of a second. A visual representation of the program flow makes it easier to understand the program and the effects of code changes. Early results and feedback from professional developers are promising and emphasize the need for more intelligent, reactive development environments.

2.2.2 Code Better, Run Faster: Tools for Performance-Driven Enterprise Application Development

In this Master project, we addressed the problem of writing application logic that performs efficiently on columnar main memory databases. The research was based on the open-source database Hyrise, which on the one hand allowed us to integrate new functionality easily and on the other hand offered access to key performance metrics. Leveraging expected result set sizes, the number of cache misses, core utilization, used CPU cycles, and total execution times, our improved Hyrise development environment supported developers in improving query logic and response times.
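Both prototypes surface expected result set sizes while the developer is still editing. Section 2.2.1 states that such estimates come from sampling and clustering; the Python sketch below shows only the sampling part in its simplest form, estimating a predicate's selectivity on a uniform random sample. The function name and parameters are hypothetical and not taken from either prototype.

```python
import random


def estimate_result_size(rows, predicate, sample_size=1000, seed=0):
    """Estimate how many rows satisfy `predicate` from a uniform random sample.

    rows must be a sequence; seed makes the estimate reproducible.
    """
    if not rows:
        return 0
    rng = random.Random(seed)
    k = min(sample_size, len(rows))
    sample = rng.sample(rows, k)              # sampling without replacement
    hits = sum(1 for row in sample if predicate(row))
    selectivity = hits / k                    # fraction of matching rows
    return round(selectivity * len(rows))     # scale up to the full table
```

The estimate is cheap because only `sample_size` rows are inspected; when the table is no larger than the sample, the result is exact.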
Custom operators can be programmed directly in the browser-based IDE using JavaScript, providing a flexible way to integrate custom operators into complex query execution plans. The team implemented intuitive visualizations of query result sizes, previews of the expected result sets, and breakdowns of total query runtimes to identify performance-intensive tasks. Based on a spatio-temporal dataset of multiple soccer matches, analytical operators were implemented to detect offside situations. The operators were optimized with the developed tools, enabling high-performance scans of the whole data set.

2.2.3 Object-Relational Mapping Strategies for New Enterprise Applications

Specialization/generalization relationships are a common pattern in enterprise system domain models. In object-oriented programming, such relationships can be expressed as inheritance between entities. Persisting entities of the domain model that are part of an inheritance relationship is not trivial. Research has proposed three different strategies to map inheritance structures to relational data models. What they have in common is an inherent trade-off between memory consumption and query performance. Depending on the actual characteristics of the inheritance structure at hand, each strategy has its strengths and weaknesses. Consequently, the combination of inheritance characteristics and the prioritization of non-functional requirements (memory consumption and query performance) determines which strategy to implement. Unfortunately, not all characteristics of the inheritance hierarchy can be defined in advance. In this ongoing research project, we investigate how column orientation, as a means to physically structure data in memory, influences the choice of the best mapping strategy for a given data model.

2.2.4 Omniscient Debugging in Database Applications

Omniscient debugging is an approach to improve the efficiency of debugging activities, thereby increasing overall developer productivity. While a regular debugger can only show the program state at the current point in time and only lets the developer move execution forward, an omniscient debugger (ODB) can immediately reproduce the state at any point in time and lets the developer move through the execution in either direction. In this research project, we aim to bring omniscient debugging into the database layer. Typical ODB implementations record every execution step to be able to reproduce previous program states. However, with a stored procedure that touches millions of tuples, this is not feasible. Instead, we only trace scalar variables and leverage the speed of in-memory databases to quickly reproduce query results on demand.
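The trade-off described for the ODB approach (store small scalar states, recompute large query results) can be sketched in Python, with the standard sqlite3 module standing in for the in-memory database. All names are hypothetical, and the sketch assumes the queried table does not change after a step has run; a real implementation would have to reproduce the data as it was at that step, for instance from a snapshot.

```python
import sqlite3


class ScalarTracer:
    """Records only scalar variables per step; query results are re-derived on demand."""

    def __init__(self, conn):
        self.conn = conn
        self.steps = []  # one entry per step: (scalar variables, sql, params)

    def record(self, scalars, sql=None, params=()):
        # Copy the (small) scalar state; the (large) query result is NOT stored.
        self.steps.append((dict(scalars), sql, tuple(params)))

    def state_at(self, step):
        # Random access to any recorded step, forwards or backwards.
        return self.steps[step][0]

    def query_result_at(self, step):
        # Re-execute the query of that step instead of replaying stored rows.
        # Assumption: the queried data is unchanged since the step ran.
        _, sql, params = self.steps[step]
        return self.conn.execute(sql, params).fetchall() if sql else None


def sum_sales_by_region(conn, tracer, regions):
    """Stand-in for a stored procedure: a loop issuing one query per iteration."""
    total = 0.0
    sql = "SELECT amount FROM sales WHERE region = ?"
    for region in regions:
        rows = conn.execute(sql, (region,)).fetchall()
        total += sum(amount for (amount,) in rows)
        tracer.record({"region": region, "total": total}, sql, (region,))
    return total
```

The trace grows with the number of steps, not with the number of tuples each query touches.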
Combining these dynamic and static analysis techniques, we aim to build a debugger with useful visualizations that help developers understand program and data flow in complex stored procedures.

Selected Publications in this Research Area in 2014

• Martin Lorenz, Johannes Albrecht: Object-Relational Mapping Strategies revised – A comparison of Row- and Column-oriented Database Systems, International Conference on Challenges in IT, Engineering and Technology (ICCIET), 2014

• Franziska Häger, Thomas Kowark, Jens Krüger, Christophe Vetterli, Falk Übernickel, Matthias Uflacker: DT@Scrum: Integrating Design Thinking with Software Development Processes, Understanding Innovation - Building Innovators, 2014

• Thomas Kowark, Hasso Plattner: Collective, Incremental Ontology Alignment Through Query Translation, The 8th International Conference On Web Reasoning And Rule Systems, Athens, Greece, 2014

• Franziska Häger, Ralf Teusner: From theory to practice - Using a multi-team design thinking workshop to kickstart software projects, DTBIS, 2014




Project Highlight – HPI Business Simulator

Companies invest a significant amount of time in the yearly budgeting process and the resulting quarterly or monthly forecasts. Given the volatility of markets and enterprise structures, management often regards this process as inefficient. In this context, what-if analysis has been established with the goal of closely modeling cause and effect in an enterprise and its environment. This functionality can be used in the budgeting process or as part of a forecast to evaluate scenarios in terms of goal fulfillment. However, while this theoretical model has well-defined semantics, it still lacks proper tool support. We partnered with a Fortune 500 company in the consumer goods industry to discover their needs for enterprise simulation and to create a new simulation tool. The following key requirements were identified: flexibility, i.e., adaptability to new use cases without additional programming effort; interactivity, i.e., sufficient performance for interactive decision-making during planning runs; and collaboration, i.e., multi-user support for the collaborative development of joint simulation scenarios. These requirements can be addressed effectively with HANA's capability to execute analytical queries directly on transaction data. To meet the challenge, we developed the HPI Business Simulator, a proof-of-concept tool that allows companies to define and calculate what-if analyses in seconds. The main idea behind this tool is to enable companies to flexibly simulate scenarios directly on the transactional data. Users can easily configure and perform their simulations without the development overhead of custom-built simulations and pre-calculated data cubes.
This enables ad-hoc decision support, planning, and forecasting, with a positive impact on multiple areas within a company:

• Purchasing: Material costs can be simulated as a function of commodity prices and currency fluctuations.

• Production: Costs can be simulated based on production paths, machine allocations, transportation costs, rejection rates, energy consumption, energy prices, and more.

• Sales: The sales volume can be simulated using drivers such as unit price and economic factors such as buying power and competing vendors.

• Controlling: The value drivers can also be consolidated and used for profitability simulation.

At the management level, executives gain more process transparency through real-time information access and can see the impact of strategic decisions.

The HPI Business Simulator builds on the concept of value driver models. Value driver trees, such as the DuPont model, are well-known methodologies for modeling Key Performance Indicators (KPIs) with independent linear equations or, in the case of input-output structures, with systems of linear equations. With value driver models, activities and decisions in a company focus on the core factors that drive the KPIs, e.g., the operating profit. Furthermore, their use increases collaboration across departments, leading to more aligned operations. As a result, planning requires less effort and produces more realistic plans, as the KPIs are directly connected to the operational drivers. Drivers can influence multiple other drivers, e.g., an increased sales volume influences both net sales and variable costs.

A value driver model is a directed graph consisting of a set of nodes and their connecting edges. Each node is a value driver that either represents a data source or is calculated from other value drivers. Nodes can drive multiple other nodes. Our model supports not only simple operations such as addition and multiplication but also complex equation systems that, e.g., describe the influence of a bill of materials, raw material prices, and cost center rates on product costs. Models can be filtered, e.g., along the product, customer, location, and time dimensions. The simulation model can be easily configured by domain experts who define the relevant drivers and their dependencies. For example, for the value driver ‘sales volume’, the attribute ‘quantity’ would be the value to be aggregated, and customer, location, product, and time would be the supported filter dimensions. Once the value driver model has been configured, simulations are initiated by adjusting (overriding) the values of nodes in the driver model. The simulated impact is instantly visualized in relation to the actuals, plans, and forecasts. Users can set filters on the dimensions, e.g., by customer, location, or product, to explore the impact of the simulation run in detail.

Drill down in a Profit & Loss simulation. With the HPI Business Simulator it becomes feasible to directly use the transaction data for enterprise simulations.
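The value driver model described above can be sketched as a small directed acyclic graph in Python. The driver names, numbers, and override mechanism are illustrative assumptions; source nodes are constants here rather than aggregates over filtered transaction data, and the sketch does not cover the equation systems the HPI Business Simulator supports. It does show how one adjusted driver (sales volume) propagates to both net sales and variable costs.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


class ValueDriverModel:
    """Value drivers as nodes of a directed acyclic graph (illustrative sketch)."""

    def __init__(self) -> None:
        self.sources: Dict[str, float] = {}
        self.formulas: Dict[str, Tuple[Sequence[str], Callable[..., float]]] = {}

    def add_source(self, name: str, value: float) -> None:
        # In the described tool this would be an aggregate over transaction
        # data (e.g. summed 'quantity' under some filter); here it is a constant.
        self.sources[name] = value

    def add_driver(self, name: str, inputs: Sequence[str], fn: Callable[..., float]) -> None:
        self.formulas[name] = (inputs, fn)

    def evaluate(self, name: str, overrides: Optional[Dict[str, float]] = None) -> float:
        overrides = overrides or {}
        cache: Dict[str, float] = {}

        def value(node: str) -> float:
            if node in overrides:            # simulated (user-adjusted) value wins
                return overrides[node]
            if node not in cache:
                if node in self.sources:
                    cache[node] = self.sources[node]
                else:
                    inputs, fn = self.formulas[node]
                    cache[node] = fn(*(value(i) for i in inputs))
            return cache[node]

        return value(name)


model = ValueDriverModel()
model.add_source("sales_volume", 1000.0)
model.add_source("unit_price", 2.5)
model.add_source("unit_variable_cost", 1.5)
model.add_source("fixed_costs", 400.0)
model.add_driver("net_sales", ["sales_volume", "unit_price"], lambda v, p: v * p)
model.add_driver("variable_costs", ["sales_volume", "unit_variable_cost"], lambda v, c: v * c)
model.add_driver("operating_profit", ["net_sales", "variable_costs", "fixed_costs"],
                 lambda s, v, f: s - v - f)

baseline = model.evaluate("operating_profit")                              # 600.0
simulated = model.evaluate("operating_profit", {"sales_volume": 1200.0})  # 800.0
```

Evaluating recursively from the requested KPI means only the drivers on its path are computed, and the per-evaluation cache avoids recomputing a driver that feeds several others.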



2.3 In-Memory Data Management for Life Sciences and eHealth Systems

In addition to our research activities in the field of "In-Memory Data Management for Enterprise Systems", our group focuses on applying in-memory technology to the life sciences. The volume of scientific data in this area typically exceeds that of the data sets used in traditional enterprises. Building on our long-standing experience in applying in-memory technology to selected enterprise challenges, we also aim to improve the real-time processing and analysis of large scientific data sets.

2.3.1 EBOKON – Surveillance of Ebola Outbreaks

Prompted by the 2014 Ebola outbreak in West Africa, this ongoing project aims to support the identification and management of (suspected) Ebola infections and the follow-up surveillance of their contacts to prevent the disease from spreading further. The project is a cooperation between HPI, the Helmholtz Centre for Infection Research (HZI), the Robert Koch Institute (RKI), the Bernhard Nocht Institute for Tropical Medicine (BNITM), and the Nigeria Field Epidemiology & Laboratory Training Program (NFELTP). During several Design Thinking workshops in Germany and Nigeria, we systematically analyzed the experiences of field workers and the Ebola Emergency Operations Centre (EOC) after their successful control of the Ebola outbreak in Nigeria. From those insights, we identified relevant personas and developed process models depicting their interactions during an Ebola outbreak. In one of our ongoing Bachelor projects, we are implementing these process models and requirements in a software system. The students are building a mobile application for contact tracers, i.e., field workers who visit contacts of a (suspected) Ebola case daily to detect new Ebola cases early and initiate corresponding measures. The mobile application supports contact tracers in contact management and provides interview guides and a simple user interface for data collection.
The contact tracing app will run within a private cloud based on standard SAP software such as SAP HANA, SAP Afaria, and SAP Mobile Platform.

2.3.2 MediTweet

In the winter term 2013/14, students of our seminar “Next Generation Clinical Information Systems” were tasked with improving the workflow of clinical personnel. After several cycles of interviewing, brainstorming, prototyping, and testing, one team developed the concept of a messaging system for clinical environments, called MediTweet.

MediTweet is an open messaging system for clinical environments. It connects Clinical Information Systems (CIS) with both medical devices and personnel. With MediTweet, users can send structured messages to other users in order to automate documentation and task synchronization. Additionally, medical devices automatically inform users about their status and results. During the summer term 2014, a working prototype of MediTweet was implemented, including a mobile client for iOS, a messaging server, and a connector for the SAP IS-H and i.s.h.med systems. Semi-structured messages containing event and context information from clinical devices, IT systems, sensors, and users are broadcast automatically in predefined streams within the MediTweet network. Users can subscribe to and unsubscribe from the streams that are relevant to them. MediTweet fully integrates with all clinical processes handled within the SAP IS-H and i.s.h.med systems.

MediTweet also allows clinical staff to create tasks (actions) based on received messages. These tasks and their status are synchronized among all recipients of the message stream, making it easy to coordinate their work and improving the clinical information flow. The Solution Experience Infrastructure and Healthcare Demo Team of SAP supported the students' initiative by providing a copy of the actual SAP Demo Cloud Healthcare System. Our prototype automatically publishes messages on a patient's channel as soon as new information about the patient's medical results enters the IS-H system. The SAP Healthcare Demo Team will continue the cooperation with the HPI team and plans to implement MediTweet in the SAP Demo Cloud starting in February 2015.
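The stream-based messaging and shared task status of MediTweet follow a publish/subscribe pattern. The Python sketch below is a hypothetical, in-memory illustration of that pattern only; the class, method, and stream names are assumptions and do not describe the MediTweet server or its SAP IS-H connector.

```python
from collections import defaultdict


class MessageBroker:
    """Minimal publish/subscribe broker with named streams (hypothetical design)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # stream name -> subscribed user ids
        self.inboxes = defaultdict(list)     # user id -> received (stream, message)
        self.tasks = {}                      # task id -> status shared by all recipients

    def subscribe(self, user, stream):
        self.subscribers[stream].add(user)

    def unsubscribe(self, user, stream):
        self.subscribers[stream].discard(user)

    def publish(self, stream, message):
        # Deliver a semi-structured message (here: a dict) to current subscribers.
        for user in self.subscribers[stream]:
            self.inboxes[user].append((stream, message))

    def create_task(self, task_id, stream, description):
        self.tasks[task_id] = "open"
        self.publish(stream, {"task": task_id, "description": description})

    def update_task(self, task_id, status):
        # A single shared record, so every recipient sees the same status.
        self.tasks[task_id] = status
```

Keeping task status in one shared record rather than copying it into each inbox is what keeps all recipients synchronized in this sketch.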

Selected Publications in this Research Area in 2014

• Hasso Plattner, Matthieu-P. Schapranow: High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, ISBN: 978-3-319-03034-0, 2014

• Matthieu-P. Schapranow, Franziska Häger, Cindy Fähnrich, Emanuel Ziegler, Hasso Plattner: In-Memory Computing Enabling Real-time Genome Data Analysis, International Journal on Advances in Life Sciences, Vol. 6, No. 1-2, 2014

• Cindy Fähnrich, Matthieu-P. Schapranow, Hasso Plattner: Towards Integrating the Detection of Genetic Variants into an In-Memory Database, Proceedings of the International Conference on Big Data, 2014



3. SUPERVISED MASTER’S THESES

The following Master’s theses were supervised, submitted, and successfully defended in our research group in 2014:

• Tim Berning: nvm_malloc: Memory Allocation in the NVRAM Era.
• Lars Butzmann: Efficient Aggregate Cache Revalidation.
• Ralf Diestelkämper: Cache Management for Aggregates in Columnar In-Memory Databases.
• Markus Dreseler: Leveraging NVRAM for the In-Memory Database HYRISE.
• Ekaterina Gavrilova: Alternative Data Models to Leverage the Features of In-Memory Column-oriented Databases.
• Philipp Giese: Eliciting Expertise based on Time Series Analyses of Code Complexity Metrics.
• Sebastian Hillig: HyDispatch: Type Dispatch for Performance and Extensibility.
• Kai Höwelmeyer: Pipelining Parallelism for Main Memory Databases.
• Cornelius Illi: Understanding Information Sharing and the Development of Shared Understanding in Virtual New Product Development Teams.
• Sebastian Meyer: Testing Mobile Prototypes in Enterprise-Scale Software Development Processes.
• Paul Möller: Leveraging Enterprise Application Characteristics to Optimize Incremental Materialized View Maintenance on Columnar In-Memory Databases.
• Stefan Schäfer: A Cost Model for Optimized Coprocessor Integration.
• Björn Wagner: Mixed Workload Processing in a RDMA-Enabled Parallel Main-Memory DBMS.
• Johannes Albrecht: Mapping Inheritance Hierarchies. A Cost Model for mapping Object Inheritance Hierarchies to Relational Databases.
• Sebastian Oergel: The Integration of Relational Languages into Object-Oriented Programming Languages.
• Daniel Taschik: Elastic In-Memory Computing. Quantifying the Elasticity of Relational Database Management Systems.



4. COMPLETED PH.D. DISSERTATIONS

• Jens Krüger: Enterprise-specific In-Memory Data Management: HYRISEc – An In-Memory Column Store Engine for OLXP

Abstract: Enterprise applications are presently built on a 20-year-old data management infrastructure that was designed to meet a specific set of requirements for OLTP systems. In the meantime, enterprise applications have become more sophisticated, data set sizes have increased, requirements on the freshness of data have been strengthened, and the time allotted for completing business processes has been reduced. To meet these challenges, enterprise applications have become increasingly complicated to make up for shortcomings in the data management infrastructure. These complications increase the total cost of ownership of the applications and make them harder to use. This thesis pursues the idea of designing an enterprise application-specific database engine that is better optimized for the observed workload and data characteristics, while leveraging the latest hardware trends and advances in data processing algorithms. To this end, the thesis extracts the actual requirements, data characteristics, and workloads of today’s enterprise applications, defines a novel workload category called Online Mixed Workload Processing (OLXP), and presents the enterprise application-specific database engine HYRISEc. HYRISEc uses read-optimized in-memory data structures, since today’s database systems are designed for a more update-intensive workload than they actually face. Traditional read-optimized databases use a dictionary-encoded, compressed, column-oriented approach, especially in combination with an in-memory architecture. Inserting a tuple into such a compressed store is as complex as inserting a value into a sorted column, because the compressed representation of the entire column has to be rebuilt. Furthermore, traditional index structures cannot be applied efficiently, as these databases are not based on a page structure.
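To make the insertion cost concrete, the Python sketch below dictionary-encodes a column with a sorted dictionary and a vector of value IDs, the common scheme for read-optimized column stores. It is an illustration under that assumption, not HYRISEc code, and the function names are hypothetical. Inserting a value that sorts into the middle of the dictionary shifts the value IDs of all larger values, so existing rows must be re-encoded.

```python
from bisect import bisect_left


def encode(values):
    """Dictionary-encode a column: sorted distinct values plus one value ID per row."""
    dictionary = sorted(set(values))
    ids = [bisect_left(dictionary, v) for v in values]
    return dictionary, ids


def decode(dictionary, ids):
    return [dictionary[i] for i in ids]


def insert_sorted(dictionary, ids, value):
    """Append a row while keeping the dictionary sorted (modifies both lists).

    If `value` is new, every existing value ID at or above its dictionary
    position shifts by one, so the ID vector has to be rewritten.
    Returns the number of existing rows whose value ID changed.
    """
    pos = bisect_left(dictionary, value)
    rewritten = 0
    if pos == len(dictionary) or dictionary[pos] != value:
        dictionary.insert(pos, value)
        for row, vid in enumerate(ids):
            if vid >= pos:
                ids[row] = vid + 1
                rewritten += 1
    ids.append(pos)
    return rewritten


dictionary, ids = encode(["Berlin", "Potsdam", "Berlin", "Zurich"])
print(dictionary, ids)  # ['Berlin', 'Potsdam', 'Zurich'] [0, 1, 0, 2]
changed = insert_sorted(dictionary, ids, "Hamburg")
print(changed, ids)     # 2 [0, 2, 0, 3, 1]
```

On a production-sized column the rewrite touches a large fraction of all rows for a single insert, which is why updates are buffered in a separate write-optimized structure instead.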

To handle updates in compressed storage efficiently, HYRISEc implements a technique called differential store, maintaining a small write-optimized delta partition that accumulates all updates. Periodically, this delta partition is combined with the read-optimized main partition. This merge process involves decompressing the compressed main partition, merging the delta and main partitions, and re-compressing the resulting main partition. For transactional enterprise applications, it is crucial that data is always available in a 24/7 environment; system downtime is not acceptable. As such, the merge process must be performed online and fast enough not to degrade the update throughput. The update performance of such a system is limited by two factors: a) the insert rate for the write-optimized structure, and b) the speed with which the system can merge the accumulated updates back into the read-optimized partition while staying online without any downtime. The merge process becomes the main bottleneck of the system and needs to be accelerated by orders of magnitude to support fast updates. Consequently, the thesis introduces a fast attribute merging algorithm that performs a linear-time update of the compressed main partition and applies multi-core-aware optimizations to exploit the high compute and bandwidth resources of modern multi-core CPUs.
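A simplified, single-threaded Python sketch of the differential store idea follows, assuming a main partition with a sorted dictionary and value-ID vector plus an uncompressed, append-only delta. The class and method names are hypothetical, and HYRISEc's actual data structures and multi-core merge are not reproduced. The merge walks both sorted dictionaries once and records an old-to-new value-ID mapping, so recoding the main partition is a single linear pass; only the small delta dictionary needs sorting.

```python
class DeltaMainColumn:
    """Main partition (sorted dictionary + value IDs) plus an append-only delta."""

    def __init__(self):
        self.main_dict = []  # sorted distinct values
        self.main_ids = []   # one value ID per main-partition row
        self.delta = []      # write-optimized: raw appended values

    def insert(self, value):
        # Writes are cheap appends; the compressed main partition is untouched.
        self.delta.append(value)

    def scan(self):
        # Readers must consult both partitions until the next merge.
        return [self.main_dict[i] for i in self.main_ids] + list(self.delta)

    def merge(self):
        delta_dict = sorted(set(self.delta))  # delta is small: O(d log d)
        new_dict, old_to_new = [], []
        i = j = 0
        # 1) Linear merge of the two sorted dictionaries, remembering where
        #    each old main value ID ends up in the new dictionary.
        while i < len(self.main_dict) or j < len(delta_dict):
            if j == len(delta_dict) or (
                i < len(self.main_dict) and self.main_dict[i] <= delta_dict[j]
            ):
                value = self.main_dict[i]
                old_to_new.append(len(new_dict))
                if j < len(delta_dict) and delta_dict[j] == value:
                    j += 1  # value occurs in both; keep a single entry
                i += 1
            else:
                value = delta_dict[j]
                j += 1
            new_dict.append(value)
        # 2) Recode existing rows with the mapping (one pass over main_ids).
        self.main_ids = [old_to_new[vid] for vid in self.main_ids]
        # 3) Encode and append the delta rows, then clear the delta.
        position = {v: k for k, v in enumerate(new_dict)}
        self.main_ids.extend(position[v] for v in self.delta)
        self.main_dict = new_dict
        self.delta = []


col = DeltaMainColumn()
for v in ["b", "d", "b"]:
    col.insert(v)
col.merge()
col.insert("a")
col.insert("d")
print(col.scan())                   # ['b', 'd', 'b', 'a', 'd']
col.merge()
print(col.main_dict, col.main_ids)  # ['a', 'b', 'd'] [1, 2, 1, 0, 2]
```

The sketch merges synchronously; keeping the system online would additionally require directing new writes to a fresh delta while the merge runs.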

With regard to fast lookups, memory bandwidth and latency limit the execution speed of queries; this scarce resource therefore has to be used economically to maximize performance. Scanning a complete column transfers the entire column from memory to the processor, so the cost grows linearly with the column length. A common approach to speeding up access to highly selective subsets of the data is to use indices, which enable searches in logarithmic time. Thus, the thesis introduces an index that leverages the proposed architecture: it uses compressed column-oriented storage for the actual index data structures and the attribute merge algorithm for index maintenance.
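An index over a dictionary-encoded column can be sketched as an inverted list that maps each value ID to the sorted row positions holding it, stored in two flat vectors (offsets and positions) that suit column-oriented storage. This layout is an assumption chosen for illustration and may differ from the thesis' exact index structure; all names are hypothetical. A lookup costs one binary search on the sorted dictionary (logarithmic) plus the size of the result.

```python
from bisect import bisect_left


def build_index(num_values, ids):
    """Build offsets/positions vectors for value IDs 0..num_values-1.

    Rows holding value ID v are positions[offsets[v]:offsets[v + 1]].
    Uses a counting sort, so construction is linear in rows + distinct values.
    """
    offsets = [0] * (num_values + 1)
    for vid in ids:
        offsets[vid + 1] += 1
    for v in range(num_values):
        offsets[v + 1] += offsets[v]  # prefix sums turn counts into offsets
    positions = [0] * len(ids)
    cursor = offsets[:-1]  # slicing copies: next free slot per value ID
    for row, vid in enumerate(ids):
        positions[cursor[vid]] = row
        cursor[vid] += 1
    return offsets, positions


def lookup(dictionary, offsets, positions, value):
    """Return the row positions equal to `value` (empty if absent)."""
    vid = bisect_left(dictionary, value)  # O(log n) on the sorted dictionary
    if vid == len(dictionary) or dictionary[vid] != value:
        return []
    return positions[offsets[vid]:offsets[vid + 1]]


dictionary = ["Berlin", "Potsdam", "Zurich"]
ids = [0, 1, 0, 2, 1]  # dictionary-encoded column of five rows
offsets, positions = build_index(len(dictionary), ids)
print(lookup(dictionary, offsets, positions, "Berlin"))   # [0, 2]
print(lookup(dictionary, offsets, positions, "Potsdam"))  # [1, 4]
```

Because both vectors are plain integer arrays, they could themselves be stored compressed, and rebuilding them after a merge is a linear pass like the one above.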

To summarize, this thesis presents research results illustrating how an in-memory database engine for OLXP can be implemented by introducing data structures and algorithms that enable fast updates and lookups while leveraging the potential of a read-optimized store.

• Christian Tinnefeld: Building a Columnar Database on Shared Main Memory-Based Storage: Database Operator Placement in a Shared Main Memory-Based Storage System that Supports Data Access and Code Execution

Abstract: In the field of disk-based parallel database management systems, there is a great variety of solutions based on either a shared-storage or a shared-nothing architecture. In contrast, main memory-based parallel database management systems are dominated solely by the shared-nothing approach, as it preserves the in-memory performance advantage by processing data locally on each server. We argue that this unilateral development is going to cease due to the combination of two trends: a) today’s network technology features remote direct memory access (RDMA) and narrows the performance gap between accessing main memory inside a server and accessing that of a remote server to a single order of magnitude or even less; b) modern storage systems are elastic, provide durability as well as high availability, and (e.g., in the case of Stanford’s RAMCloud) keep all data resident in main memory. Exploiting these characteristics in the context of a main-memory parallel database management system is desirable, and the advent of RDMA-enabled network technology makes the creation of a parallel main memory DBMS based on a shared-storage approach feasible. This thesis describes building a columnar database on shared main memory-based storage. It discusses the resulting architecture (Part I) and the implications on query processing (Part II), and presents an evaluation (Part III) of the resulting solution in terms of performance, high availability, and elasticity.

In our architecture, we use Stanford’s RAMCloud as shared storage and the self-designed and developed in-memory AnalyticsDB as the relational query processor on top. AnalyticsDB encapsulates data access and operator execution via an interface that allows seamless switching between local and remote main memory. RAMCloud provides not only storage capacity but also processing power; combining both aspects allows pushing down the execution of database operators into the storage system. We describe how the columnar data processed by AnalyticsDB is mapped to RAMCloud’s key-value data model and how the performance advantages of columnar data storage can be preserved.
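One way to map a column onto a key-value store while keeping columnar access is to split each column into fixed-size chunks and store every chunk as one object. The sketch below assumes a key layout of `table/column/chunk` and uses a plain Python `dict` in place of RAMCloud; both are hypothetical illustrations, not AnalyticsDB's actual schema or RAMCloud's client API. Because each object holds many values of a single column, a scan fetches only the chunks of the columns it needs.

```python
from array import array

CHUNK_ROWS = 4  # hypothetical; real systems would use much larger chunks


def store_column(kv, table, column, value_ids):
    """Write a column of integer value IDs as one key-value object per chunk."""
    num_chunks = 0
    for start in range(0, len(value_ids), CHUNK_ROWS):
        chunk = array("i", value_ids[start:start + CHUNK_ROWS])  # packed C ints
        kv[f"{table}/{column}/{num_chunks}"] = chunk.tobytes()
        num_chunks += 1
    return num_chunks


def scan_column(kv, table, column, num_chunks):
    """Fetch this column's chunks in order and unpack them; other columns are untouched."""
    out = array("i")
    for chunk_no in range(num_chunks):
        out.frombytes(kv[f"{table}/{column}/{chunk_no}"])
    return out.tolist()


kv = {}  # stand-in for the remote key-value store
n_region = store_column(kv, "sales", "region", [0, 1, 0, 2, 2, 1])
n_amount = store_column(kv, "sales", "amount", [10, 20, 30, 40, 50, 60])
print(sorted(kv))  # ['sales/amount/0', 'sales/amount/1', 'sales/region/0', 'sales/region/1']
print(scan_column(kv, "sales", "region", n_region))  # [0, 1, 0, 2, 2, 1]
```

Larger chunks amortize per-request network overhead, while smaller chunks allow finer-grained distribution across storage servers; the chunk size is a tuning parameter in this sketch.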

The combination of fast network technology and the possibility to execute database operators in the storage system raises the question of site selection. We construct a system model that allows the estimation of operator execution costs in terms of network transfer, data processed in memory, and wall time. This model can be used for database operators that work on one relation at a time (such as a scan or materialize operation) to discuss the site selection problem (data pull vs. operator push). Since a database query translates to the execution of several database operators, the optimal site selection may vary per operator. For the execution of a database operator that works on two (or more) relations at a time (such as a join), the system model is enriched by additional factors such as the chosen algorithm (e.g., Grace Join vs. Distributed Block Nested Loop Join vs. Cyclo-Join), the data partitioning of the respective relations and their overlap, as well as the allowed resource allocation.
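For a single-relation operator such as a selective scan, the data-pull vs. operator-push decision can be illustrated with a deliberately simplified cost model: pulling ships the whole column over the network and scans it locally, whereas pushing scans inside the storage system and ships back only the qualifying fraction. The bandwidth parameters and the linear cost formulas below are assumptions for illustration; the thesis' system model additionally covers wall time and, for joins, algorithm choice, partitioning, and resource allocation.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    network_bw: float       # bytes/s between query processor and storage (assumed)
    local_scan_bw: float    # bytes/s the query processor scans from local DRAM (assumed)
    storage_scan_bw: float  # bytes/s available to operators pushed into storage (assumed)


def pull_cost(column_bytes, hw):
    """Data pull: transfer the full column, then scan it on the query processor."""
    return column_bytes / hw.network_bw + column_bytes / hw.local_scan_bw


def push_cost(column_bytes, selectivity, hw):
    """Operator push: scan inside storage, transfer only the qualifying fraction."""
    return column_bytes / hw.storage_scan_bw + selectivity * column_bytes / hw.network_bw


def choose_site(column_bytes, selectivity, hw):
    """Return the cheaper strategy together with both estimated times in seconds."""
    pull = pull_cost(column_bytes, hw)
    push = push_cost(column_bytes, selectivity, hw)
    return ("push" if push < pull else "pull"), pull, push


hw = Hardware(network_bw=1e9, local_scan_bw=10e9, storage_scan_bw=5e9)
for selectivity in (0.01, 0.5, 1.0):
    site, pull, push = choose_site(1e9, selectivity, hw)
    print(f"selectivity={selectivity}: {site} (pull {pull:.2f}s, push {push:.2f}s)")
```

With these assumed numbers, pushing wins for selective scans (1% and 50% of rows qualifying), while pulling wins once the whole column qualifies, because the storage servers then scan more slowly than the query processor without saving any network transfer.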

We present an evaluation on a cluster of 60 nodes, all connected via RDMA-enabled network equipment. Compared with operating only on main memory inside a server without any network communication at all, query processing is about 2.4x slower if everything is done via the data pull operator execution strategy (i.e., RAMCloud is used only for data access), and about 27% slower if operator execution is also supported inside RAMCloud. The fast crash recovery feature of RAMCloud can be leveraged to provide high availability: for example, a server crash during query execution delays the query response by only about one second. Our solution is elastic in that it can adapt to changing workloads a) within seconds, b) without interrupting ongoing query processing, and c) without manual intervention.



5. PUBLICATIONS

5.1 Books

• Hasso Plattner: A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases, Second Edition, ISBN: 978-3-642-55269-4, 2014

Recent achievements in hardware and software development, such as multi-core CPUs and DRAM capacities of multiple terabytes per server, have enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of enterprise data. Professor Hasso Plattner and his research group at the Hasso Plattner Institute in Potsdam, Germany, have been investigating and teaching the corresponding concepts and their adoption in the software industry for years. This book is based on the first online course on the openHPI e-learning platform, which was launched in autumn 2012 with more than 13,000 learners. The book is designed for students of computer science, software engineering, and IT-related subjects, but it also addresses business experts, decision makers, software developers, technology experts, and IT analysts. Plattner and his group focus on exploring the inner mechanics of a column-oriented, dictionary-encoded in-memory database. Topics covered include, among others, physical data storage and access, basic database operators, compression mechanisms, and parallel join algorithms. Beyond that, implications for future enterprise applications and their development are discussed. Readers are led to understand the radical differences and advantages of the new technology over traditional row-oriented, disk-based databases.

• Hasso Plattner, Matthieu-P. Schapranow: High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, ISBN: 978-3-319-03034-0, 2014

Recent achievements in hardware and software development have enabled the introduction of a revolutionary technology: in-memory data management. This technology supports the flexible and extremely fast analysis of massive amounts of data, such as diagnoses, therapies, and human genome data. This book shares the latest research results on applying in-memory data management to personalized medicine, changing it from computational possibility to clinical reality. The authors provide details on innovative approaches to enabling the processing, combination, and analysis of relevant data in real time. The book bridges the gap between medical experts, such as physicians, clinicians, and biological researchers, and technology experts, such as software developers, database specialists, and statisticians. Topics covered in this book include, among others, modeling of genome data processing and analysis pipelines, high-throughput data processing, exchange of sensitive data, and protection of intellectual property. Beyond that, it shares insights on research prototypes for the analysis of patient cohorts, topology analysis of biological pathways, and combined search in structured and unstructured medical data, and it outlines completely new processes that have become possible due to interactive data analyses.

5.2 Book Chapters

• Franziska Häger, Thomas Kowark, Jens Krüger, Christophe Vetterli, Falk Übernickel, Matthias Uflacker: DT@Scrum: Integrating Design Thinking with Software Development Processes, Understanding Innovation - Building Innovators, 2014

5.3 Journal Articles

• Hasso Plattner, Martin Faust, Stephan Müller, David Schwalb, Matthias Uflacker, Johannes Wust: The Impact of Columnar In-Memory Databases on Enterprise Systems, VLDB, 2014

• Matthieu-P. Schapranow, Franziska Häger, Cindy Fähnrich, Emanuel Ziegler, Hasso Plattner: In-Memory Computing Enabling Real-time Genome Data Analysis, International Journal on Advances in Life Sciences, Vol. 6, No. 1-2, 2014

• Stephan Müller, Lars Butzmann, Stefan Klauck, Hasso Plattner: Materialized View Maintenance Leveraging In-Memory Data Structures, International Journal On Advances in Software, Vol. 7, No. 3&4, 2014

5.4 Conference Articles

• Stephan Müller, Lars Butzmann, Stefan Klauck, Hasso Plattner: An Adaptive Aggregate Maintenance Approach for Mixed Workloads in Columnar In-Memory Databases, The 37th Australasian Computer Science Conference (ACSC), Auckland, New Zealand, 2014

• Johannes Wust, Carsten Meyer, Hasso Plattner: DAC: Database Application Context Analysis applied to Enterprise Applications, The 37th Australasian Computer Science Conference (ACSC), Auckland, New Zealand, 2014

• Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Boese, Hasso Plattner: Parallel Join Executions in RAMCloud, CloudDB, in conjunction with ICDE 2014, 2014

• Christian Tinnefeld, Daniel Taschik, Hasso Plattner: Quantifying the Elasticity of a Database Management System, DBKDA, 2014

• Stephan Müller, Hasso Plattner: Aggregates Caching for Enterprise Applications, 30th International Conference on Data Engineering (ICDE), PhD Symposium, Chicago, USA, 2014

• Johannes Wust, Martin Grund, Kai Hoewelmeyer, David Schwalb, Hasso Plattner: Concurrent Execution of Mixed Enterprise Workloads on In-Memory Databases, DASFAA, 2014

• Stephan Müller, Ralf Diestelkämper, Hasso Plattner: Cache Management for Aggregates in Columnar In-Memory Databases, The 6th International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA), Chamonix, France, 2014

• Stephan Müller, Lars Butzmann, Hasso Plattner: Efficient Aggregate Cache Revalidation in an In-Memory Column Store, The 6th International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA), Chamonix, France, 2014

• Martin Faust, Martin Grund, Tim Berning, David Schwalb, Hasso Plattner: Vertical Bit-Packing: Optimizing Operations on Bit-Packed Vectors Leveraging SIMD Instructions, BDMA in conjunction with DASFAA, 2014

• Franziska Häger, Ralf Teusner: From theory to practice - Using a multi-team design thinking workshop to kickstart software projects, DTBIS, 2014

• David Schwalb, Markus Dreseler, Martin Faust, Johannes Wust, Hasso Plattner: Split Dictionaries for In-Memory Column Stores in Mixed Workload Environments, ADC, 2014

• Martin Lorenz, Johannes Albrecht: Object-Relational Mapping Strategies revised – A comparison of Row- and Column-oriented Database Systems, International Conference on Challenges in IT, Engineering and Technology (ICCIET), 2014

• Franziska Häger, Thomas Kowark, Matthias Uflacker: Pay it forward - Planning and Assessment of a Coaching Seminar for Global-Design Team Alumni, The 10th NordDesign Conference, 2014

• Matthieu-P. Schapranow, Konrad Klinghammer, Cindy Fähnrich, Hasso Plattner: An Optimized Research Process for Real-time Drug Response Analysis, The 3rd International Conference on Global Health Challenges, 2014

• Thomas Kowark, Hasso Plattner: Collective, Incremental Ontology Alignment Through Query Translation, The 8th International Conference On Web Reasoning And Rule Systems, Athens, Greece, 2014

• Matthieu-P. Schapranow, Konrad Klinghammer, Cindy Fähnrich, Hasso Plattner: In-Memory Technology Enables Interactive Drug Response Analysis, 16th International Conference on e-Health Networking, Applications and Services (Healthcom 2014), 2014

• Cindy Fähnrich, Matthieu-P. Schapranow, Hasso Plattner: Towards Integrating the Detection of Genetic Variants into an In-Memory Database, Proceedings of the International Conference on Big Data, 2014

• Martin Boissier, Jens Krüger, Johannes Wust, Hasso Plattner: An Integrated Data Management for Enterprise Systems, Proceedings of the 16th International Conference on Enterprise Information Systems (ICEIS), 2014

5.5 Workshop Articles

• David Schwalb, Martin Faust, Jens Krüger, Hasso Plattner: Leveraging In-Memory Technology for Interactive Analyses of Point-of-Sales Data, BDCA in conjunction with ICDE 2014, 2014

• Stephan Müller, Paul Möller, Hasso Plattner: Leveraging Enterprise Application Characteristics to Optimize Incremental Aggregate Maintenance in a Columnar In-Memory Database, Second International DASFAA Workshop on Big Data Management and Analytics (BDMA), in conjunction with DASFAA, Bali, Indonesia, 2014

• Mariana Neves, Konrad Herbst, Matthias Uflacker, Hasso Plattner: Preliminary evaluation of passage retrieval in biomedical multilingual question answering, BioTxtM 2014, Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing, 2014

• David Schwalb, Martin Faust, Johannes Wust, Martin Grund, Hasso Plattner: Efficient Transaction Processing for Hyrise in Mixed Workload Environments, IMDM in conjunction with VLDB, 2014

• Martin Faust, David Schwalb, Hasso Plattner: Composite Group-Keys: Space-efficient Indexing of Multiple Columns for Compressed In-Memory Column Stores, IMDM in conjunction with VLDB, 2014

• Ralf Teusner, Malte Appeltauer, Michael Perscheid, Jonas Enderlein, Thomas Klingbeil, Michael Kusber: PopulAid: In-Memory Data Generation for Customized Benchmarks, Workshop on Big Data Benchmarking (WBDB), 2014

• Martin Boissier: Optimizing Main Memory Utilization of Columnar In-Memory Databases Using Data Eviction, Proceedings of PhD Workshop @ VLDB 2014, Hangzhou, 2014

• Konrad Herbst, Cindy Fähnrich, Mariana Neves, Matthieu-P. Schapranow: Applying In-Memory Technology for Automatic Template Filling in the Clinical Domain, CLEF 2014 Evaluation Labs and Workshop, Online Working Notes, 2014

• Mariana Neves: HPI in-memory-based database system in Task 2b of BioASQ, Working Notes for the CLEF BioASQ Challenge, 2014

• Thomas Kowark, Hasso Plattner: One Query at a Time: Incremental, Collective Ontology Matching, The Ninth International Workshop on Ontology Matching, Riva del Garda, Trentino, Italy, 2014



6. TEACHING

In 2014, our group was again responsible for numerous teaching activities, covering all three of our research areas: “In-Memory Data Management for Enterprise Systems”, “Tools & Methods for Enterprise Systems Design and Engineering”, and “In-Memory Data Management for Life Sciences and eHealth Systems”. Prof. Plattner taught the “Trends and Concepts” series, consisting of a lecture in the summer term and a seminar in the winter term. In the lecture, Prof. Plattner covers the basic principles and advanced use case scenarios of in-memory databases as well as current trends in enterprise computing. In the seminar, students are given a concrete enterprise application scenario and design challenge, which they address with prototypes while incorporating end-user feedback. In the following, we summarize our teaching activities in 2014.

6.1 Summer Term 2014

Bachelor
• Enterprise Software Systems: Programming Concepts and Application Characteristics (Seminar)
• Enterprise Workload Analysis for Hot and Cold Data Classification (Bachelor Project)
• Data and Performance Aware Development of Business Applications on SAP HANA (Bachelor Project)

Master
• Trends and Concepts in the Software Industry I – Principles of In-Memory Databases (Lecture)
• In-Memory Data Management Research (Seminar)
• In-Memory Computing for Life Science (Seminar)
• Designing and Programming Applications for In-Memory Databases (Exercise)
• ME310: Global Team-based Product Innovation & Engineering (Project Seminar)
• Code Better, Run Faster – Tools for Performance-Driven Enterprise Application Development (Master Project)

6.2 Winter Term 2014/2015

Bachelor
• Software Engineering II (Lecture)
• Advanced Enterprise Applications using In-Memory Databases (Bachelor Project)
• Real-time Analysis of Big Medical Data (Bachelor Project)

Master
• Trends and Concepts in the Software Industry II – Exploiting Point-of-Sales Data (Seminar)
• Advanced Topics on In-Memory Database Servers (Seminar)
• ME310: Global Team-based Product Innovation & Engineering (Project Seminar)
• ME310: Global Team-based Product Innovation & Engineering – Coaching Research (Seminar)
• HOT or NOT? Data Aging Re-defined (Master Project)

6.3 ME310: Global Team-based Product Innovation & Engineering

Our group again participated in co-teaching Stanford’s ME310 class, where students work on real design challenges posed by industry partners in globally distributed teams. In 2013/2014, twelve HPI students participated in this nine-month, project-based innovation course. In cooperation with Stanford University and Siemens AG, one team created the boardroom of the future. Another team partnered with École des Ponts ParisTech and redesigned the bathroom for the elderly; this project was undertaken in cooperation with the furniture manufacturer Lapeyre. A third global team, in collaboration with Aalto University and Bayer AG, tackled the question of how to use self-tracking devices for pharmaceutical research.

• Boardroom of the Future (with Siemens AG and Stanford University)

Meetings are a fundamental part of corporate culture, and they are necessary to disseminate information, exchange ideas, formulate strategy, and make executive decisions that “steer a company’s fortune.” Siemens AG gave us the task to “redesign the experience for decision makers” and create the boardroom of the future. Our user research revealed that executives are frustrated with the inefficiencies of today’s meetings, such as tedious technical setup and a non-collaborative environment. Furthermore, decision makers seek to leverage data in order to obtain quantifiable insight into past and current operations. As a solution, we propose “The Q”, a meeting experience that reinvents executive decision making to be more productive, data-centric, and enjoyable. We have created a physical environment that fosters teamwork, combined with screens, tablet devices, and a voice-controlled software system that provides instant access to a company’s live data.

• Bathroom for the Elderly (with Lapeyre and ENPC Paris)

An average French person spends one hour per day in the bathroom for daily hygiene, dressing, and wellness. There are currently 9 million French people over 75 years old. At this age, physical impairments increase: the muscles are weaker, the balance is less steady, the body is stiffer, and the senses gradually degenerate. The devices, materials, and items in the bathroom are often not adapted to such impairments. Since the bathroom is the place with the highest number of accidents at home, the autonomy of the elderly is closely linked to the adaptation of their bathroom. This need for adaptation and the demographic changes provide a significant business opportunity for Lapeyre, one of the major bathroom distributors and manufacturers in France. By interviewing medical experts and users, we quickly understood that a bathroom for the elderly not only needs to address functional needs but also has to be aesthetically desirable. It is difficult to accept physical limitations when getting older, so products should not stigmatize their users with a poor design and a purely clinical look. We developed Intemporel, a new piece of bathroom furniture that provides comfort by offering a relaxing seating position while giving easy access to all products of daily use. Unlike common clinic-like “elderly furniture”, our product is for all age groups, whether users simply want to relax and sit back while brushing their teeth or wish to rest after standing for a long time. Due to some reluctance in the bathroom furniture market and the need for a simple and feasible product, we have chosen a clean and unobtrusive solution that integrates into a traditional dressing table, thus combining the comfort of a coiffeuse with the functional and hygienic requirements of a sink. Intemporel is going to be produced commercially by Lapeyre and will be available in stores in early 2015.

• Real-life Evidence for Pharmaceutics (with Bayer AG and Aalto University)

Drug development is a time- and cost-intensive process for pharmaceutical vendors, posing considerable risks to their business. Billion-dollar investments and more than 10 years of research, development, and clinical testing are typical. Once a product has entered the market, health insurers, clinicians, and patients demand proof of efficacy and information on long-term consequences and interactions with other medications. Therefore, drug manufacturers need to monitor how their products function in a real-life environment. Leading pharma companies like Bayer put much effort into the collection of “real-life evidence data”, but managing and analyzing this data is difficult. That is why Bayer Healthcare challenged us to improve the gathering, managing, and merging of real-life evidence data. After exploring the problem space, we found that data structure and quality vary heavily among different sources, institutions, and countries. Thus, instead of trying to merge existing sources, we propose to create a crowd-powered, open, and combined data source by leveraging the increasing popularity of self-tracking (“quantified self”). With LINK, we propose a platform for everyone to contribute to healthcare research by participating in studies posed by researchers. LINK makes it easy for people to contribute data through interactive questionnaires, blood samples, and self-tracking devices. Study participants can easily track their medicine intake, provide feedback, and share data with researchers. With appropriate incentives, researchers are able to reach the right participants at very low cost and in short cycles. LINK reduces the distance and disconnect between patients and drug vendors and proposes real-life evidence monitoring, in which patients and pharma companies work together to develop better drugs.

6.4 openHPI – Online Courses

In 2014, our research group conducted and supervised a number of online courses on openHPI. For the first time, Prof. Plattner’s course on In-Memory Data Management was offered not only in English but also in Chinese.

• In-Memory Data Management – Implications on Enterprise Systems; Date: 1st September – 3rd November; Language: English

• In-Memory Data Management; Date: 16th February – 14th April; Language: Chinese

• In-Memory Data Management – Implications on Enterprise Systems; Date: 3rd November – 31st December; Language: Chinese



7. EVENTS, SPEECHES, AND PRESENTATIONS

In addition to presenting our work at international conferences and workshops, Prof. Plattner and members of our research group attended several events, delivered speeches, and represented HPI at special occasions. A selection is listed below.

• Plenary Keynote at VLDB Conference

Prof. Plattner gave the plenary keynote at the Very Large Data Bases Conference (VLDB) on September 2nd, 2014 in Hangzhou, China. The title of his talk was “The impact of In-Memory Databases on Enterprise Systems”.

Prof. Dr. Hasso Plattner at VLDB’14 in Hangzhou, China.

• Keynote Talk at the Technical University of Munich

On July 7, 2014, Prof. Plattner visited the Technical University of Munich (TUM) and gave a colloquium speech at the computer science department. Prof. Krcmar hosted the event, which was attended by the professors of the CS department and about 100 IT students.

Prof. Dr. Helmut Krcmar (TUM), Prof. Dr. Hasso Plattner (HPI)



• SAPPHIRE NOW 2014

SAPPHIRE NOW, one of the world’s premier business technology conferences for senior executives, line-of-business and IT decision makers, business managers, and project managers involved in deploying business technology initiatives, took place on June 3-5, 2014 at the Orange County Convention Center in Orlando, Florida. The Hasso Plattner Institute was again well represented on the show floor. We met new and old project partners at the HPI booth and presented our latest results, parts of which were referenced in Prof. Plattner’s keynote speech.

Our team presenting at the HPI booth on the SAPPHIRE show floor.

• SAP TechEd and d-code

Our team again attended the SAP TechEd and d-code event in Las Vegas on October 20 – 24, 2014, where we presented our work on the show floor and in technical talks. We also presented at TechEd and d-code in Berlin, which took place on November 11-13, 2014. There, our team members Lars Butzmann and Stefan Klauck, together with our students Michael Weisz, Stephan Schulz, and Leo Kotschenreuther, won the SAP InnoJam and the SAP DemoJam 2014 contest with their concept and HANA-based prototype for “Remote Farming”.

Winners of the SAP InnoJam and DemoJam 2014: Michael Weisz, Lars Butzmann, Stephan Schulz, Leo Kotschenreuther, Stefan Klauck (from left to right)



• Fifth Workshop on Big Data Benchmarking

We hosted the Fifth Workshop on Big Data Benchmarking (5th WBDB) on August 5-6, 2014. The objective of the WBDB workshops is to make progress towards the development of industry-standard, application-level benchmarks for evaluating hardware and software systems for big data applications. Dr. Uflacker, local organization chair of the event, welcomed the international attendees on the HPI campus.



8. INDUSTRY PARTNERSHIPS

We would like to thank our industry partners for the trusting and fruitful collaboration in 2014.

• Audi AG
• Bayer AG
• Charité - Universitätsmedizin Berlin
• Colgate-Palmolive
• Intel
• Lapeyre
• SAP SE
• Siemens AG
• ThyssenKrupp AG

9. ACADEMIC PARTNERSHIPS

In 2014, our research group collaborated closely with the following universities and institutes.

• Stanford University, USA

Joint Class – Global Project-based Engineering Design, Innovation and Development (ME310), Prof. Larry Leifer and Prof. Mark Cutkosky

HPI-Stanford Design Thinking Research Program, Prof. Larry Leifer

• Paris, École des Ponts Business School, France

Joint Class – Global Project-based Engineering Design, Innovation and Development (ME310)

• Aalto University, Finland

Joint Class – Global Project-based Engineering Design, Innovation and Development (ME310)

• University of St. Gallen, Switzerland

Joint Class – Global Project-based Engineering Design, Innovation and Development (ME310)


