architecture for stream olap exploiting spe and olap engine...we propose a novel architecture for...

Post on 10-Sep-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Architecture for Stream OLAP Exploiting SPE and OLAP Engine

University of Tsukuba | Center for Computational Sciences

http://www.ccs.tsukuba.ac.jp/

Database Group

Explosive increase of real-time data sources, so-called "data streams" (or just "steams") and increasing demands for real-time analysis over streams give rise to real-time analysis over streams. However, developing tailor-made systems for suchapplications is not always desirable due to high developing costs and long developing periods. To cope with this problem,we propose a novel architecture for online analytical processing (OLAP) over streams exploiting off-the-shelf streamprocessing engine (SPE) combined with OLAP engine. It allows users to perform OLAP analysis over streams for the latesttime period, called Interval of Interest (IoI). The system in the meantime processes multiple continuous query language(CQL) queries corresponding to different aggregation levels in cube lattice. To cover arbitrary aggregation levels usinglimited system's memory, we propose to partially deploy CQL queries for those with higher reference frequencies, whereasthe results are dynamically calculated using existing aggregation results with the help of OLAP engine. For optimal CQLquery deployment, we propose a cost-based optimization method that maximizes the performance. The experimentalresults show that the proposed architecture is feasible enough to realize stream OLAP by combining an SPE and an OLAPengine. Also, the proposed system significantly outperforms other comparative methods by generating optimized querydeployment plans.

Fig. 1: Architecture of stream OLAP system. Fig. 2: A GUI Screenshot of Stream OLAP system analyzing “People Flow Data.”

GPU Acceleration of Set Similarity Joins

We propose a scheme of efficient set similarity joins on Graphics Processing Units (GPUs). Due to the rapid growth anddiversification of data, there is an increasing demand for fast execution of set similarity joins in applications that vary fromdata integration to plagiarism detection. To tackle this problem, our solution takes advantage of the massive parallelprocessing offered by GPUs. Additionally, we employ MinHash to estimate the similarity between two sets in terms ofJaccard similarity. By exploiting the high parallelism of GPUs and the space efficiency provided by MinHash, we can achievehigh performance without renouncing accuracy. Experimental results show that our proposed method is more than twoorders of magnitude faster than the serial version of CPU implementation, and 25 times faster than the parallel version ofCPU implementation, while generating highly precise query results.

Fig. 3: Similarity estimation using MinHash.

Fig. 4: Process flow.

Fig. 5:

top related