
REYKJAVÍK, ICELAND

29 - 31 JULY, 2013

Proceedings

2nd INTERNATIONAL CONFERENCE ON

DATA MANAGEMENT TECHNOLOGIES AND APPLICATIONS

DATA 2013

Proceedings of the 2nd International Conference on Data Technologies and Applications

Reykjavík, Iceland

29 - 31 July, 2013

Sponsored by INSTICC – Institute for Systems and Technologies of Information, Control and Communication

Co-organized by Reykjavik University

Copyright © 2013 SCITEPRESS – Science and Technology Publications. All rights reserved

Edited by Markus Helfert, Chiara Francalanci and Joaquim Filipe

Printed in Portugal

ISBN: 978-989-8565-67-9

Depósito Legal: 361497/13

http://www.dataconference.org


ORGANIZING AND STEERING COMMITTEES

CONFERENCE CHAIR

Joaquim Filipe, Polytechnic Institute of Setúbal / INSTICC, Portugal

PROGRAM CO-CHAIRS

Markus Helfert, Dublin City University, Ireland

Chiara Francalanci, Politecnico di Milano, Italy

PROCEEDINGS PRODUCTION

Marina Carvalho, INSTICC, Portugal

Helder Coelhas, INSTICC, Portugal

Bruno Encarnação, INSTICC, Portugal

Ana Guerreiro, INSTICC, Portugal

Andreia Moita, INSTICC, Portugal

Raquel Pedrosa, INSTICC, Portugal

Vitor Pedrosa, INSTICC, Portugal

Cátia Pires, INSTICC, Portugal

Sara Santiago, INSTICC, Portugal

José Varela, INSTICC, Portugal

CD-ROM PRODUCTION

Pedro Varela, INSTICC, Portugal

GRAPHICS PRODUCTION AND WEBDESIGNER

André Lista, INSTICC, Portugal

Mara Silva, INSTICC, Portugal

SECRETARIAT

Marina Carvalho, INSTICC, Portugal

WEBMASTER

Susana Ribeiro, INSTICC, Portugal


PROGRAM COMMITTEE

Foto Afrati, National Technical University of Athens, Greece

Hamideh Afsarmanesh, University of Amsterdam, The Netherlands

Markus Aleksy, ABB Corporate Research Center, Germany

Kenneth Anderson, University of Colorado, U.S.A.

Farhad Arbab, CWI, The Netherlands

Mortaza S. Bargh, Rotterdam University of Applied Sciences, The Netherlands

Rudolf Bayer, Technische Universität München, Germany

Fevzi Belli, University of Paderborn, Germany

Orlando Belo, University of Minho, Portugal

Jorge Bernardino, Institute Polytechnic of Coimbra - ISEC, Portugal

Marko Boškovic, Research Studios Austria Forschungsgesellschaft mbH, Austria

Omar Boussaid, Eric Laboratory, University of Lyon 2, France

Manfred Broy, Technische Universität München, Germany

Dumitru Burdescu, University of Craiova, Romania

Rui Cai, Microsoft Research, Asia, China

Cinzia Cappiello, Politecnico di Milano, Italy

Krzysztof Cetnarowicz, AGH - University of Science and Technology, Poland

Kung Chen, National Chengchi University, Taiwan

Chia-Chu Chiang, University of Arkansas at Little Rock, U.S.A.

Christine Collet, Grenoble Institute of Technology, France

Stefan Conrad, Heinrich-Heine University Duesseldorf, Germany

Kendra Cooper, The University of Texas at Dallas, U.S.A.

Theodore Dalamagas, Athena Research Center, Greece

Jeffrey Dalton, University of Massachusetts, U.S.A.

Stefan Dessloch, Kaiserslautern University of Technology, Germany

Zhiming Ding, Chinese Academy of Science, China

Peter Dolog, Aalborg University, Denmark

Habiba Drias, USTHB, LRIA, Algeria

Anton Dries, KU Leuven, Belgium

Artur Dubrawski, The Robotics Institute Carnegie Mellon University, U.S.A.

Juan C. Dueñas, Universidad Politécnica de Madrid, Spain

Mohamed Y. Eltabakh, Worcester Polytechnic Institute, U.S.A.

Fikret Ercal, Missouri University of Science & Technology, U.S.A.

Barry Floyd, California Polytechnic State University, U.S.A.

Chiara Francalanci, Politecnico di Milano, Italy

Rita Francese, Università degli Studi di Salerno, Italy

Helena Galhardas, Technical University of Lisbon, Portugal

Faiez Gargouri, Miracl Laboratory, Tunisia

Paola Giannini, Universita’ del Piemonte Orientale, Italy

J. Paul Gibson, TSP - Telecom SudParis, France

Matteo Golfarelli, University of Bologna, Italy

Cesar Gonzalez-Perez, Institute of Heritage Sciences (Incipit), Spanish National Research Council (CSIC), Spain

Mohand-Said Hacid, Université Claude Bernard Lyon 1, France

Moustafa Hammad, Google Inc., U.S.A.

Slimane Hammoudi, ESEO, MODESTE, France


Markus Helfert, Dublin City University, Ireland

Jose Luis Arciniegas Herrera, Universidad del Cauca, Colombia

Melanie Herschel, Université Paris Sud / INRIA Saclay, France

Jang-eui Hong, Chungbuk National University, Korea, Republic of

Stratos Idreos, CWI, The Netherlands

Ivan Ivanov, SUNY Empire State College, U.S.A.

Sanpawat Kantabutra, Chiang Mai University, Thailand

Dimitris Karagiannis, University of Vienna, Austria

Panagiotis Karras, Rutgers University, U.S.A.

Maurice van Keulen, University of Twente, The Netherlands

Foutse Khomh, École Polytechnique, Canada

Roger (Buzz) King, University of Colorado, U.S.A.

Jeffrey W. Koch, Tarrant County College Northeast Campus, U.S.A.

Mieczyslaw Kokar, Northeastern University, U.S.A.

Konstantin Läufer, Loyola University Chicago, U.S.A.

Wolfgang Lehner, Dresden University of Technology, Germany

Domenico Lembo, Sapienza Università di Roma, Italy

Raimondas Lencevicius, Nuance Communications, U.S.A.

Ming Li, Nanjing University, China

Ziyu Lin, Xiamen University, China

Hua Liu, Xerox Research Center at Webster, U.S.A.

Ricardo J. Machado, Universidade do Minho, Portugal

Leszek Maciaszek, Wroclaw University of Economics, Poland and Macquarie University, Sydney, Australia

Zaki Malik, Wayne State University, U.S.A.

Tiziana Margaria, University of Potsdam, Germany

Brahim Medjahed, University of Michigan, Dearborn, U.S.A.

Marian Cristian Mihaescu, University of Craiova, Romania

Dimitris Mitrakos, Aristotle University of Thessaloniki, Greece

Valérie Monfort, SOIE Tunis, Tunisia

Mirella M. Moro, Federal University of Minas Gerais (UFMG), Brazil

Paolo Nesi, University of Florence, Italy

Erich Neuhold, Universität Wien, Austria

Paulo Novais, Universidade do Minho, Portugal

Rory O’Connor, Dublin City University, Ireland

Pasi Ojala, University of Oulu, Finland

Thanasis Papaioannou, EPFL, Switzerland

George Papastefanatos, RC "Athena", Greece

José R. Paramá, Universidade da Coruña, Spain

Andreas Polze, Hasso-Plattner-Institute for Software Engineering at University Potsdam, Germany

Christoph Quix, RWTH Aachen University, Germany

Sudha Ram, University of Arizona, U.S.A.

Alexander Rasin, DePaul University, U.S.A.

Matthias Renz, Ludwig-Maximilians-University Munich, Germany

Werner Retschitzegger, Johannes Kepler University, Austria

Claudio de la Riva, University of Oviedo, Spain

Gustavo Rossi, Lifia, Argentina

Gunter Saake, Institute of Technical and Business Information Systems, Germany

Krzysztof Sacha, Warsaw University of Technology, Poland


Manuel Filipe Santos, University of Minho, Portugal

M. Saravanan, Ericsson India Global Services Pvt. Ltd, India

Heiko Schuldt, University of Basel, Switzerland

Jean-Marc Seigneur, University of Geneva, Switzerland

Damian Serrano, University of Grenoble - LIG, France

Jie Shao, National University of Singapore, Singapore

Alkis Simitsis, HP Labs, U.S.A.

Harvey Siy, University of Nebraska at Omaha, U.S.A.

Yeong-tae Song, Towson University, U.S.A.

Cosmin Stoica Spahiu, University of Craiova - Faculty of Automation, Computers and Electronics, Romania

Claus Stadler, University of Leipzig, Germany

Peter Stanchev, Kettering University, U.S.A.

Jörg Unbehauen, University of Leipzig, Germany

Corrado Aaron Visaggio, RCOST - University of Sannio, Italy

Gianluigi Viscusi, Università di Milano-Bicocca, Italy

Fan Wang, Microsoft, U.S.A.

Martijn Warnier, Delft University of Technology, The Netherlands

Dietmar Wikarski, FH Brandenburg University of Applied Sciences, Germany

Leandro Krug Wives, Universidade Federal do Rio Grande do Sul, Brazil

Alexander Woehrer, Vienna Science and Technology Fund, Austria

Yun Xiong, Fudan University, China

Amrapali Zaveri, Universität Leipzig, Germany

Xiaokun Zhang, Athabasca University, Canada

Jiakui Zhao, State Grid Electric Power Research Institute, China

Wenchao Zhou, Georgetown University, U.S.A.

Xiangmin Zhou, CSIRO, Australia

Hong Zhu, Oxford Brookes University, U.K.

Yangyong Zhu, Fudan University, China

AUXILIARY REVIEWERS

Estrela Cruz, Instituto Politécnico Viana do Castelo, Portugal

Jonas Eckhardt, Technische Universität München, Germany

Theodora Galani, National Technical University of Athens, Greece

Long Guo, National University of Singapore, Singapore

Hadia Mosteghanemi, University of Science and Technology Houari Boumédiene - USTHB, Algeria

Deolinda Rasteiro, Coimbra Institute of Engineering, Portugal

Juliana Teixeira, Minho University, Portugal


FOREWORD

This volume contains the proceedings of the 2nd International Conference on Data Technologies and Applications - DATA 2013. The conference is co-organized by the Reykjavik University (RU) and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC).

The purpose of DATA is to bring together researchers and practitioners interested in databases, data warehousing, data mining, data management, data security and other aspects of information systems and technology involving advanced applications of data.

DATA 2013 received 62 paper submissions from 27 countries. In order to evaluate each submission, a double blind paper review was performed by the Program Committee. In total, 34 papers are published in these proceedings and presented at the conference. Of these, 7 were selected to be published as full papers and 27 were selected as short papers. The full paper acceptance ratio was 11%, and the short paper acceptance ratio was 44%.

DATA’s program includes a panel to discuss aspects of data technologies and applications from both theoretical and practical perspectives, with the participation of distinguished world-class researchers and practitioners; furthermore, the program is enriched by several keynote lectures delivered by renowned experts in their areas of knowledge. These high points in the conference program definitely contribute to reinforce the overall quality of the DATA conference.

The program for this conference required the dedicated effort of many people. Firstly, we must thank the authors, whose research efforts are herewith recorded. Next, we thank the members of the Program Committee and the auxiliary reviewers for their diligent and professional reviewing. We would also like to deeply thank the invited speakers for their invaluable contribution and for taking the time to prepare their talks. Special thanks to the leadership, faculty and staff of Reykjavik University (RU) for hosting the conference in its campus and providing RU facilities and resources. Finally, a word of appreciation for the hard work of the INSTICC team; organizing a conference of this level is a task that can only be achieved by the collaborative effort of a dedicated and highly capable team.

A successful conference involves more than paper presentations; it is also a meeting place, where ideas about new research projects and other ventures are discussed and debated. Therefore, a social event & banquet has been arranged for the evening of July 30 (Tuesday) in order to promote this kind of social networking.

We wish you all an exciting conference and an unforgettable stay in the city of Reykjavík. We hope to meet you again next year at DATA 2014 in Vienna, Austria.

Markus Helfert
Dublin City University, Ireland

Chiara Francalanci
Politecnico di Milano, Italy

Joaquim Filipe
Polytechnic Institute of Setúbal / INSTICC, Portugal


CONTENTS

INVITED SPEAKERS

KEYNOTE SPEAKERS

Main-Memory Centric Data Management – Open Problems and Some Solutions

Wolfgang Lehner

Context-Aware Decision Support in Dynamic Environments - Theoretical & Technological Foundations

Alexander Smirnov

BUSINESS ANALYTICS

FULL PAPERS

Parameterised Fuzzy Petri Nets for Knowledge Representation and Reasoning

Zbigniew Suraj

Subspace Clustering with Distance-density Function and Entropy in High-dimensional Data

Jiwu Zhao and Stefan Conrad

Enhancing Collaboration in Big Biomedical Data Settings - Knowledge Visualization, Data Mining and Decision Making Issues

Nikos Karacapilidis, Georgia Tsiliki and Manolis Tzagarakis

Rating of Discrimination Networks for Rule-based Systems

Fabian Ohler, Kai Schwarz, Karl-Heinz Krempels and Christoph Terwelp

SHORT PAPERS

A MapReduce Architecture for Web Site User Behaviour Monitoring in Real Time

Bill Karakostas and Babis Theodoulidis

Enhancing News Articles Clustering using Word N-Grams

Christos Bouras and Vassilis Tsogkas

Advanced Analytics with the SAP HANA Database

Philipp Große, Wolfgang Lehner and Norman May

Predicting Cases of Ambulatory Care Sensitive Conditions

W. Haque and D. C. Finke

Consistency of Incomplete Data

Patrick G. Clark and Jerzy Grzymala-Busse

Database Functionalities for Evolving Monitoring Applications

Philip Schmiegelt, Jingquan Xie, Gereon Schüller and Andreas Behrend

Effective Business Plan Evaluation using an Evolutionary Ensemble

G. Dounias, A. Tsakonas, D. Charalampakis and E. Vasilakis

R-Pref: Rapid Prototyping of Database Preference Queries in R

Patrick Roocks and Werner Kießling

Estimate the Market Share from the Search Engine Hit Counts

Robert Viseur

A New Addressing Scheme for Discrimination Networks easing Development and Testing

Karl-Heinz Krempels, Fabian Ohler and Christoph Terwelp

DATA MANAGEMENT AND QUALITY

FULL PAPERS

A Generic and Flexible Framework for Selecting Correspondences in Matching and Alignment Problems

Fabien Duchateau

Automatic Synthesis of Data Cleansing Activities

Mario Mezzanzanica, Roberto Boselli, Mirko Cesarini and Fabio Mercorio

SHORT PAPERS

A Clustering Topology for Wireless Sensor Networks - New Semantics over Network Topology

Paul Cotofrei, Ionel Tudor Calistru and Kilian Stoffel

Data Management for M2M Communication using Telecom Mediation Systems

Sandeep Akhouri and Kirti Girdhar

Data Quality Evaluation of Scientific Datasets - A Case Study in a Policy Support Context

Antonella Zanzi and Alberto Trombetta

Social Data Sentiment Analysis in Smart Environments - Extending Dual Polarities for Crowd Pulse Capturing

Athena Vakali, Despoina Chatzakou, Vassiliki Koutsonikola and Georgios Andreadis

An Elastic Cache Infrastructure through Multi-level Load-balancing

Carlos Lübbe and Bernhard Mitschang

Approaching ETL Conceptual Modelling and Validation using BPMN and BPEL

Bruno Oliveira and Orlando Belo

Towards a Second Generation of Computer Interpretable Guidelines

Paolo Terenziani, Alessio Bottrighi, Laura Giordano, Giuliana Franceschinis, Stefania Montani, Luigi Portinale and Daniele Theseider Dupre

Citable by Design - A Model for Making Data in Dynamic Environments Citable

Stefan Pröll and Andreas Rauber

Data Curation Framework for Facilities Science

Vasily Bunakov and Brian Matthews

ONTOLOGIES AND THE SEMANTIC WEB

SHORT PAPERS

EDEX: Entity Preserving Data Exchange

Yoones A. Sekhavat and Jeffrey Parsons

Semantic Copyright Management of Media Fragments

Roberto García, David Castellà and Rosa Gil

Designing a Farmer Centred Ontology for Social Life Network

Anusha Indika Walisadeera, Gihan Nilendra Wikramanayake and Athula Ginige

Extraction of Biographical Data from Wikipedia

Robert Viseur

DATABASES AND DATA SECURITY

FULL PAPER

Automata Theory based Approach to the Join Ordering Problem in Relational Database Systems

Miguel Rodríguez, Daladier Jabba, Elias Niño, Carlos Ardila and Yi-Cheng Tu

SHORT PAPERS

Frame Time and Cardinality Indeterminacy in Temporal Relational Databases

Paolo Terenziani

Design and Evaluation of a Graph Codec System for Software Watermarking

Maria Chroni and Stavros D. Nikolopoulos

SylvaDB: A Polyglot and Multi-backend Graph Database Management System

Javier de la Rosa, Juan Luis Suárez and Fernando Sancho Caparrini

Highly Scalable Sort-merge Join Algorithm for RDF Querying

Zbynek Falt, Miroslav Cermák and Filip Zavoral

AUTHOR INDEX

Highly Scalable Sort-merge Join Algorithm for RDF Querying

Zbynek Falt, Miroslav Cermak and Filip Zavoral
Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

{falt, cermak, zavoral}@ksi.mff.cuni.cz

Keywords: Merge Join, Parallel, Bobox, RDF.

Abstract: In this paper, we introduce a highly scalable sort-merge join algorithm for RDF databases. The algorithm is designed especially for streaming systems; besides task and data parallelism, it also tries to exploit the pipeline parallelism in order to increase its scalability. Additionally, we focused on handling skewed data correctly and efficiently; the algorithm scales well regardless of the data distribution.

1 INTRODUCTION

Join is one of the most important database operations. The overall performance of data evaluation engines depends highly on the performance of particular join operations. Since multiprocessor systems are widely available, there is a need for the parallelization of database operations, especially joins.

In our previous work, we focused on the parallelization of SPARQL operations such as filter, nested-loops join, etc. (Cermak et al., 2011; Falt et al., 2012a). In this paper, we complete the portfolio of parallelized SPARQL operations by proposing an efficient algorithm for merge and sort-merge join.

The main target of our research is the area of streaming systems, since they seem to be appropriate for a certain class of data intensive problems (Bednarek et al., 2012b). Streaming systems naturally introduce task, data and pipeline parallelism (Gordon et al., 2006). Therefore, an efficient and scalable algorithm for these systems should take these properties into account.

Our contribution is the introduction of a highly scalable merge and sort-merge join algorithm. The algorithm also deals well with skewed data, which may cause load imbalances during parallel execution (DeWitt et al., 1992). We used the SP2Bench (Schmidt et al., 2008) data generator and benchmark to show the behaviour of our algorithm in multiple test scenarios and to compare our RDF engine, which uses this algorithm, to other modern RDF engines such as Jena (Jena, 2013), Virtuoso (Virtuoso, 2013) and Sesame (Broekstra et al., 2002).

The rest of the paper is organized as follows. Section 2 examines relevant related work on merge joins, and Section 3 briefly describes the Bobox framework that is used for a pilot implementation and evaluation of the algorithm. Most important is Section 4, which contains a detailed description of the sort-merge join algorithm. Performance evaluation is described in Section 5, and Section 6 concludes the paper.

2 RELATED WORK

Parallel algorithms greatly improve the performance of the relational join in shared-nothing systems (Liu and Rundensteiner, 2005; Schneider and DeWitt, 1989) or shared-memory systems (Cieslewicz et al., 2006; Lu et al., 1990).

Liu et al. (Liu and Rundensteiner, 2005) investigated pipelined parallelism for multi-join queries. In comparison, we focus on exploiting the parallelism within a single join operation. Schneider et al. (Schneider and DeWitt, 1989) evaluated one sort-merge and three hash-based join algorithms in a shared-nothing system. In the presence of data skew, techniques such as bucket tuning (Schneider and DeWitt, 1989) and partition tuning (Hua and Lee, 1991) are used to balance the load among processor nodes.

A family of non-blocking algorithms, e.g. (Ming et al., 2004; Dittrich and Seeger, 2002), was introduced to deal with pipeline processing where the blocking behaviour of network traffic makes the traditional join operators unsuitable (Schneider and DeWitt, 1989). The progressive-merge join (PMJ) algorithm (Dittrich and Seeger, 2002; Dittrich et al., 2003) is a non-blocking version of the traditional sort-merge join. For our parallel stream execution, we adopted the idea of producing join results as soon as the first sorted data are available, even when sorting is not yet finished.

(Albutiu et al., 2012) introduced a suite of new massively parallel sort-merge (MPSM) join algorithms based on partial partition-based sorting, which avoids a hard-to-parallelize final merge step that would create one complete sort order. MPSM is also NUMA-affine (see footnote 1), as all sorting is carried out on local memory partitions, and it scales almost linearly with the number of cores used.

One of the specific areas of parallel join computation is semantic frameworks using the SPARQL language. In (Groppe and Groppe, 2011) the authors proposed parallel algorithms for join computations of SPARQL queries, with the main focus on partitioning of the input data.

Although all the above-mentioned papers deal with merge join parallelization, none of them focuses on streaming systems while exploiting data, task and pipeline parallelism and handling data skew at once.

3 BOBOX

Bobox is a parallelization framework which simplifies writing parallel, data intensive programs and serves as a testbed for the development of generic and especially data-oriented parallel algorithms (Falt et al., 2012c; Bednarek et al., 2012a).

It provides a run-time environment which is used to execute a non-linear pipeline (we denote it as the execution plan) in parallel. The execution plan consists of computational units (we denote them as the boxes) which are connected together by directed edges. The task of each box is to receive data from its incoming edges (i.e. from its inputs) and to send the resulting data to its outgoing edges (i.e. to its outputs). The user provides the execution units and the execution plan (i.e. the implementation of boxes and their mutual connections) and passes it to the framework, which is responsible for the evaluation of the plan.

The only communication between boxes is done by sending envelopes (communication units containing data) along their outgoing edges. Each envelope consists of several columns and each column contains a certain number of data items. The data type of items in one column must be the same in all envelopes transferred along one particular edge; however, different columns in one envelope may have different data types. The data types of these columns are defined by the execution plan. Additionally, all columns in one envelope must have the same length; therefore, we can consider envelopes to be sequences of tuples.
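The envelope layout described above can be pictured with a small sketch. It is written in Python purely for illustration; the `Envelope` class and its `tuples` method are hypothetical names and not part of the Bobox API, and the sketch ignores per-edge column typing and cache-size tuning.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Envelope:
    """Hypothetical column-oriented envelope (not the Bobox API)."""
    columns: List[List[Any]]

    def __post_init__(self) -> None:
        # All columns in one envelope must have the same length.
        if len({len(col) for col in self.columns}) > 1:
            raise ValueError("all columns must have the same length")

    def __len__(self) -> int:
        # Number of tuples carried by the envelope.
        return len(self.columns[0]) if self.columns else 0

    def tuples(self) -> List[Tuple[Any, ...]]:
        # Reading the columns side by side gives the sequence of tuples.
        return list(zip(*self.columns))


env = Envelope(columns=[[1, 2, 3], ["a", "b", "c"]])
print(len(env), env.tuples())
```

Storing data by column while exposing it as tuples mirrors the statement that equal-length columns let an envelope be treated as a sequence of tuples.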

Footnote 1: NUMA stands for Non-Uniform Memory Access.

The total number of tuples in an envelope is chosen according to the size of cache memories in the system. Therefore, the communication may take place completely in cache memory. This increases the efficiency of processing of incoming envelopes by a box.

In addition to data envelopes, Bobox distinguishes so-called poisoned envelopes. These envelopes do not contain any data; they just indicate the end of a stream.

Currently, only shared-memory architectures are supported; therefore, only shared pointers to the envelopes are transferred. This significantly speeds up operations such as the broadcast box (i.e., the box which resends its input to its outputs), since they do not have to access data stored in envelopes.

Although the body of boxes must be strictly single-threaded, Bobox introduces three types of parallelism:

1. Task parallelism - independent streams are processed in parallel.

2. Pipeline parallelism - the producer of a stream runs in parallel with its consumer.

3. Data parallelism - independent parts of one stream are processed in parallel.

The first two types of parallelism are exploited implicitly during the evaluation of a plan. Therefore, even an application which does not contain any explicit parallelism may benefit from multiple processors in the system. Data parallelism must be explicitly stated in the execution plan by the user; however, it is still much easier to modify the execution plan than to write the parallel code by hand.

Due to the Bobox properties, and especially its suitability for pipelined stream data processing, we used the Bobox platform for a pilot implementation of the SPARQL processing engine.

4 ALGORITHMS

Contemporary merge join algorithms mentioned in Section 2 do not fit well into the streaming model of computation (Gordon et al., 2006). Therefore, we developed an algorithm which takes into account task, data and pipeline parallelism. The main idea of the algorithm is splitting the input streams into many smaller parts which can be processed in parallel.

The sort-merge join consists of two independent phases: a sorting phase that sorts the input stream by join attributes, and a joining phase. We have utilized the highly scalable implementation of a stream sorting algorithm (Falt et al., 2012b); it is briefly described in Section 4.2.


4.1 Merge Join

Merge join in general has two inputs, left and right. It assumes that both inputs are sorted by the join attribute in ascending order. It reads its inputs, finds sequences of the same values of join attributes in the left and right input, and then performs the cross product of these sequences. The pseudocode of the standard implementation of merge join is as follows:

    while left.hasnext ∧ right.hasnext do
        left_tuple ← left.current
        right_tuple ← right.current
        if left_tuple = right_tuple then
            left_seq ← empty
            right_seq ← empty
            append left_tuple to left_seq
            left.movenext()
            while left.hasnext ∧ left.current = left_tuple do
                append left.current to left_seq
                left.movenext()
            end while
            append right_tuple to right_seq
            right.movenext()
            while right.hasnext ∧ right.current = right_tuple do
                append right.current to right_seq
                right.movenext()
            end while
            output cross product(left_seq, right_seq)
        else if left_tuple < right_tuple then
            left.movenext()
        else
            right.movenext()
        end if
    end while
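The pseudocode above can be made concrete with a minimal Python sketch. It assumes that both inputs are in-memory lists of tuples already sorted by the join attribute; the function name `merge_join` and the `key` parameter are illustrative choices, not names from the paper.

```python
def merge_join(left, right, key=lambda t: t[0]):
    """Join two lists of tuples that are sorted ascending by key(t)."""
    out = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk == rk:
            # Find the run of equal join values on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == rk:
                j_end += 1
            # Output the cross product of the two runs.
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append((l, r))
            i, j = i_end, j_end
        elif lk < rk:
            i += 1
        else:
            j += 1
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
```

Index ranges replace the explicit `left_seq` and `right_seq` buffers here, which is equivalent for in-memory lists but would not carry over directly to a stream.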

If we take any value V of the join attribute, then all tuples less than V from both inputs can be processed independently of the tuples which are greater than or equal to V. A common approach to merge join parallelization is to split the inputs into multiple parts by P − 1 values V_i and to process the parts in parallel in P worker threads (Groppe and Groppe, 2011).
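As a rough illustration of splitting by values, the sketch below partitions a sorted list of join keys by a given list of split values. The helper name `partition_by_values` is hypothetical, and the sketch assumes the whole input is available in memory, which is exactly what a stream does not guarantee.

```python
from bisect import bisect_left


def partition_by_values(keys, split_values):
    """Split sorted keys into len(split_values) + 1 parts.

    Part k holds the keys v with split_values[k-1] <= v < split_values[k].
    """
    bounds = [0] + [bisect_left(keys, v) for v in split_values] + [len(keys)]
    return [keys[bounds[k]:bounds[k + 1]] for k in range(len(bounds) - 1)]


# Evenly spread keys give parts of similar size.
print(partition_by_values([1, 2, 2, 3, 5, 8, 9], [3, 8]))
# Skewed keys leave some workers idle and overload others.
print(partition_by_values([3, 3, 3, 3, 3, 3, 9], [3, 8]))
```

With the same split values, the second call puts six of the seven keys into one part, so the quality of the partitioning depends entirely on how well the values match the data.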

However, there are two problems with the selection of appropriate values V_i:

1. The inputs of the join are data streams; therefore, we do not know how many input tuples there are until we receive all of them. For the same reason, we do not know the distribution of the input data in advance. Therefore, we cannot easily select V_i so that the resulting parts have approximately the same size.

2. The distribution of data could be very non-uniform (Li et al., 2002); therefore, it might be impossible to utilize worker threads uniformly.

For the sake of simplicity, we first describe a simplified algorithm for joining inputs without duplicated join attribute values in Section 4.1.1. Then we extend the algorithm to take duplicities into account in Section 4.1.2.

4.1.1 Parallel Merge Join without Duplicities

In this section, we describe the algorithm which assumes that the input streams do not contain duplicated join attributes. The execution plan of this algorithm is depicted in Figure 1.

The algorithm makes use of the fact that the streams are represented as a flow of envelopes. The task of the preprocess box is to transform the flow of input envelopes into a flow of pairs of envelopes. The tuples in these pairs can be joined independently (i.e., in parallel). Dispatch boxes dispatch these pairs among join boxes which perform the operation. When a join box receives a pair of envelopes, it joins them and creates the substream of their results. Therefore, the outputs of join boxes are sequences of such substreams, which subsequently should be consolidated in a round robin manner by the consolidate box.
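The round robin dispatch and consolidation can be sketched in a few lines of Python. This is a sequential simulation under stated assumptions: envelopes are plain lists of duplicate-free keys, `join_pair` is a simple stand-in for the per-pair merge join, and all three function names are hypothetical.

```python
def join_pair(left_env, right_env):
    """Stand-in join of two duplicate-free envelopes (plain lists of keys)."""
    right_keys = set(right_env)
    return [k for k in left_env if k in right_keys]


def dispatch_round_robin(pairs, p):
    """Give pair number j to join worker j mod p."""
    queues = [[] for _ in range(p)]
    for j, pair in enumerate(pairs):
        queues[j % p].append(pair)
    return queues


def consolidate_round_robin(worker_outputs):
    """Read one substream from each worker in turn to restore pair order."""
    p = len(worker_outputs)
    total = sum(len(w) for w in worker_outputs)
    out = []
    for j in range(total):
        out.extend(worker_outputs[j % p][j // p])
    return out


pairs = [([1, 3], [2, 3, 4]), ([5], [5]), ([7, 9], [6, 7])]
queues = dispatch_round_robin(pairs, 2)
# In a real plan the workers would run in parallel; here they run in turn.
outputs = [[join_pair(l, r) for l, r in q] for q in queues]
print(consolidate_round_robin(outputs))
```

Because both the dispatch and the consolidation follow the same j mod p pattern, the consolidated output keeps the order of the original pairs without any sorting.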

Now, we describe the idea and the algorithm of the preprocess box. Consider the first envelope left_env from the left input and the first envelope right_env from the right input. Denote the last tuple (the highest value) in left_env as last_left and the last tuple in right_env as last_right.

Now, one of these three cases occurs:

1. last_left is greater than last_right. In this case, we can split left_env into two parts. The first part contains tuples which are less than or equal to last_right and the second part contains the rest. Now, the first part of left_env can be joined with right_env.

2. last_left is less than last_right. In this case, we can perform the analogous operation as in the former case.

3. last_left is equal to last_right. This means that the whole left_env and the whole right_env might be joined together.

The pseudocode of preprocess box is as follows:

    left_env ← next envelope from left input
    right_env ← next envelope from right input
    while left_env ≠ NIL ∧ right_env ≠ NIL do
        last_left ← left_env[left_env.size − 1]
        last_right ← right_env[right_env.size − 1]
        if last_left > last_right then
            split left_env to left_first and left_second
            send right_env to the right output
            send left_first to the left output
            left_env ← left_second
            right_env ← next envelope from right input
        else if last_left < last_right then
            split right_env to right_first and right_second
            send right_first to the right output
            send left_env to the left output
            left_env ← next envelope from left input
            right_env ← right_second
        else
            send right_env to the right output
            send left_env to the left output
            left_env ← next envelope from left input
            right_env ← next envelope from right input
        end if
    end while
    close the right output
    close the left output

Figure 1: Execution plan of parallel merge join. (Boxes shown: preprocess with inputs Left and Right, two dispatch boxes, join0 to join3, and consolidate.)
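A minimal Python sketch of the preprocess step follows. It assumes that envelopes are non-empty sorted lists of duplicate-free join keys and returns the pairs as a list instead of sending them to two outputs; the function name `preprocess` and the use of `bisect_right` for the split are illustrative assumptions.

```python
from bisect import bisect_right


def preprocess(left_envs, right_envs):
    """Turn two flows of sorted envelopes into independently joinable pairs."""
    left_it, right_it = iter(left_envs), iter(right_envs)
    left_env = next(left_it, None)
    right_env = next(right_it, None)
    pairs = []
    while left_env is not None and right_env is not None:
        last_left, last_right = left_env[-1], right_env[-1]
        if last_left > last_right:
            # Split left_env at the first key greater than last_right.
            cut = bisect_right(left_env, last_right)
            pairs.append((left_env[:cut], right_env))
            left_env = left_env[cut:]
            right_env = next(right_it, None)
        elif last_left < last_right:
            # Analogous case: split right_env at last_left.
            cut = bisect_right(right_env, last_left)
            pairs.append((left_env, right_env[:cut]))
            right_env = right_env[cut:]
            left_env = next(left_it, None)
        else:
            pairs.append((left_env, right_env))
            left_env = next(left_it, None)
            right_env = next(right_it, None)
    return pairs


print(preprocess([[1, 3, 5], [7, 9]], [[2, 3, 4], [5, 6, 7, 10]]))
```

Every key lands in exactly one pair, so joining the pairs separately yields the same matches as joining the full inputs.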

The boxes preprocess, dispatch and consolidate might seem to be bottlenecks of the algorithm. Dispatch and consolidate do not access data in envelopes; they just forward them from the input to the output. Since an envelope typically contains hundreds or thousands of tuples, these two boxes work several orders of magnitude faster than the join box.

On the other hand, the preprocess box accesses data in the envelope since it has to find the position where to split the envelope. This can be done by a binary search which has time complexity O(log(L)), where L is the number of tuples in the envelope. However, it does not access all tuples in the envelope; therefore, it is still much faster than the join box.

4.1.2 Join with Duplicities

Without duplicities, the preprocess box is able to generate pairs of envelopes which can be processed independently. However, the possible existence of duplicities complicates the algorithm. Consider the situation depicted in Figure 2.

Figure 2: Duplicities of join attributes. (The figure shows the Left and Right inputs divided into a 1st pair and a 2nd pair of envelopes.)

If join box receives the pair number 2, it needs toprocess also the pair number 1. The reason is, thatit has to perform cross products of parts which aredenoted in the figure.

Therefore,join boxes have to receive all pairs ofenvelopes for the case when there are sequences of thesame tuples across multiple envelopes. This compli-

cates the algorithm ofjoin box, since each join has tokeep track of such sequences. When we processed anenvelope (from either input), there is a possibility thatits last tuple is a part of such sequence. Therefore, wehave to keep already processed envelopes for the casethey will be needed in the future. When the last tupleof the envelope changes, the new sequence begins andwe can drop all stored envelopes except the last one.

The execution plan for the algorithm is the sameas in the previous case, the only difference is thatdis-patch box does not forward its input envelopes in around robin manner but it broadcasts them to all itsoutputs. Since a box receives and sends only sharedpointers to the envelopes, the overhead of the broad-cast operation is negligible in comparison to the joinoperation and therefore it does not limit the scalabil-ity.

Because of this modification, all join boxes receive the same envelopes. Therefore, the algorithm must distinguish among them so that together they generate the output in the same manner as in Section 4.1.1. Each join box gets its own unique index P_i, 0 ≤ P_i < P. If we number the pairs of envelopes sequentially by non-negative integers j, then the join box with index P_i processes those pairs j for which j mod P = P_i. This concept of parallelization is described in more detail in (Falt et al., 2012a).
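The assignment rule j mod P = P_i can be checked with a short sketch. The helper name `pairs_for_box` and the choice of three boxes and eight pairs are illustrative assumptions.

```python
def pairs_for_box(box_index, num_boxes, num_pairs):
    """Return the pair numbers j with j mod num_boxes == box_index."""
    return [j for j in range(num_pairs) if j % num_boxes == box_index]

# Three join boxes sharing eight envelope pairs.
assignment = {p: pairs_for_box(p, 3, 8) for p in range(3)}
```

Every pair number appears in exactly one list, so the boxes together produce each part of the output once, even though all of them see all envelopes.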

The complete pseudocode of the join box is as follows:

left_env ← next envelope from left input
right_env ← next envelope from right input
j ← 0
left_seq ← empty
right_seq ← empty
while left_env ≠ NIL ∧ right_env ≠ NIL do
    if j mod P = P_i then
        do the join of left_env and right_env
    end if
    j ← j + 1
    if left_env.size > 0 then
        last_left ← left_env[left_env.size − 1]
        if last_left ≠ left_seq then
            left_seq ← last_left
            drop all left envelopes except left_env
        else
            store left_env
        end if
    end if
    if right_env.size > 0 then
        last_right ← right_env[right_env.size − 1]
        if last_right ≠ right_seq then
            right_seq ← last_right
            drop all right envelopes except right_env
        else
            store right_env
        end if
    end if
    left_env ← next envelope from left input
    right_env ← next envelope from right input
end while
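The bookkeeping of stored envelopes in the pseudocode above can be traced with a minimal sketch. It assumes that envelopes are plain lists of join keys and that the actual join is replaced by recording the pair number; the names `join_box` and `_track` are hypothetical.

```python
def _track(env, seq, kept):
    """Update the current sequence key and the list of kept envelopes."""
    if len(env) > 0:
        last = env[-1]
        if last != seq:
            # A new sequence begins: keep only the current envelope.
            return last, [env]
        # The same sequence continues: store this envelope as well.
        kept.append(env)
    return seq, kept

def join_box(left_envs, right_envs, num_boxes, box_index):
    """Trace which pairs one join box processes and what it keeps stored."""
    left_it, right_it = iter(left_envs), iter(right_envs)
    left_env, right_env = next(left_it, None), next(right_it, None)
    j = 0
    left_seq = right_seq = None
    left_kept, right_kept = [], []
    processed = []
    while left_env is not None and right_env is not None:
        if j % num_boxes == box_index:
            processed.append(j)  # placeholder for the actual join
        j += 1
        left_seq, left_kept = _track(left_env, left_seq, left_kept)
        right_seq, right_kept = _track(right_env, right_seq, right_kept)
        left_env, right_env = next(left_it, None), next(right_it, None)
    return processed, left_kept, right_kept

trace = join_box([[1, 2], [2, 2], [2, 3]],
                 [[2, 2], [2, 4], [5, 6]],
                 num_boxes=2, box_index=1)
```

In this run the box with index 1 processes only pair 1, while the stored envelopes are updated for every pair, which is what allows a later join to reach back into earlier envelopes of the same sequence.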

The performance evaluation in Section 5.1.3 shows that this concept of parallelization allows better scalability than other contemporary solutions.

4.2 Sort

If one or both input streams need to be sorted, we use an approach based on the algorithm described in (Falt et al., 2012b). Basically, the sorting of the stream is divided into three phases:

1. Splitting the input stream into several equal-sized substreams,

2. Sorting of the substreams in parallel,

3. Merging of the sorted substreams in parallel.
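The three phases above can be sketched as follows. This is a simplified illustration, not the algorithm of (Falt et al., 2012b): the input is a finite list rather than a stream, the merge in the third phase is a sequential k-way merge instead of a parallel one, and the function name `parallel_sort` is an assumption. In CPython the thread pool shows the structure only; it does not give a real speed-up for CPU-bound sorting.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(items, num_substreams=4):
    """Split, sort the parts concurrently, then merge the sorted parts."""
    # Phase 1: split the input into roughly equal-sized substreams.
    size = max(1, -(-len(items) // num_substreams))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    # Phase 2: sort the substreams in parallel.
    with ThreadPoolExecutor(max_workers=num_substreams) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Phase 3: merge the sorted substreams (sequentially in this sketch).
    return list(heapq.merge(*sorted_chunks))

ordered = parallel_sort([5, 3, 8, 1, 9, 2, 7])
```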

The algorithm scales very well; moreover, it starts to produce its output very shortly after the reception of the last tuple. Therefore, the subsequent merge join can start working as soon as possible, which enables pipeline processing and increases scalability.

However, memory becomes a non-negligible bottleneck when sorting tuples instead of scalars, since a tuple typically contains multiple items. Thus, the sorting of tuples needs more memory accesses, especially when sorting in parallel.

Therefore, we replaced the merge algorithm (used in the second and the third phase) by the merge algorithm used in Funnelsort (Frigo et al., 1999). We used the implementation available at (Vinther, 2006). This algorithm utilizes cache memories as much as possible in order to decrease the number of accesses to the main memory. According to our experiments, this algorithm speeds up the merging phase by 20–30%.

5 EVALUATION

Since one of the main goals is efficient evaluation of SPARQL (Prud’hommeaux and Seaborne, 2008) queries, we used the standardized SP2Bench benchmark for the performance evaluation of our algorithm in a parallel environment. Moreover, in order to show the skew resistance of our algorithm, we used additional synthetic queries.

All experiments were performed on a server running Redhat 6.0 Linux; the server configuration is 2x Intel Xeon E5310, 1.60 GHz (L1: 32 kB + 32 kB, L2: 4 MB shared) and 8 GB RAM. Each processor has 4 cores; therefore, we used 8 worker threads for the evaluation of queries. The server was dedicated to the testing; no other applications were running during the measurements.

5.1 Scalability of the Algorithm

In this set of experiments we examined the behaviour of the join algorithm in multiple scenarios. We used the 5M dataset of SP2Bench.

We measured the performance of the queries in multiple settings. The setting ST uses just one worker thread, and the execution plan uses operations without any intraoperator parallelization (i.e., joining and sorting were each performed by one box). The setting MT1 also uses one worker thread; however, the execution plan uses operations with intraoperator parallelization (we use 8 worker boxes both for joining and for sorting). The purpose of this setting is to show the overhead caused by the parallelization. The settings MT2, MT4 and MT8 are analogous to MT1; however, they use 2, 4 and 8 worker threads, respectively. These settings show the scalability of the algorithm.

5.1.1 Scalability of the Merge Join

The first experiment shows the scalability of the merge join algorithm when its inputs contain long sequences of tuples with the same join attribute (i.e., the join produces a high number of tuples) and the join condition has very high selectivity (i.e., the number of resulting tuples is relatively low). Since both inputs of the join are sorted by the join attribute, this experiment shows only the scalability of the merge join and does not include any sorting.

For this experiment, we used the following query E1:

SELECT ?article1 ?article2
WHERE {
  ?article1 swrc:journal ?journal .
  ?article2 swrc:journal ?journal
  FILTER (STR(?article1) = STR(?article2))
}

The query generates all pairs of articles which were published in the same journal and then selects the pairs which have the same URI (in fact, it returns all articles in the dataset). The execution plan of the query is depicted in Figure 3. The numbers at the bottom of the boxes denote the numbers of tuples produced by them.



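[Figure 3 shows the plan boxes with the number of tuples each produces:
Select
MergeJoin by ?journal, filter STR(?article1) = STR(?article2): 207818
IndexScan [POS] ?article1 swrc:journal ?journal: 207818
second join input: 207818]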

Figure 3: Query E1 execution plan.

Figure 4: Results for query E1.

The setting MT1 is slightly slower than ST, since the query plan in fact contains more boxes (see Figure 1), which causes higher management overhead. Moreover, the preprocess box does useless work in this setting. However, when the number of worker threads increases, the algorithm scales almost linearly with the number of threads.

5.1.2 Scalability of the Sort-merge Join

The scalability of the sort-merge join is shown in the following experiment. In contrast to the previous experiment, the inputs of the merge joins (the second phase of the sort-merge join) need to be sorted.

For this experiment, we used the following query E2:

SELECT ?article1 ?article2
WHERE {
  ?article1 swrc:journal ?journal .
  ?article2 swrc:journal ?journal .
  ?article1 dc:title ?title1 .
  ?article2 dc:title ?title2
  FILTER(?title1 < ?title2)
}

This plan generates a large number of tuples which have to be sorted before they can finally be joined with the second input. The execution plan is depicted in Figure 5.

We measured the runtime in the same settings as in the previous experiment, and the results are shown in Figure 6.

In this experiment, the difference between the ST and MT1 settings is bigger than in the previous experiment. This is caused by the fact that the parallel sort algorithm has some overhead (see (Falt et al., 2012b) for more information). However, the more worker threads are used, the bigger the speed-up we gain. The scalability is not as linear as in the previous experiment, since the number of memory accesses during sorting is much higher than during merging. Therefore, the memory becomes the bottleneck with a higher number of threads.

[Figure 5 shows the execution plan of query E2, with the number of tuples produced by each box:
Select
MergeJoin on ?article2, filter ?title1 < ?title2: 4913461
Sort by ?article2: 10034740
MergeJoin on ?journal: 10034740
Sort by ?journal: 209387
MergeJoin on ?article1: 207818
IndexScan [PSO] ?article1 swrc:journal ?journal: 207818
IndexScan [PSO] ?article1 dc:title ?title1: 475059
further input: 207818
IndexScan [PSO] ?article2 dc:title ?title2: 475059]

5.1.3 Data-skewness Resistance

To show the resistance of the algorithm to a non-uniform distribution of data, we used the following query E3:

SELECT ?artcl1 ?artcl2 ?artcl3 ?artcl4
WHERE {
  ?artcl1 rdf:type bench:PhDThesis .
  ?artcl1 rdf:type ?type .
  ?artcl2 rdf:type ?type .
  ?artcl3 rdf:type ?type .
  ?artcl4 rdf:type ?type
}

The execution plan for this query is shown in Figure 7. The variable ?type has just one value (bench:PhDThesis); therefore, the joins on that variable cannot be parallelized by partitioning their inputs. Despite this fact, according to the results (Figure 8), our algorithm scales very well, almost linearly.

[Figure 7 shows the execution plan of query E3, with the number of tuples produced by each box:
Select
MergeJoin on ?type: 3208542736
Sort by ?type: 238
MergeJoin on ?artcl1: 238
IndexScan ?artcl1 rdf:type bench:PhDThesis: 238
IndexScan ?artcl1 rdf:type ?type: 911482
three further inputs: 911482 each]
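The effect of a single join key value can be illustrated with a minimal sketch that compares two ways of distributing work among 8 workers: routing tuples by a hash of the join key, and assigning envelope pairs by their sequence number j mod P as in Section 4.1.2. The function names and the counts of 1000 tuples and 1000 pairs are illustrative assumptions, not measurements from the paper.

```python
from collections import Counter

def partition_by_key(keys, num_workers):
    """Count tuples per worker when tuples are routed by hash of the key."""
    return Counter(hash(k) % num_workers for k in keys)

def partition_by_pair_index(num_pairs, num_workers):
    """Count envelope pairs per worker when pair j goes to worker j mod P."""
    return Counter(j % num_workers for j in range(num_pairs))

# All tuples share one join key value, as ?type does in query E3.
by_key = partition_by_key(["bench:PhDThesis"] * 1000, 8)
by_pair = partition_by_pair_index(1000, 8)
```

With key-based routing a single worker receives all 1000 tuples, whereas the pair-index rule gives each of the 8 workers 125 pairs regardless of the key distribution.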

5.2 Comparison to other Engines

The last set of experiments compares the Bobox SPARQL engine, which uses the new sort-merge join algorithm, to other mainstream SPARQL engines, such as Sesame v2.0 (Broekstra et al., 2002), Jena v2.7.4 with TDB v0.9.4 (Jena, 2013) and Virtuoso v6.1.6.3127-multithreaded (Virtuoso, 2013). They follow a client-server architecture, and we report the sum of the times of the client and server processes. The Bobox engine was compiled as a single application. We omitted the time spent loading the dataset in order to be comparable with a server that has the data already prepared.

We evaluated the queries multiple times over the 5M-triple dataset and we provide the average times. Each test run was also limited to 30 minutes (the same timeout as in the original SP2Bench paper). All data were stored in memory, as our primary interest is to compare the basic performance of the approaches rather than caching etc.

Table 1: Results of SP2Bench benchmark.

Query   ST      MT8      Jena     Virtuoso   Sesame
Q1      0.01    0.01     0.01     0.00       0.54
Q2      1.32    0.39     242.80   39.03      16.11
Q3a     0.01    0.01     20.84    7.00       2.09
Q3b     0.00    0.00     1.89     0.04       0.54
Q3c     0.00    0.00     1.31     0.03       0.55
Q4      43.69   6.48     TO       1740.84    TO
Q5a     3.08    0.77     TO       30.89      TO
Q5b     1.23    0.23     38.97    28.03      11.02
Q6      TO      1119.3   TO       61.53      TO
Q7      54.89   6.99     TO       23.06      TO
Q8      6.73    1.21     0.26     0.24       17.37
Q9      3.19    0.50     12.25    16.56      7.58
Q10     0.00    0.00     0.30     0.03       1.28
Q11     0.42    0.12     1.50     3.12       0.53

The results are shown in Table 1 (TO means timeout, i.e., 30 min). Queries Q1, Q3a, Q3b, Q3c and Q10 operate on few tuples, which all fit into several envelopes. Therefore, the effect of parallelization is insignificant. However, the important feature is that despite the more complex execution plans in the setting MT8, the run time is not higher than for the non-parallelized version.

Queries Q8 and Q6 are slower than in other frameworks, since our SPARQL compiler does not perform some optimizations useful for these queries.

The most important result is that on queries Q2, Q3a, Q3b, Q3c, Q4, Q5a, Q5b, Q9 and Q11 our engine significantly outperforms the other engines. All these queries benefit from extensive parallelization; therefore, much larger data can be processed in reasonable time. The significant slowdown of Virtuoso in Q4 is probably caused by extensive swapping, since the result set is too big.
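The gain from parallelization can be quantified directly from the ST and MT8 columns of Table 1. The following sketch computes the speed-up of MT8 over ST and the corresponding efficiency on 8 threads; it is a derived calculation rather than a result reported in the paper, and it leaves out the queries whose times are 0.00, 0.01 or TO.

```python
# ST and MT8 run times copied from Table 1.
st = {"Q2": 1.32, "Q4": 43.69, "Q5a": 3.08, "Q5b": 1.23,
      "Q7": 54.89, "Q8": 6.73, "Q9": 3.19, "Q11": 0.42}
mt8 = {"Q2": 0.39, "Q4": 6.48, "Q5a": 0.77, "Q5b": 0.23,
       "Q7": 6.99, "Q8": 1.21, "Q9": 0.50, "Q11": 0.12}

# Speed-up of the 8-thread parallel plan over the single-threaded plan.
speedup = {q: st[q] / mt8[q] for q in st}
# Fraction of the ideal 8x speed-up that is actually achieved.
efficiency = {q: s / 8 for q, s in speedup.items()}

for q in sorted(speedup, key=speedup.get, reverse=True):
    print(f"{q}: speed-up {speedup[q]:.2f}, efficiency {efficiency[q]:.0%}")
```

Note that ST runs a plan without intraoperator parallelization, so these ratios combine the effect of the additional threads with the overhead of the parallel plan.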

6 CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a new method of parallelization of the sort-merge join operation for RDF data. The algorithm is especially designed for streaming systems; moreover, it behaves well also with skewed data. The pilot implementation within the Bobox SPARQL engine significantly outperforms other RDF engines such as Jena, Virtuoso and Sesame in all relevant queries.

In our future research we want to focus on further optimizations, such as the influence of the granularity of data stream units (envelopes) on overall performance. Another research direction is to use these ideas for other than RDF processing, e.g., SQL.

ACKNOWLEDGEMENTS

The authors would like to thank the GACR 103/13/08195, GAUK 277911, GAUK 472313, and SVV-2013-267312 which supported this paper.

REFERENCES

Albutiu, M.-C., Kemper, A., and Neumann, T. (2012). Massively parallel sort-merge joins in main memory multi-core database systems. Proc. VLDB Endow., 5(10):1064–1075.

Bednarek, D., Dokulil, J., Yaghob, J., and Zavoral, F. (2012a). Bobox: Parallelization Framework for Data Processing. In Advances in Information Technology and Applied Computing.

Bednarek, D., Dokulil, J., Yaghob, J., and Zavoral, F. (2012b). Data-Flow Awareness in Parallel Data Processing. In 6th International Symposium on Intelligent Distributed Computing - IDC 2012. Springer-Verlag.

Broekstra, J., Kampman, A., and Harmelen, F. v. (2002). Sesame: A generic architecture for storing and querying RDF and RDF schema. In ISWC ’02: Proceedings of the First International Semantic Web Conference on The Semantic Web, pages 54–68, London, UK. Springer-Verlag.

Cermak, M., Dokulil, J., Falt, Z., and Zavoral, F. (2011). SPARQL Query Processing Using Bobox Framework. In SEMAPRO 2011, The Fifth International Conference on Advances in Semantic Processing, pages 104–109. IARIA.

Cieslewicz, J., Berry, J., Hendrickson, B., and Ross, K. A. (2006). Realizing parallelism in database operations: insights from a massively multithreaded architecture. In Proceedings of the 2nd international workshop on Data management on new hardware, DaMoN ’06, New York, NY, USA. ACM.

DeWitt, D. J., Naughton, J. F., Schneider, D. A., and Seshadri, S. (1992). Practical skew handling in parallel joins. In Proceedings of the 18th International Conference on Very Large Data Bases, VLDB ’92, pages 27–40, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Dittrich, J.-P. and Seeger, B. (2002). Progressive merge join: A generic and non-blocking sort-based join algorithm. In VLDB, pages 299–310.

Dittrich, J.-P., Seeger, B., Taylor, D. S., and Widmayer, P. (2003). On producing join results early. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS ’03, pages 134–142, New York, NY, USA. ACM.

Falt, Z., Bednarek, D., Cermak, M., and Zavoral, F. (2012a). On Parallel Evaluation of SPARQL Queries. In DBKDA 2012, The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications, pages 97–102. IARIA.

Falt, Z., Bulanek, J., and Yaghob, J. (2012b). On Parallel Sorting of Data Streams. In ADBIS 2012 - 16th East European Conference in Advances in Databases and Information Systems.

Falt, Z., Cermak, M., Dokulil, J., and Zavoral, F. (2012c). Parallel sparql query processing using bobox. International Journal On Advances in Intelligent Systems, 5(3 and 4):302–314.

Frigo, M., Leiserson, C. E., Prokop, H., and Ramachandran, S. (1999). Cache-Oblivious Algorithms. In FOCS, pages 285–298.

Gordon, M. I., Thies, W., and Amarasinghe, S. (2006). Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. SIGARCH Comput. Archit. News, 34(5):151–162.

Groppe, J. and Groppe, S. (2011). Parallelizing join computations of sparql queries for large semantic web databases. In Proceedings of the 2011 ACM Symposium on Applied Computing, SAC ’11, pages 1681–1686, New York, NY, USA. ACM.

Hua, K. A. and Lee, C. (1991). Handling data skew in multiprocessor database computers using partition tuning. In Proceedings of the 17th International Conference on Very Large Data Bases, VLDB ’91, pages 525–535, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Jena (2013). Jena – a semantic web framework for Java. Available at: http://jena.apache.org/, [Online; Accessed February 4, 2013].

Li, W., Gao, D., and Snodgrass, R. T. (2002). Skew handling techniques in sort-merge join. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 169–180. ACM.

Liu, B. and Rundensteiner, E. A. (2005). Revisiting pipelined parallelism in multi-join query processing. In Proceedings of the 31st international conference on Very large data bases, VLDB ’05, pages 829–840. VLDB Endowment.

Lu, H., Tan, K.-L., and Sahn, M.-C. (1990). Hash-based join algorithms for multiprocessor computers with shared memory. In Proceedings of the sixteenth international conference on Very large databases, pages 198–209, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Ming, M. M., Lu, M., and Aref, W. G. (2004). Hash-merge join: A non-blocking join algorithm for producing fast and early join results. In ICDE, pages 251–263.

Prud’hommeaux, E. and Seaborne, A. (2008). SPARQL Query Language for RDF. W3C Recommendation.

Schmidt, M., Hornung, T., Lausen, G., and Pinkel, C. (2008). Sp2bench: A sparql performance benchmark. CoRR, abs/0806.4627.

Schneider, D. A. and DeWitt, D. J. (1989). A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. SIGMOD Rec., 18(2):110–121.

Vinther, K. (2006). The Funnelsort Project. Available at: http://kristoffer.vinther.name/projects/funnelsort/, [Online; Accessed February 4, 2013].

Virtuoso (2013). Virtuoso data server. Available at: http://virtuoso.openlinksw.com, [Online; Accessed February 4, 2013].

