REYKJAVÍK, ICELAND
29 - 31 JULY, 2013
Proceedings
2nd INTERNATIONAL CONFERENCE ON
DATA MANAGEMENT TECHNOLOGIES AND APPLICATIONS
DATA 2013
Proceedings of the
2nd International Conference on Data Technologies and Applications
Reykjavík, Iceland
29 - 31 July, 2013
Sponsored by
INSTICC – Institute for Systems and Technologies of Information, Control and Communication
Co-organized by
Reykjavik University
Copyright © 2013 SCITEPRESS – Science and Technology Publications
All rights reserved
Edited by Markus Helfert, Chiara Francalanci and Joaquim Filipe
Printed in Portugal
ISBN: 978-989-8565-67-9
Legal Deposit: 361497/13
http://www.dataconference.org
ORGANIZING AND STEERING COMMITTEES
CONFERENCE CHAIR
Joaquim Filipe, Polytechnic Institute of Setúbal / INSTICC, Portugal
PROGRAM CO-CHAIRS
Markus Helfert, Dublin City University, Ireland
Chiara Francalanci, Politecnico di Milano, Italy
PROCEEDINGS PRODUCTION
Marina Carvalho, INSTICC, Portugal
Helder Coelhas, INSTICC, Portugal
Bruno Encarnação, INSTICC, Portugal
Ana Guerreiro, INSTICC, Portugal
Andreia Moita, INSTICC, Portugal
Raquel Pedrosa, INSTICC, Portugal
Vitor Pedrosa, INSTICC, Portugal
Cátia Pires, INSTICC, Portugal
Sara Santiago, INSTICC, Portugal
José Varela, INSTICC, Portugal
CD-ROM PRODUCTION
Pedro Varela, INSTICC, Portugal
GRAPHICS PRODUCTION AND WEBDESIGNER
André Lista, INSTICC, Portugal
Mara Silva, INSTICC, Portugal
SECRETARIAT
Marina Carvalho, INSTICC, Portugal
WEBMASTER
Susana Ribeiro, INSTICC, Portugal
PROGRAM COMMITTEE
Foto Afrati, National Technical University of Athens, Greece
Hamideh Afsarmanesh, University of Amsterdam, The Netherlands
Markus Aleksy, ABB Corporate Research Center, Germany
Kenneth Anderson, University of Colorado, U.S.A.
Farhad Arbab, CWI, The Netherlands
Mortaza S. Bargh, Rotterdam University of Applied Sciences, The Netherlands
Rudolf Bayer, Technische Universität München, Germany
Fevzi Belli, University of Paderborn, Germany
Orlando Belo, University of Minho, Portugal
Jorge Bernardino, Institute Polytechnic of Coimbra - ISEC, Portugal
Marko Boškovic, Research Studios Austria Forschungsgesellschaft mbH, Austria
Omar Boussaid, Eric Laboratory, University of Lyon 2, France
Manfred Broy, Technische Universität München, Germany
Dumitru Burdescu, University of Craiova, Romania
Rui Cai, Microsoft Research, Asia, China
Cinzia Cappiello, Politecnico di Milano, Italy
Krzysztof Cetnarowicz, AGH - University of Science and Technology, Poland
Kung Chen, National Chengchi University, Taiwan
Chia-Chu Chiang, University of Arkansas at Little Rock, U.S.A.
Christine Collet, Grenoble Institute of Technology, France
Stefan Conrad, Heinrich-Heine University Duesseldorf, Germany
Kendra Cooper, The University of Texas at Dallas, U.S.A.
Theodore Dalamagas, Athena Research Center, Greece
Jeffrey Dalton, University of Massachusetts, U.S.A.
Stefan Dessloch, Kaiserslautern University of Technology, Germany
Zhiming Ding, Chinese Academy of Science, China
Peter Dolog, Aalborg University, Denmark
Habiba Drias, USTHB, LRIA, Algeria
Anton Dries, KU Leuven, Belgium
Artur Dubrawski, The Robotics Institute Carnegie Mellon University, U.S.A.
Juan C. Dueñas, Universidad Politécnica de Madrid, Spain
Mohamed Y. Eltabakh, Worcester Polytechnic Institute, U.S.A.
Fikret Ercal, Missouri University of Science & Technology, U.S.A.
Barry Floyd, California Polytechnic State University, U.S.A.
Chiara Francalanci, Politecnico di Milano, Italy
Rita Francese, Università degli Studi di Salerno, Italy
Helena Galhardas, Technical University of Lisbon, Portugal
Faiez Gargouri, Miracl Laboratory, Tunisia
Paola Giannini, Universita’ del Piemonte Orientale, Italy
J. Paul Gibson, TSP - Telecom SudParis, France
Matteo Golfarelli, University of Bologna, Italy
Cesar Gonzalez-Perez, Institute of Heritage Sciences (Incipit), Spanish National Research Council (CSIC), Spain
Mohand-Said Hacid, Université Claude Bernard Lyon 1, France
Moustafa Hammad, Google Inc., U.S.A.
Slimane Hammoudi, ESEO, MODESTE, France
Markus Helfert, Dublin City University, Ireland
Jose Luis Arciniegas Herrera, Universidad del Cauca, Colombia
Melanie Herschel, Université Paris Sud / INRIA Saclay, France
Jang-eui Hong, Chungbuk National University, Korea, Republic of
Stratos Idreos, CWI, The Netherlands
Ivan Ivanov, SUNY Empire State College, U.S.A.
Sanpawat Kantabutra, Chiang Mai University, Thailand
Dimitris Karagiannis, University of Vienna, Austria
Panagiotis Karras, Rutgers University, U.S.A.
Maurice van Keulen, University of Twente, The Netherlands
Foutse Khomh, École Polytechnique, Canada
Roger (Buzz) King, University of Colorado, U.S.A.
Jeffrey W. Koch, Tarrant County College Northeast Campus, U.S.A.
Mieczyslaw Kokar, Northeastern University, U.S.A.
Konstantin Läufer, Loyola University Chicago, U.S.A.
Wolfgang Lehner, Dresden University of Technology, Germany
Domenico Lembo, Sapienza Università di Roma, Italy
Raimondas Lencevicius, Nuance Communications, U.S.A.
Ming Li, Nanjing University, China
Ziyu Lin, Xiamen University, China
Hua Liu, Xerox Research Center at Webster, U.S.A.
Ricardo J. Machado, Universidade do Minho, Portugal
Leszek Maciaszek, Wroclaw University of Economics, Poland and Macquarie University, Sydney, Australia
Zaki Malik, Wayne State University, U.S.A.
Tiziana Margaria, University of Potsdam, Germany
Brahim Medjahed, University of Michigan, Dearborn, U.S.A.
Marian Cristian Mihaescu, University of Craiova, Romania
Dimitris Mitrakos, Aristotle University of Thessaloniki, Greece
Valérie Monfort, SOIE Tunis, Tunisia
Mirella M. Moro, Federal University of Minas Gerais (UFMG), Brazil
Paolo Nesi, University of Florence, Italy
Erich Neuhold, Universität Wien, Austria
Paulo Novais, Universidade do Minho, Portugal
Rory O’Connor, Dublin City University, Ireland
Pasi Ojala, University of Oulu, Finland
Thanasis Papaioannou, EPFL, Switzerland
George Papastefanatos, RC "Athena", Greece
José R. Paramá, Universidade da Coruña, Spain
Andreas Polze, Hasso-Plattner-Institute for Software Engineering at University Potsdam, Germany
Christoph Quix, RWTH Aachen University, Germany
Sudha Ram, University of Arizona, U.S.A.
Alexander Rasin, DePaul University, U.S.A.
Matthias Renz, Ludwig-Maximilians-University Munich, Germany
Werner Retschitzegger, Johannes Kepler University, Austria
Claudio de la Riva, University of Oviedo, Spain
Gustavo Rossi, Lifia, Argentina
Gunter Saake, Institute of Technical and Business Information Systems, Germany
Krzysztof Sacha, Warsaw University of Technology, Poland
Manuel Filipe Santos, University of Minho, Portugal
M. Saravanan, Ericsson India Global Services Pvt. Ltd, India
Heiko Schuldt, University of Basel, Switzerland
Jean-Marc Seigneur, University of Geneva, Switzerland
Damian Serrano, University of Grenoble - LIG, France
Jie Shao, National University of Singapore, Singapore
Alkis Simitsis, HP Labs, U.S.A.
Harvey Siy, University of Nebraska at Omaha, U.S.A.
Yeong-tae Song, Towson University, U.S.A.
Cosmin Stoica Spahiu, University of Craiova - Faculty of Automation, Computers and Electronics, Romania
Claus Stadler, University of Leipzig, Germany
Peter Stanchev, Kettering University, U.S.A.
Jörg Unbehauen, University of Leipzig, Germany
Corrado Aaron Visaggio, RCOST - University of Sannio, Italy
Gianluigi Viscusi, Università di Milano-Bicocca, Italy
Fan Wang, Microsoft, U.S.A.
Martijn Warnier, Delft University of Technology, The Netherlands
Dietmar Wikarski, FH Brandenburg University of Applied Sciences, Germany
Leandro Krug Wives, Universidade Federal do Rio Grande do Sul, Brazil
Alexander Woehrer, Vienna Science and Technology Fund, Austria
Yun Xiong, Fudan University, China
Amrapali Zaveri, Universität Leipzig, Germany
Xiaokun Zhang, Athabasca University, Canada
Jiakui Zhao, State Grid Electric Power Research Institute, China
Wenchao Zhou, Georgetown University, U.S.A.
Xiangmin Zhou, CSIRO, Australia
Hong Zhu, Oxford Brookes University, U.K.
Yangyong Zhu, Fudan University, China
AUXILIARY REVIEWERS
Estrela Cruz, Instituto Politécnico Viana do Castelo, Portugal
Jonas Eckhardt, Technische Universität München, Germany
Theodora Galani, National Technical University of Athens, Greece
Long Guo, National University of Singapore, Singapore
Hadia Mosteghanemi, University of Science and Technology Houari Boumédiene - USTHB, Algeria
Deolinda Rasteiro, Coimbra Institute of Engineering, Portugal
Juliana Teixeira, Minho University, Portugal
FOREWORD
This volume contains the proceedings of the 2nd International Conference on Data Technologies and Applications - DATA 2013. The conference is co-organized by Reykjavik University (RU) and sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC).
The purpose of DATA is to bring together researchers and practitioners interested in databases, data warehousing, data mining, data management, data security and other aspects of information systems and technology involving advanced applications of data.
DATA 2013 received 62 paper submissions from 27 countries. In order to evaluate each submission, a double blind paper review was performed by the Program Committee. In total, 34 papers are published in these proceedings and presented at the conference. Of these, 7 were selected to be published as full papers and 27 were selected as short papers. The full paper acceptance ratio was 11%, and the short paper acceptance ratio was 44%.
DATA’s program includes a panel to discuss aspects of data technologies and applications from both theoretical and practical perspectives, with the participation of distinguished world-class researchers and practitioners; furthermore, the program is enriched by several keynote lectures delivered by renowned experts in their areas of knowledge. These high points in the conference program contribute to reinforcing the overall quality of the DATA conference.
The program for this conference required the dedicated effort of many people. Firstly, we must thank the authors, whose research efforts are herewith recorded. Next, we thank the members of the Program Committee and the auxiliary reviewers for their diligent and professional reviewing. We would also like to deeply thank the invited speakers for their invaluable contribution and for taking the time to prepare their talks. Special thanks to the leadership, faculty and staff of Reykjavik University (RU) for hosting the conference on its campus and providing RU facilities and resources. Finally, a word of appreciation for the hard work of the INSTICC team; organizing a conference of this level is a task that can only be achieved by the collaborative effort of a dedicated and highly capable team.
A successful conference involves more than paper presentations; it is also a meeting place, where ideas about new research projects and other ventures are discussed and debated. Therefore, a social event & banquet has been arranged for the evening of July 30 (Tuesday) in order to promote this kind of social networking.
We wish you all an exciting conference and an unforgettable stay in the city of Reykjavík. We hope to meet you again next year at DATA 2014 in Vienna, Austria.
Markus Helfert
Dublin City University, Ireland
Chiara Francalanci
Politecnico di Milano, Italy
Joaquim Filipe
Polytechnic Institute of Setúbal / INSTICC, Portugal
CONTENTS
INVITED SPEAKERS
KEYNOTE SPEAKERS
Main-Memory Centric Data Management – Open Problems and Some Solutions
Wolfgang Lehner
Context-Aware Decision Support in Dynamic Environments - Theoretical & Technological Foundations
Alexander Smirnov
BUSINESS ANALYTICS
FULL PAPERS
Parameterised Fuzzy Petri Nets for Knowledge Representation and Reasoning
Zbigniew Suraj
Subspace Clustering with Distance-density Function and Entropy in High-dimensional Data
Jiwu Zhao and Stefan Conrad
Enhancing Collaboration in Big Biomedical Data Settings - Knowledge Visualization, Data Mining and Decision Making Issues
Nikos Karacapilidis, Georgia Tsiliki and Manolis Tzagarakis
Rating of Discrimination Networks for Rule-based Systems
Fabian Ohler, Kai Schwarz, Karl-Heinz Krempels and Christoph Terwelp
SHORT PAPERS
A MapReduce Architecture for Web Site User Behaviour Monitoring in Real Time
Bill Karakostas and Babis Theodoulidis
Enhancing News Articles Clustering using Word N-Grams
Christos Bouras and Vassilis Tsogkas
Advanced Analytics with the SAP HANA Database
Philipp Große, Wolfgang Lehner and Norman May
Predicting Cases of Ambulatory Care Sensitive Conditions
W. Haque and D. C. Finke
Consistency of Incomplete Data
Patrick G. Clark and Jerzy Grzymala-Busse
Database Functionalities for Evolving Monitoring Applications
Philip Schmiegelt, Jingquan Xie, Gereon Schüller and Andreas Behrend
Effective Business Plan Evaluation using an Evolutionary Ensemble
G. Dounias, A. Tsakonas, D. Charalampakis and E. Vasilakis
R-Pref: Rapid Prototyping of Database Preference Queries in R
Patrick Roocks and Werner Kießling
Estimate the Market Share from the Search Engine Hit Counts
Robert Viseur
A New Addressing Scheme for Discrimination Networks easing Development and Testing
Karl-Heinz Krempels, Fabian Ohler and Christoph Terwelp
DATA MANAGEMENT AND QUALITY
FULL PAPERS
A Generic and Flexible Framework for Selecting Correspondences in Matching and Alignment Problems
Fabien Duchateau
Automatic Synthesis of Data Cleansing Activities
Mario Mezzanzanica, Roberto Boselli, Mirko Cesarini and Fabio Mercorio
SHORT PAPERS
A Clustering Topology for Wireless Sensor Networks - New Semantics over Network Topology
Paul Cotofrei, Ionel Tudor Calistru and Kilian Stoffel
Data Management for M2M Communication using Telecom Mediation Systems
Sandeep Akhouri and Kirti Girdhar
Data Quality Evaluation of Scientific Datasets - A Case Study in a Policy Support Context
Antonella Zanzi and Alberto Trombetta
Social Data Sentiment Analysis in Smart Environments - Extending Dual Polarities for Crowd Pulse Capturing
Athena Vakali, Despoina Chatzakou, Vassiliki Koutsonikola and Georgios Andreadis
An Elastic Cache Infrastructure through Multi-level Load-balancing
Carlos Lübbe and Bernhard Mitschang
Approaching ETL Conceptual Modelling and Validation using BPMN and BPEL
Bruno Oliveira and Orlando Belo
Towards a Second Generation of Computer Interpretable Guidelines
Paolo Terenziani, Alessio Bottrighi, Laura Giordano, Giuliana Franceschinis, Stefania Montani, Luigi Portinale and Daniele Theseider Dupre
Citable by Design - A Model for Making Data in Dynamic Environments Citable
Stefan Pröll and Andreas Rauber
Data Curation Framework for Facilities Science
Vasily Bunakov and Brian Matthews
ONTOLOGIES AND THE SEMANTIC WEB
SHORT PAPERS
EDEX: Entity Preserving Data Exchange
Yoones A. Sekhavat and Jeffrey Parsons
Semantic Copyright Management of Media Fragments
Roberto García, David Castellà and Rosa Gil
Designing a Farmer Centred Ontology for Social Life Network
Anusha Indika Walisadeera, Gihan Nilendra Wikramanayake and Athula Ginige
Extraction of Biographical Data from Wikipedia
Robert Viseur
DATABASES AND DATA SECURITY
FULL PAPER
Automata Theory based Approach to the Join Ordering Problem in Relational Database Systems
Miguel Rodríguez, Daladier Jabba, Elias Niño, Carlos Ardila and Yi-Cheng Tu
SHORT PAPERS
Frame Time and Cardinality Indeterminacy in Temporal Relational Databases
Paolo Terenziani
Design and Evaluation of a Graph Codec System for Software Watermarking
Maria Chroni and Stavros D. Nikolopoulos
SylvaDB: A Polyglot and Multi-backend Graph Database Management System
Javier de la Rosa, Juan Luis Suárez and Fernando Sancho Caparrini
Highly Scalable Sort-merge Join Algorithm for RDF Querying
Zbynek Falt, Miroslav Cermák and Filip Zavoral
AUTHOR INDEX
Highly Scalable Sort-merge Join Algorithm for RDF Querying
Zbynek Falt, Miroslav Cermak and Filip Zavoral
Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
{falt, cermak, zavoral}@ksi.mff.cuni.cz
Keywords: Merge Join, Parallel, Bobox, RDF.
Abstract: In this paper, we introduce a highly scalable sort-merge join algorithm for RDF databases. The algorithm is designed especially for streaming systems; besides task and data parallelism, it also tries to exploit the pipeline parallelism in order to increase its scalability. Additionally, we focused on handling skewed data correctly and efficiently; the algorithm scales well regardless of the data distribution.
1 INTRODUCTION
Join is one of the most important database operations. The overall performance of data evaluation engines depends highly on the performance of particular join operations. Since multiprocessor systems are widely available, there is a need for the parallelization of database operations, especially joins.
In our previous work, we focused on parallelization of SPARQL operations such as filter, nested-loops join, etc. (Cermak et al., 2011; Falt et al., 2012a). In this paper, we complete the portfolio of parallelized SPARQL operations by proposing an efficient algorithm for merge and sort-merge join.
The main target of our research is the area of streaming systems, since they seem to be appropriate for a certain class of data intensive problems (Bednarek et al., 2012b). Streaming systems naturally introduce task, data and pipeline parallelism (Gordon et al., 2006). Therefore, an efficient and scalable algorithm for these systems should take these properties into account.
Our contribution is the introduction of a highly scalable merge and sort-merge join algorithm. The algorithm also deals well with skewed data, which may cause load imbalances during the parallel execution (DeWitt et al., 1992). We used the SP2Bench (Schmidt et al., 2008) data generator and benchmark to show the behaviour of our algorithm in multiple test scenarios and to compare our RDF engine, which uses this algorithm, to other modern RDF engines such as Jena (Jena, 2013), Virtuoso (Virtuoso, 2013) and Sesame (Broekstra et al., 2002).
The rest of the paper is organized as follows. Section 2 examines relevant related work on merge joins. Section 3 briefly describes the Bobox framework that is used for a pilot implementation and evaluation of the algorithm. Most important is Section 4, which contains a detailed description of the sort-merge join algorithm. Performance evaluation is described in Section 5, and Section 6 concludes the paper.
2 RELATED WORK
Parallel algorithms greatly improve the performance of the relational join in shared-nothing systems (Liu and Rundensteiner, 2005; Schneider and DeWitt, 1989) or shared memory systems (Cieslewicz et al., 2006; Lu et al., 1990).
Liu et al. (Liu and Rundensteiner, 2005) investigated the pipelined parallelism for multi-join queries. In comparison, we focus on exploiting the parallelism within a single join operation. Schneider et al. (Schneider and DeWitt, 1989) evaluated one sort-merge and three hash-based join algorithms in a shared-nothing system. In the presence of data skews, techniques such as bucket tuning (Schneider and DeWitt, 1989) and partition tuning (Hua and Lee, 1991) are used to balance loads among processor nodes.
A family of non-blocking algorithms, e.g. (Ming et al., 2004; Dittrich and Seeger, 2002), was introduced to deal with pipeline processing where the blocking behaviour of network traffic makes the traditional join operators unsuitable (Schneider and DeWitt, 1989). The progressive-merge join (PMJ) algorithm (Dittrich and Seeger, 2002; Dittrich et al., 2003) is a non-blocking version of the traditional sort-merge join. For our parallel stream execution, we adopted the idea of producing join results as soon as the first sorted data are available, even when sorting is not yet finished.
(Albutiu et al., 2012) introduced a suite of new massive parallel sort-merge (MPSM) join algorithms based on partial partition-based sorting to avoid a hard-to-parallelize final merge step to create one complete sort order. MPSM algorithms are also NUMA¹-affine, as all sorting is carried out on local memory partitions, and they scale almost linearly with the number of cores used.
One of the specific areas of parallel join computations is semantic frameworks using the SPARQL language. In (Groppe and Groppe, 2011), the authors proposed parallel algorithms for join computations of SPARQL queries, with the main focus on partitioning of the input data.
Although all the above-mentioned papers deal with merge join parallelization, none of them focuses on streaming systems while addressing data, task and pipeline parallelism and data skewness at once.
3 BOBOX
Bobox is a parallelization framework which simplifies writing parallel, data intensive programs and serves as a testbed for the development of generic and especially data-oriented parallel algorithms (Falt et al., 2012c; Bednarek et al., 2012a).
It provides a run-time environment which is used to execute a non-linear pipeline (we denote it as the execution plan) in parallel. The execution plan consists of computational units (we denote them as the boxes) which are connected together by directed edges. The task of each box is to receive data from its incoming edges (i.e. from its inputs) and to send the resulting data to its outgoing edges (i.e. to its outputs). The user provides the execution units and the execution plan (i.e. the implementation of boxes and their mutual connections) and passes them to the framework, which is responsible for the evaluation of the plan.
The only communication between boxes is done by sending envelopes (communication units containing data) along their outgoing edges. Each envelope consists of several columns and each column contains a certain number of data items. The data type of items in one column must be the same in all envelopes transferred along one particular edge; however, different columns in one envelope may have different data types. The data types of these columns are defined by the execution plan. Additionally, all columns in one envelope must have the same length; therefore, we can consider envelopes to be sequences of tuples.
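The envelope structure described above can be illustrated with a minimal sketch. This is not the Bobox API: the class name `Envelope`, its `columns` field and its `size` and `tuples` members are hypothetical names chosen for illustration, and the only rule the sketch enforces is the one stated in the text, namely that all columns of one envelope have the same length.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Envelope:
    """Hypothetical stand-in for an envelope: a list of equally long columns."""

    columns: List[List[Any]]

    def __post_init__(self) -> None:
        # All columns in one envelope must have the same length.
        lengths = {len(column) for column in self.columns}
        if len(lengths) > 1:
            raise ValueError("all columns in one envelope must have the same length")

    @property
    def size(self) -> int:
        """Number of tuples carried by the envelope."""
        return len(self.columns[0]) if self.columns else 0

    def tuples(self) -> List[Tuple[Any, ...]]:
        """View the column-wise data as a sequence of tuples."""
        return list(zip(*self.columns))


# One integer column and one string column, three tuples in total.
env = Envelope(columns=[[1, 2, 3], ["a", "b", "c"]])
print(env.size, env.tuples())
```

Storing the data column-wise and exposing a tuple view mirrors the remark that an envelope can be regarded as a sequence of tuples.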
¹ Non-Uniform Memory Access
The total number of tuples in an envelope is chosen according to the size of cache memories in the system. Therefore, the communication may take place completely in cache memory. This increases the efficiency of processing of incoming envelopes by a box.
In addition to data envelopes, Bobox distinguishes so-called poisoned envelopes. These envelopes do not contain any data; they just indicate the end of a stream.
Currently, only shared-memory architectures are supported; therefore, only shared pointers to the envelopes are transferred. This speeds up operations such as the broadcast box (i.e., the box which resends its input to its outputs) significantly, since they do not have to access data stored in envelopes.
Although the body of boxes must be strictly single-threaded, Bobox introduces three types of parallelism:
1. Task parallelism - independent streams are processed in parallel.
2. Pipeline parallelism - the producer of a stream runs in parallel with its consumer.
3. Data parallelism - independent parts of one stream are processed in parallel.
The first two types of parallelism are exploited implicitly during the evaluation of a plan. Therefore, even an application which does not contain any explicit parallelism may benefit from multiple processors in the system. Data parallelism must be explicitly stated in the execution plan by the user; however, it is still much easier to modify the execution plan than to write the parallel code by hand.
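Pipeline parallelism and the end-of-stream signalling by poisoned envelopes can be sketched with two threads connected by a queue. This is only an illustration under stated assumptions and not how Bobox schedules boxes: the names `POISON`, `producer`, `consumer` and `run_pipeline` are hypothetical, an envelope is modelled as a plain list of numbers, and the consumer simply sums each envelope.

```python
import queue
import threading

# Hypothetical end-of-stream marker playing the role of a poisoned envelope.
POISON = None


def producer(out_edge, envelopes):
    """Send every envelope along the edge, then signal the end of the stream."""
    for envelope in envelopes:
        out_edge.put(envelope)
    out_edge.put(POISON)


def consumer(in_edge, results):
    """Process envelopes as they arrive until the poisoned envelope is seen."""
    while True:
        envelope = in_edge.get()
        if envelope is POISON:
            break
        results.append(sum(envelope))


def run_pipeline(envelopes):
    """Run producer and consumer concurrently, connected by one edge."""
    edge = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=producer, args=(edge, envelopes)),
        threading.Thread(target=consumer, args=(edge, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_pipeline([[1, 2], [3, 4], [5]]))
```

The consumer can start working on the first envelope while the producer is still emitting later ones, which is the property the text calls pipeline parallelism.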
Due to the Bobox properties and especially its suitability for pipelined stream data processing, we used the Bobox platform for a pilot implementation of the SPARQL processing engine.
4 ALGORITHMS
Contemporary merge join algorithms mentioned in Section 2 do not fit well into the streaming model of computation (Gordon et al., 2006). Therefore, we developed an algorithm which takes into account task, data and pipeline parallelism. The main idea of the algorithm is splitting the input streams into many smaller parts which can be processed in parallel.
The sort-merge join consists of two independent phases – a sorting phase that sorts the input stream by join attributes and a joining phase. We have utilized the highly scalable implementation of a stream sorting algorithm (Falt et al., 2012b); it is briefly described in Section 4.2.
4.1 Merge Join
Merge join in general has two inputs – left and right. It assumes that both inputs are sorted by the join attribute in an ascending order. It reads its inputs and finds sequences of the same values of join attributes in the left and right input and then performs the cross product of these sequences. The pseudocode of the standard implementation of merge join is as follows:
while left.hasnext ∧ right.hasnext do
    left_tuple ← left.current
    right_tuple ← right.current
    if left_tuple = right_tuple then
        left_seq ← empty
        right_seq ← empty
        append left_tuple to left_seq
        left.movenext()
        while left.hasnext ∧ left.current = left_tuple do
            append left.current to left_seq
            left.movenext()
        end while
        append right_tuple to right_seq
        right.movenext()
        while right.hasnext ∧ right.current = right_tuple do
            append right.current to right_seq
            right.movenext()
        end while
        output cross_product(left_seq, right_seq)
    else if left_tuple < right_tuple then
        left.movenext()
    else
        right.movenext()
    end if
end while
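For readers who prefer executable code, the pseudocode above can be sketched in Python as follows. This is a minimal single-threaded sketch under assumptions that are not in the original: the inputs are in-memory lists of tuples rather than streams, the join attribute is extracted by a hypothetical `key` function (by default the first tuple component), and the function name `merge_join` is chosen for illustration.

```python
def merge_join(left, right, key=lambda t: t[0]):
    """Join two lists of tuples that are sorted in ascending order by `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of equal join attribute values in each input.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == right_key:
                j_end += 1
            # Emit the cross product of the two runs.
            out.extend((l, r) for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
```

Using index ranges for the two runs avoids building separate sequence buffers, which is a simplification that only works because the inputs here are random-access lists.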
If we take any value V of the join attribute, then all tuples less than V from both inputs can be processed independently of the tuples which are greater than or equal to V. A common approach to merge join parallelization is splitting the inputs into multiple parts by P − 1 values V_i and processing them in parallel in P worker threads (Groppe and Groppe, 2011).
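The splitting by P − 1 values can be sketched as follows, assuming the whole sorted input is available as a list of join attribute values (which, as discussed next, is exactly what a streaming system cannot assume). The function name `partition_by_values` and the sample split values are hypothetical.

```python
from bisect import bisect_left


def partition_by_values(keys, split_values):
    """Split a sorted list into len(split_values) + 1 parts.

    Part i holds the keys k with V_{i-1} <= k < V_i, so part i of the left
    input only needs to be joined with part i of the right input.
    """
    bounds = [0] + [bisect_left(keys, v) for v in split_values] + [len(keys)]
    return [keys[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


# P = 3 workers, hence P - 1 = 2 split values.
left_parts = partition_by_values([1, 2, 3, 5, 8, 9], [3, 8])
right_parts = partition_by_values([2, 3, 3, 8], [3, 8])
print(left_parts)
print(right_parts)
```

With an unfortunate choice of split values, or with skewed data, the parts can differ greatly in size, so some workers would receive far more work than others.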
However, there are two problems with the selection of appropriate values V_i:
1. The inputs of the join are data streams; therefore, we do not know how many input tuples there are until we receive all of them. For the same reason, we do not know the distribution of the input data in advance. Therefore, we cannot easily select V_i so that the resulting parts have approximately the same size.
2. The distribution of data could be very non-uniform (Li et al., 2002); therefore, it might be impossible to utilize worker threads uniformly.
For the sake of simplicity, we first describe a simplified algorithm for joining inputs without duplicated join attribute values in Section 4.1.1. Then we extend the algorithm to take duplicities into account in Section 4.1.2.
4.1.1 Parallel Merge Join without Duplicities
In this section, we describe the algorithm which assumes that the input streams do not contain duplicated join attributes. The execution plan of this algorithm is depicted in Figure 1.
The algorithm makes use of the fact that the streams are represented as a flow of envelopes. The task of preprocess box is to transform the flow of input envelopes into the flow of pairs of envelopes. The tuples in these pairs can be joined independently (i.e., in parallel). Dispatch boxes dispatch these pairs among join boxes which perform the operation. When join box receives a pair of envelopes, it joins them and creates the substream of their results. Therefore, the outputs of join boxes are sequences of such substreams, which subsequently should be consolidated in a round robin manner by consolidate box.
Now, we describe the idea and the algorithm of the preprocess box. Consider the first envelope left_env from the left input and the first envelope right_env from the right input. Denote the last tuple (the highest value) in left_env as last_left and the last tuple in right_env as last_right.
Now, one of these three cases occurs:
1. last_left is greater than last_right. In this case, we can split left_env into two parts. The first part contains tuples which are less than or equal to last_right and the second part contains the rest. Now, the first part of left_env can be joined with right_env.
2. last_left is less than last_right. In this case, we can perform the analogous operation as in the former case.
3. last_left is equal to last_right. This means that the whole left_env and the whole right_env can be joined together.
The pseudocode of preprocess box is as follows:
left_env ← next envelope from left input
right_env ← next envelope from right input
while left_env ≠ NIL ∧ right_env ≠ NIL do
    last_left ← left_env[left_env.size − 1]
    last_right ← right_env[right_env.size − 1]
    if last_left > last_right then
        split left_env to left_first and left_second
        send right_env to the right output
        send left_first to the left output
        left_env ← left_second
        right_env ← next envelope from right input
    else if last_left < last_right then
        split right_env to right_first and right_second
        send right_first to the right output
        send left_env to the left output
        left_env ← next envelope from left input
        right_env ← right_second
    else
        send right_env to the right output
        send left_env to the left output
        left_env ← next envelope from left input
        right_env ← next envelope from right input
    end if
end while
close the right output
close the left output

Figure 1: Execution plan of parallel merge join. (Box labels in the figure: Left, Right, preprocess, dispatch (twice), join0, join1, join2, join3, consolidate.)
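The preprocess step can be sketched in Python under simplifying assumptions that are not part of the original: envelopes are non-empty sorted lists of join attribute values without duplicates, the two output edges are replaced by one returned list of (left, right) pairs, and the split position is found with the standard `bisect_right` binary search. The function name `preprocess` follows the box name; everything else is hypothetical.

```python
from bisect import bisect_right


def preprocess(left_envs, right_envs):
    """Pair up envelopes so that each pair can be joined independently."""
    left_iter, right_iter = iter(left_envs), iter(right_envs)
    left_env = next(left_iter, None)
    right_env = next(right_iter, None)
    pairs = []
    while left_env is not None and right_env is not None:
        last_left, last_right = left_env[-1], right_env[-1]
        if last_left > last_right:
            # Keep the left tuples that are <= last_right in this pair.
            cut = bisect_right(left_env, last_right)
            pairs.append((left_env[:cut], right_env))
            left_env = left_env[cut:]
            right_env = next(right_iter, None)
        elif last_left < last_right:
            # Symmetric case: split the right envelope instead.
            cut = bisect_right(right_env, last_left)
            pairs.append((left_env, right_env[:cut]))
            left_env = next(left_iter, None)
            right_env = right_env[cut:]
        else:
            # Both envelopes end with the same value: pair them whole.
            pairs.append((left_env, right_env))
            left_env = next(left_iter, None)
            right_env = next(right_iter, None)
    return pairs


pairs = preprocess([[1, 3, 5], [7, 9]], [[2, 3, 4], [5, 6, 9]])
print(pairs)
```

In the two splitting branches the remainder is never empty, because the last tuple of the envelope being split is strictly greater than the split value.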
The boxes preprocess, dispatch and consolidate might seem to be bottlenecks of the algorithm. Dispatch and consolidate do not access data in envelopes; they just forward them from the input to the output. Since an envelope typically contains hundreds or thousands of tuples, these two boxes work several orders of magnitude faster than join box.
On the other hand, preprocess box accesses data in the envelope, since it has to find the position where to split the envelope. This can be done by a binary search, which has time complexity O(log(L)), where L is the number of tuples in the envelope. However, it does not access all tuples in the envelope; therefore, it is still much faster than join box.
4.1.2 Join with Duplicities
Without duplicities, preprocess box is able to generate pairs of envelopes which can be processed independently. However, the possibility of duplicities complicates the algorithm. Consider the situation depicted in Figure 2.
Figure 2: Duplicities of join attributes. (The figure shows Left and Right input envelopes with repeated join attribute values, grouped into a 1st pair and a 2nd pair.)
If join box receives pair number 2, it also needs to process pair number 1. The reason is that it has to perform cross products of the parts which are denoted in the figure.
Therefore, join boxes have to receive all pairs of envelopes to handle the case when sequences of the same tuples span multiple envelopes. This complicates the algorithm of join box, since each join has to keep track of such sequences. When we process an envelope (from either input), there is a possibility that its last tuple is part of such a sequence. Therefore, we have to keep already processed envelopes in case they are needed in the future. When the last tuple of the envelope changes, a new sequence begins and we can drop all stored envelopes except the last one.
The execution plan for the algorithm is the same as in the previous case; the only difference is that dispatch box does not forward its input envelopes in a round robin manner but broadcasts them to all its outputs. Since a box receives and sends only shared pointers to the envelopes, the overhead of the broadcast operation is negligible in comparison to the join operation and therefore it does not limit the scalability.
Because of this modification, all boxes receive the same envelopes. Therefore, the algorithm should distinguish among them so that they generate the output in the same manner as in Section 4.1.1. Each join box gets its own unique index P_i, 0 ≤ P_i < P. If we denote each pair of envelopes sequentially by non-negative integers j, then the join box with index P_i processes those pairs j for which j mod P = P_i holds. This concept of parallelization is described in (Falt et al., 2012a) in more detail.
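The assignment rule j mod P = P_i and the round robin consolidation of the resulting substreams can be sketched as follows. The function names `pairs_for_box` and `consolidate` are hypothetical, and a substream is represented simply by the number j of the pair that produced it.

```python
def pairs_for_box(num_pairs, box_index, num_boxes):
    """Numbers j of the envelope pairs that the join box `box_index` processes."""
    return [j for j in range(num_pairs) if j % num_boxes == box_index]


def consolidate(outputs_per_box):
    """Interleave the substreams of all join boxes in round robin order.

    The substream for pair j is the (j // P)-th output of box j mod P.
    """
    num_boxes = len(outputs_per_box)
    total = sum(len(outputs) for outputs in outputs_per_box)
    return [outputs_per_box[j % num_boxes][j // num_boxes] for j in range(total)]


# P = 4 join boxes and 10 pairs of envelopes.
assignment = [pairs_for_box(10, box, 4) for box in range(4)]
print(assignment)
print(consolidate(assignment))
```

Because every box applies the same deterministic rule, no coordination between the join boxes is needed, and the consolidated output has the same order as a sequential join would produce.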
The complete pseudocode of the join box is as follows:
    left_env ← next envelope from left input
    right_env ← next envelope from right input
    j ← 0
    left_seq ← empty
    right_seq ← empty
    while left_env ≠ NIL ∧ right_env ≠ NIL do
        if j mod P = P_i then
            do the join of left_env and right_env
        end if
        j ← j + 1
        if left_env.size > 0 then
            last_left ← left_env[left_env.size − 1]
            if last_left ≠ left_seq then
                left_seq ← last_left
                drop all left envelopes except left_env
            else
                store left_env
            end if
        end if
        if right_env.size > 0 then
            last_right ← right_env[right_env.size − 1]
            if last_right ≠ right_seq then
                right_seq ← last_right
                drop all right envelopes except right_env
            else
                store right_env
            end if
        end if
        left_env ← next envelope from left input
        right_env ← next envelope from right input
    end while
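The bookkeeping of the pseudocode can be traced with a small Python sketch. It is a simplified model under stated assumptions, not the actual join box: envelopes are lists of join-attribute values, the join itself is replaced by recording the pair number, and the names `track`, `join_box` and `_EMPTY` are hypothetical.

```python
_EMPTY = object()  # sentinel for "no sequence seen yet"


def track(env, seq, stored):
    """Update the retained envelopes of one input after processing env."""
    if len(env) > 0:
        last = env[-1]
        if last != seq:
            # A new sequence begins: keep only the current envelope.
            return last, [env]
        # The same sequence continues: keep env for later cross products.
        return seq, stored + [env]
    return seq, stored


def join_box(left_envs, right_envs, P, Pi):
    """Return the pair numbers joined by box Pi and the retained-envelope counts."""
    joined, retained = [], []
    left_seq = right_seq = _EMPTY
    left_stored, right_stored = [], []
    for j, (left_env, right_env) in enumerate(zip(left_envs, right_envs)):
        if j % P == Pi:
            joined.append(j)  # placeholder for "do the join"
        left_seq, left_stored = track(left_env, left_seq, left_stored)
        right_seq, right_stored = track(right_env, right_seq, right_stored)
        retained.append((len(left_stored), len(right_stored)))
    return joined, retained


# Hypothetical inputs: value 2 ends the first two left envelopes.
left_envs = [[1, 2, 2], [2, 2], [2, 3]]
right_envs = [[2], [2, 4], [5]]
box0 = join_box(left_envs, right_envs, P=2, Pi=0)
box1 = join_box(left_envs, right_envs, P=2, Pi=1)
```

With two boxes, box 0 joins pairs 0 and 2 and box 1 joins pair 1, so every pair is joined exactly once. Both boxes perform the same bookkeeping on every pair; the left input temporarily retains two envelopes while the sequence of value 2 continues.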
The performance evaluation in Section 5.1.3 shows that this concept of parallelization allows better scalability than other contemporary solutions.
4.2 Sort
If one or both input streams need to be sorted, we use an approach based on the algorithm described in (Falt et al., 2012b). Basically, the sorting of the stream is divided into three phases:
1. Splitting the input stream into several equal-sized substreams,
2. Sorting of the substreams in parallel,
3. Merging of the sorted substreams in parallel.
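The three phases can be sketched as follows. This is a structural illustration only, under several assumptions: the whole input is available as a list rather than a stream, the helper names `split` and `parallel_sort` are hypothetical, and the final merge uses the sequential `heapq.merge` from the Python standard library instead of a parallel merge. In CPython the threads mainly illustrate the structure; they are not expected to reproduce the speed-ups reported in the paper.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split(stream, parts):
    """Phase 1: cut the input into at most `parts` chunks of nearly equal size."""
    size = -(-len(stream) // parts)  # ceiling division
    return [stream[k:k + size] for k in range(0, len(stream), size)]


def parallel_sort(stream, parts=4, key=None):
    if not stream:
        return []
    chunks = split(stream, parts)
    # Phase 2: sort the chunks concurrently.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        sorted_chunks = list(pool.map(lambda c: sorted(c, key=key), chunks))
    # Phase 3: merge the sorted chunks (sequentially in this sketch).
    return list(heapq.merge(*sorted_chunks, key=key))


result = parallel_sort([5, 3, 8, 1, 9, 2, 7], parts=3)
```

Because `heapq.merge` is lazy, the first output tuples become available as soon as all chunks are sorted, which mirrors the property that the subsequent merge join can start early.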
The algorithm scales very well; moreover, it starts to produce its output very shortly after the reception of the last tuple. Therefore, the subsequent merge join can start working as soon as possible, which enables pipeline processing and increases scalability.
However, memory becomes an unavoidable bottleneck when sorting tuples instead of scalars, since a tuple typically contains multiple items. Thus, the sorting of tuples needs more memory accesses, especially when sorting in parallel.
Therefore, we replaced the merge algorithm (used in the second and the third phase) by the merge algorithm used in Funnelsort (Frigo et al., 1999). We used the implementation available at (Vinther, 2006). This algorithm utilizes cache memories as much as possible in order to decrease the number of accesses to the main memory. According to our experiments, this algorithm speeds up the merging phase by 20–30%.
5 EVALUATION
Since one of the main goals is efficient evaluation of SPARQL (Prud’hommeaux and Seaborne, 2008) queries, we used the standardized SP2Bench benchmark for the performance evaluation of our algorithm in a parallel environment. Moreover, in order to show the skewness resistance of our algorithm, we used additional synthetic queries.
All experiments were performed on a server running Red Hat 6.0 Linux; the server configuration is 2x Intel Xeon E5310, 1.60 GHz (L1: 32 kB + 32 kB, L2: 4 MB shared) and 8 GB RAM. Each processor has 4 cores; therefore, we used 8 worker threads for the evaluation of queries. The server was dedicated solely to the testing; no other applications were running during the measurements.
5.1 Scalability of the Algorithm
In this set of experiments we examined the behaviour of the join algorithm in multiple scenarios. We used the 5M dataset of SP2Bench.
We measured the performance of the queries in multiple settings. The setting ST uses just one worker thread and the execution plan uses operations without any intraoperator parallelization (i.e., joining and sorting were performed by one box). The setting MT1 also uses one worker thread; however, the execution plan uses operations with intraoperator parallelization (we use 8 worker boxes both for joining and sorting). The purpose of this setting is to show the overhead caused by the parallelization. The settings MT2, MT4 and MT8 are analogous to MT1; however, they use 2, 4 and 8 worker threads, respectively. These settings show the scalability of the algorithm.
5.1.1 Scalability of the Merge Join
The first experiment shows the scalability of the merge join algorithm when its inputs contain long sequences of tuples with the same join attribute (i.e., the join produces a high number of tuples) and with a join condition with very high selectivity (i.e., the number of resulting tuples is relatively low). Since both inputs of the join are sorted by the join attribute, this experiment shows only the scalability of the merge join and does not include any sorting.
For this experiment, we used this query E1:
    SELECT ?article1 ?article2
    WHERE {
      ?article1 swrc:journal ?journal .
      ?article2 swrc:journal ?journal
      FILTER (STR(?article1) = STR(?article2))
    }
The query generates all pairs of articles which were published in the same journal and then selects the pairs which have the same URI (in fact, it returns all articles in the dataset). The execution plan of the query is depicted in Figure 3. The numbers at the bottom of the boxes denote the numbers of tuples produced by them.
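The behaviour of E1 can be mimicked on a toy dataset. The sketch below is a brute-force Python illustration with hypothetical article and journal names, not a SPARQL evaluation: it first builds all pairs of articles sharing a journal and then applies the filter.

```python
def evaluate_e1(triples):
    """triples: list of (article, journal) pairs, one per swrc:journal statement."""
    # All pairs of articles published in the same journal.
    joined = [(a1, a2) for a1, j1 in triples for a2, j2 in triples if j1 == j2]
    # FILTER (STR(?article1) = STR(?article2))
    result = [(a1, a2) for a1, a2 in joined if str(a1) == str(a2)]
    return joined, result


# Hypothetical data: three articles in journal J1, one in J2.
triples = [("art1", "J1"), ("art2", "J1"), ("art3", "J1"), ("art4", "J2")]
joined, result = evaluate_e1(triples)
```

Here the unfiltered join has 3 × 3 + 1 × 1 = 10 pairs, while only 4 pairs (one per article) pass the filter. The merge join therefore has to walk through long runs of equal join values although it emits few tuples.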
Highly Scalable Sort-merge Join Algorithm for RDF Querying
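Boxes of the execution plan, in the order in which they appear in the figure, with the number of tuples produced by each:
    Select
    MergeJoin by ?journal (STR(?article1) = STR(?article2)): 207818
    IndexScan [POS] ?article1 swrc:journal ?journal: 207818
    IndexScan [POS] ?article2 swrc:journal ?journal: 207818
Figure 3: Query E1 execution plan.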
Figure 4: Results for query E1.
The setting MT1 is slightly slower than ST, since the query plan in fact contains more boxes (see Figure 1), which causes higher overhead for their management. Moreover, the preprocess box does useless work in this setting. However, when increasing the number of worker threads, the algorithm scales almost linearly with the number of threads.
5.1.2 Scalability of the Sort-merge Join
The scalability of the sort-merge join is shown in the following experiment. In contrast to the previous experiment, the inputs of the merge joins (the second phase of the sort-merge join) need to be sorted.
For this experiment, we used this query E2:
    SELECT ?article1 ?article2
    WHERE {
      ?article1 swrc:journal ?journal .
      ?article2 swrc:journal ?journal .
      ?article1 dc:title ?title1 .
      ?article2 dc:title ?title2
      FILTER(?title1 < ?title2)
    }
This plan generates a large number of tuples which have to be sorted before they can be finally joined with the second input. The execution plan is depicted in Figure 5.
We measured the runtime in the same settings as in the previous experiment and the results are shown in Figure 6.
In this experiment, the difference between the ST and MT1 settings is bigger than in the previous experiment. This is caused by the fact that the parallel sort algorithm has some overhead (see (Falt et al., 2012b) for more information). However, the more worker threads are used, the bigger the speed-up we gain. The scalability is not as linear as in the previous experiment, since the number of memory accesses during sorting is much higher than during merging. Therefore, the memory becomes the bottleneck with a higher number of threads.
Boxes of the execution plan, in the order in which they appear in the figure, with the number of tuples produced by each:
    Select
    MergeJoin on ?article2 (?title1 < ?title2): 4913461
    Sort by ?article2: 10034740
    MergeJoin on ?journal: 10034740
    Sort by ?journal: 209387
    MergeJoin on ?article1: 207818
    IndexScan [PSO] ?article1 swrc:journal ?journal: 207818
    IndexScan [PSO] ?article1 dc:title ?title1: 475059
    IndexScan [POS] ?article2 swrc:journal ?journal: 207818
    IndexScan [PSO] ?article2 dc:title ?title2: 475059
Figure 5: Query E2 execution plan.
Figure 6: Results for query E2.
5.1.3 Data-skewness Resistance
To show the resistance of the algorithm to the non-uniform distribution of data, we used this query E3:
    SELECT ?artcl1 ?artcl2 ?artcl3 ?artcl4
    WHERE {
      ?artcl1 rdf:type bench:PhDThesis .
      ?artcl1 rdf:type ?type .
      ?artcl2 rdf:type ?type .
      ?artcl3 rdf:type ?type .
      ?artcl4 rdf:type ?type
    }
The execution plan for this query is shown in Figure 7. The variable ?type has just one value (bench:PhDThesis); therefore, none of the joins on that variable can be parallelized by partitioning their inputs. Despite this fact, according to the results (Figure 8), our algorithm scales very well and almost linearly.
Boxes of the execution plan, in the order in which they appear in the figure, with the number of tuples produced by each:
    Select
    MergeJoin on ?type: 3208542736
    MergeJoin on ?type: 13481272
    MergeJoin on ?type: 56644
    Sort by ?type: 238
    MergeJoin on ?artcl1: 238
    IndexScan ?artcl1 rdf:type bench:PhDThesis: 238
    IndexScan ?artcl1 rdf:type ?type: 911482
    IndexScan ?artcl4 rdf:type ?type: 911482
    IndexScan ?artcl2 rdf:type ?type: 911482
    IndexScan ?artcl3 rdf:type ?type: 911482
Figure 7: Query E3 execution plan.
Figure 8: Results for query E3.
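The effect of a single join value on work distribution can be illustrated with a short Python sketch. It contrasts a generic hash partitioning by join key (a stand-in for partition-based parallel joins, not a specific engine) with the assignment of envelope pairs by their sequence number j. The function names are hypothetical; the 238 tuples correspond to the output of the sort box in the plan, whereas the number of envelope pairs (24) is an assumed value chosen only for the example.

```python
from collections import Counter


def load_by_key(keys, P):
    """Tuples per worker when partitioning by hash of the join key."""
    return Counter(hash(k) % P for k in keys)


def load_by_pair_index(num_pairs, P):
    """Envelope pairs per join box when pair j goes to box j mod P."""
    return Counter(j % P for j in range(num_pairs))


P = 8
keys = ["bench:PhDThesis"] * 238  # every tuple has the same ?type value
by_key = load_by_key(keys, P)
by_pair = load_by_pair_index(24, P)  # 24 pairs is an assumed count
```

Whatever the hash function, all 238 tuples land in one partition, so only one worker is busy. With the j mod P rule every box receives the same number of pairs (three each in this example), independently of the values stored in the envelopes. The sketch counts pairs only and does not model the actual join cost per pair.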
5.2 Comparison to other Engines
The last set of experiments compares the Bobox SPARQL engine, which uses the new sort-merge join algorithm, to other mainstream SPARQL engines, such as Sesame v2.0 (Broekstra et al., 2002), Jena v2.7.4 with TDB v0.9.4 (Jena, 2013) and Virtuoso v6.1.6.3127-multithreaded (Virtuoso, 2013). They follow a client-server architecture and we provide the sum of the times of the client and server processes. The Bobox engine was compiled as a single application. We omitted the time spent loading the dataset in order to be comparable with a server that has the data already prepared.
We evaluated the queries multiple times over the dataset of 5M triples and we provide the average times. Each test run was also limited to 30 minutes (the same timeout as in the original SP2Bench paper). All data were stored in memory, as our primary interest is to compare the basic performance of the approaches rather than caching etc.
Table 1: Results of SP2Bench benchmark.
    Query   ST       MT8       Jena      Virtuoso   Sesame
    Q1      0.01     0.01      0.01      0.00       0.54
    Q2      1.32     0.39      242.80    39.03      16.11
    Q3a     0.01     0.01      20.84     7.00       2.09
    Q3b     0.00     0.00      1.89      0.04       0.54
    Q3c     0.00     0.00      1.31      0.03       0.55
    Q4      43.69    6.48      TO        1740.84    TO
    Q5a     3.08     0.77      TO        30.89      TO
    Q5b     1.23     0.23      38.97     28.03      11.02
    Q6      TO       1119.3    TO        61.53      TO
    Q7      54.89    6.99      TO        23.06      TO
    Q8      6.73     1.21      0.26      0.24       17.37
    Q9      3.19     0.50      12.25     16.56      7.58
    Q10     0.00     0.00      0.30      0.03       1.28
    Q11     0.42     0.12      1.50      3.12       0.53
The results are shown in Table 1 (TO means timeout, i.e., 30 min). Queries Q1, Q3a, Q3b, Q3c and Q10 operate on few tuples, which all fit into several envelopes. Therefore, the effect of parallelization is insignificant. However, the important feature is that despite the more complex execution plans in the MT8 setting, the run time is not higher than for the non-parallelized version.
Queries Q6 and Q8 are slower than in some of the other engines (Q6 in Virtuoso; Q8 in Jena and Virtuoso), since our SPARQL compiler does not perform some optimizations useful for these queries.
The most important result is that for queries Q2, Q3a, Q3b, Q3c, Q4, Q5a, Q5b, Q9 and Q11 our engine significantly outperforms the other engines. All these queries benefit from extensive parallelization; therefore, much larger data can be processed in reasonable time. The significant slowdown of Virtuoso in Q4 is probably caused by extensive swapping, since the result set is too big.
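The effect of parallelization can be read off Table 1 by dividing the ST time by the MT8 time. The following Python sketch performs this derived calculation for the queries whose ST time exceeds 0.1 and which did not time out; the selection threshold, the function name `speedups` and the notion of efficiency (speed-up divided by the 8 worker threads) are choices made for this illustration, not figures reported by the authors.

```python
# (ST, MT8) run times copied from Table 1.
table1 = {
    "Q2": (1.32, 0.39),
    "Q4": (43.69, 6.48),
    "Q5a": (3.08, 0.77),
    "Q5b": (1.23, 0.23),
    "Q7": (54.89, 6.99),
    "Q8": (6.73, 1.21),
    "Q9": (3.19, 0.50),
    "Q11": (0.42, 0.12),
}


def speedups(times, threads=8):
    """Map each query to (speed-up, parallel efficiency)."""
    return {q: (st / mt, st / mt / threads) for q, (st, mt) in times.items()}


s = speedups(table1)
for query, (speedup, efficiency) in s.items():
    print(f"{query}: speed-up {speedup:.2f}, efficiency {efficiency:.2f}")
```

By this measure the long-running queries Q4 and Q7 come closest to the ideal factor of 8 (about 6.74 and 7.85), while short queries such as Q2 and Q11 stay around 3.4 to 3.5.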
6 CONCLUSIONS AND FUTURE WORK
In the paper, we proposed a new method of parallelization of the sort-merge join operation for RDF data. The algorithm is especially designed for streaming systems; moreover, it also behaves well with skewed data. The pilot implementation within the Bobox SPARQL engine significantly outperforms other RDF engines such as Jena, Virtuoso and Sesame in all relevant queries.
In our future research we want to focus on further optimizations such as the influence of the granularity of data stream units (envelopes) on overall performance. Additionally, another research direction is to use these ideas for processing other than RDF, e.g., SQL.
ACKNOWLEDGEMENTS
The authors would like to thank the GACR 103/13/08195, GAUK 277911, GAUK 472313, and SVV-2013-267312 which supported this paper.
REFERENCES
Albutiu, M.-C., Kemper, A., and Neumann, T. (2012). Massively parallel sort-merge joins in main memory multi-core database systems. Proc. VLDB Endow., 5(10):1064–1075.
Bednarek, D., Dokulil, J., Yaghob, J., and Zavoral, F. (2012a). Bobox: Parallelization Framework for Data Processing. In Advances in Information Technology and Applied Computing.
Bednarek, D., Dokulil, J., Yaghob, J., and Zavoral, F. (2012b). Data-Flow Awareness in Parallel Data Processing. In 6th International Symposium on Intelligent Distributed Computing - IDC 2012. Springer-Verlag.
Broekstra, J., Kampman, A., and Harmelen, F. v. (2002). Sesame: A generic architecture for storing and querying RDF and RDF schema. In ISWC ’02: Proceedings of the First International Semantic Web Conference on The Semantic Web, pages 54–68, London, UK. Springer-Verlag.
Cermak, M., Dokulil, J., Falt, Z., and Zavoral, F. (2011). SPARQL Query Processing Using Bobox Framework. In SEMAPRO 2011, The Fifth International Conference on Advances in Semantic Processing, pages 104–109. IARIA.
Cieslewicz, J., Berry, J., Hendrickson, B., and Ross, K. A. (2006). Realizing parallelism in database operations: insights from a massively multithreaded architecture. In Proceedings of the 2nd international workshop on Data management on new hardware, DaMoN ’06, New York, NY, USA. ACM.
DeWitt, D. J., Naughton, J. F., Schneider, D. A., and Seshadri, S. (1992). Practical skew handling in parallel joins. In Proceedings of the 18th International Conference on Very Large Data Bases, VLDB ’92, pages 27–40, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Dittrich, J.-P. and Seeger, B. (2002). Progressive merge join: A generic and non-blocking sort-based join algorithm. In VLDB, pages 299–310.
Dittrich, J.-P., Seeger, B., Taylor, D. S., and Widmayer, P. (2003). On producing join results early. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS ’03, pages 134–142, New York, NY, USA. ACM.
Falt, Z., Bednarek, D., Cermak, M., and Zavoral, F. (2012a). On Parallel Evaluation of SPARQL Queries. In DBKDA 2012, The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications, pages 97–102. IARIA.
Falt, Z., Bulanek, J., and Yaghob, J. (2012b). On Parallel Sorting of Data Streams. In ADBIS 2012 - 16th East European Conference in Advances in Databases and Information Systems.
Falt, Z., Cermak, M., Dokulil, J., and Zavoral, F. (2012c). Parallel sparql query processing using bobox. International Journal On Advances in Intelligent Systems, 5(3 and 4):302–314.
Frigo, M., Leiserson, C. E., Prokop, H., and Ramachandran, S. (1999). Cache-Oblivious Algorithms. In FOCS, pages 285–298.
Gordon, M. I., Thies, W., and Amarasinghe, S. (2006). Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. SIGARCH Comput. Archit. News, 34(5):151–162.
Groppe, J. and Groppe, S. (2011). Parallelizing join computations of sparql queries for large semantic web databases. In Proceedings of the 2011 ACM Symposium on Applied Computing, SAC ’11, pages 1681–1686, New York, NY, USA. ACM.
Hua, K. A. and Lee, C. (1991). Handling data skew in multiprocessor database computers using partition tuning. In Proceedings of the 17th International Conference on Very Large Data Bases, VLDB ’91, pages 525–535, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Jena (2013). Jena – a semantic web framework for Java. Available at: http://jena.apache.org/, [Online; Accessed February 4, 2013].
Li, W., Gao, D., and Snodgrass, R. T. (2002). Skew handling techniques in sort-merge join. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 169–180. ACM.
Liu, B. and Rundensteiner, E. A. (2005). Revisiting pipelined parallelism in multi-join query processing. In Proceedings of the 31st international conference on Very large data bases, VLDB ’05, pages 829–840. VLDB Endowment.
Lu, H., Tan, K.-L., and Shan, M.-C. (1990). Hash-based join algorithms for multiprocessor computers with shared memory. In Proceedings of the sixteenth international conference on Very large databases, pages 198–209, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Ming, M. M., Lu, M., and Aref, W. G. (2004). Hash-merge join: A non-blocking join algorithm for producing fast and early join results. In ICDE, pages 251–263.
Prud’hommeaux, E. and Seaborne, A. (2008). SPARQL Query Language for RDF. W3C Recommendation.
Schmidt, M., Hornung, T., Lausen, G., and Pinkel, C. (2008). Sp2bench: A sparql performance benchmark. CoRR, abs/0806.4627.
Schneider, D. A. and DeWitt, D. J. (1989). A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. SIGMOD Rec., 18(2):110–121.
Vinther, K. (2006). The Funnelsort Project. Available at: http://kristoffer.vinther.name/projects/funnelsort/, [Online; Accessed February 4, 2013].
Virtuoso (2013). Virtuoso data server. Available at: http://virtuoso.openlinksw.com, [Online; Accessed February 4, 2013].