SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION-AWARE
DATABASE SERVERS
A Thesis
Submitted to the Faculty
of
Purdue University
by
Mohamed F. Mokbel
In Partial Fulfillment of the
Requirements for the Degree
of
Doctor of Philosophy
August 2005
To my parents, Fathalla and Zeinab, my wife Thanaa, and my son Abdelrahman
ACKNOWLEDGMENTS
It is my pleasure to express my gratitude to a large number of people who have
contributed, in many different ways, to make my success a part of their own.
First, I wish to express my deepest gratitude to my supervisor Dr. Walid Aref. I
am totally indebted to his continuous encouragement, tireless efforts, and invaluable
guidance. He spent endless hours in teaching me how to be a researcher, how to
identify research challenges, how to transform my fledgling ideas into crisp research
endeavors, how to present and sell my ideas, and finally how to tackle the job market.
Besides being a mentor, he has also been a personal friend to whom I have often
turned for advice. After all, I was really fortunate to have Walid as my advisor, and I
hope that I can be as generous, patient, friendly, and tireless with my students.
I will be always grateful to Dr. Ahmed Elmagarmid for the thoughtful discussions
with him on both the professional and personal levels. Whenever I was in need of
advice or stuck on a decision, Ahmed was always there with his experience and
invaluable comments. I am really grateful to him, and I hope I will be as helpful to
my students as well.
My gratitude and appreciation go to my advisory and examining committee, Prof.
Susanne Hambrusch, Prof. Sunil Prabhakar, and Prof. Elisa Bertino, for their time
and effort. Special thanks to Prof. Ananth Grama and Prof. Ibrahim Kamel for
collaborating in various research projects.
During my summer internship with the Database group at Microsoft Research, I
worked with a wonderful group of people. My sincere thanks to Dr. David Lomet
for being a wonderful mentor and for sharing his advice and experience with me.
My discussions with David significantly contributed to my passion for large-scale
systems-oriented database research. I could never think of a better mentor than
David Lomet. Special thanks to Dr. Roger Barga for his help and support in jump-
starting my work at Microsoft Research. Besides his work-related support, Roger was
a great friend whom I was lucky to have. I would like to extend my appreciation to
the rest of the Database group at Microsoft Research. The everyday informal lunch
meetings and discussions were more than wonderful and added a lot to my research
and life experience.
Special thanks are due to my friends, fellow students, and colleagues who made
my graduate life easier. In particular, thanks to Ossama Younis for being my
companion through the whole journey of my Master's and PhD, to Mohamed Elfeky
for being my all-time classmate, to Ahmed Soliman for his friendship, to Xiaopeng
Xiong for collaborating in the PLACE project, and to M. Ali, Hicham Elmongui,
Moustafa Hammad, and Ming Lu for research collaboration. Many thanks to the rest of the
ICDS (Indiana Center for Database Systems) group at Purdue University.
My sincere gratitude goes to my wife Thanaa for her unceasing patience when
I spent way too much time on computer stuff. While she was in real need of
more time for her own PhD study, she voluntarily sacrificed her time for me. I am
totally indebted to her love, caring, and support. Thanks to my son Abdelrahman
(3 years old) for organizing my life, waking me up early every day, keeping me awake
late at night, and letting me know, appreciate, and make the best use of every
available time slot.
My forever gratitude goes to my parents. Without their unconditional love,
support, and encouragement, I would never have made it this far. Everything I have
achieved or will achieve in my life is through their guidance and the sacrifices they
have made for me.
Ahead of all, I thank ALLAH, for only through ALLAH's grace and blessings has
this pursuit been possible. I pray for ALLAH's support and guidance in the rest of
my career and my life.
TABLE OF CONTENTS
Page
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Location-aware Database Servers . . . . . . . . . . . . . . . . . . . . 1
1.2 New Challenges to Database Systems . . . . . . . . . . . . . . . . . . 4
1.3 The PLACE Prototype Server . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Summary and Outline . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Challenges and their Related Work . . . . . . . . . . . . . . . . . . . . . . 13
2.1 Spatio-temporal Query Classification . . . . . . . . . . . . . . . . . . 13
2.2 Challenge I: Massive Size of Incoming Spatio-temporal Data . . . . . 16
2.3 Challenge II: Repetitive Evaluation of Continuous Queries . . . . . . 18
2.4 Challenge III: Large Numbers of Concurrent Continuous Queries . . . 19
2.5 Challenge IV: Wide Variety of Continuous Queries . . . . . . . . . . . 20
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 Disk-based Spatio-temporal Continuous Query Processing . . . . . . . . . . 22
3.1 Shared Execution of Continuous Spatio-temporal Queries . . . . . . . 23
3.2 The SINA Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 Phase I: Hashing . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.2 Phase II: Invalidation . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.3 Phase III: Joining . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Extensibility of SINA . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.1 Querying the Future . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.2 k-nearest-neighbor Queries . . . . . . . . . . . . . . . . . . . . 38
3.3.3 Aggregate Queries . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.4 Out-of-Sync Clients . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 Correctness of SINA . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5.1 Properties of SINA . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5.2 Number of Objects/Queries . . . . . . . . . . . . . . . . . . . 48
3.5.3 Percentage of Moving Objects/Queries . . . . . . . . . . . . . 50
3.5.4 Locality of movement . . . . . . . . . . . . . . . . . . . . . . . 52
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Stream-based Spatio-temporal Query Processing: Query Operators . . . . . 55
4.1 The GPAC: Continuous Spatio-temporal Query Operators . . . . . . 56
4.2 Uncertainty in Continuous Spatio-temporal Queries . . . . . . . . . . 59
4.2.1 Types of Uncertainty . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.2 Uncertainty Avoidance in GPAC . . . . . . . . . . . . . . . . 62
4.3 Instances of GPAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3.1 Spatio-temporal Range Queries . . . . . . . . . . . . . . . . . 67
4.3.2 Spatio-temporal k-nearest-neighbor . . . . . . . . . . . . . . . 67
4.4 Pipelined Spatio-temporal Query Operators . . . . . . . . . . . . . . 68
4.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5.1 GPAC Operators in a Pipelined Query Plan . . . . . . . . . . 71
4.5.2 Properties of GPAC . . . . . . . . . . . . . . . . . . . . . . . 74
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5 Stream-based Spatio-temporal Query Processing: Scalability . . . . . . . . 78
5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.1.1 Spatio-temporal Databases . . . . . . . . . . . . . . . . . . . . 79
5.1.2 Data Stream Management Systems . . . . . . . . . . . . . . . 80
5.2 The SOLE Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3 Shared Memory in SOLE . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.3.1 Shared Object Buffer . . . . . . . . . . . . . . . . . . . . . . . 83
5.3.2 Shared Query Buffer . . . . . . . . . . . . . . . . . . . . . . . 84
5.3.3 Optimizing the Shared Buffer Pool . . . . . . . . . . . . . . . 85
5.4 Shared Execution in SOLE . . . . . . . . . . . . . . . . . . . . . . . . 86
5.5 Load Shedding in SOLE . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.5.1 Query Load Shedding . . . . . . . . . . . . . . . . . . . . . . . 93
5.5.2 Object Load Shedding . . . . . . . . . . . . . . . . . . . . . . 94
5.5.3 Load Shedding with Locking . . . . . . . . . . . . . . . . . . . 95
5.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.6.1 Properties of SOLE . . . . . . . . . . . . . . . . . . . . . . . . 96
5.6.2 Scalability of SOLE . . . . . . . . . . . . . . . . . . . . . . . . 98
5.6.3 Response Time . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6.4 Accuracy of Load Shedding . . . . . . . . . . . . . . . . . . . 102
5.6.5 Scalability of Load Shedding . . . . . . . . . . . . . . . . . . . 103
5.6.6 Object Load Shedding . . . . . . . . . . . . . . . . . . . . . . 105
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Future Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.2.1 Continuous Query Optimization . . . . . . . . . . . . . . . . . 109
6.2.2 Cost Model for Spatio-temporal Operators . . . . . . . . . . . 110
6.2.3 Context-aware Query Processing . . . . . . . . . . . . . . . . 110
LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
VITA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
LIST OF FIGURES
Figure Page
1.1 Location-aware devices. . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Example of continuous spatio-temporal queries submitted to location-aware database servers. . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Incremental evaluation of range queries. . . . . . . . . . . . . . . . . . 6
1.4 Server GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Client GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1 Shared execution of continuous queries. . . . . . . . . . . . . . . . . . 24
3.2 State diagram of SINA. . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Example of range spatio-temporal queries. . . . . . . . . . . . . . . . 26
3.4 Phase I: Hashing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Pseudo code of the Hashing phase . . . . . . . . . . . . . . . . . . . . 29
3.6 Phase II: Invalidation. . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.7 Pseudo code of the Invalidation phase. . . . . . . . . . . . . . . . . . 31
3.8 Pseudo code invalidating moving objects. . . . . . . . . . . . . . . . . 32
3.9 Pseudo code invalidating moving queries. . . . . . . . . . . . . . . . . 33
3.10 Pseudo code for the joining phase. . . . . . . . . . . . . . . . . . . . . 36
3.11 Querying the future. . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.12 k-NN spatio-temporal queries. . . . . . . . . . . . . . . . . . . . . . . 39
3.13 Example of Out-of-Sync queries. . . . . . . . . . . . . . . . . . . . . . 41
3.14 Road network map of Oldenburg City. . . . . . . . . . . . . . . . . . 46
3.15 The answer size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.16 The impact of grid size N . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.17 Scalability with number of objects. . . . . . . . . . . . . . . . . . . . 49
3.18 Scalability with number of queries. . . . . . . . . . . . . . . . . . . . 50
3.19 Percentage of moving objects. . . . . . . . . . . . . . . . . . . . . . . 51
3.20 Scalability of SINA with update rates. . . . . . . . . . . . . . . . . . 52
3.21 Effect of movement locality. . . . . . . . . . . . . . . . . . . . . . . . 53
4.1 Pseudo code of skeleton of GPAC. . . . . . . . . . . . . . . . . . . . . 58
4.2 Updating query information in GPAC . . . . . . . . . . . . . . . . . . 60
4.3 Uncertainty in moving range queries. . . . . . . . . . . . . . . . . . . 61
4.4 Uncertainty in moving NN queries. . . . . . . . . . . . . . . . . . . . 61
4.5 Uncertainty in static NN queries. . . . . . . . . . . . . . . . . . . . . 62
4.6 The cache area. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.7 Pseudo code of GPAC with caching. . . . . . . . . . . . . . . . . . . . 65
4.8 Updating query information in GPAC with caching . . . . . . . . . . 66
4.9 Greater Lafayette, Indiana, USA. . . . . . . . . . . . . . . . . . . . . 70
4.10 Pipelined GPAC operators. . . . . . . . . . . . . . . . . . . . . . . . . 72
4.11 Pipelined operators with SELECT. . . . . . . . . . . . . . . . . . . . 73
4.12 Pipelined operators with Join. . . . . . . . . . . . . . . . . . . . . . . 74
4.13 High arrival rates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.14 Query selectivity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.1 Overview of shared execution in SOLE. . . . . . . . . . . . . . . . . . 82
5.2 Shared join operator in SOLE. . . . . . . . . . . . . . . . . . . . . . . 86
5.3 Pseudo code for receiving a new value of P . . . . . . . . . . . . . . . 87
5.4 Pseudo code for updating P ’s location. . . . . . . . . . . . . . . . . . 88
5.5 All cases of updating P ’s location. . . . . . . . . . . . . . . . . . . . . 89
5.6 Pseudo code for receiving a new query Q. . . . . . . . . . . . . . . . . 90
5.7 Pseudo code for updating a query. . . . . . . . . . . . . . . . . . . . . 91
5.8 All cases of updating Q’s region. . . . . . . . . . . . . . . . . . . . . . 91
5.9 Architecture of self tuning in SOLE. . . . . . . . . . . . . . . . . . . . 92
5.10 Cache area in SOLE. . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.11 Grid Size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.12 Maximum Number of Supported Queries. . . . . . . . . . . . . . . . . 98
5.13 Data size in the query and cache areas. . . . . . . . . . . . . . . . . . 99
5.14 Response time in SOLE. . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.15 Load Vs. Accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.16 Reduced load for a certain accuracy. . . . . . . . . . . . . . . . . . . 103
5.17 Scalability with Load Shedding. . . . . . . . . . . . . . . . . . . . . . 104
5.18 Performance of Object Load Shedding. . . . . . . . . . . . . . . . . . 105
ABSTRACT
Mohamed F. Mokbel. Ph.D., Purdue University, August 2005. Scalable Continuous Query Processing in Location-aware Database Servers. Major Professor: Walid G. Aref.
The widespread use of cellular phones, handheld devices, and GPS-like technology
enables location-aware environments where virtually all objects are aware of
their locations. Location-aware environments and location-aware services are char-
acterized by the large number of moving objects and large number of continuously
moving queries (also known as spatio-temporal queries). Such environments call for
new query processing techniques that deal with the continuous movement and fre-
quent updates of both spatio-temporal objects and spatio-temporal queries. This
dissertation presents novel paradigms and algorithms for efficient processing and
scalable execution of continuous spatio-temporal queries in location-aware database
servers. We introduce a disk-based framework that exploits shared execution and
incremental evaluation paradigms. With shared execution, the problem of evaluating
a set of concurrent continuous queries is abstracted to a spatial join between the set
of moving objects and the set of moving queries. With the incremental evaluation,
rather than performing a repetitive evaluation of continuous queries, we produce
only the updates of the recently reported answer.
For streaming environments, we introduce a generic class of spatio-temporal op-
erators that can be tuned with a set of parameters and methods to act as various con-
tinuous spatio-temporal queries (e.g., range queries and k-nearest-neighbor queries).
The spatio-temporal operators can be combined with other traditional operators
(e.g., join, distinct, and aggregate) to support a wide variety of continuous spatio-
temporal queries. To support scalability in streaming environments, we introduce
a scalable operator that shares memory resources among all outstanding continuous
queries. To cope with intervals of high arrival rates of data objects and/or continu-
ous queries, the proposed scalable operator utilizes a self-tuning approach based on
load-shedding where some of the stored objects are dropped from memory.
The experimental evaluation compares our disk-based approach with recent
scalable approaches and shows the superior performance of our techniques. Also,
we experimentally evaluate our spatio-temporal operators based on a real implemen-
tation inside an open-source data stream management system. The experimental
results show that by delving inside the database engine and providing pipelined op-
erators for continuous spatio-temporal queries, we can achieve performance orders
of magnitude better than other application level algorithms.
1 INTRODUCTION
This dissertation studies the scalable execution of continuous queries in
spatio-temporal applications. Examples of these applications include location-aware
services, traffic monitoring, and enhanced 911 services. Such applications are charac-
terized by the large number of updates and the large number of continuous queries.
In this chapter, we start by motivating the need for location-aware database
servers in Section 1.1. Next, in Section 1.2, we discuss the challenges that location-
aware environments pose to existing database management systems. Section 1.3
briefly describes the PLACE prototype server, a research prototype for location-
aware database servers. The main contributions of this dissertation are summarized
in Section 1.4. Finally, Section 1.5 outlines the rest of the dissertation.
1.1 Location-aware Database Servers
The widespread use of location-detection devices (e.g., GPS-like devices, RFIDs,
handheld devices, and cellular phones) results in environments where virtually all
objects of interest are aware of their locations. Figure 1.1 shows various forms of GPS
devices that are added to cellular phones, cars, or PDA devices. Such devices enable
new spatio-temporal applications where moving objects with any of these devices
have the ability to change their location continuously over time. Examples of spatio-
temporal applications include location-aware services [1], traffic monitoring [2], and
enhanced 911 service (http://www.fcc.gov/911/enhanced/). In this dissertation, we
mainly focus on location-aware services [1], where massive amounts of spatio-temporal
data are continuously sent from a large number of moving objects (e.g., moving
vehicles in road networks) to location-aware database servers.
Figure 1.1. Location-aware devices: (a) cellular GPS; (b) built-in GPS; (c) portable GPS; (d) PDA + GPS.
Location-aware database servers provide new spatio-temporal services (i.e.,
spatio-temporal queries) to their subscribers. Spatio-temporal queries can be ei-
ther snapshot queries or continuous queries. This dissertation focuses on processing
continuous spatio-temporal queries in location-aware database servers. Figure 1.2
gives various examples of continuous spatio-temporal queries that are submitted
to a location-aware database server. Examples of these queries include continuous
moving range queries over moving objects (e.g., “Continuously report the number of
moving cars in the moving highlighted area”), continuous stationary spatio-temporal
range queries over moving objects (e.g., “Alert me if there is any traffic jam in a
certain downtown area”), continuous moving spatio-temporal k-nearest-neighbor query
over moving objects (e.g., “Continuously alert me if one of the nearest three mov-
ing aircrafts to my moving aircraft is not a friendly one”), and continuous moving
spatio-temporal k-nearest-neighbor query over stationary objects (e.g., “Continu-
ously, report the three nearest hospitals to my moving ambulance car”).
Unlike traditional snapshot queries, continuous spatio-temporal queries have the
following distinguishing characteristics:
• As continuous queries tend to stay active in the database server for several
hours or days, new continuous queries will be submitted to the same database
server while the old ones are still active. Thus, any continuous query processor
should take into account the large number of concurrent outstanding continuous
queries in location-aware database servers.

Figure 1.2. Example of continuous spatio-temporal queries submitted to location-aware database servers.
• Unlike snapshot queries where the answer is retrieved from the already stored
data objects, the answer of continuous queries is based on the received data
objects. In other words, once a continuous query is submitted to the database
server, it initially has no answer. The answer is progressively constructed by
the arrival of new data objects that satisfy the outstanding continuous query.
• Continuous queries require continuous evaluation as the query result becomes
invalid with the continuous change of location information of the query and/or
the data objects. Furthermore, an object may be added to or removed from
the answer set of a continuous spatio-temporal query. For example, consider
moving vehicles that move in and out of a certain query region.
• Queries as well as data have the ability to continuously change their locations.
Due to this mobility, any delay in processing spatio-temporal queries may result
in an obsolete answer. For example, consider a query that asks about moving
objects that lie in a certain region. If the query answer is delayed, the answer
may be outdated, as objects are continuously changing their locations.
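The incremental behavior described in these bullets can be sketched for a single continuous range query. The class below is an illustrative toy only; its names and structure are this sketch's own, not the thesis's algorithms:

```python
# Minimal sketch (illustrative, not the thesis's algorithm): a continuous
# range query whose answer starts empty and is maintained incrementally as
# location updates arrive. Objects are added to the answer when they move
# inside the query region and removed when they move out.

class ContinuousRangeQuery:
    def __init__(self, xmin, ymin, xmax, ymax):
        self.region = (xmin, ymin, xmax, ymax)
        self.answer = set()  # initially empty; built as updates arrive

    def _inside(self, x, y):
        xmin, ymin, xmax, ymax = self.region
        return xmin <= x <= xmax and ymin <= y <= ymax

    def on_update(self, obj_id, x, y):
        """Process one location update; return the change to the answer."""
        if self._inside(x, y):
            if obj_id not in self.answer:
                self.answer.add(obj_id)
                return ("add", obj_id)      # object entered the region
        elif obj_id in self.answer:
            self.answer.remove(obj_id)
            return ("remove", obj_id)       # object left the region
        return None                          # answer unchanged

q = ContinuousRangeQuery(0, 0, 10, 10)
print(q.on_update("car1", 5, 5))    # ("add", "car1")
print(q.on_update("car1", 6, 6))    # None: still inside, answer unchanged
print(q.on_update("car1", 20, 20))  # ("remove", "car1")
```

Note that if the query region itself moves, the same add/remove reasoning applies to every object already in or near the answer, which is what makes delayed processing produce stale results.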
This dissertation focuses on leveraging traditional database management systems
and data stream management systems (e.g., Nile [3]) to support scalable execution of
continuous spatio-temporal queries in location-aware database servers. We provide
disk-based and stream-based general frameworks for supporting a wide variety of
continuous queries. In addition, stream-based pipelined query operators are proposed as
an attempt to extend data stream management systems to support spatio-temporal
applications.
1.2 New Challenges to Database Systems
Traditional database systems and index structures are optimized for querying
existing data and inserting new data items, respectively. The implicit assumption
is that updates to the database engine are infrequent and have lower priority in
optimization techniques. Thus, having large numbers of updates would dramatically
degrade the performance of both traditional database systems and traditional index
structures.
In a typical location-aware environment, there is a huge number of update
transactions. In fact, the rate of updates may greatly exceed the rate at which
queries are submitted to the database server, and is much larger than the rate of
inserting new data items. For example, consider n moving objects, all subscribed
to the same location-aware database server. There are n insert transactions to
insert the n moving objects into the server. Then, with each single move (e.g.,
every 10 seconds) of any of these objects, a new update transaction is sent to the
server. After only five minutes, each moving object has sent 30 update transactions,
so the total number of update transactions is 30 times the number of insert
transactions. Such a highly dynamic environment is supported by neither
traditional database systems nor traditional index structures.
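The arithmetic above can be checked with a short sketch. The 10-second report interval and 5-minute window are the example's assumptions, not fixed system parameters:

```python
# Back-of-the-envelope check of the update load described above, assuming
# each of n subscribed objects reports a new location every 10 seconds.
def transaction_counts(n_objects, report_interval_s=10, window_s=5 * 60):
    inserts = n_objects                        # one insert per new object
    updates_per_object = window_s // report_interval_s  # 300 s / 10 s = 30
    updates = n_objects * updates_per_object
    return inserts, updates

inserts, updates = transaction_counts(1000)
print(inserts, updates, updates // inserts)   # 1000 30000 30
```

The 30:1 ratio grows linearly with how long objects stay subscribed, which is why update performance, not insert performance, dominates.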
In general, location-aware environments pose three main challenges to the existing
database management systems:
1. Large number of update transactions. As a result of highly dynamic envi-
ronments, new indexing structures that are optimized for updates need to be
deployed.
2. Continuously moving queries. Unlike traditional snapshot queries, which are
evaluated once, continuous moving queries need to be continuously updated.
Furthermore, data objects may be added to or removed from the query answer.
Existing database management systems do not support such functionality.
3. Large number of continuous queries. There is plenty of work on multi-query
optimization in traditional database systems (e.g., see [4, 5]). The main idea
is to exploit common parts of different query plans. However, the main focus
has been only on snapshot queries. New techniques need to be explored to
support scalable execution of continuous queries.
1.3 The PLACE Prototype Server
This dissertation presents the PLACE prototype server (Pervasive Location-
Aware Computing Environments), a scalable location-aware database server. The
PLACE server extends the Predator database management system [6], the Shore
storage manager [7], and the Nile [3] data stream management system to support
scalable execution of continuous spatio-temporal queries over spatio-temporal data
and spatio-temporal data streams. The PLACE server aims to bridge the areas of
spatio-temporal databases and data stream management systems. In general, the
PLACE server has the following distinguishing characteristics:
Figure 1.3. Incremental evaluation of range queries: (a) snapshot at time T0; (b) snapshot at time T1.
1. Scalability in terms of supporting large numbers of moving objects.
Such scalability is achieved by optimizing the scarce memory resource to store
only either the recently moved objects or those objects that are of interest
to at least one outstanding continuous query. Further scalability is achieved
through employing load shedding mechanisms that aim to sacrifice some of the
in-memory objects to support a larger number of queries, yet with an approximate
answer.
2. Scalability in terms of supporting large numbers of continuous spatio-temporal queries. Such scalability is achieved by employing a shared execution paradigm among the concurrently outstanding continuous queries. The main idea is to abstract the problem of executing multiple continuous spatio-temporal queries as a spatio-temporal join operation. The inputs to the join operation are two streams: a stream of continuously moving objects and a stream of continuously moving queries. Furthermore, concurrently outstanding queries share the same in-memory buffers and disk-based data structures.
3. Incremental evaluation. Rather than performing a repetitive evaluation of continuous queries, the PLACE continuous query processor employs an incremental evaluation paradigm that continuously updates the query answer.
The PLACE continuous query processor distinguishes between two types of updates: positive and negative updates. A positive update indicates that a certain object needs to be added to the query answer. Similarly, a negative update indicates that a certain object needs to be removed from the query answer. Figure 1.3 gives an example of applying the concepts of positive and negative updates to a set of continuous range queries. The snapshot of the database at time T0 is given in Figure 1.3a, with nine moving objects, P1 to P9, and five continuous range queries, Q1 to Q5, in the two-dimensional space. The answer of the queries at time T0 is represented as (Q1, P5), (Q2, P1), (Q3, P6, P7), (Q4, P3, P4), and (Q5, P9). At time T1 (Figure 1.3b), only the objects P1, P2, P3, and P4 and the queries Q1, Q3, and Q5 change their locations. As a result, the PLACE server reports only the following updates: (Q1, −P5), (Q3, −P6), (Q3, +P8), and (Q4, −P4).
4. Supporting a wide variety of continuous spatio-temporal queries. The PLACE continuous query processor provides a general framework that supports a wide variety of continuous spatio-temporal queries, including continuous spatio-temporal range queries, continuous spatio-temporal k-nearest-neighbor queries, continuous aggregate queries, and continuous future queries. Furthermore, the PLACE continuous query processor supports both stationary and moving queries with the same performance.
5. Spatio-temporal operators. The PLACE continuous query processor goes beyond implementing high-level algorithms for continuous spatio-temporal queries. Instead, the PLACE server encapsulates the spatio-temporal query algorithms into a set of primitive spatio-temporal pipelined operators that can be part of a larger query plan. Having a set of primitive spatio-temporal operators greatly enhances query performance by pushing the spatio-temporal operators to the bottom of the query pipeline, and enables flexible query optimizers in which multiple candidate query plans can be produced.
Upon subscribing to the PLACE server, moving objects are required to send their location updates periodically. A location update from the client (moving object) to the server has the format (OID, x, y), where OID is the object identifier and (x, y) is the location of the moving object in the two-dimensional space. An update is timestamped upon its arrival at the server side. Once an object P stops moving (e.g., P reaches its destination or P is shut down), either P sends an explicit disappear message to the server, or the server times out after not receiving any updates from P for a certain time Ttimeout. In both cases, the server recognizes that object P is no longer moving.
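Server-side bookkeeping for these updates can be sketched as follows (a minimal illustration; the timeout value, message names, and data layout are assumptions):

```python
import time

T_TIMEOUT = 30.0  # hypothetical: seconds of silence before an object is considered gone

last_seen = {}  # OID -> (x, y, server-side timestamp)

def on_update(oid, x, y, now=None):
    """Handle a (OID, x, y) update; the server timestamps it upon arrival."""
    last_seen[oid] = (x, y, now if now is not None else time.time())

def expired_objects(now):
    """Objects that sent neither a disappear message nor any update for T_TIMEOUT."""
    return [oid for oid, (_, _, ts) in last_seen.items() if now - ts > T_TIMEOUT]

on_update("car-17", 3.5, 7.2, now=100.0)
on_update("bus-04", 1.0, 2.0, now=125.0)
# At time 135, car-17 has been silent for 35 s (> 30 s): no longer moving.
assert expired_objects(now=135.0) == ["car-17"]
```

An explicit disappear message would simply remove the OID from `last_seen` immediately instead of waiting for the timeout.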
Due to the highly dynamic nature of location-aware environments and the infinite size of incoming spatio-temporal streams, we cannot store all incoming data. Thus, the PLACE server employs a three-level storage hierarchy. First, a subset of the incoming data streams is stored in in-memory buffers. In-memory buffers are associated with the outstanding continuous queries at the server. Each query determines which tuples need to be in its buffer and when these tuples expire, i.e., are deleted from the buffer. Second, we keep a disk-based storage that maintains only one reading of each moving object and query. Since we cannot update the disk storage every time we receive an update from a moving object, we sample the input data by choosing every kth reading to flush to disk. Moreover, we cache the readings of moving objects/queries and flush them to secondary storage once every T time units. Data on the secondary storage are indexed using a simple grid structure. Third, every Tarchive time units, we take a snapshot of the disk-based database and flush it to a repository server. The repository server acts as a multi-version structure of the moving objects that supports historical queries. Stationary objects (e.g., gas stations, hospitals, restaurants) are preloaded into the system as relational tables that are infrequently updated.
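The sampling and batched-flush policy of the second storage level can be sketched as follows (a simplified illustration; K, the buffering scheme, and all names are hypothetical):

```python
K = 5           # hypothetical sampling rate: keep every K-th reading per object
counters = {}   # OID -> number of readings seen so far
pending = []    # sampled readings cached in memory between flushes

def on_reading(oid, x, y):
    """Sample the input: only every K-th reading of an object is kept for disk."""
    counters[oid] = counters.get(oid, 0) + 1
    if counters[oid] % K == 0:
        pending.append((oid, x, y))

def flush_to_disk():
    """Called every T time units: write the cached batch once, then clear it."""
    batch, pending[:] = list(pending), []
    # ... here `batch` would be written into the grid-indexed disk storage ...
    return batch

for i in range(12):          # twelve readings from one object
    on_reading("obj-1", float(i), 0.0)
batch = flush_to_disk()      # only the 5th and 10th readings were sampled
```

Sampling bounds the disk-update rate regardless of how fast objects report, at the cost of a slightly stale on-disk position.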
Figure 1.4. Server GUI.
Figures 1.4 and 1.5 give snapshots of the server and client graphical user interfaces (GUIs) of PLACE, respectively. The server GUI displays all moving objects on the map1. The client GUI simulates a client end-device. Users can choose the query type from a list of available query types that includes stationary range queries, moving range queries, stationary k-nearest-neighbor queries, and moving k-nearest-neighbor queries. The spatial region of the query can be determined using the map of the area of interest. By pressing the submit button, the client translates the query into SQL and transmits it to the PLACE server. The result appears both in the list of Figure 1.5 and as moving objects on the map. A client sees on its map only the objects that belong to its issued query.
1The map in Figures 1.4 and 1.5 is for the Greater Lafayette area, Indiana, USA.
Figure 1.5. Client GUI.
1.4 Contributions
The main contributions of this dissertation are as follows:
• We introduce a disk-based framework for the scalable execution of multiple concurrent continuous spatio-temporal queries. The proposed framework employs two main paradigms: a shared execution paradigm as a means of achieving scalability, and an incremental evaluation paradigm that avoids repetitive evaluation of continuous spatio-temporal queries.
• We introduce the first attempt to furnish data stream management systems with a set of primitive spatio-temporal pipelined query operators. The proposed spatio-temporal operators can be combined with traditional query operators to support a wide variety of continuous spatio-temporal queries over spatio-temporal data streams.
• We introduce a stream-based scalable pipelined query operator for evaluating large numbers of concurrent continuous spatio-temporal queries over spatio-temporal data streams. The proposed scalable operator takes two streams as its input: a stream of moving objects and a stream of moving queries.
• To cope with time intervals of high workload in the number of moving objects and/or the number of moving queries, we introduce load shedding mechanisms that reduce the memory load while guaranteeing that the query accuracy stays above a certain threshold.
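The load shedding idea can be illustrated with a toy policy (purely hypothetical: here the accuracy estimate is simply the fraction of objects kept, standing in for the real accuracy guarantee, and the utility metric is invented for illustration):

```python
def shed_load(objects, queries_per_object, memory_budget, min_accuracy):
    """Drop in-memory objects until the memory budget is met, shedding the
    objects that contribute to the fewest queries first, and stopping early
    if the estimated accuracy would fall below the threshold."""
    # Most useful objects (serving the most queries) come first.
    kept = sorted(objects, key=lambda o: queries_per_object.get(o, 0), reverse=True)
    total = len(objects)
    while len(kept) > memory_budget and (len(kept) - 1) / total >= min_accuracy:
        kept.pop()  # shed the currently least useful object
    return kept

kept = shed_load(["a", "b", "c", "d"], {"a": 3, "b": 2, "c": 1},
                 memory_budget=2, min_accuracy=0.5)
# "d" serves no query and is shed first, then "c", until the budget is met.
```

The point of the sketch is the trade-off itself: memory is reclaimed at the price of an approximate answer, but never below the stated accuracy bound.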
The proposed query operators are evaluated based on a real implementation inside the query engine of a research prototype of a data stream management system. The performance results of all the proposed algorithms and operators validate our approaches. Experimental results are based on synthetic data of moving objects on real road networks.
1.5 Summary and Outline
In this chapter, we motivated the need for location-aware database servers and outlined the challenges they pose to existing database management systems. Then, we briefly highlighted the PLACE server, our location-aware database research prototype. We summarized our contributions to supporting continuous query processing in location-aware database servers: disk-based and stream-based frameworks and pipelined query operators that can be plugged into existing database and data stream management systems.
The rest of this dissertation is organized as follows. Chapter 2 points out the challenges we face in building the PLACE server along with the related work for each challenge. Chapter 2 also classifies continuous spatio-temporal queries based on their time domain and the mutability of both continuous queries and data objects. In Chapter 3, we present SINA, our proposed disk-based framework for achieving scalable and incremental evaluation of continuous spatio-temporal queries. Chapter 3 also provides the correctness proof of SINA in terms of completeness, uniqueness, and progressiveness. Chapter 4 presents a family of in-memory stream-based pipelined
query operators that can be combined with traditional query operators to support a wide variety of continuous spatio-temporal queries. In Chapter 5, we propose SOLE, a scalable pipelined query operator for evaluating a set of continuous spatio-temporal queries over spatio-temporal streams. In addition, Chapter 5 introduces load shedding mechanisms to cope with time intervals of high workload in the number of objects and/or the number of continuous spatio-temporal queries. An experimental evaluation of the SOLE framework, based on a real implementation inside a data stream management research prototype, is also provided in Chapter 5. Finally, Chapter 6 concludes this dissertation and points out future research directions.
Parts of this dissertation have been published in workshops, conferences, and journals. The disk-based scalable incremental framework for continuous spatio-temporal queries was published in ACM SIGMOD 2004 [8]. The stream-based query operators were published in MDM 2005 [9]. Vision and overview papers of the PLACE prototype server have been published in the ACM-GIS Symposium [1], the ICDE PhD Workshop [10], the STDBM Workshop [11], and the GeoInformatica Journal [12]. The PLACE prototype system was demonstrated in VLDB 2004 [13].
2 CHALLENGES AND THEIR RELATED WORK
In this chapter, we start by presenting a thorough classification of continuous spatio-temporal queries. Then, we present a set of challenges that we face in developing the PLACE location-aware database server. With each challenge, we briefly highlight its related work and the PLACE approach for dealing with it.
This chapter is organized as follows. Section 2.1 classifies continuous spatio-temporal queries based on both the temporal dimension and the mutability of objects and queries. In Section 2.2, we discuss the PLACE approach for dealing with the massive size of incoming spatio-temporal data. Section 2.3 presents the related work in continuous spatio-temporal query processing along with the PLACE approach for tackling such continuous evaluation. The scalability of the PLACE location-aware server is discussed in Section 2.4 along with the related work. Section 2.5 presents the need for a general framework that supports a wide variety of continuous spatio-temporal queries. Finally, Section 2.6 summarizes this chapter.
2.1 Spatio-temporal Query Classification
There is a wide variety of continuous spatio-temporal queries. In this section, we provide two classifications of continuous spatio-temporal queries, based on the temporal dimension and on the mutability of both objects and queries. For each class, we give an example along with related work that deals with such queries. The first classification is based on the query time; spatio-temporal queries can be classified as:
• Historical Spatio-temporal Queries. Historical queries ask about past data. An example of a historical query is “Find the locations of a certain object between 7 AM and 8 AM today”. A continuous version of this query is: “Continuously, find the locations of a certain object in the last hour”. In this case, the continuous query time interval (the last hour) is a sliding time window. To support historical queries, a location-aware server needs to store and index all the incoming locations of moving objects. Examples of spatio-temporal indexing techniques that support historical queries include the HR-tree [14], the HR+-tree [15], the TB-tree [16], the MV3R-tree [15], and SETI [17].
• NOW Spatio-temporal Queries. NOW queries are interested only in the current locations of moving objects. An example of a NOW query is “Based on my current location, what is the nearest gas station?”. Due to the highly dynamic environment supported by location-aware servers, dealing with NOW queries is challenging. To answer NOW queries, a location-aware server needs to keep track of the latest locations of all moving objects. Examples of spatial access methods that support NOW queries include hashing [18], the VCI-Index [19], the Q-Index [19], the LUR-tree [20], and the frequently updated R-tree [21].
• Future Spatio-temporal Queries. Future queries are interested in predicting the locations of moving objects. Additional information (e.g., the velocity or destination) needs to be sent from the moving objects to the location-aware server. An example of a future query is “Alert me if a non-friendly airplane is going to cross a certain region in the next 30 minutes”. Notice that in this query, the alert is sent before the actual event happens; hence, it is termed a future or predictive query. Examples of spatio-temporal access methods that support future queries include the TPR-tree [22], the REXP-tree [23], the TPR*-tree [24], and STRIPES [25].
The second classification of spatio-temporal queries is based on the mutability of
both objects and queries. Thus, continuous spatio-temporal queries can be classified
as:
• Stationary Queries on Moving Objects. In this category, the query regions are stationary, while objects are moving. Examples of these queries include “How many trucks are within the city boundary?” and “Find the nearest 100 taxis to a certain hotel”. In these queries, the query regions (the city boundary and the hotel neighborhood) are fixed, while the objects of interest (trucks and taxis) are moving. Two approaches have been proposed to support continuous stationary queries. The first approach is to index the moving objects with a spatio-temporal access method [22–24]. The second approach is to index the stationary queries with a spatial access method [19, 26].
• Moving Queries on Stationary Objects. In this category, query regions are moving, while objects are stationary. An example of this category is “As I am moving along a certain trajectory, show me all gas stations within 3 miles of my location”. This category of queries employs traditional methods to organize the stationary objects (e.g., fractals [27–29] or R-trees [30]). Efficient algorithms that utilize the R-tree have been proposed for continuous single nearest-neighbor queries [31] and continuous k-nearest-neighbor queries [32].
• Moving Queries on Moving Objects. In this category, both query regions and objects are moving. An example of such queries is “As I (the sheriff) am moving through the space, make sure that the number of police cars within 3 miles of my location is more than a certain threshold”. In this case, the query region is moving, and the objects of interest (police cars) are also moving. To support moving queries in a location-aware server, moving objects need to be indexed using a TPR-tree-like structure (e.g., [22–24]). Then, special algorithms are developed to process moving queries over such TPR-tree-like structures.
The previous classifications can be applied to any kind of continuous or snapshot spatio-temporal query (e.g., range queries, k-nearest-neighbor queries, reverse-nearest-neighbor queries [33, 34], and aggregate queries [35, 36]). In this dissertation, we go beyond the idea of having tailored algorithms and data structures for each specialized query type. Instead, we provide a general framework that supports all the mutability combinations for continuous spatio-temporal queries. For simplicity, we present our proposed algorithms and data structures in the context of NOW queries. The extension to the case of future queries is straightforward. Continuous historical queries (i.e., sliding-window queries) are discussed extensively in data stream applications (e.g., [37–41]) and are beyond the scope of this dissertation.
2.2 Challenge I: Massive Size of Incoming Spatio-temporal Data
Existing continuous query processors for spatio-temporal databases explicitly assume that all incoming data can be indexed and/or stored in secondary storage. A wide variety of spatio-temporal access methods (e.g., see [42] for a survey) has been introduced to deal with massive sizes of spatio-temporal data. Given the highly dynamic nature of location-aware environments, several attempts (e.g., the LUR-tree [20], the FUR-tree [21], and the CTR-tree [43]) have been proposed to tune traditional index structures to support frequent updates, since traditional index structures are optimized for answering queries and inserting new data, not for supporting frequent updates. The Lazy Update R-tree [20] (LUR-tree) aims to handle the frequent
updates of moving objects without degrading the performance of the R-tree index structure. The main idea is that as long as the new position of a moving object lies inside its minimum bounding rectangle (MBR), no action is taken other than updating the position. Once an object moves out of its MBR, two approaches are proposed: (1) the object is deleted and reinserted, triggering the necessary merge and split operations; (2) if the object has not moved very far from the MBR, the MBR can be extended to enclose the new location. The Frequently Updated R-tree [21] (FUR-tree) extends the idea of the LUR-tree by investigating several bottom-up approaches to accommodate the frequent updates of moving objects. Examples of these approaches include extending the MBR to enclose the new value and moving
the object to one of the sibling nodes. While both the LUR-tree and the FUR-tree impose constraints on the pattern of object movement, the Change-Tolerant R-tree [43] does not place any restrictions on object movement.
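The lazy-update idea behind the LUR-tree can be sketched without the surrounding R-tree machinery (a simplified illustration; the extension limit and return values are assumptions, not part of the published structure):

```python
EXTEND_LIMIT = 2.0  # hypothetical: how far beyond the MBR we extend instead of reinsert

def lazy_update(mbr, new_pos):
    """Return the (possibly extended) MBR and the action taken.
    mbr = (x1, y1, x2, y2); new_pos = (x, y)."""
    x1, y1, x2, y2 = mbr
    x, y = new_pos
    if x1 <= x <= x2 and y1 <= y <= y2:
        return mbr, "update-in-place"        # still inside the MBR: just store the point
    if x1 - EXTEND_LIMIT <= x <= x2 + EXTEND_LIMIT and \
       y1 - EXTEND_LIMIT <= y <= y2 + EXTEND_LIMIT:
        new_mbr = (min(x1, x), min(y1, y), max(x2, x), max(y2, y))
        return new_mbr, "extend-mbr"         # close by: grow the MBR slightly
    return mbr, "delete-and-reinsert"        # far away: full R-tree delete + reinsert
```

Only the last case pays the cost of R-tree merge/split operations, which is why the structure tolerates moderate update frequencies well.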
The common objective of the LUR-tree, the FUR-tree, and the CTR-tree is to push the limits of traditional R-trees to efficiently support updates of existing data. However, these index structures are valid only for moderate update frequencies. For highly dynamic environments, the performance of all these R-tree modifications degrades dramatically. In fact, for very high update arrival rates (e.g., data streaming environments), only in-memory algorithms are feasible. Although there is extensive work on querying streaming data (e.g., see [37, 38, 44–49]), there is limited work that exploits the spatial and/or temporal properties of data streams. The spatial properties of data streams have recently been addressed in [50, 51] to solve geometric problems, e.g., computing the convex hull [51]. In [52], spatio-temporal histograms are used as synopses for approximate query processing over spatio-temporal data streams. To the best of our knowledge, there is no existing work that addresses continuous query processing for spatio-temporal streams.
The PLACE approach. To deal with the massive size of data arriving at the PLACE server, we employ two techniques. First, a disk-based technique indexes both frequently updated data objects and moving queries using a simple grid structure; grid structures are cheap to update compared with the split and merge procedures required to update R-tree-like structures. Second, for streaming data, we employ in-memory techniques that limit the focus of the PLACE query processor to only those objects that are of interest to at least one outstanding continuous query. This is in contrast to existing streaming engines, which use the sliding-window query model to limit the focus of the query processing engine to only the recent history (e.g., see [38, 41, 46, 47, 49, 53, 54]).
Our proposed model differs from the sliding-window query model in two aspects: (1) we are interested in querying the current locations of moving objects, which sliding-window queries cannot support since they are restricted to historical data; (2) in our model, data objects expire in random order, rather than in the first-in-first-expire order used by sliding-window queries.
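The appeal of the grid structure mentioned above is that an object update touches at most two cells, with none of the split/merge work an R-tree needs. A minimal sketch (cell size and all names are hypothetical):

```python
CELL = 10.0  # hypothetical grid cell side length

grid = {}        # (cx, cy) -> set of OIDs currently in that cell
location = {}    # OID -> cell the object is currently indexed under

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

def grid_update(oid, x, y):
    """Constant-time update: remove the object from its old cell (if any)
    and add it to the new one; no restructuring of the index is needed."""
    new_cell = cell_of(x, y)
    old_cell = location.get(oid)
    if old_cell == new_cell:
        return
    if old_cell is not None:
        grid[old_cell].discard(oid)
    grid.setdefault(new_cell, set()).add(oid)
    location[oid] = new_cell

grid_update("O1", 3.0, 4.0)    # lands in cell (0, 0)
grid_update("O1", 25.0, 4.0)   # moves to cell (2, 0)
```

The trade-off is coarser spatial pruning than an R-tree, which the grid accepts in exchange for predictable per-update cost under very high arrival rates.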
2.3 Challenge II: Repetitive Evaluation of Continuous Queries
Most of the existing techniques in spatio-temporal databases abstract a continuous query into a series of snapshot queries executed at different time instances. Various approaches aim to optimize the time interval between two consecutive evaluations of a continuous query. Mainly, three different approaches have been investigated:
1. The validity of the results [55, 56]. With each query answer, the server returns a valid time [56] or a valid region [55] of the answer. The valid time and the valid region indicate the temporal and the spatial validity of the returned answer, respectively. Once the valid time expires or the client moves out of the valid region, the client resubmits the continuous query for complete reevaluation.
2. Caching the results [32, 57]. The main idea is to resubmit the continuous query every fixed time interval T. The most recent query result is cached either on the client side [32] or on the server side [57]. Upon resubmission, the previously cached results are used to prune the search for the new results of k-nearest-neighbor queries [32] and range queries [57].
3. Precomputing the results [31, 57]. If the trajectory of the query movement is known a priori, then by using computational geometry for stationary objects [31] or velocity information for moving objects [57], we can identify which objects will be nearest neighbors of [31] or within a range from [57] the query trajectory. However, if the trajectory information changes, the query needs to be reevaluated.
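The validity-based approach above reduces to a simple client-side check (a minimal sketch; the rectangular valid region and parameter names are assumptions for illustration):

```python
def answer_is_valid(valid_until, valid_region, now, client_pos):
    """Reuse the cached answer only while both the valid time and the valid
    region still hold; otherwise the client must resubmit the query for
    complete reevaluation at the server."""
    x1, y1, x2, y2 = valid_region
    x, y = client_pos
    return now <= valid_until and x1 <= x <= x2 and y1 <= y <= y2

# Hypothetical answer valid until t = 100 while the client stays in (0,0)-(10,10).
ok = answer_is_valid(100, (0, 0, 10, 10), now=90, client_pos=(5, 5))
```

The weakness the text points out is visible here: every failed check triggers a full reevaluation rather than an incremental repair of the answer.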
The PLACE approach. It is clear that query reevaluation consumes system resources by performing redundant query processing. In PLACE, we avoid query reevaluation. Instead, we employ an incremental evaluation paradigm. With incremental evaluation, only the changes to the query answer are evaluated and sent to the user. A distinguishing characteristic of continuous spatio-temporal queries is the need to remove some parts of the query answer (e.g., when an object moves out of a range query). This feature is not available in traditional continuous queries, where the query answer is append-only. Thus, incremental evaluation in PLACE means continuously updating the query answer by a set of positive and negative updates. A positive update indicates the addition of a certain object to the query answer. Similarly, a negative update indicates the removal of a certain object from the query answer.
2.4 Challenge III: Large Numbers of Concurrent Continuous Queries
Most of the existing spatio-temporal algorithms focus on evaluating only one outstanding continuous spatio-temporal query (e.g., see [31–33, 55–59]). Since a continuous query stays active at the server side for a long time, it is highly likely that new continuous queries will be submitted to the server while other queries are still active. Dealing with each outstanding query as a separate thread would quickly consume the system resources. Optimization techniques for evaluating a set of concurrent continuous spatio-temporal queries have recently been addressed for centralized [19] and distributed environments [60, 61]. Techniques for distributed environments assume that clients have computational and storage capabilities and can share the query processing with the server. The main idea of [60, 61] is to ship some part of the query processing down to the moving objects, while the server mainly acts as a mediator among moving objects. This assumption is not always realistic: in many cases, clients use cheap, low-battery, passive devices that have no computational or storage capabilities. While [60] is limited to stationary range queries, [61] can be applied to both moving and stationary queries. The Q-Index [19] considers the case of centralized environments, where there is no client overhead. The
main idea of the Q-index is to build an R-tree-like index structure on the queries instead of the objects. Then, at each time interval T, moving objects probe the Q-index to find the queries they belong to. The Q-index is limited in two aspects: (1) it reevaluates all the queries (through the R-tree index) every T time units; (2) it is applicable only to stationary queries, since moving queries would spoil the Q-index and hence dramatically degrade its performance.
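The Q-index idea of indexing the queries rather than the objects can be sketched as follows (a simplified illustration that replaces the R-tree with a plain table of rectangles; the data and names are hypothetical):

```python
# Stationary range queries are indexed once; every T time units each moving
# object probes this query "index" with its current position.
queries = {"Q1": (0, 0, 10, 10), "Q2": (5, 5, 20, 20)}

def probe(x, y):
    """A moving object reports the queries whose ranges contain its position."""
    return sorted(qid for qid, (x1, y1, x2, y2) in queries.items()
                  if x1 <= x <= x2 and y1 <= y <= y2)

# An object at (7, 7) falls inside both query ranges.
result = probe(7, 7)
```

The sketch also makes the limitation visible: a moving query would require updating the query index itself on every move, which is exactly what the Q-index is not built for.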
The PLACE approach. In the PLACE server, we exploit the shared execution paradigm as a means of achieving scalability for concurrently executing continuous spatio-temporal queries. The main idea is to group similar queries in a query table. Then, the evaluation of a set of continuous spatio-temporal queries is abstracted as a spatio-temporal join between the moving objects and the moving queries. Similar ideas of shared execution have been exploited in different contexts (e.g., NiagaraCQ [62] for web queries, and PSoup [63] and [38] for streaming queries).
2.5 Challenge IV: Wide Variety of Continuous Queries
A major challenge for spatio-temporal continuous query processors is the wide variety of spatio-temporal query types (e.g., see Section 2.1). Most of the existing approaches lack generality, focusing only on special cases of continuous spatio-temporal queries. For example, [31, 32, 55, 56] focus only on moving queries over stationary objects; these techniques are not applicable when objects are moving. Similarly, [19, 35, 60, 61] focus only on stationary range queries over moving objects and cannot support the concept of moving queries. Other work focuses on aggregate queries (e.g., see [35, 36, 52]), k-nearest-neighbor queries (e.g., see [32, 58]), and reverse-nearest-neighbor queries [33]. From a system point of view, it is cumbersome to implement various techniques with different data structures so that each query type is supported by its own specialized algorithm.
The PLACE approach. In the PLACE server, we develop disk-based and memory-based general frameworks that support a wide variety of continuous spatio-temporal queries, including range queries, k-nearest-neighbor queries, aggregate queries, and future queries. Moreover, any mutability combination of objects and queries is handled within the same general framework.
2.6 Summary
In this chapter, we presented two classifications of continuous spatio-temporal
queries. The first classification is based on the temporal domain. In this classifica-
tion, continuous spatio-temporal queries are either historical queries, now queries,
or future queries where the continuous queries are concerned with past, current, or
future data, respectively. The second classification is based on the mutability of
both objects and queries. In this classification, continuous spatio-temporal queries
are either stationary queries on moving objects, moving queries on stationary objects,
or moving queries on moving objects. Also in this chapter, we have discussed four
main challenges in realizing location-aware database servers. These challenges are:
(1) Dealing with massive data sizes, (2) Continuous evaluation of continuous queries,
(3) Supporting a large number of concurrent continuous spatio-temporal queries, and
(4) Supporting a wide variety of continuous spatio-temporal queries. For each
challenge, we discussed its related work and briefly outlined the PLACE approach
to addressing it.
3 DISK-BASED SPATIO-TEMPORAL CONTINUOUS QUERY PROCESSING
In this chapter, we introduce the Scalable INcremental hash-based Algorithm (SINA,
for short) for continuously evaluating a dynamic set of continuous spatio-temporal
queries. SINA exploits two main paradigms: Shared execution and incremental eval-
uation. By utilizing the shared execution paradigm, continuous spatio-temporal
queries are grouped together and joined with the set of moving objects. By uti-
lizing the incremental evaluation paradigm, SINA avoids continuous reevaluation of
spatio-temporal queries. Instead, SINA updates the query results every T time units
by computing and sending only updates of the previously reported answer. We dis-
tinguish between two types of query updates: Positive updates and negative updates.
Positive updates indicate that a certain object needs to be added to the result set
of a certain query. In contrast, negative updates indicate that a certain object is
no longer in the answer set of a certain query. As a result of having the concept of
positive and negative updates, SINA achieves two goals: (1) Fast query evaluation,
since SINA computes only the update (change) of the answer, not the whole answer.
(2) Reduced communication cost: in a typical spatio-temporal application (e.g.,
location-aware services and traffic monitoring), query results are sent to customers
via satellite servers [64]; limiting the transmitted data to the positive and negative
updates rather than the whole query answer saves network bandwidth.
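The positive/negative-update idea can be illustrated with simple set arithmetic. This is a minimal sketch, not SINA itself; the function name and the use of Python sets are our assumptions:

```python
def incremental_update(old_answer, new_answer):
    """Compute the delta between two answers of the same continuous query.

    old_answer/new_answer are sets of object identifiers. Only the
    positive updates (objects entering the result) and negative updates
    (objects leaving it) need to be shipped to the client.
    """
    positive = new_answer - old_answer   # objects to add at the client
    negative = old_answer - new_answer   # objects to remove at the client
    return positive, negative

# A client already holding {"p1", "p2"} receives only (+p3) and (-p1).
pos, neg = incremental_update({"p1", "p2"}, {"p2", "p3"})
```

The client applies the delta to its cached answer, so the full result never needs to be retransmitted.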
SINA is a general framework that deals with all mutability combinations of ob-
jects and queries. Thus, it is applicable to stationary queries on moving objects, mov-
ing queries on stationary objects, and moving queries on moving objects. For sim-
plicity, we present SINA in the context of continuous spatio-temporal range queries.
However, as will be discussed in Section 3.3, SINA is applicable to a broad class of
continuous spatio-temporal queries (e.g., nearest-neighbor and aggregate queries).
SINA is proved to be correct with respect to the following: (a) Completeness, i.e.,
all query results will be produced by SINA. (b) Uniqueness, i.e., SINA produces
duplicate-free results. (c) Progressiveness, i.e., SINA reports only the updates of
the previously reported answer. In contrast to previously proposed approaches (e.g.,
see [19, 21]) that rely mainly on R-tree-like data structures, SINA employs a simple
disk-based grid data structure that is shared among all moving objects and moving
queries. A grid structure suits the highly dynamic nature of location-aware
environments, where updates arrive in large numbers; its update cost is much
lower than that of R-tree-like data structures. Experimental results show that
SINA outperforms other R-tree-based algorithms (e.g., the Q-index [19] and the
Frequently Updated R-tree [21]).
The rest of the chapter is organized as follows: Section 3.1 introduces the concept
of shared execution for a group of spatio-temporal queries. Section 3.2 introduces
the Scalable INcremental hash-based Algorithm (SINA). Section 3.3 discusses the
extensibility of SINA to a variety of continuous spatio-temporal queries and to
handling clients that are disconnected from the server for short periods of time.
The correctness proof of SINA is given in Section 3.4. Section 3.5 provides an
extensive list of experiments to study the performance of SINA. Finally, Section 3.6
summarizes this chapter.
3.1 Shared Execution of Continuous Spatio-temporal Queries
SINA exploits the shared execution paradigm as a means of achieving scalability
for concurrently executing continuous spatio-temporal queries. The main idea is to
group similar queries in a query table. Then, the evaluation of a set of continuous
spatio-temporal queries is abstracted as a spatial join between the moving objects
and the moving queries.
Figure 3.1a gives the execution plans of two simple continuous spatio-temporal
queries, Q1: "Find the objects inside region R1", and Q2: "Find the objects inside
region R2". Each query performs a file scan on the moving object table followed by
[Figure content: (a) two local plans, each selecting IDs where the location is inside R1 or R2 over a file scan of the Moving Objects table; (b) one global shared plan in which a single SpatialJoin between the Moving Objects and Moving Queries tables feeds both Q1 and Q2.]
Figure 3.1. Shared execution of continuous queries.
a selection filter. With shared execution, we have the execution plan of Figure 3.1b.
The table for moving queries contains the regions of the range queries. Then, a
spatial join is performed between the table of objects (points) and the table of
queries (regions). The output of the spatial join is split and is sent to the queries.
For stationary objects (e.g., gas stations), the spatial join can be performed using
an R-tree index [30] on the object table. Similarly, if the queries are stationary, the
Q-index [19] can be used for query indexing. However, if both objects and queries
are highly dynamic, the R-tree and Q-index structures result in poor performance.
To avoid this drawback, we can follow one of two approaches: (1) Utilize frequently
updated R-tree techniques (e.g., see [20, 21]) to cope with the frequent
updates of moving objects and moving queries. (2) Use a spatial join algorithm that
does not assume the existence of any indexing structure. Our proposed Scalable IN-
cremental hash-based Algorithm (SINA) utilizes the second approach. Experimental
results, given in Section 3.5, compare SINA with the first approach and highlight
the drawbacks and advantages of each approach.
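The shared plan of Figure 3.1b can be sketched as a single join that answers all registered range queries in one pass. This is an illustration only, using a naive nested-loop join; SINA itself uses the hash-based join of Section 3.2, and the table layouts here are assumptions:

```python
def shared_range_join(objects, queries):
    """Join a table of point objects with a table of rectangular queries.

    objects: {oid: (x, y)}; queries: {qid: (x1, y1, x2, y2)}.
    Returns {qid: set of oids inside the query region}. One scan of the
    object table serves every query, instead of one scan per query.
    """
    answers = {qid: set() for qid in queries}
    for oid, (x, y) in objects.items():
        for qid, (x1, y1, x2, y2) in queries.items():
            if x1 <= x <= x2 and y1 <= y <= y2:
                answers[qid].add(oid)
    return answers
```

The split at the end of the shared plan corresponds to handing each `answers[qid]` set to the client that issued query `qid`.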
3.2 The SINA Framework
The main idea of the Scalable INcremental hash-based Algorithm (SINA) is to
maintain an in-memory table, termed Updated Answer, that stores the positive and
[Figure content: streams of moving objects and moving queries feed Phase I (Hashing), which emits positive updates; a timeout or full memory triggers Phase II (Invalidation), which emits negative updates; Phase III (Joining) performs the memory-disk join, emits both negative and positive updates, and sends the incremental result to the queries.]
Figure 3.2. State diagram of SINA.
negative updates during the course of execution to be sent to the clients. Positive
updates indicate that a certain object needs to be added to the query results. Sim-
ilarly, negative updates indicate that a certain object needs to be removed from
the previously reported answer. Entries in the Updated Answer table have the form
(QID, Update List(±,OID)) where QID is the query identifier, the Update List is a
list of OIDs (object identifiers) and the type of update (+ or −). To reduce the size
of the Updated Answer table, negative updates may cancel previous positive updates
and vice versa. SINA sends the set of updates to the appropriate queries every T
time units.
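The cancellation rule for the Updated Answer table can be sketched as follows. This is a simplified in-memory model; the class and method names are hypothetical:

```python
class UpdatedAnswer:
    """Pending updates per query, stored as {QID: {OID: '+' or '-'}}.

    Adding an update of the opposite sign for the same (query, object)
    pair cancels the pending one, which keeps the table small.
    """
    def __init__(self):
        self.table = {}

    def add(self, qid, sign, oid):
        pending = self.table.setdefault(qid, {})
        if oid in pending and pending[oid] != sign:
            del pending[oid]      # opposite update pending: they cancel out
        else:
            pending[oid] = sign   # record (or re-record with the same sign)

    def updates(self, qid):
        return dict(self.table.get(qid, {}))
```

For instance, a negative update produced in the invalidation phase can later be canceled by a matching positive update from the joining phase, so the client never sees the pair.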
SINA has three phases: The hashing, invalidation, and joining phases. Figure 3.2
provides a state diagram of SINA. The hashing phase runs continuously, receiving
incoming information from moving objects and moving queries. As
tuples arrive, an in-memory hash-based join algorithm is applied between moving
objects and moving queries. The result of the hashing phase is a set of positive
[Figure content: (a) a snapshot at time T0 showing nine moving objects p1 to p9 and five range queries Q1 to Q5; (b) a snapshot at time T1 after objects p1 to p4 and queries Q1, Q3, and Q5 have moved.]
Figure 3.3. Example of range spatio-temporal queries.
updates added to the Updated Answer table. The invalidation phase is triggered
every T time units or when the memory is full to flush in-memory data into disk. The
invalidation phase acts as a filter for the joining phase: it reports negative updates
for some objects early, saving their processing in the joining phase.
The joining phase is triggered at the end of the invalidation phase to join the
in-memory moving objects and queries with the in-disk stationary objects and
queries. The joining phase reports both positive and negative updates.
Once the joining phase is completed, the positive and negative updates are sent to
the users that issued the continuous queries.
Throughout this section, we use the example given in Figure 3.3 to illustrate the
ideas and execution of SINA. Figure 3.3a gives a snapshot of the database at time T0
with nine moving objects, p1 to p9, and five continuous range queries, Q1 to Q5. At
time T1 (Figure 3.3b), only the objects p1, p2, p3, and p4 and the queries Q1, Q3, and
Q5 change their locations. The old query locations are plotted with dotted borders.
Black objects are stationary, while white objects are moving.
We use the term "moving" objects/queries at time Ti to indicate the set of ob-
jects/queries that have reported a change of information since the last evaluation
time Ti−1. Moving objects and queries are stored in memory for the evaluation time
Ti. Similarly, we use the term "stationary" objects/queries to indicate the set of ob-
jects/queries that did not report any change of information since the last evaluation
time Ti−1. Stationary objects and queries are stored on disk at the evaluation time
Ti. Notice that stationary objects/queries at time Ti may become moving at time
Ti+1 and vice versa.
3.2.1 Phase I: Hashing
Data Structure. The hashing phase maintains two in-memory hash tables,
each with N buckets for sources P and R that correspond to moving objects (i.e.,
points) and moving queries (i.e., rectangles), respectively. In addition, for the moving
queries, we keep an in-memory query table that keeps track of the corresponding
buckets of the upper-left and lower-right corners of the query region. In the following,
we use the symbols Pk and Rk to denote the kth bucket of P and R, respectively.
Algorithm. Figures 3.4 and 3.5 provide an illustration and pseudo code of the
hashing phase, respectively. Once a new moving object tuple t with hash value
k = hP (t) is received (Step 2 in Figure 3.5), we probe the hash table Rk for moving
queries that can join with t (i.e., contain t) (Step 2b in Figure 3.5). For the queries
that satisfy the join condition (i.e., the containment of the point objects in the query
region), we add positive updates to the Updated Answer table (Step 2c in Figure 3.5).
Then, we store t in the hash bucket Pk (Step 2d in Figure 3.5). Similarly, if a
moving query tuple t is received, we probe all the hash buckets of P that intersect
with t. For the objects that satisfy the join condition, we add positive updates to
the Updated Answer table (Step 4b in Figure 3.5). Then, the tuple t is clipped and
is stored in all the R buckets that t overlaps. Finally, to keep track of the list of
buckets that t intersects with, we store t in the in-memory query table with two
bucket numbers; the upper-left and the lower-right (Step 5 in Figure 3.5).
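The spatial hash functions hP and hR above can be sketched for a uniform grid of n × n buckets over a square space of side `extent`. The layout and function names are assumptions for illustration:

```python
def point_bucket(x, y, n, extent):
    """h_P: map a point to one of the n*n buckets of a uniform grid."""
    cell = extent / n                       # side length of one bucket
    return int(y // cell) * n + int(x // cell)

def rect_buckets(x1, y1, x2, y2, n, extent):
    """h_R: all buckets a query rectangle overlaps. The rectangle is
    clipped and stored once per overlapping bucket."""
    cell = extent / n
    cols = range(int(x1 // cell), int(x2 // cell) + 1)
    rows = range(int(y1 // cell), int(y2 // cell) + 1)
    return {r * n + c for r in rows for c in cols}
```

A point joins a query exactly when the point's bucket is among the query's buckets, which is why probing the single bucket Rk (or the overlapped P buckets) suffices.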
Example. In the example of Figure 3.3, the hashing phase is concerned with
objects and queries that report a change of location in the time interval [T0, T1].
[Figure content: the streams of moving objects (PStream) and moving queries (RStream) are hashed via h(P) and h(R) into two N-bucket in-memory hash tables for P and R; queries are also recorded in the query table, and probing the opposite hash table produces incremental join results.]
Figure 3.4. Phase I: Hashing.
Thus, the objects p1, p2, p3, p4 are joined with the queries Q1, Q3, Q5. Only the
positive update (Q3, +p2) is reported.
Discussion. The hashing phase operates entirely in memory and thus incurs no
I/O overhead. Joining the in-memory data with the in-disk objects and
queries is deferred to the joining phase. Performing the join within the hashing
process enables sending early, fast results to the users. In many applications, it is
desirable that users receive early partial results, sometimes at the price of a slight
increase in the total execution
time. Similar ideas for in-memory hash-based join have been studied in the context
of non-blocking join algorithms, e.g., the symmetric hash join [65], XJoin [66], the
hash-merge join [67], and the RPJ [68].
3.2.2 Phase II: Invalidation
Data Structure. Figure 3.6 sketches the data structures used in the invalida-
tion phase. The invalidation phase relies on partitioning the two-dimensional space
Procedure HashingPhase(tuple t, source (P/R))
Begin
1. If there is not enough memory to accommodate t, start the InvalidationPhase(),
return
2. If (source==P) //Moving object
(a) k = the hash value hP (t) of tuple t.
(b) Sq = Set of queries from joining t with queries in Rk
(c) For each Q ∈ Sq, add (Q,+t) to Updated Answer
(d) Store t in Bucket Pk
(e) return
3. Sk = Set of buckets resulting from hash function hR(t)
4. For each bucket k ∈ Sk
(a) So = Set of objects from joining t with objects in Pk
(b) For each O ∈ So, add (t,+O) to Updated Answer
(c) Store a clipped part of t in Bucket Rk
5. Store t in the query table
End.
Figure 3.5. Pseudo code of the Hashing phase.
into N × N grid cells1. Objects and queries are stored in grid cells based on their
locations. To handle skewed data distributions of objects and queries, we employ
techniques similar to those in [70], where we map grid cells into smaller tiles in a round
1For simplicity, we present SINA in the context of a disk-based grid. However, the uniform grid can be substituted by more sophisticated structures, e.g., the FUR-tree [21] or quad-tree-like structures [69].
[Figure content: the grid index stores object entries (OID, Location, Timestamp, QList) and query entries (QID, Region, Timestamp, OList); the object index holds (OID, Pointer) and the query index holds (QID, Pointer I, Pointer II).]
Figure 3.6. Phase II: Invalidation.
robin fashion. Tiles are directly mapped to disk-based pages. An object entry O
has the form (OID, loc, t, QList), where OID is the object identifier, loc is the re-
cent location of the object, t is the timestamp of the recently reported location loc,
and QList is the list of the queries that O is satisfying. A query Q is clipped to
all grid cells that Q overlaps with. For any grid cell C, a query entry Q has the
form (QID, region, t, OList), where QID is the query identifier, region is the recent
rectangular region of Q that intersects with C, t is the timestamp of the recently
reported region, and OList is the list of the objects in C that satisfy Q.region. In
addition to the grid structure, we keep track of two auxiliary data structures: the
object index and the query index, indexed on OID and QID, respectively. They
provide the ability to look up the old locations of moving objects and queries given
their identifiers.
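The grid entries and the two auxiliary indexes can be modeled as plain records. Field names follow the text, but the Python representation is our assumption, not part of SINA:

```python
from dataclasses import dataclass, field

@dataclass
class ObjectEntry:
    """On-disk object record (OID, loc, t, QList)."""
    oid: str
    loc: tuple                                 # most recently reported (x, y)
    t: int                                     # timestamp of that report
    qlist: set = field(default_factory=set)    # queries this object satisfies

@dataclass
class QueryEntry:
    """Per-cell query record (QID, region, t, OList); region is the part
    of the query rectangle clipped to this cell."""
    qid: str
    region: tuple                              # (x1, y1, x2, y2) within the cell
    t: int
    olist: set = field(default_factory=set)    # objects in this cell satisfying it

# Auxiliary indexes: identifier -> location of the old entry, so that a
# boundary-crossing update can find and delete its previous record.
object_index = {}   # OID -> grid cell holding the old object entry
query_index = {}    # QID -> cells of the upper-left and lower-right corners
```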
Algorithm. The pseudo code of the invalidation phase is given in Fig-
ures 3.7, 3.8, and 3.9. The invalidation phase starts by flushing the non-empty
buckets that contain moved objects (Step 1 in Figure 3.7) and moved queries (Step 2
in Figure 3.7) into the corresponding grid cells on disk. Figure 3.8 gives the pseudo
code of invalidating a moving object Mo that is mapped into grid cell Gk. If there is
an old entry of Mo in Gk, this means that Mo did not cross a cell boundary. Thus,
Procedure InvalidationPhase()
Begin
• For (k=0; k < MAX_GRID_CELL; k++)
1. For each moving object Mo ∈ Pk, call Invalidate Object(Mo,Gk)
2. For each moving query Mq ∈ Rk
(a) if Mq ∈ Gk, update the information of Mq in Gk
(b) else, insert a new entry in Gk for Mq, with an OList initialized from the
Updated Answer
• call Invalidate Queries()
End.
Figure 3.7. Pseudo code of the Invalidation phase.
we update only the information of Mo in Gk (Step 1 in Figure 3.8). If Mo is a new
entry in Gk, we insert a new entry for Mo in Gk with the current timestamp and
a QList that contains the moving queries from the Updated Answer table that are
satisfied by Mo (Step 2 in Figure 3.8). Then, we utilize the auxiliary structure object
index using Mo.OID to get the old entry Oold of Mo (Step 3 in Figure 3.8). For all
queries in Oold.QList, we report negative updates to the Updated Answer table and
update the corresponding OLists (Step 6 in Figure 3.8). Finally, we delete the old
entry of Mo (Step 7 in Figure 3.8).
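The boundary-crossing branch of this procedure (Steps 3 to 7 of Figure 3.8) can be sketched with dictionaries standing in for the disk-based grid. This is a simplification under assumed data structures; updating the affected queries' OLists (Step 6b) is omitted for brevity:

```python
def invalidate_boundary_crossing(oid, new_cell, object_index, grid, updated_answer):
    """If object oid has moved into a different cell, report a negative
    update for every query in its old QList and delete the old entry.
    Returns the negative updates produced.

    grid: {cell: {oid: {"qlist": set}}}; updated_answer: list of tuples.
    """
    old_cell = object_index.get(oid)
    if old_cell is None or old_cell == new_cell:
        return []                           # new object or in-cell move: defer
    old_entry = grid[old_cell].pop(oid)     # Step 7: delete the old entry
    negatives = [(qid, "-", oid) for qid in old_entry["qlist"]]   # Step 6a
    updated_answer.extend(negatives)
    object_index[oid] = new_cell
    return negatives
```

The early return captures the filtering role of the phase: in-cell movement produces no work here and is handled later by the joining phase.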
The invalidation process of moving queries starts by flushing query parts in the
corresponding disk-based cells (Step 2 in Figure 3.7). Similar to moving objects, we
either update an old entry or insert a new one. Then, we compare the in-memory
query table with the in-disk query index. For each moving query, we keep track of
a set Sk that contains the cells that were part of the old region of the query but are
not in the new query region (Step 1 in Figure 3.9). Then, we send negative updates
Procedure Invalidate Object(Object Mo, GridCell Gk)
Begin
1. If Mo ∈ Gk
(a) Update the location and timestamp of Mo in Gk
(b) Sq = Queries in Updated Answer that contain Mo
(c) For each query Q ∈ Sq ∩ Mo.QList, add (Q,−Mo) to Updated Answer
(d) Mo.QList = Mo.QList ∪ Sq
(e) return
2. Insert Mo as a new entry in Gk with the current timestamp and a QList initialized
from the Updated Answer
3. Gold = Old cell of Mo from the object index table
4. If Gold = NULL, return
5. Retrieve Oold; the old entry of Mo from Gold
6. For each query Q ∈ Oold.QList
(a) Add (Q,-Mo) to Updated Answer table
(b) Remove the entry Mo from Q.OList
7. Delete the entry Oold from Gold
End.
Figure 3.8. Pseudo code for invalidating moving objects.
for each object that was part of the query answer in each grid cell of Sk (Step 2 in
Figure 3.9). Finally, we delete the old entry of the moving query.
Procedure Invalidate Queries()
Begin
• For each query Mq in the in-memory query table
1. Sk = Set of grid cells that were covered by the old value of Mq and not covered
by the new value of Mq
2. For each grid cell k ∈ Sk
(a) Retrieve Qold, the old entry of Mq in cell k
(b) For each O ∈ Qold.OList, add (Mq,−O) to Updated Answer and remove
Mq from O.QList
3. Delete the entry Qold from k
End.
Figure 3.9. Pseudo code for invalidating moving queries.
Example. For the example given in Figure 3.3, the invalidation phase is con-
cerned only with the moving objects and queries that changed their locations in the
time interval [T0, T1]. Moving objects p1 and p2 do not report any updates, since p1
does not cross its cell boundaries and p2 was not involved in any query answer at
time T0. Although p3 is still inside Q4, the negative update (Q4,−p3) is reported
because p3 crosses its cell boundary. To guarantee that only incremental results are
maintained, this negative tuple will be deleted in the joining phase. For object p4,
we report the negative update (Q4,−p4). For moving queries Q1 and Q5, we do not
report any result, since they do not leave any of their old cells. Query Q3 reports a
negative update (Q3,−p6) because Q3 completely leaves its old cell that contains p6.
Notice that we do not report any negative update for p7, since Q3 has not left the
cell that contains p7.
Discussion. The invalidation phase uses the object index and the query index
to retrieve the old information for moving objects and moving queries that cross
their cell boundaries, respectively. Another approach is to let the client send the old
location information along with the new location information. In this case, there
will be no need for maintaining the two auxiliary data structures. Although this
approach would simplify SINA and would save I/O overhead, it lacks practicality.
The main reason is that this approach assumes that the client has the ability to store
its old location information, which is not guaranteed for all clients. The objective of
SINA is to require minimal computation and storage capabilities from clients.
Auxiliary data structures for tracking old locations are also used in the LUR-tree
(as a linked list) [20] and in the frequently updated R-tree (as a hash table) [21].
However, the invalidation phase in SINA accesses the auxiliary data structures only
for the objects that move out of their cells, rather than for all moved objects as is
the case in [20, 21].
The invalidation phase reports negative updates that correspond to moving ob-
jects that cross their cell boundaries and moving queries that leave some of their
old cells. For moving objects and queries that move within their cell boundaries,
we defer their invalidation process to the joining phase. Another approach for the
invalidation phase is to report negative updates from all moving objects and queries
regardless of their old locations. This approach would incur redundant I/O overhead.
In the joining phase, the cells that contain in-cell moving objects or queries have to
be fetched into memory to perform a join between objects and queries. Computing
negative updates for the in-cell movement in the invalidation phase results in redun-
dant operations between the two phases. Thus, the invalidation phase acts as a filter
to avoid unnecessary joins in the joining phase.
3.2.3 Phase III: Joining
Data Structure. The joining phase requires no additional data structures; it uses
only the grid data structure already employed in the invalidation phase.
Algorithm. Figure 3.10 gives the pseudo code of the joining phase. For each
grid cell, the joining phase performs two spatial join operations: (1) Joining in-
memory objects with in-disk queries (Steps 1 and 2 in Figure 3.10), (2) Joining
in-memory moving queries with in-disk objects (Steps 3 and 4 in Figure 3.10). For
each moving object/query, we get the set of queries/objects from applying a spatial
join algorithm, respectively (Steps 2a and 4a in Figure 3.10). Then, based on the
answer set, we report positive and negative updates while updating the corresponding
data structures. After performing the spatial join for all grid cells, we send the
Updated Answer to the clients, and clear all memory data structures.
Example. For the example given in Figure 3.3, during this phase moving object
p1 reports the negative update (Q2,−p1). Object p2 does not report any updates,
since there are no in-disk stationary queries to join with in p2's new cell (Q3 is a
moving query). The moving object p3 is joined with the stationary query Q4, which
produces (Q4,+p3) as a positive update. Notice that this positive update cancels the
corresponding negative update previously reported in the invalidation phase. Thus,
the size of the Updated Answer table is minimized and only the incremental results
are maintained. Object p4 does not produce any results, since there are no in-disk
queries to join with. For moving queries, Q1 reports the negative update (Q1,−p5),
since p5 and Q1 no longer join in the upper-left corner cell. Query Q3 reports the
positive update (Q3,+p8) as a result of the spatial join in one of the new cells covered
by Q3. Also, Q3 reports (Q3,−p7). Query Q5 does not report any updates, since
object p9 is still inside the new region of Q5.
Discussion. The joining phase only joins the cells that have new moving objects
and/or queries. Cells that contain only stationary (i.e., not recently moving) objects
Procedure JoiningPhase()
Begin
• For (k=0; k < MAX_GRID_CELL; k++)
1. Join moving objects in the in-memory bucket Pk with stationary queries in the
in-disk grid cell Gk
2. For each moving object Mo ∈ Pk
(a) Sq = Set of queries that results from the join
(b) For each query Q ∈ (Sq − Mo.QList), add (Q,+Mo) to Updated Answer,
update Q.OList
(c) For each stationary Q ∈ (Mo.QList − Sq), add (Q,−Mo) to Up-
dated Answer, update Q.OList
(d) Mo.QList = Sq ∪ Mo.QList
3. Join moving queries in the in-memory bucket Rk with stationary objects in the
in-disk grid cell Gk
4. For each moving query Mq ∈ Rk
(a) So = Set of objects that results from the join
(b) For each object O ∈ (So − Mq.OList), add (Mq,+O) to Updated Answer,
update O.QList
(c) For each stationary O ∈ (Mq.OList − So), add (Mq,−O) to Up-
dated Answer, update O.QList
(d) Mq.OList = So ∪ Mq.OList
• Send the Updated Answer table to the users and empty all in-memory data structures
End.
Figure 3.10. Pseudo code for the joining phase.
and queries are not processed in the joining phase. In addition, cells that contain
stationary or old information of moving objects and/or queries are filtered out and
are processed at the invalidation phase (e.g., the cells that contain p2, p3, and p4 at
time T0 in Figure 3.3a). Each iteration of the joining phase deals with only one grid
cell. Thus, the I/O cost of each iteration is bounded by the number of disk pages of
a grid cell. For the CPU time, we utilize a plane-sweep-based spatial join algorithm
similar to the ones used in hash-based spatial join algorithms (e.g., [70]).
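A per-cell point-in-rectangle join in the plane-sweep style can be sketched as follows. This is an illustration only, not the exact algorithm of [70]; the input layout is an assumption:

```python
def plane_sweep_join(points, rects):
    """Join points (oid, x, y) with rectangles (qid, x1, y1, x2, y2)
    inside one grid cell. Both inputs are sorted on x; sweeping left to
    right keeps active only the rectangles whose x-range covers the
    current point, so each point is tested against few candidates.
    """
    pts = sorted(points, key=lambda p: p[1])
    rcs = sorted(rects, key=lambda r: r[1])
    active, out, j = [], [], 0
    for oid, x, y in pts:
        while j < len(rcs) and rcs[j][1] <= x:      # activate reached rectangles
            active.append(rcs[j])
            j += 1
        active = [r for r in active if r[3] >= x]   # drop rectangles passed on x
        out.extend((qid, oid)
                   for qid, x1, y1, x2, y2 in active if y1 <= y <= y2)
    return out
```

Because each iteration touches only one cell's points and rectangles, the CPU work per iteration stays proportional to the cell's contents, matching the bounded per-cell I/O cost noted above.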
3.3 Extensibility of SINA
In this section, we explore the extensibility of SINA to support a broad class of
continuous spatio-temporal queries (e.g., future, k-nearest-neighbor, and aggregate
spatio-temporal queries) and to support clients that may be disconnected from the
server for short periods of time (i.e., out-of-sync clients).
3.3.1 Querying the Future
Future queries [52], also termed predictive queries [24, 71], are interested in
predicting the locations of moving objects. An example of a future query is "Alert me
if a non-friendly airplane is going to cross a certain region in the next 30 minutes".
To support future queries, d-dimensional moving objects report their current location
vector x0 = (x1, x2, ..., xd) at time t0 and a velocity vector v = (v1, v2, ..., vd). The
predicted location xt of a moving object at any time instant t > t0 is computed as
xt = x0 + v(t − t0).
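The prediction formula can be evaluated componentwise. This is a direct transcription of the equation above; the function name is ours:

```python
def predict_location(x0, v, t0, t):
    """x_t = x0 + v * (t - t0), applied to each of the d coordinates."""
    assert t > t0, "prediction is defined for future instants t > t0"
    return tuple(xi + vi * (t - t0) for xi, vi in zip(x0, v))

# An object at (0.0, 0.0) at t0 = 0 moving with velocity (1.0, 2.0) is
# predicted at (5.0, 10.0) at time t = 5.
```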
The extension of SINA to support future queries is straightforward. Moving
objects are represented as lines instead of points. Thus, in the hashing phase, moving
objects are clipped into several hash buckets (same as rectangular queries). In the
invalidation and joining phases, moving objects will be treated as moving queries
in the sense that they may span more than one grid cell. The shared execution
paradigm applies directly to future queries. Also, moving queries do not need any
[Figure content: (a) a snapshot at time T0 showing five moving objects p1 to p5, represented by motion vectors, and a range query Q; (b) a snapshot at time T1 after objects p2, p3, and p4 have changed their locations.]
Figure 3.11. Querying the future.
special handling other than the ones used in the original description of SINA in
Section 3.2.
Figure 3.11a gives an example of querying the future. Five moving objects, p1 to
p5, report their current locations at time T0 together with velocity vectors that
are used to predict their future locations at times T1 and T2. The range query Q
is interested in objects that will intersect with its region at time T2 > T0. At time
T0, the rectangular query region is joined with the line representations of the moving
objects. The returned answer set of Q is (p1, p3). At T1 (Figure 3.11b), only the
objects p2, p3, and p4 change their locations. Based on the new information, SINA
reports only the positive update (Q, +p2) and the negative update (Q, −p3), indicating
that p2 is now part of the answer set of Q while p3 is no longer in the answer set.
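The clipping of a predicted trajectory into hash buckets can be approximated by enumerating the grid cells overlapped by the segment's bounding box. The function below is an illustrative sketch under the assumption of an n × n uniform grid over the unit square; it is not the thesis code, and it conservatively over-approximates the cells the line actually crosses:

```python
def overlapped_cells(p0, p1, n):
    """Cells of an n x n uniform grid over the unit square overlapped by
    the bounding box of segment p0 -> p1 (a conservative superset of the
    cells the trajectory actually crosses)."""
    clamp = lambda i: max(0, min(n - 1, i))
    x_lo, x_hi = sorted(clamp(int(p[0] * n)) for p in (p0, p1))
    y_lo, y_hi = sorted(clamp(int(p[1] * n)) for p in (p0, p1))
    return [(cx, cy) for cx in range(x_lo, x_hi + 1)
                     for cy in range(y_lo, y_hi + 1)]
```

A stationary point maps to a single cell, exactly as in the point case, so the line representation degrades gracefully.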
3.3.2 k-nearest-neighbor Queries
SINA can be utilized to continuously report the changes of a set of concurrent
kNN queries. Figure 3.12a gives an example of two kNN queries where k = 3
issued at points Q1 and Q2.

Figure 3.12. k-NN spatio-temporal queries: (a) snapshot at time T0; (b) snapshot at time T1.

Assuming that both queries are issued at time T0,
we compute the first-time answer using any of the traditional kNN algorithms
(e.g., [72]). For Q1, the answer would be Q1 = {p1, p2, p3}, while for Q2, the
answer would be Q2 = {p5, p6, p7}. In this case, we represent Q1 and Q2 as circular
range queries with radius equal to the distance to the kth neighbor. Later, at time
T1 (Figure 3.12b), objects p4 and p7 move. Thus, SINA can be utilized to allow
for a shared execution among the two queries and to compute the updates from the
previously reported answer. Notice that the only change to the original SINA is
that we utilize circular range queries rather than rectangular range queries. For Q1,
object p4 intersects with the query region. This results in invalidating the furthest
neighbor of Q1, which is p1. Thus, two update tuples are reported: (Q1, −p1) and
(Q1, +p4). For Q2, the object p7 was part of the answer at time T0. However, after p7
moves, the joining phase checks whether p7 is still inside the query region. If p7
is outside the circular query region, we compute another nearest neighbor, which is
p8. Thus, two update tuples are reported: (Q2, −p7) and (Q2, +p8). Notice that the
query regions of Q1 and Q2 change from T0 to T1 to reflect the new k-nearest
neighbors.
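The reduction of a kNN query to a circular range query can be sketched as follows; the function name and the brute-force ranking are illustrative assumptions, not the thesis implementation (which would use an algorithm such as [72] for the first-time answer):

```python
import math

def knn_as_circle(q, points, k):
    """Return the k nearest points to q and the radius of the equivalent
    circular range query (the distance to the kth neighbor)."""
    ranked = sorted(points, key=lambda p: math.dist(q, p))
    answer = ranked[:k]
    return answer, math.dist(q, answer[-1])
```

When an object enters the circle, it evicts the current furthest neighbor and the radius shrinks; when an answer object leaves, the radius grows to the new kth neighbor. This is exactly what the update pairs (Q1, −p1), (Q1, +p4) and (Q2, −p7), (Q2, +p8) above capture.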
3.3.3 Aggregate Queries
Continuous spatio-temporal aggregate queries have recently been addressed in [35],
where dense areas are discovered online (i.e., areas with a number of moving objects
above a certain threshold). The areas to be discovered are limited to pre-defined grid
cells. Thus, if a dense area is not aligned to a grid cell, it will not be discovered. The
work in [35] can be modeled as a special instance of SINA in the following way:
for an N × N grid, we consider having N² disjoint spatio-temporal aggregate range
queries, where each query represents a grid cell. Moreover, SINA can extend [35]
with the ability to discover pre-defined dense areas of arbitrary regions. Thus,
important areas (e.g., areas around an airport or in downtown) can be discovered even
if they are not aligned to grid cells. All pre-defined dense areas are treated as range
queries. Then, the shared execution paradigm with the incremental evaluation of
SINA continuously reports the density of such areas. Positive and negative updates
report only the increase and decrease of the density from the previously reported
result.
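A sketch of this incremental density reporting, with a brute-force per-area count standing in for SINA's shared-execution machinery (the dictionary representation and area names are assumptions for illustration):

```python
def density_updates(prev_counts, areas, objects):
    """Count objects inside each predefined dense area (axis-aligned box)
    and report only the change in density since the previous report."""
    counts = {}
    for name, (x1, y1, x2, y2) in areas.items():
        counts[name] = sum(1 for (x, y) in objects
                           if x1 <= x <= x2 and y1 <= y <= y2)
    # positive/negative density updates: non-zero deltas only
    updates = {name: counts[name] - prev_counts.get(name, 0)
               for name in counts if counts[name] != prev_counts.get(name, 0)}
    return counts, updates
```

Unchanged areas produce no update at all, mirroring the incremental-evaluation paradigm of SINA.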
3.3.4 Out-of-Sync Clients
Mobile objects tend to be disconnected from and reconnected to the server several
times for reasons beyond their control, e.g., running out of battery, losing the com-
munication signal, or being in a congested network. This out-of-sync behavior
may lead to erroneous query results in any incremental approach. Figure 3.13 gives
an example of an erroneous query result. The answer of query Q that is stored at both
the client and the server at time T1 is (p1, p2). At time T2, the client is disconnected
from the server. However, the server does not recognize that Q is disconnected.
Thus, the server keeps computing the answer of Q and sends the negative update
(Q, −p2). Since the client is disconnected, it cannot receive this negative
update. Notice the inconsistency between the stored result at the server side (p2) and the
client side (p1, p2).

Figure 3.13. Example of out-of-sync queries.

Similarly, at time T3, the client is still disconnected. The client is
connected again at time T4. The server computes the incremental result from T3 and
sends only the positive update (Q, +p4). At this time, the client is able to update its
result to be (p1, p2, p4). However, this answer is wrong; the correct answer, (p1, p3, p4),
is kept at the server. SINA can easily be extended to resolve the out-of-sync
problem by adding the following catch-up phase.
Catch-up Phase. A naive solution for the catch-up phase is the following: once
the client wakes up, it empties its previous result and sends a wakeup message to the
server. The server replies with the query answer stored at the server side. For example,
in Figure 3.13, at time T4, SINA will send the whole answer (p1, p3, p4). This approach
is simple to implement and process on the server side. However, it may result in
significant delay due to the network cost of sending the whole answer. Consider
a moving query with hundreds of objects in its result that gets disconnected for a
short period of time. Although the query has missed only a couple of points during
its disconnection, the server would send the complete answer to the query.
To save network bandwidth, SINA maintains a repository of committed query
answers. An answer is considered committed if it is guaranteed that the client has
received it. Once the client wakes up from the disconnected mode, it sends a wakeup
message to the server. SINA compares the latest answer for the query with the
committed answer, and sends the difference of the answer in the form of positive and
negative updates. For example, in Figure 3.13, SINA stores the committed answer
of Q at time T1 as (p1, p2). Then, at time T4, SINA compares the current answer
with the committed one, and sends the updates (Q, −p2, +p3, +p4). Once SINA
receives any information from a moving query, SINA considers its latest answer
committed. However, stationary queries are required to send an explicit commit
message to SINA to commit their latest results. Commit messages can be sent at
times convenient to the clients.
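The catch-up computation itself is a set difference between the committed and current answers; a minimal sketch, assuming answers are represented as sets of object identifiers (a representation chosen here for illustration):

```python
def catch_up_updates(committed, current):
    """Updates a reconnecting client needs in order to move from its
    committed answer to the current one: negatives, then positives."""
    negatives = [("-", p) for p in sorted(committed - current)]
    positives = [("+", p) for p in sorted(current - committed)]
    return negatives + positives

# Figure 3.13 example: committed {p1, p2} at T1, current {p1, p3, p4} at T4
# yields the updates -p2, +p3, +p4 rather than the whole answer.
```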
3.4 Correctness of SINA
In this section, we provide a proof of correctness of the Scalable INcremental hash-
based Algorithm (SINA). The correctness proof is divided into three parts: First, we
prove that SINA is complete, i.e., all result tuples are produced. Second, we prove
that SINA is a duplicate-free algorithm, i.e., output tuples are produced exactly
once. Third, we prove that SINA is progressive, i.e., only new results will be sent to
the user.
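These three properties can be illustrated with a toy incremental evaluator that diffs consecutive brute-force join results (a stand-in for SINA's three phases, not the thesis code): the positive and negative sets are duplicate-free and disjoint by construction (uniqueness), applying them to the previous answer reproduces the full join (completeness), and unchanged tuples are never re-reported (progressiveness).

```python
def range_join(objects, queries):
    """All (object, query) pairs where the object lies inside the query box."""
    return {(o, q) for o, (x, y) in objects.items()
                   for q, (x1, y1, x2, y2) in queries.items()
                   if x1 <= x <= x2 and y1 <= y <= y2}

def evaluate(prev_answer, objects, queries):
    """One evaluation round: the new answer plus only its changes."""
    now = range_join(objects, queries)
    return now, now - prev_answer, prev_answer - now  # answer, +ves, -ves
```

Checking that `(prev_answer | positives) - negatives` equals the new answer in every round is exactly the invariant the theorems below establish for SINA's phase-based evaluation.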
Theorem 3.4.1 For any two sets of moving objects P and moving queries R, SINA
produces all output results (p, r) of P ⋈ R, where the join condition p inside r is
satisfied at any time instance t.
Proof Assume that ∃(p, r) : p ∈ P, r ∈ R, and at some time instance t, p was
located inside r. However, the tuple (p, r) is not reported by SINA. Since (p, r)
satisfies the join condition, then there exists a hash bucket h such that h = hP (p)
and h ∈ hR(r). Assume that the latest information sent from p and r were in time
intervals [Ti, Ti+1] and [Tj , Tj+1], respectively. Then, there are exactly two possible
cases:
Case 1: i = j. In this case, both p and r report their recent information in the
same time interval [Ti, Ti+1]. Thus, we guarantee that p and r were resident in
memory at the same time. If p arrives before r, then p will be stored in bucket Ph
without joining with r. Later, when r arrives, it will probe the bucket Ph and join
with p. The same proof applies when r arrives before p. Thus, the tuple (p, r)
cannot be missed in the case i = j.
Case 2: i ≠ j. Assume i < j. This indicates that p arrives before r. Then,
in the invalidation phase, p is flushed to disk before r arrives. Once r arrives,
it is stored in an in-memory hash bucket h that corresponds to the disk-based cell
of object p. Since the joining phase joins all in-memory hash buckets with their
corresponding in-disk grid cells, we guarantee that r will be joined with p. The same
proof applies when i > j, where the in-memory object p will be joined with the
in-disk query r. Thus, the tuple (p, r) cannot be missed in the case i ≠ j.
From Cases 1 and 2, we conclude that the assumption that (p, r) is not reported
by SINA is invalid. Thus, SINA produces all output results.
Theorem 3.4.2 At any evaluation time Ti, SINA produces the output result that
corresponds to all information changes in [Ti−1, Ti] exactly once.
Proof Assume that ∃(p, r) : p ∈ P, r ∈ R, and (p, r) satisfies the join condition.
Assume that SINA reports the tuple (p, r) twice. We denote these two instances by
(p, r)1 and (p, r)2. Since we are interested only in tuples that satisfy the join condi-
tion (i.e., positive updates), we can skip the invalidation phase, as it produces
only negative updates. Thus, we identify the following three cases:
Case 1: (p, r)1 and (p, r)2 are both produced in the hashing phase.
Assume that p arrives after r. Once p arrives, it probes the hash bucket of r and
outputs the result (p, r)1. Then, during the hashing phase, only newly incoming
tuples are used to probe the hash buckets of p and r. Thus, (p, r) cannot be produced
again in the hashing phase.
Case 2: (p, r)1 and (p, r)2 are both produced in the joining phase. The
joining phase produces positive updates at two outlets: in-memory moving objects
with in-disk (not recently moving) queries and moving queries with in-disk objects.
If (p, r)1 is produced in the former outlet, then p is a moving object that reports its
recent information in [Ti−1, Ti]. Thus, (p, r)2 cannot be produced in the second outlet,
which is concerned only with in-disk objects. The same proof applies when
(p, r)1 is produced in the second outlet. Thus, the tuple (p, r) cannot be produced
again in the joining phase.
Case 3: One of the tuples, say (p, r)1, is produced in the hashing phase,
while the other one is produced in the joining phase. Since (p, r)1 is reported
in the hashing phase, we guarantee that both p and r were in memory (moving)
in the same time interval [Ti−1, Ti]. Thus, p is a moving object and r is a moving
query. In the joining phase, (p, r) cannot be produced again in the first outlet, because
r is not stationary. Similarly, (p, r) cannot be produced again in the second outlet,
because p is not stationary. Thus, the tuple (p, r) cannot be produced again in the
joining phase.
From the above three cases, we conclude that the assumption that the tuple (p, r)
is reported twice at the evaluation time Ti is not valid.
Theorem 3.4.3 For any two sets of moving objects P and moving queries R, at
any evaluation time Ti, SINA produces ONLY the changes of the previously reported
result at time Ti−1.
Proof Assume that ∃p1, p2, p3 ∈ P, r ∈ R, and only (p1, r), (p2, r) satisfy the join
condition at time Ti−1. Then, at time Ti, p1 is still inside r, p2 has moved out of r,
while p3 has moved inside r. In the following, we prove that only the tuples (r, −p2)
and (r, +p3) are produced at time Ti. We identify the following three cases:
Case 1: r is a moving query, p1, p2, and p3 are moving objects. This case
is processed only in the hashing and invalidation phases. Based on Theorem 3.4.2,
the hashing phase produces only the updates (r, +p1) and (r, +p3). In the invalidation
phase, (r, +p1) is deleted either in Step 1c in Figure 3.8 (if p1 moves within its cell
boundary) or in Step 6a in Figure 3.8 (if p1 moves out of its cell) by adding the
counterpart tuple (r, −p1). The tuple (r, −p2) is produced only in the invalidation
phase, either in Step 1c or Step 6a of Figure 3.8.
Case 2: r is a moving query, p1, p2, and p3 are stationary objects. This
case is processed only in the invalidation and joining phases. Since p1 is still in
the answer set of r, p1 is inside some grid cell c that intersects with both the
old and new regions of r. Thus, p1 will be processed only in the joining phase,
particularly at Step 4 in Figure 3.10. However, since p1 is an old answer, no action
will be taken. For object p2, assume that Ci−1 and Ci are the sets of grid cells
covered by r at Ti−1 and Ti, respectively, and let c2 be the grid cell of p2. Since p2
is not in the answer set of r at time Ti, c2 ∉ Ci − Ci−1. If c2 ∈ Ci−1 − Ci, then the
tuple (r,−p2) will be produced in the invalidation phase (Step 2b in Figure 3.9).
However, if c2 ∈ Ci−1 ∩ Ci, then the tuple (r,−p2) will be produced in the joining
phase (Step 4c in Figure 3.10). For object p3, since p3 is not an old answer, it
will not be processed in the invalidation phase. Thus, the tuple (r, +p3) will be
reported in the joining phase (Step 4b in Figure 3.10).
Case 3: r is a stationary query, p1, p2, and p3 are moving objects. The
proof is very similar to that of Case 2, with the roles of queries and objects reversed.
We do not include the case of stationary queries on stationary objects, as it is
not processed. Also, we assume that the pi's are either all moving or all stationary.
However, the proof is still valid for any combination of moving and stationary pi's.
Thus, from the above three cases, we conclude that: at time Ti, SINA only produces
the change of the result from the previously reported answer at time Ti−1.
3.5 Performance Evaluation
In this section, we compare the performance of SINA with the following alternatives:
(1) An R-tree-based index on the object table. To cope with the moving objects, we
implement the frequently updated R-tree [21] (FUR-tree, for short). The FUR-tree
modifies the original R-tree to efficiently handle moving objects. (2) A Q-
index [19] on the query table. Since the Q-index is designed for static queries, we modify
the original Q-index to employ the techniques of the FUR-tree to handle moving
queries. Thus, the Q-index can handle moving queries as efficiently as the FUR-tree
handles moving objects. (3) Both the FUR-tree on moving objects and the
modified Q-index on the query table. Then, we employ an R-tree-based spatial join
algorithm [73] (RSJ, for short) to join objects and queries.
Figure 3.14. Road network map of Oldenburg City.
We use the Network-based Generator of Moving Objects [74] to generate a set of
moving objects and moving queries. The input to the generator is the road map of
Oldenburg (a city in Germany) given in Figure 3.14. The output of the generator
is a set of moving points that move on the road network of the given city. Moving
objects can be cars, cyclists, pedestrians, etc. We choose some points randomly
and consider them as the centers of square queries. Unless mentioned otherwise,
we generate 100K moving objects and 100K moving queries. Each moving object
or query reports its new information (if changed) every 100 seconds. The space
is represented as the unit square, and queries are assumed to be square regions of
side length 0.01. SINA is configured to refresh query results every T = 10 seconds.
The percentage of objects and queries that report a change of information within T
seconds is 10% of the moving objects and queries, respectively.
All the experiments in this section are conducted on an Intel Pentium IV 2.4GHz
CPU with 256MB RAM running Linux 2.4.4. SINA is implemented using GNU
C++. The page size is 2KB. We implement FUR-tree, Q-index, and RSJ using
the original implementation of R*-tree [75]. Our performance measures are the I/O
overhead and CPU time incurred. For the I/O, we consider that the first two levels
of any R-tree-based structures are in memory. The CPU time is computed as the
time used to perform the spatial join in the memory (i.e., once the page is retrieved
from disk). For SINA, the CPU time also includes the time that the hashing phase
consumes for the in-memory join.

Figure 3.15. The answer size: (a) answer size (×1000 bytes) vs. percentage of moving objects (%); (b) answer size (×1000 bytes) vs. query side length (×0.01).
3.5.1 Properties of SINA
Figure 3.15 compares the size of the answer returned by SINA with the size of
the complete answer returned by any non-incremental algorithm. In Figure 3.15a,
the percentage of moving objects varies from 0% to 10%. The size of the complete
answer is constant and is orders of magnitude larger than the size of the incremental
answer returned by SINA. A complete answer is not affected by recently moved
objects. For SINA, however, the size of the answer increases slightly, as it is
affected by the number of objects being evaluated every T seconds. In
Figure 3.15b, the query side length varies from 0.01 to 0.02. The size of the complete
answer increases dramatically, up to seven times that of the incremental
result returned by SINA. The saving in the size of the answer directly affects the
communication cost from the server to the clients.
Figure 3.16. The impact of grid size N: (a) I/O (×1000) and (b) CPU time (sec) vs. number of cells per dimension.
Figures 3.16a and 3.16b give the effect of increasing the grid size N on the I/O
and CPU time incurred by SINA, respectively. With a small number of grid cells (i.e.,
less than 10), each cell contains a large number of disk pages. Thus, a spatial join
within each cell results in excessive I/O and CPU time. On the other hand, with
a large number of grid cells (i.e., more than 60), each cell contains a small number
of moving objects and queries. Although this results in lower CPU time, since the
spatial join is performed among few tuples, disk pages are underutilized, and
additional I/O overhead is incurred. Based on this experiment, we set
the number of grid cells N along one dimension to 40.
3.5.2 Number of Objects/Queries
In this section, we compare the scalability of SINA with the FUR-tree, Q-index,
and RSJ algorithms. Figures 3.17a and 3.17b give the effect of increasing the num-
ber of moving objects from 10K to 100K on I/O and CPU time, respectively. In
Figure 3.17a, SINA outperforms all other algorithms. RSJ has double the I/O’s of
SINA due to the R-tree update cost. Notice that the performance of the R-trees
is degraded with the increase in the number of moving objects and moving queries.

Figure 3.17. Scalability with number of objects: (a) I/O (×1000) and (b) CPU time (sec) vs. number of objects (×1000).
The performance of the Q-index is dramatically degraded with the increase in the
number of moving objects, as moving objects are not indexed. The FUR-tree has the
worst performance in all cases, as there is no index on the 100K queries. However,
its performance is only slightly affected by the increase in moving objects; the
slight increase is due to maintaining the growing set of moving objects.
When the number of moving objects increases to 100K, both the FUR-tree and
the Q-index have similar performance, which is eight times worse than that of
SINA. The main reason is that both the FUR-tree and the Q-index utilize
only one index structure. Thus, the non-indexed objects and queries worsen the
performance of the FUR-tree and the Q-index, respectively.
In Figure 3.17b, SINA has the lowest CPU time. The relative performance of
SINA over other R-tree-based algorithms increases with the increase of the number of
moving objects. The main reason is that the update cost of SINA is much lower than
updating R-tree structures. As the number of moving objects increases, the quality
of the bounding rectangles in the R-tree structure is degraded. Thus, searching
and querying an R-tree incurs higher CPU time. The RSJ algorithm gives lower
performance in terms of CPU time than the FUR-tree and the Q-index, since RSJ
needs to update two R-trees. The performance of RSJ ranges from 1.5 to 3 times
worse than that of SINA.

Figure 3.18. Scalability with number of queries: (a) I/O (×1000) and (b) CPU time (sec) vs. number of queries (×1000).
Figure 3.18 gives an experiment similar to that of Figure 3.17, with the roles of
objects and queries exchanged. Since both SINA and RSJ treat objects and queries
similarly, their performance is similar to that in Figure 3.17. However, the FUR-tree
and the Q-index exchange their performance, as they deal with objects and queries
differently.
3.5.3 Percentage of Moving Objects/Queries
Figure 3.19 investigates the effect of increasing the percentage of moving objects
and queries on the performance of SINA and the R-tree-based algorithms.
The percentage of moving objects varies from 1% to 10%. The percentage
of moving queries is set to 5%. For the I/O overhead (Figure 3.19a), RSJ has
performance similar to that of SINA for up to 5% of moving objects. Then, RSJ incurs
up to double the number of I/O's of SINA at 10% of moving objects. Both the
FUR-tree and the Q-index have similar performance, which is almost eight times
worse than that of SINA.

Figure 3.19. Percentage of moving objects: (a) I/O (×1000) and (b) CPU time (sec) vs. percentage of moving objects (%).

When the percentage of moving objects is lower
than 5% (i.e., lower than the percentage of moving queries), the FUR-tree has better
performance. When the percentages of moving objects and moving
queries are equal (i.e., 5%), both the FUR-tree and the Q-index have similar performance.
Basically, the performance of the FUR-tree and the Q-index degrades with the increase
in the percentage of moving objects and moving queries, respectively.
For the CPU time (Figure 3.19b), SINA outperforms all R-tree-based algorithms.
This is mainly due to the high update cost of the R-tree. The RSJ algorithm has
the highest CPU time, as it updates two R-trees. In addition, SINA computes
incremental results while R-tree-based algorithms are non-incremental.
Similar performance is achieved when fixing the percentage of moving objects at 5%
while varying the percentage of moving queries from 0% to 10%. The only difference is
that the roles of objects and queries are exchanged. Thus, the performance of the FUR-
tree and the Q-index is exchanged, while SINA and RSJ maintain their performance.
In Figure 3.19, we limit the percentage of moving queries to 5% and the percentage
of moving objects to 10%. A more dynamic environment degrades the per-
formance of all R-tree-based algorithms. In the following experiment, we explore
the scalability of SINA in terms of handling highly dynamic environments. In
Figure 3.20, the percentage of moving objects varies from 10% to 30%. We plot three
lines for SINA that correspond to percentages of moving queries of 10%, 20%,
and 30%. We do not include performance results for any of the R-tree-based
algorithms, as their performance is dramatically degraded in highly dynamic en-
vironments. Figures 3.20a and 3.20b give the I/O and CPU time incurred by SINA,
respectively. The trend of SINA is similar for all percentages of moving queries.
Also, the cost of SINA increases only linearly with the increase in moving objects.
Thus, SINA is well suited for highly dynamic environments.

Figure 3.20. Scalability of SINA with update rates: (a) I/O (×1000) and (b) CPU time (sec) vs. percentage of moving objects (%), for moving queries = 10%, 20%, and 30%.
3.5.4 Locality of Movement
This section investigates the effect of the locality of movement on SINA and the
R-tree-based algorithms. By locality of movement, we mean that objects and queries
move within a certain distance. As an extreme example, if all objects move
within a small distance, then at each evaluation time T of SINA, all objects and queries
move within their cells. Thus, SINA achieves its best performance. On the
other hand, SINA has its worst performance if 100% of the objects change their cells.

Figure 3.21. Effect of movement locality: (a) I/O (×1000) and (b) CPU time (sec) vs. percentage of objects that change their cells.
By tuning the moving distance of the objects, we can control the number
of moving objects that cross their cell boundaries. Figures 3.21a and 3.21b give the
effect of the movement locality on the I/O and CPU time, respectively. For I/O,
even the worst case of SINA is still better than the R-tree-based algorithms (similar to
RSJ and four times better than the FUR-tree and the Q-index). The performance
of the R-tree-based algorithms is almost unaffected even if all objects change their cells.
The main reason is that changing the cell in the grid structure does not necessarily
mean changing the R-tree node. For the CPU time, SINA outperforms all other
algorithms by two orders of magnitude. In addition, the cost of SINA increases
only slightly with the number of objects that change their cells.
3.6 Summary
This chapter introduced the Scalable INcremental hash-based Algorithm (SINA,
for short): a new algorithm for evaluating a set of concurrent continuous spatio-
temporal range queries. SINA employs the shared execution and incremental evalu-
ation paradigms to achieve scalability and efficient processing of continuous spatio-
temporal queries. SINA has three phases: the hashing phase, the invalidation phase,
and the joining phase. The hashing phase employs an in-memory hash-based join
algorithm that results in a set of positive updates. The invalidation phase is triggered
every T time units or when memory is full to produce a set of negative updates. Then,
the joining phase is triggered to produce a set of both positive and negative updates
that result from joining in-memory data with in-disk data. We discussed the exten-
sibility of SINA to support a wide variety of spatio-temporal queries and out-of-sync
clients. The correctness of SINA is proved in terms of completeness, uniqueness, and
progressiveness. Comprehensive experiments demonstrate that the performance of
SINA is orders of magnitude better than that of other R-tree-based algorithms, and
that SINA is: (1) scalable to large numbers of moving objects and/or moving queries,
and (2) stable in highly dynamic environments. Finally, SINA saves network
bandwidth by minimizing the data sent to clients.
4 STREAM-BASED SPATIO-TEMPORAL QUERY PROCESSING: QUERY
OPERATORS
While the previous chapter and most of the existing approaches for continuous spatio-
temporal query processing (e.g., see [8, 20–24, 31, 57, 58, 76, 77]) focus on indexing
and/or storing the incoming object updates on the disk storage, this chapter focuses
on data stream environments where only in-memory algorithms and data structures
are allowed. Data streaming environments (e.g., see [3, 45,78–81]) are characterized
by: (1) Large numbers of data objects that are beyond the system capabilities to
store, and (2) Very high data arrival rates that hinder consulting the secondary
storage for indexing and/or storing the incoming data. Most of the existing work
in data streaming environments (e.g., see [45, 62, 79, 80]) aims to efficiently support
continuous queries over data streams. However, the spatial and temporal properties
of data streams and/or continuous queries are overlooked.
In this chapter, we introduce the Generic Progressive Algorithm (GPAC, for
short) for continuously evaluating continuous spatio-temporal queries over spatio-
temporal data streams. GPAC provides a generic skeleton that can be tuned
through a set of methods to behave as different continuous spatio-temporal queries
(e.g., continuous range queries and k-nearest-neighbor queries). The GPAC family
of algorithms is mainly designed to achieve the following goals:
1. Online evaluation. Incoming data is processed and stored in-memory without
the need for secondary storage.
2. Progressive evaluation. Only the updates of the previously reported result are
computed progressively as new tuples arrive. This is in contrast to previous
approaches that buffer some of the updates and send them all at once to the user.
Unlike most of the existing algorithms for continuous spatio-temporal queries
(e.g., see [8,31–33,55]) that are implemented as high-level functions at the application
level, GPAC algorithms are encapsulated into physical pipelined query operators that
can be part of a query execution plan. By having GPAC as pipelined query operators,
we achieve the following three goals:
1. GPAC operators can be combined with other traditional operators (e.g., dis-
tinct, aggregate, and join) to support online and progressive evaluation for a
wide variety of continuous spatio-temporal queries.
2. Pushing GPAC operators deep in the query execution plan reduces the number
of tuples in the query pipeline, since GPAC operators act as filters for other
operators.
3. Flexibility in the query optimizer where multiple candidate execution plans can
be produced by shuffling the GPAC operators with other traditional operators.
The rest of this chapter is organized as follows: Section 4.1 introduces the basic
idea of GPAC. In Section 4.2, we introduce the problem of uncertainty in continuous
spatio-temporal queries and how the GPAC framework avoids such uncertainty.
Section 4.3 provides two instances of GPAC that behave as continuous range queries
and k-nearest-neighbor queries. Encapsulation of GPAC into physical query opera-
tors is presented in Section 4.4. Section 4.5 provides an experimental study of GPAC.
Finally, Section 4.6 summarizes this chapter.
4.1 The GPAC: Continuous Spatio-temporal Query Operators
In this section, we introduce the Generic Progressive Algorithm (GPAC) for con-
tinuous spatio-temporal queries over spatio-temporal streams. GPAC is similar in
spirit to generalized search tree indexes (e.g., GiST [82] and SP-GiST [83]), but
GPAC is in the context of spatio-temporal query processing algorithms. GPAC is
introduced as a general skeleton that can be adjusted through a set of methods to
behave as various continuous spatio-temporal queries (e.g., continuous range queries
and nearest-neighbor queries). In GPAC, each moving query is bound to one focal
object. For example, if a moving object M submits a query Q that asks about its
nearest police car, then M is considered the focal object of Q. Mobile objects and
queries are required to send updates of their locations every T seconds. Failure to
do so results in considering the mobile object or query as disconnected. As GPAC
can be implemented either at the application level or as a physical query operator,
the output of GPAC is sent either directly to the user or to the next query operator
in the pipeline. Thus, throughout the rest of this chapter, we use the terms “user” and
“next query operator” as synonyms.
In GPAC, we store the tuples that satisfy each query Q in a data structure
termed Q.Answer. Then, for each newly incoming tuple P , GPAC performs two
tests: Test I: Is P ∈ Q.Answer? Test II: Does P satisfy the query predicate?. Based
on the results of the two tests, GPAC distinguishes among four cases:
• Case I: P ∈ Q.Answer and P still satisfies the query predicate. As GPAC
processes only the updates of the previously reported result, P will neither be
processed nor will P be sent to the user.
• Case II: P ∈ Q.Answer; however, P no longer qualifies to be part of the
answer (i.e., P no longer satisfies the query predicate). In this case,
GPAC reports a negative update P− to the user. The negative update indicates
that P needs to be removed from the query answer and hence is discarded from
the system.
• Case III: P /∈ Q.Answer; however, P qualifies to be part of the current answer
(i.e., P currently satisfies the query predicate). In this case, GPAC reports a
positive update to the user. The positive update indicates that P needs to be
added to the query answer.
Procedure Q.ReceiveTupleI(Tuple P )
Begin
1. If query Q is moving and P is the focal point
(a) Q.UpdateCriteriaI(P ) (Figure 4.2)
(b) return
2. If Q.satisfy(P ) AND P /∈ Q.Answer
(a) Add P to Q.Answer
(b) Send the Positive update P to the user
(c) If Q.IsDynamic()
• Q.UpdateCriteriaI(P ) (Figure 4.2)
(d) return
3. If (!Q.satisfy(P )) AND (P ∈ Q.Answer)
(a) Delete P from Q.Answer
(b) Send the Negative tuple P− to the user.
End
Figure 4.1. Pseudo code of skeleton of GPAC.
• Case IV: P /∈ Q.Answer and P still does not qualify to be part of the current
answer. In this case, P has no effect on Q. Thus, P will neither be processed
nor will P be sent to the user.
Figures 4.1 and 4.2 give the pseudo code of the main idea of GPAC upon receiving
a tuple P . Functions and variables written in bold font need to be implemented
separately for each query type as will be addressed in Section 4.3. Initially, GPAC
checks if P is the focal object of the moving query Q (Step 1 in Figure 4.1). If this
is the case, we update the spatial region covered by Q. Based on the update, some
tuples from Q.Answer may be out of the new query spatial region. These tuples
are deleted (expired) from Q.Answer and corresponding negative updates are sent
to the user or the next query operator (Step 2 in Figure 4.2).
If the newly incoming tuple P is not the query focal object (Step 2 in Figure 4.1),
we check if P qualifies to be in the query answer (Test II). If this is the case, we
check if P is part of the recently reported answer (Test I). In this case, we do not
process or send P since P is still in the reported answer (Case I). However, if P is
not part of the recently reported answer (Case III), we add P to Q.Answer (Step 2a
in Figure 4.1) and send P as a positive update to the user (Step 2b in Figure 4.1).
Then, we update the query information (if needed) based on P ’s effect on the query
spatial region (Step 2c in Figure 4.1). The predicate Q.IsDynamic() returns “true”
if the query spatial area is changed as a result of P .
If the incoming tuple P does not qualify to be part of the answer, then we check
if P is part of the recently reported answer (Step 3 in Figure 4.1). In this case (Case
II), we delete P from the current answer (Step 3a in Figure 4.1) and report P as a
negative update to the user (Step 3b in Figure 4.1). Notice that if P was not in the
previously reported answer, we do not have to process or send P to the user (Case
IV).
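As a concrete illustration, the four-case dispatch above can be sketched in Python for a stationary range query. This is a minimal sketch under stated assumptions, not the thesis implementation; all names (RangeQuery, receive_tuple) are illustrative.

```python
# Minimal sketch of the GPAC four-case dispatch (Tests I and II) for a
# stationary range query. Names are illustrative, not the thesis code.

class RangeQuery:
    def __init__(self, x1, y1, x2, y2):
        self.region = (x1, y1, x2, y2)  # rectangular query region
        self.answer = set()             # Q.Answer: ids of qualifying objects

    def satisfy(self, x, y):            # Test II: the query predicate
        x1, y1, x2, y2 = self.region
        return x1 <= x <= x2 and y1 <= y <= y2

    def receive_tuple(self, oid, x, y):
        """Return '+' (positive update), '-' (negative update), or None."""
        in_answer = oid in self.answer  # Test I
        satisfies = self.satisfy(x, y)  # Test II
        if in_answer and satisfies:     # Case I: still in answer, no output
            return None
        if in_answer:                   # Case II: left the answer
            self.answer.discard(oid)
            return '-'
        if satisfies:                   # Case III: joined the answer
            self.answer.add(oid)
            return '+'
        return None                     # Case IV: irrelevant tuple
```

Only Cases II and III produce output, which is exactly the progressive-evaluation property: the operator emits updates to the previously reported answer rather than re-sending the whole answer.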
4.2 Uncertainty in Continuous Spatio-temporal Queries
4.2.1 Types of Uncertainty
One of the goals of GPAC is to provide a fast and up-to-the-moment answer
to continuous queries over spatio-temporal streams. However, this goal is hindered
by the fact that spatio-temporal streams are not materialized on secondary storage.
The basic GPAC algorithm stores only the tuples that satisfy the query predicate
Q. Such implementation may result in having uncertainty areas in Q. We define the
uncertainty area of a query Q as follows:
Procedure Q.UpdateCriteriaI(Tuple P )
Begin
1. Q.Update(P )
2. For all moving objects M ∈ Q.Answer
• If NOT Q.satisfy(M)
(a) Send the Negative output M− to user.
(b) Delete M from Q.Answer.
End
Figure 4.2. Updating query information in GPAC
Definition 4.2.1 The uncertainty area of query Q is the spatial area of Q that
may contain potential moving objects that satisfy Q, with Q not being aware of the
contents of this area.
Uncertainty areas in GPAC may result in erroneous query results. We distinguish
among three cases for producing uncertainty areas within the basic GPAC framework:
1. New query. Initially, there are no outstanding queries in the system. Thus,
continuously arriving spatio-temporal streams are neither processed nor stored.
Once a query Q is submitted to the system, we cannot provide a fast answer
to Q, simply because there is nothing currently being stored in the database.
In this case, all the area covered by Q is considered an uncertainty area. Later
on, moving objects update their locations and the answer of Q is incrementally
built.
2. Moving queries. Figures 4.3 and 4.4 give examples of uncertainty areas
that result from moving range queries and moving nearest-neighbor queries,
respectively. Figure 4.3a represents a snapshot at time T0 where point P is
outside the area of query Q. Thus, P is not physically stored in the database.
[Figure panels: (a) Q and P at T0; (b) T1: Q moves; (c) T2: P moves]
Figure 4.3. Uncertainty in moving range queries.
[Figure panels: (a) Q at time T0; (b) T1: Q moves; (c) T2: P3 moves]
Figure 4.4. Uncertainty in moving NN queries.
At time T1 (Figure 4.3b), Q is moved to cover a new spatial area. The shaded
area in Q represents its uncertainty area. Although P is inside the new query
region, P is not reported in the query answer. At T2 (Figure 4.3c), object
P moves out of the query region. Thus, P is never reported in the query
result, although it was physically inside the query region in the interval [T1, T2].
Similar erroneous output is given in Figure 4.4 for k-nearest-neighbor queries
(k = 2). Object P3 is never reported in the query answer, although it should
have been within the answer in the interval [T1, T2].
3. Stationary queries. Figure 4.5 gives an example of uncertainty area in sta-
tionary k-nearest-neighbor queries (k = 2). At time T0 (Figure 4.5a), the query
Q has P1 and P3 as its answer. P2 is outside the query spatial region, thus P2 is
not stored in the database. At T1 (Figure 4.5b), P1 is moved far from Q. Since
Q is aware of P1 and P3 only, we extend the spatial region of Q to include the
new location of P1. Thus, an uncertainty area is produced. Notice that Q is
[Figure panels: (a) Q at time T0; (b) T1: P1 moves; (c) T2: P2 moves]
Figure 4.5. Uncertainty in static NN queries.
unaware of P2 since P2 is not stored in the database. At T2 (Figure 4.5c), P2
moves out of the new query region. Thus, P2 never appears as an answer of Q,
although it should have been part of the answer in the time interval [T1, T2].
4.2.2 Uncertainty Avoidance in GPAC
GPAC does not handle the uncertainty area that results from newly submitted
queries. Continuous queries are issued to run for hours and days. Thus, having a
warm-up period of a few seconds affects neither the accuracy nor the efficiency
of the query result. However, uncertainty areas that result from stationary
or moving queries are crucial and are treated by GPAC. In this section, we modify
the basic GPAC algorithm given in Section 4.1 to avoid having uncertainty areas in
both stationary and moving queries. The main idea is to anticipate the change in
the query spatial region and cache all moving objects that lie inside the anticipated
area in an in-memory structure called Q.Cache. A conservative approach for deter-
mining the anticipated area is to expand the query region in all directions with the
maximum possible distance that a moving object can cover between any two con-
secutive updates. Such a conservative approach completely avoids uncertainty areas.
Once a query changes its spatial region, we probe Q.Cache for all objects that lie
inside the new spatial region. Thus a fast answer of Q is retrieved. Notice that with
the conservative approach, the change of the query spatial region is guaranteed to
[Figure panels: (a) Snapshot at time T0; (b) The query is moved at time T1; (c) The cache area is adjusted]
Figure 4.6. The cache area.
be completely inside the anticipated area. To realize GPAC with caching, we equip
each query Q with the following: (1) A variable Q.CacheArea that contains the
boundary of the anticipated area. (2) The data structure Q.Cache that keeps track
of all moving objects within Q.CacheArea. (3) The function Q.InCacheArea() that
takes an input tuple P and outputs true if P lies inside Q.CacheArea.
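Under the stated assumption that only the maximum per-update travel distance d is known, the conservative anticipated area for a rectangular query can be sketched as follows. The helper names (cache_area, in_cache_area) are assumptions for illustration, not the thesis API.

```python
# Sketch of the conservative cache area for a rectangular query: the
# query region expanded by the maximum per-update travel distance d in
# all directions. Helper names are illustrative assumptions.

def cache_area(region, d):
    """Q.CacheArea: expand rectangle (x1, y1, x2, y2) by d on all sides."""
    x1, y1, x2, y2 = region
    return (x1 - d, y1 - d, x2 + d, y2 + d)

def in_cache_area(region, d, x, y):
    """Q.InCacheArea: does point (x, y) lie inside the anticipated area?"""
    x1, y1, x2, y2 = cache_area(region, d)
    return x1 <= x <= x2 and y1 <= y <= y2
```

Objects inside the anticipated area but outside the query region are held in Q.Cache; since the region can move at most d between consecutive updates, the moved region is guaranteed to stay inside the anticipated area.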
The conservative caching approach requires only the knowledge of the maximum
object speed, which is typically available in moving object applications (e.g., mov-
ing cars in road network have limited speeds). This is in contrast to all validity
region approaches (e.g., the safe region [19], the valid region [55], and the No-Action
region [84]) that require the knowledge of the locations of other objects. This infor-
mation is not available in our case since GPAC is aware only of objects that satisfy
the query predicate. Thus, validity region approaches are not applicable in the case
of spatio-temporal streams.
Figure 4.6a gives an example of a continuous range query (the shaded area) along
with its extended cache area (the dotted area). All objects that lie either in the
query area or in the cache area are considered significant, thus are stored in memory.
Figure 4.6b illustrates the query movement. Since object P4 is significant, we
are able to produce P4 as an answer of the continuous query. Without the concept
of a caching area, object P4 would not have been significant. Thus, we would not
be able to produce P4 in the query answer. Upon the query movement, the cache
area has to be adjusted based on the new query region (Figure 4.6c).
Figures 4.7 and 4.8 give the pseudo code of GPAC when caching is employed as
a means to avoid uncertainty areas. The changes in the basic GPAC algorithm are
limited to the following: (1) When the focal point of a moving query moves, we update
the new Q.CacheArea. Then, we go over all the objects in Q.Cache to determine
whether any of them become part of the query answer (Step 2 in Figure 4.8). Also,
for moving objects that are out of the new query region, we check whether they need
to be moved into Q.Cache or not (Step 3b in Figure 4.8). (2) When the input P is
inside the query area but was not in the previously reported answer, we check if P is
stored in Q.Cache. In this case, we delete P from Q.Cache (Step 2c in Figure 4.7).
(3) When the input P is not inside the query region but was in the old answer,
we check if the new value of P lies in the query region. In this case, we add P to
Q.Cache (Step 3c in Figure 4.7). (4) If P is neither inside the query region nor in
the previous query answer, we maintain the status of P with respect to the query
region (Steps 4 and 5 of Figure 4.7).
The cache area enlarges the query size and hence more input tuples need to be
stored. However, this increase in size is limited and can be neglected in many cases.
For example, consider a square range query with side length x. A conservative cache
area would increase the side length to be x + d where d is the maximum distance an
object can travel between any two consecutive updates. The ratio of area increase
would be ((x + d)^2 − x^2)/x^2. A typical query side length is orders of magnitude
larger than d, i.e., x = md. Thus, the ratio of increase is (2md^2 + d^2)/(m^2 d^2) =
(2m + 1)/m^2, which can be approximated by 2/m. In a typical scenario, m can be
in the order of tens, which results in a slight
overhead in the query size. For example, consider a square range query with side
length 2 miles that monitors the traffic in a downtown area. If objects move at a
speed of 15 miles/hour while updating their locations every 30 seconds, then the
maximum travelled distance per update is d = 1/8 mile. This results in increasing
the query area by only 12.5%. Similarly, for the same setting, a query about objects
within 3 miles suffers only an 8.5% increase in size. Notice that the overhead in
having a cache area is reduced by the increase in the area of the original query.
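The overhead figures above can be checked numerically. The sketch below computes the exact ratio ((x + d)^2 − x^2)/x^2 and compares it with the 2/m approximation; the 12.5% quoted in the text is the 2/m approximation for m = 16, while the exact value is about 12.9%.

```python
# Check of the cache-area overhead for a square query of side x = m*d:
# exact ratio ((x+d)^2 - x^2) / x^2 = (2m+1)/m^2, approximated by 2/m.

def overhead(x, d):
    return ((x + d) ** 2 - x ** 2) / x ** 2

d = 1.0 / 8                    # max travel distance per update (miles)
exact_2mi = overhead(2.0, d)   # 2-mile side, m = 16: about 12.9% exact
approx_2mi = 2.0 / 16          # the 2/m approximation: 12.5%
exact_3mi = overhead(3.0, d)   # 3-mile side, m = 24: about 8.5%
```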
Procedure Q.ReceiveTupleII(Tuple P )
Begin
1. If query Q is moving and P is the focal point, Q.UpdateCriteriaII(P ) (Figure 4.8),
return
2. If Q.satisfy(P ) AND P /∈ Q.Answer
(a) Add P to Q.Answer
(b) Send the Positive update P to the user
(c) If P ∈ Q.Cache, delete P from Q.Cache
(d) If Q.IsDynamic(), Q.UpdateCriteriaII(P ) (Figure 4.8)
(e) return
3. If (!Q.satisfy(P )) AND (P ∈ Q.Answer)
(a) Delete P from Q.Answer
(b) Send the Negative tuple P− to the user
(c) If Q.InCacheArea(P ), Insert P in Q.Cache
(d) If Q.IsDynamic(), Q.UpdateCriteriaII(P ) (Figure 4.8)
(e) return
4. If Q.InCacheArea(P )
(a) If P /∈ Q.Cache, Insert P in Q.Cache
(b) return
5. If P ∈ Q.Cache, delete P from Q.Cache.
End
Figure 4.7. Pseudo code of GPAC with caching.
Procedure Q.UpdateCriteriaII(Tuple P )
Begin
1. Update Q.Criteria and Q.CacheArea based on P
2. For all moving objects M in Q.Cache
• If Q.satisfy(M)
(a) Move M from Q.Cache to Q.Answer
(b) Send the Positive update M to the user
• If NOT Q.InCacheArea(M)
– Delete M from Q.Cache
3. For all moving objects M ∈ Q.Answer
• If NOT Q.satisfy(M)
(a) Send the Negative tuple M− to the user
(b) if Q.InCacheArea(M), move M from Q.Answer to Q.Cache, else,
delete M from Q.Answer.
End
Figure 4.8. Updating query information in GPAC with caching
4.3 Instances of GPAC
In this section, we develop two instances of GPAC, namely, for continuous spatio-
temporal range queries and continuous k-nearest-neighbor queries. Other instances
of GPAC (e.g., reverse nearest-neighbor [33], group nearest-neighbor queries [85],
and time-parameterized queries [86]) can be developed in a similar way.
4.3.1 Spatio-temporal Range Queries
Q.Answer is represented by a hash table. Q.Cache is represented as a linked
list that is sorted on the distance from the moving object to the boundary of the
query region. The functions Q.satisfy() and Q.InCacheArea() represent a test
of object P inside the rectangular region of Q and the cache area, respectively.
The function Q.IsDynamic() always returns false for stationary queries and true
for moving queries. This is because static range queries never change their spatial
regions.
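The sort key for the Q.Cache linked list described above (the distance from an object to the boundary of the rectangular query region) can be sketched as follows. The helper name and signature are hypothetical; the thesis does not give this code.

```python
# Sketch of the Q.Cache ordering key for the range-query instance:
# Euclidean distance from a point to the nearest edge of the rectangle
# (zero if the point is inside). Name and signature are illustrative.

def dist_to_region(region, x, y):
    x1, y1, x2, y2 = region
    dx = max(x1 - x, 0.0, x - x2)   # horizontal gap (0 if within the x-span)
    dy = max(y1 - y, 0.0, y - y2)   # vertical gap (0 if within the y-span)
    return (dx * dx + dy * dy) ** 0.5
```

Sorting cache entries in ascending order of this key keeps the objects most likely to enter the region after a small query movement at the front of the list.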
4.3.2 Spatio-temporal k-nearest-neighbor
A k-nearest-neighbor query (kNN) is represented in GPAC in the same way as a
range query. The only difference is that the kNN query has a circular region rather
than a rectangular region. Initially, a kNN query is submitted to GPAC with the
format (QID, center, k) or (QID, FocalID, k) for stationary and moving queries,
respectively. Thus, the center of the query circular region is either stated explicitly
as in stationary queries or implicitly as the current location of the object FocalID in
case of moving queries. Once the kNN query is registered in GPAC, the first incoming
k objects are considered the initial query answer. The radius of the circular region
is determined by the distance from the query center to the current kth farthest
neighbor. Then, the query execution continues as a regular range query, yet with
a variable size. Whenever a newly incoming object P lies inside the circular query
region, P removes the kth farthest neighbor from the answer set (with a negative
update) and adds itself to the answer set (with a positive update). The query circular
region is shrunk to reflect the new kth neighbor. Similarly, if an object P , which
is one of the k neighbors, updates its location to be outside the circular region, we
expand the query circular region to reflect the fact that P is now the kth farthest
neighbor. Notice that in the case of expanding the query region, we do not output
any updates.
Thus, for k-nearest-neighbor queries, Q.Answer and Q.Cache are represented by
a linked list that is sorted on the distance from the moving object to the query focal
point. The functions Q.satisfy() and Q.InCacheArea() represent a test of object P
inside the circular region of Q and the cache area, respectively. The circular region
has the focal point as its center and the distance to the kth furthest point as its radius.
The function Q.IsDynamic() always returns true for both stationary and moving
queries.
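The shrink case of the kNN region maintenance described above can be sketched as follows: the answer is a list of (distance, oid) pairs sorted by distance to the focal point, and the circular region's radius is the kth distance. This is an illustrative sketch with hypothetical names, not the thesis implementation.

```python
# Sketch of kNN answer maintenance: sorted list of (distance, oid);
# a closer arriving object evicts the k-th neighbor (negative update),
# inserts itself (positive update), and shrinks the radius.
import bisect

class KnnQuery:
    def __init__(self, cx, cy, k, initial):
        # initial: the first k incoming objects, oid -> (x, y)
        self.cx, self.cy, self.k = cx, cy, k
        self.answer = sorted((self._dist(x, y), oid)
                             for oid, (x, y) in initial.items())

    def _dist(self, x, y):
        return ((x - self.cx) ** 2 + (y - self.cy) ** 2) ** 0.5

    def radius(self):
        return self.answer[-1][0]        # distance to the current k-th neighbor

    def new_object(self, oid, x, y):
        """Returns (evicted_oid, oid) if the new object enters the answer
        (negative then positive update), or None if it lies outside."""
        d = self._dist(x, y)
        if d >= self.radius():
            return None                   # outside the circular region
        evicted = self.answer.pop()[1]    # negative update for the old k-th
        bisect.insort(self.answer, (d, oid))  # positive update; radius shrinks
        return (evicted, oid)
```

The complementary expand case (a current neighbor moving outward keeps its kth rank and only grows the radius, with no output) is omitted here for brevity.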
4.4 Pipelined Spatio-temporal Query Operators
We encapsulate GPAC algorithms for continuous range queries and continuous k-
nearest-neighbor queries into the pipelined query operators GPAC-IN and GPAC-kNN,
respectively. The pipelined operators are implemented inside the PLACE (Pervasive
Location-Aware Computing Environments) server [11, 13]. A typical SQL query
submitted to the PLACE server may have the following form:
SELECT select clause
FROM from clause
WHERE where clause
GPAC-IN in clause
GPAC-kNN knn clause
The in clause may have one of the following two forms:
• Static range query (x1, y1, x2, y2), where (x1, y1) and (x2, y2) represent the top
left and bottom right corners of the rectangular range query.
• Moving rectangular range query (′M ′, ID, xdist, ydist), where ′M ′ is a flag
indicating that the query is moving, ID is the identifier of the query focal point,
xdist is the length of the query rectangle, and ydist is the width of the query
rectangle.
Similarly, the knn clause may have one of the following two forms:
• Static kNN query (k, x, y), where k is the number of neighbors to be main-
tained, and (x, y) is the query center.
• Moving kNN query (′M ′, k, ID), where ′M ′ is a flag indicating that the query
is moving, k is the number of neighbors to be maintained, and ID is the identifier
of the query focal point.
As will be discussed in Section 4.5, pushing the operators GPAC-IN and GPAC-kNN
to the bottom of the query execution plan always achieves the best performance.
However, having the spatio-temporal operators at the bottom or at the middle of
the query evaluation pipeline requires that all operators above them be equipped with
special handling of negative tuples. The Nile query processor [3] handles negative
tuples in pipelined operators as follows: Selection and Join operators handle nega-
tive tuples in the same way as positive tuples. The only difference is that the output
will be in the form of a negative tuple. Aggregates update their aggregate functions
by considering the received negative tuple. The Distinct operator reports a negative
tuple at the output only if the corresponding positive tuple is in the recently re-
ported result. For more details about handling the negative tuples in various query
operators, the reader is referred to [39].
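As one concrete illustration of these rules, a count-based Distinct operator emits a negative tuple only when the last positive copy of a value expires. This is a sketch in the spirit of the rules quoted above, not Nile's actual code; all names are illustrative.

```python
# Sketch of negative-tuple handling in a streaming Distinct operator:
# track a multiplicity per value; emit '+' on the first copy and '-'
# only when the last copy is retracted. Illustrative, not Nile code.

class Distinct:
    def __init__(self):
        self.counts = {}   # value -> number of live positive copies

    def process(self, value, sign):
        """sign is +1 (positive tuple) or -1 (negative tuple).
        Returns the emitted (value, sign) pair, or None if suppressed."""
        c = self.counts.get(value, 0)
        if sign > 0:
            self.counts[value] = c + 1
            return (value, +1) if c == 0 else None   # first copy: emit
        self.counts[value] = c - 1
        return (value, -1) if c == 1 else None       # last copy gone: emit
```

A Selection or Join operator, by contrast, simply propagates the negative sign on any qualifying tuple, as the text describes.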
4.5 Performance Evaluation
In this section, we give experimental evidence that encapsulating GPAC algorithms,
with an appropriate cache size, into physical pipelined query operators outperforms
high-level implementations. The experiments in this section are divided
into two categories:
• Pipelined operators. This set of experiments compares the high-level imple-
mentation of GPAC with the encapsulation of GPAC algorithms in pipelined
query operators.
Figure 4.9. Greater Lafayette, Indiana, USA.
• Properties of GPAC. In this set of experiments, we study some properties
of GPAC, namely dealing with high data rates and various spatio-temporal
selectivities.
All the results in this section are based on a real implementation of GPAC al-
gorithms and operators inside our prototype database engine for spatio-temporal
streams, PLACE [11,13]. PLACE extends the Nile [3] streaming database manage-
ment system to handle spatio-temporal streams. We run PLACE on an Intel Pentium IV
2.4GHz CPU with 512MB RAM running Windows XP. Without loss of generality,
all the presented experiments are conducted on stationary and moving continuous
spatio-temporal range queries. Similar results are achieved when employing continuous
k-nearest-neighbor queries.
We use the Network-based Generator of Moving Objects [74] to generate a set
of moving objects and moving queries in the form of spatio-temporal streams. The
input to the generator is the road map of Greater Lafayette (a city in the state of
Indiana, USA) given in Figure 4.9. The output of the generator is a set of moving
points that move on the road network of the given city. Moving objects can be
cars, cyclists, pedestrians, etc. Any moving object can be a focal of a moving query.
Unless mentioned otherwise, we generate 110K moving objects as follows: Initially,
we generate 10K moving objects from the generator, then we run the generator for
1000 time units. At each time unit, we generate 100 new moving objects. Moving
objects are required to report their locations every T time units. Failure to do so
results in disconnecting the moving object from the server.
Although it is appealing to have a conservative cache, a large cache size may incur high overhead in maintaining objects inside the cache area. Thus, for the rest of the experiments, we set the cache size to 75% of the conservative cache area. In most cases, a cache size of 75% has performance similar to that of the conservative cache. Notice that a conservative cache is sized for the fastest moving object; most likely, the query focal object does not move at the maximum speed.
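The sizing rule can be illustrated with a small sketch (a simplified one-dimensional model; the function names and the sample numbers are hypothetical):

```python
def conservative_cache_extent(v_max, report_period):
    """A conservative cache covers the farthest distance the fastest
    object can travel between two consecutive location reports."""
    return v_max * report_period

def used_cache_extent(v_max, report_period, fraction=0.75):
    """The experiments use 75% of the conservative extent, since the
    query focal object rarely moves at the maximum speed."""
    return fraction * conservative_cache_extent(v_max, report_period)

print(used_cache_extent(v_max=20.0, report_period=1.0))  # 15.0
```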
4.5.1 GPAC Operators in a Pipelined Query Plan
In this section, we compare the implementation of GPAC at the application level
with the encapsulation of GPAC inside query operators.
Pipeline with a Selection Operator
Consider the query Q: “Continuously report all trucks that are within MyArea”.
MyArea can be either a stationary or a moving range query. A high level implementation of this query has only a selection operator that selects the “trucks”; the GPAC algorithm, running at the application level, then takes the selection output and incrementally produces the query result. However, encapsulating GPAC into the GPAC-IN operator allows for more flexible plans. Figure 4.10a gives a query evaluation plan in which the GPAC-IN operator is pushed before the selection operator. The following is the SQL representation of the query.
Figure 4.10. Pipelined GPAC operators: (a) the GPAC-IN operator pushed below a SELECTION operator; (b) the GPAC-IN operator pushed below a JOIN operator.
SELECT M.ObjectID
FROM MovingObjects M
WHERE M.type = “truck”
GPAC-IN MyArea
Figure 4.11 compares the high level implementation of the above query with
pipelined GPAC-IN operators for both stationary and moving queries. The selectivity
of the queries varies from 2% to 64%. The selectivity of the selection operator
is 5%. Our measure of comparison is the number of tuples that go through the
query evaluation pipeline. When GPAC is implemented at the application level,
its performance is not affected by the query selectivity. However, when GPAC-IN
is pushed before the selection, it acts as a filter for the query evaluation pipeline,
thus, limiting the tuples through the pipeline to only the progressive updates. With
GPAC-IN selectivity less than 32%, pushing GPAC-IN before the selection greatly improves the performance. However, with selectivity more than 32%, it is better to place the GPAC-IN operator above the selection operator.
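The ordering decision can be illustrated with a back-of-the-envelope tuple-count model (a sketch with hypothetical names; it treats both operators as plain filters, so it ignores GPAC-IN's incremental-update behavior, which is what shifts the observed crossover in Figure 4.11 up to around 32%):

```python
def pipeline_tuples(n_tuples, sel_first, sel_second):
    """Tuples flowing through a two-operator pipeline: all n enter the
    first operator, and only the survivors enter the second."""
    after_first = n_tuples * sel_first
    after_second = after_first * sel_second
    return after_first + after_second

def better_order(n, sel_gpac, sel_select):
    """Pick the operator order that pushes fewer tuples through the plan."""
    gpac_first = pipeline_tuples(n, sel_gpac, sel_select)
    select_first = pipeline_tuples(n, sel_select, sel_gpac)
    return "GPAC-IN first" if gpac_first <= select_first else "SELECT first"

# Selection selectivity 5% as in the experiment; GPAC-IN selectivity varies.
print(better_order(10_000, sel_gpac=0.02, sel_select=0.05))  # GPAC-IN first
print(better_order(10_000, sel_gpac=0.64, sel_select=0.05))  # SELECT first
```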
Figure 4.11. Pipelined operators with SELECT (x-axis: query selectivity (%); y-axis: tuples in the pipeline; curves: stationary pipelined query, moving pipelined query, application level).
Pipeline with a Join Operator
In this section, we consider a more complex query plan that contains a join
operator. Consider the query Q: “Continuously report moving objects that belong to
my favorite set of objects and that lie within MyArea”. A high level implementation of GPAC would ask the streaming database engine to join all moving objects with my favorite set of objects; the output of the join is then sent to the GPAC algorithm for further processing. However, with the GPAC-IN operator, we can have a query
evaluation plan as that of Figure 4.10b where the GPAC-IN operator is pushed below
the Join operator. The SQL representation of the above query is as follows:
SELECT M.ObjectID
FROM MovingObjects M, MyFavoriteCars F
WHERE M.ObjectID = F.ObjectID
GPAC-IN MyArea
Figure 4.12 compares the high level implementation of the above query with the
pipelined GPAC-IN operator for both stationary and moving queries. The selectivity
Figure 4.12. Pipelined operators with Join (x-axis: query selectivity; y-axis: tuples in the pipeline).
of the queries varies from 2% to 64%. As in Figure 4.11, the selectivity of GPAC does not affect the performance when GPAC is implemented at the application level. Unlike the case of selection operators, GPAC provides a dramatic increase in performance (around an order of magnitude) when implemented as a pipelined operator. The main reason for this dramatic gain is the high overhead incurred when evaluating the join operation; the GPAC-IN operator filters out the input tuples and limits the input of the join operator to only the incremental positive and negative updates.
4.5.2 Properties of GPAC
In this section, we study some properties of GPAC algorithms, namely, dealing
with high rates of data arrival and the spatio-temporal query selectivity.
Figure 4.13. High arrival rates (x-axis: arrival rate (tuples/sec); y-axis: delay in output (seconds); curves: stationary query, moving query).
High Arrival Rates
Figure 4.13 gives the result of an experiment that deals with high arrival rates in
GPAC for stationary and moving queries. Spatio-temporal data arrives according to an exponential distribution, with an arrival rate that varies from 100 tuples per second to 4000 tuples per second. Our measure is the average output delay. The output delay of a tuple P is the difference between the time that P enters the system and the time that P has an effect on the output result. As shown in Figure 4.13, GPAC algorithms can afford up to 2000 tuples per second with only one second of output delay.
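The output-delay measure can be sketched as follows (a hypothetical helper; each event pairs a tuple's system-entry time with the time it first affects the output):

```python
def average_output_delay(events):
    """events: list of (arrival_time, effect_time) pairs, where effect_time
    is when the tuple first has an effect on the query output. The output
    delay of one tuple is the difference between the two times."""
    delays = [effect - arrival for arrival, effect in events]
    return sum(delays) / len(delays)

# Three tuples, each taking one time unit to affect the output.
print(average_output_delay([(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]))  # 1.0
```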
Query Selectivity due to Incremental Evaluation
Unlike the selectivity of traditional queries, the selectivity of spatio-temporal
queries is more sophisticated. Figure 4.14 gives the result of an experiment that
shows the selectivity of spatio-temporal queries. We run a continuous spatio-temporal query Q whose selectivity varies from 10% to 100%. We call this selectivity the correct selectivity, as it is induced from the spatial area covered by Q. However, the actual selectivity of the spatio-temporal query
Figure 4.14. Query selectivity (x-axis: query (correct) selectivity (%); y-axis: actual selectivity (%); curves: actual selectivity, correct selectivity).
is higher than its correct selectivity. The main reason is that in spatio-temporal
queries, moving objects can go back and forth and report themselves in the query
answer as multiple positive and negative tuples. Thus, it may happen that a query
with a smaller area produces more output results than a query with a larger area.
For example, consider a query that covers all the spatial area (i.e., selectivity 100%).
Such a query would never output negative tuples. In addition, once all objects are
inside the query area, no output will be produced due to the progressive property.
Consider another query with a slightly smaller area. Due to the area not covered by
this query, it may happen that some tuples go out of the query region and produce
negative tuples. Then, these tuples can move again inside the query area to produce
a set of positive tuples. As a result, a query with smaller area may produce more
output tuples.
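The effect of back-and-forth movement on the number of output tuples can be illustrated with a small sketch (hypothetical names; each boundary crossing emits one positive or negative update):

```python
def output_tuples(positions, in_query):
    """Count the incremental output tuples (positive and negative updates)
    produced by one object along its trajectory. in_query(p) tests query
    membership. An object that repeatedly crosses the query boundary emits
    one tuple per crossing, which is why the actual selectivity can exceed
    the correct selectivity induced by the query area."""
    tuples, inside = 0, False
    for p in positions:
        now_inside = in_query(p)
        if now_inside != inside:      # boundary crossing: +P or -P
            tuples += 1
            inside = now_inside
    return tuples

# Object oscillating around the query edge at x = 5: four crossings.
print(output_tuples([4, 6, 4, 6, 4], lambda x: x > 5))  # 4
```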
4.6 Summary
In this chapter, we introduced a new family of Generic and Progressive Algo-
rithms (GPAC, for short) for continuous query evaluation over spatio-temporal data
streams. GPAC is a general skeleton that can be tuned through a set of methods to
behave as various continuous spatio-temporal queries. GPAC provides online, pro-
gressive, and fast response to continuous spatio-temporal queries. We described two
versions of GPAC. The first version (with no caching) is simple to maintain; however, it produces inaccurate answers. In the second version (with caching), we introduce the concept of anticipation, where the query answer can be anticipated beforehand and is cached in a cache structure. We show how to realize two types of continuous spatio-temporal queries from GPAC, namely, continuous range queries and continuous k-nearest-neighbor queries. Moreover, we encapsulate GPAC algorithms into
physical pipelined query operators. Pipelined operators are combined with tradi-
tional operators (e.g., selection and join) to provide online, progressive, and fast
response of a wide variety of continuous spatio-temporal queries. Experimental re-
sults determine the appropriate size of caching in GPAC. In addition, we show that
encapsulating GPAC into pipelined query operators is an order of magnitude better
than implementing GPAC at the application level. Also, GPAC is stable under high data arrival rates: for an arrival rate of 2000 tuples per second, GPAC incurs only one second of delay in the query answer.
5 STREAM-BASED SPATIO-TEMPORAL QUERY PROCESSING:
SCALABILITY
In this chapter, we focus on the scalable execution of multiple concurrent spatio-
temporal queries in streaming environments. We propose the Scalable On-Line
Execution algorithm (SOLE, for short) for continuous and on-line evaluation of
concurrent continuous spatio-temporal queries over spatio-temporal data streams.
SOLE combines the recent advances of both spatio-temporal continuous query pro-
cessors and data stream management systems. On-line execution is achieved in SOLE
by allowing only in-memory processing of incoming spatio-temporal data streams.
Scalability in SOLE is achieved by using a shared in-memory buffer pool that is ac-
cessible by all outstanding queries. The scarce memory resource is efficiently utilized
by keeping track of only those objects that are considered significant to the out-
standing continuous queries. Furthermore, SOLE is presented as a spatio-temporal
join between two input streams: a stream of spatio-temporal objects and a stream of spatio-temporal queries.
To cope with intervals of very high arrival rates of objects and/or queries, SOLE
adopts a self-tuning approach based on load-shedding. Two load shedding techniques
are proposed, namely, query load shedding and object load shedding. Load shedding
techniques continuously negotiate with SOLE to reduce the memory workload in
order to support a larger number of queries with a certain guaranteed query accuracy.
Two alternative approaches exist for implementing spatio-temporal algorithms in
database systems: using table functions or encapsulating the algorithm into a physical
pipelined operator. In the first approach, which is employed by existing spatio-temporal algorithms, the algorithm is implemented using SQL table functions [87].
Since there is no straightforward method of pushing query predicates into table
functions [88], the performance of this table function is severely limited and the
approach does not give enough flexibility in optimizing the issued queries. The
second approach, which we adopt in SOLE, is to define a query operator that can be
part of a query execution plan. The SOLE operator can be combined with traditional
operators (e.g., join, aggregates, and distinct) to support a wide variety of spatio-
temporal queries. In addition, with the SOLE operator, the query optimizer can
support multiple candidate execution plans.
The rest of this chapter is organized as follows: Section 5.1 highlights related work to SOLE in terms of spatio-temporal query processing and data stream management
systems. The SOLE framework is presented in Section 5.2. Section 5.3 illustrates
sharing memory resources among concurrent continuous queries in the SOLE frame-
work. The shared execution among continuous queries in SOLE is presented in
Section 5.4. Section 5.5 discusses load shedding in SOLE. Experimental results that
are based on a real implementation of SOLE as an operator inside a data stream
management system are presented in Section 5.6. Finally, Section 5.7 summarizes
this chapter.
5.1 Related Work
To the best of our knowledge, SOLE provides the first attempt to furnish query processors in data stream management systems with the operators and algorithms required to support scalable execution of concurrent continuous spatio-temporal
queries over spatio-temporal data streams. Since SOLE bridges the areas of spatio-
temporal databases and data stream management systems, we discuss the related
work in each area separately.
5.1.1 Spatio-temporal Databases
Existing algorithms for continuous spatio-temporal query processing focus mainly
on materializing spatio-temporal data in disk-based indexing structures (e.g., hash
tables [17, 18], grid files [8, 25, 61, 89], the B-tree [90], the R-tree [20, 21, 31, 57, 58],
and the TPR-tree [22,24]). Scalable execution of concurrent spatio-temporal queries
is addressed recently for centralized [8,19,89] and distributed environments [60,61].
However, the underlying data structure is either a disk-based grid structure [8,61,89]
or a disk-based R-tree [19,60]. None of these techniques deal with the issue of spatio-
temporal data streams. Issues of high arrival rates, infinite nature of data, and
spatio-temporal streams are overlooked by these approaches. With the notion of
data streams, only in-memory algorithms and data structures can be realized.
The most related work to SOLE in the context of spatio-temporal databases
is the SINA framework [8]. SOLE has common functionalities with SINA where
both of them utilize a shared grid structure to produce incremental query results.
However, SOLE distinguishes itself from SINA and other scalable spatio-temporal
query processors (e.g., [61, 89]) in the following aspects: (1) SOLE is an in-memory
algorithm where all data structures are memory-based. (2) SOLE is equipped with
load shedding techniques to cope with intervals of high arrival rates of moving objects
and/or queries. (3) As a result of the streaming environment, SOLE deals with
new challenging issues, e.g., uncertainty in query areas, scarce memory resources,
and approximate query processing. (4) SOLE is encapsulated into a physical non-
blocking pipelined query operator where the result of SOLE is produced one tuple
at a time. Previous scalable spatio-temporal query processors (e.g., SINA [8], SEA-
CNN [89], Q-Index [19], and MobiEyes [61]) can be implemented only as a table
function where the result is produced periodically in batches.
5.1.2 Data Stream Management Systems
Existing prototypes for data stream management systems [45, 62, 79, 80] aim to
efficiently support continuous queries over data streams. However, the spatial and
temporal properties of data streams and/or continuous queries are overlooked by
these prototypes. With limited memory resources, existing stream query processors
adopt the concept of sliding windows to limit the number of tuples stored in-memory
to only the recent tuples [40,41,91]. Such a model is not appropriate for many spatio-temporal applications where the focus is on the current status of the database rather
than on the recent past. The only work for continuous queries over spatio-temporal
streams is the GPAC algorithm [9]. However, GPAC is concerned only with the
execution of a single outstanding continuous query. In a typical data stream environment, there is a huge number of outstanding continuous queries that GPAC cannot afford.
Scalable execution of continuous queries in traditional data streams aims to either
detect common subexpressions [48,62,92] or share resources at the operator level [38,
41, 93]. SOLE exploits both paradigms: evaluating multiple spatio-temporal queries is performed as a spatio-temporal join between an object stream and a query stream, while a shared memory resource (a buffer pool) is maintained to support all continuous queries. Load shedding in data stream management systems has been addressed recently in [94, 95]. The main idea is to add a special operator to the query plan to regulate the load by discarding unimportant incoming tuples. Load shedding techniques in SOLE are distinguished from other approaches in that, in addition to discarding some of the incoming tuples, SOLE voluntarily drops some of the tuples stored in-memory.
The most related work to SOLE in the context of data stream management
systems is the NiagaraCQ framework [62]. SOLE has common functionalities with
NiagaraCQ where both of them utilize a shared operator to join a set of objects with
a set of queries. However, SOLE distinguishes itself from NiagaraCQ and other data
stream management systems in the following: (1) As a result of the spatio-temporal
environment, SOLE has to deal with new challenging issues, e.g., moving queries,
uncertainty in query areas, positive and negative updates to the query result. (2) In
a highly overloaded system, SOLE provides approximate results by employing load
shedding techniques. (3) In addition to sharing the query operator as in NiagaraCQ, SOLE shares memory resources at the operator level.
Figure 5.1. Overview of shared execution in SOLE: (a) a separate query plan and buffer for each of the N queries; (b) a shared spatio-temporal join operator with a shared buffer pool, followed by a split operator.
5.2 The SOLE Framework
Figure 5.1a gives the pipelined execution of N queries (Q1 to QN ) of various types
where each query is considered a separate entity. With each single query Qi, an in-
memory buffer Bi is maintained to keep track of moving objects that are needed
by Qi. Such an approach is employed by continuous query algorithms for single query
execution (e.g., GPAC [9]). In a typical spatio-temporal application (e.g., location-
aware servers), there are large numbers of concurrent spatio-temporal continuous
queries. Dealing with each query as a separate entity would easily consume the
system resources and degrade the system performance.
Our proposed SOLE approach is designed to support scalable execution of con-
tinuous spatio-temporal queries of various types. Figure 5.1b gives the pipelined
execution of the same N queries as in Figure 5.1a, yet with the shared SOLE op-
erator. The problem of evaluating concurrent continuous queries is reduced to a
spatio-temporal join between two streams: a stream of moving objects and a stream
of continuous queries. The shared SOLE operator has a shared buffer pool that is
accessible by all continuous queries. The output of the SOLE operator has the form
(Qi,±Pj) which indicates an addition/removal of object Pj to/from query Qi. The
SOLE operator is followed by a split operator that distributes the output of SOLE
either to the users or to the various query operators. The split operator is similar to
the one used in NiagaraCQ [62] and is out of the scope of this chapter. Our focus is on realizing the shared memory buffer and the shared SOLE spatio-temporal join operator.
Without loss of generality, we present SOLE in the context of stationary and
moving rectangular range queries. The extension to other continuous query types is
straightforward. While a stationary query is represented only by its region, a moving
query is bound to a focal moving object. For example, if a moving object
M issues a query Q that asks about objects within a certain range of M , then M is
considered the focal object of Q. The region of the moving query is determined by
the continuous movement of its focal object.
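For rectangular range queries, re-deriving the moving query region from its focal object can be sketched as follows (a hypothetical helper; the thesis presents SOLE in terms of rectangular range queries):

```python
def moving_query_region(focal_location, half_width, half_height):
    """The region of a moving range query is re-centered on its focal
    object's latest reported location."""
    x, y = focal_location
    return (x - half_width, y - half_height, x + half_width, y + half_height)

# Focal object at (10, 20) with a 4x6 query rectangle around it.
print(moving_query_region((10.0, 20.0), 2.0, 3.0))  # (8.0, 17.0, 12.0, 23.0)
```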
5.3 Shared Memory in SOLE
SOLE maintains a simple grid structure as an in-memory shared buffer pool
among all continuous queries and objects. The shared buffer pool is logically divided into two parts: a query buffer that stores all outstanding continuous queries, and an object buffer that stores moving objects. In addition to the grid structure,
SOLE employs a hash table h to index moving objects based on their identifiers.
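A minimal sketch of such a shared structure, assuming a uniform square grid (all names are illustrative):

```python
class SharedBufferPool:
    """Sketch of SOLE's shared buffer: a grid whose cells hold both query
    identifiers and significant objects, plus a hash table that maps an
    object identifier to its entry."""
    def __init__(self, n_cells, cell_size):
        # Each cell holds the queries overlapping it and the objects inside it.
        self.cells = [{"queries": set(), "objects": {}}
                      for _ in range(n_cells * n_cells)]
        self.n_cells, self.cell_size = n_cells, cell_size
        self.object_index = {}  # hash table h: object id -> object entry

    def cell_of(self, x, y):
        """Index of the grid cell covering location (x, y)."""
        col = min(int(x / self.cell_size), self.n_cells - 1)
        row = min(int(y / self.cell_size), self.n_cells - 1)
        return row * self.n_cells + col

pool = SharedBufferPool(n_cells=10, cell_size=1.0)
print(pool.cell_of(5.0, 2.0))  # 25
```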
5.3.1 Shared Object Buffer
To optimize the scarce memory resource, SOLE employs two main techniques:
(1) Rather than redundantly storing a moving object P multiple times with each
query Qi that needs P , SOLE stores P at most once along with a reference counter
that indicates the number of continuous queries that need P . (2) Rather than storing
all moving objects, SOLE keeps track of only the significant objects. Insignificant
objects are ignored (i.e., dropped) from memory. Significant objects are defined as
follows:
Definition 5.3.1 A moving object P is considered significant if P satisfies any of
the following two conditions: (1) There is at least one outstanding continuous query
Q that shows interest in object P (i.e., P has a non-zero reference counter), (2) P
is the focal object of at least one outstanding continuous query.
The definition of significant objects relies on the concept that a certain query
shows interest in a certain object, which will be clarified in the next section.
Having the previous definition of significant objects, SOLE continuously maintains
the following assertion:
Assertion 1 Only significant objects are stored in the shared memory buffer.
To always satisfy this assertion, SOLE continuously keeps track of the following:
(1) A newly incoming data object P is stored in memory only if P is significant,
(2) At any time, if an object P that is already stored in the shared buffer becomes
insignificant, we drop P immediately from the shared buffer.
Significant moving objects are hashed to grid cells based on their spatial loca-
tions. An entry of a significant moving object P in a grid cell C has the form
(PID, Location, RefCount, FocalList). PID and Location are the object identi-
fier and location, respectively. RefCount indicates the number of queries that are
interested in P . FocalList is the list of active moving queries that have P as their
focal object.
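The object entry and the significance test of Definition 5.3.1 can be sketched as follows (illustrative names, mirroring the (PID, Location, RefCount, FocalList) entry described above):

```python
class ObjectEntry:
    """Entry of a significant moving object in a grid cell:
    (PID, Location, RefCount, FocalList)."""
    def __init__(self, pid, location):
        self.pid = pid
        self.location = location
        self.ref_count = 0    # number of queries interested in this object
        self.focal_list = []  # moving queries having this object as focal

    def is_significant(self):
        # Definition 5.3.1: some query is interested in the object,
        # or the object is the focal of some outstanding moving query.
        return self.ref_count > 0 or len(self.focal_list) > 0

p = ObjectEntry(pid=7, location=(3.0, 4.0))
print(p.is_significant())  # False
p.ref_count += 1
print(p.is_significant())  # True
```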
5.3.2 Shared Query Buffer
The concepts of uncertainty and caching have been introduced in [9] in the context of a single continuous query. SOLE generalizes these concepts to be applicable to
scalable execution of multiple concurrent continuous queries. Based on the caching
area, we define when a query Q is interested in an object P as follows:
Definition 5.3.2 A continuous query Q is interested in object P if P either lies in
Q’s spatial area or in Q’s cache area.
Unlike data objects that are stored in only one grid cell, continuous queries are
stored in all grid cells that overlap either the query spatial area or the query cache
area. A query entry in a grid cell contains only the query identifier (QID). The
spatial region for each query is stored separately in a global lookup table. The
redundancy of storing the query identifier multiple times can be reduced by using
the optimizations discussed in Section 5.3.3. The trade-offs of the cache area size
along with the size of the grid structure are evaluated experimentally in Section 5.6.
5.3.3 Optimizing the Shared Buffer Pool
The shared memory buffer may suffer from redundancy, where every query identifier QID is stored in all grid cells that overlap the query region. In this section, we discuss two optimizations, namely, the layered grid and the edgy grid, that aim to reduce the redundancy in the shared memory buffer.
Layered Grid. The layered grid optimization uses multiple layers of grids with
different resolutions. A query identifier is stored in the grid layer that results in the least redundancy. All objects are stored in the lowest grid layer (i.e., the
one with the highest resolution). Although the layered grid reduces the redundancy
in the shared buffer, it may increase the processing time because a newly incoming
object needs to be joined with one grid cell from each layer.
Edgy Grid. The edgy grid optimization uses only one grid where the query identifier
QID is stored only at the grid cells that intersect the query boundary. Thus, we
do not need to store the query identifier in grid cells that are completely inside the
query region. The edgy grid optimization has two main advantages: (1) Redundancy
is greatly reduced, especially for large query sizes or small-sized grid cells. (2) The
execution time of updating the location of an object P is also reduced. The main
reason is that P is tested against fewer queries (i.e., only those that have boundaries
Figure 5.2. Shared join operator in SOLE: the stream of moving objects (P) is joined with the query buffer, and the stream of spatio-temporal queries (Q) is joined with the object buffer; expired and insignificant objects are deleted, and the output has the form (Qi, ±Pj).
in CP , the grid cell of P ). These fewer queries are the only ones that can
produce positive or negative updates. The drawback of the edgy grid optimization is
in the case of receiving a new object P that is not stored in memory. In this case,
the old location of P is considered to be out of the grid space. Then, all the grid
cells from the old location to the new location of P have to be tested.
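Enumerating only the boundary cells of a rectangular query, as the edgy grid does, can be sketched as follows (a hypothetical helper over integer cell coordinates):

```python
def boundary_cells(min_col, min_row, max_col, max_row):
    """Edgy-grid sketch: return only the grid cells that intersect the
    query boundary, skipping cells completely inside the query region."""
    cells = set()
    for col in range(min_col, max_col + 1):
        cells.add((col, min_row))   # bottom edge
        cells.add((col, max_row))   # top edge
    for row in range(min_row, max_row + 1):
        cells.add((min_col, row))   # left edge
        cells.add((max_col, row))   # right edge
    return cells

# A query covering a 4x4 block of cells touches only 12 boundary cells.
print(len(boundary_cells(0, 0, 3, 3)))  # 12
```

The saving grows with the query size: a query covering k x k cells stores its identifier in roughly 4k boundary cells instead of k squared cells.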
5.4 Shared Execution in SOLE
Figure 5.2 gives the architecture of the shared spatio-temporal join operator. For
any incoming data object, say P , the shared spatio-temporal join operator consults
its query buffer to check if any query is affected by P (either in a positive or a
negative way). Based on the result, we decide either to store P in the object buffer
or to ignore P and delete P ’s old location (if any) from the object buffer. On the
other hand, for any incoming continuous query, say Q, we store Q or update Q’s
Procedure IncomingNewObject(Object P , GridCell CP ) Begin
1. For each query Qi ∈ CP such that P ∈ Q̄i
(a) P.RefCount++
(b) if (P ∈ Qi) then output (Qi, +P ).
2. if (P.RefCount) then store P in CP and in the hash table h.
End.
Figure 5.3. Pseudo code for receiving a new value of P.
old location (if any) in the query buffer. Then, we consult the object buffer to
check if any of the objects need to be added to or removed from Q’s answer. Based
on this operation, some in-memory stored objects may become insignificant, hence,
are deleted from the object buffer. Stationary queries are submitted directly to the
join operator, while moving queries are generated from the movement of their focal
objects.
Based on the data stored in the shared buffer, SOLE distinguishes among four
types of data inputs: (1) A new data object P that is not stored in memory, (2) Up-
date of the location of object P , (3) A new stationary query Q, (4) An update of the
region of a moving query Q. Figures 5.3, 5.4, 5.6, and 5.7 give the pseudo code of
SOLE upon receiving each input type. The details of the algorithms are described
below. SOLE makes use of the following notation: Q̄ indicates the extended query region that covers the cache area, so that Q ⊂ Q̄. CQ and CQ̄ are the sets of grid cells that are covered by Q and Q̄, respectively. CP represents the single grid cell that covers the object P.
Input Type I: A new object P . Figure 5.3 gives the pseudo code of SOLE upon
receiving a new object P in the grid cell CP (i.e., P is not stored in memory). P is
tested against all the queries that are stored in CP (Step 1 in Figure 5.3). For each query Qi ∈ CP , only three cases can take place: (1) P lies in Q̄i but not in Qi. In
Procedure UpdateObj(Object Pold, P , GridCell CPold, CP ) Begin
1. For each query Qi ∈ P.FocalList, UpdateQuery(Qi)
2. Let L be the line (Pold, P )
3. For each query Qi ∈ (CPold ∪ CP )
(a) if Qi intersects L, then
• if P ∈ Qi then output (Qi, +P ); if Pold ∉ Q̄i then P.RefCount++
• else output (Qi, −P ); if P ∉ Q̄i then P.RefCount−−
(b) else if Q̄i intersects L
• if P ∈ Q̄i then P.RefCount++, else P.RefCount−−
4. if (!P.RefCount) then delete Pold, ignore P , and return.
5. if (CPold ≠ CP ) then move Pold from CPold to CP .
6. Update the location of Pold to that of P in CP .
End.
Figure 5.4. Pseudo code for updating P's location.
this case, we need only to increase the reference counter of P to indicate that there
is one more query interested in P (Step 1a in Figure 5.3). Notice that no output is
produced in this case since P does not satisfy Qi. (2) P satisfies Qi. In this case,
in addition to increasing the reference counter, we output a positive update that
indicates the addition of P to the answer set of Qi (Step 1b in Figure 5.3). In the
above two cases, P is stored in the shared buffer as it is considered significant. (3) P
neither satisfies Qi nor lies in Q̂i. In this case, P is simply ignored as it is insignificant.
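The three cases above can be sketched in Python as follows. This is an illustrative sketch only: the rectangle representation, the function names, and the query-tuple layout `(id, region, extended)` are our own assumptions, where `region` stands for Q and `extended` for the cache-extended region Q̂.

```python
def inside(rect, p):
    """True if point p = (x, y) lies in rectangle rect = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2

def new_object(p, queries_in_cell):
    """Process a new object p against the queries registered in its grid cell.

    Each query is (id, region, extended), with region = Q and extended = Q-hat.
    Returns (ref_count, updates): the number of queries for which p is
    significant, and the list of positive updates to emit.
    """
    ref_count, updates = 0, []
    for q_id, region, extended in queries_in_cell:
        if inside(region, p):        # case 2: p satisfies Q
            ref_count += 1
            updates.append((q_id, '+P'))
        elif inside(extended, p):    # case 1: p lies in the cache area only
            ref_count += 1
        # case 3: p neither satisfies Q nor lies in Q-hat; p is ignored
    return ref_count, updates
```

In this sketch, p would be stored in the shared buffer only if the returned reference counter is positive.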
Input Type II: An update of P . Figure 5.4 gives the pseudo code of SOLE upon
receiving an update of object P ’s location. The old location of P is retrieved from
the hash table h. First, we evaluate all moving queries (if any) that have P as their
[Figure: (a) All cases of updating P's location: nine lines L1 through L9 connect the old location Pold (white circle) to the new location P (black circle) across the in, cache, and out regions of Q. (b) The action taken for each case: +P, −P, RefCount++, or RefCount−−.]
Figure 5.5. All cases of updating P ’s location.
focal object (Step 1 in Figure 5.4). Then, we check all the queries that belong to
either CP or CPold (Step 3 in Figure 5.4) against the line L that connects P and Pold.
Figure 5.5a gives nine different cases for the intersection of L with Q, where Pold and
P are plotted as white and black circles, respectively. Both Pold and P can be in one
of three states, in, cache, or out, indicating that the point satisfies Q, lies in the cache
area of Q, or does not satisfy Q, respectively. The action taken for each case is given
in Figure 5.5b. Basically, if there is no change of state from Pold to P (e.g., L1, L5,
and L9), no action is taken. If Pold was in Q but P is not (e.g., L2 and L3), we
output the negative update (Q,−P). The reference counter is decreased only
when Pold is of interest to Q while P is not (e.g., L3 and L6). Notice that in the case
of L2, we do not need to decrease the reference counter: although P does not
satisfy Q, P is still of interest to Q as P lies in Q̂. Also, in the case of L6, we decrease
the reference counter but do not output a negative update; since neither P nor Pold
is in the answer set of Q, there is no need to update the answer. Symmetrically, we
output a positive update in the cases of L4 and L7 and increment the reference
counter in the cases of L7 and L8. After testing all cases, we check whether object P
has become insignificant. If so, we immediately drop P from memory (Step 4 in
Figure 5.4). If P is still
Procedure StationaryQuery(Query Q) Begin
• For each grid cell cj ∈ CQ̂
  1. Register Q in cj
  2. For each object Pi ∈ cj such that Pi ∈ Q̂
     – Pi.RefCount++; if Pi ∈ Q then output (Q, +Pi)
End.
Figure 5.6. Pseudo code for receiving a new query Q.
significant, we update P ’s location and cell (if needed) in the grid structure (Steps 5
and 6 in Figure 5.4).
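The state-transition table of Figure 5.5 can be sketched for a single query as follows. This is an illustrative sketch under our own naming: `region` stands for Q, `extended` for Q̂, and the string states `'in'`, `'cache'`, `'out'` mirror the three states in the figure.

```python
def classify(region, extended, p):
    """Classify point p against one query: 'in' (satisfies Q), 'cache'
    (inside the extended region Q-hat only), or 'out'."""
    x1, y1, x2, y2 = region
    if x1 <= p[0] <= x2 and y1 <= p[1] <= y2:
        return 'in'
    x1, y1, x2, y2 = extended
    if x1 <= p[0] <= x2 and y1 <= p[1] <= y2:
        return 'cache'
    return 'out'

def update_object(region, extended, p_old, p_new):
    """Return (output, ref_delta) for one query Q when P moves from p_old
    to p_new: output is '+P', '-P', or None, and ref_delta is the change
    to P's reference counter."""
    s_old = classify(region, extended, p_old)
    s_new = classify(region, extended, p_new)
    output = None
    if s_old != 'in' and s_new == 'in':
        output = '+P'            # P enters Q's answer (cases L4, L7)
    elif s_old == 'in' and s_new != 'in':
        output = '-P'            # P leaves Q's answer (cases L2, L3)
    ref_delta = 0
    if s_old == 'out' and s_new != 'out':
        ref_delta = 1            # P becomes significant (cases L7, L8)
    elif s_old != 'out' and s_new == 'out':
        ref_delta = -1           # P becomes insignificant (cases L3, L6)
    return output, ref_delta
```

No-change cases (L1, L5, L9) fall through both conditionals and return (None, 0), matching the figure.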
Input Type III: A new query Q. Figure 5.6 gives the pseudo code of SOLE upon
receiving a continuous stationary query Q. Basically, we register Q in all the grid
cells that are covered by Q̂. In addition, we test Q against all data objects that are
stored in these cells. We increase the reference counter of only those objects that lie
in Q̂. In addition, each object that satisfies Q produces a positive update.
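A sketch of this registration step is given below. The grid representation is our own assumption: `grid_cells` is presumed to hold only the cells covered by Q̂, each mapping to its registered-query list and stored objects.

```python
def register_query(q_id, region, extended, grid_cells):
    """Register a stationary query over the cells covered by its extended
    region. grid_cells maps cell id -> (registered query ids, objects);
    objects are dicts {'p': (x, y), 'ref': n}. Returns the positive
    updates and bumps each relevant object's reference counter in place."""
    def inside(rect, p):
        x1, y1, x2, y2 = rect
        return x1 <= p[0] <= x2 and y1 <= p[1] <= y2
    updates = []
    for queries, objects in grid_cells.values():
        queries.append(q_id)                    # register Q in this cell
        for obj in objects:
            if inside(extended, obj['p']):      # object is significant for Q
                obj['ref'] += 1
                if inside(region, obj['p']):    # object is in Q's answer
                    updates.append((q_id, '+P', obj['p']))
    return updates
```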
Input Type IV: An update of Q’s region. Figure 5.7 gives the pseudo code
of SOLE upon receiving an update of a moving query region. All stored objects
in all cells that are covered by the old and new regions of Q are tested against Q.
Figure 5.8a divides the space covered by the old and new regions of Q into seven
regions (R1-R7). The action taken for any point that lies in any of these regions is
given in Figure 5.8b. Similar to Figure 5.5b, a region Ri could have any of the three
states in, cache, or out based on whether Ri is inside Q, is in the cache area of Q, or
is outside Q. Basically, no action is taken for objects in any region Ri that maintains
its state for both Q and Qold (e.g., R4). If a region Ri is inside Qold but is not in Q
(e.g., R2 and R3), we output a negative update for each object in Ri. We decrement
the reference counter of these objects only if they lie in the region that is out of
the new cache area (e.g., R2) (Step 1 in Figure 5.7). Also, the reference counter is
Procedure UpdateQuery(Query Qold, Q) Begin
• For each object Pi ∈ (CQ̂old ∪ CQ̂)
  1. if Pi ∈ Qold then
     – if Pi ∉ Q then (Output (Q, −Pi); if Pi ∉ Q̂ then (Pi.RefCount−−;
       if (!Pi.RefCount) then delete(Pi)))
  2. else if Pi ∈ Q then (Output (Q, +Pi); if Pi ∉ Q̂old then Pi.RefCount++)
  3. else if Pi ∈ Q̂old AND Pi ∉ Q̂ then (Pi.RefCount−−; if (!Pi.RefCount)
     then delete(Pi))
  4. else if Pi ∈ Q̂ AND Pi ∉ Q̂old then Pi.RefCount++
• Register Q in CQ̂ − CQ̂old; unregister Q from CQ̂old − CQ̂
End.
Figure 5.7. Pseudo code for updating a query.
[Figure: (a) All cases of updating a query region: the old and new regions Qold and Q, together with their cache areas, partition the space into regions R1 through R7. (b) The action taken for each region: +P, −P, RefCount++, or RefCount−−.]
Figure 5.8. All cases of updating Q’s region.
decremented for all objects in the region that are in the old cache area but are out of
the new cache area (e.g., R1) (Step 3 in Figure 5.7). Similarly, the reference counter
is increased for regions R6 and R7, while a positive update is sent for the objects in
regions R5 and R6. Notice that whenever we decrement the reference counter for
[Diagram: the shared join operator, a statistics module, and the load shedding module. Numbered interactions: (1) trigger (memory is almost full), (2) update criteria, (3) memory load, (4) update criteria, (5) stop (memory is OK); expired objects and object/query updates flow through the shared join operator.]
Figure 5.9. Architecture of self tuning in SOLE.
any moving object P , we check whether P becomes insignificant. If this is the case,
we immediately drop P from memory (Steps 1 and 3 in Figure 5.7). Finally, Q
is registered in all the new cells that are covered by the new region and not the old
region. Similarly, Q is unregistered from all cells that are covered by the old region
and not the new region.
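The per-object logic of this input type can be sketched as follows; it is the same three-state transition as in Figure 5.5, applied to every stored object when the query (rather than the object) moves. The rectangle-pair representation and all names are our own assumptions.

```python
def update_query_region(old_q, new_q, objects):
    """old_q and new_q are (region, extended) rectangle pairs for Qold and
    the new Q; objects is a list of dicts {'p': (x, y), 'ref': n}.
    Returns (updates, dropped): the +P/-P output stream and the objects
    whose reference counter reached zero (i.e., became insignificant)."""
    def state(q, p):
        for rect, name in zip(q, ('in', 'cache')):
            x1, y1, x2, y2 = rect
            if x1 <= p[0] <= x2 and y1 <= p[1] <= y2:
                return name
        return 'out'
    updates, dropped = [], []
    for obj in objects:
        s_old, s_new = state(old_q, obj['p']), state(new_q, obj['p'])
        if s_old != 'in' and s_new == 'in':
            updates.append(('+P', obj['p']))       # regions R5, R6
        elif s_old == 'in' and s_new != 'in':
            updates.append(('-P', obj['p']))       # regions R2, R3
        if s_old == 'out' and s_new != 'out':
            obj['ref'] += 1                        # regions R6, R7
        elif s_old != 'out' and s_new == 'out':
            obj['ref'] -= 1                        # regions R1, R2
            if obj['ref'] == 0:
                dropped.append(obj['p'])           # shed insignificant object
    return updates, dropped
```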
5.5 Load Shedding in SOLE
Even with the scalability features of SOLE, the memory resource may be exhausted
during intervals with unexpectedly massive numbers of queries and moving objects
(e.g., during rush hours). To cope with such intervals, SOLE is equipped with a self-
tuning approach that tunes the memory load to support a large number of concurrent
queries, albeit with an approximate answer. The main idea is to tune the definition
of significant objects based on the current workload. By adapting the definition of
significant objects, the memory load is shed in two ways: (1) In-memory stored
objects are revisited under the new meaning of significant objects. If an insignificant
object is found, it will be shed from memory. (2) Some of the newly input data will
be shed at the input level.
Figure 5.9 gives the architecture of self-tuning in SOLE. Once the shared join
operator incurs high resource consumption, e.g., the memory becomes almost full,
the join operator triggers the execution of the load shedding procedure. The load
shedding procedure may consult statistics that are collected during the course
of execution to decide on a new meaning of significant objects. While the shared join
operator is running with the new definition of significant objects, it may send updates
of the current memory load to the load shedding procedure. The load shedding
procedure replies by continuously adapting the notion of significant objects
based on the continuously changing memory load. Finally, once the memory load
returns to a stable state, the shared join operator restores the original meaning of
significant objects and stops the execution of the load shedding procedure. Solid
lines in Figure 5.9 indicate the mandatory steps that should be taken by any load
shedding technique. Dashed lines indicate a set of operations that may or may not
be employed based on the underlying load shedding technique. In the rest of this
section, we propose two load shedding techniques, namely query load shedding and
object load shedding.
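The trigger/stop cycle of Figure 5.9 can be sketched as a simple control loop. The thresholds below are illustrative placeholders, not values from the thesis.

```python
def tune(memory_load_samples, high=0.95, ok=0.80):
    """Yield 'normal' or 'shedding' for each observed memory-load sample
    in [0, 1]: shedding starts when the load crosses `high` (trigger:
    memory almost full) and stops once it falls back to `ok`."""
    shedding = False
    for load in memory_load_samples:
        if not shedding and load >= high:
            shedding = True      # step (1): trigger load shedding
        elif shedding and load <= ok:
            shedding = False     # step (5): stop, restore the criterion
        yield 'shedding' if shedding else 'normal'
```

While in the shedding state, the load shedding procedure would keep adapting the significance criterion (steps 2 through 4) based on the incoming load samples.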
5.5.1 Query Load Shedding
The main idea of query load shedding is to negotiate the query region with the
user. Whenever a query, say Q, is submitted to SOLE, Q specifies the minimum
accuracy that is acceptable to it. Initially, the submitted query Q is evaluated
with complete accuracy. However, when the system is overloaded, Q's accuracy is
degraded down to its minimum permissible accuracy. Reducing the accuracy is achieved
by shrinking Q's cache area from all directions. Once the entire cache area is gone,
if the system is still overloaded and we have not yet reached the minimum
permissible accuracy, we start to reduce Q's area itself.
Thus, the notion of significant objects is adapted to be those tuples that lie in the
reduced query area of at least one outstanding continuous query. By reducing the
query sizes of all outstanding queries, objects that are outside of the reduced areas
and are not of interest to any other query are immediately dropped from memory
and the corresponding negative updates are sent. During the course of execution, we
gradually increase the query sizes to cope with the memory load. Finally, when the
system reaches a stable state, we restore the original query sizes.
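One shedding step could be sketched as follows: shrink the cache area first, and only once it is gone shrink the query area itself. The shrink factor is an illustrative parameter, and the sketch omits the stopping condition at the query's minimum permissible accuracy.

```python
def shrink(rect, factor):
    """Shrink a rectangle about its center by `factor` per dimension."""
    x1, y1, x2, y2 = rect
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) / 2 * factor, (y2 - y1) / 2 * factor
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def shed_query(region, extended, step=0.9):
    """One query-load-shedding step on (Q, Q-hat): shrink the cache area
    toward Q first; only once the cache is gone does Q itself shrink."""
    if extended != region:
        new_ext = shrink(extended, step)
        # never shrink the cache area below the query region itself
        if new_ext[2] - new_ext[0] <= region[2] - region[0]:
            new_ext = region
        return region, new_ext
    new_region = shrink(region, step)
    return new_region, new_region
```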
Query load shedding has two main advantages: (1) It is intuitive and simple to
implement, as there is no need to maintain any kind of statistical information,
and (2) Insignificant objects are immediately dropped from memory. On the other
hand, there are two main disadvantages: (1) The query load shedding process is
expensive, as it scans all stored objects and queries. This exhaustive behavior
results in pause intervals during which the system can neither produce output nor
process data inputs. (2) Although the query accuracy is guaranteed (assuming uniform
data distribution), there is no guarantee on the amount of reduced memory. Assume
that the area removed from a query Qi lies completely inside another query Qj.
Then, even though Qi is reduced, we cannot drop tuples from the removed area, as
they are still needed by Qj. Thus, the accuracy of Qi is reduced, yet the amount of
memory is not.
5.5.2 Object Load Shedding
The main idea of object load shedding is to drop the objects that have the least effect
on the average query accuracy. Thus, the definition of significant objects is adapted
to be those objects that are of interest to at least k queries (i.e., objects with a
reference counter greater than or equal to k). Notice that the original definition of
significant objects implicitly assumes that k = 1. A key point in object load shedding
is that we do not perform an exhaustive scan to drop insignificant objects. Instead,
insignificant objects are lazily dropped whenever they are accessed later during the
course of
execution. Such lazy behavior completely avoids the pause time intervals in query
load shedding. In contrast to query load shedding, in object load shedding, we
guarantee the reduced memory load.
During the course of execution, we monitor the memory load and decrease/increase
k accordingly. Once the system stabilizes and returns to its original state, we set
k = 1 to restore the original execution of SOLE. Determining the threshold value k
is achieved by maintaining a statistical table S that keeps track of the number of
objects that satisfy a certain number of queries. Assuming that we will never drop
an object that has a reference counter greater than N, S can be represented as an
array of N numbers where the jth entry in S corresponds to the number of moving
objects that are of interest to j queries. Whenever the system is overloaded, we go
through S to get the minimum k that achieves the required reduced load.
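The scan over S can be sketched as follows; the function name and the target-load formulation are our own assumptions.

```python
def choose_k(S, target_load):
    """S[j-1] counts objects with reference counter exactly j (j = 1..N).
    Return the minimum k such that keeping only objects with counter >= k
    leaves at most target_load * (total objects) in memory."""
    total = sum(S)
    budget = target_load * total
    kept, k = total, 1          # k = 1 keeps every significant object
    while kept > budget and k <= len(S):
        kept -= S[k - 1]        # shed objects of interest to exactly k queries
        k += 1
    return k
```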
5.5.3 Load Shedding with Locking
Degenerate cases may severely affect the behavior of load shedding. Consider the
case of a query Q that has only one object P as its answer, while P is not of interest
to any other query. By applying object load shedding, P will be dropped, since it
is of interest to only one query Q. Thus, the accuracy of Q drops to zero. To
alleviate such a problem, we use a locking technique. Basically, each query Q has a
threshold n: if Q has fewer than n objects in its answer set, all these objects are
locked. Locked objects do not participate in the statistical table S. Once an object
is locked, the corresponding entry in S is updated. Whenever we lazily drop objects
from memory, we make sure that we do not drop any locked object. The concept of
locking can also be generalized to accommodate locking of important objects and/or
queries.
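The locking rule can be sketched as follows; the names and the answer-set representation are illustrative assumptions.

```python
def lockable(query_answers, n=3):
    """Return the set of locked object ids: every answer object of any
    query whose answer set has fewer than n objects."""
    locked = set()
    for answers in query_answers.values():
        if len(answers) < n:
            locked.update(answers)
    return locked

def lazy_drop(obj_id, ref_count, k, locked):
    """An object is shed only if it is insignificant under the current
    threshold k and is not locked by some small-answer query."""
    return ref_count < k and obj_id not in locked
```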
5.6 Performance Evaluation
In this section, we study the performance of various aspects of SOLE. All the
experiments in this section use the Network-based Generator of Moving Objects [74]
to generate a set of moving objects and moving queries. The input to the generator is
the road map of Oldenburg (a city in Germany); the output of the generator is a set
of moving objects that move on the road network of the given city. Unless mentioned
otherwise, we generate 100K moving objects and 50K queries. The maximum speed
of any object covers 10% of the space along any dimension. All experiments are based
on a real implementation of the SOLE operator inside the engine of a prototype data
stream management system [3], where the SOLE operator is always at the bottom of
the query pipeline. The underlying machine is an Intel Pentium IV 2.4GHz CPU with
256MB RAM running Windows XP.
5.6.1 Properties of SOLE
Figure 5.10a gives the performance of the first 25 seconds of executing a moving
query of size 0.5% of the space with a cache area that is 25% of the conservative
cache area. Our performance measure is the query accuracy, represented as the
percentage of the number of produced tuples to the actual number that would have
been produced if all moving objects were materialized into secondary storage. With
only a 25% cache, the query accuracy is almost stable, with minor fluctuations that
degrade the accuracy to no less than 95%. No caching would result in highly fluctuating
performance, while conservative caching would result in a single line that
always has 100% accuracy.
Figure 5.10b gives the memory overhead when using a 25%, 50%, or 100% (conservative)
cache size. The overhead is computed as a percentage of the original
query memory requirements. Thus a 0% cache does not incur any overhead. On
average, a 25% cache results in only 10% overhead over the original query, while the
50% and 100% caches result in 25% and 50% overhead, respectively. As a compro-
[Figure: (a) 25% Cache: percentage of result vs. time (0 to 25 seconds); (b) Cache Overhead: overhead percentage vs. time for 100%, 50%, and 25% cache sizes.]
Figure 5.10. Cache area in SOLE.
[Figure: (a) Redundancy vs. grid size; (b) Response time vs. grid size.]
Figure 5.11. Grid Size.
mise between the cache overhead and the query accuracy, we use a 25% cache in
SOLE in all the following experiments.
Figure 5.11 studies the trade-offs for the number of grid cells in the shared mem-
ory buffer of SOLE for 50K moving queries of various sizes. Increasing the number
of cells in each dimension increases the redundancy that results from replicating the
query entry in all overlapping grid cells. On the other hand, increasing the grid size
[Figure: (a) the ratio of the number of supported queries with sharing over non-sharing vs. query size. (b) Table of values:
Query Size (%) | Single | Sharing | Ratio
0.01 | 8184 | 63446 | 7.75
0.09 | 934 | 8250 | 8.83
0.25 | 349 | 4016 | 11.51
0.49 | 186 | 2577 | 13.85
0.81 | 118 | 2082 | 17.64
1.00 | 103 | 2007 | 19.49]
Figure 5.12. Maximum Number of Supported Queries.
results in a better response time. The response time is defined as the time interval
from the arrival of an object, say P, to either the time that P appears at the output
of SOLE or the time that SOLE decides to discard P. When the grid size exceeds
100 cells per dimension, the response time degrades: a grid of 100 cells in each
dimension results in a total of 10K small grid cells, and thus, with each movement
of a moving query Q, we need to register/unregister Q in a large number of grid
cells. As a compromise between redundancy and response time, SOLE uses a grid
of 30 cells in each dimension.
5.6.2 Scalability of SOLE
Figure 5.12 compares the performance of the SOLE shared operator as opposed
to dealing with each query as a separate entity (i.e., with no sharing). Figure 5.12a
gives the ratio of the number of supported queries via sharing over the non-sharing
case for various query sizes. Some of the actual values are depicted in the table
in Figure 5.12b. For small query sizes (e.g., 0.01%) with sharing, SOLE supports
more than 60K queries, which is almost 8 times better than the case of non-sharing.
[Figure: number of points (in millions) vs. query size, for sharing and no sharing: (a) in the query area; (b) in the cache area.]
Figure 5.13. Data size in the query and cache areas.
The benefit of sharing increases with the query size: it becomes 20 times better
than non-sharing for a query size of 1% of the space. The main reason for this
increase is that sharing benefits from the overlapping areas of continuous queries.
Objects that lie in any overlapped area are stored only once in the sharing case
rather than multiple times in the non-sharing case. With small query sizes, the
overlap of query areas is much less than with large query sizes.
Figures 5.13a and 5.13b give the memory requirements for storing objects in the
query region and the query cache area, respectively, for 1K queries over 100K moving
objects. In Figure 5.13a, for large query sizes (e.g., 1% of the space), a non-shared
execution would need memory for 1M objects, while SOLE needs, at most,
memory for 100K objects. The main reason is that with non-sharing, objects
that are needed by multiple queries are redundantly stored in each query buffer,
while with sharing, each object is stored at most once in the shared memory buffer.
Thus, in terms of the query area, SOLE has a tenfold performance advantage
over the non-shared case. Figure 5.13b gives the memory requirement for storing
objects in the cache area. The behavior of the non-sharing case is expected:
[Figure: response time (msec): (a) vs. number of queries (K), for static and moving queries; (b) vs. query size and percentage of moving queries.]
Figure 5.14. Response time in SOLE.
the memory requirements increase with the increase in the query size. Surprisingly,
the caching overhead in the case of sharing decreases as the query size increases.
The main reason is that with larger sizes, the cache area of a certain
query is likely to be part of the actual area of another query. Thus, objects that are
inside this cache area are not considered an overhead, since they are part of the
actual answer of some other query.
5.6.3 Response Time
Figure 5.14a gives the effect of the number of concurrent continuous queries
on the performance of SOLE. The number of queries varies from 5K to 50K. Our
performance measure is the average response time, defined as the time interval from
the arrival of object P to either the time that P appears at the output of SOLE or
the time that SOLE decides to discard P. We run the experiment twice: once with
only stationary queries, and a second time with only moving queries. The increase
in response time with the number of queries is acceptable: as we increase the number
of queries 10 times (from 5K to 50K), we get only
[Figure: memory load vs. accuracy for query load shedding and object load shedding: (a) 1K queries; (b) 25K queries.]
Figure 5.15. Load Vs. Accuracy.
twice the increase in response time in the case of stationary queries (from 11 to 22
msec). The performance of moving queries shows only a slight increase over stationary
queries (2 msec in the case of 50K queries).
Figure 5.14b gives the effect of varying both the query size and the percentage
of moving queries on the response time of the SOLE operator. The number of
outstanding queries is fixed at 30K. The response time increases with the increase
in both the query size and the percentage of moving queries. However, the SOLE
operator is less sensitive to the percentage of moving queries than to the query size.
Increasing the percentage of moving queries results in only a slight increase in response
time. This indicates that SOLE can deal with moving queries as efficiently
as with stationary queries. On the other hand, increasing the query size from 0.01%
to 1% only doubles the response time (from around 12 msec to around 24 msec) for
the various moving percentages.
5.6.4 Accuracy of Load Shedding
Figures 5.15a and 5.15b compare the performance of the query and object load shedding
techniques for processing 1K and 25K queries of various sizes, respectively.
Our performance measure is the reduced load needed to achieve a certain query accuracy.
When the system is overloaded, we vary the required accuracy from 0% to 100%.
In the degenerate cases, setting the accuracy to 100% requires keeping the whole memory
load (100% load), while setting the accuracy to 0% corresponds to deleting the entire
memory load. The bold diagonal line in Figure 5.15 represents the required accuracy. It is
“expected” that if we ask for m% accuracy, we will need to keep only m% of the
memory load. Thus, reducing the memory load below the diagonal line
is considered a gain over the “expected” behavior. Object load shedding always
maintains better performance than query load shedding. For example, in
the case of 1K queries, to achieve an average accuracy of 90%, we need to keep
only 85% of the memory load in the case of object load shedding, while 97% of
the memory is needed in the case of query load shedding. The performance of both
load shedding techniques degrades as the number of queries increases to 25K.
However, object load shedding still keeps a good performance, almost
equal to the “expected” performance. The performance of query load shedding is
dramatically degraded: we need more than 90% of the memory load to achieve
only 20% accuracy.
Figures 5.16a and 5.16b compare the performance of query and object load shedding
for achieving an accuracy of 70% and 90%, respectively, while varying the number of
queries from 2K to 32K. Object load shedding greatly outperforms query load shedding
and results in a better performance than the “expected” reduced load for all
query sizes. The main reason behind the bad performance of query load shedding is
that with a large number of queries, query areas overlap heavily. Thus, the reduced
area of a certain query is highly likely to overlap other queries. So, even
[Figure: memory load vs. number of queries (K) for query load shedding and object load shedding: (a) 70% accuracy; (b) 90% accuracy.]
Figure 5.16. Reduced load for a certain accuracy.
though we reduce the query area, we cannot drop any of the tuples that lie in the
reduced area. Such tuples are still of interest to other outstanding queries.
5.6.5 Scalability of Load Shedding
Figure 5.17a gives the ratio of the number of supported queries with the query and
object load shedding techniques over the sharing case with no load shedding. All
queries are supported with a minimum accuracy of 90%. Depending on the query
size, query load shedding can support up to 3 times more queries than the case with
no load shedding. This indicates a ratio of up to 60 times better than the non-sharing
case (refer to the table in Figure 5.12b). On the other hand, object load shedding
scales much better than query load shedding. With object load shedding, SOLE
can support up to 13 times more queries than the case of no load shedding, which
is up to 260 times more than the case of no sharing.
Figure 5.17b gives the performance of the query and object load shedding techniques
in terms of maintaining the average query accuracy as continuous queries arrive.
The horizontal axis advances with time to represent the arrival
[Figure: (a) the ratio of supported queries over the no-load-shedding case vs. query size; (b) average query accuracy vs. the number of arrived queries (one at a time), for query load shedding and object load shedding.]
Figure 5.17. Scalability with Load Shedding.
of each continuous query. With tight memory resources, the memory is consumed
completely with the arrival of about 1200 queries. At this point, the process of load
shedding is triggered. The required memory consumption level is set to 90%. Since
query load shedding immediately drops tuples from memory, the query accuracy drops sharply to 90%. In contrast, with object load shedding, the accuracy degrades slowly. As more queries arrive, query load shedding slowly recovers its accuracy. However, memory is consumed faster than query load shedding can recover, so more tuples soon have to be dropped from memory, reducing the accuracy again. This behavior continues under two opposing forces: (1) query load shedding tends to enhance the accuracy by regaining the original query sizes, while (2) the arrival of more queries consumes memory resources. Since the second force acts faster than the first, the performance exhibits a zigzag pattern with a net decrease in query accuracy. Object load shedding, on the other hand, does not suffer from this drawback; by choosing victim objects carefully, it always maintains sufficient accuracy with minimum memory load.
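The shedding trigger just described can be sketched as a simple memory buffer. The `MemoryBuffer` class, its capacity of 1200 tuples, and the random victim selection below are illustrative assumptions of ours, not SOLE's actual implementation (object load shedding, in particular, selects its victims far more carefully):

```python
import random

class MemoryBuffer:
    """Illustrative buffer that triggers load shedding when full.

    When the buffer exceeds `capacity`, enough tuples are dropped to
    bring consumption down to `target_ratio` (90% in the experiment).
    """

    def __init__(self, capacity, target_ratio=0.9):
        self.capacity = capacity
        self.target_ratio = target_ratio
        self.tuples = []

    def insert(self, t):
        self.tuples.append(t)
        if len(self.tuples) > self.capacity:
            self.shed()

    def shed(self):
        # Drop victim tuples until memory is back at the target level.
        target = int(self.capacity * self.target_ratio)
        random.shuffle(self.tuples)  # arbitrary victims, for illustration
        del self.tuples[target:]

buf = MemoryBuffer(capacity=1200)
for i in range(1500):
    buf.insert(i)
print(len(buf.tuples))  # 1137: always trimmed back below capacity
```

The zigzag in Figure 5.17b corresponds to this cycle of filling up to capacity and shedding back down to the target level.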
Figure 5.18. Performance of Object Load Shedding.
5.6.6 Object Load Shedding
Figure 5.18 focuses on the performance of object load shedding. The required
reduced load varies from 10% to 90% while the number of queries varies from 1K
to 32K. This experiment shows that object load shedding is scalable and is stable
when increasing the number of queries. For example, when reducing the memory load to 90%, we consistently obtain an accuracy of around 94% regardless of the number of queries. Similar consistent behavior appears across the various reduced loads.
5.7 Summary
In this chapter, we introduced the Scalable On-Line Execution algorithm (SOLE,
for short) for continuous and on-line evaluation of concurrent continuous spatio-
temporal queries over spatio-temporal data streams. SOLE is an in-memory al-
gorithm that utilizes the scarce memory resources efficiently by keeping track of
only those objects that are considered significant. SOLE is a unified framework for
stationary and moving queries that is encapsulated into a physical pipelined query
operator. To cope with intervals of high arrival rates of objects and/or queries, SOLE
utilizes load shedding techniques that aim to support more continuous queries, yet
with an approximate answer. Two load shedding techniques were proposed, namely,
query load shedding and object load shedding. Experimental results based on a real
implementation of SOLE inside a prototype data stream management system show
that SOLE can support up to 20 times more continuous queries than the case of
dealing with each query separately. With object load shedding, SOLE can support
up to 260 times more queries than the case of no sharing.
6 CONCLUSIONS AND FUTURE WORK
The main goal of this dissertation is to extend database management systems and
data stream management systems to efficiently support continuous query processing
in location-aware environments. Location-aware environments are characterized by
the large number of moving objects and the large number of outstanding continuous
spatio-temporal queries. Moving objects continuously send updates of their location
information to a location-aware database server. The answers of continuous queries change continuously as the locations of moving objects and/or query regions change.
This dissertation fills the gap between spatio-temporal query algorithms and the
practical environment where issues of scalability, practicality, and realization inside
database engines are of great concern.
6.1 Summary of Contributions
This dissertation introduces three main contributions in supporting continuous
query processing in location-aware environments. First, we introduced SINA, a disk-based framework for scalable execution of multiple concurrent continuous spatio-temporal queries. SINA is designed with two goals in mind: (1) Scalability in terms
of the number of concurrent continuous spatio-temporal queries, and (2) Incremen-
tal evaluation of continuous spatio-temporal queries. SINA achieves scalability by
employing a shared execution paradigm where the execution of continuous spatio-
temporal queries is abstracted as a spatial join between a set of moving objects
and a set of moving queries. Incremental evaluation is achieved by computing only
the updates of the previously reported answer. We introduced two types of updates, namely, positive and negative updates. A positive or negative update indicates that a
certain object should be added to or removed from the previously reported answer,
respectively.
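The positive/negative update scheme can be illustrated with plain sets; the function below is our own sketch of the idea, not SINA's join machinery:

```python
def answer_updates(previous, current):
    """Compute incremental updates between two reported answers.

    Returns (positive, negative): the objects to add to, and to remove
    from, the previously reported answer, so that only the delta is
    shipped to the user instead of the full answer.
    """
    positive = current - previous   # newly qualifying objects
    negative = previous - current   # objects that no longer qualify
    return positive, negative

# A range query whose answer was {o1, o2, o3} now covers {o2, o3, o4}:
pos, neg = answer_updates({"o1", "o2", "o3"}, {"o2", "o3", "o4"})
print(pos, neg)  # {'o4'} {'o1'}
```

Sending only `{'o4'}` as a positive update and `{'o1'}` as a negative update is what makes the evaluation incremental.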
The second contribution of this dissertation is that we furnished data stream management systems with a set of primitive spatio-temporal pipelined query operators
(e.g., range query and k-nearest-neighbor operators). Unlike previous approaches
that focus only on high-level implementations of spatio-temporal algorithms, our approach of providing spatio-temporal query operators offers the following advantages: (1) Spatio-temporal operators can be combined with other traditional
operators (e.g., distinct, aggregate, and join) to support a wide variety of continuous
spatio-temporal queries. (2) Pushing spatio-temporal operators deep in the query
execution plan reduces the number of tuples in the query pipeline and hence provides more efficient query processing. (3) Flexibility in the query optimizer, where multiple candidate execution plans can be produced by shuffling the spatio-temporal operators with other traditional operators.
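As an illustration of how such an operator can sit in a pipeline, the sketch below (our own simplification; the tuple layout and operator names are assumptions) shows a range operator that emits positive/negative tuples, consumed incrementally by a downstream count aggregate:

```python
def inside(region, stream):
    """Sketch of a pipelined spatio-temporal range operator.

    Emits ('+', oid) when an object enters the rectangular `region`
    and ('-', oid) when it leaves, so downstream operators see only
    the changes to the answer, never the full answer.
    """
    x1, y1, x2, y2 = region
    in_answer = set()
    for oid, x, y in stream:
        now_in = x1 <= x <= x2 and y1 <= y <= y2
        if now_in and oid not in in_answer:
            in_answer.add(oid)
            yield ('+', oid)
        elif not now_in and oid in in_answer:
            in_answer.remove(oid)
            yield ('-', oid)

def count(updates):
    """Downstream aggregate that consumes +/- tuples incrementally."""
    n = 0
    for sign, _ in updates:
        n += 1 if sign == '+' else -1
        yield n

locs = [("o1", 1, 1), ("o2", 5, 5), ("o1", 9, 9)]  # o1 later leaves
print(list(count(inside((0, 0, 4, 4), locs))))  # [1, 0]
```

Because `inside` filters early, the aggregate only ever processes delta tuples, which is the point of pushing spatio-temporal operators deep in the plan.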
Our third contribution is that we introduced SOLE, a stream-based scalable pipelined query operator for evaluating large numbers of concurrent continuous spatio-temporal queries over spatio-temporal data streams. SOLE performs an incremental spatio-temporal join between two input streams, a stream of spatio-temporal objects and a stream of spatio-temporal queries. In addition, all the outstanding continuous queries in SOLE share
the same buffer pool. To cope with intervals of high arrival rates of objects and/or
queries, SOLE utilizes a self-tuning approach based on load-shedding where some of
the stored objects are dropped from memory.
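The two-stream incremental join can be sketched as follows; the class name, the dictionary-based pool, and the brute-force probing are illustrative assumptions of ours (SOLE uses a shared in-memory structure, not this naive scan):

```python
class SoleJoinSketch:
    """Sketch of an incremental join between an object stream and a
    query stream. Objects and queries live in one shared pool; each
    arrival probes the opposite side and reports new matches.
    """

    def __init__(self):
        self.objects = {}   # oid -> (x, y), the shared buffer pool
        self.queries = {}   # qid -> (x1, y1, x2, y2)

    @staticmethod
    def _inside(pt, region):
        x, y = pt
        x1, y1, x2, y2 = region
        return x1 <= x <= x2 and y1 <= y <= y2

    def object_arrival(self, oid, x, y):
        self.objects[oid] = (x, y)
        return [(qid, oid) for qid, r in self.queries.items()
                if self._inside((x, y), r)]

    def query_arrival(self, qid, region):
        self.queries[qid] = region
        return [(qid, oid) for oid, pt in self.objects.items()
                if self._inside(pt, region)]

j = SoleJoinSketch()
j.object_arrival("o1", 2, 2)               # no queries yet, no output
print(j.query_arrival("q1", (0, 0, 4, 4)))  # [('q1', 'o1')]
print(j.object_arrival("o2", 3, 3))         # [('q1', 'o2')]
```

Load shedding in this picture amounts to evicting entries from `self.objects` when memory runs low, at the cost of an approximate answer.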
6.2 Future Extensions
This dissertation raises a number of research problems related to spatio-temporal
query processing and continuous query processing in general. In this section, we give
an overview of several directions for future research.
6.2.1 Continuous Query Optimization
Once a continuous query is submitted to the server, the server consults its query optimizer to decide on an optimal plan for the newly submitted query. The decision is based on certain cost models and environmental variables, e.g., the selectivity of each operator at the time the continuous query is received. However, since the continuous query stays active at the server for a long time, some of these environmental variables change over time. Thus, the initial choice of query plan becomes invalid and the initially optimal execution plan becomes suboptimal. As a result, the performance of the continuous query degrades over time.
To overcome this drawback, we need to furnish the spatio-temporal continuous query processor with techniques that continuously monitor the performance of the continuous query plan and adapt it toward the optimal one. Such adaptive continuous query optimization needs to maintain statistics about the behavior of the continuously received data. In addition, data mining techniques need to be explored to detect whether there are special patterns in the received data.
One approach to achieve continuous query optimization is to employ spatio-
temporal histograms. Unlike traditional histograms that capture a snapshot of the
underlying environment, spatio-temporal histograms take the time dimension into
account. Once a new continuous query is submitted to the server, a spatio-temporal histogram is constructed using the first few incoming data items. The notion of "few" depends on the query lifetime. For example, if a query is submitted to run for a whole year, then we can use the data of the first month to build our spatio-temporal histogram. Then, we employ periodicity mining techniques to discover any periodicity in these initial data items. Based on the spatio-temporal histogram
and the periodicity mining techniques, we can decide to use the query plan P1 if the
query runs in early morning, the query plan P2 if the query runs in rush hours, or
the query plan P3 if the query runs on weekends.
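The period-driven plan selection above can be sketched as a simple dispatch on the query's submission time; the plan names P1/P2/P3 follow the example in the text, while the concrete period boundaries are illustrative assumptions that a real optimizer would mine from the spatio-temporal histogram rather than hard-code:

```python
import datetime

def pick_plan(when, weekday_rush=(7, 9)):
    """Illustrative plan selection driven by mined periodicity.

    Maps a timestamp to one of the plans from the text: P1 for early
    mornings, P2 for weekday rush hours, P3 for weekends. The cutoff
    hours are placeholders for values a histogram would supply.
    """
    if when.weekday() >= 5:                         # Saturday/Sunday
        return "P3"
    if weekday_rush[0] <= when.hour < weekday_rush[1]:
        return "P2"                                 # rush hours
    if when.hour < weekday_rush[0]:
        return "P1"                                 # early morning
    return "P_default"

print(pick_plan(datetime.datetime(2005, 8, 1, 8)))  # 'P2' (Monday, 8am)
```

The interesting research question is how to learn the period boundaries and the per-period plans automatically, rather than the dispatch itself.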
6.2.2 Cost Model for Spatio-temporal Operators
This dissertation introduces new spatio-temporal operators that can be combined
with traditional query operators in a large query plan. To fully integrate the pro-
posed query operators in the query optimizer, we need to develop cost models for
the continuous spatio-temporal operators. Unlike traditional operators, which produce only positive tuples in the query pipeline, spatio-temporal operators can produce both positive and negative tuples.
The main challenge in developing cost models for spatio-temporal operators is
that we have to take into account the number of negative tuples that result from these
operators. The number of negative tuples that come out of the spatio-temporal op-
erators depends on many factors that include the query size, the speed of moving
objects, the speed of the moving query, and the movement patterns of the moving objects. Taking all these factors into account when building a cost model for spatio-temporal operators is challenging.
6.2.3 Context-aware Query Processing
What we introduced in this dissertation is basically a location-aware continuous query processor. The main idea is that we modify and/or add new functionalities in several layers of the database engine to support the notion of "location". By being location-aware, two similar queries submitted to the same database server may have different answers based on the locations from which they are submitted. Since location is one kind of context, a natural generalization of our proposed query processor is to support any general context, i.e., to build a context-aware query engine. Contexts other than location include time, identity, temperature, activity, schedule agenda, and profile. By being context-aware, two similar queries submitted to the same database server may have different answers based on the context associated with each query.
The main challenge in providing a context-aware query processor is that we want
to avoid modifying the database engine with the addition of each new context. In-
stead, we want to build an extensible query engine that is general enough to support
any kind of context. Other challenges include capturing, representing, and processing contextual data. Capturing context information generally requires additional sensors and/or programs. Transferring context information to applications, and allowing different applications to use the same context information, requires a common representation format. Finally, to make use of context information, applications must include some intelligence to process the information and deduce its meaning.
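The extensibility goal of registering new contexts without modifying the engine can be sketched with a provider interface; the class and method names below are illustrative assumptions, not an existing API:

```python
from abc import ABC, abstractmethod

class ContextProvider(ABC):
    """Sketch of an extensible context interface: new contexts are
    registered with the engine instead of being hard-wired into it."""

    @abstractmethod
    def value(self, query):
        """Return this context's value for the submitted query."""

class LocationContext(ContextProvider):
    def value(self, query):
        return query.get("location")

class TimeContext(ContextProvider):
    def value(self, query):
        return query.get("time")

class ContextAwareEngine:
    def __init__(self):
        self.providers = {}

    def register(self, name, provider):
        self.providers[name] = provider   # no engine changes needed

    def contexts_for(self, query):
        return {name: p.value(query) for name, p in self.providers.items()}

engine = ContextAwareEngine()
engine.register("location", LocationContext())
engine.register("time", TimeContext())
q = {"location": (40.4, -86.9), "time": "08:00"}
print(engine.contexts_for(q))
```

Adding a temperature or profile context would then mean registering one more `ContextProvider`, leaving the engine itself untouched, which is exactly the extensibility the text calls for.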
![Page 124: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/124.jpg)
LIST OF REFERENCES
![Page 125: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/125.jpg)
112
LIST OF REFERENCES
[1] Mohamed F. Mokbel, Walid G. Aref, Susanne E. Hambrusch, and Sunil Prab-hakar. Towards Scalable Location-aware Services: Requirements and ResearchIssues. In Proceedings of the ACM Symposium on Advances in Geographic In-formation Systems, ACM GIS, pages 110–117, New Orleans, LA, November2003.
[2] Tamer Nadeem, Sasan Dashtinezhad, Chunyuan Liao, and Liviu Iftode. Traf-ficView: A Scalable Traffic Monitoring System. In Proceedings of the Interna-tional Conference on Mobile Data Management, MDM, pages 13–26, Berkeley,CA, January 2004.
[3] Moustafa A. Hammad, Mohamed F. Mokbel, Mohamed H. Ali, Walid G. Aref,Ann C. Catlin, Ahmed K. Elmagarmid, Mohamed Eltabakh, Mohamed G.Elfeky, Thanaa M. Ghanem, Robert Gwadera, Ihab F. Ilyas, Mirette Mar-zouk, and Xiaopeng Xiong. Nile: A Query Processing Engine for Data Streams(Demo). In Proceedings of the International Conference on Data Engineering,ICDE, page 851, Boston, MA, March 2004.
[4] Timos K. Sellis. Multiple-Query Optimization. ACM Transactions on DatabaseSystems , TODS, 13(1):23–52, 1988.
[5] Prasan Roy, S. Seshadri, S. Sudarshan, and Siddhesh Bhobe. Efficient andExtensible Algorithms for Multi Query Optimization. In Proceedings of theACM International Conference on Management of Data, SIGMOD, pages 249–260, Dallas, TX, May 2000.
[6] Praveen Seshadri and Mark Paskin. PREDATOR: An OR-DBMS with En-hanced Data Types. In Proceedings of the ACM International Conference onManagement of Data, SIGMOD, pages 568–571, Tucson, AZ, May 1997.
[7] Michael J. Carey, David J. DeWitt, Michael J. Franklin, Nancy E. Hall, Mark L.McAuliffe, Jeffrey F. Naughton, Daniel T. Schuh, Marvin H. Solomon, C. K.Tan, Odysseas G. Tsatalos, Seth J. White, and Michael J. Zwilling. Shoring UpPersistent Applications. In Proceedings of the ACM International Conference onManagement of Data, SIGMOD, pages 383–394, Minneapolis, MN, May 1994.
[8] Mohamed F. Mokbel, Xiaopeng Xiong, and Walid G. Aref. SINA: ScalableIncremental Processing of Continuous Queries in Spatio-temporal Databases.In Proceedings of the ACM International Conference on Management of Data,SIGMOD, pages 443–454, Paris, France, June 2004.
[9] Mohamed F. Mokbel and Walid G. Aref. GPAC: Generic and Progressive Pro-cessing of Mobile Queries over Mobile Data. In Proceedings of the InternationalConference on Mobile Data Management, MDM, Ayia Napa, Cyprus, May 2005.
![Page 126: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/126.jpg)
113
[10] Mohamed F. Mokbel. Continuous Query Processing in Spatio-temporalDatabases. In Proceedings of the ICDE/EDBT Ph.D. Workshop, March 2004,Boston, MA, pages 119-128. Selected among best papers for a revised versionin Lecture Notes of Computer Science (LNCS), Current Trends in DatabaseTechnology, EDBT 2004 Workshops Revised Selected Papers. Vol. 3268, pages100-111.
[11] Mohamed F. Mokbel, Xiaopeng Xiong, Moustafa A. Hammad, and Walid G.Aref. Continuous Query Processing of Spatio-temporal Data Streams inPLACE. In Proceedings of the second workshop on Spatio-Temporal DatabaseManagement, STDBM, pages 57–64, Toronto, Canada, August 2004.
[12] Mohamed F. Mokbel, Xiaopeng Xiong, Moustafa A. Hammad, and Walid G.Aref. Continuous Query Processing of Spatio-temporal Data Streams inPLACE. GeoInformatica, 2005. To Appear.
[13] Mohamed F. Mokbel, Xiaopeng Xiong, Walid G. Aref, Susanne Hambrusch,Sunil Prabhakar, and Moustafa Hammad. PLACE: A Query Processor for Han-dling Real-time Spatio-temporal Data Streams (Demo). In Proceedings of theInternational Conference on Very Large Data Bases, VLDB, pages 1377–1380,Toronto, Canada, August 2004.
[14] Mario A. Nascimento and Jefferson R. O. Silva. Towards historical R-trees. InACM symposium on Applied Computing, SAC, pages 235–240, Atlanta, GA,February 1998.
[15] Yufei Tao and Dimitris Papadias. MV3R-Tree: A Spatio-Temporal AccessMethod for Timestamp and Interval Queries. In Proceedings of the Interna-tional Conference on Very Large Data Bases, VLDB, pages 431–440, Rome,Italy, September 2001.
[16] Dieter Pfoser, Christian S. Jensen, and Yannis Theodoridis. Novel Approachesin Query Processing for Moving Object Trajectories. In Proceedings of theInternational Conference on Very Large Data Bases, VLDB, pages 395–406,Cairo, Egypt, September 2000.
[17] V. Prasad Chakka, Adam Everspaugh, and Jignesh M. Patel. Indexing LargeTrajectory Data Sets with SETI. In Proc. of the Conf. on Innovative DataSystems Research, CIDR, Asilomar, CA, January 2003.
[18] Zhexuan Song and Nick Roussopoulos. Hashing Moving Objects. In MobileData Management, pages 161–172, Hong Kong, January 2001.
[19] Sunil Prabhakar, Yuni Xia, Dmitri V. Kalashnikov, Walid G. Aref, and Su-sanne E. Hambrusch. Query Indexing and Velocity Constrained Indexing: Scal-able Techniques for Continuous Queries on Moving Objects. IEEE Trans. onComputers, 51(10):1124–1140, 2002.
[20] Dongseop Kwon, Sangjun Lee, and Sukho Lee. Indexing the Current Positionsof Moving Objects Using the Lazy Update R-tree. In Mobile Data Management,MDM, pages 113–120, Singapore, January 2002.
[21] Mong-Li Lee, Wynne Hsu, Christian S. Jensen, and Keng Lik Teo. SupportingFrequent Updates in R-Trees: A Bottom-Up Approach. In Proceedings of theInternational Conference on Very Large Data Bases, VLDB, pages 608–619,Berlin, Germany, September 2003.
![Page 127: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/127.jpg)
114
[22] Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger, and Mario A.Lopez. Indexing the Positions of Continuously Moving Objects. In Proceedingsof the ACM International Conference on Management of Data, SIGMOD, pages331–342, Dallas, TX, May 2000.
[23] Simonas Saltenis and Christian S. Jensen. Indexing of Moving Objects forLocation-Based Services. In Proceedings of the International Conference onData Engineering, ICDE, pages 463–472, San Jose, CA, February 2002.
[24] Yufei Tao, Dimitris Papadias, and Jimeng Sun. The TPR*-Tree: An OptimizedSpatio-temporal Access Method for Predictive Queries. In Proceedings of theInternational Conference on Very Large Data Bases, VLDB, pages 790–801,Berlin, Germany, September 2003.
[25] Jignesh M. Patel, Yun Chen, and V. Prasad Chakka. STRIPES: An EfficientIndex for Predicted Trajectories. In Proceedings of the ACM International Con-ference on Management of Data, SIGMOD, pages 637–646, Paris, France, June2004.
[26] Yuni Xia and Sunil Prabhakar. Q+-tree: Efficient Indexing for Moving Ob-ject Database. In Proceedings of the International Conference on DatabaseSystems for Advanced Applications, DASFAA, pages 175–182, Kyoto, Japan,March 2003.
[27] Christos Faloutsos and Shari Roseman. Fractals for Secondary Key Retrieval. InProceedings of the ACM Symposium on Principles of Database Systems, PODS,pages 247–252, Philadelphia, PA, March 1989.
[28] Mohamed F. Mokbel and Walid G. Aref. Irregularity in Multi-DimensionalSpace-Filling Curves with Applications in Multimedia Databases. In Proceedingsof the International Conference on Information and Knowledge Managemen,CIKM, pages 512–519, Atlanta, GA, May 2001.
[29] Mohamed F. Mokbel, Walid G. Aref, and Ibrahim Kamel. Analysis of Multi-dimensional Space-Filling Curves. GeoInformatica, 7(3):179–209, September2003.
[30] Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching.In Proceedings of the ACM International Conference on Management of Data,SIGMOD, pages 47–57, Boston, MA, June 1984.
[31] Yufei Tao, Dimitris Papadias, and Qiongmao Shen. Continuous Nearest Neigh-bor Search. In Proceedings of the International Conference on Very Large DataBases, VLDB, pages 287–298, Hong Kong, August 2002.
[32] Zhexuan Song and Nick Roussopoulos. K-Nearest Neighbor Search for MovingQuery Point. In Proceedings of the International Symposium on Advances inSpatial and Temporal Databases, SSTD, pages 79–96, Redondo Beach, CA, July2001.
[33] Rimantas Benetis, Christian S. Jensen, Gytis Karciauskas, and Simonas Salte-nis. Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Ob-jects. In Proceedings of the International Database Engineering and ApplicationsSymposium, IDEAS, pages 44–53, Alberta, Canada, July 2002.
![Page 128: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/128.jpg)
115
[34] Man Lung Yiu, Dimitris Papadias, Nikos Mamoulis, and Yufei Tao. ReverseNearest Neighbors in Large Graphs. In Proceedings of the International Con-ference on Data Engineering, ICDE, pages 186–187, Kyoto, Japan, April 2005.
[35] Marios Hadjieleftheriou, George Kollios, Dimitrios Gunopulos, and Vassilis J.Tsotras. On-Line Discovery of Dense Areas in Spatio-temporal Databases. InProceedings of the International Symposium on Advances in Spatial and Tem-poral Databases, SSTD, pages 306–324, Santorini Island, Greece, July 2003.
[36] Yufei Tao, George Kollios, Jeffrey Considine, Feifei Li, and Dimitris Papadias.Spatio-Temporal Aggregation Using Sketches. In Proceedings of the Interna-tional Conference on Data Engineering, ICDE, pages 214–225, Boston, MA,March 2004.
[37] Shivnath Babu and Jennifer Widom. Continuous Queries over Data Streams.SIGMOD Record, 30(3):109–120, 2001.
[38] Moustafa A. Hammad, Michael J. Franklin, Walid G. Aref, and Ahmed K. Elma-garmid. Scheduling for shared window joins over data streams. In Proceedings ofthe International Conference on Very Large Data Bases, VLDB, pages 297–308,Berlin, Germany, September 2003.
[39] Moustafa A. Hammad, Thanaa M. Ghanem, Walid G. Aref, Ahmed K. El-magarmid, and Mohamed F. Mokbel. Efficient pipelined execution of sliding-window queries over data streams. Technical Report TR CSD-03-035, PurdueUniversity Department of Computer Sciences, December 2003.
[40] Lukasz Golab and M. Tamer Ozsu. Processing Sliding Window Multi-Joinsin Continuous Queries over Data Streams. In Proceedings of the InternationalConference on Very Large Data Bases, VLDB, pages 500–511, Berlin, Germany,September 2003.
[41] Arvind Arasu and Jennifer Widom. Resource Sharing in Continuous Sliding-Window Aggregates. In Proceedings of the International Conference on VeryLarge Data Bases, VLDB, pages 336–347, Toronto, Canada, August 2004.
[42] Mohamed F. Mokbel, Thanaa M. Ghanem, and Walid G. Aref. Spatio-temporalAccess Methods. IEEE Data Engineering Bulletin, 26(2):40–49, June 2003.
[43] Reynold Cheng, Yuni Xia, Sunil Prabhakar, and Rahul Shah. Change TolerantIndexing for Constantly Evolving Data. In Proceedings of the InternationalConference on Data Engineering, ICDE, pages 391–402, Kyoto, Japan, April2005.
[44] Arvind Arasu, Brian Babcock, Shivnath Babu, J. Cieslewicz, Mayur Datar,K. Ito, Rajeev Motwani, U. Srivastava, and Jennifer Widom. Stream: Thestanford data stream management system, 2004.
[45] Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin,Joseph M. Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel Madden, Vi-jayshankar Raman, Fred Reiss, and Mehul A. Shah. TelegraphCQ: ContinuousDataflow Processing for an Uncertain World. In Proceedings of the Interna-tional Conference on Innovative Data Systems Research, CIDR, Asilomar, CA,January 2003.
![Page 129: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/129.jpg)
116
[46] Lukasz Golab, Shaveen Garg, and M. Tamer Ozsu. On Indexing Sliding Win-dows over Online Data Streams. In Proceedings of the International Confer-ence on Extending Database Technology, EDBT, pages 712–729, Crete, Greece,March 2004.
[47] Jaewoo Kang, Jeffrey F. Naughton, and Stratis Viglas. Evaluating WindowJoins over Unbounded Streams. In Proceedings of the International Conferenceon Data Engineering, ICDE, pages 341–352, Bangalore, India, March 2003.
[48] Samuel Madden, Mehul Shah, Joseph M. Hellerstein, and Vijayshankar Raman.Continuously adaptive continuous queries over streams. In Proceedings of theACM International Conference on Management of Data, SIGMOD, pages 49–60, Madison, Wisconsin, June 2002.
[49] Utkarsh Srivastava and Jennifer Widom. Memory-Limited Execution of Win-dowed Stream Joins. In Proceedings of the International Conference on VeryLarge Data Bases, VLDB, pages 324–335, Toronto, Canada, August 2004.
[50] Graham Cormode and S. Muthukrishnan. Radial Histograms for SpatialStreams. Technical Report DIMACS TR: 2003-11, Rutgers University, 2003.
[51] John Hershberger and Subhash Suri. Adaptive Sampling for Geometric Prob-lems over Data Streams. In Proceedings of the ACM Symposium on Principlesof Database Systems, PODS, pages 252–262, Paris, France, June 2004.
[52] Jimeng Sun, Dimitris Papadias, Yufei Tao, and Bin Liu. Querying about thePast, the Present and the Future in Spatio-Temporal Databases. In Proceedingsof the International Conference on Data Engineering, ICDE, pages 202–213,Boston, MA, March 2004.
[53] Moustafa A. Hammad, Walid G. Aref, and Ahmed K. Elmagarmid. StreamWindow Join: Tracking Moving Objects in Sensor-Network Databases. In Pro-ceedings of the International Conference on Scientific and Statistical DatabaseManagement, SSDBM, pages 75–84, Cambridge, MA, July 2003.
[54] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. MaintainingStream Statistics over Sliding Windows . In Proceedings of the ACM-SIAMSymposium on Discrete Algorithms, SODA, pages 635–644, San Francisco, CA,January 2002.
[55] Jun Zhang, Manli Zhu, Dimitris Papadias, Yufei Tao, and Dik Lun Lee. Address-based Spatial Queries. In Proceedings of the ACM International Conference onManagement of Data, SIGMOD, pages 443–454, San Diego, CA, June 2003.
[56] Baihua Zheng and Dik Lun Lee. Semantic Caching in Location-DependentQuery Processing. In Proceedings of the International Symposium on Advancesin Spatial and Temporal Databases, SSTD, pages 97–116, Redondo Beach, CA,July 2001.
[57] Iosif Lazaridis, Kriengkrai Porkaew, and Sharad Mehrotra. Dynamic Queriesover Mobile Objects. In Proceedings of the International Conference on Ex-tending Database Technology, EDBT, pages 269–286, Prague, Czech Republic,March 2002.
![Page 130: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/130.jpg)
117
[58] Glenn S. Iwerks, Hanan Samet, and Ken Smith. Continuous K-Nearest NeighborQueries for Continuously Moving Points with Updates. In Proceedings of theInternational Conference on Very Large Data Bases, VLDB, pages 512–523,Berlin, Germany, September 2003.
[59] Yufei Tao, Jimeng Sun, and Dimitris Papadias. Analysis of Predictive Spatio-Temporal Queries. ACM Transactions on Database Systems , TODS, 28(4),2003.
[60] Ying Cai, Kien A. Hua, and Guohong Cao. Processing Range-MonitoringQueries on Heterogeneous Mobile Objects. In Mobile Data Management, MDM,pages 27–38, Berkeley, CA, January 2004.
[61] Bugra Gedik and Ling Liu. MobiEyes: Distributed Processing of ContinuouslyMoving Queries on Moving Objects in a Mobile System. In Proceedings of theInternational Conference on Extending Database Technology, EDBT, pages 67–87, Crete, Greece, March 2004.
[62] Jianjun Chen, David J. DeWitt, Feng Tian, and Yuan Wang. NiagaraCQ: AScalable Continuous Query System for Internet Databases. In Proceedings ofthe ACM International Conference on Management of Data, SIGMOD, pages379–390, Dallas, TX, May 2000.
[63] Sirish Chandrasekaran and Michael J. Franklin. Streaming Queries over Stream-ing Data. In Proceedings of the International Conference on Very Large DataBases, VLDB, pages 203–214, Hong Kong, August 2002.
[64] Susanne E. Hambrusch, Chuan-Ming Liu, Walid G. Aref, and Sunil Prabhakar.Query Processing in Broadcasted Spatial Index Trees. In Proceedings of the In-ternational Symposium on Advances in Spatial and Temporal Databases, SSTD,pages 502–521, Redondo Beach, CA, July 2001.
[65] Annita N. Wilschut and Peter M. G. Apers. Dataflow Query Execution in aParallel Main-Memory Environment. In Proceedings of the First InternationalConference on Parallel and Distributed Information Systems, PDIS 1991, pages68–77, Miami, Florida, December 1991.
[66] Tolga Urhan and Michael J. Franklin. XJoin: A Reactively-Scheduled PipelinedJoin Operator. IEEE Data Engineering Bulletin, 23(2):7–18, 2000.
[67] Mohamed F. Mokbel, Ming Lu, and Walid G. Aref. Hash-merge Join: A Non-blocking Join algorithm for Producing Fast and Early Join Results. In Pro-ceedings of the International Conference on Data Engineering, ICDE, pages251–263, Boston, MA, March 2004.
[68] Yufei Tao, Man Lung Yiu, Dimitris Papadias, Nikos Mamoulis, and MariosHadjieleftheriou. RPJ: Producing Fast Join Results on Streams through Rate-based. In Proceedings of the ACM International Conference on Management ofData, SIGMOD, Baltimore, MD, June 2005.
[69] Hanan Samet. The Quadtree and Related Hierarchical Data Structures. ACMComputing Surveys, 16(2):187–260, 1984.
![Page 131: SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION …mokbel/papers/Mokthesis.pdf · 2009-09-29 · an advice or stuck in a decision, Ahmed was always there by his experience and invaluable](https://reader033.vdocuments.net/reader033/viewer/2022041506/5e256498fc2d0652772e6404/html5/thumbnails/131.jpg)
118
[70] Jignesh M. Patel and David J. DeWitt. Partition Based Spatial-Merge Join.In Proceedings of the ACM International Conference on Management of Data,SIGMOD, pages 259–270, Montreal, Canada, June 1996.
[71] Ouri Wolfson and Huabei Yin. Accuracy and Resource Consumption in Tracking and Location Prediction. In Proceedings of the International Symposium on Advances in Spatial and Temporal Databases, SSTD, pages 325–343, Santorini Island, Greece, July 2003.
[72] Gísli R. Hjaltason and Hanan Samet. Distance Browsing in Spatial Databases. ACM Transactions on Database Systems, TODS, 24(2):265–318, 1999.
[73] Thomas Brinkhoff, Hans-Peter Kriegel, and Bernhard Seeger. Efficient Processing of Spatial Joins Using R-Trees. In Proceedings of the ACM International Conference on Management of Data, SIGMOD, pages 237–246, Washington, D.C., May 1993.
[74] Thomas Brinkhoff. A Framework for Generating Network-Based Moving Objects. GeoInformatica, 6(2):153–180, 2002.
[75] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. In Proceedings of the ACM International Conference on Management of Data, SIGMOD, pages 322–331, Atlantic City, NJ, May 1990.
[76] Hyun Kyoo Park, Jin Hyun Son, and Myoung-Ho Kim. An Efficient Spatiotemporal Indexing Method for Moving Objects in Mobile Communication Environments. In Proceedings of the International Conference on Mobile Data Management, MDM, pages 78–91, Melbourne, Australia, January 2003.
[77] Zhexuan Song and Nick Roussopoulos. SEB-tree: An Approach to Index Continuously Moving Objects. In Proceedings of the International Conference on Mobile Data Management, MDM, pages 340–344, Melbourne, Australia, January 2003.
[78] Daniel J. Abadi, Donald Carney, Ugur Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stanley B. Zdonik. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal, 12(2):120–139, 2003.
[79] Daniel Abadi, Yanif Ahmad, Hari Balakrishnan, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, John Janotti, Wolfgang Lindner, Sam Madden, Alex Rasin, Michael Stonebraker, Nesime Tatbul, Ying Xing, and Stan Zdonik. The Design of the Borealis Stream Processing Engine. In Proceedings of the International Conference on Innovative Data Systems Research, CIDR, pages 277–289, Asilomar, CA, January 2005.
[80] Rajeev Motwani, Jennifer Widom, Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Gurmeet Singh Manku, Chris Olston, Justin Rosenstein, and Rohit Varma. Query Processing, Approximation, and Resource Management in a Data Stream Management System. In Proceedings of the International Conference on Innovative Data Systems Research, CIDR, Asilomar, CA, January 2003.
[81] Chuck Cranor, Theodore Johnson, Oliver Spataschek, and Vladislav Shkapenyuk. Gigascope: A Stream Database for Network Applications. In Proceedings of the ACM International Conference on Management of Data, SIGMOD, pages 647–651, San Diego, California, June 2003.
[82] Joseph M. Hellerstein, Jeffrey F. Naughton, and Avi Pfeffer. Generalized Search Trees for Database Systems. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 562–573, Zurich, Switzerland, September 1995.
[83] Walid G. Aref and Ihab F. Ilyas. SP-GiST: An Extensible Database Index for Supporting Space Partitioning Trees. Journal of Intelligent Information Systems, JIIS, 17(2–3):215–240, 2001.
[84] Xiaopeng Xiong, Mohamed F. Mokbel, Walid G. Aref, Susanne Hambrusch, and Sunil Prabhakar. Scalable Spatio-temporal Continuous Query Processing for Location-aware Services. In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM, pages 317–328, Santorini Island, Greece, June 2004.
[85] Dimitris Papadias, Qiongmao Shen, Yufei Tao, and Kyriakos Mouratidis. Group Nearest Neighbor Queries. In Proceedings of the International Conference on Data Engineering, ICDE, pages 301–312, Boston, MA, March 2004.
[86] Yufei Tao and Dimitris Papadias. Time-Parameterized Queries in Spatio-Temporal Databases. In Proceedings of the ACM International Conference on Management of Data, SIGMOD, pages 334–345, Madison, WI, June 2002.
[87] Berthold Reinwald and Hamid Pirahesh. SQL Open Heterogeneous Data Access. In Proceedings of the ACM International Conference on Management of Data, SIGMOD, pages 506–507, Seattle, WA, June 1998.
[88] Berthold Reinwald, Hamid Pirahesh, Ganapathy Krishnamoorthy, George Lapis, Brian T. Tran, and Swati Vora. Heterogeneous Query Processing through SQL Table Functions. In Proceedings of the International Conference on Data Engineering, ICDE, pages 366–373, Sydney, Australia, March 1999.
[89] Xiaopeng Xiong, Mohamed F. Mokbel, and Walid G. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. In Proceedings of the International Conference on Data Engineering, ICDE, pages 643–654, Kyoto, Japan, April 2005.
[90] Christian S. Jensen, Dan Lin, and Beng Chin Ooi. Query and Update Efficient B+-Tree Based Indexing of Moving Objects. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 768–779, Toronto, Canada, August 2004.
[91] Ahmed Ayad and Jeffrey F. Naughton. Static Optimization of Conjunctive Queries with Sliding Windows Over Infinite Streams. In Proceedings of the ACM International Conference on Management of Data, SIGMOD, pages 419–430, Paris, France, June 2004.
[92] Sirish Chandrasekaran and Michael J. Franklin. PSoup: A System for Streaming Queries over Streaming Data. VLDB Journal, 12(2):140–156, 2003.
[93] Alin Dobra, Minos N. Garofalakis, Johannes Gehrke, and Rajeev Rastogi. Sketch-Based Multi-query Processing over Data Streams. In Proceedings of the International Conference on Extending Database Technology, EDBT, pages 551–568, March 2004.
[94] Brian Babcock, Mayur Datar, and Rajeev Motwani. Load Shedding for Aggregation Queries over Data Streams. In Proceedings of the International Conference on Data Engineering, ICDE, pages 350–361, Boston, MA, March 2004.
[95] Nesime Tatbul, Ugur Cetintemel, Stanley B. Zdonik, Mitch Cherniack, and Michael Stonebraker. Load Shedding in a Data Stream Manager. In Proceedings of the International Conference on Very Large Data Bases, VLDB, pages 309–320, Berlin, Germany, August 2003.
VITA
Mohamed Mokbel was born in Alexandria, Egypt, in 1974. His passion for computers started in his early years, when he remembers seeing many punched cards around his home. During high school, he obtained his first personal computer: a machine with a 12 MHz CPU, two floppy drives, no hard drive, and a monochrome orange screen (worth $1,300). He used this computer to help his mother (a Ph.D. in Oceanography) draft her research papers. In 1991, Mohamed joined the Faculty of Engineering at Alexandria University. After a very competitive freshman year, he was ranked among the top students and joined the Computer Science Department. As a reward, his father bought him a 60 MB hard disk (worth $400). Continuing his passion for computer science, Mohamed was one of only two students in his class who obtained a Distinction grade in all five undergraduate years of study. In 1996 and 1999, respectively, Mohamed was awarded his B.Sc. and M.Sc. degrees in computer science with the highest degree of honor from the Faculty of Engineering, Alexandria University.
In 2000, Mohamed joined Purdue University as a research assistant with Prof. Walid Aref. Working with such a wonderful advisor, Mohamed published several research papers in different areas of core database systems. In summer 2002, Mohamed interned at Lawrence Livermore National Laboratory (LLNL), one of the top national labs in the USA. In summer 2004, Mohamed interned with the database group at Microsoft Research, one of the top research labs worldwide, where he interacted with many world-class researchers and built his large-scale systems experience. Mohamed's main research interests focus on advancing the state of the art in the design and implementation of database engines to cope with the requirements of emerging applications. Mohamed Mokbel graduated with a Ph.D. degree in computer science from Purdue University in August 2005 and joined the Department of Computer Science at the University of Minnesota, Twin Cities as an assistant professor.