
Uncertainty in Geosocial Data: Friend or Foe?

Yaron Kanza
AT&T Labs - Research
Bedminster, NJ, USA

Abstract

In data management, uncertainty and lack of information are commonly considered hurdles to overcome. But when it comes to the management of geosocial data, where data can be used to associate people with geographic locations they visited, uncertainty is frequently not just an obstacle but also a desired feature. Uncertainty and lack of information, such as knowing only approximate locations of users or knowing only some of the locations users visited, may reduce the accuracy of data analysis. They may, however, be desired, or even required, for preserving user privacy and for using data that otherwise would not have been collected and used. Due to the heterogeneity of geosocial data, uncertainty is prevalent in geosocial applications and should not be overlooked. In this paper, we discuss some of the causes of uncertainty in geosocial data management. We elaborate on some of the advantages and disadvantages of using inaccurate or incomplete geosocial data, and we illustrate some of the methods to cope with uncertainty in geosocial applications.

1 Introduction

In the last decade, GPS-enabled devices have become ubiquitous, and millions of people started using them to create geotagged content. In geotagged content, social-media items, such as photos or textual messages, are associated with a tag specifying a location, usually the coordinates of the place where the item was created, e.g., associating a photo with the place where it was taken. Alongside this, online social networks have become a popular means of sharing and disseminating personal and general information among groups of people. Facebook, which as of today is the largest online social networking site, has over 1.71 billion monthly active users, with 1.57 billion mobile active users. Facebook users generate about 4.5 billion “likes” per day (based on Facebook’s statistics for June 2016). Instagram has about 500 million monthly active users, and Twitter has about 300 million monthly active users, who produce around 500 million tweets per day. Part of this massive content-sharing activity is the creation and the sharing of geotagged content. In addition, online geosocial networks like Foursquare (https://foursquare.com/) provide users with the ability to check in at a place, yielding a record that contains the location, time and venue information of the action. Foursquare has more than 50 million active users, and the overall number of check-ins it has collected is estimated to be between 7 and 10 billion.

Although only a small percentage of the content in Facebook, Instagram and Twitter is geotagged, geosocial datasets comprising geotagged items and check-in records are generated nowadays at an unprecedented pace, and they are becoming a valuable source of geospatial information. Check-in records and geotagged items are produced as part of the daily activity of users; thus, they constitute a rich source of information about the mutual influence of the location on user behavior and on social relationships among users. For example, such information reveals that social relationships have a greater influence on long travels than on short travels of people [15], or that users have a greater tendency to interact with social websites like Facebook when they are in a familiar place, like their home, than in places that are unfamiliar to them [33].


Figure 1: Popular jogging areas
Figure 2: Popular places to watch the sunset

The advent of geosocial data has provided researchers with an opportunity to access a valuable source of information about people and places. The research community has studied miscellaneous ways to utilize geosocial data, and many prototypes of geosocial applications were developed. This includes studies of the use of large geosocial datasets for location recommendation [3, 42], combining social networks and spatial networks [18], route recommendation [19, 37], detecting unusual social activities [35], predicting spread of diseases [34, 49], detecting natural disasters like earthquakes and depicting real-time geospatial information during the event [2, 7, 38, 51], detection of local events in a city [60], and so forth. For example, Fig. 1 and Fig. 2 present popular jogging places and popular locations to view the sunset, in or near Manhattan, based on geotagged posts (depicted by red squares and purple polygons). The areas were discovered using the search tool of Pat et al. [43], by finding locations that are associated with the search terms, e.g., places in which a large percentage of the posts mention the search terms. Such a search, which is based on geotagged posts of real users, is different from traditional geographic search, which is mainly based on names and categories of geographical entities. The interest in geosocial data is, however, not limited to the scientific community. For instance, journalists use geotagged data to measure the popularity of locations, e.g., The Huffington Post relied on geotagged data to evaluate and rank the popularity of museums around the world.¹

When collecting geosocial data, there are frequently two conflicting goals. On the one hand, there is a tendency to collect as many data items as possible. On the other hand, there is a desire to collect data that are accurate, reliable and complete. In many applications it is essential to use a large volume of data. For example, places that are only seldom visited may not be represented in a small dataset, so in location recommendation or in geographic search, such places may never appear in the result if the dataset is too small. In data analysis, the amount of data over which the analysis is conducted clearly can affect the accuracy of the result. So, while an analysis using a small dataset can lead to inaccurate or uncertain results, methods to increase the dataset often add data that are imprecise, incomplete or without clear semantics, as will be elaborated hereafter. Using a dataset that was enlarged this way requires coping with uncertainty regarding location, time and other attributes. In this paper, we examine some of the causes of uncertainty in geosocial data management and geosocial applications, and we discuss advantages and disadvantages of using incomplete or inaccurate geosocial data.

2 Causes of Uncertainty and Lack of Information

We now elaborate on uncertainty in geosocial data. A geosocial dataset is a collection of items posted by users of an online geosocial network. In a complete dataset, each item has a location, given as a pair of longitude and latitude coordinates, and additional optional attributes such as time, content, and so forth. Each item is associated with the user who posted it. Users are connected to one another in the social network, by friendship or other types of social relationships. Datasets can be homogeneous, where all the items are similar to one another, e.g., a dataset of textual tweets or a dataset of geotagged Flickr photos. Other datasets may have multifarious content, with a mixture of textual items, photos, videos, check-in records, and so on. In particular, geosocial datasets that are collected from diverse social-media sources may consist of posts of many different people; hence, such datasets are frequently heterogeneous, inaccurate or incomplete. To process and use either homogeneous or heterogeneous geosocial datasets, the following obstructions should be taken into account.

¹ http://www.huffingtonpost.com/entry/most-geotagged-museums-on-instagram_us_565f0992e4b072e9d1c42d2c

Inaccuracy. The location of an item is frequently measured using the global positioning system (GPS), which has limited accuracy. According to the GPS performance standards², well-designed GPS receivers have been achieving horizontal accuracy of 3 meters or better and vertical accuracy of 5 meters or better 95% of the time. This is, however, affected by atmospheric conditions, sky blockage, and the receiver quality. Such accuracy is insufficient for determining with certainty the building in which a user checked in or an item was geotagged. For a GPS device on a moving car, the accuracy of the geolocating process is insufficient for determining the lane, so there is some uncertainty in the matching of the GPS measurements to the road on which the car traveled. The problem of matching a sequence of GPS points (a trajectory) to the road network is known as “map matching”, and it has received a lot of attention in the literature [47].

The time attribute of a check-in record or of a geotagged post can be inaccurate as well, because devices may relate to different time zones, may not be updated to reflect daylight saving time, etc. Other attributes may also have accuracy errors. Applications that use these attributes must take the possibility of inaccuracy into account. For example, an application that aims to find pairs of users who were at the same place at the same time, and relies on items that were sent from nearby places at close times, needs to take into account uncertainty due to inaccuracy in the recorded locations and times. In geosocial datasets, inaccuracy of the location or of other attributes can also be due to privacy considerations, aiming not to reveal the exact location of users [44].
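The co-location example above can be sketched in code. The following is a minimal illustration, not a method from the cited literature: the function names, the item layout `(user, lat, lon, unix_time)`, and all thresholds and error bounds are hypothetical. It assumes each item carries a worst-case location error and a worst-case time error, and it widens the matching thresholds by both errors so that pairs that may have co-occurred are not discarded.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def possible_co_locations(items, radius_m=50.0, window_s=300.0,
                          loc_err_m=3.0, time_err_s=60.0):
    """Return pairs of distinct users who may have been together.

    items: list of (user, lat, lon, unix_time) tuples (hypothetical layout).
    loc_err_m and time_err_s are assumed per-item worst-case errors; both
    items in a pair may be off, so each threshold is widened by twice the error.
    """
    pairs = []
    for a, b in combinations(items, 2):
        if a[0] == b[0]:
            continue  # same user, not a co-location
        distance = haversine_m(a[1], a[2], b[1], b[2])
        time_gap = abs(a[3] - b[3])
        if distance <= radius_m + 2 * loc_err_m and time_gap <= window_s + 2 * time_err_s:
            pairs.append((a[0], b[0]))
    return pairs
```

Widening the thresholds trades false negatives for false positives: more candidate pairs are kept, and each of them should be treated as a possible rather than a certain co-location.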

Lack of spatio-temporal attributes. Only a small percentage of the content in online social networks is geotagged [45]. It is estimated that less than 2 percent of the tweets in Twitter³ and 15 to 25 percent of the posts in Instagram⁴ are geotagged. Several studies showed how to increase the size of the dataset by geolocating untagged posts. This is not an easy task because, in geosocial datasets, there is only a partial correlation between textual content and locations [28]. For example, a post in New York that mentions “Kinky Boots” can be mapped, with high certainty, to the location of the theater where the Broadway show by this name is presented [12], but many posts do not contain any phrase that can be mapped to a specific location with certainty. Different methods were developed to cope with this problem. Using merely geographic names and personal attributes of users can provide an effective but inaccurate method for geolocating tweets, e.g., Schulz et al. [52] mapped 92% of their test dataset with an error of up to 30 kilometers. It has been shown how to compute language models of areas based on geotagged posts and, based on these models, find the location of untagged posts [32]. It has been suggested to use relationships among users in the process; however, this is not an accurate method, e.g., the median error of such a method, in the study of Compton et al. [16], was greater than 6 kilometers. More accurate methods rely on the use of a Gaussian Mixture Model (GMM) and Maximum Likelihood Estimation (MLE) [13, 22, 45], but they can be applied only to some of the content, e.g., Flatow et al. [22] mapped about 80% of the Instagram posts in their dataset, with an error of less than 1 kilometer. The geolocating process can frequently increase the size of the dataset, but it is limited, because there is a tradeoff between the accuracy (the percentage of mappings in which the item is mapped to the correct location), the precision (the distance between the assigned location and the real location, in a correct matching) and the coverage (the percentage of posts that are mapped to a location). Hence, coping with datasets that were extended by geolocating untagged items needs to take into account the underlying uncertainty in the location of posts, and the incompleteness of the process.

² http://www.gps.gov/technical/ps/2008-SPS-performance-standard.pdf
³ http://blog.gnip.com/twitter-geo-data-enrichment/
⁴ http://bits.blogs.nytimes.com/2012/08/16/instagram-refreshes-app-include-photo-maps/
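To make the three measures of a geolocating process (accuracy, precision and coverage) concrete, the sketch below computes them for a small evaluation set. It is only an illustration under stated assumptions: the function name and input layout are hypothetical, locations are given as planar (x, y) coordinates in meters, and a mapping is counted as "correct" when it falls within an assumed distance threshold of the true location, which the text does not specify.

```python
import math


def geolocation_metrics(results, correct_within_m=1000.0):
    """Compute coverage, accuracy and precision of a geolocating process.

    results: list of (predicted_xy, true_xy) pairs, where predicted_xy is
    None when the process could not map the post (hypothetical layout).
    correct_within_m: assumed threshold for counting a mapping as correct.
    """
    mapped = [(pred, true) for pred, true in results if pred is not None]
    # Coverage: fraction of all posts that were mapped to some location.
    coverage = len(mapped) / len(results) if results else 0.0
    errors = [math.dist(pred, true) for pred, true in mapped]
    correct_errors = [e for e in errors if e <= correct_within_m]
    # Accuracy: fraction of mappings that landed on the correct location.
    accuracy = len(correct_errors) / len(mapped) if mapped else 0.0
    # Precision: mean distance to the real location among correct mappings.
    precision_m = sum(correct_errors) / len(correct_errors) if correct_errors else None
    return {"coverage": coverage, "accuracy": accuracy, "precision_m": precision_m}
```

Under these assumptions the tradeoff becomes visible when the process is tuned: mapping only the posts it is most sure about typically raises accuracy and tightens precision but lowers coverage.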

Anonymous or false data. Users are not always associated with the content they create. In many cases this is due to privacy concerns [23, 46, 58]. However, there are other reasons for that, e.g., forwarding may cause users to share geotagged content they have not created, users may share content while using the device and the account of a friend, etc. In addition, there are cases where users intentionally report a false location. For instance, in online geosocial networks that reward people who have many check-ins at a particular location, users may create false check-ins to receive the reward [10, 11]. Hence, a geosocial dataset that associates people with visited locations is expected to include some incorrect associations, which leads to uncertainty in the matching of people with the places they visited.

3 Coping with Uncertain and Incomplete Data

The problem of coping with uncertain and incomplete geospatial data has received a lot of attention, but only some of the proposed solutions are suitable for applications over geosocial data. In many studies, a probabilistic data model is defined, to represent uncertainty regarding the relevance or the correctness of geospatial objects [17, 31, 40, 50], or to represent an object location as a spatial distribution [54]. In such a solution, each object is assigned a probability, and query evaluation takes the probabilities of the involved objects into account, e.g., returning only answers with a high probability of being correct, or associating a confidence value with query answers [14]. It has also been shown how to use interactivity to cope with uncertainty in route planning, where users provide feedback regarding the relevance of visited objects, and the route is recomputed accordingly [25, 30]. It has been suggested to integrate data from two or more uncertain sources, to decrease the overall uncertainty [61], although in some cases, integration of geospatial data sources leads to uncertainty in the correctness of the association of the objects of the different sources [6]. Several papers studied uncertainty in moving objects [53, 55, 57]; they proposed methods to bound the area in which the object may reside and apply computations accordingly [56, 62]. It has also been shown how to index objects with an uncertain location [20].
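As a small illustration of probabilistic query evaluation, the sketch below answers a range query over objects whose locations are uncertain. It is a simplified example and not the algorithm of any cited paper: it assumes each object's location is a discrete distribution over candidate points, and the function names and the rectangle format `(xmin, ymin, xmax, ymax)` are hypothetical. Each answer is returned with a confidence value, and answers below a probability threshold are dropped.

```python
def prob_in_range(distribution, rect):
    """Probability that an object lies inside rect.

    distribution: list of ((x, y), probability) pairs that sum to 1.
    rect: (xmin, ymin, xmax, ymax), boundaries inclusive.
    """
    xmin, ymin, xmax, ymax = rect
    return sum(p for (x, y), p in distribution
               if xmin <= x <= xmax and ymin <= y <= ymax)


def probabilistic_range_query(objects, rect, threshold=0.5):
    """Return (object_id, confidence) for objects likely to be inside rect.

    objects: dict mapping an object id to its location distribution.
    Only answers whose confidence reaches the threshold are returned,
    ordered from most to least confident.
    """
    answers = []
    for obj_id, distribution in objects.items():
        confidence = prob_in_range(distribution, rect)
        if confidence >= threshold:
            answers.append((obj_id, confidence))
    return sorted(answers, key=lambda answer: answer[1], reverse=True)
```

The threshold is the design knob here: raising it returns fewer answers that are more likely to be correct, while lowering it returns more answers with weaker confidence.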

Different methods, including topological [29, 36] and probabilistic [8] methods, were suggested for coping with the map matching problem, i.e., mapping GPS points of trajectories of moving objects to the underlying road network. In map matching it is frequently required to deal with sparse data, e.g., when applying dilution (simplification) to reduce storage space and transmission costs [9, 26], or with partial knowledge about the map [59]. A common technique to cope with inaccurate or sparse trajectories in map matching is to construct a Hidden Markov Model (HMM) over the trajectory points and the road network, and to apply the Viterbi algorithm to compute the matching [27, 39]. Emrich et al. [21] and Niedermayer et al. [41] showed how to represent trajectories as stochastic processes.
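The HMM approach can be illustrated with a toy sketch. This is not the model of the cited papers; it makes simplifying assumptions that are stated here and in the comments: roads are straight segments in a planar coordinate system (meters), the emission probability is a Gaussian in the distance from the GPS point to the road, and the transition probability is a fixed value for staying on a road or moving to an adjacent one (and zero otherwise). All names and parameter values are hypothetical.

```python
import math


def point_segment_distance(point, segment):
    """Distance from a point to a line segment, in planar coordinates."""
    (x, y), ((x1, y1), (x2, y2)) = point, segment
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def log_emission(distance_m, sigma_m=5.0):
    """Log of a Gaussian GPS-noise likelihood, up to an additive constant."""
    return -0.5 * (distance_m / sigma_m) ** 2


def log_transition(prev_road, road, adjacency, stay=0.8, move=0.2):
    """Assumed toy transition model: stay, move to an adjacent road, or impossible."""
    if road == prev_road:
        return math.log(stay)
    if road in adjacency.get(prev_road, ()):
        return math.log(move)
    return float("-inf")


def viterbi_map_match(points, roads, adjacency):
    """Return the most likely road for each GPS point (Viterbi decoding).

    points: list of (x, y); roads: dict name -> segment;
    adjacency: dict name -> set of directly connected road names.
    """
    names = list(roads)
    score = {r: log_emission(point_segment_distance(points[0], roads[r])) for r in names}
    backpointers = []
    for point in points[1:]:
        new_score, pointers = {}, {}
        for r in names:
            best_prev = max(names, key=lambda q: score[q] + log_transition(q, r, adjacency))
            new_score[r] = (score[best_prev] + log_transition(best_prev, r, adjacency)
                            + log_emission(point_segment_distance(point, roads[r])))
            pointers[r] = best_prev
        backpointers.append(pointers)
        score = new_score
    path = [max(names, key=lambda r: score[r])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

The benefit over matching each point to its nearest road is that the whole sequence is considered: a noisy point that happens to be closer to a parallel, unconnected road is still matched to the road that is consistent with the rest of the trajectory.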

Many of the methods that were proposed for coping with uncertainty in geospatial applications provide only a limited solution for uncertainty in geosocial applications. First, methods to geolocate untagged items often have an accuracy of several kilometers, and can only be applied to part of the available items. So, methods that bound the error or rely on having a small error in the location of items may not be effective. Secondly, methods to cope with moving objects often rely on the knowledge that the sequence of points in the trajectory is a continuous travel along roads of the road network, and this helps reduce the uncertainty. This is not always possible with sporadic check-ins or isolated geotagged items. Thirdly, due to privacy restrictions, it is often desired to keep locations inaccurate (obfuscated) or concealed. Due to regulations and the desire to protect user privacy, companies often do not try to reveal the location of any particular user, and deliberately apply computations over data in which locations or associations between users and locations are concealed.

There were studies on how to use information such as the social relationships of users or other personal attributes of users to cope with uncertainty regarding the location of users and their posts [1, 48, 52], but this approach has very limited accuracy in comparison to GPS. First, the online social relationships of a person are not necessarily with people who live nearby. Also, other personal attributes cannot always indicate how often the person travels and to where. Even if a good estimation of the area in which a user is active can be achieved based on the locations of the geotagged items and check-ins of that person, these may not always be available or may be too dispersed. Location information from other sources, such as call detail records (CDRs), could improve the accuracy of a location estimation [5, 24], but such data are proprietary and are inaccessible to most analysts. In addition, CDRs are associated with a cell tower rather than with an exact location, so the locations they provide are sparse and coarse. Hence, the question of how to analyze the locations of masses of people based on available geotagged data, and gain accurate and reliable results, remains open.

4 Discussion

Online geosocial networks provide up-to-date information that can be used for analyzing relationships between user activities and locations. The content that users share in such networks can be utilized by various applications, but this needs to be done with care because geosocial datasets are often heterogeneous, incomplete and inaccurate. In particular, many items are not associated with a location or are associated with an inaccurate location. Yet, there are many benefits to using such data and coping with the entailed uncertainty. Such datasets allow researchers to analyze large volumes of data of real users. But, because the data are of real users, they should be handled with care. It is imperative to avoid causing a privacy breach. It is also essential to prevent a statistical bias when removing inaccurate or incomplete data. For example, the awareness of location privacy is affected by age, where teenagers are less aware of privacy risks than adults [4], so the age distribution of the users, in an analysis of geotagged content, may be different from the age distribution of users of online social networks, and different from the age distribution of the entire population. Analytic methods should take this into account and should be applied in a way that mitigates such biases.

By adapting algorithms and applications to use incomplete and inaccurate geosocial data, and to cope with the uncertainty that occurs when using such data, it would be possible to use larger datasets and to increase the accuracy of data analysis, while not forcing users to give up their privacy. Using large diverse data sources is essential for achieving fairness in applications like recommendation systems, by alleviating the “rich get richer” phenomenon, that is, a case where a system limits its recommendations to a small set of entities, which due to the recommendations receive more visits and thereby increase their chances to be recommended further.

There are many open research questions in this domain, such as how to build recommendation systems that take uncertainty (e.g., inaccuracy and incompleteness) into account; how to detect events and spatial trends based on social media data, while taking into account that many items are untagged or have an inaccurate geotag; and how to correctly apply analysis over inaccurate and incomplete geosocial content.

References

[1] L. Backstrom, E. Sun, and C. Marlow. Find me if you can: improving geographical prediction with social and spatial proximity. In Proceedings of the 19th International Conference on World Wide Web, pages 61–70. ACM, 2010.

[2] E. Bahir and A. Peled. Identifying and tracking major events using geo-social networks. Social Science Computer Review, 31(4):458–470, 2013.

[3] J. Bao, Y. Zheng, and M. F. Mokbel. Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pages 199–208. ACM, 2012.

[4] S. B. Barnes. A privacy paradox: Social networking in the United States. First Monday, 11(9), 2006.

[5] R. A. Becker, R. Caceres, K. Hanson, J. M. Loh, S. Urbanek, A. Varshavsky, and C. Volinsky. A tale of one city: Using cellular network data for urban planning. IEEE Pervasive Computing, 10(4):18–26, 2011.


[6] C. Beeri, Y. Kanza, E. Safra, and Y. Sagiv. Object fusion in geographic information systems. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, pages 816–827, Toronto, Canada, 2004. VLDB Endowment.

[7] S. Bekhor, S. Cohen, Y. Doytsher, Y. Kanza, and Y. Sagiv. A personalized geosocial app for surviving an earthquake. In 1st ACM SIGSPATIAL International Workshop on the Use of GIS in Emergency Management, 2015.

[8] M. Bierlaire, J. Chen, and J. Newman. A probabilistic map matching method for smartphone GPS data. Transportation Research Part C: Emerging Technologies, 26:78–98, 2013.

[9] H. Cao, O. Wolfson, and G. Trajcevski. Spatio-temporal data reduction with deterministic error bounds. The VLDB Journal, 15(3):211–228, 2006.

[10] B. Carbunar and R. Sion. Private geosocial networking. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 365–368, Chicago, Illinois, 2011. ACM.

[11] B. Carbunar, R. Sion, R. Potharaju, and M. Ehsan. The shy mayor: Private badges in geosocial networks. In International Conference on Applied Cryptography and Network Security, pages 436–454. Springer, 2012.

[12] S. Chandra, L. Khan, and F. B. Muhaya. Estimating Twitter user location using social interactions–a content based approach. In Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, pages 838–843. IEEE, 2011.

[13] H.-w. Chang, D. Lee, M. Eltaher, and J. Lee. @Phillies tweeting from Philly? Predicting Twitter user locations with spatial word usage. In Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pages 111–118, Washington, DC, USA, 2012. IEEE Computer Society.

[14] R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages 551–562, San Diego, California, 2003. ACM.

[15] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1082–1090, San Diego, California, USA, 2011. ACM.

[16] R. Compton, D. Jurgens, and D. Allen. Geotagging one hundred million Twitter accounts with total variation minimization. In Big Data (Big Data), 2014 IEEE International Conference on, pages 393–401. IEEE, 2014.

[17] X. Dai, M. L. Yiu, N. Mamoulis, Y. Tao, and M. Vaitis. Probabilistic spatial queries on existentially uncertain data. In International Symposium on Spatial and Temporal Databases, pages 400–417. Springer, 2005.

[18] Y. Doytsher, B. Galon, and Y. Kanza. Querying geo-social data by bridging spatial networks and social networks. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks, pages 39–46. ACM, 2010.

[19] Y. Doytsher, B. Galon, and Y. Kanza. Storing routes in socio-spatial networks and supporting social-based route recommendation. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks, pages 49–56. ACM, 2011.

[20] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 395–404, Maui, Hawaii, USA, 2012. ACM.

[21] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pages 354–365, Washington, DC, USA, 2012. IEEE Computer Society.

[22] D. Flatow, M. Naaman, K. E. Xie, Y. Volkovich, and Y. Kanza. On the accuracy of hyper-local geotagging of social media content. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 127–136. ACM, 2015.

[23] D. Freni, C. Ruiz Vicente, S. Mascetti, C. Bettini, and C. S. Jensen. Preserving location and absence privacy in geo-social networks. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pages 309–318, Toronto, ON, Canada, 2010. ACM.


[24] V. Frias-Martinez, C. Soguero, and E. Frias-Martinez. Estimation of urban commuting patterns using cellphonenetwork data. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, pages 9–16,Beijing, China, 2012. ACM.

[25] R. Friedman, I. Hefez, Y. Kanza, R. Levin, E. Safra, and Y. Sagiv. Wiser: a web-based interactive route search systemfor smartphones. In Proceedings of the 21st International Conference on World Wide Web, pages 337–340. ACM,2012.

[26] R. Gotsman and Y. Kanza. Compact representation of gps trajectories over vectorial road networks. In InternationalSymposium on Spatial and Temporal Databases, pages 241–258. Springer, 2013.

[27] R. Gotsman and Y. Kanza. A dilution-matching-encoding compaction of trajectories over road networks. GeoInfor-matica, 19(2):331–364, 2015.

[28] I. Grabovitch-Zuyev, Y. Kanza, E. Kravi, and B. Pat. On the correlation between textual content and geospatiallocations in microblogs. In Proceedings of Workshop on Managing and Mining Enriched Geo-Spatial Data, pages3:1–3:6, Snowbird, UT, USA, 2007. ACM.

[29] J. S. Greenfeld. Matching gps observations to locations on a digital map. In Transportation Research Board 81stAnnual Meeting, 2002.

[30] Y. Kanza, R. Levin, E. Safra, and Y. Sagiv. Interactive route search in the presence of order constraints. Proceedingsof the VLDB Endowment, 3(1-2):117–128, 2010.

[31] Y. Kanza, E. Safra, and Y. Sagiv. Route search over probabilistic geospatial data. In International Symposium onSpatial and Temporal Databases, pages 153–170. Springer, 2009.

[32] S. Kinsella, V. Murdock, and N. O’Hare. ”I’M Eating a Sandwich in Glasgow”: modeling locations with tweets.In Proceedings of the 3rd International Workshop on Search and Mining User-generated Contents, pages 61–68,Glasgow, Scotland, UK, 2011. ACM.

[33] E. Kravi, E. Agichtein, I. Guy, Y. Kanza, A. Mejer, and D. Pelleg. Searcher in a strange land: Understanding websearch from familiar and unfamiliar locations. In Proceedings of the 38th International ACM SIGIR Conference onResearch and Development in Information Retrieval, pages 855–858. ACM, 2015.

[34] R. Lan, M. D. Adelfio, and H. Samet. Spatio-temporal disease tracking using news articles. In Proceedings of theThird ACM SIGSPATIAL International Workshop on the Use of GIS in Public Health, pages 31–38. ACM, 2014.

[35] R. Lee, S. Wakamiya, and K. Sumiya. Discovery of unusual regional social activities using geo-tagged microblogs. World Wide Web, 14(4):321–349, 2011.

[36] R. Levin, E. Kravi, and Y. Kanza. Concurrent and robust topological map matching. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pages 617–620. ACM, 2012.

[37] X. Lu, C. Wang, J.-M. Yang, Y. Pang, and L. Zhang. Photo2trip: generating travel routes from geo-tagged photos for trip planning. In Proceedings of the 18th ACM international conference on Multimedia, pages 143–152. ACM, 2010.

[38] S. E. Middleton, L. Middleton, and S. Modafferi. Real-time crisis mapping of natural disasters using social media. IEEE Intelligent Systems, 29(2):9–17, 2014.

[39] P. Newson and J. Krumm. Hidden markov map matching through noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, pages 336–343, 2009.

[40] J. Ni, C. V. Ravishankar, and B. Bhanu. Probabilistic spatial database operations. In International Symposium on Spatial and Temporal Databases, pages 140–158. Springer, 2003.

[41] J. Niedermayer, A. Zufle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. Proc. VLDB Endow., 7(3):205–216, 2013.

[42] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos. Geo-social recommendations. In RecSys 2011 Workshop on PeMA, 2011.

[43] B. Pat, Y. Kanza, and M. Naaman. Geosocial search: Finding places based on geotagged social-media posts. In Proceedings of the 24th International Conference on World Wide Web, pages 231–234. ACM, 2015.

[44] K. Patroumpas, M. Papamichalis, and T. Sellis. Probabilistic range monitoring of streaming uncertain positions in geosocial networks. In International Conference on Scientific and Statistical Database Management, pages 20–37. Springer, 2012.

[45] R. Priedhorsky, A. Culotta, and S. Y. Del Valle. Inferring the origin locations of tweets with quantitative confidence. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, pages 1523–1536, Baltimore, Maryland, USA, 2014. ACM.

[46] K. P. Puttaswamy, S. Wang, T. Steinbauer, D. Agrawal, A. El Abbadi, C. Kruegel, and B. Y. Zhao. Preserving location privacy in geosocial applications. IEEE Transactions on Mobile Computing, 13(1):159–173, 2014.

[47] M. A. Quddus, W. Y. Ochieng, and R. B. Noland. Current map-matching algorithms for transport applications: State-of-the art and future research directions. Transportation research part c: Emerging technologies, 15(5):312–328, 2007.

[48] D. Rout, K. Bontcheva, D. Preotiuc-Pietro, and T. Cohn. Where’s @wally?: a classification approach to geolocating users based on their social ties. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, pages 11–20. ACM, 2013.

[49] A. Sadilek, H. A. Kautz, and V. Silenzio. Predicting disease transmission from geo-tagged micro-blog data. In AAAI, 2012.

[50] E. Safra, Y. Kanza, N. Dolev, Y. Sagiv, and Y. Doytsher. Computing a k-route over uncertain geographical data. In Proceedings of the 10th International Conference on Advances in Spatial and Temporal Databases, pages 276–293, Boston, MA, USA, 2007. Springer-Verlag.

[51] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, pages 851–860. ACM, 2010.

[52] A. Schulz, A. Hadjakos, H. Paulheim, J. Nachtwey, and M. Muhlhauser. A multi-indicator approach for geolocalization of tweets. In ICWSM, 2013.

[53] A. P. Sistla, O. Wolfson, and B. Xu. Continuous nearest-neighbor queries with location uncertainty. The VLDB Journal, 24(1):25–50, 2015.

[54] G. Skoumas, D. Pfoser, and A. Kyrillidis. On quantifying qualitative geospatial data: A probabilistic approach. In Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, pages 71–78, Orlando, Florida, 2013. ACM.

[55] G. Trajcevski, R. Tamassia, I. F. Cruz, P. Scheuermann, D. Hartglass, and C. Zamierowski. Ranking continuous nearest neighbors for uncertain trajectories. The VLDB Journal, 20(5):767–791, Oct. 2011.

[56] G. Trajcevski, R. Tamassia, H. Ding, P. Scheuermann, and I. F. Cruz. Continuous probabilistic nearest-neighbor queries for uncertain trajectories. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pages 874–885, Saint Petersburg, Russia, 2009. ACM.

[57] G. Trajcevski, O. Wolfson, K. Hinrichs, and S. Chamberlain. Managing uncertainty in moving objects databases. ACM Trans. Database Syst., 29(3):463–507, 2004.

[58] C. R. Vicente, D. Freni, C. Bettini, and C. S. Jensen. Location-related privacy in geo-social networks. IEEE Internet Computing, 15(3):20–27, 2011.

[59] C. E. White, D. Bernstein, and A. L. Kornhauser. Some map matching algorithms for personal navigation assistants. Transportation research part c: emerging technologies, 8(1):91–108, 2000.

[60] C. Xia, R. Schwartz, K. Xie, A. Krebs, A. Langdon, J. Ting, and M. Naaman. Citybeat: real-time social media visualization of hyper-local city data. In Proceedings of the 23rd International Conference on World Wide Web, pages 167–170. ACM, 2014.

[61] B. Zhang and G. Trajcevski. The tale of (fusing) two uncertainties. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 521–524, Dallas, Texas, 2014.

[62] K. Zheng, G. Trajcevski, X. Zhou, and P. Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. In Proceedings of the 14th International Conference on Extending Database Technology, pages 283–294, Uppsala, Sweden, 2011. ACM.
