big data, big tourism
Big Data, Big Tourism
Tourism and Mechanics
https://www.slideshare.net/sirmmo/big-data-big-tourism
What are «Big Data»?
• Excel gets stuck working on a dataset? => «medium» data
• Stata/R struggle working on a dataset? => «big» data
Where do we get the data?
• Tourists
  • Have sensors
  • Are sensors
  • Are actors
• Attractions
  • Are sensors
  • Are actors
• Hotels, restaurants
  • Are sensors
  • Have sensors
Can we access the data?
• Tourists
  • Have sensors
  • Are sensors
  • Are actors
• Attractions
  • Are sensors
  • Are actors
• Hotels, restaurants
  • Are sensors
  • Have sensors
Private Sector
Government
Open(able/ish) Data
Almost always
Ok so who owns that data?
• Government
  • Bureaucracy-driven data
  • Incoherent
  • Inconsistent
  • Irregular production
• Private Sector
  • Deeply integrated with user experience
  • Very «behavioral», and as such very «real»
  • Very business-oriented metrics
Scraping
• Time consuming
• Power consuming
• Illegal (up to a certain point)
• Unavoidable (up to a certain point)
Scraping
• It relies on the fact that (most of) the web is based on HTML
  • And HTML is text
  • And JavaScript is text
  • And CSS is text
• Everything can be read before the render…
• Or after the render
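A minimal sketch of the "HTML is text" point, reading a page before any rendering happens. It uses only Python's standard html.parser module. The page source, the class names "name" and "price", and the SpanCollector helper are made up for illustration; a real site will use different markup.

```python
from html.parser import HTMLParser

# Made-up page source, standing in for the text a server would return.
PAGE = """
<html><body>
  <div class="hotel"><span class="name">Hotel Alpha</span><span class="price">120</span></div>
  <div class="hotel"><span class="name">Hotel Beta</span><span class="price">95</span></div>
</body></html>
"""


class SpanCollector(HTMLParser):
    """Collect the text of every <span> that has a class attribute."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # class of the <span> being read, if any
        self.found = []            # list of (class, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_class = None

    def handle_data(self, data):
        if self.current_class is not None and data.strip():
            self.found.append((self.current_class, data.strip()))


def scrape(page_text):
    """Parse the raw text of a page; no browser or rendering involved."""
    parser = SpanCollector()
    parser.feed(page_text)
    return parser.found


rows = scrape(PAGE)
print(rows)
```

Reading after the render would instead need a browser (or a browser driver) to run the page's JavaScript first, and then hand the resulting markup to the same kind of parser.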
Tools
• Not easy for «complex» sites
  • Some cases come up
• Some tools help
  • Maybe knowledge of XML Query Language or CSS required
• Some tools are very advanced
  • Selenium browser driver
  • «headless» browsers
• Chrome
  • https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd?hl=en
  • https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
  • https://chrome.google.com/webstore/detail/advanced-web-scraper/gpolcofcjjiooogejfbaamdgmgfehgff
• Firefox
  • https://addons.mozilla.org/en-US/firefox/addon/datascraper/
• Web
  • https://www.import.io/
  • https://scrapinghub.com/portia/
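To give an idea of the query-language knowledge these tools may ask for, here is a small sketch using the limited XPath support in Python's standard xml.etree.ElementTree. The markup and class names are invented, and the snippet is deliberately well-formed XML; real pages usually are not, so a lenient HTML parser would be needed before queries like these work.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup used only for illustration.
DOC = """
<div id="results">
  <div class="hotel"><a class="name">Hotel Alpha</a><span class="price">120</span></div>
  <div class="hotel"><a class="name">Hotel Beta</a><span class="price">95</span></div>
  <div class="ad"><a class="name">Sponsored</a></div>
</div>
""".strip()

root = ET.fromstring(DOC)

hotels = []
# XPath-style query: every <div class="hotel"> below the root, skipping the ad.
for node in root.findall(".//div[@class='hotel']"):
    name = node.find("a[@class='name']").text
    price = int(node.find("span[@class='price']").text)
    hotels.append((name, price))

print(hotels)
```

The equivalent CSS selector for the same elements would be div.hotel; browser extensions like the ones listed above typically accept one of the two notations.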
Cases and issues of scraping
• Booking.com
  • Amazing website
  • Easy navigation for the user
• Issues
  • They know!!!
  • The website gets a complete structural overhaul every 6-9 months
  • They tend to hate scrapers
  • The webpage is empty at the beginning
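The "empty at the beginning" issue can be checked for before choosing a tool. The sketch below is an assumption about how one might do it, not a description of any specific site: it measures how much visible text the raw HTML contains and flags pages that look like an empty shell filled in later by JavaScript. The sample pages, the needs_rendering helper and the 20-character threshold are all hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Accumulate text that is not inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def needs_rendering(page_text, min_chars=20):
    """Guess whether a page must be rendered before it can be scraped."""
    parser = VisibleText()
    parser.feed(page_text)
    return len(" ".join(parser.chunks)) < min_chars


# Made-up examples: a JavaScript shell and a page with content in the HTML.
SHELL = ('<html><head><script>loadResults("/api")</script></head>'
         '<body><div id="app"></div></body></html>')
FULL = ('<html><body><div class="hotel">Hotel Alpha, 120 EUR per night</div>'
        '</body></html>')

print(needs_rendering(SHELL), needs_rendering(FULL))
```

When the check says rendering is needed, a browser driver or a «headless» browser is the usual next step.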
Cases and issues of scraping
• AirBnB
  • Nice navigation
  • Full overhaul every 3 months
• Issues
  • The page really tracks what kind of user is accessing it
  • Only 13 pages are visible
  • They are randomly generated every day for the major areas
Cases and issues of scraping
• Weather
  • Many sources
  • Many formats
• Issues
  • Normalization of vocabulary
    • Bad weather == Rain == Rainy == Cloud Icon == ???
  • Normalization of ranges
  • Normalization of numbers
  • Normalization of periodicity
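A minimal sketch of the vocabulary and number normalization issues listed above. The CANONICAL mapping is a hypothetical choice, not a standard: deciding that "Cloud Icon" means "cloudy", or what "Bad weather" means at all, is exactly the judgment call the slide points at. The sketch also assumes that a temperature without a unit is already in Celsius.

```python
import re

# Hypothetical mapping from source-specific labels to one shared vocabulary.
CANONICAL = {
    "rain": "rain", "rainy": "rain", "showers": "rain",
    "sun": "clear", "sunny": "clear", "clear": "clear",
    "cloud": "cloudy", "cloudy": "cloudy", "cloud icon": "cloudy",
}


def normalize_condition(label):
    """Map a source label to the shared vocabulary, or 'unknown'."""
    key = re.sub(r"\s+", " ", label.strip().lower())
    return CANONICAL.get(key, "unknown")


def to_celsius(raw):
    """Parse strings such as '20 °C', '68F' or '20,5' into degrees Celsius.

    Assumption: a value without a unit is already in Celsius.
    """
    match = re.fullmatch(r"\s*(-?\d+(?:[.,]\d+)?)\s*°?\s*([CcFf]?)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognised temperature: {raw!r}")
    value = float(match.group(1).replace(",", "."))
    if match.group(2).lower() == "f":
        value = (value - 32) * 5 / 9
    return round(value, 1)


print(normalize_condition("  Rainy "), normalize_condition("Bad weather"))
print(to_celsius("68F"), to_celsius("20,5"), to_celsius("20 °C"))
```

Labels that are not in the mapping come back as "unknown" rather than being guessed, so ambiguous cases stay visible for a human to decide.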
Apps
Questionnaire to get the user to explicitly give data
Information-driven application to track user data
Gamification and/or information platform to elaborate and give data back
Explicit data
• Relies on the user’s knowing actions
• Requires real willing acceptance for sharing information
• Stops at political correctness
• Implies (almost always) anonymity
• Questionnaire
• In-place review
• In-place comment
• Bureaucracy
Behavioral data
• Almost always true
• Difficult to get
• Easily contextualizable
• Interactive
• Interconnected
• Application
• Platform
• Social Media integration
• Gamification
• Social Media involvement
Cool, so what can be done?
Getting Data
• Municipalities are setting up open wireless networks
  • Users can be tracked
  • Services can be offered (and instrumented)
• Museums can track users within their premises
• Social Media interactions
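As a rough illustration of what tracking within premises could yield, the sketch below turns a list of device sightings into an average dwell time per room. Everything here is hypothetical: the record layout (anonymised device id, room, timestamp), the sample values and the function names are invented for the example, and a real wireless network or museum system would export something different.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sightings: (anonymised device id, room, ISO timestamp).
SIGHTINGS = [
    ("dev-a", "hall", "2017-05-01T10:00:00"),
    ("dev-a", "hall", "2017-05-01T10:12:00"),
    ("dev-a", "gallery", "2017-05-01T10:15:00"),
    ("dev-a", "gallery", "2017-05-01T10:45:00"),
    ("dev-b", "hall", "2017-05-01T10:05:00"),
    ("dev-b", "hall", "2017-05-01T10:08:00"),
]


def dwell_minutes(sightings):
    """Minutes between first and last sighting of each device in each room."""
    spans = {}
    for device, room, stamp in sightings:
        when = datetime.fromisoformat(stamp)
        first, last = spans.get((device, room), (when, when))
        spans[(device, room)] = (min(first, when), max(last, when))
    return {key: (last - first).total_seconds() / 60
            for key, (first, last) in spans.items()}


def average_dwell_per_room(sightings):
    """Average the per-device dwell times for every room."""
    per_room = defaultdict(list)
    for (_device, room), minutes in dwell_minutes(sightings).items():
        per_room[room].append(minutes)
    return {room: sum(values) / len(values) for room, values in per_room.items()}


print(average_dwell_per_room(SIGHTINGS))
```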
Using Data
• Analysis of context of specific behaviours
• Automated storytelling for city visits
• Pricing methodologies
• Destination brand analysis
Big and Big-ish Data Tools
• The problem is computational power
• Lots of work on AI
  • Classification
  • Generation
  • Machine Learning
  • Correlations
• Data Warehouses
  • Mondrian - http://community.pentaho.com/projects/mondrian/
• Big Data DBs
  • Cassandra - http://cassandra.apache.org/
  • Hadoop - http://hadoop.apache.org/
• Big Data Search
  • BigQuery - https://cloud.google.com/bigquery/
  • GraphQL - http://graphql.org/
• Big Data AI/ML
  • TensorFlow - https://www.tensorflow.org/
  • SciPy - https://www.scipy.org/
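For the "Correlations" item, a plain-Python sketch of the Pearson correlation coefficient shows what the listed libraries compute at a much larger scale. The price and occupancy numbers are invented toy values, and the pearson helper is written out by hand here only to keep the example dependency-free.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented toy values: nightly price and occupancy rate for five nights.
prices = [80, 95, 110, 130, 150]
occupancy = [0.92, 0.88, 0.75, 0.60, 0.41]

r = pearson(prices, occupancy)
print(round(r, 3))
```

A value close to -1 or +1 only says the two series move together; it does not say which one drives the other.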
A few open questions
• Impact of crowdfunding on tourism-bound projects
• Impact of meta-search-engines on pricing
• Impact (or lack thereof) of destination information websites on user decisions
• How can the user be «vetted» in order to tailor the touristic experience around her?
  • Would such a vetting process affect customer return decisions?
One more thing: Watch out!!
Thanks! Questions?
ingmmo.com, https://medium.com/@ingmmo
sirmmo
http://it.linkedin.com/in/montanarim/
https://www.facebook.com/marco.montanari
marco.montanari