a practical approach to big data in tourism: a low cost raspberry pi cluster
Upload: international-federation-for-information-technologies-in-travel-and-tourism-ifitt
Post on 15-Jul-2015
ENTER 2015 Research Track Slide Number 1
A practical approach to big data in tourism:
a low cost Raspberry Pi cluster
Mariano d’Amore, Rodolfo Baggio, and Enrico Valdani
Bocconi University, Italy
OSN & Big Data
• Online Social Networks (OSNs) contain innumerable trails of people’s activities
• Big Data
– an incredible opportunity for its supposed capacity to provide answers to practically any question that could be asked about people’s behaviors, views and feelings
Big Data: what
• Size, but mainly fragmentation, variability, and mixture of structured and unstructured data
• Tend to give rigorous answers to ambiguous questions
– overrate irrelevant phenomena, unjustified clusterings etc.
• Risky correlations
– significant but meaningless correlations, correlation/causation …
• Can be influenced and biased (Google bombing)
– watch out for propagation of wrong results
• Semantic analysis still too «clumsy»
– language and cultural difficulties
• Hype: as for all fashions: many excesses …
• Good complement to «traditional» research, but no total replacement
• Still need clear objectives and rigorous data collection
Big Data: how
• Data collection from OSNs via APIs:
– from providers (Gnip, Datasift, Topsy)
– plugins for other SW applications (NodeXL, Gephi)
– autonomous equipment
• some programming skills (open source libraries)
– Python
• dedicated hardware
– long elapsed times due to API constraints
– NB: better to use more machines in parallel to avoid IP blocking
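The long elapsed times mentioned above come from having to space requests out to stay within an API's limits. A minimal sketch of that pacing follows. It is an illustration, not the authors' code: the `Pacer` class, the `collect` helper, the `fetch` callable and any limit such as 15 calls per 15 minutes are assumptions made for the example, and real limits differ by provider.

```python
import time
from typing import Callable, Iterable, List, Optional


class Pacer:
    """Space calls evenly so at most `max_calls` happen per `window_s` seconds."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_gap = window_s / max_calls  # seconds between two calls
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None   # time of the previous call

    def wait(self) -> None:
        """Block until the next call is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_gap - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def collect(queries: Iterable[str],
            fetch: Callable[[str], List[dict]],
            pacer: Pacer) -> List[dict]:
    """Run each query through `fetch` (a hypothetical API wrapper), paced."""
    results: List[dict] = []
    for query in queries:
        pacer.wait()
        results.extend(fetch(query))
    return results
```

With an assumed limit of 15 calls per 15 minutes, one machine issues one request per minute. This is why spreading queries over several machines shortens the total collection time.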
Big Data research
• Research approach remains the same; the use of technology makes the process different
• Without a strong background in research methodology, use of technology alone is useless
Our work: objective
• Build a system for accessing OSNs Big Data autonomously
– scalable open source technology
– make the collection of OSN data manageable while focused on specific objectives
– functionality demonstration via a simple mapping ‘exercise’
• A solution:
– cluster of Raspberry Pi machines running Linux with Python programs for accessing OSN APIs
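One simple way to share collection work among the nodes of such a cluster is round-robin assignment, sketched below. The slides do not say how the authors distributed tasks, so this is only an assumed scheme; the function name and the node names `pi-01`, `pi-02` in the usage note are hypothetical.

```python
from itertools import cycle
from typing import Dict, Hashable, List, Sequence


def assign_round_robin(tasks: Sequence[Hashable],
                       nodes: Sequence[str]) -> Dict[str, List[Hashable]]:
    """Deal tasks out to nodes in turn, like dealing cards."""
    if not nodes:
        raise ValueError("at least one node is required")
    plan: Dict[str, List[Hashable]] = {node: [] for node in nodes}
    # cycle() repeats the node list for as long as tasks remain
    for task, node in zip(tasks, cycle(nodes)):
        plan[node].append(task)
    return plan
```

For example, `assign_round_robin(["facebook", "twitter", "instagram", "foursquare"], ["pi-01", "pi-02"])` gives each of the two hypothetical nodes two sources to query.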
A credit card sized single-board low-cost computer developed by the Raspberry Pi Foundation (www.raspberrypi.org)
• Embedded Single Board Computer (SBC)
– small size: 85.60 mm x 53.98 mm (credit-card size)
• Project started in 2006
– objective: build a cheap open source computer for educational purposes (cost = $35)
• Officially available on 29 February 2012 at 6.00 UTC
– original plan: 10 000 pieces
– sold (mid 2014) > 3 million machines
Technicalities
• SoC (System on Chip) Broadcom BCM2835
– CPU 700 MHz ARM11 ARM1176JZF-S (ARMv6)
– No real time clock
– GPU Broadcom VideoCore IV, OpenGL ES 2.0, OpenVG, 1080p30 H.264 high-profile encode/decode
– SDRAM 256/512MB partially shared with GPU
• SDcard as mass memory (4GB; Class 4)
• GPIO & UART 26 pin connector
• HDMI + RCA Video Composite + 3.5mm stereo jack audio
• DSI (Display Serial Interface) & CSI (Camera Serial Interface) connectors
• 5 status LEDs, Powered DC 5V, 1A via Micro USB connector
• Two models
– A: 256MB SDRAM, 1 USB2 port via BCM2835, NO ethernet
– B: 512MB SDRAM, 2 USB2 ports via LAN9512, Ethernet 10/100Mbps
– new models arriving…
• Linux based OSs (Raspbian)
– Python, etc.
• More info on Embedded Linux Wiki (elinux.org/R-Pi_Hub)
Geolocation exercise
• Retrieved all geotagged elements
– Python OSN API libraries
– Area: Lugano 5 km
– Sources: Facebook, Twitter, Instagram, Foursquare
– Time period: 2 weeks
– Heatmap: heatmap.js on OpenStreetMap
– Markers map: Google Maps API
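The area filter and the heatmap preparation listed above can be sketched as below. The sketch rests on several assumptions not stated in the slides: that "Lugano 5 km" means a 5 km radius around the city centre, that the centre lies at roughly 46.0037 N, 8.9511 E, that each element is a dict with `lat` and `lon` keys, and that the heatmap input is a count per grid cell. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0
LUGANO = (46.0037, 8.9511)  # approximate city centre (assumed value)


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(points: List[Dict[str, float]],
                  centre: Tuple[float, float],
                  radius_km: float) -> List[Dict[str, float]]:
    """Keep only the geotagged points no farther than radius_km from centre."""
    return [p for p in points
            if haversine_km((p["lat"], p["lon"]), centre) <= radius_km]


def bin_counts(points: List[Dict[str, float]],
               cell_deg: float = 0.005) -> Counter:
    """Count points per square grid cell of side cell_deg degrees."""
    counts: Counter = Counter()
    for p in points:
        cell = (math.floor(p["lat"] / cell_deg),
                math.floor(p["lon"] / cell_deg))
        counts[cell] += 1
    return counts
```

The per-cell counts would still have to be converted to whatever input format the chosen mapping library expects; that step is left out here.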
in 2 weeks, collected:
Markers map
round solid=Facebook
round light=Twitter
square light=Instagram
square solid=Foursquare
Concluding remarks
• Big Data
– good opportunity, but a number of drawbacks & issues
– one problem is resources needed (mainly for SMEs)
• Raspberry Pi
– a usable low-cost computing system
• Field test: Raspberry Pi cluster
– low-cost (both acquisition & operations)
– «easy» implementation
• open source libraries for accessing OSNs make the task relatively simple and affordable, also for SMEs
– now fully functional