A practical approach to big data in tourism: a low cost Raspberry Pi cluster

ENTER 2015 Research Track Slide Number 1

A practical approach to big data in tourism:

a low cost Raspberry Pi cluster

Mariano d’Amore, Rodolfo Baggio, and Enrico Valdani, Bocconi University, Italy


Dedicated to all those who loved messing around with:


OSN & Big Data


• Online Social Networks (OSNs) contain innumerable trails of people’s activities

• Big Data

– an incredible opportunity for its supposed capacity to provide answers to practically any question that could be asked about people’s behaviors, views and feelings


Big Data: what

• Size, but mainly fragmentation, variability, and mixture of structured and unstructured data

• Tend to give rigorous answers to ambiguous questions

– overrate irrelevant phenomena, unjustified clusterings, etc.

• Risky correlations

– statistically significant but meaningless correlations, correlation mistaken for causation …

• Can be influenced and biased (Google bombing)

– watch out for propagation of wrong results

• Semantic analysis still too «clumsy»

– language and cultural difficulties

• Hype: as with all fashions, many excesses …

• Good complement to «traditional» research, but not a total replacement

• Still need clear objectives and rigorous data collection
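To make the «significant but meaningless correlations» point concrete, here is a toy illustration of our own (not from the slides), using only the Python standard library: two random walks generated independently of each other frequently show a sizeable Pearson correlation, simply because both drift over time. The walk length, the number of trials and the 0.5 threshold are arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def random_walk(steps, rng):
    """Running sum of independent +1/-1 steps."""
    position, path = 0, []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so runs are repeatable
    trials = 200
    # Each pair of walks is independent by construction, so any correlation
    # between them is meaningless; count how often it still looks strong.
    strong = sum(
        abs(pearson(random_walk(500, rng), random_walk(500, rng))) > 0.5
        for _ in range(trials)
    )
    print(f"{strong} of {trials} unrelated pairs have |r| > 0.5")
```

With large OSN datasets and many candidate variables, the same effect means that some strong-looking correlations will appear by chance alone.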


Big Data: how

• Data collection from OSNs via APIs:

– from providers (Gnip, Datasift, Topsy)

– plugins for other SW applications (NodeXL, Gephi)

– autonomous equipment, which requires:

  • some programming skills (Python, open source libraries)

  • dedicated hardware, since API constraints mean long elapsed times (NB: better to use several machines in parallel to avoid IP blocking)
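The long elapsed times follow directly from the quotas that OSN APIs impose on each client. Below is a minimal sketch of a client-side throttle that spaces calls evenly; the class name is ours, and the quota of 180 requests per 15 minutes is a hypothetical example, not a figure from the slides. At that rate one machine needs 5 s per call, so 1,000 calls take well over an hour, which is why spreading the work over several machines helps.

```python
import time


class MinIntervalThrottle:
    """Space out API calls evenly so the rate stays within max_calls per window.

    The quota values are placeholders: each OSN API publishes its own
    limits, which should be looked up and plugged in.
    """

    def __init__(self, max_calls, window_seconds):
        self.interval = window_seconds / max_calls  # seconds between calls
        self._last = None

    def wait(self):
        """Block until at least `interval` seconds have passed since the last call."""
        if self._last is not None:
            remaining = self.interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


if __name__ == "__main__":
    # Hypothetical quota: 180 requests per 15-minute window.
    quota = MinIntervalThrottle(max_calls=180, window_seconds=15 * 60)
    pages = 1000
    print(f"{quota.interval:.1f} s per call, "
          f"{pages * quota.interval / 3600:.1f} h for {pages} calls on one machine")

    # Short live demo with a fast, made-up limit (10 calls per second).
    demo = MinIntervalThrottle(max_calls=10, window_seconds=1.0)
    start = time.monotonic()
    for page in range(3):
        demo.wait()
        # a real collector would call an OSN client library here
        print(f"page {page} at t = {time.monotonic() - start:.2f} s")
```

Spacing calls evenly is more conservative than bursting up to the limit, but it keeps a long-running collector inside the quota without tracking window boundaries.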


Big Data research

• The research approach remains the same; the use of technology changes the process

• Without a strong background in research methodology, technology alone is useless


Our work: objective

• Build a system for autonomously accessing Big Data from OSNs

– scalable open source technology

– make the collection of OSN data manageable while focused on specific objectives

– functionality demonstration via a simple mapping ‘exercise’

• A solution:

– cluster of Raspberry Pi machines running Linux with Python programs for accessing OSN APIs
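One simple way to share collection work among the nodes of such a cluster, without a central scheduler, is a deterministic round-robin split: every node runs the same program and picks its own slice of the job list. This is a sketch under assumptions: the NODE_INDEX/NODE_COUNT environment variables and the job list are hypothetical, not the authors' configuration.

```python
import os


def tasks_for_node(tasks, node_index, node_count):
    """Return the share of `tasks` assigned to one node of the cluster.

    Round-robin split: node k takes items k, k + n, k + 2n, ... so every task
    goes to exactly one node and the nodes need no coordination at run time.
    """
    if not 0 <= node_index < node_count:
        raise ValueError("node_index must be in the range [0, node_count)")
    return tasks[node_index::node_count]


if __name__ == "__main__":
    # Hypothetical job list: one collection job per (source, query) pair.
    sources = ["facebook", "twitter", "instagram", "foursquare"]
    queries = ["Lugano"]
    jobs = [(source, query) for source in sources for query in queries]

    # NODE_INDEX and NODE_COUNT are hypothetical environment variables that
    # would be set differently on each Raspberry Pi (here: node 0 of 4).
    index = int(os.environ.get("NODE_INDEX", "0"))
    count = int(os.environ.get("NODE_COUNT", "4"))
    for job in tasks_for_node(jobs, index, count):
        print(f"node {index} handles {job}")
```

A round-robin slice is the simplest correct option; note that if the job list changes between runs, the assignment of jobs to nodes changes with it.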


Raspberry Pi: a credit-card-sized, low-cost single-board computer developed by the Raspberry Pi Foundation (www.raspberrypi.org)

• Embedded single-board computer (SBC)

– small size: 85.60 mm x 53.98 mm (credit-card size)

• Project started in 2006

– objective: build a cheap open source computer for educational purposes (cost = $35)

• Officially available on 29 February 2012 at 06:00 UTC

– original plan: 10,000 units

– sold (mid 2014): > 3 million machines


Technicalities

• SoC (System on Chip) Broadcom BCM2835

– CPU: 700 MHz ARM11 ARM1176JZF-S (ARMv6)

– no real-time clock

– GPU: Broadcom VideoCore IV, OpenGL ES 2.0, OpenVG, 1080p30 H.264 high-profile encode/decode

– SDRAM: 256/512 MB, partially shared with the GPU

• SD card as mass storage (4 GB; Class 4)

• GPIO & UART: 26-pin connector

• HDMI + RCA composite video + 3.5 mm stereo audio jack

• DSI (Display Serial Interface) & CSI (Camera Serial Interface) connectors

• 5 status LEDs; powered at 5 V DC, 1 A via micro USB connector

• Two models

– A: 256 MB SDRAM, 1 USB 2.0 port via BCM2835, NO Ethernet

– B: 512 MB SDRAM, 2 USB 2.0 ports via LAN9512, 10/100 Mbps Ethernet

– new models arriving…

• Linux-based OSs (Raspbian)

– Python, etc.

• More info on Embedded Linux Wiki (elinux.org/R-Pi_Hub)


RasPi cluster


An exercise in geolocation


Python programs


Geolocation exercise

• Retrieved all geotagged elements

– Python OSN API libraries

– Area: Lugano 5 km

– Sources: Facebook, Twitter, Instagram, Foursquare

– Time period: 2 weeks

– Heatmap: heatmap.js on OpenStreetMap

– Markers map: Google Maps API
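Reading «Lugano 5 km» as a 5 km radius around the city (our assumption), the geographic filter can be sketched with the haversine great-circle distance. The coordinates used for Lugano's centre are approximate, and the item format (dicts with lat/lon keys) is hypothetical; real OSN libraries return their own structures, which would first need to be normalized.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
LUGANO_CENTRE = (46.0037, 8.9511)  # approximate (lat, lon); an assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(items, centre=LUGANO_CENTRE, radius_km=5.0):
    """Keep the items (dicts with 'lat' and 'lon') within radius_km of centre."""
    lat0, lon0 = centre
    return [
        item for item in items
        if haversine_km(lat0, lon0, item["lat"], item["lon"]) <= radius_km
    ]


if __name__ == "__main__":
    sample = [
        {"source": "twitter", "lat": 46.0050, "lon": 8.9520},    # in Lugano
        {"source": "instagram", "lat": 45.4642, "lon": 9.1900},  # Milan, about 60 km away
    ]
    print([item["source"] for item in within_radius(sample)])  # ['twitter']
```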

In 2 weeks, the system collected:


Heatmap
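A heatmap like this one is essentially a density grid. The sketch below bins geotagged points into square cells and returns the per-cell counts together with their maximum, roughly the kind of input a heatmap.js layer takes; the cell size (0.005°, a few hundred metres at this latitude) and the field names lat/lng/count are assumptions that would have to match the layer's actual configuration.

```python
import json
from collections import Counter


def grid_counts(points, cell_deg=0.005):
    """Count (lat, lon) points per square cell of `cell_deg` degrees.

    Returns {"max": <largest cell count>, "data": [cell records]}, where each
    record holds the cell centre and the number of points that fell in it.
    """
    counts = Counter(
        (int(lat // cell_deg), int(lon // cell_deg)) for lat, lon in points
    )
    cells = [
        {"lat": (i + 0.5) * cell_deg, "lng": (j + 0.5) * cell_deg, "count": n}
        for (i, j), n in sorted(counts.items())
    ]
    return {"max": max(counts.values(), default=0), "data": cells}


if __name__ == "__main__":
    # Made-up sample points near Lugano, for illustration only.
    sample = [(46.0001, 8.9501), (46.0002, 8.9502), (46.0100, 8.9600)]
    print(json.dumps(grid_counts(sample), indent=2))
```

Smaller cells give a sharper but noisier map; larger cells smooth out isolated posts.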


Markers map

round solid=Facebook

round light=Twitter

square light=Instagram

square solid=Foursquare
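The legend above is a lookup table from source to marker style. As a sketch, it can be attached to each point as GeoJSON properties, a common interchange format for map layers; the property names shape/fill and the input item format are hypothetical, and the code that actually draws the markers with the Google Maps API is not shown.

```python
import json

# Legend of the markers map, encoded as source -> marker style.
MARKER_STYLE = {
    "facebook": {"shape": "round", "fill": "solid"},
    "twitter": {"shape": "round", "fill": "light"},
    "instagram": {"shape": "square", "fill": "light"},
    "foursquare": {"shape": "square", "fill": "solid"},
}


def to_geojson(items):
    """Turn geotagged items into a GeoJSON FeatureCollection with marker styles.

    Each item is a dict with 'source', 'lat' and 'lon'; an unknown source
    raises KeyError so that unmapped data is not silently drawn.
    """
    features = []
    for item in items:
        style = MARKER_STYLE[item["source"]]
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [item["lon"], item["lat"]]},
            "properties": {"source": item["source"], **style},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [
        {"source": "facebook", "lat": 46.0037, "lon": 8.9511},
        {"source": "foursquare", "lat": 46.0010, "lon": 8.9480},
    ]
    print(json.dumps(to_geojson(sample), indent=2))
```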


Concluding remarks

• Big Data

– good opportunity, but a number of drawbacks & issues

– one problem is the resources needed (mainly for SMEs)

• Raspberry Pi

– a usable low-cost computing system

• Field test: Raspberry Pi cluster

– low-cost (both acquisition & operations)

– «easy» implementation

  • open source libraries for accessing OSNs make the task relatively simple and affordable, also for SMEs

– now fully functional