Access Open Data with Open Source Software Tools Sammy Fung [email protected]

Upload: sammy-fung

Post on 14-Jul-2015


Access Open Data with Open Source Software Tools

Sammy Fung

[email protected]

Sammy Fung

● Developer

● Founder, JobFOL

● President of Open Source Hong Kong

Creating value for us and the community

Open Data

Open Data

● Discoverable

– Available and Searchable on the Internet.

● Structured

– Open and Machine-readable Format.

● Unconditional

– Legal Framework allows anyone to reproduce and repurpose the data.

Open Source

Open Source

● Software Development Model

● Free Software (1985)

– Free = Freedom

– Run the program (Freedom 0)

– Study the source code and change it (Freedom 1)

– Redistribute copies (Freedom 2)

– Distribute copies of your modified versions (Freedom 3)

● Open Source (1998)

Open Source Web Application Software Stack

● LAMP

– Linux (1991): Operating System

– Apache (1995): Web Server

– MySQL (1995): Database Server

– PHP (1995): Server-side Scripting Language

● Other Alternatives:

– LNMP: Replacing Apache with Nginx

– Another M of LAMP: MariaDB, MongoDB

Python

● Programming Language

– Since 1991

– Widely used, general-purpose

– High-level

– Open Source

● Another P of LAMP

My Open Data related Projects

● TV Timetable of Live Football Matches (2004)

● Weather Information (2006)

● Public Transportation Information (2006)

● LegCo Vote Information (2013)

● Air Quality Information (2014)

● Restaurant Information (2014)

TCTrack

● Plot a map of typhoon paths from different observation agencies

● Google Map API

– First Typhoon Map in HK using Google API

– Sammy.HK TCTrack → Weather Underground → Hong Kong Observatory

● Twitter API

– Posting typhoon updates, starting from any potential formation of a tropical cyclone in the Northwest Pacific Ocean.

● Data Sources: HKO, JTWC.

Interview by MetroPop in 2009

Open Data on Hong Kong Restaurant & Food Licenses

Licensed Restaurants in Hong Kong

● Open Data from Data.One PSI

● Open Source Software Tools

– Python

– Scrapy Web Scraping Framework

● Source code is released on GitHub

– https://github.com/sammyfung/LP_Restaurants_Scrapy

Creating the environment for a Scrapy project

● Requirements

– Python, Python-Dev, virtualenv, pip

● Creating a virtual environment for a Python project

– virtualenv ~/env

– source ~/env/bin/activate

– pip install scrapy

Creating a Scrapy project

● Creating a new Scrapy project with spider

– scrapy startproject LP_Restaurants_Scrapy

– cd LP_Restaurants_Scrapy

– scrapy genspider rlxml fehd.gov.hk

● Creating a scrapy data model

● Doing some tests with scrapy shell.

– scrapy shell <URL>

– http://www.fehd.gov.hk/english/licensing/license/text/LP_Restaurants_EN.XML

● Writing the parse function of a scrapy spider.

● Try and test the spider

– scrapy crawl rlxml -t json -o restaurant_licenses.json

Open Data

Open Source

Creating value for us and the community