

1

Supported by EU projects

12/12/2013, Athens, Greece

Open Data in Agriculture

Hands-on with data infrastructures that can power your agricultural data products

2

OpenLearn and the SPARQL endpoint

Maths, Computing and Technology Faculty
The Open University
Walton Hall
Milton Keynes
MK7 6AA

www.open.ac.uk
mct-research.open.ac.uk

Jane Bromley, David King, David Morse

4

*open*

5

Objectives

• An introduction to the Open University’s free material

• Show available metadata

• Talk about RDF, the format used for graph databases

• How to query the material through SPARQL

6

http://www.open.edu/openlearn/body-mind/the-real-story-behind-cereals

7

http://www.open.edu/openlearn/nature-environment/good-food-destroying-biodiversity

8

http://www.open.edu/openlearn/science-maths-technology/science/biofuels/content-section-0

9

• Open Research Online: publications originating from OU researchers
• OU Podcasts
• Course Descriptions
• Some KMi datasets
• And…

10

http://data.open.ac.uk/site/datasets.html

Available through standard formats (RDF and SPARQL)

11

Resource Description Framework
• one of the basic building blocks forming the web of semantic data
• defines a graph database
• the format defines statements comprising a subject, a predicate and an object:

Subject is the T-shirt
Predicate (property) is the colour
Object is white

The subject->predicate->object relationship is called a triple.

RDF

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:feature="http://www.linkeddatatools.com/clothing-features#">
  <rdf:Description rdf:about="http://www.linkeddatatools.com/clothes#t-shirt">
    <feature:color rdf:resource="http://www.linkeddatatools.com/colors#white"/>
  </rdf:Description>
</rdf:RDF>

RDF/XML - the XML form of RDF
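To make the link between the RDF/XML and the triple explicit, here is a minimal sketch that reads the T-shirt example with Python's standard library and prints the subject, predicate and object. The function name `simple_triples` is hypothetical, and the code only handles flat `rdf:Description` blocks like this one; it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:feature="http://www.linkeddatatools.com/clothing-features#">
  <rdf:Description rdf:about="http://www.linkeddatatools.com/clothes#t-shirt">
    <feature:color rdf:resource="http://www.linkeddatatools.com/colors#white"/>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    triples = []
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        for prop in node:
            # ElementTree writes tags as "{namespace}local"; joining the two
            # parts gives the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is either a resource URI or a literal text value.
            obj = prop.get("{%s}resource" % RDF_NS) or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    for triple in simple_triples(RDF_XML):
        print(triple)
```

For real datasets a dedicated RDF library would be the safer choice; the sketch is only meant to show where each part of the triple sits in the XML.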

12

http://data.open.ac.uk/query

The SPARQL endpoint

13

select distinct ?props
from <http://data.open.ac.uk/context/openlearn>
where { ?subj ?props ?obj }
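The same query can also be sent from a script rather than typed into the web form. The sketch below assumes the endpoint follows the standard SPARQL protocol, that is, it accepts the query text in a `query` parameter on a GET request and can return `application/sparql-results+json`; the slides do not confirm this for data.open.ac.uk. The helper names `build_request`, `run_query` and `column` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://data.open.ac.uk/query"

PROPS_QUERY = """select distinct ?props
from <http://data.open.ac.uk/context/openlearn>
where { ?subj ?props ?obj }"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request carrying the query (assumes a standard SPARQL endpoint)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, endpoint=ENDPOINT):
    """Send the query and decode the JSON result set (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return json.loads(response.read().decode("utf-8"))


def column(results, name):
    """Pull one variable out of a SPARQL JSON result set."""
    return [row[name]["value"]
            for row in results["results"]["bindings"] if name in row]


if __name__ == "__main__":
    for prop in column(run_query(PROPS_QUERY), "props"):
        print(prop)
```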

14

15

http://www.open.edu/openlearn/science-maths-technology/science/biofuels/content-section-0

16

http://data.open.ac.uk/page/openlearn/s173_1

17

How to find agriculturally useful material in OpenLearn?

18

A three-step process:

1. Find all the subjects and choose those relevant to agriculture

2. Find all the OpenLearn units that have just these subjects

3. Collect the metadata for each of the selected OpenLearn units

19

20

1130 topics as of the end of October 2013:

http://data.open.ac.uk/topic/psychology
http://data.open.ac.uk/topic/sociology
http://data.open.ac.uk/topic/social_care
http://data.open.ac.uk/topic/educational_practice
http://data.open.ac.uk/topic/biology
http://data.open.ac.uk/topic/herbicides
http://data.open.ac.uk/topic/energy official 1342688874 openlearn_teamadmin
http://data.open.ac.uk/topic/units default 1330523206 frank_siebertzz884926
http://data.open.ac.uk/topic/pre_course_work default 1263940536 linda_smithlps32
http://data.open.ac.uk/topic/employment official 1342688874 richard_howesrh4685
http://data.open.ac.uk/topic/using_maths default 1231080717 peter_mcalisterzz298445
http://data.open.ac.uk/topic/numbers default 1330523196 elizabeth_ellisee944
http://data.open.ac.uk/topic/nuclear official 1342688874 lucy_hendylmf7
http://data.open.ac.uk/topic/environmental_science
http://data.open.ac.uk/topic/audio
http://data.open.ac.uk/topic/cctv
http://data.open.ac.uk/topic/social_work
http://data.open.ac.uk/topic/scotland
http://data.open.ac.uk/topic/personalisation
http://data.open.ac.uk/topic/religious_studies
http://data.open.ac.uk/topic/religion
…

21

Topics relevant to agriculture?

40 topics chosen:

<http://data.open.ac.uk/topic/agriculture>,
<http://data.open.ac.uk/topic/environment>,
<http://data.open.ac.uk/topic/the_environment>,
<http://data.open.ac.uk/topic/nature_&amp_environment>,
<http://data.open.ac.uk/topic/environmental_science>,
<http://data.open.ac.uk/topic/herbicides>,
<http://data.open.ac.uk/topic/ecology>,
<http://data.open.ac.uk/topic/genetics>,
<http://data.open.ac.uk/topic/diversity>,
<http://data.open.ac.uk/topic/global_warming>,
<http://data.open.ac.uk/topic/biodiversity>,
<http://data.open.ac.uk/topic/pollution>,
<http://data.open.ac.uk/topic/conservation>,
<http://data.open.ac.uk/topic/the_environment>,
<http://data.open.ac.uk/topic/climate>,
<http://data.open.ac.uk/topic/environmental_studies>,
<http://data.open.ac.uk/topic/climate_change>,
<http://data.open.ac.uk/topic/sustainability>,
<http://data.open.ac.uk/topic/biogas>,
<http://data.open.ac.uk/topic/biofuels>,
<http://data.open.ac.uk/topic/photosynthesis>,
<http://data.open.ac.uk/topic/waste_management>,
<http://data.open.ac.uk/topic/landfill>,
<http://data.open.ac.uk/topic/economic_growth>,
<http://data.open.ac.uk/topic/waste>,
<http://data.open.ac.uk/topic/acid_rain>,
<http://data.open.ac.uk/topic/weather>,
<http://data.open.ac.uk/topic/meteorology>,
<http://data.open.ac.uk/topic/natural_resources>,
<http://data.open.ac.uk/topic/animals>,
<http://data.open.ac.uk/topic/ecological_sustainability>,
<http://data.open.ac.uk/topic/overfishing>,
<http://data.open.ac.uk/topic/ecosystem>,
<http://data.open.ac.uk/topic/the_end_of_nature>,
<http://data.open.ac.uk/topic/survival_of_the_fittest>,
<http://data.open.ac.uk/topic/barter>,
<http://data.open.ac.uk/topic/plants>,
<http://data.open.ac.uk/topic/freshwater>,
<http://data.open.ac.uk/topic/maps>,
<http://data.open.ac.uk/topic/food>
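The slides do not say how the 40 topics were picked out of the full list, and a manual review seems likely. As one possible aid, the sketch below pre-filters topic URIs by keyword so that a person only has to review a shortlist. The function names and the keyword list are hypothetical, and a keyword match is no substitute for judging relevance by hand.

```python
def topic_slug(uri):
    """Return the last path segment of a topic URI, e.g. 'herbicides'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def shortlist(topic_uris, keywords):
    """Keep the topics whose slug contains any of the keywords (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [uri for uri in topic_uris
            if any(keyword in topic_slug(uri).lower() for keyword in lowered)]


# A few topics taken from the list shown earlier in the slides.
SAMPLE_TOPICS = [
    "http://data.open.ac.uk/topic/psychology",
    "http://data.open.ac.uk/topic/biology",
    "http://data.open.ac.uk/topic/herbicides",
    "http://data.open.ac.uk/topic/energy",
    "http://data.open.ac.uk/topic/environmental_science",
    "http://data.open.ac.uk/topic/cctv",
]

# Hypothetical keywords; word stems are used so that plurals also match.
KEYWORDS = ["agricultur", "herbicide", "environment"]

if __name__ == "__main__":
    for uri in shortlist(SAMPLE_TOPICS, KEYWORDS):
        print(uri)
```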

22

A three-step process:

1. Find all the subjects and choose those relevant to agriculture

2. Find all the OpenLearn units that have just these subjects

3. Collect the metadata for each of the selected OpenLearn units

23

select distinct ?olu
from <http://data.open.ac.uk/context/openlearn>
where {
  ?olu <http://purl.org/dc/terms/subject> ?topic .
  filter ( ?topic in (
    <http://data.open.ac.uk/topic/agriculture>,
    <http://data.open.ac.uk/topic/environment>,
    .. .. etc.
  ) )
}

→ 85 OpenLearn units

Units are extracts from OU courses. Each has multiple pages of material and is expected to take many hours of study.
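With 40 topics, writing the filter list by hand is error-prone. A minimal sketch of generating the step 2 query from a Python list of topic URIs follows; the graph name and the `dc/terms/subject` predicate come from the query above, while the function name `units_query` is hypothetical.

```python
CONTEXT = "http://data.open.ac.uk/context/openlearn"
SUBJECT = "http://purl.org/dc/terms/subject"


def units_query(topic_uris):
    """Build the 'units with these subjects' query for a list of topic URIs."""
    if not topic_uris:
        raise ValueError("at least one topic URI is needed")
    # Each URI is wrapped in angle brackets and separated by commas,
    # matching the 'in ( ... )' list of the hand-written query.
    topics = ",\n        ".join("<{}>".format(uri) for uri in topic_uris)
    return (
        "select distinct ?olu\n"
        "from <{context}>\n"
        "where {{\n"
        "    ?olu <{subject}> ?topic .\n"
        "    filter ( ?topic in (\n"
        "        {topics}\n"
        "    ) )\n"
        "}}"
    ).format(context=CONTEXT, subject=SUBJECT, topics=topics)


if __name__ == "__main__":
    print(units_query([
        "http://data.open.ac.uk/topic/agriculture",
        "http://data.open.ac.uk/topic/environment",
    ]))
```

The printed text can be pasted into the SPARQL endpoint form as it stands.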

24

http://data.open.ac.uk/openlearn/s250_3
http://data.open.ac.uk/openlearn/sdk125_1
http://data.open.ac.uk/openlearn/t123_1
http://data.open.ac.uk/openlearn/t206_2
http://data.open.ac.uk/openlearn/t213_1
http://data.open.ac.uk/openlearn/s173_1
http://data.open.ac.uk/openlearn/u116_3
http://data.open.ac.uk/openlearn/s278_19
http://data.open.ac.uk/openlearn/t306_3
http://data.open.ac.uk/openlearn/s189_1
http://data.open.ac.uk/openlearn/s344_1
http://data.open.ac.uk/openlearn/s324_1
http://data.open.ac.uk/openlearn/s250_2
…
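The metadata-collection step works with short unit identifiers such as `s250_2` rather than full URIs. A small sketch of that conversion, assuming every unit URI starts with the prefix seen in the list above (the function name `unit_id` is hypothetical):

```python
UNIT_PREFIX = "http://data.open.ac.uk/openlearn/"


def unit_id(uri):
    """Strip the OpenLearn prefix from a unit URI, e.g. '.../s250_2' -> 's250_2'."""
    if not uri.startswith(UNIT_PREFIX):
        raise ValueError("not an OpenLearn unit URI: {}".format(uri))
    return uri[len(UNIT_PREFIX):]


if __name__ == "__main__":
    uris = [
        "http://data.open.ac.uk/openlearn/s250_3",
        "http://data.open.ac.uk/openlearn/sdk125_1",
        "http://data.open.ac.uk/openlearn/s250_2",
    ]
    print([unit_id(uri) for uri in uris])
```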

25

http://data.open.ac.uk/openlearn/s250_2

http://www.open.edu/openlearn/science-maths-technology/science/environmental-science/social-issues-and-gm-crops/content-section-0

This unit is an adapted extract from the course Science in context (S250)

26

A three-step process:

1. Find all the subjects and choose those relevant to agriculture

2. Find all the OpenLearn units that have just these subjects

3. Collect the metadata for each of the selected OpenLearn units

27

import urllib.parse
import urllib.request

# To run: python get_SPARQL_from_OpenData.py
# Edit this file in two places to choose output format as json or rdf/xml

def run_SPARQL(course_id):
    ''' returns results of SPARQL query'''
    # EDIT HERE
    # place course_id in request
    # req = urllib.request.Request('http://data.open.ac.uk/openlearn/{}'.format(course_id), headers={'Accept': 'application/rdf+json'})
    req = urllib.request.Request('http://data.open.ac.uk/openlearn/{}'.format(course_id), headers={'Accept': 'application/rdf+xml'})
    # fire off the query
    f = urllib.request.urlopen(req)
    # pass back the query result having rendered it readable first
    return(f.read().decode('utf-8'))

if __name__ == '__main__':
    llist = ['a180_2', 'b823_1', 'd837_1', 'dd100_7', 'e500_11', 'k111_1']  # … (list elided on the slide)
    for course_id in llist:
        print(course_id)
        # run query with chosen course id
        result = run_SPARQL(course_id)
        # EDIT HERE
        # with open('{}.json'.format(course_id), 'w', encoding='utf-8', newline='\n') as f:
        with open('{}.xml'.format(course_id), 'w', encoding='utf-8', newline='\n') as f:
            f.write(result)

Python script to dump the metadata

28

json format

{
  "http://data.open.ac.uk/openlearn/s250_2" : {
    "http://purl.org/dc/terms/language" : [
      { "type" : "literal" , "value" : "en-gb" , "datatype" : "http://www.w3.org/2001/XMLSchema#string" }
    ] ,
    "http://data.open.ac.uk/openlearn/ontology/relatesToCourse" : [
      { "type" : "uri" , "value" : "http://data.open.ac.uk/course/s250" }
    ] ,
    "http://purl.org/dc/terms/title" : [
      { "type" : "literal" , "value" : "Social issues and GM crops" , "datatype" : "http://www.w3.org/2001/XMLSchema#string" }
    …

rdf/xml format

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://dbpedia.org/property/"
    xmlns:j.1="http://xmlns.com/foaf/0.1/"
    xmlns:j.3="http://web.resource.org/cc/"
    xmlns:j.2="http://www.w3.org/TR/2010/WD-mediaont-10-20100608/"
    xmlns:j.4="http://purl.org/dc/terms/"
    xmlns:j.5="http://data.open.ac.uk/openlearn/ontology/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <j.1:Document rdf:about="http://data.open.ac.uk/openlearn/s250_2">
    <j.2:locator rdf:resource="http://www.open.edu/openlearn/nature-environment/the-environment/environmental-science/social-issues-and-gm-crops/content-section-0"/>
    <j.5:relatesToCourse rdf:resource="http://data.open.ac.uk/course/s250"/>
    <j.4:creator rdf:resource="http://data.open.ac.uk/organization/the_open_university"/>
    <j.4:subject rdf:resource="http://data.open.ac.uk/topic/risk"/>
    <j.4:published rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-06-02T23:00:00Z</j.4:published>
    …
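Once a unit has been saved in the json format, individual fields can be read back with the standard `json` module. The sketch below uses a shortened copy of the s250_2 record shown on this slide, closed off so that it parses; the helper name `first_value` is hypothetical, and real files will contain more properties than the three shown here.

```python
import json

UNIT = "http://data.open.ac.uk/openlearn/s250_2"
TITLE = "http://purl.org/dc/terms/title"
COURSE = "http://data.open.ac.uk/openlearn/ontology/relatesToCourse"

# Shortened sample: only the three properties visible on the slide.
SAMPLE = """
{
  "http://data.open.ac.uk/openlearn/s250_2": {
    "http://purl.org/dc/terms/language": [
      {"type": "literal", "value": "en-gb",
       "datatype": "http://www.w3.org/2001/XMLSchema#string"}
    ],
    "http://data.open.ac.uk/openlearn/ontology/relatesToCourse": [
      {"type": "uri", "value": "http://data.open.ac.uk/course/s250"}
    ],
    "http://purl.org/dc/terms/title": [
      {"type": "literal", "value": "Social issues and GM crops",
       "datatype": "http://www.w3.org/2001/XMLSchema#string"}
    ]
  }
}
"""


def first_value(document, subject, predicate):
    """Return the first value stored for subject/predicate, or None if absent."""
    values = document.get(subject, {}).get(predicate, [])
    return values[0]["value"] if values else None


if __name__ == "__main__":
    document = json.loads(SAMPLE)
    print(first_value(document, UNIT, TITLE))
    print(first_value(document, UNIT, COURSE))
```

The layout mirrors the triple structure: the outer key is the subject, each inner key is a predicate, and each list entry is an object.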

29

Summary:

A three-step process:

1. Find all subjects/keywords relevant to agriculture
2. Identify OpenLearn units with these subjects
3. Collect the metadata for each OpenLearn unit

All the scripts (and more) are available