BBC Programmes Ontology XTech2008

Upload: tom-scott

Post on 01-Nov-2014

DESCRIPTION

An overview of the BBC's work on exposing an API for programme metadata, as presented at XTech08. More information on the Radio Labs blog: http://www.bbc.co.uk/blogs/radiolabs/2008/05/helping_machines_play_with_pro.shtml

TRANSCRIPT

Page 1: BBC Programmes Ontology XTech2008

The Programmes ontology

Tom Scott & Nick Humfrey

BBC FM&T for Audio & Music

Clearly the BBC broadcasts a lot of programmes both on TV and Radio - but until recently we haven’t done a wonderful job of supporting these programmes on the web.

What we would like to talk to you about today is a recent project - to provide all BBC programmes a permanent web presence and to formally describe and model this information through an ontology.

Page 2: BBC Programmes Ontology XTech2008

Historically

The BBC has been providing programme support for some time

The BBC has been building programme support websites for a while now - all the radio networks have websites and the major TV shows also have sites. So the BBC does have a legacy of providing programme support - unfortunately there are problems.

For starters it’s not complete. Most programmes don’t have webpages, and the support sites that do exist aren’t comprehensive: they don’t provide a complete record of a programme. For example, we generally haven’t published episode pages.

So there are gaps. Also...

Page 3: BBC Programmes Ontology XTech2008

But the data has been siloed...

Historically

flickr.com/photos/sheeshoo/13902422/

The data has been siloed - it’s not easily accessible to the rest of the web.

The data is trapped within a bunch of web pages - with little or no thought for machine consumption.

Historically our focus has been on building microsites for end users: hand-crafted HTML pages, built without consideration for their semantic mark-up, let alone for exposing the data in a structured fashion via an API.

Page 4: BBC Programmes Ontology XTech2008

...and ephemeral

Historically

banksy.co.uk

And unfortunately we also have a tendency to delete our web pages or reuse our URLs.

So the data we do make available doesn’t hang around on the web.

Page 5: BBC Programmes Ontology XTech2008

...which is a shame because we broadcast between 1,000 and 1,500 programmes a day

Historically

flickr.com/photos/jamescridland/18768141/

During the week, and excluding local radio programmes, the BBC broadcasts about 1,500 programmes a day.

So if we can solve these problems then we can make a lot of useful metadata and media available to the web.

And this is where BBC Programmes comes in.

Page 6: BBC Programmes Ontology XTech2008

/programmes

BBC Programmes, launched in October, aims to solve these problems

We launched BBC Programmes last October with the aim of giving every programme a permanent, findable place on the web.

We also wanted to make it a fully fledged Web 2.0 citizen, exposing BBC Programmes using the principles of Linked Data.

We wanted to make a RESTful API exposing the data in a variety of formats for people and machines.

Page 7: BBC Programmes Ontology XTech2008

One page per programme

At the heart of this service is the idea of one URL per programme or, more accurately, a page for every episode, series and programme brand.

All with unique IDs - and persistent URLs

Page 8: BBC Programmes Ontology XTech2008

Useful aggregations

Date (schedule)

Genre

Format

A-Z

Topic (coming soon)

We’re also publishing a number of aggregations, for example by genre, format and schedule.

And in the very near future by subject or topic and music artist.

Page 9: BBC Programmes Ontology XTech2008

Design approach

Identify your resources

Make them addressable

Combine resources to make a page

banksy.co.uk

I wanted to touch on our design approach because it has made our lives very much easier when developing the ontology and in creating alternate views.

How we’ve gone about this is to first identify our resources - those things that we want to reuse or let people consume.

Once we’ve done that, we need to make them addressable at persistent URLs, so that each resource, each concept, has its own URL.

And then we combine those resources into public-facing web pages. In other words, each of our web pages is made up of reusable resources.

Page 10: BBC Programmes Ontology XTech2008

Resources

So for example if you take an episode page it’s made up of...

Page 11: BBC Programmes Ontology XTech2008

Resources

Embedded Media Player (iPlayer)

The embedded iPlayer

Page 12: BBC Programmes Ontology XTech2008

Resources

bbc.co.uk/programmes/:id/credits

A list of cast and crew. Addressable at /programmes/:id/credits

Page 13: BBC Programmes Ontology XTech2008

Resources

bbc.co.uk/programmes/:id/broadcasts

And a list of broadcasts at /programmes/:id/broadcasts (not currently linked in)
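As a minimal Python sketch, the addressable resources behind an episode page can be derived from its ID alone. Only the `/programmes/:id`, `/credits` and `/broadcasts` paths come from the slides; the `http://www.` prefix, the helper name `resource_urls` and the dictionary keys are hypothetical.

```python
# Scheme and www prefix are assumed; the slides write bbc.co.uk/programmes.
BASE = "http://www.bbc.co.uk/programmes"

# Path suffixes from the talk; the keys are hypothetical labels.
SUB_RESOURCES = {
    "episode": "",
    "credits": "/credits",
    "broadcasts": "/broadcasts",
}


def resource_urls(pid: str) -> dict[str, str]:
    """Return the URL of each addressable resource for one programme."""
    return {name: f"{BASE}/{pid}{suffix}" for name, suffix in SUB_RESOURCES.items()}
```

Because every sub-resource hangs off the same persistent ID, any page (or machine view) can be assembled by fetching whichever of these it needs.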

Page 14: BBC Programmes Ontology XTech2008

Mobile representations

Different set of resources

Different view

This makes creating alternate representations relatively straightforward.

We’re working on a mobile view. But rather than creating an entirely separate parallel site or applying a mobile CSS stylesheet to the existing site, we adopted a third option: we pick and choose from a common set of resources but combine them in a different fashion to create a mobile site optimised for that platform.

And the same logic applies when creating the RSS, RDF or any other machine views.
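The "different set of resources, different view" idea can be sketched in Python as one shared pool of resource renderers, with each view simply choosing and ordering a subset. The renderer functions, the view names and the particular selections are hypothetical; this illustrates the composition pattern, not the BBC's implementation.

```python
from html import escape
from typing import Callable

# Hypothetical stand-in for an episode record pulled from the database.
Episode = dict


def render_player(ep: Episode) -> str:
    # Placeholder markup for the embedded media player.
    return f'<div class="player" data-pid="{escape(ep["pid"])}"></div>'


def render_synopsis(ep: Episode) -> str:
    return f"<p>{escape(ep['synopsis'])}</p>"


def render_credits(ep: Episode) -> str:
    items = "".join(f"<li>{escape(name)}</li>" for name in ep["credits"])
    return f"<ul>{items}</ul>"


# One shared pool of resources...
RESOURCES: dict[str, Callable[[Episode], str]] = {
    "player": render_player,
    "synopsis": render_synopsis,
    "credits": render_credits,
}

# ...and views that pick and choose from it (selections are illustrative).
VIEWS: dict[str, list[str]] = {
    "desktop": ["player", "synopsis", "credits"],
    "mobile": ["synopsis", "credits"],
}


def render_page(view: str, ep: Episode) -> str:
    """Combine the resources a view asks for into one page body."""
    return "\n".join(RESOURCES[name](ep) for name in VIEWS[view])
```

A machine view would follow the same pattern, with renderers that emit JSON or RDF instead of HTML fragments.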

Page 15: BBC Programmes Ontology XTech2008

JSON

YAML

XML

RDF

Microformats

Linked data

flickr.com/photos/jazzmasterson/3038597/

All we need to do is decide which resources need to be made available and a set of specific views to create different representations of this data for machines to consume.

We’re in the process of making RDF, JSON, RSS, Atom and iCal views. Oh, and by the way, all the HTML pages are marked up with microformats.

We are doing all this with the objective of making it as easy as possible for people to integrate with our content.

And this resource-centric approach has made this job easier. It also made designing the ontology very straightforward, because in essence the resources were the classes within the ontology.

We designed the site to be a web of data from the outset.

Page 16: BBC Programmes Ontology XTech2008

The ontology

Brands

Series

Episodes

Programme

Service

Version

Event Broadcast

Content

Publishing

purl.org/ontology/po/

(Creative Commons license)

This then is the ontology.

It’s released under a Creative Commons license, so anyone is free to use it or modify it.

It provides web identifiers for the concepts that make up a TV or radio programme, such as brand, series and episode. It’s divided into two main parts. First, it captures categorical information about programmes and the relations between such categories.

Page 17: BBC Programmes Ontology XTech2008

The ontology

Brands

Series

Episodes

Programme

Service

Content

So the content.

A programme must have at least one episode, but multiple episodes can be grouped into one or more series or a brand.

You can think of series and brands as applying a decorator pattern to the episode.

In any case an episode or set of episodes grouped into series and brands constitutes a programme which in turn is then owned by a service, such as Radio 1 or BBC 2.
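The content half of the model can be sketched with Python dataclasses. The class names mirror the ontology's concepts (Brand, Series, Episode, Service), but the fields, the `parent` links and the `ancestors` helper are a simplified, hypothetical illustration of the grouping described above; the ontology's actual properties are defined at purl.org/ontology/po/.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Service:
    name: str                  # e.g. "Radio 1" or "BBC Two"


@dataclass
class Brand:
    title: str
    service: Service           # simplification: the owning service hangs off the brand


@dataclass
class Series:
    title: str
    # A series sits inside a brand, or inside another series (a sub series).
    parent: Optional[Union[Brand, Series]] = None


@dataclass
class Episode:
    title: str
    # An episode may stand alone or be grouped into a series or a brand.
    parent: Optional[Union[Brand, Series]] = None


def ancestors(ep: Episode) -> list[Union[Brand, Series]]:
    """Walk up from an episode through any series to the brand, if present."""
    chain = []
    node = ep.parent
    while node is not None:
        chain.append(node)
        # Only a Series has a parent; a Brand is the top of the hierarchy.
        node = node.parent if isinstance(node, Series) else None
    return chain
```

Note that nothing here mentions a broadcast: the content objects exist independently of when or where they go out.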

Page 18: BBC Programmes Ontology XTech2008

Brand

Take, for example, the Waking the Dead programme brand.

Page 19: BBC Programmes Ontology XTech2008

Series

Which is currently in its 7th series.

Page 20: BBC Programmes Ontology XTech2008

Sub series

Each weekly story is divided into...

Page 21: BBC Programmes Ontology XTech2008

Episode

Two individual episodes.

All these are independent of the broadcast, and it is these content objects that are the primary, addressable resources within BBC Programmes, not the individual broadcasts.

Page 22: BBC Programmes Ontology XTech2008

Episode

It’s worth noting at this point that the idea is not for Programmes to become a new brand or portal.

Rather all the major brands - those with existing sites - will incorporate these pages into their sites.

Page 23: BBC Programmes Ontology XTech2008

The ontology

Service

Version

Event Broadcast

Publishing

The other half of the ontology addresses the publishing of that content.

An episode can have multiple versions - for example, Torchwood has an adult and a children’s version.

A service broadcasts a version.
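The publishing half can be sketched the same way in Python: an episode has one or more versions, and a broadcast is an event that puts one version out on one service. The field names, the `start` time and the `broadcasts_of` helper are hypothetical; in the ontology itself these relationships are RDF properties rather than Python attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Version:
    label: str                 # e.g. "adult" or "children's"


@dataclass
class Episode:
    title: str
    versions: list[Version] = field(default_factory=list)


@dataclass
class Broadcast:
    version: Version           # what went out
    service: str               # where it went out
    start: datetime            # when it went out


def broadcasts_of(ep: Episode, schedule: list[Broadcast]) -> list[Broadcast]:
    """Return every broadcast of any version of the given episode.

    Versions are compared by identity, so two episodes whose versions
    happen to share a label are not confused with each other.
    """
    return [b for b in schedule if any(b.version is v for v in ep.versions)]
```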

Page 24: BBC Programmes Ontology XTech2008

URL design

bbc.co.uk{/:service}/programmes/genres/:genre

bbc.co.uk/bbcone/programmes/genres/music

Our primary focus when it came to designing the URLs for BBC Programmes was to ensure that they remain persistent – with each URL representing a single concept.

What we’ve ended up with are two classes of page: aggregations and objects.

The aggregations are human-readable, easily hackable and return a list of objects. They include schedule views and aggregations by genre, format and A to Z.

For example, here is the URL for a list of music programmes on BBC One.
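The aggregation template `bbc.co.uk{/:service}/programmes/genres/:genre` has an optional service segment. A small Python helper that expands it might look like this; the function name and the `http://www.` prefix are assumptions, and only the path shape comes from the slide.

```python
from typing import Optional
from urllib.parse import quote

HOST = "http://www.bbc.co.uk"  # scheme and www prefix assumed


def genre_url(genre: str, service: Optional[str] = None) -> str:
    """Expand bbc.co.uk{/:service}/programmes/genres/:genre.

    The braces mark the service segment as optional: leave it out
    to aggregate across every service.
    """
    prefix = f"/{quote(service)}" if service else ""
    return f"{HOST}{prefix}/programmes/genres/{quote(genre)}"
```

Calling `genre_url("music", "bbcone")` yields the BBC One music URL shown on the slide; dropping the service argument gives the cross-service aggregation.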

Page 25: BBC Programmes Ontology XTech2008

URL design

bbc.co.uk/programmes/:id

bbc.co.uk/programmes/b00b257s

The objects, however, follow a simpler pattern: bbc.co.uk/programmes/:id

Where :id is an eight-character alphanumeric code.

We’ve been asked before why the URLs are opaque.

Our decision to use opaque URLs was driven by the need to provide persistent identifiers. If we had included information about the programme brand (e.g. the Today Programme) or service (e.g. Radio 3), then there would have been a high risk that the URLs would either have changed or no longer reflected the ‘owning’ brand, since programmes are often rebroadcast on different services. By stripping the URL back to a unique identifier, we removed any future (or current) ownership issues.
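Because the identifier is the whole of the object URL, a client can treat it as an opaque token and only check its shape. A minimal Python sketch, assuming identifiers are exactly eight lowercase letters or digits (as in `b00b257s`); the lowercase rule, the `http://www.` prefix and the names `PID_PATTERN` and `object_url` are assumptions.

```python
import re

# Assumed shape: exactly eight lowercase ASCII letters or digits.
PID_PATTERN = re.compile(r"[0-9a-z]{8}")


def object_url(pid: str) -> str:
    """Build bbc.co.uk/programmes/:id after checking the id's shape.

    The id is deliberately opaque: no brand or service is encoded in it,
    so nothing is parsed out of it here.
    """
    if not PID_PATTERN.fullmatch(pid):
        raise ValueError(f"not a programme id: {pid!r}")
    return f"http://www.bbc.co.uk/programmes/{pid}"
```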

Page 26: BBC Programmes Ontology XTech2008

Show me the data

Each representation is addressed by appending the name of the appropriate serialization to the end of the URL

Currently we have XML, YAML and JSON representations for the various schedules.

These can be accessed by adding the name of the appropriate serialisation to the end of the URL.
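Addressing a serialisation can be sketched as a tiny Python client. The format names are the ones mentioned for the schedules (XML, YAML and JSON); the helper names and the use of `urllib.request` to fetch are illustrative, and whether a given URL still serves a given format is not guaranteed.

```python
from urllib.request import urlopen

FORMATS = {"xml", "json", "yaml"}  # schedule serialisations named in the talk


def representation_url(url: str, fmt: str) -> str:
    """Address another representation by appending the format name."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{url.rstrip('/')}.{fmt}"


def fetch(url: str, fmt: str) -> bytes:
    """Download a representation (requires network access)."""
    with urlopen(representation_url(url, fmt)) as resp:
        return resp.read()
```

For example, `representation_url("http://www.bbc.co.uk/radio4/programmes/schedules/fm", "xml")` produces the Radio 4 FM schedule URL used on the following slides.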

Page 28: BBC Programmes Ontology XTech2008

YAML

This is what you get

Page 29: BBC Programmes Ontology XTech2008

Show me the XML

bbc.co.uk/radio4/programmes/schedules/fm.xml

Or Radio 4’s FM schedule as XML

Page 30: BBC Programmes Ontology XTech2008

XML

Returns this

Page 31: BBC Programmes Ontology XTech2008

RDF

Not quite live yet (sorry)

Try it out at:

http://bbc-programmes.dyndns.org

Page 32: BBC Programmes Ontology XTech2008

RDF

Page 33: BBC Programmes Ontology XTech2008

RDF

Page 34: BBC Programmes Ontology XTech2008

RDF

Page 35: BBC Programmes Ontology XTech2008

radio POP

The near future

More representations

SPARQL interface

APML

XMPP

Music and more

Sky Captain and the World of Tomorrow

In the very near future we’ll be launching more representations (for example iCal, Atom and RSS) for all of the views, not just the current limited set.

Page 36: BBC Programmes Ontology XTech2008

SPARQL

D2R Server

Maps relational database to RDF

Around 5 million RDF triples

By using D2R to map the relational database to RDF we can expose the data via a SPARQL end point.
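D2R does this mapping declaratively, from a mapping file. The underlying idea can be illustrated with a few lines of Python that turn one row of a hypothetical `episodes` table into N-Triples, using the ontology namespace from the earlier slide. The table layout, the column names, the `#programme` subject URI pattern and the use of `dc:title` are all assumptions; this is not D2R's mapping language.

```python
PO = "http://purl.org/ontology/po/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal (backslash first)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def episode_row_to_ntriples(row: dict) -> list[str]:
    """Map one row of a hypothetical episodes table to N-Triples lines."""
    subject = f"<http://www.bbc.co.uk/programmes/{row['pid']}#programme>"
    return [
        f"{subject} <{RDF_TYPE}> <{PO}Episode> .",
        f"{subject} <{DC_TITLE}> {literal(row['title'])} .",
    ]
```

Applied to every row of every mapped table, this kind of translation is what produces a triple count in the millions.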

Through the use of SPARQL, we can query the data using a variety of constraints that cannot be easily expressed through the Programmes web interface. We are also able to semantically connect to external data sources such as DBpedia to provide extra information that is not present in our dataset, such as date and place of birth of cast members.
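A sketch of the kind of query this enables, built in Python and sent using the standard SPARQL protocol `query` parameter. The endpoint URL is hypothetical, and the property names (`po:episode`, `po:version`, `po:broadcast_of`, `po:broadcast_on`) should be checked against the ontology at purl.org/ontology/po/ before relying on them; treat them as assumptions here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; no SPARQL endpoint address is given in the talk.
ENDPOINT = "http://example.org/sparql"

# Episodes of a brand or series that went out on a given service.
QUERY_TEMPLATE = """\
PREFIX po: <http://purl.org/ontology/po/>
SELECT DISTINCT ?episode WHERE {{
  <{container}> po:episode ?episode .
  ?episode po:version ?version .
  ?broadcast po:broadcast_of ?version ;
             po:broadcast_on <{service}> .
}}"""


def episodes_on_service_query(container: str, service: str) -> str:
    """Fill in the template, refusing anything that could break out of <...>."""
    for uri in (container, service):
        if any(ch in uri for ch in '<>" \n'):
            raise ValueError(f"not an IRI: {uri!r}")
    return QUERY_TEMPLATE.format(container=container, service=service)


def request_url(query: str) -> str:
    """GET URL for the SPARQL protocol: the query travels in `query`."""
    return f"{ENDPOINT}?{urlencode({'query': query})}"
```

This is exactly the sort of cross-cutting constraint (content, version, broadcast and service at once) that would be awkward to express through the HTML URL scheme.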

Page 37: BBC Programmes Ontology XTech2008

APML

Programmes pay attention to artists and topics

bbc.co.uk/programmes/:id/apml

flickr.com/photos/jonasb/1429721252/

Programmes can be thought of as paying attention to music artists and topics. For a while now we’ve been feeding the artist data into Last.fm.

The plan is to expose this and data about topics (people, places, time periods and subjects) as APML.

Page 38: BBC Programmes Ontology XTech2008

XMPP Pubsub

Allows a person or application to publish information so that an event notification (with or without payload) is broadcast to all authorized subscribers

http://flickr.com/photos/kaelr/2078003992/

We’re also investigating the use of XMPP as a way of pushing EPG data to interested devices: both programme metadata and 'on now' notifications.

Of course, this does mean that you need a pubsub client, i.e. one that can handle the XML output, and there aren’t very many of those!

Page 39: BBC Programmes Ontology XTech2008

XMPP Publish Message

If you do get your hands on a client this is what you get.

At the start of every radio broadcast XMPP publishes metadata about that show to its station's node, wrapped in an Atom Entry. For your Linked Data entertainment it's also serialised as Turtle RDF conforming to the Programmes Ontology.
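On the receiving side, a client could unpack such a notification with the Python standard library alone. This sketch assumes the Turtle is carried in the Atom entry's `content` element; which element holds it isn't stated here, so that choice, like the function name `unpack_entry`, is an assumption. The Atom namespace itself is the standard one.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def unpack_entry(entry_xml: str) -> tuple[str, str]:
    """Return (title, turtle) from an Atom entry carried in a pubsub item.

    Assumes the Programmes Ontology Turtle sits in <content>.
    """
    entry = ET.fromstring(entry_xml)
    if entry.tag != f"{ATOM}entry":
        raise ValueError("not an Atom entry")
    title = entry.findtext(f"{ATOM}title", default="")
    turtle = entry.findtext(f"{ATOM}content", default="")
    return title.strip(), turtle.strip()
```

The returned Turtle string can then be handed to any RDF parser for Linked Data processing.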

If you would like to find out more have a look at the BBC’s Radio Labs blog.

Page 40: BBC Programmes Ontology XTech2008

XMPP Pubsub

Extra bonus demo... [last minute hacking from Patrick]

http://flickr.com/photos/kaelr/2078003992/

Programmes ontology over XMPP notifications via Growl

Page 41: BBC Programmes Ontology XTech2008

More of the graph

Music, events, people, topics and more

What I’ve been talking about today is the work on Programmes - we are also working on exposing music information in a similar fashion. One page for every artist the BBC plays.

And Ben Smith, who is also talking at XTech, is working on the user end of the graph.

That leaves Topics - which will be connected to Programmes next month. And events - live events - which is a little way off.

And we’re in the process of adding:
- World Service
- The back catalogue (75 years of data)
- Local Radio

Page 42: BBC Programmes Ontology XTech2008

Thanks to

Michael Smethurst Patrick Sinclair

Yves Raimond Matt Wood

Paul Clifford Duncan Robertson

Jamie Tetlow Rija Menage

Steve Butler

Finally we would like to thank those that have done most of the work.

And if you are interested in joining us, we’re currently hiring software engineers.

Page 43: BBC Programmes Ontology XTech2008

Questions?