
Post on 30-Dec-2015


Spire News

Joel Sachs

[email protected]

Spire: Semantic Prototypes in Ecoinformatics

[Diagram: project partners UMBC Ebiquity, UMD MIND SWAP, NASA GSFC, RMBL, PEaCE, UC Davis ICE, and NBII, shown alongside the project's components: Semantic Web Tools; Agents; Information Retrieval; Invasive Species Forecasting System; Remote Sensing Data; Food Webs; Semantic CAIN (Ontology Development, Dissemination); and the Ontology of Ecological Interaction. The diagram also carries the labels "Prototype applications" and "Infrastructure".]

Overview of Talk

• What (and why) is the semantic web?
– History
– The tragic legacy of ontologies
– Hope for the future
• Some Spire achievements
– ELVIS, ETHAN, Swoogle, Tripleshop, RDF123
• Semantic eco-blogging
– Spotter, Splickr, Fieldmarking
– Bioblitzes
• Linked Data
– Why? How?
• A tiny data browsing demo

Semantic Web?

• The Semantic Web arose out of a confluence of three communities:
– Hypertext; AI; Electronic publishing
• The AI component achieved early dominance:
– Knowledge representation; Ontologies; First-order logic, etc.
• This was exciting for some, and confounding for others.

The next 3 slides are from “The Suggested Upper Merged Ontology (SUMO) at Age 7: Progress and Promise”, by Adam Pease

High Level Distinctions

The first fundamental distinction is that between ‘Physical’ (things which have a position in space/time) and ‘Abstract’ (things which don’t).

[Diagram: Entity, subdivided into Physical and Abstract]

High Level Distinctions

Partition of ‘Physical’ into ‘Objects’ and ‘Processes’

[Diagram: Physical, subdivided into Object and Process]

[Diagram: the subclass hierarchy under Process, showing DualObjectProcess, Substituting, Transaction, Comparing, Attaching, Detaching, Combining, Separating, InternalChange, BiologicalProcess, QuantityChange, Damaging, ChemicalProcess, SurfaceChange, Creation, StateChange, ShapeChange, IntentionalProcess, IntentionalPsychologicalProcess, RecreationOrExercise, OrganizationalProcess, Guiding, Keeping, Maintaining, Repairing, Poking, ContentDevelopment, Making, Searching, SocialInteraction, Maneuver, Motion, BodyMotion, DirectionChange, Transfer, Transportation, Radiating]

Interoperability through Simplicity

Spire So Far: Ontologies

• “The Big Experiment”

– A collection of linked ontologies enabling highly detailed descriptions of ecological interaction.

– Supports WoW - Webs on the Web

• SpireEcoConcepts

– Medium size. Used for expressing trophic links and related information, including bibliographic info on studies.

• ETHAN

– Evolutionary trees and natural history.

– Huge.

• Observation ontology

– For semantic eco-blogging.

– Tiny.

• Invasives ontology

– Lightweight and extensible in the most trivial of manners.

ETHAN Engineering

• The semantics behind an arbitrary relation can often be expressed using the rdfs:subClassOf relation, rather than by defining a new rdf:Property. Doing so has a number of benefits:
– It seems to be more computationally efficient. (We have no hard evidence for this yet.)
– It makes it easy to introduce a new concept, especially in a distributed manner. (See our discussion of conservation information below.)
– It leads to fewer disagreements among scientists and, therefore, a greater chance of ontology adoption. (We have anecdotal evidence for this.)
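To make the subClassOf idea concrete, here is a minimal Python sketch (stdlib only). The class names are hypothetical and not taken from ETHAN: instead of a property such as eats(x, Fish), "eats fish" is modeled as a class Piscivore, and an individual's inferred types come from following rdfs:subClassOf links transitively, the kind of inference an RDFS reasoner performs.

```python
from collections import deque

# Hypothetical axioms: (subclass, superclass) pairs. Instead of an
# eats(x, Fish) property, "eats fish" is modeled as the class Piscivore.
SUBCLASS_OF = [
    ("Piscivore", "Carnivore"),
    ("Carnivore", "Consumer"),
    ("Consumer", "Organism"),
]


def superclasses(cls, axioms):
    """All classes reachable from cls by following rdfs:subClassOf (transitive closure)."""
    parents = {}
    for sub, sup in axioms:
        parents.setdefault(sub, []).append(sup)
    seen, queue = set(), deque([cls])
    while queue:
        for sup in parents.get(queue.popleft(), []):
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return seen


def inferred_types(asserted_type, axioms):
    """An individual typed with asserted_type is also typed with every superclass."""
    return {asserted_type} | superclasses(asserted_type, axioms)


print(sorted(inferred_types("Piscivore", SUBCLASS_OF)))
# ['Carnivore', 'Consumer', 'Organism', 'Piscivore']
```

Adding a new concept is then just one more (subclass, superclass) pair, which another group could publish in its own file.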

A Brief Tour of Some Relevant Ontologies

• http://spire.umbc.edu/ontologies/InvasivesOntology.owl

• http://spire.umbc.edu/ontologies/lists/

• http://spire.umbc.edu/ontologies/lists/USFWSInjuriousAnimals.owl

• http://spire.umbc.edu/ontologies/lists/Cal-IPC.owl

Spire So far …

• ELVIS
– A suite of tools motivated by the belief that food web structure plays a role in determining the success or failure of potential species invasions.
– Species List Constructor: give a location, get a species list.
– Food Web Constructor: give a species list, get a food web.
– Evidence Provider: drill down on a predicted trophic link, and see evidence for and against the existence of that link.
• This illustrates our general attitude of moving away from “answer providers” to “evidence providers”.

ELVIS: Ecosystem Localization, Visualization, and Information System

[Figure: food web diagram with nodes including Bacteria, Microprotozoa, Amphithoe longimana, Caprella penantis, Cymadusa compta, Lembos rectangularis, Batea catharinensis, Ostracoda, Melanitta, Tadorna tadorna, and the Nile tilapia (Oreochromis niloticus); panels labeled "Species list constructor" and "Food web constructor".]

Food Web Constructor: predict food web links using database and taxonomic reasoning.

In a new estuary, Nile tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.

Food Web Constructor generates possible links.

Evidence provider gives details.
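One way to picture "database and taxonomic reasoning" is the Python sketch below. It is not ELVIS's actual algorithm: the species names, taxonomy, and known links are placeholders, and the rule (predict a link when a recorded link exists between taxonomic relatives, at the closest rank both sides share) is an assumption made for illustration. It returns the kind of evidence rather than a bare yes/no, in the spirit of an evidence provider.

```python
# Hypothetical taxonomy: species -> (genus, family). Placeholder names, not real taxa.
LINEAGE = {
    "Predatorus alpha": ("Predatorus", "Predatoridae"),
    "Predatorus beta": ("Predatorus", "Predatoridae"),
    "Preyus gamma": ("Preyus", "Preyidae"),
    "Preyus delta": ("Preyus", "Preyidae"),
    "Otherus epsilon": ("Otherus", "Preyidae"),
}
RANKS = ("genus", "family")

# Hypothetical database of observed (predator, prey) links.
KNOWN_LINKS = {("Predatorus alpha", "Preyus gamma")}


def shared_rank(a, b):
    """Index into RANKS of the lowest rank a and b share, or None if they share none."""
    for i, (x, y) in enumerate(zip(LINEAGE[a], LINEAGE[b])):
        if x == y:
            return i
    return None


def predict_link(predator, prey):
    """Return the strongest evidence that predator eats prey, or None."""
    if (predator, prey) in KNOWN_LINKS:
        return "observed"
    best = None
    for known_pred, known_prey in KNOWN_LINKS:
        r_pred = shared_rank(predator, known_pred)
        r_prey = shared_rank(prey, known_prey)
        if r_pred is None or r_prey is None:
            continue
        rank = max(r_pred, r_prey)  # evidence is only as close as its weaker side
        if best is None or rank < best:
            best = rank
    return None if best is None else f"inferred at {RANKS[best]} level"


print(predict_link("Predatorus beta", "Otherus epsilon"))  # inferred at family level
```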

So far: Integration

• Swoogle
– Google for the semantic web.
– Crawls and indexes RDF documents.
– Computes metadata, including “ontoRank”.
• Tripleshop
– A SPARQL query engine.
– Leave out the FROM clause; the data comes from Swoogle.
– Semi-automatic dataset constructor.
– Our main platform for integration.

Google has made us smarter

But what about our agents?


Agents still have a very minimal understanding of text and images.

By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size.

80 ontologies were found that had these three terms

Let’s look at this one

Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00

These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.

All of this is available in RDF form for the agents among us.

Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.

10K terms associated with “person”! Ordered by use.

Let’s look at foaf:Person’s metadata

UMBC Triple Shop

• http://sparql.cs.umbc.edu/tripleshop2
• Online SPARQL RDF query processing based on HP’s Jena and Joseki, with several interesting features:
• Selectable level of inference over the model.
• Automatically finds SWDs (semantic web documents) for given queries using the Swoogle backend database.
– Provides a dataset creation wizard.
– Datasets can be stored on our server or downloaded.
– Tag, share and search over saved datasets.

Who knows Anupam Joshi? Show me their names, email addresses and pictures.

The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles

No FROM clause!

Constraints on where the data comes from

Swoogle found 292 RDF data files that appear relevant to answering our query

Let’s save the dataset before we use it

And tag it so we and others can find it more easily.

He has many friends!
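The logic of the "Who knows Anupam Joshi?" query can be sketched in plain Python over an in-memory list of triples. This is only an illustration: the triples and the ex: identifiers are hypothetical stand-ins for the FOAF profiles Swoogle would locate, and Tripleshop itself evaluates real SPARQL via Jena. The FOAF namespace and the foaf:knows, foaf:name, foaf:mbox and foaf:depiction terms are standard FOAF.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical triples standing in for FOAF profiles found on the web.
TRIPLES = [
    ("ex:alice", FOAF + "knows", "ex:joshi"),
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "mbox", "mailto:alice@example.org"),
    ("ex:alice", FOAF + "depiction", "http://example.org/alice.jpg"),
    ("ex:bob", FOAF + "knows", "ex:joshi"),
    ("ex:bob", FOAF + "name", "Bob"),
    ("ex:carol", FOAF + "knows", "ex:alice"),
]


def first(subject, predicate):
    """First object for (subject, predicate), or None, like an OPTIONAL pattern."""
    values = [o for s, p, o in TRIPLES if s == subject and p == predicate]
    return values[0] if values else None


def who_knows(person):
    """Rough analogue of a SPARQL query with no FROM clause:
    SELECT ?name ?mbox ?pic WHERE { ?x foaf:knows <person> . ?x foaf:name ?name .
      OPTIONAL { ?x foaf:mbox ?mbox } OPTIONAL { ?x foaf:depiction ?pic } }
    """
    rows = []
    for s, p, o in TRIPLES:
        if p == FOAF + "knows" and o == person:
            rows.append({"name": first(s, FOAF + "name"),
                         "mbox": first(s, FOAF + "mbox"),
                         "pic": first(s, FOAF + "depiction")})
    return rows


for row in who_knows("ex:joshi"):
    print(row)
```

Unlike real SPARQL, this sketch keeps only the first value of each optional property instead of returning one row per combination.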

Semantic Eco-Blogging: Some Background

• 1/3 of all new web content is user-generated.
• Scientific data is increasingly a part of Web 2.0/3.0.
• How easy can we make semantic annotation?

• Climate change drives ecological change:
– Alters species distribution. Wuethrich, B. (4 February 2000). How Climate Change Alters Rhythms of the Wild. Science 287(5454), 793.
– Drives evolution. Bradshaw, W. E., and Holzapfel, C. M. (2001). Genetic shift in photoperiodic response correlated with global warming. Proc. Natl. Acad. Sci. USA 98, 14509-14511.

Semantic Eco-blogging
• Eco-blogs are popping up all over the place.
– Bloggers are both amateur nature-lovers and working biologists.
• “On April 24 in Washington DC, I saw a leopard slug. Here’s a picture.”
• These observations are, potentially, an important part of the ecological record.
– “What was the earliest sighting of a robin hatching?”
– “What was the northernmost sighting of the Asian Longhorn Beetle?”
– Etc.
• System concept: global human sensor net.
• SPOTTER
– A Firefox plugin for creating OWL from field observations.
– The Spotter map lets you see all “spots”.
– Being tested at http://ebiquity.umbc.edu/fieldmarking/ and other blogs near you.

You can download Spotter at http://spire.umbc.edu/spotter. Try it out, and then view your observations on the Spotter map:

http://spire.umbc.edu/spotter/spotterMap.php
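As a rough picture of what "creating OWL from field observations" involves, the Python sketch below serializes one sighting as N-Triples. The namespace http://example.org/ob# and the property names (echoing the ob#hasTaxon term used in the RDF123 mapping) are hypothetical; Spotter's real observation ontology may use different terms. Only the rdf:type and xsd:date URIs are standard.

```python
from datetime import date

# Hypothetical namespace; Spotter's real observation ontology may use other terms.
OB = "http://example.org/ob#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def literal(text):
    """Quote an N-Triples string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def observation_to_ntriples(obs_id, taxon, when, place):
    """Serialize one field observation as N-Triples lines."""
    s = f"<{OB}{obs_id}>"
    return "\n".join([
        f"{s} <{RDF_TYPE}> <{OB}Observation> .",
        f"{s} <{OB}hasTaxon> {literal(taxon)} .",
        f"{s} <{OB}date> {literal(when.isoformat())}^^<{XSD_DATE}> .",
        f"{s} <{OB}place> {literal(place)} .",
    ])


# The year is a placeholder: the slide gives only "April 24".
print(observation_to_ntriples("obs1", "Limax maximus", date(2007, 4, 24), "Washington DC"))
```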

The Blogger Bioblitz

• Bioblitz: a 24-hour inventory of all living things in a given area.
– Dual aims of establishing the degree of biodiversity and popularizing science.
• The recent Blogger Bioblitz:
– 17 bloggers from Sitka, Alaska; Greece; Toronto; Santa Cruz; DC; etc.
• 1200 observations.
• By combining the observations with background data, Tripleshop was able to respond to a number of ad-hoc queries.
– E.g. “Show all observations of species listed as being either invasive or injurious” resulted in 47 hits.
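The bioblitz query joins observations with background species lists. Tripleshop did this in SPARQL; the Python sketch below only illustrates the join logic. The observations and list contents are placeholders (not real bioblitz data); lists such as the invasives and USFWS injurious-animals ontologies shown earlier could play the role of the sets.

```python
def flag_listed(observations, *species_lists):
    """Observations whose species appears on any given list (a union: 'either ... or')."""
    listed = set().union(*species_lists)
    return [obs for obs in observations if obs["species"] in listed]


# Hypothetical data: placeholder species names, not real list contents.
observations = [
    {"id": 1, "observer": "blogger1", "species": "Species A"},
    {"id": 2, "observer": "blogger2", "species": "Species B"},
    {"id": 3, "observer": "blogger3", "species": "Species C"},
]
invasive = {"Species A"}
injurious = {"Species C", "Species D"}

print([obs["id"] for obs in flag_listed(observations, invasive, injurious)])  # [1, 3]
```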

Splickr

• Flickr has been handling geotagged pictures since August 2006.
• Roughly 30 million geotagged photos in the first year.
– 2.1 million so far this month.
• Splickr is a Flickr/Yahoo Maps mashup that makes it easy to find pictures of particular species in a given area.
– All data gets represented in OWL.
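The core Splickr filter, "pictures of a particular species in a given area", amounts to a tag match plus a bounding-box test. The Python sketch below assumes a hypothetical Photo record and a (min_lon, min_lat, max_lon, max_lat) box; it does not call the Flickr API and ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    """Hypothetical geotagged photo record."""
    photo_id: str
    tags: frozenset
    lat: float
    lon: float


def in_bbox(photo, bbox):
    """bbox = (min_lon, min_lat, max_lon, max_lat); does not handle the antimeridian."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lat <= photo.lat <= max_lat and min_lon <= photo.lon <= max_lon


def find_species(photos, species, bbox):
    """IDs of photos tagged with the species name (case-insensitive) inside bbox."""
    wanted = species.lower()
    return [p.photo_id for p in photos
            if wanted in {t.lower() for t in p.tags} and in_bbox(p, bbox)]


photos = [
    Photo("1", frozenset({"Limax maximus"}), 38.9, -77.0),
    Photo("2", frozenset({"Limax maximus"}), 51.5, -0.1),
    Photo("3", frozenset({"robin"}), 38.9, -77.0),
]
print(find_species(photos, "limax maximus", (-77.2, 38.8, -76.9, 39.0)))  # ['1']
```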

RDF123

• A flexible and graphical means of mapping spreadsheets to RDF.
• The mapping is stored as an OWL file.
• An RDF123 web service takes a Google spreadsheet and a map as input, and outputs RDF.
• So you can do all your work collaboratively in the spreadsheet, and you never have to export to RDF!

Taxonomy is a little bit tricky to map. Columns A-F (Phylum, Class, Order, Family, Genus, Species) follow a rule:
i. If there is a value for Column F (Species), then the values of Columns E (Genus) and F are joined with an underscore and mapped to ob#hasTaxon.
ii. If there is no value for Column F, then the rightmost column among A-E that has a value is mapped to ob#hasTaxon.
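The two-part taxonomy rule can be restated as a short Python function. This is a plain-Python paraphrase of the rule as described, not RDF123's own mapping syntax (which is stored as OWL); the function name has_taxon is hypothetical.

```python
# Columns A-F of the spreadsheet, in order.
COLUMNS = ("Phylum", "Class", "Order", "Family", "Genus", "Species")


def has_taxon(row):
    """Value to map to ob#hasTaxon for one row (six cells, columns A-F)."""
    if len(row) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(row)}")
    cells = [(cell or "").strip() for cell in row]
    genus, species = cells[4], cells[5]
    if species:                        # rule i: join Genus (E) and Species (F) with "_"
        return f"{genus}_{species}"
    for value in reversed(cells[:5]):  # rule ii: rightmost non-empty column among A-E
        if value:
            return value
    return None                        # the row carries no taxonomic information


print(has_taxon(["Mollusca", "Gastropoda", "Stylommatophora", "Limacidae", "Limax", "maximus"]))
# Limax_maximus
print(has_taxon(["Chordata", "Aves", "Anseriformes", "Anatidae", "", None]))
# Anatidae
```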

Eco-Blogging: Next steps

• Make every bioblitz a blogger bioblitz.
– Use RDF123.
– Rock Creek, MD and LA County coming up.
• Drop-down invasives lists in Splickr.
– E.g. find all photos in Europe of species on the “Worst Invaders of Europe” list.
• Mining other sources.
– E.g. birdwatcher listservs.
• Making semantic eco-blogging easier.
– We will continue to work with children.
• Aggressively pursue a Linked Data approach.

A Few Words on Linked Data

• “Linked Data on the Web” is a collection of best practices for publishing data on the semantic web:
– Distinguishing between information and non-information resources.
– 303 redirects and content negotiation.
– HTTP URIs for everything on Earth.
– owl:sameAs
• It is also, to an extent, a rebranding of the semantic web.
– Much more emphasis on links amongst datasets.
– Much less emphasis on formal semantics.
• Linked data can be browsed, in much the same way we browse the traditional web.
– So we can find data either by searching for it (with Swoogle/Tripleshop) or by surfing our way to it.
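The "303 redirects and content negotiation" practice can be sketched as a pure Python function: a URI naming a real-world thing (a non-information resource) is never served directly; instead the server answers 303 See Other and points to an RDF or HTML description depending on the Accept header. The /resource/, /data/ and /page/ URL layout is an assumption for illustration, and the Accept parsing is deliberately simplified (no wildcards).

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs (simplified: no wildcards)."""
    prefs = []
    for part in header.split(","):
        media, *params = [field.strip() for field in part.split(";")]
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            prefs.append((media.lower(), q))
    return prefs


def negotiate(path, accept):
    """Return (status, location) for a request to the hypothetical /resource/<name>."""
    if not path.startswith("/resource/"):
        return 404, None
    name = path[len("/resource/"):]
    prefs = parse_accept(accept)
    rdf_q = max((q for m, q in prefs if m in RDF_TYPES), default=0.0)
    html_q = max((q for m, q in prefs if m in HTML_TYPES), default=0.0)
    target = "/data/" if rdf_q > html_q else "/page/"  # ties go to the HTML page
    return 303, target + name


print(negotiate("/resource/Limax_maximus", "application/rdf+xml"))
# (303, '/data/Limax_maximus')
```

A browser sending text/html lands on a human-readable page, while an agent asking for RDF is redirected to machine-readable data about the same thing.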

Some Context

• Before search engines, we found things on the web by browsing.
• Browsing still has its charms.
– And benefits.
• On the semantic web:
– One way to build a dataset: Swoogle/Tripleshop
– Another: data browsing …
• A “thing-centric” approach.

Other Thoughts and Deeds

• Web 2.0/3.0 is designed to accommodate a multiplicity of perspectives and worldviews.
– Neutrality not required.
• Spotter as a general purpose annotation tool?
• Experiment in integrating water quality and invasive species occurrence data.
– EPA, USGS, GBIF, EEA(?)
– SODA
• Pacific Rim data
• New ELVIS: extinction patterns in Sierra Nevada lakes.
– Invasive trout are causing local extinctions.
– We can compare with model predictions made by our PEaCE lab partners.

GBIF Scenarios

Check out the 3 climate change scenarios (land use, health, and agriculture) from the presentation by Hannu Saarenmaa and Jeremy Kerr at

http://circa.gbif.net/Public/irc/gbif/ict/library?l=/presentations/gbif_scenarios_ppt/_EN_6.0_&a=d

8-Step Scenario Development Process
i. Decide on selected species.
ii. Set criteria for data. (Spans 30 years, georeferenced, etc.)
iii. Investigate data availability. (GBIF, GAP, etc.)
iv. Improve quality and access to data.
v. Choose modeling approach. (E.g. Ecological Niche Modeling with Open Modeller Framework.)
vi. Acquire and transform climate change and environment data.
vii. Execute models.
viii. Present the results.

Could we build a toolkit to ease the “data” steps, i.e. steps ii, iii, iv, and vi?
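As a hint of what one "data" step of such a toolkit might do, the Python sketch below applies step ii's example criteria to occurrence records. The record layout (species, year, lat, lon) is hypothetical, and reading "spans 30 years" as "a species' georeferenced records cover at least 30 years" is an assumption.

```python
from collections import defaultdict


def species_meeting_criteria(records, min_span_years=30):
    """Species whose georeferenced records span at least min_span_years."""
    years = defaultdict(list)
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            continue  # criterion: keep only georeferenced records
        years[rec["species"]].append(rec["year"])
    return {sp for sp, ys in years.items() if max(ys) - min(ys) >= min_span_years}


# Hypothetical occurrence records.
records = [
    {"species": "A", "year": 1970, "lat": 40.0, "lon": -105.0},
    {"species": "A", "year": 2005, "lat": 40.1, "lon": -105.2},
    {"species": "B", "year": 1990, "lat": 35.0, "lon": -110.0},
    {"species": "B", "year": 2000, "lat": 35.5, "lon": -110.5},
    {"species": "C", "year": 1950, "lat": None, "lon": None},
    {"species": "C", "year": 2005, "lat": 38.0, "lon": -120.0},
]
print(species_meeting_criteria(records))  # {'A'}
```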

Acknowledgements

Cynthia Parr

Andriy Parafiynyk

Lushan Han

Rong Pan

Li Ding

David Wang

Tim Finn

NSF

NBII

Some References

For a walk-through of Spotter, Tripleshop, Elvis, or our other tools, email [email protected]

Two relevant papers from our research group:

Adding Semantics to Social Websites for Citizen Science, http://ebiquity.umbc.edu/paper/html/id/365/Adding-Semantics-to-Social-Websites-for-Citizen-Science

Using the Semantic Web to Support Ecoinformatics, http://ebiquity.umbc.edu/paper/html/id/319/Using-the-Semantic-Web-to-Support-Ecoinformatics

An introduction to linked data: How to Publish Linked Data on the Web, http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/