Post on 20-Dec-2015
Semantic Application for Digital Repositories
Fabrizio Gagliardi
EMEA & LATAM Director
Technical Computing, MSR External Research
Microsoft Corporation
• Advancement of Science
• Global Collaboration
• Technology Excellence
• Interoperability
Microsoft Research’s Commitment to Science
Putting computing into science…
Applying Microsoft products and research technologies to advance the scientific research and engineering innovation process
Putting science into computing…
Ensuring that research community requirements are factored into future versions of Microsoft software
Scholarly Communications: Project Overview
• Current or Completed Projects
o Cornell – arXiv.org + Word 2007 (and repository interoperability via SWORD)
o MIT / Broad Institute – Authoring (Word 2007) + data for research reproducibility
o MSR – CMT++ interoperability with data + metadata transfer/exchange (conference management tool enhancements)
o LiveLabs – eJournal publishing online service (community publishing tool)
o UC San Diego / PLoS – Semantic mark-up of scholarly articles (+ submission)
o Chem4Word with Office & Cambridge University – Create add-in to Word 2007 to facilitate drawing of chemical compounds and equations
o Johns Hopkins University – Digital Archive for Astronomy/Astrophysics data (storage, preservation and access)
o Planets Project / EU (with MSR – Cambridge) – OpenXML and file format preservation + interoperability
o eChemistry Project (Cornell, Penn State, Indiana, Cambridge, Southampton) – ORE exemplar: access to compound chemical info objects (cross-repository access to open chemistry data)
o British Library – Researcher Information Centre (RIC) online workflow tool for scientists and researchers
o Creative Commons Add-in for Office 2007 – evolving the Word 2003 effort
o University of Southampton (UK) – Port ePrints Repository Software for installation on the Windows platform
o University of Manchester / “MyExperiment” Project – social networking for scientists
o ORE Acceleration Project (OAI – Object Reuse & Exchange) – Alpha spec development
o Indiana University – Toolbox for Social Networking (SRT)
o UK National Archives – Virtual PC / Emulation of legacy systems to facilitate preservation
o National Library of Medicine / NCBI – “PubMed Int’l” UK version of PubMed + NLM DTD
• Pipeline
o DRIVER 2 (EU) – Infrastructure integration across a network of European research repositories
Research Output Repository Platform
Goals
• A platform for building services and tools for research output repositories
• Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Relationships between stored entities
• Enable a tools and services ecosystem for “research output” repositories on MS technologies
Execution
• Utilizing OAI-ORE, SWORD, and other community protocols
• In development, deployment within MSR in early Q4
• Beta release to the community in late Q4
• Built on SQL Server 2008 + Entity Framework
• Using WPF and Silverlight for UI
Research output repository platform
Goals
• Create a platform for building “research output” repositories
• Engage with the digital library and scholarly communications community
• Become the “research output” repository for MSR (RMCr project)
– Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Support an ecosystem of services and tools
• Available to the community for free (we are still considering the open source route)
• Build an easy-to-install collection of basic services and tools
Non-goals
• A generic platform for asset management
• Support the lifecycle of publications
• Compete with existing repository solutions
Research Output Repository Platform
Services/tools
Microsoft.Famulus.Framework
Microsoft.Famulus.Core (Based on the Entity Framework Model + extensions)
SQL Server 2008, MS data storage technologies, Entity Framework runtime
Researchers manage their personal research entities (data, citations, documents, workflows, etc.)
Entities + Relationships can be synched to cloud storage so that they are:
- Always Available - Sharable - Mixable - Harvestable
An Ecosystem of Research Repositories
Support of harvesting & federation to/from Institutional Repositories
- arXiv.org - DSpace - ePrints - Fedora - etc.
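The idea of storing research entities together with the relationships between them can be illustrated with a small in-memory model. This is a minimal sketch only and not the platform's actual API: the `Entity` and `Repository` classes, the `uses-data` relationship label, and the sample items are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A stored research item; `kind` might be 'paper', 'data', 'video', etc."""
    id: str
    kind: str
    title: str


@dataclass
class Repository:
    """Hypothetical in-memory store of entities and labeled relationships."""
    entities: dict = field(default_factory=dict)
    relationships: set = field(default_factory=set)  # (source_id, label, target_id)

    def add(self, entity):
        self.entities[entity.id] = entity

    def relate(self, source_id, label, target_id):
        # Only allow relationships between entities that are already stored.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both entities must be stored before relating them")
        self.relationships.add((source_id, label, target_id))

    def related(self, source_id, label):
        """Return entities that `source_id` points to via `label`, sorted by id."""
        ids = sorted(t for s, l, t in self.relationships
                     if s == source_id and l == label)
        return [self.entities[i] for i in ids]


repo = Repository()
repo.add(Entity("p1", "paper", "A hypothetical paper"))
repo.add(Entity("d1", "data", "Its hypothetical dataset"))
repo.relate("p1", "uses-data", "d1")
print([e.title for e in repo.related("p1", "uses-data")])
```

Keeping relationships as first-class records, rather than as fields on each entity, is one way to make the links themselves queryable and shareable alongside the entities.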
• Limited Tech Preview release due June 2008
• Public Beta targeted for Aug/Sept 2008
For more details
– Contact:
• Alex Wade (Program Manager) / [email protected]
– Community Forum:
• http://community.research.microsoft.com/forums/90.aspx
Current Project Status
eScience and Semantic Computing meet the Cloud
The cyberinfrastructure for the next generation of researchers
• Expect scientific research environments will follow similar trends to the commercial sector
– Leverage computing and data storage in the cloud
– Scientists already experimenting with Amazon S3 and EC2 services, with mixed results
• For many of the same reasons
– Siloed research teams, no resource sharing across labs
– High storage costs
– Low resource utilization
– Excess capacity
– High costs of reliably keeping machines up-to-date
– Little support for developers, system operators
The Future: Software plus Services for Science?
• Collective intelligence
– If last.fm can recommend what song to broadcast to me based on what my friends are listening to, why cannot the cyberinfrastructure of the future recommend articles of potential interest based on what the experts in the field that I respect are reading?
– Already examples emerging but the process is manual (Connotea, BioMedCentral Faculty of 1000 ...)
• Automatic correlation of scientific data
• Smart composition of services and functionality
• Cloud computing to aggregate, process, analyze and visualize data
A smart cyberinfrastructure
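The collective-intelligence idea (recommend articles based on what respected experts are reading) can be sketched in a few lines. This is an illustrative sketch, not a description of any existing service: the expert names, article identifiers, and the `recommend` function are hypothetical, and the ranking rule (count how many trusted experts have read an article) is an assumption chosen for simplicity.

```python
from collections import Counter


def recommend(my_articles, expert_reading, trusted_experts, top_n=3):
    """Rank unread articles by how many trusted experts have read them."""
    counts = Counter()
    for expert in trusted_experts:
        for article in expert_reading.get(expert, ()):
            if article not in my_articles:
                counts[article] += 1
    # Sort by descending count, then by article id for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:top_n]]


# Hypothetical reading lists keyed by expert name.
expert_reading = {
    "alice": {"paper-1", "paper-2", "paper-3"},
    "bob": {"paper-2", "paper-3", "paper-4"},
    "carol": {"paper-3", "paper-5"},
}

suggestions = recommend(
    my_articles={"paper-3"},
    expert_reading=expert_reading,
    trusted_experts=["alice", "bob"],
)
print(suggestions)
```

Only the experts the reader trusts contribute to the ranking, which mirrors the "experts in the field that I respect" wording on the slide; a real system would need far richer signals than a simple count.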
• Important/key considerations
– Formats or “well-known” representations of data/information
– Pervasive access protocols are key (e.g. HTTP)
– Data/information is uniquely identified (e.g. URIs)
– Links/associations between data/information
• Data/information is inter-connected through machine-interpretable information (e.g. paper X is about star Y)
• Social networks are a special case of ‘data networks’
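The “paper X is about star Y” example can be written as subject, predicate, object statements whose parts are URIs. The sketch below uses plain Python tuples rather than a real RDF library; every URI under `example.org` and the `about` and `cites` predicates are hypothetical placeholders.

```python
# Each statement is a (subject, predicate, object) triple of URIs.
# All URIs here are hypothetical placeholders.
PAPER_X = "http://example.org/papers/X"
PAPER_Z = "http://example.org/papers/Z"
STAR_Y = "http://example.org/stars/Y"
ABOUT = "http://example.org/vocab/about"
CITES = "http://example.org/vocab/cites"

triples = {
    (PAPER_X, ABOUT, STAR_Y),
    (PAPER_Z, ABOUT, STAR_Y),
    (PAPER_Z, CITES, PAPER_X),
}


def subjects(triples, predicate, obj):
    """Return every subject linked to `obj` by `predicate`, sorted."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Which papers are about star Y?
papers_about_y = subjects(triples, ABOUT, STAR_Y)
print(papers_about_y)
```

Because every item is identified by a URI, statements produced by different repositories can be merged into one set and queried together, which is the sense in which the data becomes inter-connected.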
A world where all data is linked…
Attribution: Richard Cyganiak
…and stored/processed/analyzed in the cloud
scholarly communications
domain-specific services
The Microsoft Technical Computing mission to reduce time to scientific insights is exemplified by the June 13, 2007 release of a set of four free software tools designed to advance AIDS vaccine research. The code for the tools is available now via CodePlex, an online portal created by Microsoft in 2006 to foster collaborative software development projects and host shared source code. Microsoft researchers hope that the tools will help the worldwide scientific community take new strides toward an AIDS vaccine. See more.
instant messaging
identity
document store
blogs & social networking
notification
search
books
citations
visualization and analysis services
storage/data services
compute services
virtualization
Project management
Reference management
knowledge management
knowledge discovery
Vision of Future Research Environment with both Software + Services
• Thousand years ago – Experimental Science
– Description of natural phenomena
• Last few hundred years – Theoretical Science
– Newton’s Laws, Maxwell’s Equations…
• Last few decades – Computational Science
– Simulation of complex phenomena
• Today – eScience or Data-centric Science
– Unify theory, experiment, and simulation
– Using data exploration and data mining
• Data captured by instruments
• Data generated by simulations
• Data generated by sensor networks
– Scientists overwhelmed with data
– Computer Science and IT companies have technologies that will help
(With thanks to Jim Gray)
Emergence of a New Research Paradigm?
Web users...
• Generate content on the Web
– Blogs, wikis, podcasts, videocasts, etc.
• Form communities
– Social networks, virtual worlds
• Interact, collaborate, share
– Instant messaging, web forums, content sites
• Consume information and services
– Search, annotate, syndicate
Scientists...
• Annotate, share, discover data
– Custom, standalone tools
• Conferences, Journals
– Publication process is long, subscriptions, discoverability issues
• Collaborate on projects, exchange ideas
– Email, F2F meetings, video-conferences
• Use workflow tools to compose services
– Domain-specific services/tools
Today
Data and services can be easily composed
SensorMap
Functionality: Map navigation
Data: sensor-generated temperature, video camera feed, traffic feeds, etc.
Taverna Workflow (diagram nodes A, B, C, D, E)
Compose services from the Web
Today…
Computers are great tools for
storing, computing, managing, indexing
huge amounts of data
For example, Google and Microsoft both have copies of the Web for indexing purposes
Tomorrow…
We would like computers to also help with the automatic
acquisition, discovery, aggregation, organization, correlation, analysis, interpretation, inference
of the world’s information
Computers will still be great tools for
storing, computing, managing, indexing
huge amounts of data
• Set of concepts and technologies
– Data modeling
– Relationships
– Ontologies
– Machine learning (entity extraction)
– Inference, reasoning
– Data, information, knowledge…
What is Semantic Computing?
Data
Information
Knowledge
Intelligence
Wisdom
Current technologies
Possibilities for innovation
• Term used to refer to the concept of “meaning”
• The linguistics, AI, Natural Language Processing, etc. communities have been working on “meaning” and “knowledge” related technologies for decades
• Pragmatic approach to Semantic Computing
– Emergence of a new breed of technologies to capture meaning (RDF, OWL, etc.)
– Combine with the pervasiveness of the Web community technologies such as folksonomies …
Semantics
• The term is used to describe a set of technologies used to represent data, concepts, and their relationships
– Has become a buzzword like Web 2.0
• Prefer to use the term “Semantic Computing”, which is about modeling data in ways that can be automatically processed by computers
A word about the “Semantic Web”
• Some efforts are driven by the traditional “knowledge engineering” community
– Engaged in building well-controlled ontologies
– Important for domain-specific vocabularies with data formats and relationships specific to a community
– Model does not easily scale to the Internet
• Some efforts are driven by the Web 2.0 community
– Focus on the pervasiveness of Web protocols/standards
– Emphasis on microformats (small, flexible, embeddable structures)
– Exploit evolving and ever-expanding vocabularies such as folksonomies and tag clouds
Semantic Computing
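The contrast between a controlled ontology and a folksonomy can be made concrete: in a folksonomy, users attach whatever tags they like, and a tag cloud is simply the usage count of each tag. The sketch below is illustrative only; the users, resources, tags, and the `tag_cloud` function are hypothetical, and folding tags to lower case is an assumed normalization step.

```python
from collections import Counter

# Hypothetical (user, resource, tag) assignments; in a folksonomy the
# vocabulary is whatever tags users choose, not a controlled ontology.
assignments = [
    ("u1", "paper-1", "astronomy"),
    ("u2", "paper-1", "Astronomy"),
    ("u2", "paper-1", "supernova"),
    ("u3", "paper-2", "astronomy"),
    ("u3", "paper-2", "data-mining"),
]


def tag_cloud(assignments):
    """Count how often each tag is used, ignoring letter case."""
    return Counter(tag.lower() for _, _, tag in assignments)


cloud = tag_cloud(assignments)
for tag, weight in cloud.most_common():
    print(f"{tag}: {weight}")
```

No schema has to be agreed in advance, so the vocabulary keeps growing with use; the trade-off is that nothing prevents synonyms or spelling variants, which is why even this sketch needs a normalization step.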