National Center for Supercomputing Applications
Towards A Rich-Context Participatory Cyberenvironment
Yong Liu
Robert E. McGrath
James D. Myers
Joe Futrelle
{yongliu, mcgrath, jimmyers, futrelle}@ncsa.uiuc.edu
GCE 2007 Workshop, Nov.11-12, 2007
Supercomputing Conference 2007
Outline
• Motivation
• Web 2.0 and Where 2.0
• Definition of Participatory Cyberenvironment
• Cyberenvironment Technology Stack
  – CyberCollaboratory Portal
  – Approach and Goals of a Rich-context Participatory Cyberenvironment
• The Role of Contexts
  – Social, Geospatial, Provenance, Conceptual Contexts
  – Science Drivers and Our Work So Far and Next Steps
  – Two Examples on How These Contexts Can Play Together
• Concluding Remarks
• Acknowledgements
Motivation
• Increasingly Collaborative Scientific Efforts
  – Across disciplines, laboratories, observatories and organizations
• Heterogeneous Scientific Resources
  – Sensors, software components, data/databases, networks, computers
• Avoiding Data Silos
  – Most existing portals create data silos
  – Users would like to access a context-relevant knowledge network
  – Users would like to exchange information across application boundaries (desktop vs. web-based, portal A vs. portal B)
• Promoting User Participation
  – Allow individual users to innovate and contribute to the community cyberenvironment
Web 2.0 and Where 2.0
• Architecture of Participation
  – Software and Data (Mashup)
  – People (Social Networking, Collaboration)
• Open, Lightweight (de facto) Standards and Formats
  – RDF (Resource Description Framework)
  – Microformats
  – Variants of XML (such as KML, obsKML, etc.)
• “Where 2.0” highlights the importance of spatial context
  – It is estimated that over 80% of all information has a geospatial component
(Figure: the mind map constructed by Markus Angermeier on November 11, 2005)
Participatory Cyberenvironment
• A Web 2.0 and Semantic Web approach for Cyberinfrastructure
• An architecture of participation for scientific activity
  – This refers to both human and software/data participation
    • Human participation means human-to-human collaboration and social networking (using blogs, message boards, etc.) and user-generated scientific artifacts (e.g. workflows)
    • Software participation means mashup
      – API-based and Content/Data-based
• An open service platform
  – Reusable and standard-compliant service components/interfaces must be built and presented for third-party application use/reuse
    • E.g. NCSA CyberIntegrator (a desktop Java-based workflow application) can use the CyberCollaboratory open service API (SOAP or JSON) to query user/group affiliation and publish workflow templates to the CyberCollaboratory’s document library
• An integration and presentation platform for knowledge network
  – Knowledge network about sensors, data, models, workflows, people, publications, computing resources, etc.
    • Dynamically generated and proactively presented in the portal
  – Exchanging information across application boundaries
Cyberenvironment Technology Stack
[Figure: layered technology stack diagram. Layer labels: Science Applications, Services Clouds, Infrastructure. Boxes shown: CyberIntegrator (Workflow Development and Publication; Event-triggered Workflow Execution); CyberCollaboratory (Portal/Group Workspace); Context (social, geospatial, provenance, ontologies, …), metadata fabric; Tupelo semantic content management middleware; Workflow/Model; Registries and Data Storage; Data/Documents/Content; External Data Services; High-End Visualization; High-res Applications; Visual Orchestration; Auto-stereo Visualization; External sensor networks and data stores; GIS; Workspace mgmt.; Visualization, Graphing, Reporting; Single Sign-On Security; Modeling Analysis/Translation; Computational Resources]
Note: Boxes with yellow background are this talk’s focus
CyberCollaboratory Portal
• Since its inception in 2004, over 400 users have registered
  – Built on top of the open source portal framework Liferay, with additions/changes/integrations using NCSA technologies
• Group Spaces
• Document/Image Library, discussion forums, announcements, wiki, blog, RSS reader, etc.
• Production/Pilot Deployment in multiple projects
  – NSF-funded WATERS (WATer and Environmental Research Systems) Network Project office (in production mode since 2004)
  – Multiple NSF-funded WATERS Testbed projects
  – NCSA Infectious disease informatics project
  – NSF-funded Hydro Synthesis Project
  – EPA-funded Small Water Public Systems Project
  – NCASSR-funded Palantir collaborative computer security investigation Portal
  – Office of Naval Research (ONR)-funded Education Project
http://www.linux.com/feature/118675 (August 23, 2007)
Evolving Towards a Rich-context Participatory Cyberenvironment
• Hybrid Approach
  – Leverage Web 2.0 patterns/technologies
    • Architecture of participation
  – Leverage Semantic Web technology (RDF)
    • Through the use of NCSA Tupelo as the semantic content repository middleware
• Goals
  – Break data silos created by different portals or non-web-based applications
  – Enable user participation and content-based mashup
The Role of Contexts
• Context:
  – “the parts of a discourse that surround a word or passage and can throw light on its meaning”
    • From Merriam-Webster Online Dictionary
• Semantic Contexts for Cyberenvironment
  – Social Context (Who?)
  – Geospatial Context (Where?)
  – Causal Context (Why? and How?)
  – Conceptual Context (What?)
• Role: the above four areas build the foundation so that heterogeneous tools/portals can have a shared view and the ability to interact
Social Context (Who?)
• What’s It About?
  – People, Group, Community, Virtual Organization
  – Who am I? Who are my friends, collaborators, and team members?
  – Social Networking (People-to-People)
• How Does It Work?
  – RDF-based: FOAF (Friend of a Friend)
  – Microformats-based: XFN (XHTML Friends Network), hCard
• What Are the Scientific Use Case Drivers?
  – Environmental observatories involve many researchers/stakeholders from diverse disciplines, nationally and internationally
    • Collaboration on complementary expertise
    • Find out who works on what and has what kind of expertise
    • Filtering information
  – Research on social networks has shown that people are more likely to respond to collaboration requests from someone they know (directly or indirectly through the person-to-person network)
  – Complex coupled human-nature system science research calls for “Participatory Science”
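As a rough illustration of the RDF-based FOAF approach named on this slide, the sketch below represents people as plain (subject, predicate, object) tuples and walks foaf:knows links to find direct and indirect contacts. The FOAF namespace and the Person, name, and knows terms are real; the helper names, the example URIs, and the choice to treat foaf:knows as symmetric are assumptions made for illustration, not the CyberCollaboratory implementation.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def person_triples(uri, name, knows=()):
    """Describe one person as FOAF-style triples (hypothetical helper)."""
    triples = [(uri, RDF_TYPE, FOAF + "Person"), (uri, FOAF + "name", name)]
    triples += [(uri, FOAF + "knows", other) for other in knows]
    return triples


def collaborators_within(triples, start, max_hops=2):
    """Return people reachable from `start` in at most `max_hops` foaf:knows steps."""
    edges = {}
    for s, p, o in triples:
        if p == FOAF + "knows":
            # Assumption: treat "knows" as a two-way relationship.
            edges.setdefault(s, set()).add(o)
            edges.setdefault(o, set()).add(s)
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in edges.get(f, ())} - seen
        seen |= frontier
    return seen - {start}


# Example URIs are placeholders, not real profiles.
network = (
    person_triples("urn:example:alice", "Alice", ["urn:example:bob"])
    + person_triples("urn:example:bob", "Bob", ["urn:example:carol"])
    + person_triples("urn:example:carol", "Carol", ["urn:example:dave"])
)
print(sorted(collaborators_within(network, "urn:example:alice")))
```

A two-hop limit mirrors the "directly or indirectly" wording of the use case; a real deployment would likely read these triples from a store rather than build them in memory.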
Social Context (contd.)
• What Have We Done So Far?
  – The key is to promote user participation to help build the virtual community in the CyberCollaboratory
  – Production Implementations
    • My Page, My Menu, My Groups navigation
    • Streamlined group creation
      – Group template
    • Email invitation to both registered and non-registered users to join a group
    • Harvesting emails and associated attachments from mailing lists into message boards and the document library to allow full-text search
  – Pilot Implementations
    • Social Network Analysis/Visualization
    • Recommender System
      – People who read/use this paper/tool also read/use these other papers/tools
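The "people who read/use this also read/use" idea in the recommender pilot can be sketched as a simple co-usage count. This is only one plausible reading of the bullet; the function name, the data layout (a mapping from user to the set of items they used), and the tie-breaking rule are assumptions, and the pilot system may work differently.

```python
from collections import Counter


def also_used(usage, item, top_n=3):
    """Rank other items by how many users of `item` also used them.

    `usage` maps a user name to the set of items (papers, tools) they used.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for items in usage.values():
        if item in items:
            counts.update(items - {item})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Made-up usage records for illustration only.
usage = {
    "u1": {"paperA", "toolX"},
    "u2": {"paperA", "toolX", "paperB"},
    "u3": {"paperB"},
}
print(also_used(usage, "paperA"))
```

Raw co-occurrence counts favor popular items; a production recommender would usually normalize for that.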
Social Context (contd.)
• What Are Our Next Steps?
  – Expose group/personal page information as microformats (hCard)
    • Yahoo! Local etc. can find such group information
  – Learn lessons from and exchange ideas with similar efforts in other scientific collaborative portals
    • MyExperiment.org
    • OurSpaces.net
  – Build dynamic social network graph
  – Help build up the momentum of “social grid” (from Tony Hey, Microsoft Research)
Geospatial Context (Where?)
• What’s It About?
  – Location, Location, Location
  – Point, line, polygon, …
    • Intersection, overlap, coverage, …
  – The advent of the GeoSpatial Web or GeoWeb
• How Does It Work?
  – Lightweight formats and APIs/services facilitate geo-referenced information representation, exchange and mashup
    • GeoRSS, GeoURL, KML, Geo Microformat, GeoJSON, W3C Geo
    • GeoIQ, Google Map API, Microsoft Virtual Earth Visual SDK
  – Easy-to-use Virtual Globe software puts the earth metaphor right in front of users
    • 3D/2D geo-centric browsers allow non-GIS specialists to explore geo-referenced information
      – Microsoft Virtual Earth, Google Earth, NASA WorldWind
  – Standardization efforts promote geospatial services/data interoperability
    • OGC (Open Geospatial Consortium) geospatial standards
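To make the "lightweight formats" point concrete, here is a minimal sketch that encodes a sensor location as GeoJSON, one of the formats listed above, using only the Python standard library. The sensor id, property names, and coordinates are illustrative assumptions; the only format rule relied on is that GeoJSON positions are written as [longitude, latitude].

```python
import json


def sensor_feature(sensor_id, lon, lat, **properties):
    """Build a GeoJSON Feature for a point sensor (hypothetical helper)."""
    return {
        "type": "Feature",
        # GeoJSON positions are ordered longitude first, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"id": sensor_id, **properties},
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Rough, made-up coordinates near the Texas coast, for illustration only.
collection = feature_collection(
    [sensor_feature("sensor-1", -97.3, 27.8, kind="salinity")]
)
print(json.dumps(collection, indent=2))
```

Because the result is plain JSON, a map widget or another portal can consume it without sharing any database schema with the producer.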
GeoSpatial Context (contd.)
• What Are the Scientific Use Case Drivers?
  – Environmental observatory data needs to be interpreted within a geospatial context to enable holistic study of the system
    • Common location components are an important integration vehicle for linking diverse information across different domains
    • E.g. Digital Watershed data integration requires explicit geospatial context
  – Spatial analysis in computational modeling for complex watershed science studies also requires geo-referenced data
[Figure: Urban Watershed, a complex system with many interactions/feedback. Labels shown: Hydrology; Meteorology/Hydrometeorology; Social Science/Economics; Geology/Hydrogeology; Biology/Pathobiology; Water Chemistry; Human Psychology/Events; Sensor Networks/Engineered Infrastructure; Public Policy/Water as Commodity]
GeoSpatial Context (contd.)
• What Have We Done So Far?
  – Pilot Implementation
    • Google Map-based sensor network map portlet
      – Allows users to subscribe to both raw and derived data streams from the sensors
• What Are Our Next Steps?
  – Incorporate geo-location information into user profiles
  – Build geo-social network
    • Group formation based on geographical boundary
  – Virtual observatory and digital watershed geo-referenced data integration using OGC standards
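The "group formation based on geographical boundary" next step could, in its simplest form, amount to testing a user's profile location against named regions. The sketch below uses axis-aligned bounding boxes as a stand-in for real boundaries; the region names, the rough coordinates, and both function names are assumptions, and actual watershed boundaries would be polygons rather than boxes.

```python
def in_bbox(lon, lat, bbox):
    """True if the point lies inside bbox = (west, south, east, north), in degrees.

    Boxes that cross the antimeridian are not handled in this sketch.
    """
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def groups_for_location(lon, lat, boundaries):
    """Return the names of all regions whose bounding box contains the point."""
    return sorted(
        name for name, bbox in boundaries.items() if in_bbox(lon, lat, bbox)
    )


# Approximate, illustrative boxes; not surveyed boundaries.
boundaries = {
    "corpus-christi-bay": (-97.6, 27.6, -97.0, 28.0),
    "illinois-river": (-91.0, 38.9, -87.5, 41.8),
}
print(groups_for_location(-97.3, 27.8, boundaries))
```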
Causal Context (How? And Why?)
• What’s It About?
  – Also known as Provenance
  – Describes the causal relationships and history
    • among artifacts (e.g., data, people, instruments/sensors, publications, etc.) and
    • events (e.g., processing steps, accession, custody) in a complex work process
  – Useful for experiment validation and reuse of workflows, data products, etc.
• How Does It Work?
  – RDF Triples
  – Open Provenance Model (OPM)
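A small sketch may help show how provenance expressed as triples supports the validation and reuse mentioned above. The edge names used, wasGeneratedBy, and wasControlledBy come from the Open Provenance Model; the artifact and process names, the tuple representation, and the `lineage` helper are assumptions for illustration and do not reflect how Tupelo stores provenance.

```python
def lineage(triples, artifact):
    """Return every artifact and process upstream of `artifact`.

    Follows OPM-style causal edges from effect to cause:
    artifact wasGeneratedBy process, process used artifact,
    artifact wasDerivedFrom artifact.
    """
    upstream = {}
    for s, p, o in triples:
        if p in ("wasGeneratedBy", "used", "wasDerivedFrom"):
            upstream.setdefault(s, set()).add(o)
    seen, stack = set(), [artifact]
    while stack:
        node = stack.pop()
        for parent in upstream.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical sensor-data pipeline, for illustration only.
provenance = [
    ("anomaly_report", "wasGeneratedBy", "detect_run"),
    ("detect_run", "used", "cleaned_stream"),
    ("cleaned_stream", "wasGeneratedBy", "qaqc_run"),
    ("qaqc_run", "used", "raw_stream"),
    ("qaqc_run", "wasControlledBy", "alice"),
]
print(sorted(lineage(provenance, "anomaly_report")))
```

The wasControlledBy edge is deliberately not followed here, so the result lists data and processing steps only; including agents would be a one-line change.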
Causal Context (contd.)
• What Are the Scientific Use Case Drivers?
  – Researchers are using more data from environmental observatories and from other sources whose history they would not otherwise know
  – More pieces of the data processing pipeline/workflow will be changing and will need to be tracked
  – Interdisciplinary/systems-oriented projects such as the watershed-scale human-nature interaction study will have more moving parts
  – Dynamic generation of knowledge networks requires provenance data for events, workflows, etc. across application boundaries
Causal Context (contd.)
• What Have We Done So Far?
  – Production Implementations
    • User activities/events in the CyberCollaboratory have been harvested into an RDF triple store through Tupelo middleware
      – Document, image, and blog access/upload/download
      – Group management (creation, user add/remove/invite)
  – Pilot Implementations
    • Provenance tracking in CyberIntegrator (workflow)
    • Knowledge network creation based on provenance
• What Are Our Next Steps?
  – Ubiquitous provenance tracking across portal boundaries and non-web-based tools
  – Data QA/QC and workflow provenance are the major efforts at this moment
    • Work with the environmental observatory community on various use cases
  – Geo-referenced provenance map for visualization of the sensor data processing pipeline
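The harvesting of portal activity into a triple store can be pictured as a function that turns one event into a handful of triples. The sketch below is an assumption-laden illustration: the `urn:example:` identifiers, the predicate names, and the `event_to_triples` helper are all hypothetical and are not the vocabulary or API that Tupelo or the CyberCollaboratory actually use.

```python
from datetime import datetime, timezone

# Hypothetical vocabulary prefix; a real deployment would use its own terms.
EX = "urn:example:"


def event_to_triples(event_id, actor, action, target, when):
    """Describe one portal event (who did what to which resource, and when)."""
    subject = f"{EX}event/{event_id}"
    return [
        (subject, EX + "actor", actor),
        (subject, EX + "action", action),
        (subject, EX + "target", target),
        (subject, EX + "timestamp", when.isoformat()),
    ]


triples = event_to_triples(
    42,
    EX + "user/alice",
    "upload",
    EX + "document/report.pdf",
    datetime(2007, 11, 11, 9, 30, tzinfo=timezone.utc),
)
for triple in triples:
    print(triple)
```

Keeping each event as its own subject means later queries can group activity by actor, by target, or by time without changing the stored data.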
Conceptual Context (What?)
• What’s It About?
  – Mainly domain-specific semantic concept relationships, i.e., ontologies
• How Does It Work?
  – Community consensus
    • Controlled vocabulary
  – Folksonomy
    • User-generated metadata, tagging
  – Hybrid approach
    • Allow users to add new controlled vocabulary terms to an existing ontology
• What Are the Scientific Use Case Drivers?
  – Ontology-driven data search/integration has been recognized in many scientific domains (including the environmental observatory community)
    • E.g.: ODM (Observation Data Model)
      – CUAHSI: Consortium of Universities for the Advancement of Hydrological Sciences, Inc.
    • Semantic mediator to reconcile different ways of describing data
      – This is usually a community effort
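The hybrid approach, in which free-form user tags sit alongside a controlled vocabulary and can feed new terms into it, might look roughly like the sketch below. The vocabulary terms, the usage threshold, and both helper names are assumptions chosen for illustration; in practice new terms would normally pass community review rather than a simple count.

```python
from collections import Counter


def normalize(tag):
    """Lowercase a tag and collapse surrounding or repeated whitespace."""
    return " ".join(tag.lower().split())


def split_tags(tags, vocabulary):
    """Separate tags into controlled-vocabulary terms and free folksonomy tags."""
    cleaned = {normalize(t) for t in tags}
    return sorted(cleaned & vocabulary), sorted(cleaned - vocabulary)


def promote_popular(vocabulary, tagged_resources, min_uses=2):
    """Return a vocabulary extended with tags used on at least `min_uses` resources."""
    counts = Counter(
        tag for tags in tagged_resources for tag in {normalize(t) for t in tags}
    )
    return vocabulary | {t for t, n in counts.items() if n >= min_uses}


# Illustrative terms only; not taken from any actual ontology.
vocabulary = {"salinity", "dissolved oxygen"}
print(split_tags(["Salinity", "hypoxia ", "Dissolved  Oxygen"], vocabulary))
print(sorted(promote_popular(vocabulary, [["hypoxia"], ["Hypoxia", "salinity"]])))
```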
Conceptual Context (contd.)
• What Have We Done So Far?
  – Production Implementation
    • CyberCollaboratory allows user tagging in many tools, such as blogs, the document library, wiki, etc.
  – Pilot Implementation
    • CyberIntegrator has started to build workflow ontology/tagging and to expose this information to Portal users for filtering and searching workflow templates
• What Are Our Next Steps?
  – Leveraging environmental observatory ontology efforts (such as CUAHSI ODM) for data integration and dissemination
  – Establishing a controlled vocabulary for cyberenvironment development needs so that different tools can use a consistent representation
How Would These Contexts Actually Play Together?
• Independently produced context metadata in different portlets, portals, or desktop tools can be merged as RDF triples using Tupelo
• Allow non-invasive sharing of data/information across application boundaries without requiring the same database schema
  – Portal A vs. Portal B
  – Desktop Application vs. Web-based Application
• Allow generation of knowledge networks
  – Web-scale data integration and presentation
• Two examples
  – A production implementation with event/provenance capture and content-based mashup
    • Mainly uses provenance context across portal boundaries and the desktop-web boundary
  – An end-to-end pilot implementation which uses all the contexts discussed so far
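The claim that independently produced metadata can be merged without a shared schema follows from how triples behave: a graph is a set of statements, so combining two graphs is a set union. The sketch below illustrates that property with plain tuples; the resource identifiers, predicate names, and helper functions are assumptions, and this is not the Tupelo API.

```python
def merge_graphs(*graphs):
    """Union several triple collections; identical statements collapse to one."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def related(graph, resource):
    """Return every statement that mentions `resource` as subject or object."""
    return sorted(t for t in graph if t[0] == resource or t[2] == resource)


# Statements produced separately by a portal and a desktop tool (made up).
portal_a = [
    ("doc:1", "uploadedBy", "user:alice"),
    ("doc:1", "inGroup", "group:ccbay"),
]
desktop_tool = [
    ("workflow:7", "used", "doc:1"),
    ("doc:1", "uploadedBy", "user:alice"),
]

merged = merge_graphs(portal_a, desktop_tool)
for statement in related(merged, "doc:1"):
    print(statement)
```

The merge only links up because both producers use the same identifier for the document; agreeing on identifiers and vocabulary is the part that requires coordination between tools.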
Example 1: Event/Provenance Capture & Content-based Mashup
[Figure: architecture diagram. Users (User1, User2) work in CyberCollaboratory Portal Instance 1 and CyberCollaboratory Portal Instance 2, each backed by a relational database (MySQL) and each with a Portal Event Listener (Add/Update/Delete/Read). The listeners emit Event/Provenance RDF Triples, and CyberIntegrator contributes Provenance, into the Tupelo Semantic Content Repository Middleware Fabric, which is backed by an RDF store (Sesame). Arrows are labeled Harvesting and Remixing & Presenting.]
Example 2: A Pilot End-to-End Implementation Using Participatory Cyberenvironment
• Environmental Observatory Use Case
  – Sensor data anomaly detection in Corpus Christi Bay, Texas
  – A group was created for this testbed project (social context)
  – A Google Map-based sensor map portlet allows users to subscribe to sensor data streams, both raw and derived (geospatial context, API-based mashup)
  – Users can monitor the sensor data and invoke another workflow in a different observatory from a proactively generated knowledge network which presents relevant sensors, workflows, publications, and people (provenance, ontologies context, content-based mashup, knowledge network)
• Individual researchers use and contribute back to community infrastructures
  – Participatory Science needs/uses Participatory Cyberenvironment!
[Figure labels: Individual User’s Desktop Dashboard Alert; Workflow remote execution with modification; New Derived Data Stream]
Concluding Remarks
• Paradigm shifts in science are driving a need for increased sharing of content across applications/systems
• Our research on four contexts (social, geospatial, causal, and conceptual) helps us take the Web 2.0/Semantic Web approach for the CyberCollaboratory portal and other tools to enable such sharing
• The semantic middleware Tupelo can manage these contexts
  – Makes a standard portal such as CyberCollaboratory more context-sensitive
  – Makes content-based mashup across application boundaries possible
• Initial experiences with using these contexts have been positive
Concluding Remarks (contd.)
• Participatory cyberenvironments enable individual researchers to directly customize community infrastructures and then share their enhancements
  – Participatory Science!
• Further research and development is under way at NCSA towards the full realization of the vision of a participatory cyberenvironment
Acknowledgements
• Teams:
  – NCSA ECID (Environmental CI Demo) team
  – Corpus Christi Bay WATERS Testbed team
  – WATERS Project Office
  – NCSA TRECC Year-8 Project Team
• Funding sources:
  – NSF grants BES-0414259, BES-0533513, and SCI-0525308
  – Office of Naval Research grant N00014-04-1-0437