Web 2.0 and Grids for Scholarly Research
Peking University, July 27 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
[email protected] http://www.infomall.org



Page 2: Application Drivers

Science Informatics for document analysis, as in the case of chemistry, which has very precise naming rules for compounds that allow accurate searches in documents
  • Suggesting how to tag scientific documents, either when writing them or after the fact

Journal web site of the future, as illustrated by Nature building the social bookmarking tool Connotea

Conference support tools, which can benefit from features needed by journals

This gives document-enhanced Cyberinfrastructure (CI)

Page 3: Community Tools

E-mail and list-serves are the oldest and best used

Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P Collaboration – text, audio-video conferencing, files

del.icio.us, Connotea, CiteULike, Bibsonomy, Biolicious manage shared bookmarks

MySpace, Bebo, Hotornot, Facebook, or similar sites allow you to create (upload) community resources and share them; Friendster, LinkedIn create networks
  • http://en.wikipedia.org/wiki/List_of_social_networking_websites

Writely, Wikis and Blogs are powerful specialized shared document systems

ConferenceXP and WebEx share general applications

Google Scholar tells you who has cited your papers while publisher sites tell you about co-authors
  • Windows Live Academic Search has similar goals

Note that sharing resources creates (implicit) communities
  • Social network tools study graphs to both define communities and extract their properties

Page 4: How to use Web 2.0 Community tools in CI

Nearly all of them have “profiles”, “users”, “groups”, “friends” etc.
  • Need to integrate these

P2P File Sharing: Maybe this is useful for sharing files in research groups (virtual organizations)
  • Will modify Maze http://maze.pku.edu.cn – popular Chinese social P2P system with 2.5 million users

BitTorrent: more popular than FTP – why not use for higher performance, fault tolerant, cached file sharing?

MySpace etc.: Could consider MyGridSpace or MyScienceSpace that supports a similar document sharing model with users uploading pictures, papers and even data/services of interest
  • Could include uploaded material in workflows

Social Bookmarking and linking: discuss later
  • http://gf6.ucs.indiana.edu:48990/SemanticResearchGrid/

Page 5: Document-enhanced Cyberinfrastructure

[Architecture diagram. Box labels: Existing User Interface; Existing Document-based Research Tools (Google Scholar, Manuscript Central, Science.gov, Windows Live Academic Search, Citeseer, CMT Conference Management, etc.); Web service Wrappers; New Document-enhanced Research Tools; Integration/Enhancement User Interface; Community Tools; Generic Document Tools; MyResearch Database; Bibliographic Database; Export: RSS, Bibtex, Endnote etc.; CiteULike; Connotea; Del.icio.us; Bibsonomy; Biolicious; PubChem; PubMed; Traditional Cyberinfrastructure]

Page 6: Strategy

It doesn’t seem useful to build the 251st community tool

In fact a major barrier to use of existing tools is the question
  • What happens when a better tool comes along and/or the chosen tool disappears (unsupported/removed from Web)?

So assume we use existing tools but wrap them all as web services, so that we can transfer information to new tools and integrate information between tools
  • Need some “glue” logic, a “unification” database and a minimal user interface

Bookmarking tools: del.icio.us, Connotea, CiteULike (includes plug-ins to major publisher sites)

Document: Google Scholar, Windows Live, Citeseer tools, OSCAR3 for Chemistry, Science.gov (later)

Journals: Manuscript Central

Conferences: CMT from Microsoft or ?
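
The wrap-and-unify idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ToolWrapper` interface, the `Record` fields and the `InMemoryTool` stand-in are all hypothetical, and no real del.icio.us, Connotea or CiteULike API is called. Each tool is hidden behind one common interface, and the "unification" database is reduced to a dictionary keyed by URI.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Record:
    """Tool-neutral description of one bookmarked or indexed document (hypothetical schema)."""
    uri: str
    title: str
    tags: frozenset = field(default_factory=frozenset)
    source: str = ""


class ToolWrapper(Protocol):
    """Hypothetical interface that every wrapped tool would expose."""
    name: str

    def fetch(self, user: str) -> list: ...


class InMemoryTool:
    """Stand-in for a real wrapper: serves canned data instead of calling a web service."""

    def __init__(self, name, data):
        self.name = name
        self._data = data  # {user: [(uri, title, [tags]), ...]}

    def fetch(self, user):
        return [Record(uri, title, frozenset(tags), self.name)
                for uri, title, tags in self._data.get(user, [])]


def unify(wrappers: Iterable[ToolWrapper], user: str) -> dict:
    """Glue logic: merge records from all tools by URI, pooling tags and sources."""
    merged = {}
    for wrapper in wrappers:
        for rec in wrapper.fetch(user):
            entry = merged.setdefault(
                rec.uri, {"title": rec.title, "tags": set(), "sources": set()})
            entry["tags"] |= rec.tags
            entry["sources"].add(rec.source)
    return merged
```

Because the merged store depends only on the common interface, a tool that disappears can be dropped, and a better one added, by swapping a single wrapper while the accumulated records stay in place.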

Page 7: Delicious Semantic Web/Grid

http://del.icio.us purchased by Yahoo for ~$30M

http://www.CiteULike.org

http://www.connotea.org (Nature)

Associate metadata with Bookmarks specified by URLs, DOIs (Digital Object Identifiers)

Users add comments and keywords (called tags)

Users are linked together into groups (communities)

Information such as title and authors extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.)

Bibtex-like additional information in CiteULike

This is perhaps de facto Semantic Web – remarkable for its simplicity
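
A minimal sketch of the bookmark model described on this slide, assuming a hypothetical `Bookmark` record; it is not the schema of any of the three sites, and the sample identifiers in the usage note are made up. It shows how flat tags from many users add up to shared metadata for a resource, and how bookmarking the same resource defines an implicit community.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Bookmark:
    """One user's bookmark: a resource identifier plus free-form annotations."""
    user: str
    identifier: str   # a URL or a DOI
    tags: tuple = ()
    comment: str = ""


def tags_by_resource(bookmarks):
    """Count how often each (lower-cased) tag was attached to each resource."""
    counts = defaultdict(Counter)
    for b in bookmarks:
        counts[b.identifier].update(t.lower() for t in b.tags)
    return counts


def implicit_community(bookmarks, identifier):
    """Users who bookmarked the same resource, in sorted order."""
    return sorted({b.user for b in bookmarks if b.identifier == identifier})
```

For example, if two users bookmark the same DOI with the tags `Grid` and `grid`, `tags_by_resource` reports `grid` twice for that DOI, and `implicit_community` returns both users.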

Page 8: Connotea

Page 9: Connotea queried by SERVOGrid

Page 10: Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid I

Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata
  • Title, author and institution of documents
  • Citations with their own metadata allowing one to match to other documents

Science.gov extracts metadata from lots of US Government databases

These capabilities are sure to become more powerful and to be extended
  • Give “Citation Index” in real time
  • Tell you all authors of all papers that cite a paper that cites you etc. (Note it’s a small world so don’t go too far in link analysis)
  • Tell you all citations of all papers in a workshop
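
The "authors of all papers that cite a paper that cites you" query is a bounded walk over the citation graph. The sketch below assumes the graph is already available as a plain dictionary mapping each paper to the papers that cite it; the function names and the data layout are hypothetical. The `max_depth` cap reflects the small-world caution above: without it the walk soon reaches most of the literature.

```python
from collections import deque


def citing_papers(cited_by, start, max_depth):
    """Breadth-first walk along 'is cited by' links, at most max_depth hops from start.

    Returns {paper: number of hops from start}; start itself is not included.
    """
    seen = {start}
    found = {}
    queue = deque([(start, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for citer in cited_by.get(paper, ()):
            if citer not in seen:
                seen.add(citer)
                found[citer] = depth + 1
                queue.append((citer, depth + 1))
    return found


def authors_of(papers, authors):
    """Sorted, de-duplicated authors of the given papers."""
    return sorted({a for p in papers for a in authors.get(p, ())})
```

Calling `citing_papers(cited_by, "my-paper", 2)` gives the papers that cite you plus the papers that cite those, and passing the result to `authors_of` yields the author list the slide describes.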

Page 11: Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid II

It is natural to develop core document Services such as those used in Citeseer/Google Scholar but applied to “your” documents of interest that may not have been processed yet
  • For example, documents just submitted to a conference

These tools can help form useful lists such as the authors of all papers cited in or submitted to a journal

OSCAR2/3 (from Peter Murray-Rust’s group at Cambridge) augment the application-independent “core” metadata (Title, authors, institutions, Citations) with a list of all chemical terms
  • This tool is a Service that can be applied to “your” document or to a set of documents harvested in some fashion
  • Other fields have natural application-specific metadata and OSCAR-like tools can be developed for them

Such high value tools could appear on “publisher” sites of the future (or else publishers will disappear)

Page 12: OSCAR3 Service from Cambridge UK

Oscar3 is a tool for shallow, chemistry-specific natural language parsing of chemical documents (i.e. journal articles).

It identifies (or attempts to identify):
  • Chemical names: singular nouns, plurals, verbs etc., also formulae and acronyms.
  • Chemical data: Spectra, melting/boiling point, yield etc. in experimental sections.
  • Other entities: Things like N(5)-C(3) and so on.

Uses SMILES, InChI and CML

There is a larger effort, SciBorg, in this area
  • http://www.cl.cam.ac.uk/~aac10/escience/sciborg.html

http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3

Page 13: OSCAR2 Chemistry Document analysis

It detects “magic” chemical strings in text and then
  • Stores them as metadata associated with the document
  • Queries ChemInformatics repositories to tell you lots of information about identified compounds
  • Tells you which other documents have this compound
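
The detect, store and cross-reference steps above can be illustrated with a toy pipeline. This is not how OSCAR works: a small hand-written lexicon of compound names (an assumption made purely for illustration) stands in for its chemistry-specific parsing, and the repository query step is omitted. The point is only the shape of the flow: extract compound names, store them per document, then use an inverted index to find other documents that mention the same compound.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon; a real tool recognises far more than fixed names.
LEXICON = {"benzene", "ethanol", "acetone", "toluene"}


def extract_compounds(text, lexicon=LEXICON):
    """Return the lexicon entries that occur as whole words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w in lexicon}


def build_index(documents):
    """Inverted index from compound name to the ids of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for compound in extract_compounds(text):
            index[compound].add(doc_id)
    return index


def other_documents(index, compound, doc_id):
    """Documents, other than doc_id, that mention the compound."""
    return sorted(index.get(compound, set()) - {doc_id})
```

Given a dictionary of document ids to text, `build_index` produces the per-compound metadata, and `other_documents(index, "ethanol", "d1")` lists every other document that mentions ethanol.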

Page 14: Clustering Documents from chemical properties

Page 15: Provenance and Delicious CI

We can use a del.icio.us style interface to annotate Application Data with (extra) provenance and user comments of any type (describing quality of data or a keyword relating different data etc.)
  • All data should be labeled by a URI to enable this
  • One has in addition Citeseer/OSCAR metadata

Current major tagging systems support a flat list of tags without name=value (RDF triple) or schema organization
  • Tradeoff between features and pervasive deployment

Some extra features are easy to add as a custom service

Features not supported by del.icio.us can be uploaded as comments
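
One way to carry name=value metadata through a system that stores only flat tags and a free-text comment is to serialize the pairs into the comment and parse them back in a custom service. The convention below, including the `#props` marker line, is an assumption made for this sketch and not a del.icio.us feature; it also assumes names contain no `=` and values contain no line breaks.

```python
MARKER = "#props"  # hypothetical marker separating free text from name=value lines


def encode_properties(comment, properties):
    """Append sorted name=value lines to a free-text comment, after the marker line."""
    pairs = [f"{name}={value}" for name, value in sorted(properties.items())]
    return "\n".join([comment, MARKER, *pairs])


def decode_properties(text):
    """Split an encoded comment back into (free text, {name: value})."""
    comment, sep, tail = text.partition("\n" + MARKER)
    if not sep:
        return text, {}  # an ordinary comment with no embedded properties
    props = {}
    for line in tail.splitlines():
        name, eq, value = line.partition("=")
        if eq:
            props[name.strip()] = value.strip()
    return comment, props
```

A comment written without the marker decodes to itself with an empty property set, so comments added through the ordinary tagging interface remain valid.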

Page 16: Current Status

Google Scholar, Windows Live Academic Search, del.icio.us, Connotea, CiteULike, OSCAR3 are Web Services

Debugging on 500 presentations and papers from my CGL research group

Experiment with GGF Presentations, a broad collection of Chemical Informatics resources (explore science document CI link) and the Concurrency and Computation: Practice and Experience Web site (?business model for journals)