How Repositories can avoid the Failings of the Grid
David De Roure
IEEE e-Science 2008
“But the Grid is successful!”
So why are there three projects addressing lack of uptake?
Adoption of e-Research Technologies
...and a theme in the e-Science Institute?
How did we get here?!
Early adopter success
Then rollout of infrastructure services
And then wondering where the users are
Heard at another repositories event...
“How do we persuade researchers to populate our repositories?”
Due to the complexity of the software and the backend infrastructural requirements, e-Science projects usually involve large teams managed and developed by research laboratories, large universities or governments.
e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.
What are we really trying to achieve here?
Not just accelerated but new
A. Everyone using the Grid/Repositories?
B. Research advances on an everyday basis that would not have happened otherwise?
How do we move from heroic scientists doing heroic science with heroic infrastructure to everyday scientists doing science they couldn’t do before?
humanists, archaeologists, geographers, musicologists... researchers!
research
It’s the democratisation of e-Research
Jim Downing came up with the idea of “Long Tail Science”... So we are exploring how big science and long-tail science work together to communicate their knowledge. Long-tail science needs its domain repositories - I am not sanguine that IRs can provide the metalayers (search, metadata, domain-specific knowledge, domain data) that are needed for effective discovery and re-use.
Peter Murray-Rust
[Diagram labels: Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; researchers; Local Web; Repositories; Graduate Students; Undergraduate Students; Virtual Learning Environment; Certified Experimental Results & Analyses; experimentation; Data, Metadata, Provenance, Workflows, Ontologies; Digital Libraries]
The social process of science 2.0
Everyday researchers doing everyday research
• Not just a specialist few doing heroic science with heroic infrastructure
• Chemists are blogging the lab
• Everyone is mashing up
• Everyday hardware – multicore machines and mobile devices
1
A data-centric perspective, like researchers
• Data is large, rich, complex and real-time
• There is new value in data, through new digital artefacts and through metadata e.g. context, provenance, workflows
• This isn’t “anti-computation” – design interaction around data
2
Collaborative and participatory
• The social process of science revisited in the digital age
• Collaborative tools – blogs and Wikis
• e-Science now focuses on publishing as well as consuming
• Scholarly lifecycle perspective
3
Benefitting from the scale of digital science activity to support science
• This is new and powerful!
• Community intelligence
• Review
• Usage informing recommendation
• e.g. OpenWetWare
• e.g. myExperiment
4
Increasingly open
• Preprint servers and institutional repositories
• Open journals
• Open access to data
• Science Commons
• Object Reuse & Exchange
5
Better not Perfect
• The technologies people are using are not perfect
• They are better
• They are easy to use
• They are chosen by scientists
6
Empowering researchers
• The success stories come from the researchers who have learned to use ICT
• Domain ICT experts are delivering the solutions
• Anything that takes away autonomy will be resisted
7
About pervasive computing
• e-Science is about the intersection of the digital and physical worlds
• Sensor networks
• Mobile handheld devices
8
• e-Research is now enabling researchers to do some completely new stuff!
• As the individual pieces become easy to use, researchers can bring them together in new ways and ask new questions
• “The next level”
Onward and Upward
“Standing on the shoulders of giants”
www.w3.org/2007/Talks/www2007-AnsweringScientificQuestions-Ruttenberg.pdf
(Everyday researchers are giants too)
Repositories
• Absolutely key role in future research. So think of a better word!
• Think of a park / reserve / gardens / zoo
– Visitors, rangers, wardens, gardeners, experts, security, volunteers, ...
– Curation by providers, experts and consumers
Repositories
Those 8 Repository points
www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
1. Not just a specialist few doing heroic science with heroic infrastructure – repositories for all!
2. There is new value in data, through new digital artefacts and through metadata e.g. context, provenance, workflows
3. e-Science now focuses on publishing as well as consuming
4. Usage informing recommendation
5. Researchers work with collections - Object Reuse & Exchange
6. They are easy to use
7. Anything that takes away autonomy will be resisted
8. e-Science is about the intersection of the digital and physical worlds (not 1970s library catalogue interfaces)
Curation of process
• Find a process based on what it does, and find copies or similar services usable as alternates.
• Understand how and when it works, how to operate it correctly and predict its performance.
• Know the conditions for use: permissions, licenses, platforms, and costs.
• Judge the benefits of adoption based on its reputation, provenance and validation by peers.
• Estimate the risk of adoption based on its reliability and stability.
• Get assistance for its incorporation into applications and workflows.
Goble & De Roure, Educause Review, Sep/Oct 2008
And we need to curate processes too!
• To understand where we’re going, look at communities which have been early to embrace new technology.
• e-Science is one. What can we learn?
• Incidentally, so are music and broadcast!
– Vinyl was like books
– Now the process is digital from the studio through to playback on an iPod
– People create content
– People publish content
– Has the business adapted?
Transformation is already underway
Note to Reader. The next slides are not intended to be anti-grid. Everyone working on Grid is doing great work.
Don’t think rollout of technologies...
Think roll-in of researchers...
Mass Use by Researchers
Knowledge co-production vs Service Delivery!
[Diagram: N items on each side joined directly, N² arrows]
Without middleware we need lots of bits of software to join things together
[Diagram: N items on each side joined through one middleware, 2N arrows]
With middleware there are fewer arrows!
Middleware?
[Diagram: N items on each side joined through several middlewares; the number of arrows is a polynomial involving N1, N2 and M]
But this is what happened. Now the picture with lots of thin arrows isn’t quite so scary!
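The arrow counts in the three pictures can be sketched in a few lines. This is a rough illustration only: the slides give N² and 2N, but they do not spell out the polynomial for the many-middleware case, so the formula M × (N1 + N2) below is an assumption (every item needs one adapter per middleware), and the function names are made up for this sketch.

```python
def direct_links(n1: int, n2: int) -> int:
    """No middleware: one piece of glue software per pair, so N1 * N2 arrows."""
    return n1 * n2


def single_middleware_links(n1: int, n2: int) -> int:
    """One middleware: every item connects to it once, so N1 + N2 arrows."""
    return n1 + n2


def multi_middleware_links(n1: int, n2: int, m: int) -> int:
    """Assumed model, not taken from the slides: each item needs an adapter
    for each of the m middlewares, giving m * (N1 + N2) arrows."""
    return m * (n1 + n2)


if __name__ == "__main__":
    n = 10
    print("direct:", direct_links(n, n))                    # N squared
    print("one middleware:", single_middleware_links(n, n))  # 2N
    for m in (1, 3, 5, 8):
        print(f"{m} middlewares:", multi_middleware_links(n, n, m))
```

Under that assumed model, once there are about N/2 competing middlewares the arrow count is back at the N² of the direct picture, which is one way to read why the thin-arrow picture stops looking so scary.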
Grid
use Web 2.0 here
Grid, cloud, HPC
Web is being embraced for usability and programmability e.g. mashups
And Grid is trying to come to terms with multicore and clouds!
How would this repository ecosystem self-organise to support Research 2.0?
Imagine EPrints/DSpace/Fedora isn’t something you download and run on a local server. Imagine instead that you just go to the cloud and make one*
Would there be institutional repositories?
A Thought Experiment
* (Actually you can!)
Tension between data being “out on the Web” (user view) or in an institutional machine room (provider view)
What is the curator view?
Issues perceived differently for metadata servers and data servers
Is it a wave or is it a particle?
web
Linked Data
1. Understand what the users will need by going on the journey together
2. Be open-minded: are we solving the right problem? (Don’t forget curation of process!)
3. Don’t create artificial distinctions from Web
4. Beware standards as a barrier to adoption
5. Think cloud, outside the institutional box: imagine the repository factory
6. Think of a new name for repositories!
How Repositories can avoid Failing like the Grid