Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by Stephen Griffin,...
DESCRIPTION
In his talk for the MIT Libraries Program on Information Science, Steve Griffin discusses how research libraries can play a key and expanded role in enabling digital scholarship and in creating the supporting activities that sustain it.
TRANSCRIPT
New Models of Scholarly Communication for Digital Scholarship
MIT Libraries, October 2014
Stephen Griffin, University of Pittsburgh
New computation- and data-intensive modes of inquiry and experimentation are often referred to as cyberscholarship or digital scholarship.
New Modes Of Inquiry In The Digital Era
Digital Scholarship
(videos in ppt)
Digital Scholarship: New Forms of Inquiry
Data Middleware
ICT Infrastructure
Digital Content
- Computational Science
- Data-driven Research
- Data-intensive Research and Scholarship
New interfaces and tools for data access, management, analysis, interpretation, presentation, dissemination and reuse [in collaborative environments]
+ exponential increases
+ portable devices, wireless high-bandwidth, collaboration environments, open access/open source
(eScience, “big data”, ... )
New Research and Study Approaches
• theoretical/analytical (new theories are formulated and proven using a priori axioms and definitions)
• empirical/observational (inquiry based on detectable and measurable evidence; hypothesis-driven and often aimed at theory building, which in turn can yield new hypotheses and identify potential new theories)
• computational (typically, large-scale computation applied to mathematical models using high-performance computers to produce simulations of physical phenomena, often displayed in the form of scientific visualizations [e-Science])
• data-driven (analysis of very large numerical or textual data sets with the goal of elucidating patterns or discovering new correlations or relationships from which hypotheses might be constructed)
• data-intensive (analysis and combination of large, diverse, heterogeneous data stores with the goal of identifying and discovering new connections or relationships ….)
• The computational, data-driven and data-intensive approaches are recent developments that can also be considered extensions or new forms of theoretical and empirical methods.
A Few Consequences of the Digital Era
New modes of scholarly inquiry and research
A necessity for interoperability at scale
Transformation of scholarly communication
New organizational forms to exploit information
Multiple models for scholarly communication
Evolving social mores among individuals
New creative environments and resources for scholarly work
From 1980 to 2010: three decades of massive technological change and integration of information and communication technologies into the conduct of research, scholarship and aspects of everyday life across the planet
• exponential increase in processor speed
• exponential increase in memory capacity
• exponential increase in global bandwidth, internet nodes and users
• exponential increase in online digital content
• exponential decrease in component size/power consumption (!!!)
• exponential decrease in cost
+ wireless high-bandwidth, portable devices, collaboration environments, open access/open source, data management and access efforts, new interfaces and tools for analysis, interpretation, presentation, reuse of scholarly resources ...
Historic Context (1): Technological Change
--> New Methods of Research and Scholarship
The emergence of a culture of sharing has accompanied the growth of the internet and of new communities involved in the creation and management of digital content and resources. At the same time, there has been a general movement toward openness of internet-based content and resources (when appropriate and legal).
• Open Data
• Open Access Journals
• Open Repositories and Archives
• Open Source Software
• Open Architectures
• Open Educational Resources
• Open and Transparent Governance
• Open Scholarly and Practitioner Communities
• Open Scholarship
Historic Context (2): Cultural Change
The Digital Libraries Initiative
Supercomputer Centers Program
NSFnet
Important Early Programs at NSF
Cyberinfrastructure Program
Digital Repositories Continue to Evolve in Terms of Interoperability, Scaling and Accessibility
Background reading: Early and Accurate Predictions
“Interoperability, Scaling, and the Digital Libraries Research Agenda,” Clifford Lynch and Hector Garcia-Molina, 1995 report
Interoperability is Broadly Defined
Organizational (implications for support structures): resource ownership and control; staff (changing skill needs and user communities)
Inter-community (supporting new relationships): multi-disciplinary research; cross-sector operations (e.g., libraries, museums and archives)
International (bridging diversity): differences in culture, law and practice; differences in language
Analytic (understanding intent): context and task dependencies; temporal and spatial relationships
Syntactic (structural relationships within data): communication, transport, storage and representation; Z39.50, ISO-ILL, XML, …
Semantic (interpretation of term usage and meaning): different terms describing similar concepts; identical terms describing different concepts
Adapted from http://www.ukoln.ac.uk/interop-focus/about/
Ron Larsen, University of Pittsburgh
Interoperability In Terms of Abstraction Levels
(Figure: layers ordered from concrete to abstract; after Briefing Paper: Digital Preservation Europe, Stefan Gradmann)
• ICT Infrastructure (processors, memory, network): processors, networks, storage, codes, compilers, tools, algorithms, software libraries …
• Middleware (services layer): software between the network and the applications providing authentication, identification, authorization, directories, security …
• Digital repositories (1990s), including scientific DBs, digital libraries and other repositories: functional individual repositories; metadata catalogues and diverse information objects …
• Institutional and disciplinary repositories (2000), federated across repositories [DSpace; Fedora; ePrints]: interoperability across repositories via OAI-PMH, compound object packaging formats, etc.
• Very large repositories and global data infrastructures (2010), federation at data level via semantic web technologies and linked open data principles: interlinked data over the web using URIs, RDF, links, vocabularies, relations; abstractions; graphs …
Repository Development Trends: new developments, with increasing capacities and capabilities over time
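OAI-PMH interoperability works by having a harvester send simple HTTP requests (verbs such as ListRecords) to a repository. The harvester then pages through the XML responses using a resumption token. The Python sketch below shows only the request-building and response-parsing steps, under several assumptions. The base URL and the sample record are hypothetical. Only Dublin Core titles are extracted. HTTP fetching and OAI-PMH error responses are left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # Follow-up pages carry only the verb and the resumption token
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    # An empty or missing token means there are no further pages
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)

# Hypothetical response page, for illustration only
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

To harvest a full repository, a harvester would:

1. Fetch each URL, for example with `urllib.request.urlopen`.
2. Pass the response body to `parse_records`.
3. Repeat with the returned token until the token is `None`.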
Progress is Being Made in Developing Very Large Scale, Highly Functional Global Data Infrastructures Containing a Rich Diversity of Information Objects
These are merging to become a universally accessible, carefully maintained knowledge infrastructure based on common principles and practices.
Clear evidence suggests that these can now support a full range of scholarly communication activities and develop into a persistent scholarly communication infrastructure.
Potential Application Areas Increase With Data Infrastructure Scale, Scope and Functionality
(Figure: the same layers, from ICT infrastructure and middleware through digital repositories (1990s), institutional repositories (2000) and global data infrastructures (2010), set against an applications space spanning the humanities, social sciences, natural sciences (experimental methods), natural sciences (computational simulations) and formal sciences.)
Research and Scholarship Funding is Not Seen as Equitably Distributed
(Figure: levels of activity and investment across the same infrastructure layers for the humanities, social sciences, natural sciences (experimental methods) and natural sciences (computational simulations), with the labels “big data” and “eScience”.)
Scholarly Communication Meeting, Pittsburgh, January 2013
Participants
Ron Larsen, U Pittsburgh
Steve Griffin, U Pittsburgh
Bill Arms, Cornell U
Johan Bollen, Indiana U
Fran Berman, Rensselaer Poly
Bob Pego, Carnegie Mellon U
Micah Altman, MIT Libraries
Greg Crane, Tufts U
Spencer Keralis, U North Texas
Josh Greenberg, Sloan Foundation
Victoria Stodden, Columbia U
Tom Moritz, consultant
Ed Fox, VPI
Chuck Henry, CLIR
Carole Goble, U Manchester
Unable to attend due to travel/other circumstances:
Lewis Lancaster, U Cal, Berkeley
Don Waters, Mellon Foundation
Carl Lagoze, U Michigan
Sandy Payette, Cornell U
John Unsworth, Brandeis U
S Griffin/University of Pittsburgh
Meeting Goals
The meeting goals were to identify new means and opportunities for enhancing scholarly communication across disciplines and to explore new models for documenting and disseminating a comprehensive record of computational and data-centered research.
• new methodologies, reach and affordances of digital scholarship
• technologies and activities to capture a more complete record of stages in the scholarly research workflow
• effective frameworks (existing and proposed) to accelerate the repurposing and reuse of open data resulting from scholarly work and research
• robust document models for presenting and “bundling” the processes, resources, outputs and potential impacts of scholarly work
• new means for dissemination to increase the diffusion and reach of new concepts and findings
• accurate measures to ensure appropriate and fair attribution, acknowledgement, credit and reward for those involved in carrying out the work
Meeting Presentations and Group Discussion Foci
A Persistent and Recurring Theme: A New Burden of Evidence
Defining features of science include repeatability and reproducibility. Repeatability refers to the ability to duplicate an experiment under the same conditions many times and obtain the same result. Reproducibility refers to the ability for others to replicate the work in different environments and obtain the same results, setting the stage for extending the work in new directions. These requirements hold for theoretical and empirical research and apply to the formal, natural and social sciences. Replication of results using proven, rigorous methodologies confirms the veracity of a research process and outcome.
Carole Goble, Victoria Stodden, Tom Moritz
Some necessary conditions for reproducibility …
access to a comprehensive record of the research process and scholarly workflow including:
process records: algorithms, software pipelines and versioning, datasets and transformations, storage formats and protocols, event tracing, ...
resource descriptions: journals, logs, tools, methods, dialog, collaborative activities and external contributions, ...
intermediate forms: temporary models, concept changes, recursion points, software versions, external dialogs and contributions ...
workflow artifacts: transcriptions, translations, annotations, steps taken to acknowledge distribution of effort, attribution and credit, ...
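Part of the process record listed above could in principle be captured automatically while the work is done. This covers items such as datasets and transformations, versioning and event tracing. The Python sketch below shows one hypothetical way to do that. A decorator, here called `traced`, appends an event-trace entry each time a workflow step runs. Each entry records a step version and SHA-256 fingerprints of the step's inputs and outputs.

These names are illustrative assumptions, not an established tool:

* the decorator `traced`
* the in-memory `PROCESS_RECORD` list
* the JSON-based `fingerprint`

A real system would persist the trace and handle data that is not JSON-serializable.

```python
import functools
import hashlib
import json
import platform
from datetime import datetime, timezone

PROCESS_RECORD = []  # hypothetical in-memory event trace; a real project would persist it

def fingerprint(obj):
    """Stable SHA-256 digest of a JSON-serializable value (keys sorted)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def traced(step_version):
    """Decorator that appends one event-trace entry per call of a workflow step."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            PROCESS_RECORD.append({
                "step": func.__name__,
                "step_version": step_version,
                "python": platform.python_version(),
                "input_sha256": fingerprint([list(args), kwargs]),
                "output_sha256": fingerprint(result),
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorate

@traced(step_version="1.0")
def normalize(values):
    """Example transformation: scale values to [0, 1] (assumes two or more distinct values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

@traced(step_version="1.0")
def mean(values):
    """Example analysis step: arithmetic mean."""
    return sum(values) / len(values)
```

Because the fingerprints depend only on the data, a later reader can rerun the same step version on the same inputs and compare output fingerprints. This is a simple check of the repeatability requirement described earlier.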
Driving Questions
1. Digital scholarship often involves new types of information objects, data analytic processes, resources, tools and heuristic representations of findings that cannot be accurately or completely described or communicated in traditional print or in print + electronic venues. What new expressive forms, document models, practices and venues might help remedy this situation?
2. What can be done to effectively capture, document and prepare the information flow associated with each stage of a research project or scholarly work so that it can become part of a larger, global knowledge and scholarly communications infrastructure?
Points to consider
Researchers will be disinclined to do this for multiple reasons. Is there a possibility of automating it? If so, would this be most tractable during the individual stages of the research workflow or after the project is complete? What might this entail?
How can digital stewardship become a central activity as part of major research projects? The purpose would be to document the research process and prepare resources and artifacts to enable reproducibility. Is this already being done to a certain degree in some disciplinary areas? Why? What has been the benefit to the larger scholarly communities involved?
Recommendations #1
• capture a comprehensive record of research process and scholarly production to support verification and reproducibility of results
• create a full research process record: logs, applications, methods, datasets, dialog, collaborative activities
• prepare workflow artifacts for repurposing and reuse
• develop a protocol model for scholarly output that allows for modularity, distributes effort and credit, and facilitates democratic access
• develop methods for managing release of components of scholarly output from all stages of the scholarly workflow
This implies digital stewardship across the workflow stages.
Recommendations #2
Need a new “bundled” modular research document model in which elements are linked semantically, released when ready (staged release) and capable of being recombined at any time and in different environments
• Provides a variety of presentation forms to accommodate disciplinary domains requiring different expressive forms
• Facilitates its own automated retrieval
• Gives direct access to datasets, tools and other workflow elements
• Anticipates future needs for storage, access and use (curation, stewardship, provenance issues)
• Capable of aggregation at the component level with other research documents
• Annotation and relationship friendly; indefinite versioning
Greg Crane, Carole Goble
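The bundled document model is described by a set of properties:

* linked components
* staged release
* indefinite versioning
* component-level aggregation

These can be made concrete as a small data structure. The Python sketch below is hypothetical. The class names `Component` and `ResearchBundle`, the relation label `usesDataset` and any example.org URIs are illustrative only. A production system would more likely build on an existing compound-object packaging standard such as OAI-ORE than on ad hoc classes.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Component:
    """One module of a research document (narrative, dataset, tool, ...)."""
    component_id: str
    kind: str        # e.g. "narrative", "dataset", "software"
    version: int
    location: str    # hypothetical pointer (URI) to the actual resource

@dataclass
class ResearchBundle:
    """Hypothetical bundled, modular research document."""
    title: str
    versions: dict = field(default_factory=dict)  # component_id -> list of Component
    released: set = field(default_factory=set)    # (component_id, version) pairs
    links: set = field(default_factory=set)       # (subject_id, relation, object_id)

    def add(self, component_id, kind, location):
        """Add a new version; earlier versions are kept (indefinite versioning)."""
        history = self.versions.setdefault(component_id, [])
        comp = Component(component_id, kind, len(history) + 1, location)
        history.append(comp)
        return comp

    def release(self, component_id, version=None):
        """Staged release: publish one version (default: latest) when it is ready."""
        history = self.versions[component_id]
        version = version or history[-1].version
        self.released.add((component_id, version))

    def link(self, subject_id, relation, object_id):
        """Record a typed (semantic) relationship between two components."""
        self.links.add((subject_id, relation, object_id))

    def public_view(self):
        """Latest released version of each component; unreleased work stays hidden."""
        view = {}
        for cid, history in self.versions.items():
            out = [c for c in history if (cid, c.version) in self.released]
            if out:
                view[cid] = out[-1]
        return view

def aggregate(*bundles):
    """Combine the released components of several bundles at the component level."""
    combined = {}
    for b in bundles:
        for cid, comp in b.public_view().items():
            combined[f"{b.title}/{cid}"] = comp
    return combined
```

Versions and releases are stored separately. As a result, a reader's view can advance one component at a time as pieces become ready, and no earlier state is ever discarded.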
Occurrences of “scholarly communication” in Google’s Ngram database (>5,000,000 books), 1950 to 2008
Scholarly Information is Communicated in Many Ways and Forms*
articles
exhibits
monographs
video
multimedia
data sets
software tools
websites
information visualization
bibliographies
journal & other articles
white papers
lectures
conference posters
software
dialogues
maps
archives
blogs
endorsements
reports
proposals
scholarly commentary
models
reviews
project descriptions
social media & web genres
institutional repositories
recommendations
emails
manuscripts
critical editions
(and there appears to be a trend of “informal” modes gaining status over time)
* Dan Cohen
A Very General Scholarly Research Workflow Model: Example of a Simple and Traditional Form
Workflow stages (over time t):
1. inspiration, explore, discover area of interest
2. formulate problem, design research, collect data
3. conduct research, analyze results
4. prepare findings, disseminate results
Information flows into and out of the project at each stage.
Activity: a mix of dialog, data and resources from individuals, the web, libraries, archives, etc. is discovered, referenced, accessed, gathered, transformed, analyzed and presented.
Processes range from primarily informal to primarily formal; formal outputs include journal articles, monographs and conference papers (copyright).
Figure axes: activity and data, from low to high.
Sources: Libraries, Academic Departments, Individuals, ...
And, sometimes: loosely organized activities to collect and prepare artifacts for future repurposing and reuse by others [event tracing, versioning, logs, journals, data documentation, intermediate forms, temporary models, concept changes, recursion points, transcription, translation, annotation, ...]
Current Scholarly Research Workflow and Communication Model
Workflow stages (over time t):
1. inspiration, explore, discover area of interest
2. formulate problem, design research, collect data
3. conduct research, analyze results
4. prepare findings, disseminate results
Information flows into and out of the project.
Activity: discovered, studied, accessed, collected, transformed, analyzed, prepared, presented
Data:
• data and research cyberinfrastructure: digital libraries, scientific databases, reports, publications, ETDs, software & code libraries, executable documents, 1st and 2nd generation repositories (linked open data; semantic web technologies ...), processing, storage and grid services
• recently emerging global data and resources infrastructures
• conversant/discursive web: social media, blogs, chat rooms, project sites, commentaries, ...
• hosting institutions (libraries, archives, other content and service providers)
Dissemination venues: subscription & open access journals, self-published documents & pre-prints, hybrid dissemination models
A New General Model for Scholarly Communication Infrastructure Based on Scholarly Workflow
Workflow stages (over time t):
1. inspiration, explore, discover area of interest
2. formulate problem, design research, collect data
3. conduct research, analyze results
4. prepare findings, disseminate results
5. prepare and deliver research assets for reuse
Information flow into and from workflow stages
Stewardship of workflow artifacts for reference, repurposing and reuse
Components:
• workflow information management mechanisms
• evaluation mechanisms
• distribution, managing and services entities
• conversant/discursive web: social media, blogs, chat rooms, project sites, commentaries, ...
• global data and research cyberinfrastructure: research data infrastructures, digital libraries, scientific databases, reports, publications, ETDs, software & code libraries, executable documents, 1st and 2nd generation repositories (linked open data; semantic web technologies ...), processing, storage, cloud and grid services, ...
• scholarly communications layer: dynamic research reports with detailed descriptive information of the methods and concepts as well as access to software, data and other experimental assets, provenance and citation linkages, etc., meeting community-adopted practices for presentation, access, preservation and archiving, ...
New Roles for Libraries, Archives and Service Providers
In this model the role of libraries evolves from that of holders and providers of knowledge resources to that of an active partner in the research process and in publishing and disseminating results. Libraries and librarians can provide tools and expertise that expedite research and scholarly work. Libraries have the institutional structure and many of the resources needed to publish data and scholarly journals.
Libraries Can Lead Efforts to Advance Digital Scholarship and Develop New Scholarly Communication Models
Desired Impact
• accelerate the research cycle
• expand the impact of research endeavors
• inspire new forms of interoperable scholarly resources
• promote new creative efforts and works
• support intellectual freedom in scholarly endeavors across disciplines
Libraries have begun to create centers for digital scholarship
Libraries are exploring new roles as publishers of journals and data
Role of Libraries in Supporting Data-Intensive Scholarship
(Figure: library activities plotted on a priority scale from lowest to highest)
• eScience simulation data
• experimental data (automated collection and preparation)
• experimental data (human involvement)
• create new repositories for sharing digital content and tools for domain and cultural study
• ongoing digitization and linking of analog special collections
• expand and create new mechanisms to support scholarly communication
However, comprehensive descriptive information and links to research activities on and off site are very important.
There is a Need for New Document Models for Full Reporting of the Workflow and Results of Digital Scholarship
Research Information Network and British Library, “Patterns of information use and exchange: case studies of researchers in the life sciences,” http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf
Challenges for Document Models: Describing Complex Projects
A Richer Document Model: One Example
Stefan Gradmann
New Tools Can Help
Nowakowski et al. (2011), “The Collage Authoring Environment,” Procedia Computer Science, v4, http://dx.doi.org/10.1016/j.procs.2011.04.064
Dynamic Project Sites are Complex Scholarly Documents
http://ocw.mit.edu/ans7870/21f/21f.027/home/index.html
www.ecai.org
One Example of Deep History
Should greater value (and investment) be placed on short-term sensor data with few data types than on the great variety of information contained in ships’ logs created by thousands of people over several centuries?
The former receives enormous financial support; the latter is crowd-sourced ... http://www.oldweather.org/
Information Visualization is Becoming an Essential Part of Digital Scholarship
Using Visualizations for Music Discovery, Justin Donaldson and Paul Lamere, ISMIR 2009
Graphical History of Rock & Roll
United States Twitter-Flickr Activity Infographic
Real-Time Cyber Attacks
http://map.ipviking.com/
Real-Time Flight Tracker
http://www.flightradar24.com/39.58,-80.59/6
http://network.bepress.com/
New Interfaces for Navigating Collections
“‘culturomics,’ focuses on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.”
New Interdisciplines Emerge from Data-intensive Scholarship*
*
Acerbi, Lampos, Garnett, Bentley, “The Expression of Emotion in 20th Century Books,” 2013, PLOS
Open Access Continues to Grow in Number and Types of Documents and in Mandates by Institutions and Funders
http://poeticeconomics.blogspot.ca/2006/08/dramatic-growth-of-open-access-series.html
For Metrics and Related Information on New Developments See:
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
A Web of Data is Being Developed Offering Extensive New Functionalities
The Larger Question: Will Academic Departmental Norms for Measuring Faculty Research Productivity Change to Reflect New Realities?
Publications in “Peer Reviewed” Subscription Print Journals
• Publications in Open Access Journals
• Publication in Institutional and Disciplinary Repositories
• Publication of Data, Algorithms, etc.
• Publication of Open Source Tools and Resources, etc.
“Cultural historical research means understanding 'possible pasts', the facts, events, material, social and psychological influences and motivations. It lives from understanding contexts, by pulling together bits and pieces of related facts from disparate resources, which can typically not be classified under subjects in an obvious way. It lives from taking into account all known facts.
… Under these conditions, the global network of knowledge can reveal deep “stories” built out of an immense number of concatenated primary facts, and a thing impossible for a traditional library.”
Martin Doerr, Principal Researcher, FORTH-Hellas
Benefits from Richer Document Models
One Example: Cultural Historical Research
S. Griffin/U Pittsburgh/CNI Spring 2013 Meeting
Thank You