Time to Harvest: Electronic Doctoral Theses in Italy
Stefania Arabito, Daniela Cermesoni, Paola Galimberti, Marialaura Vignocchi
ETD 2008, Aberdeen, 4th-7th June
Pieter Bruegel the Elder, The Harvesters 1565 - Oil on Wood - New York, The Metropolitan Museum of Art.
The CRUI Working Group on Open Access
Set up in April 2006 within the Library Committee of the Conference of the Rectors of Italian Universities (CRUI).
The OA WG aims to implement the principles of the Berlin Declaration. The starting assumption was to ensure visibility, dissemination and impact to doctoral theses by archiving them into OA IRs, as the first step towards establishing OA policies for all research outputs.
The accessibility of doctoral theses in Italy...
The role of the National Libraries in Florence and Rome:
preserve doctoral theses;
ensure public access.
Often searchable in local University OPACs
…with some restrictions
Physical access to doctoral theses is delayed because shipment from Universities and cataloguing procedures are extremely time consuming.
Doctoral theses cannot be checked out;
they cannot be requested on interlibrary loan;
no photocopying services are provided on the premises.
How to enhance doctoral theses:
Immediate deposit in IRs, to the advantage of both postgraduates and the Institutions they belong to;
Open Access availability in compliance with the Berlin Declaration and the recent recommendations of the European Commission (Council calls for swift progress to knowledge economy, http://cordis.europa.eu/search/index.cfm?fuseaction=news.document&N_RCN=29243).
The Guidelines (1)
The Linee guida per il deposito delle tesi di dottorato negli archivi aperti (Guidelines for archiving doctoral theses in IRs) were approved by CRUI in November 2007 and are meant to promote best practices for capturing, storing and disseminating electronic doctoral theses.
The Guidelines (2)
Models provided by Germany, The Netherlands, Great Britain, Denmark and Sweden, where harvesting services, established at a national level, have carried out the European E-Theses project.
The Guidelines aim at supplying a practical toolkit for all Italian Universities planning to deposit doctoral theses in IRs.
Legal forms and clauses are included.
A metadata scheme has been devised for the sake of interoperability.
Practicable solutions are offered to simplify both administrative and technical procedures.
Legal issues (1)
Italian copyright law:
the PhD student is the only author;
as such, s/he holds all moral rights and economic exploitation rights;
hence, s/he has the right to prevent his/her thesis from being publicly available.
Conversely, the laws regulating PhD courses (DPR 07/11/80, n. 382, art 73, D.M. n. 224 04/30/99) hold BNCF responsible for making all PhD theses publicly available.
Legal issues (2)
There is no specific law catering for electronic material and copyright issues.
In fact, most Universities do not provide full access to e-theses for fear that copyright infringements may ensue. These Universities have not set up an IR yet.
However, the assumption of the OA WG is that accessibility should be nation-wide and by no means limited to the institution.
Institutional Repositories (1)
Doctoral theses become public as soon as they are defended
CRUI OA WG Guidelines
Italian Universities have the right and the duty to mandate self-archiving of doctoral theses in their IRs.
Institutional Repositories (2)
The institutional policies regulating PhD courses have to be suitably modified:
notification of thesis availability in the IR after defence is a requirement for being granted a PhD degree;
students will have to comply with this provision, except in a few cases where an embargo period will be allowed.
For ongoing courses immediate deposit can be mandated.
All universities should adopt similar strategies and go for self-archiving instead of mediated deposit by librarians.
Institutional Repositories (3)
All doctoral theses will be submitted to, or harvested by, BNCF and BNCR, which are in charge of legal deposit, long-term preservation, national discovery and accessibility.
Integration and interoperability require the adoption of standard protocols and metadata.
Copyright clearance and embargo
Copyright issues jeopardize the free availability of theses:
use of third-party owned materials;
third parties involved (possible infringement of privacy);
patentable discoveries;
ongoing publication of data (according to the publisher's policy).
EMBARGO PERIOD, in compliance with the immediate deposit/optional access model:
metadata immediately searchable and retrievable;
e-theses embargoed for a period from 6 to 12 months.
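The immediate deposit/optional access rule can be sketched in a few lines of Python. This is only an illustration, not part of the Guidelines: the function names and the month arithmetic are assumptions. Only the rule itself (metadata visible at once, full text released after an embargo of 6 to 12 months) comes from the slide.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`, clamping the day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def full_text_release(deposit, embargo_months=0):
    """Date on which the full text becomes openly accessible.

    Assumed rule: no embargo (0) or an embargo of 6 to 12 months.
    """
    if embargo_months != 0 and not 6 <= embargo_months <= 12:
        raise ValueError("embargo must be 0 or between 6 and 12 months")
    return add_months(deposit, embargo_months)


def is_visible(kind, deposit, today, embargo_months=0):
    """Metadata are visible from the deposit date; full text only after the embargo."""
    if kind == "metadata":
        return today >= deposit
    if kind == "fulltext":
        return today >= full_text_release(deposit, embargo_months)
    raise ValueError("unknown kind: " + kind)
```

With a deposit on 4 June 2008 and a twelve-month embargo, the metadata are visible on the day of deposit while the full text stays closed until 4 June 2009.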
Sharing the metadata (1)
Metadata set attached to the Guidelines:
the OA WG made a comparative analysis between the metadata sets commonly used in Italian IRs and current European practices;
interoperable repositories, not only in Italy but also in Europe, share standardized procedures.
All the main harvesting servers can convert the metadata set of an IR to a standard format.
It is crucial to build upon a common metadata set, to avoid subsequent interventions to normalize metadata both within and outside Italy.
Sharing the metadata (2)
Presently, the only viable objective is to attain the interoperability level of sharing a common protocol for data exchange (OAI-PMH), information exchange and data structure, with an accurate definition of the meanings of all fields.
Simple Dublin Core was considered too imprecise for advanced search, which a qualified DC metadata set can support.
Some fields are mandatory, others are recommended or optional.
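As an illustration of what sharing OAI-PMH means in practice, the sketch below builds a ListRecords request and reads the titles from a response. The base URL and the sample response are made up for the example. `oai_dc` is the simple Dublin Core prefix that every OAI-PMH repository must support; the prefix a repository would use for a qualified DC format is repository-specific and is not assumed here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:type>Doctoral Thesis</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def titles_from_response(xml_text):
    """Extract every dc:title value from an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("{%s}title" % DC_NS)]


# Hypothetical endpoint; a real harvester would fetch this URL over HTTP.
url = list_records_url("https://repository.example.org/oai", set_spec="theses")
titles = titles_from_response(SAMPLE_RESPONSE)
```

A real harvester would also follow the resumptionToken that OAI-PMH uses to page through large result sets, which this sketch leaves out.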
Sharing the metadata (3)
Set of mandatory fields:
dc.title: title of the work;
dc.creator: author of the work (surname, name);
dc.description: abstract (preferably in English);
dc.language: language (ISO 639-1 format);
dc.identifier: URL of the thesis full text or of an intermediate page;
dc.type: Doctoral Thesis (only in English);
dc.contributor: tutor/supervisor (surname, name);
dc.date: date of publication (ISO 8601), i.e. date of defence; this is the only date in the metadata;
dc.publisher: name of the University;
dc.format: size in bytes/MIME type.
Set of recommended or optional fields:
dc.subject: classification of subject fields according to the Ministry of Education and Research;
dc.rights: embargo or immediate availability.
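A repository could check a record against the mandatory set before exposing it. The following is a minimal sketch under stated assumptions: the record is a plain dictionary keyed by the field names above, the example values are invented, and the date check accepts only full calendar dates (YYYY-MM-DD), which is narrower than ISO 8601 as a whole.

```python
import re
from datetime import date

MANDATORY_FIELDS = (
    "dc.title", "dc.creator", "dc.description", "dc.language", "dc.identifier",
    "dc.type", "dc.contributor", "dc.date", "dc.publisher", "dc.format",
)

# Invented example record; none of these values come from a real repository.
EXAMPLE = {
    "dc.title": "An example doctoral thesis",
    "dc.creator": "Rossi, Maria",
    "dc.description": "Short abstract in English.",
    "dc.language": "it",
    "dc.identifier": "https://repository.example.org/thesis/1",
    "dc.type": "Doctoral Thesis",
    "dc.contributor": "Bianchi, Luca",
    "dc.date": "2008-03-14",
    "dc.publisher": "Example University",
    "dc.format": "application/pdf",
}


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = ["missing " + name for name in MANDATORY_FIELDS
                if not str(record.get(name, "")).strip()]
    language = record.get("dc.language", "")
    if language and not re.fullmatch(r"[a-z]{2}", language):
        problems.append("dc.language must be a two-letter ISO 639-1 code")
    defence = record.get("dc.date", "")
    if defence:
        try:
            date.fromisoformat(defence)
        except ValueError:
            problems.append("dc.date must be a calendar date (YYYY-MM-DD)")
    doc_type = record.get("dc.type", "")
    if doc_type and doc_type != "Doctoral Thesis":
        problems.append('dc.type must be "Doctoral Thesis"')
    return problems
```

Returning a list of problems rather than raising on the first one lets a depositor fix every issue in a single pass.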
File formats (1)
Administrative documents:
need to secure and preserve authenticity, integrity and fixity for the sake of long-term digital preservation.
They should also be assigned suitable archival metadata.
Bibliographic items:
criteria of web accessibility;
authors' lack of domain expertise in the field of digital curation and limited IT toolset;
lack of awareness of format obsolescence;
authors' interest in protecting their work from unintentional/malicious alteration after web publishing;
files must not be encrypted, in order to permit refreshing.
The “right” file format is still a controversial issue, especially in the domain of e-theses.
File formats (2)
What to do:
Educate on long-term preservation issues.
Provide authors with the correct tools to produce XML files.
Liaise with faculty.
User-friendly and effective authoring tools.
A nation-wide strategy.
All the parties involved should share their expertise.
File formats (3) About PDF
PDF/A was eventually chosen, in accordance with the requirements of the National Libraries:
it does not allow text mining;
it has become a de facto standard all over Europe and beyond (apart from Germany);
Digital Preservation Coalition recommendations: Betsy A. Fanning, Preserving the Data Explosion: Using PDF. DPC, 2008. www.dpconline.org/docs/reports/dpctw08-02.pdf
But:
a troublesome and not straightforward choice for Universities;
need to start collecting and storing e-theses;
yet, the matter is not settled at all;
a constant evaluation of file formats is required.
File formats – archivists’ point of view
Long-term preservation vs accessibility.
Archive and dissemination files should be different.
Possibility to duplicate the uploaded thesis, applying different metadata to the archive file.
Authorization by the author required.
Effective collaboration between librarians and archivists.
IRs and National Libraries (1)
July 2007: legal deposit of doctoral theses could rely on digital technology; paper copies were no longer needed.
The OA WG immediately started a project with BNCF and BNCR to test the feasibility of the new system:
analyze the workflow of the legal deposit of e-theses;
implement the necessary technological infrastructure to automate the procedure.
IRs and National Libraries (2)
At this early stage, two possible technical procedures have been put forward:
BNCF will harvest, via OAI-PMH, both the metadata and the full texts of the doctoral theses of the Universities which have already implemented their IRs;
alternatively, upload via web form for the Universities which have not set up an OAI-PMH compliant repository yet.
Planned workflow only partially accomplished:
harvesting via OAI-PMH was tested successfully with the IR of the University of Bologna, but SHA1 hashes have not been sent back and metadata have not been validated;
even the metadata issue has not been settled yet: BNCF required simple, not qualified, Dublin Core. The National Libraries and Universities will have to jointly develop functional metadata for legal deposit procedures, including the automated management of digital rights.
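The slide does not describe how the SHA1 hashes were meant to be used. One plausible reading, assumed here, is that the harvesting library returns a hash for each file it received, and the repository compares it with the hash of the file it holds. The function names and the report format below are hypothetical.

```python
import hashlib


def sha1_hex(data):
    """Hexadecimal SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def verify_receipt(sent_files, returned_hashes):
    """Compare the hashes returned by the harvester with the local files.

    sent_files maps a file name to its bytes; returned_hashes maps a file
    name to the hex digest reported back (both structures are assumptions).
    """
    report = {}
    for name, content in sent_files.items():
        returned = returned_hashes.get(name)
        if returned is None:
            report[name] = "no hash returned"
        elif returned.lower() == sha1_hex(content):
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

A file reported as "no hash returned" corresponds to the situation described for the Bologna test, where the harvest succeeded but no confirmation came back.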
E-theses and IRs in Italy: work in progress (1)
Last January the OA WG carried out a survey amongst Italian Universities. Here are the main outcomes of the questionnaire:
25 Universities were collecting or about to collect electronic doctoral theses in IRs, mostly to make them OA available (table 1);
Resolutions of Academic Senates (table 2);
the local output ranges from 50 to 500-800 doctoral theses per year, according to the size of the institutions, with an average of 200-300 theses per year;
librarians were the main advocates of deposit in the IR;
workflow procedures were handled mainly by librarians, with the cooperation of administrative staff.
Availability of collected doctoral theses (table 1)
Most of the 25 Universities collecting doctoral theses in IRs make them OA available.
OA availability: 31%
no availability: 8%
not all theses OA available: 15%
consultation on premises: 15%
out to (label truncated in the source): 31%
Academic Senates (table 2)
Only a few Academic Senates have officially mandated the deposit of doctoral theses in IRs; of these, 50% put a mandate on the deposit only and 50% put a mandate on OA availability.
no mandate: 50%
mandate on deposit: 25%
mandate on public availability: 25%
E-theses and IRs in Italy: work in progress (2)
DSpace and EPrints are the most popular software tools. IRs are in most cases integrated with other databases, particularly administrative databases, authentication systems, research archives, OPACs, cross-search utilities;
self-archiving is a common ingestion procedure; in most cases, however, librarians deposit e-theses;
embargo is usually allowed for periods ranging from six months to three years. Twelve months is the standard embargo period for almost 50% of the Universities. The length of the embargo is very rarely left to the choice of PhD students. In all cases they are required to sign a declaration stating the reasons for the embargo. In all cases metadata are immediately searchable and retrievable (table 3);
services added to basic e-theses management (print on demand, legal deposit, statistical services, legal advice for users, permanent preservation) are supplied by 50% of the institutions; the other 50% are planning to implement them.
Embargo (table 3)
IRs and embargo
allowed: 77%
in the future…: 15%
not allowed: 8%
E-theses and IRs in Italy: work in progress (3)
libraries grant financial support in 50% of the Universities; research and/or other institutional units contribute to maintaining 50% of IRs;
the workflow relies on the interaction between librarians, computer specialists, administrative staff, and tutors. Seldom are librarians the only actors involved;
half of the IRs depend on outsourcing contracts, half are managed with internal resources;
all Universities would appreciate nation-wide interoperability, both syntactic (federated search on multiple archives) and semantic (multilingual, discipline and subject search). Semantic interoperability is considered very hard to achieve, though. A dedicated Italian harvester would be a top priority.
E-theses and IRs in Italy: work in progress (4)
According to the surveyed institutions, the following issues should be tackled at a European level to enhance the value of electronic doctoral theses (table 4):
widespread archiving programmes;
common standards;
a European portal/network for doctoral theses;
ongoing advocacy;
joint international projects;
joint participation in international conferences;
equivalent systems of higher education in Italy and in Europe;
economic, legal and technical support;
unlocking the potential of PhD research;
networking with publishers.
Issues to be tackled at a European level… (table 4)
[Bar chart, horizontal axis from 0 to 2.5. Categories: widespread archiving programmes; a European portal/network for doctoral theses; ongoing advocacy; joint international projects; equivalent systems of higher education in Italy and in Europe; joint participation in international conferences; unlocking the potential of PhD research; networking with publishers; common standards; economic support; legal support; technical support.]
Case study (1) - Bologna
http://amsdottorato.cib.unibo.it/
No fewer than 650-800 doctoral theses per year.
2006: launch of the project aiming at setting up an OAI-PMH compliant IR in order to collect, organize and provide access to the doctoral theses produced at the University of Bologna.
Now the repository stores, indexes and provides access to all the theses defended in 2007 and 2008, and is harvested by the BNCF for legal deposit.
The project has not evolved into a fully legitimate institutional procedure yet.
The various scientific communities have reacted to this indication differently, according to discipline: the Impact Factor heavily influences the evaluation of research outputs (young researchers turn their doctoral theses into their first monographic publication).
Case study (2) - Bologna
Bologna University has not put a true mandate on OA publication for doctoral theses yet (but only a small number of documents are actually not accessible at all and not merely embargoed).
A statistical analysis shows a good acceptance of OA.
Out of 565 already published theses, only 7% are subject to availability restrictions.
The most common reasons authors give for restricting the availability of their theses prove once again how deeply the commercial publishing system is rooted in the academic domain.
OA is not actually rejected; rather, it is considered a sort of second-best choice.
Need to focus on the different needs of the various research communities.
Find better and new ways to make the repository more appealing in order to transform it into a real service supporting research.
Case study (3) - Trieste
http://www.openstarts.units.it
Roughly 200 doctoral theses per year.
2006: OpenstarTs started as an institutional project managed by the Library System.
Aim: to grant theses higher visibility and maximize their impact by depositing them in an OAI-PMH compliant repository.
Sustainability was a major constraint (lack of human resources).
The project could cover only start-up expenses, namely the technical support of an external consultant; it was therefore crucial to take a “lean” approach through full cooperation with the PhD department and interoperability with existing databases.
Integrated with LDAP for authentication and with the Registrar’s Department data warehouse for the relevant metadata.
2007: the system was tested by volunteer PhD students, who only had to upload their PDF.
The user friendliness of the procedure was appreciated: students entered only abstracts and keywords; all the other metadata had already been entered (and validated) via the Registrar’s Department data warehouse and subsequently mapped into the repository.
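The division of labour described for Trieste, with validated registrar data mapped into the repository and the student adding only an abstract and keywords, could look roughly like this. The registrar column names and the mapping are invented for illustration and are not taken from OpenstarTs; storing the keywords in dc.subject is also an assumption.

```python
# Hypothetical registrar columns mapped to Dublin Core fields.
REGISTRAR_TO_DC = {
    "thesis_title": "dc.title",
    "student_name": "dc.creator",
    "supervisor_name": "dc.contributor",
    "defence_date": "dc.date",
    "language": "dc.language",
}


def build_record(registrar_row, abstract, keywords):
    """Merge pre-validated registrar data with the two student-supplied fields."""
    record = {dc_field: registrar_row[column]
              for column, dc_field in REGISTRAR_TO_DC.items()}
    record["dc.description"] = abstract.strip()
    # Empty or whitespace-only keywords are dropped.
    record["dc.subject"] = [k.strip() for k in keywords if k.strip()]
    record["dc.type"] = "Doctoral Thesis"
    return record
```

Because the registrar fields arrive already validated, the student-facing form needs only two inputs plus the PDF upload.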
Case study (4) - Trieste
2008: self-archiving doctoral theses in the repository was made compulsory as part of the requirements for defence and for the award of a PhD degree. Postgraduate students were allowed to opt for a one-year embargo and asked to specify the reason for their request.
The 2008 output (181 doctoral theses) will soon be harvested by the National Library of Florence.
Conclusion (1)
The aim of the OA WG was to publish the Guidelines, as a reference tool for the Universities planning to deposit doctoral theses in IRs and to make them OA available.
The goal was the dissemination of doctoral theses, given their importance as research outputs, their lack of visibility on the web and the consequent loss of impact.
The Guidelines have been greatly appreciated by all the surveyed institutions. In all cases, the Guidelines represent a milestone for putting into effect a national common strategy on OA.
The mission of the OA WG has not been accomplished yet. Pervasive advocacy is needed to expand existing IRs and neutralize resistance to change.
Conclusion (2)
It is important to keep constantly and closely in touch with other European and international working groups to monitor their progress and keep up with their activities.
Establish value-added services for this kind of material, at the European level too: syntactic and semantic interoperability, the use of common standards, a dedicated Italian harvester.
It will also be vital to instruct PhD students on their rights as authors and on the correct use of third party materials.
Last but not least, an exchange of ideas with archivists on the preservation and certification of doctoral theses is top priority.
Acknowledgements
Stefania Arabito, Paola Galimberti, Marialaura Vignocchi.
OA-WG CRUI - Prof. R. Delle Donne Insubria University, SiBA, Dr. A. Bezzi, Prof.
A. Sdralevich.
Thank you!