BASE: Institutional Repositories
Bielefeld Academic Search Engine (BASE):
an End-user Oriented Institutional Repository Search Service
Dirk Pieper/Friedrich Summann
Bielefeld UL
Part 1:
  Institutional Repository Servers
  BASE: concept and content
  Creating a special view on institutional repository server collections
  Demo: BASE user-interface and further visions
Part 2:
  OAI dataflow, BASE dataflow
  Repository information in registries
  OAI harvesting problems
  Further developments of BASE
Overview:
Definition: “A digital collection capturing and preserving the intellectual output of a single or multi-university community.” (Raym Crow, http://www.arl.org/sparc/IR/ir.html)
IR servers also exist, of course, outside the university community
IR servers appear as simple web sites, database systems with OAI interface, …
Institutional Repository Servers:
BASE uses Fast Data Search
BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content
BASE displays result lists as bibliographic data and full-text hits
The BASE frontend is written in PHP using the search API from Fast Data Search
BASE offers sorting, search refinement and search history
BASE: concept and content
[Diagram of the search system; labels: Search API, Query & Result Processing, Document Processing, Pipeline, Connectors, File Traverser, Web Crawler, Filter, Search, Index Files, Tuning, Administration and Debugging]
BASE: concept and content
BASE: concept and content
At present 2.7 million documents in 189 collections, 15 of them web-crawled data
Projekt Gutenberg-DE
Internet Library of Early Journals Oxford
Various Institutional Repositories
Springer Link Metadata
Cornell HistMath Fulltext Crawl
University of Michigan Historical Math
CiteSeer
Zentralblatt Mathematik
Bielefeld Univ: Math. Preprints
ArXiv
OPAC UL Bielefeld
Ifo Institute Munich
Zeitschriften der Aufklärung (Bielefeld UL)
BASE: concept and content
Special view on IR server collections
Collections are listed in a configuration file:
  [ftubirmingham]
  url = "http://eprints.bham.ac.uk/"
  desc_de = "The Univ. of Birmingham: Eprints Archive"
  desc_en = "The Univ. of Birmingham: Eprints Archive"
  descdd_de = "Birmingham Univ."
  descdd_en = "Birmingham Univ."
Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …
Parametric search possible
Frontend is ready for multi view (independent views with own configuration and layouts on the same backend)
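The configuration format and the clustering of collections can be sketched in Python. This is only an illustration, not the PHP code of the BASE frontend: the second section ([ftuexample]), the cluster membership, and the function names load_collections and cluster_labels are assumptions made for the example.

```python
import configparser

# Sample in the format shown on the slide; [ftuexample] is invented for illustration.
SAMPLE = """
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_en = "Birmingham Univ."

[ftuexample]
url = "http://repository.example.org/"
desc_en = "Example Univ.: Repository"
descdd_en = "Example Univ."
"""

# Hypothetical cluster definition: a display label mapped to section names.
CLUSTERS = {"Institutional Repositories Europe": ["ftubirmingham", "ftuexample"]}


def load_collections(text):
    """Parse the INI text into {section: {key: value without surrounding quotes}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        name: {key: value.strip('"') for key, value in parser[name].items()}
        for name in parser.sections()
    }


def cluster_labels(collections, cluster):
    """Return the short (drop-down) labels of all configured collections in a cluster."""
    return [
        collections[name]["descdd_en"]
        for name in CLUSTERS[cluster]
        if name in collections
    ]


if __name__ == "__main__":
    collections = load_collections(SAMPLE)
    print(cluster_labels(collections, "Institutional Repositories Europe"))
```

Keeping clusters as plain lists of section names means a new view only needs its own cluster table and layout; the collection entries themselves stay shared.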
Try your search on Google Scholar ...
Vision: search in Google Scholar
Check citations (citing articles) in Google Scholar ...
Vision: check citations in Google Scholar
OAI-Data
Harvesting
BASE Internal Index (FAST)
OPAC
Article Database
Dissertations, monographs (fulltext)
Articles (fulltext)
PubMed, Euclid, ArXiv, CiteSeer, Citebase, DOAJ articles
All resources (texts, images, video, references ....)
OAI dataflow at Bielefeld UL
OAI data, web pages, database records
Harvesting, pre-processing
Processing
Internal Index (FAST)
User interface (PHP)
BASE dataflow
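The harvesting step of this dataflow relies on OAI-PMH requests. The following Python sketch shows how a ListRecords request is formed and how one response page could be reduced to Dublin Core fields plus a resumption token. It is not the Perl harvester used at Bielefeld UL; the endpoint URL, the sample response, and the function names are assumptions, and no network access takes place.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Invented sample response, used instead of a live repository.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:language>eng</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords URL; a resumption token replaces metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (records, token): Dublin Core fields per record and the next token."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        fields = {}
        for element in record.iter():
            if element.tag.startswith(DC):
                name = element.tag[len(DC):]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append(fields)
    token_element = root.find(".//" + OAI + "resumptionToken")
    token = None
    if token_element is not None and token_element.text:
        token = token_element.text.strip()
    return records, token


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    print(parse_page(SAMPLE_PAGE))
```

A harvester would repeat the request with the returned token until a page arrives without one, handing each batch of records to pre-processing.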
Eprints Registry (607)
Openarchives.org (383)
DSpace Registry (28)
Directory of Open Archive Repositories (324)
Univ. of Illinois Registry (1000)
Repository information in registries
[Map: number of OAI-compliant university repositories per country; the unlabeled counts from the map are omitted]
USA 76
Canada 13
South America 2
Africa 2
India 3
Australia 11
New Zealand 1
OAI-compliant univ. repositories in BASE
OAI Registry Watcher (Bielefeld UL, Perl)
Open Source Harvester (FS Consulting, Perl with modifications)
XML Validator and Repairer (Bielefeld UL, based on Perl XML modules)
OAI Harvest Watcher (Bielefeld UL, Perl)
OAI Resource Updater (Bielefeld UL, Perl)
Tools for the Harvesting Environment
Repositories do not respond or deliver error messages
Data contain only references without any fulltext
Links to the Document do not work
Access to fulltext is restricted
XML file is not well-formed
Field content varies
OAI harvesting challenges
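The "XML file is not well-formed" case can be caught before a harvested file reaches the index. The Python sketch below only detects the problem and reports its position; it is not the Perl-based validator and repairer mentioned among the Bielefeld tools, and the function name check_well_formed is an assumption.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") for well-formed XML, else (False, position of the error)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position
        return False, f"line {line}, column {column}"
    return True, ""


if __name__ == "__main__":
    # An unescaped ampersand is a typical cause of broken harvested records.
    print(check_well_formed("<title>R & D</title>"))
    print(check_well_formed("<title>R &amp; D</title>"))
```

Reporting the position rather than just a yes/no result makes it easier to quote the offending spot when contacting the repository operator.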
<source>http://xxx.xxx.uni-xxxxx.de/publications/
ELibD905_diplom_allnoch.pdf</source>
<dc:creator>Barry Wellman,Jeffrey Boase,Kakuko Miyata</dc:creator>
<dc:subject>Barry Wellman,Jeffrey Boase,Kakuko Miyata The Mobile-izing ....</dc:subject>
<dc:title>Talk P. Bruzzone</dc:title>
<dc:creator>Bruzzone </dc:creator>
<dc:creator>Pierluigi</dc:creator>
Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52
<dc:date>2004-07-05</dc:date>
<dc:type>Review </dc:type>
<dc:identifier>http://www.rbej.com/content/2/1/52</dc:identifier>
OAI Harvesting: Problems in Practice 1
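Some of the field problems shown on this slide can be flagged with simple heuristics. The Python sketch below is a rough illustration under stated assumptions: the record layout (element name mapped to a list of values), the function name flag_record, and the thresholds are invented here, and the checks will produce false positives on real data.

```python
def flag_record(record):
    """Return warnings for a record given as {element name: [values]}."""
    warnings = []
    creators = record.get("creator", [])
    for value in creators:
        # Two or more commas suggest several authors packed into one field.
        if value.count(",") >= 2:
            warnings.append(f"creator may hold several names: {value!r}")
        # A single word suggests surname and forename split across two fields.
        if len(value.split()) == 1:
            warnings.append(f"creator may be a name fragment: {value!r}")
    for subject in record.get("subject", []):
        # A subject that begins with the author string is probably mis-filled.
        if any(c.strip() and subject.startswith(c.strip()) for c in creators):
            warnings.append("subject starts with a creator value")
    return warnings


if __name__ == "__main__":
    packed = {
        "creator": ["Barry Wellman,Jeffrey Boase,Kakuko Miyata"],
        "subject": ["Barry Wellman,Jeffrey Boase,Kakuko Miyata The Mobile-izing ...."],
    }
    split_name = {"creator": ["Bruzzone ", "Pierluigi"]}
    for example in (packed, split_name):
        print(flag_record(example))
```

Flagging instead of repairing keeps the original metadata untouched, which matters because a name such as "Wellman, Barry" contains a comma legitimately.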
EN: 9910
ENG: 771
En: 566
Eng: 1
English: 24084
English (United States): 63
English and Greek: 1
English and Russian: 1
English/Japanese: 1
English; Russian: 1
English=en: 1
Translation into English: 2
en: 1279115
en-CA: 865
en-US: 3
en-es: 5
en-us: 8
en;: 2
en_UK: 618
en_US: 18456
eng: 186787
eng : 92
eng + dut: 2
eng;: 17
eng; fre; ger;: 141 ....
OAI Harvesting: Problems in Practice 2 - Variations of <dc:language>
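Variants like these can be folded onto a small set of codes before indexing. The Python sketch below is one possible normalization, not the method used in BASE: the lookup table is deliberately tiny, the target codes are assumed to be ISO 639-2 bibliographic codes, and region subtags such as "-US" or "_UK" are simply ignored.

```python
import re

# Small illustrative lookup; a real table would cover far more languages.
KNOWN = {
    "en": "eng", "eng": "eng", "english": "eng",
    "fre": "fre", "ger": "ger", "dut": "dut",
    "greek": "gre", "russian": "rus", "japanese": "jpn",
}


def normalize_language(raw):
    """Return the distinct language codes recognized in a dc:language value."""
    # Split multi-language values on common separators and the word "and".
    parts = re.split(r"[;,/+=]|\band\b", raw.lower())
    codes = []
    for part in parts:
        # Keep only the leading word, which drops subtags like "-us" or "_uk".
        match = re.match(r"\s*([a-z]+)", part)
        if match and match.group(1) in KNOWN:
            code = KNOWN[match.group(1)]
            if code not in codes:
                codes.append(code)
    return codes


if __name__ == "__main__":
    for value in ["EN", "en_US", "eng; fre; ger;", "English and Greek",
                  "English=en", "Translation into English"]:
        print(repr(value), normalize_language(value))
```

Values the table does not recognize, such as "Translation into English", come back as an empty list, so they can be logged for manual review rather than guessed.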
Standard repository software is great - for OAI harvesting as well
Small collections – small problems
Getting the related fulltext is complicated
Libraries produce better metadata
Data aggregation may produce problems
Writing e-mails helps - sometimes
Some Rules from Harvesting Practice
Search form (working)
HTTP calls (working)
Web Service (in development)
Federated Search (Vascoda) (in discussion)
Further Developments: BASE Interfaces
<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
Local Integration: Search Form
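The same request that the form submits can be prepared from a script, which is roughly what the "HTTP calls" interface amounts to. The Python sketch below only builds the request from the field names in the form (q and s); the function name build_search_request is an assumption, and the slides do not say how the server responds to scripted calls.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Target URL and field names are taken from the search form on the slide.
BASE_SEARCH_URL = "http://www.base-search.net/index.php"


def build_search_request(query):
    """Build the POST request the form would send, without sending it."""
    # The form limits the query field to 512 characters.
    fields = {"q": query[:512], "s": "all"}
    data = urlencode(fields).encode("utf-8")
    return Request(BASE_SEARCH_URL, data=data, method="POST")


if __name__ == "__main__":
    request = build_search_request("open access")
    print(request.get_method(), request.full_url)
    print(request.data)
```

Passing the returned object to urllib.request.urlopen would actually send the query; that step is left out so the example runs without network access.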
Thank you!