Web Science and Web Archive Research
@ L3S
Wolfgang Nejdl
L3S Research Center, Hannover, Germany
L3S @ Hannover
Computer Science and interdisciplinary research on all aspects of the Web
Internet: Communication and Networks
Information: Accessing information and knowledge on and through the Web
Community: Supporting communities and groups on the Web, for research, education, production and entertainment
Society: Requirements (technological, social, legal) for the Web
Selected projects
Web Science @ L3S
LivingKnowledge: Diversity, opinion and bias on the Web
CUbRIK: Searching by computers and humans
Glocal: Event-based Search for Networked Media
Privacy and clinical research
Arcomem: Social Web & Archiving
ForgetIT: Concise Preservation via Managed Forgetting
Spam
Attack on Copts
Gun running from Sudan
Are we losing the past of the Web?
Library of Congress
In April 2010 LoC and Twitter signed an agreement to archive all tweets since 2006
January 2013: It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data. The Library is now pursuing partnerships to allow some limited access capability in reading rooms.
German National Library
Based on a law of June 22, 2006, the GNL should collect, enrich, catalog and archive Web publications
Internet Archive
Archiving the Web (3 petabytes) since 1996
Access possible through the URL
National Archives in Denmark, Portugal, etc.
Relevant Projects @ L3S
Web Archiving: LiWA, ARCOMEM, ForgetIT
Web Search: PHAROS, CUbRIK
Web Analysis: EUMSSI
ERC Advanced Grant: ALEXANDRIA (2014–2018, 2.5 million euros)
Cooperations: German National Library, British Library, Internet Archive, Rutgers University, et al.
ERC Grant ALEXANDRIA: Temporal Information Retrieval, Exploration and Analytics in Web Archives
ALEXANDRIA Test Beds
Temporal Wikipedia
English, German, Italian Wikipedia with all revisions
Links to news archives (NYTimes, Times, Zeit) and web content
Entity extraction and evolution, time- and entity-aware retrieval
Academic Web Archive
Academic content in Germany and UK
BibSonomy and FreeSearch/DBLP data
Time-aware entity extraction and linking, collaborative exploration and analytics
Politics on the Web
Political web sites: German and UK Web content (together with the British Library, German National Library and Internet Archive), Stanford US collections, new crawls, blogs, social media
Social stream aggregation, collaborative analytics, as well as the other research questions
Web Observatory and eHumanities
Multidisciplinary Research Questions:
• How to decide which Web content to capture, in order to enable relevant analysis by the eHumanities? How to document the selection and collection process?
• How can combining distributed Web Observatories help to cover multiple perspectives, disciplines and tasks (for selection)?
• How does the Web influence collective and individual remembering and language? How to systematically capture Web evolution and the evolution of observed processes and social realities?
• What are relevant multidisciplinary methods for a comprehensive analysis of Web content and the (changing) social realities reflected by it?
• How to deal with legal, commercial and privacy aspects of Web Archiving?
Collective remembering & collective memory in the Web Age
"Web Memory / Archive"
Web as reflection of social processes and practices, language, culture
"Web (Archive) as Memory"
Web Observatory with focus on eHumanities
"Web Gedächtnis" (Web Memory)