
Page 1: The Australian Newspapers Digitisation Program: Helping Communities Access and Explore their Newspaper Heritage - Keynote. 2007

Helping communities access and explore their newspaper heritage.

Rose Holley – Manager, Newspaper Digitisation Program

http://www.nla.gov.au/ndp

[email protected]

Australian Media Traditions Conference

23 November 2007, Charles Sturt University, Bathurst

Page 2: Status of the Program

November 2006: Minister for Arts and Sports approval

Budget approval: $8 million for 3 million pages over 4 years

Contracts signed with digitisation suppliers

April 2007: program pilot phase commences
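The budget figures on this slide imply an average cost per page and an average yearly volume. The short calculation below makes those averages explicit. It is only a sketch: it assumes the budget and page count are spread evenly across the four years, which the slide does not state, and the variable names are illustrative.

```python
# Figures quoted on the slide; everything below is derived from them.
BUDGET_DOLLARS = 8_000_000
TARGET_PAGES = 3_000_000
PROGRAM_YEARS = 4

# Assumption: even spread of spending and output over the program.
cost_per_page = BUDGET_DOLLARS / TARGET_PAGES
pages_per_year = TARGET_PAGES / PROGRAM_YEARS
budget_per_year = BUDGET_DOLLARS / PROGRAM_YEARS

print(f"Average cost per page: ${cost_per_page:.2f}")
print(f"Average pages per year: {pages_per_year:,.0f}")
print(f"Average budget per year: ${budget_per_year:,.0f}")
```

The per-page figure is an all-in average (infrastructure, software, scanning, OCR), not a contractor rate.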

Page 3: Content and Coverage

National Content

Initially a title from each state

Focus on major titles from each state first

Anticipated that ‘regional’ titles may be contributed later

Coverage: published between 1803 and 1954 (out of copyright)

West Australian

Northern Territory Times

Courier Mail

Advertiser

Sydney Gazette

Argus

Mercury

Canberra Times

Page 4: First Newspaper

• First page of first Australian newspaper ever published

The Sydney Gazette and New South Wales Advertiser, Saturday March 5 1803

Page 5: Through 150 years

• Up to 1954 (the point from which copyright applies), and later where there is agreement with publishers.

The Argus 22 August 1945

Page 6: Relationship - ANPLAN

Website: http://www.nla.gov.au/anplan/

Page 7: Keep Up to Date with Progress

• Website: http://www.nla.gov.au/ndp/

Page 8: National Help

• NLA working with State and Territory Libraries as part of ANPLAN.

• Libraries suggest titles and dates and provide microfilm for digitising.

• ANPLAN members and other stakeholders will provide feedback on the search and delivery prototype.

• Developing model for national contribution of regional newspapers.

Page 9: Process in brief

National sourcing of selected newspaper microfilm masters.

Masters scanned to TIFF files by contractor in Sydney.

NLA performs quality assurance and adds metadata.

Contractor in India processes TIFF files: OCR, zoning, XML markup.

NLA quality-assures the files, ingests them into the system, and creates derivatives for delivery.
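The five steps above form a fixed, ordered hand-off between the NLA and its contractors. A minimal sketch of that ordering as a data structure follows. The stage identifiers, the `Batch` class and the batch name are hypothetical and are not taken from the NLA's actual workflow software.

```python
from dataclasses import dataclass, field

# Stage names and actors paraphrase the slide; the identifiers are illustrative.
STAGES = [
    ("source_microfilm_masters", "national sourcing"),
    ("scan_to_tiff", "scanning contractor (Sydney)"),
    ("qa_and_metadata", "NLA"),
    ("ocr_zoning_xml_markup", "processing contractor (India)"),
    ("final_qa_ingest_derivatives", "NLA"),
]


@dataclass
class Batch:
    """A hypothetical batch of pages moving through the stages in order."""
    name: str
    completed: list = field(default_factory=list)

    def next_stage(self):
        """Return the (stage, actor) pair still to be done, or None when finished."""
        if len(self.completed) < len(STAGES):
            return STAGES[len(self.completed)]
        return None

    def advance(self):
        """Mark the next stage as done; stages cannot be skipped or reordered."""
        stage = self.next_stage()
        if stage is None:
            raise ValueError(f"{self.name} has already completed every stage")
        self.completed.append(stage[0])
        return stage


batch = Batch("pilot-batch-001")  # hypothetical batch name
while batch.next_stage() is not None:
    stage, actor = batch.advance()
    print(f"{batch.name}: {stage} done by {actor}")
```

Modelling the stages as an ordered list makes the two NLA checkpoints visible: one before files leave for OCR and one when they return.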

Page 10: Logistics

Australia (State Capitals – Sydney/Canberra) - USA (Virginia) - India (Hyderabad, Chennai)

Page 11: 6 Month Progress

• IT Infrastructure and storage implemented at NLA

• Content management and ingest software developed by NLA to support workflow

• Quality assurance and production software developed by US/India contractor

• Pilot data sent to contractors to test workflows, systems and software against agreed project spec.

Page 12: Next 6 months

• Acceptance of pilot data, then commence production phase (3 million pages)

• Development of search and delivery prototype

• Public launch of service with a good body of content in 2008

• Progressive addition of content – national program ongoing

Page 13: Technology – internal NLA

Old newspapers being processed and delivered using latest digital technology

• NLA developing in house:

– Ingest and storage system

– Workflow and content management system including quality assurance module

– Search and delivery system

• NLA providing:

– System Infrastructure (storage, backup, disaster recovery)

Page 14: Infrastructure and Storage

Online Storage – 70 TB:

• Working space for images in processing: 40 TB for 1 million pages

• Search and delivery derivatives: 30 TB for 3 million pages

• XML files, database systems and indexes: 1 TB

Offline Storage – unlimited for master images on tape.
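The storage figures above can be turned into rough per-page sizes. The sketch below assumes decimal units (1 TB = 1,000,000 MB), which the slide does not specify, and the variable names are illustrative. The three listed components add up to 71 TB, so the 70 TB headline is presumably a rounded figure.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units; the slide does not say

# Figures quoted on the slide.
working_tb, working_pages = 40, 1_000_000
derivative_tb, derivative_pages = 30, 3_000_000
index_tb = 1

# Average space per page implied by each allocation.
working_mb_per_page = working_tb * MB_PER_TB / working_pages
derivative_mb_per_page = derivative_tb * MB_PER_TB / derivative_pages
total_online_tb = working_tb + derivative_tb + index_tb

print(f"Working space per page:     {working_mb_per_page:.0f} MB")
print(f"Delivery derivatives/page:  {derivative_mb_per_page:.0f} MB")
print(f"Sum of listed components:   {total_online_tb} TB")
```

The working space is sized for 1 million pages rather than the full 3 million, which suggests pages pass through it in rotation while the delivery derivatives stay online for the whole collection.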

Page 15: Establishing Workflows

Page 16: Technology - external

• Scanning microfilm using Flexscan/Eclipse scanner and latest software (nextstar) from NextScan (www.nextscan.com)

20,000 pages a week.
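As a rough check on the scanning rate, the calculation below estimates how long the 3 million page target (from the budget slide) would take at 20,000 pages a week. It assumes the rate is constant and applies to the whole program, which the slide does not confirm.

```python
PAGES_PER_WEEK = 20_000    # rate quoted on this slide
TARGET_PAGES = 3_000_000   # program target quoted on the budget slide
WEEKS_PER_YEAR = 52

# Assumption: one constant scanning rate for the whole program.
weeks_needed = TARGET_PAGES / PAGES_PER_WEEK
years_needed = weeks_needed / WEEKS_PER_YEAR

print(f"Weeks of scanning at a constant rate: {weeks_needed:.0f}")
print(f"Equivalent years: {years_needed:.1f}")
```

On these assumptions the scanning alone fits inside the four-year program, with some margin left for the pilot phase and for rework.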

Page 17: Scanning Contractor

Page 18: Digital Images returned to NLA

Page 19: Quality Assurance at NLA

Use 2 widescreen monitors placed vertically. Can view complete page within context of issue.

Add metadata, sort out missing and duplicate pages within an issue.

Prepare batches to send for OCR.
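Sorting out missing and duplicate pages within an issue is essentially a comparison between the page numbers recorded for the scanned images and the pages the issue should contain. The sketch below shows one way such a check could work. The function name, the input format and the sample numbers are hypothetical and do not describe the NLA's quality assurance module.

```python
from collections import Counter


def check_issue_pages(page_numbers, expected_page_count):
    """Compare scanned page numbers with the range 1..expected_page_count."""
    counts = Counter(page_numbers)
    expected = range(1, expected_page_count + 1)
    return {
        # Pages the issue should have but no image was recorded for.
        "missing": [p for p in expected if counts[p] == 0],
        # Pages that were scanned more than once.
        "duplicates": sorted(p for p, n in counts.items() if n > 1),
        # Page numbers outside the expected range (likely metadata errors).
        "unexpected": sorted(p for p in counts if p not in expected),
    }


# Hypothetical 8-page issue: page 2 scanned twice, pages 4 and 7 absent,
# and a stray page 9 that probably belongs to another issue.
scanned = [1, 2, 2, 3, 5, 6, 8, 9]
report = check_issue_pages(scanned, expected_page_count=8)
print(report)
```

A report like this only flags candidates. An operator would still confirm each case against the page images before a batch goes for OCR.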

Page 20: Metadata

Page 21: Page verification

Page 22

Page 23: Technology - external

Software developed to:

• Zone areas and articles on a page

• Flag continuing articles across multiple pages

• Categorise articles on a page

• OCR text on a page

• Re-key headings and first 4 lines of text.

• Deliver XML files (ALTO) and METS/MODS files.
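ALTO files record OCR output as nested layout elements, with each recognised word stored in the `CONTENT` attribute of a `String` element and an optional word-confidence score in `WC`. The sketch below reads a tiny hand-written sample using Python's standard library. The sample is invented and simplified (it omits the position attributes a real ALTO file carries), and the namespace shown is the ALTO 2 one, which is an assumption: the slide does not say which ALTO version the contractor delivers.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO 2 namespace; the delivered files may use another version.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Invented, simplified sample; not taken from the program's data.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="SHIPPING" WC="0.98"/>
            <SP/>
            <String CONTENT="NEWS" WC="0.95"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Arrived" WC="0.91"/>
            <SP/>
            <String CONTENT="yesterday" WC="0.62"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
""".strip()


def lines_of_text(alto_xml):
    """Return the OCR text, one string per TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for text_line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "") for s in text_line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(words))
    return lines


def low_confidence_words(alto_xml, threshold=0.8):
    """Return words whose WC (word confidence) is below the threshold."""
    root = ET.fromstring(alto_xml)
    return [
        s.get("CONTENT")
        for s in root.iterfind(".//alto:String", ALTO_NS)
        if float(s.get("WC", "1.0")) < threshold
    ]


print(lines_of_text(SAMPLE_ALTO))
print(low_confidence_words(SAMPLE_ALTO))
```

Keeping word-level confidence alongside the text is what makes later steps such as accuracy reporting or user correction of OCR practical.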

Page 24: India Facility - Hyderabad

Page 25

Page 26: Quality Assurance

Page 27: OCR Accuracy

Page 28: Batch reporting

Page 29: Acceptance Criteria

Page 30: Prototype Development

Under discussion:

• Derivative sizes and zoom technology testing

• Search and Browse features

• Results and refinement of results

• User interaction with source (web 2.0)

• Interface design

Page 31: Digital Newspaper Searching

• Newspapers full text searchable

• Image captions searchable

• Search across multiple papers, e.g. by person's name.

• Refine searching by:

– Date

– Newspaper title

– State published
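The search behaviour listed above (full-text search across papers, then refinement by date, newspaper title and state) can be illustrated with a small in-memory filter. This is a sketch only: the `Article` record, the `search` function and the sample records are invented for illustration and say nothing about how the NLA's search and delivery system is built.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    title: str        # newspaper title
    state: str        # state or territory of publication
    published: date
    text: str         # OCR text of the article


# Invented records with placeholder text; not real newspaper content.
ARTICLES = [
    Article("Argus", "Victoria", date(1945, 8, 22), "Placeholder text mentioning John Smith."),
    Article("Canberra Times", "ACT", date(1928, 7, 26), "Another placeholder naming John Smith."),
    Article("Mercury", "Tasmania", date(1900, 1, 1), "Placeholder shipping notice."),
]


def search(articles, query, *, title=None, state=None, date_from=None, date_to=None):
    """Case-insensitive text match, narrowed by any refinements supplied."""
    needle = query.lower()
    hits = []
    for article in articles:
        if needle not in article.text.lower():
            continue
        if title is not None and article.title != title:
            continue
        if state is not None and article.state != state:
            continue
        if date_from is not None and article.published < date_from:
            continue
        if date_to is not None and article.published > date_to:
            continue
        hits.append(article)
    return hits


all_hits = search(ARTICLES, "john smith")
victorian_hits = search(ARTICLES, "john smith", state="Victoria")
early_hits = search(ARTICLES, "john smith", date_to=date(1930, 12, 31))
print(len(all_hits), [a.title for a in victorian_hits], [a.title for a in early_hits])
```

A production system would use a full-text index rather than a linear scan, but the refinement logic (each filter narrowing the same result set) is the same idea.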

Page 32: Refine search by categories

• News

• Advertising

• Birth, Death and Marriage notices

• Obituaries

• Editorial commentary and letters

• Shipping News

• Arts and leisure

• Detailed lists, results, guides

Page 33: Search Illustrations

Categorised as:

• Photo

• Cartoon

• Map

• Graph

• Illustration

Captions searchable

Canberra Times 26 July 1928 page 6

Page 34: Browsing and Viewing

• Browse papers page by page

• Zoom in and out of image

– to read small text

– to view context of article within page layout

• Print article or entire page or issue

Page 35: Zoom technology

Page 36: Testing derivative sizes and zooming

Page 37: Prototype wireframe

Page 38: Other features

Under discussion:

• OCR correction by users

• Personal annotation of articles by users

• Tagging results

• Creating public sets (for historical events)

• Clustering results

• Searching across other relevant resources (paid subscription services, international resources, other digital resources)

Page 39: Prototype release

• To be released to stakeholders who have given microfilm content

• Stakeholders able to view their data

• Feedback on data quality and search functionality

• Amendments made and then ‘search and delivery version 1’ released to a wider group for testing and feedback before public launch in 2008.

Page 40: Pilot Data

• Canberra Times

• Sydney Gazette

• Northern Territory Times

• South Australia Advertiser

• Hobart Town Gazette, Courier, Colonial, Mercury

• Melbourne Argus

• Perth Gazette

• West Australian

• Brisbane Courier Mail

(12 titles, 8,000 issues = 50,000 pages = 500,000 articles)
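The pilot totals above give some useful averages. The calculation below derives them and, purely as an illustration, projects the article count for the 3 million page target quoted on the budget slide. The projection assumes the pilot's articles-per-page ratio holds for the full program, which is not claimed in the talk.

```python
# Totals quoted on the slide.
PILOT_TITLES = 12
PILOT_ISSUES = 8_000
PILOT_PAGES = 50_000
PILOT_ARTICLES = 500_000

TARGET_PAGES = 3_000_000  # program target quoted on the budget slide

pages_per_issue = PILOT_PAGES / PILOT_ISSUES
articles_per_page = PILOT_ARTICLES / PILOT_PAGES

# Assumption: the pilot ratio carries over to the production phase.
projected_articles = TARGET_PAGES * articles_per_page

print(f"Average pages per issue:   {pages_per_issue:.2f}")
print(f"Average articles per page: {articles_per_page:.0f}")
print(f"Projected articles at 3 million pages: {projected_articles:,.0f}")
```

Since the pilot figures are themselves round numbers, these averages are indicative only.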

Page 41

http://www.nla.gov.au/ndp