October 1, 1999

Two Catalysts for Qualitative Change

Richard Snodgrass

October 1, 1999 SGB Meeting

Richard T. Snodgrass 2

Location

• City and State, 2000 BCE

• Longitude, 1773 CE

• GPS + cell phone, 1999 CE


Confluences

• Underlying technologies

– Highly accurate atomic clocks

– Geosynchronous satellites

– Advances in micro-circuitry

– Proliferation of cell phones

• Demonstrated need

• Catalyst: companies able to produce in quantity at low price

• Qualitative change


The Vision

The ACM Computing Portal

• A web-based repository of bibliographic information

– contains information on all papers and books in the computing literature

– contains a pointer to the digitized version, if available


Objectives

• Qualitatively increase the effectiveness of scientific research into computing

• Continue to place ACM as the premier scientific and educational organization for computing

• Increase service of ACM and the SIGs to the scientific community

• Provide a concrete illustration of the scope of computer science


Presentation

• Components

– Bibliographic Entries

– Abstracts and Keywords

– Full Text and Bitmapped Images

– Citation Linking

• Demonstration

• Realizing the Computing Portal

– Revisit the components

• The Next Step


Step 1: Bibliographic Entries

• Collect all bibliographic entries from all computer science journals, conferences, workshops, technical bulletins, and books.

– Over the period from 1940 to 2000, then continuing

– Approximately 1M entries

– Provide free searching on the web.

– Provide citations in multiple formats: HTML, BibTeX, refer, Word, XML, ...
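As a rough illustration of serving one stored entry in several citation formats, here is a minimal sketch. The `Entry` fields, the function names, and the sample entry are all hypothetical; the slides do not specify a schema, and only two of the listed formats are shown.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One bibliographic entry (hypothetical schema, not from the slides)."""
    key: str
    authors: list[str]
    title: str
    venue: str
    year: int


def to_bibtex(e: Entry) -> str:
    """Render the entry as a BibTeX @inproceedings record."""
    fields = {
        "author": " and ".join(e.authors),
        "title": e.title,
        "booktitle": e.venue,
        "year": str(e.year),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@inproceedings{{{e.key},\n{body}\n}}"


def to_refer(e: Entry) -> str:
    """Render the entry in refer format (one %-tagged field per line)."""
    lines = [f"%A {author}" for author in e.authors]
    lines += [f"%T {e.title}", f"%B {e.venue}", f"%D {e.year}"]
    return "\n".join(lines)


# Made-up entry, for illustration only.
sample = Entry(
    key="example99",
    authors=["A. Author", "B. Writer"],
    title="An Example Paper",
    venue="Example Conference",
    year=1999,
)

if __name__ == "__main__":
    print(to_bibtex(sample))
    print(to_refer(sample))
```

Keeping a single structured record and generating each format on request means a correction to an entry only has to be made once.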


Step 2: Abstracts and Keywords

• Collect keywords, and later, abstracts, for all entries.

• Copyright restrictions on some abstracts?


Step 3: Full Text and Images

• Collect full text of each available paper and book for

– searching

– developing classification maps and lexicons

– other analyses


Step 3, cont.

• Encourage acquisition of digitized version of each paper in web-accessible digital libraries (e.g., the ACM DL)

– Collect bit-mapped image of each page of each paper to retain formatting, equations, and figures.

– Each paper can then be reproduced as an exact copy.

– Can provide structure on full text

• sections, figures, citations in running prose


Step 4: Citation Linking

• Start with full text of paper’s bibliography.

• Out linking: identify bibliographic entry of papers referenced by the paper

• In linking: identify bibliographic entries of papers referencing the paper

• Use for citation analysis, knowledge diffusion studies
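The relationship between out-links and in-links can be sketched as inverting a mapping. This is only an illustration: the paper identifiers are made up, and it assumes each reference has already been resolved to a bibliographic entry.

```python
from collections import defaultdict


def build_in_links(out_links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert out-links (paper -> papers it cites) into in-links
    (paper -> papers that cite it)."""
    in_links: dict[str, list[str]] = defaultdict(list)
    for citing, cited_papers in out_links.items():
        for cited in cited_papers:
            in_links[cited].append(citing)
    return dict(in_links)


# Hypothetical identifiers, for illustration only.
out_links = {
    "paperA": ["paperB", "paperC"],
    "paperB": ["paperC"],
    "paperC": [],
}
in_links = build_in_links(out_links)

if __name__ == "__main__":
    for paper, citing in sorted(in_links.items()):
        print(paper, "is cited by", citing)
```

Citation counts for citation analysis then fall out as `len(in_links[paper])`.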


Demonstration


Papers with “wavelet”


INSPEC


Some Numbers

• 5300: SIG members lost last year (52.1K → 46.8K, > 10%)

• 10: Years remaining of lifetime for the average SIG

• 13.6: $M total SIG fund balance (over required)

• 290: $ per member (over required fund balance)

• 377: $K per SIG fund balance (over required)
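The figures on this slide can be cross-checked against each other with a little arithmetic. The sketch below assumes 36 SIGs (the count used on a later slide) and divides the fund balance by the current membership of 46.8K; the variable names are illustrative.

```python
# Figures taken from the slide; num_sigs comes from a later slide.
members_before = 52_100      # SIG members the previous year
members_after = 46_800       # SIG members now
fund_over_required = 13.6e6  # total SIG fund balance over required, in $
num_sigs = 36

members_lost = members_before - members_after
loss_percent = 100 * members_lost / members_before
dollars_per_member = fund_over_required / members_after
thousands_per_sig = fund_over_required / num_sigs / 1000

if __name__ == "__main__":
    print(f"members lost:  {members_lost}")
    print(f"loss:          {loss_percent:.1f}%")
    print(f"$ per member:  {dollars_per_member:.0f}")
    print(f"$K per SIG:    {thousands_per_sig:.0f}")
```

Under these assumptions the results land within rounding of the slide's 5300, "> 10%", 290, and 377.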


Step 1: Bibliographic Entries

• Propose that each SIG be responsible for ensuring correctness of relevant entries.

• relevance based on SIG interests

• reduce overlap between SIGs

• Software provided to SIGs for

– data entry, validation, conversion

– presentation (HTML, BibTeX, …, XML)

– searching

– precomputed lists (e.g., bibliographic home page for every author)


Step 1: Bibliographic Entries, cont.

• 1M entries / 36 SIGs = about 30K entries per SIG

– e.g., SIGMOD: approximately 50K entries

• Many resources

– DBLP: 2^17 (130K) entries

– Propose that ACM donate the ACM Guide to Computing Literature: 300K entries

– Collection of Computer Science Bibliographies: 930K entries


Step 2: Keywords and Abstracts

• May need copyright permission, negotiated by ACM HQ

• Collection of CS bibliographies has 100K abstracts


Step 3: Full Text and Bitmapped Images

• Full text is used for searching and citation linking in the Computing Portal.

• Bit-mapped images, stored in a Digital Library, are used to display and print the actual paper.

• Propose SIGs fund populating entire ACM Digital Library.

– PDF files containing encapsulated TIFF and OCRed full text

– 99% accuracy

– $1.25 per page

– Could go to SGML or XML, 99.9% accuracy: $8-$10 per page.


Populating ACM DL

• 1991-1998 already in DL

• Journals: about 110K pages

• Conferences

– 1985-1990: 76K pages

– pre-1985: about 200K pages

• Newsletters

– 120K pages

• Total: 500K pages at $600K

– $20K per SIG
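The totals on this slide can be reproduced from the page counts. This sketch assumes the $1.25 per page OCR price and the 36 SIGs quoted on earlier slides; the slide's own totals are rounded, so the computed values differ slightly.

```python
# Page counts from the slide (in pages).
pages = {
    "journals": 110_000,
    "conferences 1985-1990": 76_000,
    "conferences pre-1985": 200_000,
    "newsletters": 120_000,
}
cost_per_page = 1.25  # $ per page, from the earlier digitization slide
num_sigs = 36         # from the earlier bibliographic entries slide

total_pages = sum(pages.values())
total_cost = total_pages * cost_per_page
cost_per_sig = total_cost / num_sigs

if __name__ == "__main__":
    print(f"total pages:  {total_pages:,}")
    print(f"total cost:   ${total_cost:,.0f}")
    print(f"cost per SIG: ${cost_per_sig:,.0f}")
```

Under these assumptions the computation gives roughly 506K pages, about $630K in total, and a bit under $18K per SIG, consistent with the slide's rounded 500K pages, $600K, and $20K.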


Step 3: Full Text, cont.

• ACM papers: 500K pages, or about 40K papers

– This represents perhaps 5% of total of 1M papers.

• For remaining conference proceedings and journals

– Offer URL into their DL in exchange for full text, only for searching

• ACM Computing Portal provides valuable entry into their DL, enhancing their revenue stream.

– Offer full CD-ROM package at cost in exchange for inclusion in CD-ROM and use of full text for searching.

– Pay for digitization out of conference profits

– SIGs pay for integration: $0.25 - $0.50 per page.


Step 3: Full Text, cont.

• Use standard IR indexing and search techniques on full text.

• Partner with DL and IR research efforts to come up with new search strategies.

• Search software provided to each SIG
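One standard IR technique for full-text search is an inverted index that maps each term to the documents containing it. The sketch below is a toy version under simplifying assumptions (lowercased alphanumeric tokens, AND-only queries, no ranking or stemming); the document texts and function names are made up and do not describe the portal's actual search software.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up full text, for illustration only.
docs = {
    "p1": "wavelet transforms for image compression",
    "p2": "query optimization in temporal databases",
    "p3": "wavelet methods for query processing",
}
index = build_index(docs)

if __name__ == "__main__":
    print(sorted(search(index, "wavelet")))
    print(sorted(search(index, "wavelet query")))
```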


Step 4: Citation Linking

• Manual out-linking

– about $5-$6 per paper, or $0.30 per page of digitized text

• Can be done semi-automatically for much less, if the appropriate linking software is developed

• In-linking is simply a database search.

• All bibliographic entries must be present.
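One possible shape for semi-automatic out-linking is approximate string matching of each reference string against the known entries, with low-scoring matches left for manual review. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The entries, the 0.8 threshold, and the function names are assumptions for illustration, not the linking software the slides call for.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so formatting differences matter less."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(citation: str, entries: dict[str, str], threshold: float = 0.8):
    """Return (entry key, score) for the most similar entry, or
    (None, score) when nothing reaches the threshold (needs manual review)."""
    best_key, best_score = None, 0.0
    target = normalize(citation)
    for key, entry_text in entries.items():
        score = SequenceMatcher(None, target, normalize(entry_text)).ratio()
        if score > best_score:
            best_key, best_score = key, score
    if best_score < threshold:
        return None, best_score
    return best_key, best_score


# Made-up entries, for illustration only.
entries = {
    "example99": "Author, A. An example paper on temporal data. Example Conf. 1999",
    "sample98": "Writer, B. A sample study of image retrieval. Sample Workshop 1998",
}

if __name__ == "__main__":
    ref = 'A. Author, "An Example Paper on Temporal Data," Example Conf., 1999.'
    print(best_match(ref, entries))
```

A match can only be found if the cited paper already has an entry, which is why all bibliographic entries must be present before linking.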


Previous Efforts

• SIGDA CD-ROM Project

– 9 CD-ROMs

– $1.5M project

– SGML, proprietary display software on CD-ROM

• POPL CD-ROM

– 10 years of POPL, given out as a SIGPLAN member benefit

– PDF files

• Many conferences distribute CD-ROMs of papers


Previous Efforts, cont.

• SIGMOD Anthology

– 10 CD-ROMs (later 1-2 DVD-ROMs), $105K

– SIGMOD, PODS, KDD, VLDB, ICDE, SSDBM, COMAD, ...

– SIGMOD Record, Data Engineering Bulletin

– TODS, VLDB Journal

– Given out as member benefit

• SIGMOD DiSC yearly CD-ROM

– 1999: 2 CD-ROMs, about $30K per year

– all relevant conferences and workshops for that year, plus ancillary material such as PowerPoint presentations, audio, and video

– Given out as a member benefit (Consumer Reports model)


SGB Portal Committee

• Rick Snodgrass (University of Arizona, CS), chair

• Steve Cunningham (Cal State University-Stanislaus, CS)

• Carol Hutchins (Courant Institute of Math. Sci. Library)

• Bob Krovetz (NEC Research Institute)

• Michael Ley (University of Trier, CS)

• Andreas Paepcke (Stanford University)

• Kathy Preas (KP Pubs on CDROM)

• Bernie Rous (ACM Publications)

• Charles Viles (Univ. of North Carolina, Info and Lib Sci)


The ACM Computing Portal

• Free searchable access to the entire computer science corpus

• Links to a fully populated ACM DL and to other DLs

• Capability to purchase papers and to register queries

• Possibly ancillary SIG-provided benefits, such as CD-ROMs and SIG-specific portals


Confluences

• Underlying technologies

– Inexpensive scanning, OCR, disk space, high-capacity CD-ROM and DVD-ROM, and widely available WWW access

• Demonstrated need

• Catalysts: SIG Governing Board, ACM Council, ACM Publications Board, HQ staff

• Qualitative change