October 1, 1999: Two Catalysts for Qualitative Change (Richard Snodgrass)
October 1, 1999 SGB Meeting
Richard T. Snodgrass 2
Location
• City and State, 2000 BCE
• Longitude, 1773 CE
• GPS + cell phone, 1999 CE
Confluences
• Underlying technologies
– Highly accurate atomic clocks
– Geosynchronous satellites
– Advances in micro-circuitry
– Proliferation of cell phones
• Demonstrated need
• Catalyst: companies able to produce in quantity at low price
• Qualitative change
The Vision
The ACM Computing Portal
• A web-based repository of bibliographic information
– contains information on all papers and books in the computing literature
– contains a pointer to the digitized version, if available
Objectives
• Qualitatively increase the effectiveness of scientific research into computing
• Continue to place ACM as the premier scientific and educational organization for computing
• Increase service of ACM and the SIGs to the scientific community
• Provide a concrete illustration of the scope of computer science
Presentation
• Components
– Bibliographic Entries
– Abstracts and Keywords
– Full Text and Bitmapped Images
– Citation Linking
• Demonstration
• Realizing the Computing Portal
– Revisit the components
• The Next Step
Step 1: Bibliographic Entries
• Collect all bibliographic entries from all computer science journals, conferences, workshops, technical bulletins, and books.
– Over the period from 1940 to 2000, then continuing
– Approximately 1M entries
– Provide free searching on the web.
– Provide citations in multiple formats: HTML, BibTeX, refer, Word, XML, ...
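As a rough illustration of serving one bibliographic entry in several citation formats, the sketch below renders a single record as BibTeX and as refer. The `BibEntry` fields, the function names, and the sample record are all hypothetical placeholders, not part of the proposal; a real portal would need many more entry types and fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BibEntry:
    """Hypothetical minimal record for a journal article."""
    key: str
    authors: list[str]
    title: str
    venue: str
    year: int


def to_bibtex(e: BibEntry) -> str:
    """Render the entry as a BibTeX @article record."""
    fields = {
        "author": " and ".join(e.authors),
        "title": e.title,
        "journal": e.venue,
        "year": str(e.year),
    }
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@article{{{e.key},\n{body}\n}}"


def to_refer(e: BibEntry) -> str:
    """Render the entry in refer format (%A author, %T title, %J journal, %D date)."""
    lines = [f"%A {a}" for a in e.authors]
    lines += [f"%T {e.title}", f"%J {e.venue}", f"%D {e.year}"]
    return "\n".join(lines)


# Placeholder record, invented purely for illustration.
entry = BibEntry(
    key="doe1999example",
    authors=["Jane Doe", "John Roe"],
    title="An Example Paper",
    venue="Example Journal",
    year=1999,
)
print(to_bibtex(entry))
print(to_refer(entry))
```

Keeping one internal record and generating each output format from it means a correction to an entry shows up in every format at once.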
Step 2: Abstracts and Keywords
• Collect keywords, and later, abstracts, for all entries.
• Copyright restrictions on some abstracts?
Step 3: Full Text and Images
• Collect full text of each available paper and book for
– use in searching
– developing classification maps and lexicons
– other analyses
Step 3, cont.
• Encourage acquisition of digitized version of each paper in web-accessible digital libraries (e.g., the ACM DL)
– Collect bit-mapped image of each page of each paper to retain formatting, equations, and figures.
– Each paper can then be reproduced as an exact copy.
– Can provide structure on full text
• sections, figures, citations in running prose
Step 4: Citation Linking
• Start with full text of paper’s bibliography.
• Out linking: identify bibliographic entry of papers referenced by the paper
• In linking: identify bibliographic entries of papers referencing the paper
• Use for citation analysis, knowledge diffusion studies
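The relationship between out-links and in-links can be sketched as a simple inversion: once each paper's bibliography has been resolved to entry keys, the in-links follow mechanically. The paper keys and the `build_in_links` helper below are hypothetical, and resolving a bibliography string to the right entry (the hard part) is assumed to be already done.

```python
from collections import defaultdict

# Hypothetical out-links: paper key -> keys of the entries it cites.
out_links = {
    "paperA": ["paperB", "paperC"],
    "paperB": ["paperC"],
    "paperC": [],
}


def build_in_links(out_links):
    """Invert out-links so each entry lists the papers that reference it."""
    in_links = defaultdict(list)
    for citing, cited_keys in out_links.items():
        for cited in cited_keys:
            in_links[cited].append(citing)
    return dict(in_links)


in_links = build_in_links(out_links)

# A basic citation-analysis measure: how often each paper is cited.
citation_counts = {key: len(in_links.get(key, [])) for key in out_links}
print(in_links)
print(citation_counts)
```

The inversion only finds citing papers that are themselves in the collection, so in-link coverage depends on how complete the set of bibliographic entries is.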
Some Numbers
• 5300: SIG members lost last year (52.1K → 46.8K, > 10%)
• 10: Years remaining of lifetime for the average SIG
• 13.6: $M total SIG fund balance (over required)
• 290: $ per member (over required fund balance)
• 377: $K per SIG fund balance (over required)
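The figures on this slide appear to be related by simple arithmetic, which the sketch below checks. The count of 36 SIGs is taken from the "1M entries / 36 SIGs" slide later in the talk. The "years remaining" formula (current membership divided by last year's loss) is only an assumption about how that figure might have been reached; it gives a somewhat lower value than the slide's 10.

```python
members_before = 52_100   # SIG members a year earlier (52.1K)
members_after = 46_800    # SIG members now (46.8K)
excess_balance = 13.6e6   # total SIG fund balance over the required level, in $
num_sigs = 36             # from the "1M entries / 36 SIGs" slide

members_lost = members_before - members_after   # 5300
loss_fraction = members_lost / members_before   # just over 10%
per_member = excess_balance / members_after     # roughly $290 per member
per_sig_k = excess_balance / num_sigs / 1000    # roughly $377K per SIG

# Assumption: constant yearly loss until membership reaches zero.
years_left = members_after / members_lost

print(f"members lost:     {members_lost}")
print(f"loss fraction:    {loss_fraction:.1%}")
print(f"$ per member:     {per_member:.0f}")
print(f"$K per SIG:       {per_sig_k:.0f}")
print(f"years remaining:  {years_left:.1f}")
```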
Step 1: Bibliographic Entries
• Propose that each SIG be responsible for ensuring correctness of relevant entries.
• relevance based on SIG interests
• reduce overlap between SIGs
• Software provided to SIGs
– data entry, validation, conversion
– presentation (HTML, BibTeX, …, XML)
– searching
– precomputed lists (e.g., bibliographic home page for every author)
Step 1: Bibliographic Entries, cont.
• 1M entries / 36 SIGs ≈ 28K entries per SIG
– e.g., SIGMOD: approximately 50K entries
• Many resources
– DBLP: 2^17 (130K) entries
– Propose that ACM donate the ACM Guide to Computing Literature: 300K entries
– Collection of Computer Science Bibliographies: 930K entries
Step 2: Keywords and Abstracts
• May need copyright permission, negotiated by ACM HQ
• Collection of CS bibliographies has 100K abstracts
Step 3: Full Text and Bitmapped Images
• Full text is used for searching and citation linking in the Computing Portal.
• Bit-mapped images, stored in a Digital Library, are used to display and print the actual paper.
• Propose SIGs fund populating entire ACM Digital Library.
– PDF files containing encapsulated TIFF and OCRed full text
– 99% accuracy
– $1.25 per page
– Could go to SGML or XML, 99.9% accuracy: $8-$10 per page.
Populating ACM DL
• 1991-1998 already in DL
• Journals: about 110K pages
• Conferences
– 1985-1990: 76K pages
– pre-1985: about 200K pages
• Newsletters
– 120K pages
• Total: 500K pages at $600K
– $20K per SIG
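The totals on this slide can be cross-checked against the per-page digitization price quoted earlier ($1.25 per page for PDF with OCRed text) and the 36 SIGs mentioned on the bibliographic-entries slide. The sketch below is just that arithmetic; the dictionary labels are made up for readability, and the slide's own figures (500K pages, $600K, $20K per SIG) are evidently rounded.

```python
# Page counts from the slide, in thousands of pages.
pages_k = {
    "journals": 110,
    "conferences 1985-1990": 76,
    "conferences pre-1985": 200,
    "newsletters": 120,
}
cost_per_page = 1.25   # $ per page, from the Step 3 slide
num_sigs = 36          # from the "1M entries / 36 SIGs" slide

total_pages_k = sum(pages_k.values())           # 506 (thousand pages)
total_cost_k = total_pages_k * cost_per_page    # 632.5 (thousand $)
cost_per_sig_k = total_cost_k / num_sigs        # about 17.6 (thousand $)

print(f"total pages:  {total_pages_k}K")
print(f"total cost:   ${total_cost_k}K")
print(f"cost per SIG: ${cost_per_sig_k:.1f}K")
```

The unrounded numbers land close to the slide's: roughly 506K pages, about $630K in total, and under $20K per SIG if the cost were split evenly.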
Step 3: Full Text, cont.
• ACM papers: 500K pages, or about 40K papers
– This represents perhaps 5% of the total of 1M papers.
• For remaining conference proceedings and journals
– Offer URL into their DL in exchange for full text, only for searching
• ACM Computing Portal provides valuable entry into their DL, enhancing their revenue stream.
– Offer full CD-ROM package at cost in exchange for inclusion in CD-ROM and use of full text for searching.
– Pay for digitization out of conference profits
– SIGs pay for integration: $0.25 - $0.50 per page.
Step 3: Full Text, cont.
• Use standard IR indexing and search techniques on full text.
• Partner with DL and IR research efforts to come up with new search strategies.
• Search software provided to each SIG
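One standard IR technique that could sit behind full-text search is an inverted index, which maps each term to the set of documents containing it. The sketch below is a minimal version with Boolean AND queries. The function names and the three sample documents are hypothetical, and a production system would add stemming, ranking, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Placeholder documents standing in for the full text of papers.
docs = {
    "p1": "Temporal databases and query languages",
    "p2": "Query optimization in relational databases",
    "p3": "Citation analysis of computing literature",
}
index = build_index(docs)
print(search(index, "query databases"))
print(search(index, "citation"))
```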
Step 4: Citation Linking
• Manual out-linking
– about $5-$6 per paper, or $0.30 per page of digitized text
• Can be done semi-automatically for much less, if the appropriate linking software is developed
• In-linking is simply a database search.
• All bibliographic entries must be present.
Previous Efforts
• SIGDA CD-ROM Project
– 9 CD-ROMs
– $1.5M project
– SGML, proprietary display software on CD-ROM
• POPL CD-ROM
– 10 years of POPL, given out as a SIGPLAN member benefit
– PDF files
• Many conferences distribute CD-ROMs of papers
Previous Efforts, cont.
• SIGMOD Anthology
– 10 CD-ROMs (later 1-2 DVD-ROMs), $105K
– SIGMOD, PODS, KDD, VLDB, ICDE, SSDBM, COMAD, ...
– SIGMOD Record, Data Engineering Bulletin
– TODS, VLDB Journal
– Given out as member benefit
• SIGMOD DiSC yearly CD-ROM
– 1999: 2 CD-ROMs, about $30K per year
– all relevant conferences and workshops for that year, plus ancillary material such as PowerPoint presentations, audio, and video
– Given out as a member benefit (Consumer Reports model)
SGB Portal Committee
• Rick Snodgrass (University of Arizona, CS), chair
• Steve Cunningham (Cal State University-Stanislaus, CS)
• Carol Hutchins (Courant Institute of Math. Sci. Library)
• Bob Krovetz (NEC Research Institute)
• Michael Ley (University of Trier, CS)
• Andreas Paepcke (Stanford University)
• Kathy Preas (KP Pubs on CDROM)
• Bernie Rous (ACM Publications)
• Charles Viles (Univ. of North Carolina, Info and Lib Sci)
The ACM Computing Portal
• Free searchable access to the entire computer science corpus
• Links to a fully populated ACM DL and to other DLs
• Capability to purchase papers and to register queries
• Possibly ancillary SIG-provided benefits, such as CD-ROMs and SIG-specific portals
Confluences
• Underlying technologies
– Inexpensive scanning, OCR, disk space, high-capacity CD-ROM and DVD-ROM, and widely available WWW access
• Demonstrated need
• Catalysts: SIG Governing Board, ACM Council, ACM Publications Board, HQ staff
• Qualitative change