AHM 2014: Crawling for EarthCube
DESCRIPTION
Presentation by Ruth Duerr during the Lunch & Learn sessions on Day 2, June 25, at the EarthCube All-Hands Meeting
TRANSCRIPT
![Page 1: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/1.jpg)
Crawling for EarthCube
Ruth Duerr, Luis Lopez, Abeve Tayachow, Erik Mingo
![Page 2: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/2.jpg)
Outline
• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community
![Page 3: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/3.jpg)
NSIDC: An overview
NSIDC affiliations and sponsorship
Affiliations:
• Cooperative Institute for Research in Environmental Sciences
• University of Colorado Boulder
Main sponsors:
• National Science Foundation
• NASA
• National Oceanic and Atmospheric Administration
![Page 4: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/4.jpg)
The National Snow and Ice Data Center…
• Provides tools for data access
• Researches the cryosphere and data science
• Educates the public about the cryosphere
• Supports data users
• Manages and distributes scientific data
• Supports local and traditional knowledge
![Page 5: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/5.jpg)
Outline
• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community
![Page 6: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/6.jpg)
Why not let Google do it?
• What's their incentive?
• The schema.org route for data has extreme limitations
![Page 7: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/7.jpg)
Ways to build a comprehensive catalog
• Ask folks to register their data and services
• Build your catalog by hand
• Automate discovery of data and services
![Page 8: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/8.jpg)
Preparing Data for Ingest, presented 10/27/09 by R. Duerr LID590DCL Foundations of Data Curation
What if...
advertising your data so that everyone could find them were as simple as...
1 - Filling out a web form
2 - Saving it to your website
3 - Adding its link to your site
Well... it can be!
![Page 9: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/9.jpg)
Why not let Google do it?
![Page 10: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/10.jpg)
Outline
• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community
![Page 11: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/11.jpg)
Crawler Big Picture
[Diagram: BCube Crawler, BCube Broker, CINERGI]
![Page 12: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/12.jpg)
Crawler Architecture
![Page 13: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/13.jpg)
Things we are going to search for
• OpenSearch
• OAI-PMH, ESIP data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL
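As a rough illustration of how a crawler might recognize several of the items listed above, the sketch below classifies a fetched XML document by its root element and namespace. This is not the BCube or Libre crawler's actual code; the function name `classify` and the `SIGNATURES` table are hypothetical, and the sketch assumes the document has already been downloaded as text. ESIP casts and web-enabled folders are left out because they need feed- or HTML-specific checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: (namespace, root element name) -> label.
# The namespaces are the ones published for each format.
SIGNATURES = {
    ("http://a9.com/-/spec/opensearch/1.1/", "OpenSearchDescription"): "OpenSearch",
    ("http://www.openarchives.org/OAI/2.0/", "OAI-PMH"): "OAI-PMH",
    ("http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0", "catalog"): "THREDDS catalog",
    ("http://wadl.dev.java.net/2009/02", "application"): "WADL",
    ("http://schemas.xmlsoap.org/wsdl/", "definitions"): "WSDL",
}


def classify(xml_text):
    """Return a label for a recognized document type, or None otherwise."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML, so not one of these types

    # ElementTree writes qualified tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, _, local_name = root.tag[1:].partition("}")
    else:
        namespace, local_name = "", root.tag
    return SIGNATURES.get((namespace, local_name))


if __name__ == "__main__":
    sample = (
        '<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">'
        "<ShortName>Example</ShortName></OpenSearchDescription>"
    )
    print(classify(sample))
```

A table-driven check like this makes it easy to add whatever else the community suggests: a new type is one more entry in `SIGNATURES`.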
![Page 14: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/14.jpg)
Things we are going to search for
• OpenSearch
• OAI-PMH, ESIP data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL
But what else should we look for?
![Page 15: AHM 2014: Crawling for EarthCube](https://reader030.vdocuments.net/reader030/viewer/2022013114/5481e30db079591a0c8b463b/html5/thumbnails/15.jpg)
Questions/Comments