CS410/510: SciData Management
Scientific Data Management
Dr. Laura Bright
Bill Howe
Biology
Old way: Wet lab chemistry
New way:
  Microarray
  Search GenBank, Ensembl, GDB, SwissProt, Entrez using BLAST, FASTA, GCG, EMBOSS
Astronomy
Old way: Sign up for telescope time
New way: Sloan Digital Sky Survey
Systematically mapping ¼ of the entire sky
12 TB to date, 15 TB final in 2007
Oceanography
Old way:
  Field work
  Simplified calculations
New way:
  Finite element analysis
  In situ sensors
  CODAR
Science is Changing
Old Science: “Query the world”
  Data acquisition is the dominant cost
New Science: “Download the world”
  Data analysis is the dominant cost
Course Structure
10% In-class exercises
10% Study Questions
40% Homework Assignments
15% Mini-project
25% Short Paper (3 pages)
No exams
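As a rough illustration of how the weights above combine, here is a minimal sketch. The function name, the 0-100 score scale, and the sample scores are assumptions made for illustration; the slides do not specify how component scores are recorded.

```python
# Course weights as listed on the slide, expressed as fractions.
WEIGHTS = {
    "in_class_exercises": 0.10,
    "study_questions": 0.10,
    "homework": 0.40,
    "mini_project": 0.15,
    "short_paper": 0.25,
}


def final_grade(scores):
    """Combine per-component scores (assumed 0-100) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "in_class_exercises": 100,
    "study_questions": 90,
    "homework": 80,
    "mini_project": 70,
    "short_paper": 60,
}
print(round(final_grade(example_scores), 2))
```

Because homework carries 40% of the total, a change in the homework score moves the result four times as much as the same change in the study questions.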
Short Paper Assignment (1/2)
To be completed individually!
Compare/Contrast a pair of papers
  We provide a list to choose from
Short Paper Assignment (2/2)
25% = 3 milestones + final paper
  2 points: select paper pair (~ week 3)
  5 points: a half-page summary of each paper; one page total (~ week 5)
  3 points: a list of 3 points of contrast/comparison, in complete sentences (~ week 7)
  15 points: final paper (~ week 11)
Both content and mechanics matter!
Study Questions
Covers the readings
Discussion ok, but write up your own answers
  Dr. Bright’s “Pizza rule”
  Try to keep the discussion on the list
3-4 questions per set, about 1 set per week
Details:
  About a paragraph; use complete sentences
  Feel free to use diagrams or figures when appropriate!
  Due at the beginning of class on the due date
Homework Assignments
Covers tools (rather than readings)
To be completed individually!
Send questions to the instructors rather than the list
Late work
Prior approval is necessary, but not always sufficient
Course Web Page
http://www.cs.pdx.edu/~howe/cs410
We hope to post class materials at least an hour before class (no promises)
Extra copies of printed material will be available outside Dr. Bright’s office (FAB 310-24)
| Material | Web page | Hard copy |
|---|---|---|
| Lectures | Yes | No |
| Readings available online | Yes | No |
| Copy-sensitive readings | No | Yes |
| Study questions | Yes | Yes |
| Homework | Yes | Yes |
Office Hours
Howe: FAB 310-C Monday 4-6 (or by appointment)
Bright: FAB 310-24 Thursday 1-3 (or by appointment)
Course Email List
“scidata”
Ok to discuss study questions
Not ok to discuss homework answers
Send HW questions to instructors
https://webmail.cecs.pdx.edu/mailman/listinfo.cgi/scidata
Academic Integrity
2004-2005 PSU Catalog pages 29-30
Posted on the web page
A First Class Exercise
1) Name (feel free to add pronunciation hints!)
2) Email you wish to use for this class
3) How much experience with RDBMS?
  (A) What’s an RDBMS?
  (B) I’ve taken CS 386, but that’s it
  (C) I’ve used an RDBMS on a few projects
  (D) I write SQL semi-daily
  (E) I’m a DBA
4) How might Scientific Data Management be different from “regular” data management?
(Scientific Data) Management
Interesting data types
  Gene sequences, spatio-temporal objects, scalars, vectors, tensors
  map layers, images, meshes
  unstructured metadata
Interesting scale
  Terabytes becoming Petabytes
Interesting access patterns
  Data “products”
  Data “releases”
Scientific (Data Management)
Readings drawn from database literature
We will consider:
  Conventional technology
    Relational databases
    Web Services/XML
  Specialized technology
    GIS
    Grid
    Workflow
    Visualization
Emphasis on Case Studies
Characterizing SDMS (1/3)
What logical data types are involved?
  DNA sequences
  maps of the earth, rivers, lakes
  maps of the sky, galaxies, stars
  Particle trajectories
What physical data types are involved?
  Multimedia?
  Multidimensional arrays?
  Spatio-temporal objects?
  “ordinary” tuples?
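To make three of the physical data types listed above concrete, here is a minimal sketch using only the Python standard library. All class names, field names, and sample values are hypothetical and chosen for illustration; a real system would more likely use an array library or a spatial database type.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import NamedTuple


class StationRow(NamedTuple):
    """An "ordinary" tuple, as a row in a relational table would hold it."""
    station_id: str
    name: str


@dataclass(frozen=True)
class Observation:
    """A spatio-temporal object: a value tied to a place and a time."""
    lat: float
    lon: float
    time: datetime
    value: float


def make_grid(rows, cols, fill=0.0):
    """A multidimensional (here 2-D) array represented as nested lists."""
    return [[fill for _ in range(cols)] for _ in range(rows)]


# Hypothetical sample values, for illustration only.
row = StationRow("S1", "Example buoy")
obs = Observation(45.0, -124.0, datetime(2005, 1, 1, tzinfo=timezone.utc), 11.5)
grid = make_grid(2, 3)
grid[1][2] = obs.value  # store the observed value in one grid cell
```

The same logical data (say, a map of sea temperature) could be stored in any of these physical forms, which is why the slide asks about logical and physical types separately.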
Characterizing SDMS (2/3)
Who are the Customers?
  Other Researchers
  General Public
  Policy Makers
  Emergency Workers
  Commercial
Customers?
Characterizing SDMS (3/3)
What is the Architecture?
  Pipeline (Workflow)
  Archive (Database)
  Clearinghouse (Portal)
What Interfaces are supported?
  Browse
  Query
  Upload
  Derive
  Script (Web Services)
More Examples
geodata.gov: governmental GIS clearinghouse
EOSDIS: NASA’s satellite image repository
IOOS: Ocean measurement and forecasting
Others?
National Weather Service: Timeline
1849: Smithsonian Institution provides weather instruments to telegraph operators
1900: Galveston Hurricane
1935: Long range forecasts; buoys
1955-1960: Computer forecasts scheduled regularly; weather satellite TIROS I launched
1979: AFOS computer system is deployed, connecting all Weather Service forecast offices
1988: Weather Service mobilizes local forecasting operation to assist in fighting week-long wildfire in Yellowstone Park
1990: NEXRAD radar deployment project; a Cray supercomputer deployed
National Weather Service
Data Collection
  Radar
  Satellite
  Forecasts
  Bulletins
Data Dissemination
  Radio: aviation, marine, military channels
  FTP, HTTP, email, RSS: public
Part of a UN-sponsored global network
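RSS is one of the public dissemination channels named above. As a minimal sketch of what a client might do with such a feed, the code below reads item titles and links from an RSS 2.0 document using the standard library. The feed content and URLs are invented placeholders, not actual National Weather Service output, and a real client would fetch the XML over HTTP rather than embed it.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example bulletins</title>
    <item>
      <title>Sample advisory</title>
      <link>http://example.org/bulletins/1</link>
    </item>
    <item>
      <title>Sample outlook</title>
      <link>http://example.org/bulletins/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in read_items(SAMPLE_FEED):
    print(title, link)
```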
National Weather Service: Network
The Gateway
[Diagram: NWS Gateway. Labels: radar, satellite, buoys, models; bulletins; Anonymous FTP, FTPMail; “Family of Services” (direct phone line); HTTP web services (XML/SOAP); web form; email; ftp; RSS; Public]
National Weather Service: Products (1/2)
Computer Models
  GRIB files from 10+ models, from regional to global scale
  Example: SL.008001/ST.opnl/MT.ruc_CY.06/RD.20000622/PT.grid_DF.gr1/fh.0003x_tl.press
  Facsimile/Images
  Text products derived from models
  Special products in special formats
Text Products - Warnings, outlooks, advisories, forecast, discussion
  ~100 different types
National Weather Service: Products (2/2)
Observed Data - kept for 24 hours at least
  observations from aviation, buoys, ships, balloons
  special formats, but some have parsed them to XML
Radar Products - Multicast by connecting a router directly to NWS, as well as FTP
  SL.us008001/DF.of/DC.radar/DS.p19r1/SI.kfws/sn.0114
Satellite Products
  Cloud Water Vapor, Cloud Liquid Water, Rain Rate, Sea Ice Concentration, Sea Ice Age, Sea Ice Edge, Soil Moisture, Surface Wind, Water Vapor over oceans, Surface Temperature, Snow Water Content, Cloud Amount, and EDR Surface Type
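The two example paths on the Products slides (the model file and the radar file) look like they are built from `KEY.value` fields, separated by `/` between directories and by `_` within a name. The sketch below splits a path on that assumption only. The slides do not define the layout or the meaning of any key, so the function name and the splitting rule are guesses, and no key is interpreted.

```python
def parse_product_path(path):
    """Split a path such as 'SL.008001/ST.opnl/...' into a {key: value} dict.

    Assumes (not confirmed by the slides) that each '/'-separated segment
    holds one or more '_'-separated fields of the form KEY.value.
    """
    fields = {}
    for segment in path.strip("/").split("/"):
        for part in segment.split("_"):
            key, sep, value = part.partition(".")
            if not sep:
                raise ValueError(f"field without a '.' separator: {part!r}")
            fields[key] = value
    return fields


model_path = "SL.008001/ST.opnl/MT.ruc_CY.06/RD.20000622/PT.grid_DF.gr1/fh.0003x_tl.press"
radar_path = "SL.us008001/DF.of/DC.radar/DS.p19r1/SI.kfws/sn.0114"

model_fields = parse_product_path(model_path)
radar_fields = parse_product_path(radar_path)
print(model_fields["MT"], radar_fields["DC"])
```

Encoding metadata in the path is a common pattern when the interface is plain FTP: a client can select products by directory name without opening any file.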
National Weather Service: Radar
National Weather Service: Forecasts (1/3)
Several Climate Models:
  Weather Research and Forecast (WRF)
  Global Forecast System (GFS)
  North American Mesoscale (NAM)
  Nested Grid Model (NGM)
Specialized Models:
  Fire Weather
  Hurricane
  Aviation
National Weather Service: Forecasts (2/3)
National Digital Forecast Database
  3 hr temporal resolution
  5 km spatial resolution
  GRIB files, GIS map layers, data products
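To get a feel for what these resolutions imply for data volume, here is a back-of-the-envelope sketch. The 100 km by 100 km region is a hypothetical example, and the function names are illustrative; the actual grid layout of the database is not described on the slide.

```python
import math


def steps_per_day(temporal_resolution_hours):
    """Number of forecast time steps covering one day."""
    return 24 // temporal_resolution_hours


def grid_cells(width_km, height_km, cell_km):
    """Number of square cells needed to cover a rectangular region."""
    return math.ceil(width_km / cell_km) * math.ceil(height_km / cell_km)


steps = steps_per_day(3)            # 3 hr resolution -> 8 steps per day
cells = grid_cells(100, 100, 5)     # hypothetical 100 km x 100 km region -> 400 cells
print(steps, cells, steps * cells)  # 8 400 3200 values per variable per day
```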
National Weather Service: Forecasts (3/3)
Model Output Statistics (MOS)
Examples:
  Max/Min Temperature Forecasts
  Surface Temp / Dewpoint Forecasts
  Opaque Cloud Amount
  Probability of Precipitation
  Severe weather probabilities
MOS products
National Weather Service: Satellites
Geostationary Operational Environmental Satellites
Variety of images and products
National Weather Service: Summary
Domain?
Customers?
Architecture?
Interfaces?