Storing and Organizing Data
Informatics I101
February 18, 2004
John C. Paolillo
Storing Data
• Encoding: fixed or variable width
• Memory
• Storage medium:
  – Magnetic: tape, disk, hard disk
  – Optical: CD, DVD, etc.
  – Silicon: Programmable Read Only Memory (PROM), Erasable PROM, etc.
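The slide only names the two kinds of encoding. As a minimal sketch, assuming UTF-32 as the fixed-width example and UTF-8 as the variable-width one (neither encoding is named on the slide), Python's built-in `str.encode` shows how many bytes each character takes. The helper `bytes_per_char` is hypothetical.

```python
# Compare a fixed-width encoding (UTF-32) with a variable-width one (UTF-8).
# These two encodings are illustrative choices; the slide does not name any.

def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return how many bytes each character of `text` occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]

# ASCII "A", accented "e", euro sign, and a musical symbol outside the BMP
sample = "A\u00e9\u20ac\U0001D11E"

fixed = bytes_per_char(sample, "utf-32-le")  # "-le" avoids a byte-order mark
variable = bytes_per_char(sample, "utf-8")

print("UTF-32:", fixed)     # UTF-32: [4, 4, 4, 4]
print("UTF-8: ", variable)  # UTF-8:  [1, 2, 3, 4]
```

A fixed width makes it easy to jump straight to the nth character; a variable width saves space when most characters are common ones.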
Compact Disk Recording
[Figure: CD drive optics, with labeled parts LED, light beam, lenses, photocell, and crystalline metal alloy recording surface. The data groove, etched in the surface of the plastic, has a slight “wobble” that helps locate the data.]
The Recording Process
[Figure: The recording process. The light beam pulses to record on and off states and stays steady for reading; pits of amorphous solid are left where the metal re-cools. Scale labels: 1.6 µm, 0.74 µm, 0.32 µm.]
CD Media States
• Crystalline: bright, reflects light well – “off” state
• Amorphous: dark, scatters light – “on” state
• Micro-crystalline: reflects light, but not brightly – “erased” state (= “off”)
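The three states above can be read as a lookup from physical state to logical value. This is a sketch of the slide's simplified model only: real CD and CD-RW drives encode bits in the transitions between pits and lands through a channel code (EFM), not one state per bit. The names `STATE_TO_BIT` and `decode` are hypothetical.

```python
# The slide's simplified model: each recording-layer state stands for one logical value.
# Real drives decode transitions through a channel code instead; this is only an illustration.
STATE_TO_BIT = {
    "crystalline": 0,        # bright, reflects light well: "off"
    "amorphous": 1,          # dark, scatters light: "on"
    "micro-crystalline": 0,  # reflects, but not brightly: "erased", same as "off"
}

def decode(states: list[str]) -> list[int]:
    """Translate a sequence of observed states into bits (hypothetical helper)."""
    return [STATE_TO_BIT[state] for state in states]

print(decode(["crystalline", "amorphous", "micro-crystalline", "amorphous"]))
# [0, 1, 0, 1]
```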
How Erasing Takes Place
Writing isn’t perfect
The center pits (dots) are partly erased by the heat from writing the nearby longer pits (dashes), which were written later.
Reference
van Houten, Henk, and Wouter Leibbrandt. 2000. “Phase change recording.” Communications of the ACM 43(11): 64–71.
http://www.acm.org/dl
Storing Data
• Encoding: we may need to change from one encoding to another
  – Task of the device driver
  – Gives us a stream of bits
• Medium: different media require different treatment of the data for storage
  – Task of the device hardware itself
  – Gives us a stream of bits that the device can read and write
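A minimal sketch of the encoding step, written in ordinary Python rather than as real driver code: the hypothetical `reencode` converts text from Latin-1 to UTF-8 (both encodings are assumptions chosen for illustration), and the hypothetical `to_bit_stream` shows the resulting bytes as the stream of bits handed on to the hardware.

```python
def reencode(data: bytes, source: str, target: str) -> bytes:
    """Hypothetical driver step: convert bytes from one text encoding to another."""
    return data.decode(source).encode(target)

def to_bit_stream(data: bytes) -> str:
    """Show bytes as a string of bits, most significant bit of each byte first."""
    return "".join(f"{byte:08b}" for byte in data)

original = "caf\u00e9".encode("latin-1")            # b'caf\xe9' (4 bytes)
converted = reencode(original, "latin-1", "utf-8")  # b'caf\xc3\xa9' (5 bytes)
bits = to_bit_stream(converted)

print(len(original), len(converted), len(bits))  # 4 5 40
```

The same text needs a different number of bytes in each encoding, which is why the conversion has to happen before the bits reach the medium.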
But how do we find the data later?
Data Organization
• Index for the data
  – File names, extensions
  – Metadata (date, program that uses it, etc.)
  – Directory structures
• All data storage systems use some kind of data organization
  – The principles of data organization are the same no matter what the data is or where it is organized
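As a sketch of such an index, the hypothetical `FileEntry` record below keeps a file's directory path, name, and extension together with two pieces of metadata, and the hypothetical lookup functions find files by each of those. All file names and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath

@dataclass
class FileEntry:
    path: PurePosixPath  # directory structure, file name, and extension
    modified: date       # metadata: date
    program: str         # metadata: program that uses the file

# Hypothetical entries for illustration only.
index = [
    FileEntry(PurePosixPath("/home/i101/notes/lecture.txt"), date(2004, 2, 18), "text editor"),
    FileEntry(PurePosixPath("/home/i101/notes/slides.ppt"), date(2004, 2, 18), "presentation program"),
    FileEntry(PurePosixPath("/home/i101/music/song.mp3"), date(2004, 1, 5), "media player"),
]

def find_by_extension(entries: list[FileEntry], ext: str) -> list[str]:
    """Locate files by extension, e.g. '.mp3'."""
    return [e.path.name for e in entries if e.path.suffix == ext]

def find_in_directory(entries: list[FileEntry], directory: str) -> list[str]:
    """Locate files whose parent directory matches exactly."""
    return [e.path.name for e in entries if e.path.parent == PurePosixPath(directory)]

def find_by_date(entries: list[FileEntry], day: date) -> list[str]:
    """Locate files through metadata rather than through their names."""
    return [e.path.name for e in entries if e.modified == day]

print(find_by_extension(index, ".mp3"))              # ['song.mp3']
print(find_in_directory(index, "/home/i101/notes"))  # ['lecture.txt', 'slides.ppt']
```

None of these lookups reads the files' contents; every one works only from the index.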
When Organization is Critical
• National Center for Biotechnology Information (NCBI) GenBank:
  – 28 billion DNA base pairs (A, C, G, T)
  – 22 million sequences (possible genes)
This is a lot of data to manage. At NCBI it has been indexed with many kinds of metadata and integrated with information from scientific publications, so the overall enterprise is larger still.
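A rough sense of the scale, assuming each base is stored either as one text character (one byte) or packed into two bits, since four symbols (A, C, G, T) need only 2 bits each. The `pack` helper is hypothetical, and real sequence files also contain other symbols (such as N for an unknown base) that this 2-bit scheme cannot represent.

```python
# 2 bits per base: four symbols need only log2(4) = 2 bits.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq: str) -> bytes:
    """Pack A/C/G/T into 4 bases per byte; a short final group is padded with zero bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        group = seq[i:i + 4]
        byte = 0
        for base in group:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(group))  # left-align a partial group
        out.append(byte)
    return bytes(out)

BASES = 28_000_000_000   # base pairs in GenBank, per the slide
as_text = BASES          # one byte (one character) per base
packed = BASES * 2 // 8  # two bits per base

print(pack("GATTACA").hex())        # 8f10
print(as_text / 1e9, packed / 1e9)  # 28.0 7.0 (gigabytes)
```

So the raw sequences alone come to about 28 GB as text, or about 7 GB packed, before any of the metadata and publication links are added.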
Other Similar Applications
• NASA Mars and other missions
  – http://photojournal.jpl.nasa.gov/index.html
• The National Virtual Observatory
  – http://www.us-vo.org/
• Centers for Disease Control
  – http://www.cdc.gov/
• Homeland Security
Data and Metadata
Data: any object of interest that can be characterized and encoded in digital form
Metadata: data about data, that is, data used to help index and locate data of interest in some application
Data Organization Schemes
• Hierarchical
  – Data organized into object hierarchies for easy access
  – Metadata is in the tree structure of the hierarchies
  – XML databases
• Network
  – Objects link to some selected other objects
  – Metadata is embedded in the data
  – The World-Wide Web
• Relational
  – Data organized into relations
  – Metadata is in the structure of the relations
  – Most Database Management Systems (DBMSs)
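The first two schemes can be sketched side by side. The hierarchical part uses Python's standard `xml.etree.ElementTree` on a small made-up catalog, where element names and nesting carry the metadata; the network part is a hypothetical set of web pages, each listing links to a few others. The page names and the `reachable` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hierarchical: element names and nesting carry the metadata.
catalog = ET.fromstring("""
<catalog>
  <movie title="The Hours"><actor>Meryl Streep</actor></movie>
  <movie title="Dead Man"><actor>Johnny Depp</actor></movie>
</catalog>
""".strip())

titles = [m.get("title") for m in catalog.findall("movie")]
depp_titles = [m.get("title") for m in catalog.findall("movie")
               if m.findtext("actor") == "Johnny Depp"]

# Network: each object lists links to a few selected others, as web pages do.
# Page names are hypothetical; note the cycle back to index.html.
links = {
    "index.html": ["movies.html", "actors.html"],
    "movies.html": ["the-hours.html"],
    "actors.html": ["the-hours.html", "index.html"],
    "the-hours.html": [],
}

def reachable(start: str, graph: dict[str, list[str]]) -> set[str]:
    """Follow links depth-first and return every page reachable from `start`."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page not in seen:
            seen.add(page)
            stack.extend(graph[page])
    return seen

print(titles)       # ['The Hours', 'Dead Man']
print(depp_titles)  # ['Dead Man']
print(sorted(reachable("movies.html", links)))  # ['movies.html', 'the-hours.html']
```

In the tree every item has exactly one place, so a path finds it; in the network there is no single path, so finding data means following links.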
Relations
Actor          | Movie             | Date
---------------+-------------------+------------
Meryl Streep   | The Hours         | Summer 2003
Johnny Depp    | Dead Man          | Summer 1994
Meg Ryan       | Against the Ropes | Winter 2004
...            | ...               | ...
Metadata: the column headings (Actor, Movie, Date)
Data: the rows of values
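The same relation can be sketched with Python's standard `sqlite3` module, using an in-memory database. The table name `appearance` is a hypothetical choice; the rows are the ones shown above. The column names (the metadata) come from the table's structure, and the rows (the data) are found by querying on one attribute.

```python
import sqlite3

# In-memory database; the table name "appearance" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appearance (actor TEXT, movie TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO appearance VALUES (?, ?, ?)",
    [
        ("Meryl Streep", "The Hours", "Summer 2003"),
        ("Johnny Depp", "Dead Man", "Summer 1994"),
        ("Meg Ryan", "Against the Ropes", "Winter 2004"),
    ],
)

# Metadata: the structure of the relation (its column names).
columns = [info[1] for info in conn.execute("PRAGMA table_info(appearance)")]

# Data: the rows, located by querying on one attribute.
movie = conn.execute(
    "SELECT movie FROM appearance WHERE actor = ?", ("Johnny Depp",)
).fetchone()[0]

print(columns)  # ['actor', 'movie', 'date']
print(movie)    # Dead Man
conn.close()
```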