Harvard University CSCI E-2a: Life, Liberty, and Happiness After the Digital Explosion, 3A: Data Representation

Posted on 18-Dec-2015

TRANSCRIPT

Page 1

Harvard University CSCI E-2a

Life, Liberty, and Happiness After the Digital Explosion

3A: Data Representation

Page 2

Representation

• How do you represent “things” with bits?
  • Anything
  • Text
  • Documents
  • Pictures
  • Sounds

Page 3

Representation

• How do you represent “things” with bits?
• Why does representation matter?

Power

Money

Page 4

Bits (“Binary digITs”)

• There are two bits: 0 and 1
• Everything else is a sequence of bits
  • i.e., a “bit string”
  • 0010101, 111100010100101011
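A bit string carries no meaning until something interprets it. As a minimal sketch, assuming we read each example string as an unsigned binary number (one interpretation among many), Python's built-in `int(s, 2)` does the conversion; the helper names `is_bit_string` and `as_unsigned` are hypothetical:

```python
def is_bit_string(s: str) -> bool:
    """True if s is a non-empty sequence of the characters '0' and '1'."""
    return len(s) > 0 and set(s) <= {"0", "1"}

def as_unsigned(s: str) -> int:
    """Interpret a bit string as an unsigned binary number, most significant bit first."""
    if not is_bit_string(s):
        raise ValueError(f"not a bit string: {s!r}")
    return int(s, 2)

for bits in ["0010101", "111100010100101011"]:
    # Print the string, its length in bits, and its value as a number.
    print(bits, len(bits), as_unsigned(bits))
# 0010101 7 21
# 111100010100101011 18 247083
```

Read as text, a picture, or a sound, the same bits would mean something else entirely.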

Page 5

Digital representations are, by definition, approximations.

They leave out a lot.
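One concrete instance, chosen here as an illustration rather than taken from the slides: binary floating-point numbers cannot store the decimal value 0.1 exactly, so the computer keeps the nearest value it can represent. Python's standard `decimal` module can display the value that is actually stored:

```python
from decimal import Decimal

# Decimal(0.1) shows the exact binary double that the literal 0.1 is rounded to.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The tiny rounding errors can surface in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004
```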

Page 7

Representing Things

Harry

Ken

Tyler

Sue

Xing

Page 8

Representing Things

Harry 1

Ken 2

Tyler 3

Sue 4

Xing 5

Page 9

Representing Things

Name   Number  1 bit
Harry  1       0
Ken    2       1
Tyler  3       ?
Sue    4       -
Xing   5       -

Page 10

Representing Things

Name   Number  1 bit  2 bits
Harry  1       0      00
Ken    2       1      01
Tyler  3       ?      10
Sue    4       -      11
Xing   5       -      ??

Page 11

Representing Things

Name   Number  1 bit  2 bits  3 bits
Harry  1       0      00      000
Ken    2       1      01      001
Tyler  3       ?      10      010
Sue    4       -      11      011
Xing   5       -      ??      100
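The progression on these slides (1 bit runs out after two names, 2 bits after four, 3 bits cover all five) can be sketched by handing out fixed-width binary codes in list order. The function `assign_codes` is hypothetical, and the sketch assumes codes are assigned in order starting from 0:

```python
def assign_codes(items: list[str], width: int) -> dict[str, str]:
    """Give each item a distinct width-bit code, in order; raise if there are too few codes."""
    capacity = 2 ** width  # number of distinct bit patterns of this width
    if len(items) > capacity:
        raise ValueError(f"{width} bit(s) give only {capacity} codes for {len(items)} items")
    # format(i, "03b") writes i in binary, zero-padded to 3 digits, and so on.
    return {item: format(i, f"0{width}b") for i, item in enumerate(items)}

names = ["Harry", "Ken", "Tyler", "Sue", "Xing"]
for width in (1, 2, 3):
    try:
        print(width, assign_codes(names, width))
    except ValueError as err:
        print(width, err)
```

With 3 bits the result matches the last column of the table: Harry gets 000 and Xing gets 100, leaving three patterns (101, 110, 111) unused.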

Page 12

How many bits for “n” things

• Each bit doubles the number
  • 1 bit = 2, 2 bits = 4, 3 bits = 8
• N bits = 2^N things
  • 10 bits = 1024, etc.

Another example of exponential growth
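Running the doubling rule backwards gives the number of bits needed for n things: the smallest N with 2^N ≥ n, i.e. ceil(log2 n). A minimal sketch using Python's exact integer method `int.bit_length()`, which avoids the floating-point rounding of `math.log2`; treating n = 1 as needing 0 bits is an assumption of this sketch:

```python
def bits_needed(n: int) -> int:
    """Smallest N such that 2**N >= n, for n >= 1."""
    if n < 1:
        raise ValueError("need at least one thing")
    # (n - 1).bit_length() counts the binary digits of n - 1,
    # which equals ceil(log2(n)) for every n >= 1.
    return (n - 1).bit_length()

for n in (2, 4, 5, 8, 1024, 1025):
    print(n, bits_needed(n))
# 2 1
# 4 2
# 5 3
# 8 3
# 1024 10
# 1025 11
```

The five names on the previous slides need 3 bits, and one thing more than 1024 already forces an eleventh bit.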

Page 13

How many bits does it take to represent the 2007 Red Sox season?

Page 14

Red Sox 2007

• 162 games = 162 bits, one per game (96 “1”s for wins and 66 “0”s for losses)
• Add 4 bits / game for opponent
• Add inning results
• Add at-bat results
• Add pitch details
• Stop when you’ve had enough
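The storage arithmetic can be checked with a small sketch. It assumes a win is stored as 1 and a loss as 0 and packs the bits eight to a byte. The game-by-game order used here is a placeholder (all wins, then all losses), not the actual 2007 schedule, and `pack_bits` is a hypothetical helper:

```python
import math

GAMES = 162
WINS, LOSSES = 96, 66

def pack_bits(bits: list[int]) -> bytes:
    """Pack 0/1 values into bytes, first bit most significant.

    The result is padded with leading zero bits up to a whole number of bytes.
    """
    value = 0
    for b in bits:
        value = (value << 1) | b  # shift left, append the next bit
    return value.to_bytes(math.ceil(len(bits) / 8), "big")

# Placeholder ordering: only the counts (96 wins, 66 losses) come from the slide.
season = [1] * WINS + [0] * LOSSES
packed = pack_bits(season)

print(len(season))  # 162 bits
print(sum(season))  # 96 ones
print(len(packed))  # 21 bytes (162 / 8 = 20.25, rounded up)

# Adding a 4-bit opponent code per game (room for up to 16 distinct opponents):
print(GAMES * (1 + 4))  # 810 bits
```

Each further level of detail (innings, at-bats, pitches) multiplies the bit count again.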

Page 15

Representing Text

• 8 bits per character
  • “A” = 01000001
  • “(” = 00101000
• How many combinations of 8 bits?

2 · 2 · 2 · 2 · 2 · 2 · 2 · 2 = 2^8 = 256
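The two example codes can be checked with a short sketch. It assumes ASCII text, where Python's `str.encode("ascii")` yields exactly one byte per character; `to_bits` is a hypothetical helper name:

```python
def to_bits(text: str) -> list[str]:
    """Return the 8-bit binary code of each character (ASCII characters only)."""
    # encode("ascii") raises UnicodeEncodeError for characters outside ASCII.
    return [format(b, "08b") for b in text.encode("ascii")]

print(to_bits("A("))  # ['01000001', '00101000']
print(2 ** 8)         # 256 possible 8-bit patterns
```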

Page 16

Hexadecimal Digits

0000 0001 0010 0011 0100 0101 0110 0111

0 1 2 3 4 5 6 7

1000 1001 1010 1011 1100 1101 1110 1111

8 9 A B C D E F
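Because each hex digit stands for exactly four bits, converting a bit string to hex is a matter of cutting it into groups of four and looking each group up in the table above. A minimal sketch (the name `bits_to_hex` is hypothetical; it assumes short strings are padded with leading zeros):

```python
HEX_DIGITS = "0123456789ABCDEF"

def bits_to_hex(bits: str) -> str:
    """Convert a bit string to hex, four bits per hex digit."""
    # Pad on the left so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))

print(bits_to_hex("01000001"))            # 41  (the code for "A")
print(bits_to_hex("00101000"))            # 28  (the code for "(")
print(bits_to_hex("111100010100101011"))  # 3C52B
```

Hex is simply a shorter way for people to write bit strings; the computer still stores the bits.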

Page 17

xy   0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0    (control characters)
1    (control characters)
2    sp  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   del

ASCII: American Standard Code for Information Interchange
Character represented by hex xy (row x, column y), e.g. 4B is “K”
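Looking up a character by its hex code xy, or going the other way, can be sketched with Python's built-ins `chr`, `ord`, and `int(..., 16)`. The helper names are hypothetical, and the range check assumes the 7-bit table above (codes 00 to 7F):

```python
def hex_to_char(xy: str) -> str:
    """Return the ASCII character for a two-digit hex code xy (row x, column y)."""
    code = int(xy, 16)
    if not 0 <= code <= 0x7F:
        raise ValueError(f"{xy} is outside the 7-bit ASCII range")
    return chr(code)

def char_to_hex(ch: str) -> str:
    """Return the two-digit uppercase hex code of an ASCII character."""
    return format(ord(ch), "02X")

print(hex_to_char("4B"))         # K
print(char_to_hex("K"))          # 4B
print(hex_to_char("20") == " ")  # True: 20 is the space ("sp") in row 2
```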

Page 18

ASCII Underneath

• Emails
• Web pages
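To see the ASCII underneath, a sketch can encode a tiny hypothetical web page and print the bytes that would actually be stored or sent. UTF-8, the encoding most web pages use today, produces the same single bytes for these ASCII characters:

```python
# A hypothetical one-line web page; every character is plain ASCII.
page = "<p>Hi</p>"

raw = page.encode("ascii")          # the bytes actually stored or transmitted
print(raw.hex(" "))                 # 3c 70 3e 48 69 3c 2f 70 3e
print(raw.decode("ascii") == page)  # True: decoding recovers the text
print(page.encode("utf-8") == raw)  # True: UTF-8 agrees with ASCII here
```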

Page 19

What if you need more than 256 characters?

• Unicode
  • Up to 32 bits per character (32 bits allow roughly 4 billion combinations, though Unicode’s code space holds about 1.1 million code points)
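A sketch of how real Unicode encodings spend those bits, using Python's built-in codecs: UTF-32 uses a fixed 32 bits for every character, while UTF-8 uses 8 to 32 bits depending on the character. The sample characters are arbitrary choices:

```python
samples = ["A", "é", "€", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    # "utf-32-be" fixes the byte order, so no byte-order mark is prepended.
    utf32 = ch.encode("utf-32-be")
    print(f"U+{ord(ch):04X}  utf-8: {len(utf8)} byte(s)  utf-32: {len(utf32)} bytes")
# U+0041  utf-8: 1 byte(s)  utf-32: 4 bytes
# U+00E9  utf-8: 2 byte(s)  utf-32: 4 bytes
# U+20AC  utf-8: 3 byte(s)  utf-32: 4 bytes
# U+1F600  utf-8: 4 byte(s)  utf-32: 4 bytes

print(0x10FFFF + 1)  # 1114112 code points in Unicode's code space
```

The trade-off: UTF-32 is simple to index, while UTF-8 keeps plain ASCII text at one byte per character.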

Page 23

What about documents?

Representation + Interpretation
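The same bits can mean different things depending on who interprets them. A minimal sketch with three hypothetical bytes, read three ways; the pixel-brightness reading is only an illustrative assumption:

```python
# Three hypothetical bytes; the bits alone do not say what they mean.
data = bytes([0x48, 0x69, 0x21])

# Interpretation 1: ASCII text.
as_text = data.decode("ascii")
# Interpretation 2: one unsigned 24-bit integer, most significant byte first.
as_int = int.from_bytes(data, "big")
# Interpretation 3: three separate numbers from 0 to 255 (say, pixel brightness values).
as_numbers = list(data)

print(as_text)     # Hi!
print(as_int)      # 4745505
print(as_numbers)  # [72, 105, 33]
```

A document format is, in effect, an agreed-upon interpretation; software that does not know it sees only bits.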

Page 24

Word Processors

602PC Suite, AppleWorks, Applix Word, Atlantis Ocean Mind, EasyWord, FrameMaker, Han/Gul, Lotus Word Pro, Mellel, Microsoft Word, Nisus Writer, Pages, Papyrus, PolyEdit, StarOffice, TextMaker, WordExpress, WordPerfect, Hieroglyph, Jarte, Madhyam, Amí, AtariWriter, Bravo, Bank Street Writer, DeskMate, DisplayWrite, Document Editor, EasyWriter, FullWrite Professional, geoWrite, Gypsy, lexicon, LocoScript, MacWrite, Magic Wand, MindWrite, MultiMate, PaperClip, pfs:Write, Protext, SpeedScript, Sprint, Taste, TJ-2, Transwrite, WordMARC, WordStar, Wordsworth, WriteNow, XyWrite

Page 25

Why not pick one?

Page 28

Why representation matters

• Loss of data and the inability to exchange it are the primary deterrents to switching vendors.
• Control the representation and you control what can be seen and what can be done.