Lecture # 30 Data Organization and Binary Search

Upload: kaye-wright

Post on 15-Mar-2016

Data Organization

Problem

• Huge amounts of information

• How do I find
  – Information that I know I want
  – Information related to what I want

• How do I understand
  – Particular pieces of information
  – The whole collection of information

Limitations

• Screen space

• Network bandwidth
  – Bandwidth - how much information can be transmitted per second

• Human attention

Kinds of things to organize

• Menu items
  – MS Word - about 150 menu items

• Text
  – Pages in a book - 500
  – Documents on the WWW - gazillions

• Images
  – All of the pictures created in a commercial advertising company

Kinds of things to organize (continued)

• Sounds
  – Sound tracks to all TV and Radio news broadcasts

• Video
  – A complete collection of classic movies

• Structured information (records)
  – People
  – Cars
  – Students
  – Electronic appliance parts

A question of scale

• 10 things

• 100 things - menu

• 1,000 things - files on your computer

• 10,000 things - students at a university

• 1,000,000 things - books in a library

• gazillion things - WWW pages

Three ways to find things

• Lists
  – arrays

• Trees
  – organize into categories

• Search
  – describe what you want and have the computer find it

The Phone Book Challenge

• How long will it take to find “Bill Lund” in the BYU Directory?

• How long will it take to find “422-8766” in the BYU Directory?

What Algorithm did you use to search the phone book?

• Where did you start?

• How many steps did it take?

• Is there a more efficient way?

Binary search - for “Goodrich”

• Step 1: Lower = 0, Upper = 10, Guess = (0+10)/2 = 5

• Step 2: Lower = 0, Upper = 5, Guess = (0+5)/2 = 2 (rounded down)

• Step 3: Lower = 2, Upper = 5, Guess = (2+5)/2 = 3 (rounded down)

• Step 4: Lower = 3, Upper = 5, Guess = (3+5)/2 = 4
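
The Lower/Upper/Guess steps can be sketched in Python roughly as follows. This is a minimal sketch, not the lecture's own code: the `directory` list is hypothetical, and the sketch moves `lower` to `guess + 1` (rather than to `guess`, as in the slide trace) so that the loop is guaranteed to stop.

```python
def binary_search(names, target):
    """Return the index of target in the sorted list names, or -1 if absent."""
    lower = 0            # first index still in play
    upper = len(names)   # one past the last index still in play
    while lower < upper:
        guess = (lower + upper) // 2   # integer midpoint, rounded down
        if names[guess] == target:
            return guess
        if target < names[guess]:
            upper = guess        # target sorts before the guess
        else:
            lower = guess + 1    # target sorts after the guess
    return -1                    # range is empty: target is not in the list


# Hypothetical directory, already sorted by last name.
directory = ["Anderson", "Bilinski", "Clark", "Garcia", "Goodrich",
             "Hansen", "Jensen", "Lund", "Nelson", "Olsen", "Young"]
print(binary_search(directory, "Goodrich"))
```

The list must already be sorted. On an unsorted list the comparisons would send the search into the wrong half.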

Binary search

• If there are 64 things in a list, how many times can you divide that list in half?
  – 32, 16, 8, 4, 2, 1

• 6 times

Binary search

• If there are 1024 things in a list, how many times can you divide that list in half?
  – 512, 256, 128, 64, 32, 16, 8, 4, 2, 1

• 10 times

Binary search

• If the size of the list doubles, how many more steps are required in a binary search?
  – 1 more step

Binary search

• If there are N items in a list, then binary search takes about log2(N) steps
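
A small sketch that counts the halvings directly may make the log2(N) claim concrete. The function name `halvings` is made up for this illustration.

```python
def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


# Each doubling of the list size adds exactly one more halving.
for size in (64, 128, 1024, 2048):
    print(size, halvings(size))
```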

Binary search

• Estimating log2(N)
  – For a rough estimate, count the number of digits and multiply by 2.5

• 1000
  – 4*2.5 = 10 steps

• 1,000,000
  – 7*2.5 = 17-18 steps

• 1,000,000,000
  – 10*2.5 = 25 steps
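
The digit rule can be compared with the exact logarithm using a short sketch. The helper `digit_estimate` is a hypothetical name. The rule is close for 1000 but undershoots for larger N: log2(1,000,000) is about 20 and log2(1,000,000,000) is about 30.

```python
import math


def digit_estimate(n):
    """Rule of thumb from the slide: number of decimal digits times 2.5."""
    return len(str(n)) * 2.5


# Print N, the rule-of-thumb estimate, and the exact log2(N) side by side.
for n in (1000, 1_000_000, 1_000_000_000):
    print(n, digit_estimate(n), round(math.log2(n), 1))
```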

Provo/Orem phone book

• How long to find “Bill Lund”?
  – About 5,000 entries in the BYU Directory
  – log2(5000) is roughly 4*2.5 = 10 steps

How to find a phone number

• 920-3231
  – 1 step

• 130-2313
  – 11 steps

• Average?
  – 5 steps

• Average N?
  – N/2
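
Searching an unsorted list means scanning it entry by entry, which can be sketched as below. The `numbers` list is hypothetical; only its first and last entries come from the slide. For an item that is present, the exact average is (N+1)/2 steps, which N/2 approximates.

```python
def linear_search_steps(items, target):
    """Scan items front to back; return how many entries were examined."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return len(items)   # target absent: every entry was examined


# Hypothetical unsorted list of 11 phone numbers.
numbers = ["920-3231", "238-1234", "123-3123", "823-1242", "232-0312",
           "555-0101", "555-0102", "555-0103", "555-0104", "555-0105",
           "130-2313"]

print(linear_search_steps(numbers, "920-3231"))   # first entry
print(linear_search_steps(numbers, "130-2313"))   # last entry

# Average number of steps over every number in the list.
average = sum(linear_search_steps(numbers, n) for n in numbers) / len(numbers)
print(average)
```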

Provo/Orem phone book

• How many steps to find a phone number?
  – 5,000/2 = 2,500 average

• How can we improve this?

Sort the phone book by phone number

• What if I want to search on both name and number?

Using an Index

• Last Name: Anderson, Bilinski, Clark, Garcia

• Phone number: 123-3123, 130-2313, 232-0312, 238-1234

Search for Goodrich (Last Name index)

• Lower = 0, Upper = 10, Guess = 5: Goodrich is lower

• Lower = 0, Upper = 5, Guess = 2: Goodrich is above

• Lower = 2, Upper = 5, Guess = 3: Goodrich is above

• Lower = 3, Upper = 5, Guess = 4: Goodrich is above

Search for 823-1242 (Phone number index)

• Lower = 0, Upper = 10, Guess = 5: 823-1242 is above

• Lower = 5, Upper = 10, Guess = 7: 823-1242 is below

• Lower = 5, Upper = 7, Guess = 6: MATCH

Using an Index: Last Name, Phone number

• What about first name or city?
  – another index
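
One way to picture several indices over the same data is sketched below. This is an illustration under assumptions, not the lecture's implementation: the pairing of names with numbers in `records` is invented, and `build_index` and `lookup` are hypothetical helpers. Each index is a sorted list of (key, record position) pairs, searched with the standard-library `bisect` module.

```python
import bisect

# Hypothetical records, stored in no particular order.
records = [
    ("Garcia", "123-3123"),
    ("Anderson", "238-1234"),
    ("Clark", "130-2313"),
    ("Bilinski", "232-0312"),
]


def build_index(records, field):
    """Return (key, record position) pairs sorted by the chosen field."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, records, key):
    """Binary-search the index for key; return the matching record or None."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None


by_name = build_index(records, 0)     # index on last name
by_number = build_index(records, 1)   # index on phone number

print(lookup(by_name, records, "Clark"))
print(lookup(by_number, records, "232-0312"))
```

The records themselves are never reordered. Adding a first-name or city index would mean adding a field to each record and calling `build_index` once more.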

Data Organization Summary

• What are we organizing for?

• Scale
  – 10 - 1,000 - 1,000,000 - 1,000,000,000

• Lists
  – Unsorted (N/2)
  – Sorted log2(N)
    • count the digits and multiply by 2.5

• To access in many ways
  – Use many indices into the same data