cs 106b, lecture 27 advanced hashing - stanford university

CS 106B, Lecture 27 Advanced Hashing This document is copyright (C) Stanford Computer Science and Ashley Taylor, licensed under Creative Commons Attribution 2.5 License. All rights reserved. Based on slides created by Marty Stepp, Chris Gregg, Keith Schwarz, Julie Zelenski, Jerry Cain, Eric Roberts, Mehran Sahami, Stuart Reges, Cynthia Lee, and others


Page 1: CS 106B, Lecture 27 Advanced Hashing - Stanford University

This document is copyright (C) Stanford Computer Science and Marty Stepp, licensed under Creative Commons Attribution 2.5 License. All rights reserved. Based on slides created by Keith Schwarz, Julie Zelenski, Jerry Cain, Eric Roberts, Mehran Sahami, Stuart Reges, Cynthia Lee, and others.



Plan for Today
• Discuss how HashMaps differ from HashSets
• Another implementation for HashSet/Map: Cuckoo Hashing!
• Discuss qualities of a good hash function.
• Learn about another application for hashing: cryptography.


Hash map (15.4)

• A hash map is like a set where the nodes store key/value pairs:

    // key (ID)      value (name)
    map.put(51234562, "Ashley");
    map.put(62756179, "Amy");
    map.put(54727849, "Marty");
    map.put(46281955, "Seth");

– Must modify the HashNode class to store a key and a value

[Figure: a hash map with buckets at indices 0-9 holding the pairs 62756179/Amy, 46281955/Seth, 51234562/Ashley, and 54727849/Marty.]


Hash map vs. hash set
– The hashing is always done on the keys, not the values.
– The contains function is now containsKey; there and in remove, you search for a node whose key matches a given key.
– The add method is now put; if the given key is already there, you must replace its old value with the new one.

    map.put(54727849, "Chris");   // replace Marty with Chris

[Figure: the same hash map after the put; key 54727849 now maps to Chris instead of Marty.]
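A minimal C++ sketch of the node change and the put/containsKey behavior described above. It uses std::vector and raw linked nodes rather than the Stanford library; the class name IntStringMap, the 10-bucket default, and get returning "" for a missing key are illustrative assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a chained hash map from int keys to string values (not the Stanford
// library's HashMap). Each bucket holds a linked list of key/value nodes.
class IntStringMap {
public:
    explicit IntStringMap(std::size_t capacity = 10) : buckets(capacity, nullptr) {}
    ~IntStringMap() {
        for (HashNode* node : buckets) {
            while (node != nullptr) {
                HashNode* next = node->next;
                delete node;
                node = next;
            }
        }
    }
    IntStringMap(const IntStringMap&) = delete;             // copying not supported in this sketch
    IntStringMap& operator=(const IntStringMap&) = delete;

    // put (the map's "add"): if the key is already present, replace its old value.
    void put(int key, const std::string& value) {
        if (HashNode* node = find(key)) {
            node->value = value;
            return;
        }
        std::size_t index = indexFor(key);
        buckets[index] = new HashNode{key, value, buckets[index]};   // prepend to the chain
    }

    // containsKey (the map's "contains"): search for a node whose key matches.
    bool containsKey(int key) const { return find(key) != nullptr; }

    // get: the value stored for key, or "" if the key is absent (a simplification).
    std::string get(int key) const {
        const HashNode* node = find(key);
        return node != nullptr ? node->value : "";
    }

private:
    struct HashNode {
        int key;            // only the key is hashed...
        std::string value;  // ...the value just rides along with it
        HashNode* next;
    };
    std::vector<HashNode*> buckets;

    // Bucket index = abs(key) % capacity, widened so abs(INT_MIN) cannot overflow.
    std::size_t indexFor(int key) const {
        long long k = key;
        return static_cast<std::size_t>(k < 0 ? -k : k) % buckets.size();
    }

    HashNode* find(int key) const {
        for (HashNode* node = buckets[indexFor(key)]; node != nullptr; node = node->next) {
            if (node->key == key) return node;
        }
        return nullptr;
    }
};
```

With the slide's keys and 10 buckets, 62756179 (Amy) and 54727849 (Marty) both land in bucket 9, so containsKey and put must compare keys along the chain rather than stop at the first node.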


Another Way to Hash
• Fun (but soon to be relevant) fact: cuckoo birds lay their eggs in other birds' nests

(Image source: Wikimedia)


Cuckoo Hashing
• What if we made contains really fast (look at at most two elements, no matter what)?
• Idea: have two arrays that store elements, where each array has its own hash function
• Try hashing the element into both arrays, and put it in an empty space
• If no space is empty, kick out one of the existing elements and move it to the other array.
• Contains just checks the corresponding spot in both arrays
• Slower add, but faster contains
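The steps above can be sketched in C++ as a set of ints. This is a minimal sketch, not the lecture's implementation: it assumes non-negative keys, generalizes the slides' hash functions 3x % 4 and (2x + 1) % 4 to a table size n, treats 16 consecutive kicks as a sign of a cycle (an arbitrary cap), and on failure doubles both arrays and reinserts everything. CuckooSet, place, and rehashWith are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Sketch of a cuckoo hash set of non-negative ints. Assumed details (not from the
// slides): hash functions 3x % n and (2x + 1) % n, i.e. the slides' "% 4"
// generalized to table size n; a cap of 16 kicks; doubling n when a rehash is needed.
class CuckooSet {
public:
    explicit CuckooSet(std::size_t n = 4) : left(n), right(n) {}

    // contains looks at exactly two slots, however many elements are stored.
    bool contains(int x) const {
        return left[h1(x)] == x || right[h2(x)] == x;
    }

    void add(int x) {
        if (contains(x)) return;
        std::optional<int> homeless = place(x);
        if (homeless) rehashWith(*homeless);
    }

private:
    std::vector<std::optional<int>> left, right;   // empty slot == std::nullopt
    static constexpr int kMaxKicks = 16;

    std::size_t h1(int x) const { return static_cast<std::size_t>(3LL * x) % left.size(); }
    std::size_t h2(int x) const { return static_cast<std::size_t>(2LL * x + 1) % right.size(); }

    // Try both arrays; if both slots are full, kick out the occupant of the first
    // array's slot and move it to its slot in the other array, and so on.
    // Returns the element still without a slot after kMaxKicks, or nullopt on success.
    std::optional<int> place(int x) {
        if (!left[h1(x)]) { left[h1(x)] = x; return std::nullopt; }
        if (!right[h2(x)]) { right[h2(x)] = x; return std::nullopt; }
        int current = x;
        bool intoLeft = true;
        for (int kick = 0; kick < kMaxKicks; kick++) {
            std::optional<int>& slot = intoLeft ? left[h1(current)] : right[h2(current)];
            if (!slot) { slot = current; return std::nullopt; }
            std::swap(current, *slot);   // current takes the slot; the evictee is now homeless
            intoLeft = !intoLeft;        // the evictee tries its slot in the other array
        }
        return current;   // probably stuck in a cycle
    }

    // Double both arrays and reinsert every element (plus the homeless one),
    // doubling again if some element still cannot be placed.
    void rehashWith(int homeless) {
        std::vector<int> all{homeless};
        for (const auto& slot : left)  if (slot) all.push_back(*slot);
        for (const auto& slot : right) if (slot) all.push_back(*slot);
        bool placedAll = false;
        std::size_t n = left.size();
        while (!placedAll) {
            n *= 2;
            left.assign(n, std::nullopt);
            right.assign(n, std::nullopt);
            placedAll = true;
            for (int v : all) {
                if (place(v)) { placedAll = false; break; }
            }
        }
    }
};
```

For instance, after adding 3, 6, 5, and 7 to a 4-slot set, adding 11 forces a rehash: 3, 7, and 11 all map to slot 1 of the first array and slot 3 of the second, so three elements compete for two slots and the kicks go around in a cycle. That is one concrete answer to the later question of when to rehash.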


Cuckoo Hashing (worked example)

[Figure: two 4-slot arrays, the first using hash function 3x % 4 and the second using (2x + 1) % 4. Successive slides insert 3, 6, 5, and 7. Inserting 7 spans three slides as existing elements are kicked out and moved to the other array. The last slide searches for 7 by looking in both arrays.]
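Computing both candidate slots for each key in the example (plain arithmetic from the two hash functions above) shows where the conflicts come from: 7 shares its first-array slot with 3 and its second-array slot with 5. A side observation: 2x + 1 is always odd, so (2x + 1) % 4 can only be 1 or 3, and slots 0 and 2 of the second array are never used.

```latex
\begin{aligned}
x = 3:&\quad 3x \bmod 4 = 9 \bmod 4 = 1, &&(2x+1) \bmod 4 = 7 \bmod 4 = 3\\
x = 6:&\quad 18 \bmod 4 = 2, &&13 \bmod 4 = 1\\
x = 5:&\quad 15 \bmod 4 = 3, &&11 \bmod 4 = 3\\
x = 7:&\quad 21 \bmod 4 = 1, &&15 \bmod 4 = 3
\end{aligned}
```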


Cuckoo Hashing
• What are the advantages or disadvantages of cuckoo hashing versus resolving collisions through chaining?
• What do we need to watch out for? When should we rehash?


Announcements
• Calligraphy announcements
– Should start the 3rd part today or tomorrow at the latest
– Starter code and Windows: please redownload
– No late days may be used, no late submissions accepted
• Last class tomorrow: go to poll.ly/#/LdVNgWyo/G6z0awRv
• Final is on Saturday, at 8:30 AM, in Cubberley Auditorium
– Everything from the course through today is fair game; emphasis is on second-half material (starting with pointers)
– More information: https://web.stanford.edu/class/cs106b/exams/final.html
– Practice exam is online (not guaranteed to match in format, etc.)
– Wednesday and Thursday will be final review
• Please give us feedback! cs198.stanford.edu


Hashing strings
• It is easy to hash an integer i (use index abs(i) % length).
– How can we hash other types of values (such as strings)?
• If we could convert strings into integers, we could hash them.
– What kind of integer is appropriate for a given string?
– Does it matter what integer we choose? What should it be based on?

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'


hashCode consistency
• A valid hashCode function must be consistent (must produce the same result on each call):

    hashCode(x) == hashCode(x), if x's state doesn't change


hashCode and equality
• A valid hashCode function must be consistent with equality.

    a == b must imply that hashCode(a) == hashCode(b).

    Vector<int> v1;
    Vector<int> v2;
    v1.add(1);
    v2.add(3);
    v1.add(3);
    v2.insert(0, 1);
    // v1 == v2 (both are {1, 3}), so hashCode(v1) == hashCode(v2)

    a != b does NOT necessarily imply that hashCode(a) != hashCode(b) (why not?)
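Both directions of the rule can be checked concretely, with std::vector<int> standing in for the Stanford Vector. vectorHash is a hypothetical helper that combines the elements in order (the multiply-by-31 idea used for strings later in this lecture), so equal vectors always hash alike. There are far more possible vectors than unsigned int values, though, so some unequal vectors must share a hash.

```cpp
#include <vector>

// Hypothetical helper: depends only on the elements, in order, so equal vectors
// always get equal hash codes. Unsigned arithmetic makes wraparound well-defined.
unsigned int vectorHash(const std::vector<int>& v) {
    unsigned int hash = 0;
    for (int x : v) {
        hash = 31 * hash + static_cast<unsigned int>(x);
    }
    return hash;
}
```

Building v2 as {3} and then inserting 1 at the front gives {1, 3}, equal to v1, and both hash to 31 * 1 + 3 = 34. The unequal vector {34} also hashes to 34, which is perfectly legal.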


hashCode distribution
• A good hashCode function is well-distributed.
– For a large set of distinct values, it should generally return unique hash codes rather than often colliding into the same hash bucket.
– This property is desired but not required. Why?


Possible hashCode 1
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #1
        return 42;
    }


Possible hashCode 2
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #2
        return randomInteger(0, 9999999);
    }


Possible hashCode 3
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #3
        return (int) &s;       // address of s (a pointer)
    }
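Hash function #3 can be probed directly. This sketch uses a hypothetical helper, addressHash, that casts to std::uintptr_t, because the slide's (int) &s is rejected by GCC and Clang on typical 64-bit platforms, where a pointer does not fit in an int. Two equal strings stored in different variables get different "hash codes", breaking the rule that a == b must imply hashCode(a) == hashCode(b).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper mirroring hashCode #3: the "hash" is the object's address,
// so it reflects where the string lives, not what it contains.
std::uintptr_t addressHash(const std::string& s) {
    return reinterpret_cast<std::uintptr_t>(&s);
}
```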


Possible hashCode 4
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #4
        return s.length();
    }


Possible hashCode 5
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #5
        if (s.length() > 0) {
            return (int) s[0];   // ASCII of 1st char
        } else {
            return 0;
        }
    }


Possible hashCode 6
• This function sums the characters' ASCII values.
– Is it valid? Is it good?
– What will collide?

    int hashCode(string s) {   // #6
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash += (int) s[i];   // ASCII of char
        }
        return hash;
    }
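A direct C++ transcription of #6 (taking const std::string&; the name sumHash is ours) makes the collisions easy to see. Every anagram collides, and so do unrelated strings whose character codes happen to add up to the same total. The sums also cover a narrow range: any 10-letter lowercase string sums to between 970 and 1220, so most buckets of a large table would never be used.

```cpp
#include <string>

// hashCode #6: the sum of the characters' ASCII values.
int sumHash(const std::string& s) {
    int hash = 0;
    for (char c : s) {
        hash += static_cast<int>(c);
    }
    return hash;
}
```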


Measuring collisions
• Hash function = sum of characters of string.
• Add 50,000,000 article titles to a hash map with 50,000 buckets:


Idea: Weighted sum

$$\text{hash} = s[0] + s[1] + s[2] + \cdots + s[n]$$

• Instead of adding, let's give each character a weight.
– Multiply it by increasing powers of some prime number; say, 31.
– This helps spread the strings' hash codes over the range of int values.

$$\text{hash} = s[0] + 31\,s[1] + 31^{2}\,s[2] + \cdots + 31^{n}\,s[n]$$
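A two-character check shows what the weights buy. The strings are chosen only for illustration, with ASCII codes a = 97, b = 98, c = 99, d = 100, and the weighted formula is applied with n = 1. Strings that tie under the plain sum, including anagrams, come apart:

```latex
\begin{aligned}
\text{plain sum:}\quad & \texttt{ad} \mapsto 97 + 100 = 197, && \texttt{bc} \mapsto 98 + 99 = 197\\
\text{plain sum:}\quad & \texttt{ab} \mapsto 97 + 98 = 195, && \texttt{ba} \mapsto 98 + 97 = 195\\
\text{weighted:}\quad & \texttt{ad} \mapsto 97 + 31 \cdot 100 = 3197, && \texttt{bc} \mapsto 98 + 31 \cdot 99 = 3167\\
\text{weighted:}\quad & \texttt{ab} \mapsto 97 + 31 \cdot 98 = 3135, && \texttt{ba} \mapsto 98 + 31 \cdot 97 = 3105
\end{aligned}
```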


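hashCode for strings

    int hashCode(string s) {
        int hash = 5381;
        for (int i = 0; i < (int) s.length(); i++) {
            hash = 31 * hash + (int) s[i];   // can overflow int for long strings (undefined behavior in C++)
        }
        return hash;
    }

– FYI: Java's String.hashCode uses this same multiply-by-31 loop, though it starts from 0 rather than 5381.
– As with any general hashing function, collisions are possible.
• Example: "Ea" and "FB" have the same hash value.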
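Two details of the C++ loop are worth pinning down. Signed int overflow is undefined behavior in C++ (Java defines int arithmetic to wrap), so this sketch accumulates in an unsigned int. The seed is a parameter so the slide's 5381 and Java's 0 can be compared; stringHash is a hypothetical name. The "Ea"/"FB" collision survives either seed, because for strings of equal length the seed contributes the same amount to both hashes.

```cpp
#include <string>

// The multiply-by-31 loop with unsigned (wrapping) arithmetic.
// seed = 5381 matches the slide; seed = 0 matches Java's String.hashCode.
unsigned int stringHash(const std::string& s, unsigned int seed = 5381) {
    unsigned int hash = seed;
    for (unsigned char c : s) {
        hash = 31 * hash + c;
    }
    return hash;
}
```

With seed 0, both strings hash to 69 * 31 + 97 = 70 * 31 + 66 = 2236.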


Measuring collisions
• Hash function = sum of characters of string, multiplying by 31.
• Add 50,000,000 article titles to a hash map with 50,000 buckets:


Hashing structs/objects
• By default you cannot add your own structs/objects to hash sets.
– Our libraries don't know how to hash these objects.

    struct Point {
        int x;
        int y;
        ...
    };

    HashSet<Point> hset;
    Point p{17, 35};
    hset.add(p);   // ERROR: no matching function for call to 'hashCode(const Point&)'


Hashing structs/objects
• To make your own types hashable by our libraries:
– 1) Overload the == operator.
– 2) Write a hashCode function that takes your type as its parameter.
• "Add up" the object's state; scale/multiply parts to distribute the results.

    struct Point {
        int x;
        int y;
        ...
    };

    int hashCode(const Point& p) {
        return 1337 * p.y + 31 * p.x;
    }

    bool operator==(const Point& p1, const Point& p2) {
        return p1.x == p2.x && p1.y == p2.y;
    }
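The same two requirements appear in the C++ standard library, which this sketch uses in place of the Stanford HashSet. std::unordered_set needs an operator== and a hash, supplied here as a function object (PointHash, a hypothetical name) passed as the set's second template argument. The multipliers 1337 and 31 are the slide's; the arithmetic is done in unsigned so large coordinates wrap instead of overflowing.

```cpp
#include <cstddef>
#include <unordered_set>

struct Point {
    int x;
    int y;
};

// 1) operator== lets the set tell whether two points are the same element.
bool operator==(const Point& p1, const Point& p2) {
    return p1.x == p2.x && p1.y == p2.y;
}

// 2) Hash function object: the standard library's counterpart to a hashCode function.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return 1337u * static_cast<unsigned int>(p.y) + 31u * static_cast<unsigned int>(p.x);
    }
};

using PointSet = std::unordered_set<Point, PointHash>;
```

Because the two fields get different multipliers, Point{17, 35} and Point{35, 17} land on different hash codes instead of colliding as they would under a plain x + y.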


Hashing and Passwords
• We want to store a file of user passwords
– When a user types a password, see if it matches our file
• Problem: anyone who can see our file can get all the passwords

User      Password
Ashley    password123
Shreya    traceComics
Seth      ki88leLuv


Hashing and Passwords
• What if we stored a unique code for each password instead of the string?
– Hashing!
• Extra requirements for the hash function:
– Want a large number of possible values (hard to find collisions)
– Can't find the password from the hash (one-way)
– Generally use a different hash function (e.g. SHA-256)
• The need for salting

User      Password
Ashley    17851691385
Marty     63158910316
Amy       90713593110
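A sketch of checking passwords without storing them, with salting. Salting means hashing a per-user random string together with the password, so two users with the same password get different stored hashes, and a precomputed table of hashes for common passwords no longer matches. Loud caveat: std::hash is not a cryptographic hash and stands in here only as a placeholder for a real one-way function; digest, StoredPassword, makeRecord, and checkPassword are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// NOT SECURE: std::hash is not a cryptographic hash. digest() is a hypothetical
// stand-in for a real one-way password-hashing function.
std::size_t digest(const std::string& text) {
    return std::hash<std::string>{}(text);
}

// One row of the password file (hypothetical layout): the password itself is never stored.
struct StoredPassword {
    std::string salt;   // random per user in practice; stored alongside the hash
    std::size_t hash;   // digest(salt + password)
};

StoredPassword makeRecord(const std::string& password, const std::string& salt) {
    return {salt, digest(salt + password)};
}

// Re-hash the attempt with the stored salt and compare hashes, never plaintext.
bool checkPassword(const StoredPassword& record, const std::string& attempt) {
    return digest(record.salt + attempt) == record.hash;
}
```

Storing "password123" for two users with salts "q7Z1" and "k2Lx" should produce different hash fields, while checkPassword still accepts the right password for each user.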


Hashing and Data Integrity
• A common "attack" in cryptography is man-in-the-middle
• How can you ensure that a hacker didn't interfere with the data?
• Get the hash from a trusted source: since hash functions only rarely have collisions, changes to the data will lead to a different hash