Practical Computing with Chaos
TRANSCRIPT
®© 2014 MapR Technologies 1
Ted Dunning June 9, 2015
Practical Computing with Chaos Ted Dunning, Chief Applications Architect MapR Technologies
Email [email protected] [email protected] Twitter @Ted_Dunning
e-book available courtesy of MapR Also at MapR booth
http://bit.ly/1jQ9QuL
A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
Practical Machine Learning series (O’Reilly)
• Machine learning is becoming mainstream
• Need pragmatic approaches that take into account real-world business settings:
  – Time to value
  – Limited resources
  – Availability of data
  – Expertise and cost of team to develop and to maintain system
• Look for approaches with big benefits for the effort expended
Agenda
• Monty Hall
• Randomized geo-coding
• Thompson sampling
  – Bayesian Bandits
  – Targeting
  – Bayesian ranking
• Dithering (sound, signals)
• Synthetic data (preview)
Let’s Start with Trouble
• Monty Hall problem (oops, done)
• Three doors, one with a fabulous prize
• You pick one
• Monty shows you that one of the remaining doors is empty
• You can switch at this point to the other door or not
• Should you switch?
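A simulation is often more persuasive than an argument here. The following is a minimal sketch in Python, assuming the prize is placed uniformly at random and the host always opens an empty door that is not the player's pick; the function names `play` and `win_rate` are illustrative, not from the talk.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens an empty door that is neither the pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=1):
    """Fraction of rounds won when always staying or always switching."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Running both strategies side by side lets a skeptic watch the win rates separate instead of taking the algebra on trust.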
The Real Problem
• Doing the math isn’t too hard
• Convincing somebody you have the right answer is really hard
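For reference, the math is short. This sketch assumes the prize is placed uniformly at random and that the host always opens an empty door from the two you did not pick. Staying wins exactly when the first pick was right, and switching wins exactly when it was wrong:

```latex
\[
P(\text{win} \mid \text{stay}) = P(\text{first pick right}) = \tfrac{1}{3},
\qquad
P(\text{win} \mid \text{switch}) = P(\text{first pick wrong}) = 1 - \tfrac{1}{3} = \tfrac{2}{3}.
\]
```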
Live Coding With REAL Chaos
Geo-coding
Geo-coding
• Some databases have disk locality ⇔ key locality
• The primary key is totally ordered
• Embedding a total ordering of the points in a plane is possible
  – But loses some distance information
  – A line is not a square!
• We want to do proximity searches
  – This gets harder in the polar regions for most codings
Space Filling Curve
[Figures: a square divided into quadrants labeled 0 to 3, with each quadrant subdivided and labeled 0 to 3 again]
Z-coding – Interleave Bits
x = 010
y = 011
geo = 00.11.01
[Figure: 8 × 8 grid with both axes labeled 000 through 111 and cells labeled with bit pairs]
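The interleaving in the example above (x = 010, y = 011, geo = 00.11.01) can be sketched in a few lines of Python. The bit order is an assumption read off that example: the x bit comes first in each pair, most significant pair first. The names `z_encode` and `z_decode` are illustrative.

```python
def z_encode(x, y, bits=3):
    """Interleave the bits of x and y: x bit first, most significant pair first."""
    pairs = []
    for i in range(bits - 1, -1, -1):
        xb = (x >> i) & 1
        yb = (y >> i) & 1
        pairs.append(f"{xb}{yb}")
    return ".".join(pairs)

def z_decode(key):
    """Invert z_encode: split the dotted pairs back into x and y."""
    x = y = 0
    for pair in key.split("."):
        x = (x << 1) | int(pair[0])
        y = (y << 1) | int(pair[1])
    return x, y

print(z_encode(0b010, 0b011))  # 00.11.01
```

Because the key is a fixed-width bit string, sorting keys as strings gives the same order as sorting them as integers, which is what a database with a totally ordered primary key needs.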
Neighbors Often Share Prefix
[Figure: the same 8 × 8 grid with three cells labeled 00.11.11, 10.01.01, and 00.11.01]
Often, not always
[Figure: keys 13, 15, and 37, with gaps labeled Close and Far]
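The numbers 13, 15, and 37 are the three labeled codes 00.11.01, 00.11.11, and 10.01.01 read as binary integers. A short sketch, assuming the x-bit-first interleaving from the Z-coding example, shows that they correspond to three cells side by side in one row: the first two share a long prefix, while the third is an equally close neighbor in space whose key is far away. The helper name `z_value` is illustrative.

```python
def z_value(x, y, bits=3):
    """Z-order key as an integer: x bit then y bit, most significant first."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z

# Three cells next to each other in the same row (assumed coordinates).
cells = [(2, 3), (3, 3), (4, 3)]
keys = [z_value(x, y) for x, y in cells]
print(keys)  # [13, 15, 37]
```

The jump from 15 to 37 happens because stepping from x = 3 to x = 4 flips the top bit of x, so the two cells fall in different top-level quadrants.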
Random Sampling to Derive Keys
[Figure: the same 8 × 8 grid, axes labeled 000 through 111]
"00.01.01" "00.01.10" "00.01.11" "00.11.00" "00.11.01" "00.11.10" "00.11.11" "01.00.10" "01.10.00" "01.10.10"
"00.01.10" - "00.01.11"
"00.11.00" - "00.11.11"
"01.00.10"
"01.10.00" - "01.10.10"
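One way to read these slides: throw random points into the search region, record the key of every grid cell that gets hit, then collapse the sorted keys into ranges that can each be served by one scan. The sketch below is only an illustration of that idea under stated assumptions: the region is a disc, the grid is 8 × 8 with x-bit-first interleaving, and only strictly consecutive keys are merged. The names `z_key`, `sample_keys`, and `to_ranges` are hypothetical.

```python
import random

def z_key(x, y, bits=3):
    """Dotted Z-order key: x bit then y bit, most significant pair first."""
    return ".".join(f"{(x >> i) & 1}{(y >> i) & 1}" for i in range(bits - 1, -1, -1))

def sample_keys(cx, cy, radius, samples=2000, bits=3, seed=7):
    """Collect the keys of grid cells hit by random points inside a disc."""
    rng = random.Random(seed)
    size = 1 << bits
    keys = set()
    while samples > 0:
        # Rejection sampling: draw from the bounding box, keep points in the disc.
        px = rng.uniform(cx - radius, cx + radius)
        py = rng.uniform(cy - radius, cy + radius)
        if (px - cx) ** 2 + (py - cy) ** 2 > radius ** 2:
            continue
        samples -= 1
        if 0 <= px < size and 0 <= py < size:
            keys.add(z_key(int(px), int(py), bits))
    return sorted(keys)

def to_ranges(keys):
    """Merge keys that are consecutive in Z-order into (first, last) ranges."""
    ranges = []
    for key in keys:
        value = int(key.replace(".", ""), 2)
        if ranges and value == ranges[-1][2] + 1:
            ranges[-1] = (ranges[-1][0], key, value)
        else:
            ranges.append((key, key, value))
    return [(first, last) for first, last, _ in ranges]

for first, last in to_ranges(sample_keys(3.0, 3.0, 1.5)):
    print(first if first == last else f"{first} - {last}")
```

Sampling sidesteps the geometry of deciding exactly which cells a region overlaps; more samples make it less likely that a thinly covered cell is missed. A production version might also bridge small gaps between ranges to reduce the number of scans, which this sketch does not attempt.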
Dithering
• 4 bit sine wave (listen for artifacts as volume decreases)
• White dithering (artifacts gone, we hear through the noise)
• Noise shaping (noise is easier to hear through)
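The effect of white dither can be reproduced numerically without any audio. The sketch below is an illustration under assumptions not taken from the talk: a sine wave quieter than half a quantization step, a quantizer that rounds to the nearest integer, and uniform dither one step wide. It does not cover noise shaping.

```python
import math
import random

def quantize(v):
    """Round to the nearest integer level (a very coarse quantizer)."""
    return math.floor(v + 0.5)

def rms_error(signal, estimate):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((s - e) ** 2 for s, e in zip(signal, estimate)) / len(signal))

rng = random.Random(42)
n, trials = 200, 400
# A quiet sine wave: its amplitude is less than half a quantization step.
signal = [0.4 * math.sin(2 * math.pi * i / n) for i in range(n)]

# Without dither every sample rounds to 0 and the tone disappears.
plain = [quantize(s) for s in signal]

# With white (uniform) dither each quantized sample is noisy, but
# averaging many independent passes recovers the waveform.
sums = [0.0] * n
for _ in range(trials):
    for i, s in enumerate(signal):
        sums[i] += quantize(s + rng.uniform(-0.5, 0.5))
dithered = [total / trials for total in sums]

print("RMS error, no dither:        ", round(rms_error(signal, plain), 3))
print("RMS error, dither + averaging:", round(rms_error(signal, dithered), 3))
```

The undithered quantizer throws the signal away deterministically, while the dithered one trades that systematic error for random noise that averages out.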
[Figure: waveform plotted against Time (0 to 6), amplitude from −4 to 4]
The Shape of the Noise
[Figure: histogram of Noise (−0.4 to 0.4) against Frequency]
The Effect After Averaging
[Figure: averaged waveform plotted against Time (0 to 6), amplitude from −4 to 4]
Thompson Sampling
Learning in the Real World
• In the real world we get to pick our training examples
  – Do we try this restaurant or not?
• Learning has real and opportunity costs
• Not learning has real and opportunity costs as well
• Every sub-optimal choice we make incurs regret
  – We would like to minimize this
  – But we can’t quantify regret without incurring regret!
An Example
• Pick one of five options
  – Purple, blue, green, red, yellow
  – Each has a random payoff
• If you pick a bad option, regret = mean(best) – mean(yours)
• The best known algorithm uses randomization
  – Best = minimal regret + minimal code complexity
Demo – The Algorithm
Synthetic Data
select IR.ENC_KEY ,IR.ENCOUNTER_ ,IR.ETYPE ,IR.bill_type ,IR.CONTR_ ,IR.SOURCE_CD ,IR.sub_source_cd ,IR.HP_CD ,IR.LOB_CD ,IR.FDO ,IR.TDOS ,IR.member_Nbr ,IR.HIC_NBR ,IR.MEMBER_SOURCE_CD ,IR.HDR_ERRCD ,IR.HDR_ERRDESC ,IR.PROVIDER_NBR ,IR.provider_type ,IR.PROVIDER_SOURCE_CD ,IR.cms_provider_ty e ,IR.SPEC_CD ,IR.SPEC_DESC ,IR.rev_cd ,IR.rev_cd_desc ,IR.proc_cd ,IR.diag_cd ,IR.DIAG_CD_KEY ,IR.DIAGNOSIS_KEY ,IR.rec_state_cd ,IR.rec_status_cd ,IR.DG_ERRCD ,IR.DG_ERRDESC FROM (SELECT distinct enc.encounter_key as ENC_KEY, enc.encounter_nbr as ENCOUNTER_, typ.encounter_type_cd as ETYPE, bt.bill_type, cnt.contract_nbr as CONTR_, ds.SOURCE_CD, enc.sub_source_cd, enc.HP_CD, lob.LOB_CD, enc.new_min_dt as FDOS, substr(enc.new_max_dt, 1, 10) as TDOS, enc.member_Nbr, m.HIC_NBR, m.MEMBER_SOURCE_CD, eerr.error_cd as HDR_ERRCD, eerr.ERROR_DESC as HDR_ERRDESC, enc.PROVIDER_NBR, prv.provider_type, prv.PROVIDER_SOURCE_CD, diag.cms_provider_type, sp.specialty_cd as SPEC_CD, sp.specialty_desc as SPEC_DESC, svc.rev_cd, rev.rev_cd_desc, svc.proc_cd, dgcd.diag_cd, dgcd.DIAG_CD_KEY, diag.DIAGNOSIS_KEY, st.rec_state_cd, sts.rec_status_cd, derr.error_cd as DG_ERRCD, derr.error_desc as DG_ERRDESC FROM oicpcuhg.ir_encounter enc `
Can You See the Problem?
INNER JOIN oicpcuhg.ir_encountertype typ ON (typ.encounter_type_key = enc.encounter_type_key) LEFT OUTER JOIN oicpcuhg.ir_billtype bt ON (bt.bill_type_key = enc.bill_type_key) LEFT OUTER JOIN oicpcuhg.ir_contract cnt ON (cnt.contract_key = enc.contract_key) LEFT OUTER JOIN oicpcuhg.ir_datasource ds ON (ds.source_key = enc.data_source_key) LEFT OUTER JOIN oicpcuhg.ir_lineofbusiness lob ON (lob.lob_key = enc.lob_key) INNER JOIN oicpcuhg.ir_member m ON ( m.hp_cd = enc.hp_cd AND m.member_source_cd = enc.member_source_cd AND m.member_nbr = enc.member_nbr) LEFT OUTER JOIN oicpcuhg.ir_encountererror eerror ON (eerror.encounter_key = enc.encounter_key and eerror.active_flg = 'Y') LEFT OUTER JOIN oicpcuhg.ir_error eerr ON (eerr.error_key = eerror.error_key) LEFT OUTER JOIN oicpcuhg.ir_provider prv ON (prv.hp_cd = enc.hp_cd and prv.provider_source_cd = enc.provider_source_cd and prv.provider_nbr = enc.provider_nbr)
LEFT OUTER JOIN oicpcuhg.ir_encounterspecialty esp ON (esp.encounter_key = enc.encounter_key) LEFT OUTER JOIN oicpcuhg.ir_specialty sp ON (sp.specialty_key = esp.specialty_key) LEFT OUTER JOIN oicpcuhg.ir_service svc ON (svc.encounter_key = enc.encounter_key) LEFT OUTER JOIN oicpcuhg.ir_revenue rev ON (rev.rev_cd = svc.rev_cd) LEFT OUTER JOIN oicpcuhg.ir_diagnosis diag ON (diag.encounter_key = enc.encounter_key) INNER JOIN oicpcuhg.ir_diagcd dgcd ON (dgcd.diag_cd_key = diag.diag_cd_key) INNER JOIN oicpcuhg.ir_recordstate st ON (st.rec_state_key = diag.rec_state_key) INNER JOIN oicpcuhg.ir_recordstatus sts ON (sts.rec_status_key = diag.rec_status_key) LEFT OUTER JOIN oicpcuhg.ir_diagnosiserror derror ON (derror.diagnosis_key = diag.diagnosis_key and derror.active_flg = 'Y') LEFT OUTER JOIN oicpcuhg.ir_error derr ON (derr.error_key = derror.error_key)) IR INNER JOIN oicpcuhg.umr_req_inbound umr ON (trim(umr.member_nbr) = IR.member_Nbr AND trim(umr.hhc_from_ccyymmdd) = IR.TDOS AND trim(umr.sub_mcare_mbr) = IR.HIC_NBR AND trim(umr.diag1) = IR.diag_cd)
One Attack
• The customer can’t give you the data
  – They can’t trust you, by law
• But they can probably summarize the data
  – How many columns
  – What types
  – Perhaps statistical summaries
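A summary of that kind is enough to drive a simple generator. The sketch below assumes a hypothetical schema format (column name, type, and a few statistics); `SCHEMA`, `fake_value`, and `fake_rows` are illustrative names, and the column statistics are invented. Only the names `member_nbr` and `active_flg` echo columns in the query above.

```python
import random
import string

# Hypothetical summary a customer might be able to share: column names,
# types, and a few statistics, but no actual rows.
SCHEMA = [
    {"name": "member_nbr", "type": "string", "length": 9},
    {"name": "claim_amount", "type": "float", "mean": 250.0, "stddev": 80.0},
    {"name": "visits", "type": "int", "min": 0, "max": 12},
    {"name": "active_flg", "type": "choice", "values": ["Y", "N"], "weights": [0.9, 0.1]},
]

def fake_value(col, rng):
    """Generate one value that matches the column's declared summary."""
    kind = col["type"]
    if kind == "string":
        return "".join(rng.choice(string.digits) for _ in range(col["length"]))
    if kind == "float":
        return rng.gauss(col["mean"], col["stddev"])
    if kind == "int":
        return rng.randint(col["min"], col["max"])
    if kind == "choice":
        return rng.choices(col["values"], weights=col["weights"])[0]
    raise ValueError(f"unknown column type: {kind}")

def fake_rows(schema, count, seed=0):
    """Build `count` fake rows as dicts keyed by column name."""
    rng = random.Random(seed)
    return [{c["name"]: fake_value(c, rng) for c in schema} for _ in range(count)]

for row in fake_rows(SCHEMA, 3):
    print(row)
```

Fixing the seed means both sides can regenerate the identical fake table from the same schema, so a bug found on one side can be replayed on the other without any real data changing hands.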
Bug Replication Without Security Violation
[Figure: two columns, Customer and You; the Customer side shows Data, the You side shows Data and then Fake, with parameters x, y, α, ξ on each side]
The Upshot
• So random numbers are useful
• But simple distributions not so much
• How can YOU generate cool data?
Last October: Time Series Databases by Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly)
Coming in February: Real World Hadoop by Ted Dunning and Ellen Friedman © Feb 2015 (published by O’Reilly)
Thank you for coming today!