Lecture 1 (01/23, 01/28): Introduction to Big Data
Decisions, Operations & Information Technologies
Robert H. Smith School of Business
Spring 2019
K. Zhang, BUDT 758

Business Value of Big Data and AI
http://www.salesforcehacker.com/2014/11/hadoop-and-pig-come-to-salesforce.html

“Airpal is a Web-based data-exploration and SQL query interface that runs on Presto, the in-memory SQL-on-Hadoop query technology that Facebook donated to Apache open source in late 2013. Airbnb invented Airpal because it needed a tool that would be more accessible to data analysts and even business users, not just the 23-person Airbnb data science team that handles Hive and Presto queries.”
-- Airbnb, 3/5/2015 11:35 AM

• “VC investment in the space remains vibrant and the first few weeks of 2016 saw a flurry of announcements of big funding rounds for late-stage Big Data startups: DataDog ($94M), BloomReach ($56M), Qubole ($30M), PlaceIQ ($25M), etc. Big Data startups received $6.64B in venture capital investment in 2015, 11% of total tech VC.”
http://www.goldmansachs.com/our-thinking/pages/big-data.html

Big Data ecosystem

The open source community
• Yahoo!
  ◦ Hadoop, Pig
  ◦ Pig hides Java programming
• Facebook
  ◦ Hive: provides SQL-type functions for Hadoop files
• Netflix
  ◦ HBase: massages big data to behave like a database
• UC Berkeley
  ◦ Spark: in-memory processing to avoid slow disk I/O
• Twitter
  ◦ Storm: near real-time streaming data

Technology is still evolving rapidly

And the almighty AI!
• 2012 Matlab
• 2013 Caffe
• 2014 Theano
• 2015 Torch
• 2016/7 TensorFlow
• 2018 ??? (PyTorch)
• CNN, RNN, GANs…
• Sergey Brin @ 2017 Davos World Economic Forum – https://www.youtube.com/watch?v=jYuCVcGxtNM

So, what’s going on?
• You need critical thinking to not get lost

How does data generate value?

Big data processes
• Load data
• Clean up data
• Transform data
• Query data
• Machine learning / deep learning
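
The first four stages of this process (load, clean up, transform, query) can be sketched on a toy scale as follows. This is only an illustration using the Python standard library: the sample records, column names, and function names are hypothetical, and a real big data pipeline would run equivalent steps on a cluster with tools such as Hive or Spark. The machine learning stage is left out.

```python
import csv
import io

# Hypothetical raw data; in practice this would be read from files or a cluster.
RAW = """user,country,amount
alice,US,10.5
bob,us,
carol,DE,7.0
dave,US,4.5
"""

def load(text):
    """Load: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean up: drop rows with a missing amount, upper-case country codes."""
    return [
        {**row, "country": row["country"].upper()}
        for row in rows
        if row["amount"].strip()
    ]

def transform(rows):
    """Transform: convert the amount column from text to float."""
    return [{**row, "amount": float(row["amount"])} for row in rows]

def query(rows, country):
    """Query: total amount for one country."""
    return sum(row["amount"] for row in rows if row["country"] == country)

rows = transform(clean(load(RAW)))
us_total = query(rows, "US")
print(len(rows), us_total)
```

Each stage takes the output of the previous one, so a stage can be swapped for a distributed implementation without changing the others.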

Realizing the benefits of Big Data
• Setting up Hadoop is just the beginning!
  ◦ It just means that you are able to handle big data
  ◦ But it does not guarantee any benefit!
    – It might waste your money and divert your attention.

The easy ones
• Faster and cheaper
  ◦ In late 2007, the New York Times wanted to make its entire archive of articles, 11 million in all, dating back to 1851, available over the web. The archive was a four-terabyte pile of images in TIFF format that needed to be converted into more web-friendly PDF files.
• Not a particularly complicated computing chore, but a large one,
  ◦ requiring a whole lot of computer processing time.
• A software programmer at the Times, Derek Gottfrid,
  ◦ had been playing around with Amazon Web Services' Elastic Compute Cloud (EC2)
• He uploaded the four terabytes of TIFF data into Amazon's Simple Storage Service (S3)
• In less than 24 hours, there were 11 million PDFs, all stored neatly in S3 and ready to be served up to visitors to the Times site.
• The total cost for the computing job? $240
  ◦ 10 cents per computer-hour times 100 computers times 24 hours
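
The $240 figure follows directly from the three numbers on the slide. A minimal check of the arithmetic, working in whole cents to avoid floating-point rounding (the variable names are only illustrative):

```python
# Figures quoted on the slide.
CENTS_PER_COMPUTER_HOUR = 10
COMPUTERS = 100
HOURS = 24

# Total cost = hourly rate x number of machines x hours used.
total_cents = CENTS_PER_COMPUTER_HOUR * COMPUTERS * HOURS
total_dollars = total_cents / 100

print(f"${total_dollars:.2f}")
```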

How to make data “actionable”
• D-D-P-P
  ◦ Descriptive: what happened?
  ◦ Diagnostic: why did it happen?
  ◦ Predictive: what is likely to happen?
  ◦ Prescriptive: what is the best course of action?
Courtesy of Cupid Chan
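
One way to picture the four D-D-P-P questions is to ask each of them of the same small data set. The sketch below is not from the lecture: the weekly sales records, the unit margin, and the promotion cost are all made-up assumptions, and the "prediction" is a deliberately naive average rather than a real model.

```python
from statistics import mean

# Hypothetical weekly records: (week, units_sold, promotion_ran)
weeks = [
    (1, 100, False),
    (2, 140, True),
    (3, 110, False),
    (4, 150, True),
]

# Descriptive: what happened? Average units sold per week.
average_units = mean(units for _, units, _ in weeks)

# Diagnostic: why did it happen? Compare promotion and non-promotion weeks.
promo_avg = mean(units for _, units, promo in weeks if promo)
plain_avg = mean(units for _, units, promo in weeks if not promo)
promo_lift = promo_avg - plain_avg

# Predictive: what is likely to happen? Naive forecast for next week per option.
forecast = {"run promotion": promo_avg, "no promotion": plain_avg}

# Prescriptive: what is the best course of action?
UNIT_MARGIN = 2   # assumed profit per unit sold
PROMO_COST = 50   # assumed fixed cost of running a promotion
expected_profit = {
    "run promotion": forecast["run promotion"] * UNIT_MARGIN - PROMO_COST,
    "no promotion": forecast["no promotion"] * UNIT_MARGIN,
}
best_action = max(expected_profit, key=expected_profit.get)

print(average_units, promo_lift, best_action)
```

Each step builds on the one before it: the prescriptive choice is only as good as the forecast, and the forecast is only as good as the diagnosis behind it.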

Traditional vs. Big Data Approach

A dynamic process
• What are the business goals and critical issues?
• What data do you have?
• What data can you potentially capture?
• What analytical tools could be applied?
[Figure labels: Goals, Data]
Goal: find business questions that can harness the power of big data