fundamental big data
Post on 01-Oct-2021
Social Computing and Big Data Analytics
1042SCBDA03 MIS MBA (M2226) (8628)
Wed, 8, 9 (15:10-17:00) (Q201)
Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
Min-Yuh Day (戴敏育)
Assistant Professor
Dept. of Information Management, Tamkang University
http://mail.tku.edu.tw/myday/
2016-03-02
Tamkang University
Syllabus
Week  Date        Subject/Topics
1     2016/02/17  Course Orientation for Social Computing and Big Data Analytics
2     2016/02/24  Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
3     2016/03/02  Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
4     2016/03/09  Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka
5     2016/03/16  Big Data Analytics with Numpy in Python
6     2016/03/23  Finance Big Data Analytics with Pandas in Python
7     2016/03/30  Text Mining Techniques and Natural Language Processing
8     2016/04/06  Off-campus study
9     2016/04/13  Social Media Marketing Analytics
10    2016/04/20  Midterm Project Report
11    2016/04/27  Deep Learning with Theano and Keras in Python
12    2016/05/04  Deep Learning with Google TensorFlow
13    2016/05/11  Sentiment Analysis on Social Media with Deep Learning
14    2016/05/18  Social Network Analysis
15    2016/05/25  Measurements of Social Network
16    2016/06/01  Tools of Social Network Analysis
17    2016/06/08  Final Project Presentation I
18    2016/06/15  Final Project Presentation II
2016/03/02 Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
Architecture of Big Data Analytics
Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
[Figure: four-stage architecture; data flow labels: Raw Data, Transformed Data, Big Data Analytics]
Big Data Sources: internal; external; multiple formats; multiple locations; multiple applications
Big Data Transformation: middleware; Extract, Transform, Load; data warehouse; traditional format (CSV, tables)
Big Data Platforms & Tools: Hadoop, MapReduce, Pig, Hive, Jaql, Zookeeper, HBase, Cassandra, Oozie, Avro, Mahout, others
Big Data Analytics Applications: queries, reports, OLAP, data mining
Business Intelligence (BI) Infrastructure
Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson.
Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
Source: https://www.thalesgroup.com/en/worldwide/big-data/big-data-big-analytics-visual-analytics-what-does-it-all-mean
MapReduce Paradigm
[Figure: Big Data is split across Map 0, Map 1, Map 2 and Map 3 (Map); the MapReduce data is passed to Reduce 0, Reduce 1, Reduce 2 and Reduce 3 (Reduce), which produce the Output Data]
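The map, shuffle and reduce stages in the figure can be sketched in plain Python. This is only an illustration of the paradigm on one machine, not Hadoop's implementation: the function names, the sum-of-code-points partitioner and the sample sentences are all assumptions made for the example.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(mapped_outputs, num_reducers):
    """Shuffle: route every key to exactly one reducer and group its values."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            # Deterministic stand-in for a hash partitioner (an assumption).
            index = sum(map(ord, key)) % num_reducers
            partitions[index][key].append(value)
    return partitions

def reduce_phase(partition):
    """Reducer: sum the grouped counts for each key in one partition."""
    return {key: sum(values) for key, values in partition.items()}

def word_count(lines, num_mappers=4, num_reducers=4):
    # One input split per mapper (Map 0 .. Map 3 in the figure).
    chunks = [lines[i::num_mappers] for i in range(num_mappers)]
    mapped = [map_phase(chunk) for chunk in chunks]
    partitions = shuffle(mapped, num_reducers)
    # Merge the outputs of Reduce 0 .. Reduce 3 into one result.
    result = {}
    for partition in partitions:
        result.update(reduce_phase(partition))
    return result

# Hypothetical sample input, not taken from the slides.
sample_lines = [
    "big data needs big clusters",
    "map then reduce",
    "reduce the data",
    "spark and hadoop process big data",
]
counts = word_count(sample_lines)
print(counts["big"], counts["data"], counts["reduce"])  # prints: 3 3 2
```

Because each key is routed to a single reducer, the per-reducer outputs never overlap and can be merged without further aggregation.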
Hadoop Ecosystem
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
Source: http://hadoop.apache.org/
[Figure: HDFS (Storage) and MapReduce (Processing)]
Source: http://hadoop.apache.org/
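As a rough illustration of the storage side, the sketch below splits a file into fixed-size blocks and assigns each block to several DataNodes. The 128 MB block size and replication factor of 3 are the Hadoop 2.x defaults; the round-robin placement, the function name and the node names are simplifying assumptions (real HDFS placement is rack-aware).

```python
import math

BLOCK_SIZE_MB = 128   # default HDFS block size in Hadoop 2.x
REPLICATION = 3       # default HDFS replication factor

def place_blocks(file_size_mb, datanodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [datanode, ...]} using round-robin placement.

    Round-robin is a simplification for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Each replica of a block lands on a different node.
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement

nodes = ["node0", "node1", "node2", "node3"]  # hypothetical DataNode names
layout = place_blocks(500, nodes)             # a hypothetical 500 MB file
for block, replicas in layout.items():
    print(block, replicas)
```

With these assumed values a 500 MB file occupies four blocks, and losing any single node still leaves two copies of every block.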
Big Data with Hadoop Architecture
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
Big Data with Hadoop Architecture: Logical Architecture, Processing: MapReduce
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
Big Data with Hadoop Architecture: Logical Architecture, Storage: HDFS
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
Big Data with Hadoop Architecture: Process Flow
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
Big Data with Hadoop Architecture: Hadoop Cluster
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
Hadoop Ecosystem
Source: Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing
HDP (Hortonworks Data Platform): A Complete Enterprise Hadoop Data Platform
Source: http://hortonworks.com/hdp/
Apache Hadoop Hortonworks Data Platform
Source: http://hortonworks.com/hdp/
Hadoop and Data Analytics Tools
Source: http://hortonworks.com/hdp/
Hadoop 1 → Hadoop 2
Source: http://hortonworks.com/hadoop/tez/
Big Data Solution
Source: http://www.newera-technologies.com/big-data-solution.html
Traditional ETL Architecture
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
Offload ETL with Hadoop (Big Data Architecture)
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
Spark Ecosystem
Apache Spark is a fast and general engine for large-scale data processing.
Lightning-fast cluster computing
Source: http://spark.apache.org/
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
[Figure: Logistic regression in Hadoop and Spark]
Source: http://spark.apache.org/
Ease of Use
• Write applications quickly in Java, Scala, Python, R.
Source: http://spark.apache.org/
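Word count in Spark's Python API
# Assumes `spark` is an existing SparkContext.
text_file = spark.textFile("hdfs://...")
# Split lines into words, pair each word with 1, then sum the counts per word.
counts = (text_file.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
Source: http://spark.apache.org/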
Spark and Hadoop
Source: http://spark.apache.org/
Spark Ecosystem
Source: http://spark.apache.org/
Spark Ecosystem
Source: Mike Frampton (2015), Mastering Apache Spark, Packt Publishing
[Figure: Spark with its modules GraphX (graph), Spark SQL, MLlib (machine learning) and Spark Streaming, alongside Kafka, Flume, H2O, Hive, Cassandra, Titan, HBase and HDFS]
Hadoop vs. Spark
Source: Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing
[Figure: two iterative pipelines over the same Input, each with Iter. 1 and Iter. 2; labels: HDFS read, HDFS write]
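Assuming the figure contrasts a MapReduce-style job that goes back to HDFS on every iteration with a Spark-style job that keeps its input in memory, the difference can be sketched by counting storage accesses. Everything here is hypothetical: `CountingStore` merely stands in for HDFS, the one-dimensional logistic regression is a toy, and the data points are invented. It shows the access pattern only, not actual performance.

```python
import math

class CountingStore:
    """Stand-in for HDFS that counts how often data is read or written."""
    def __init__(self, data):
        self._data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self._data)

    def write(self, _state):
        self.writes += 1

def gradient_step(points, w, lr=0.5):
    """One gradient-descent step for 1-D logistic regression (no bias term)."""
    grad = 0.0
    for x, y in points:
        pred = 1.0 / (1.0 + math.exp(-w * x))
        grad += (pred - y) * x
    return w - lr * grad / len(points)

def run_disk_based(store, iterations):
    """MapReduce style: every iteration re-reads its input and writes its result."""
    w = 0.0
    for _ in range(iterations):
        points = store.read()
        w = gradient_step(points, w)
        store.write(w)
    return w

def run_in_memory(store, iterations):
    """Spark style: read once, keep the data cached, iterate in memory."""
    points = store.read()
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(points, w)
    return w

# Hypothetical toy data: (feature, label) pairs.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
disk_store, mem_store = CountingStore(data), CountingStore(data)
w_disk = run_disk_based(disk_store, iterations=10)
w_mem = run_in_memory(mem_store, iterations=10)
print(disk_store.reads, disk_store.writes, mem_store.reads, mem_store.writes)
# prints: 10 10 1 0
```

Both runs compute the same weight; only the number of storage round trips differs, which is the cost that grows with the iteration count in the disk-based variant.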
Steps to Install Hadoop on a Personal Computer (Windows / OS X)
Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5
Hadoop: Linux-Based Software
[Figure: four boxes labeled LINUX]
Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5
Appliance
[Figure: Hadoop on Linux, in a Virtual Machine (VirtualBox / VMware), on a Personal Computer (Windows / OS X)]
Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5
Connection to Hadoop
[Figure: Hadoop on Linux, in a Virtual Machine (VirtualBox / VMware), on a Personal Computer (Windows / OS X); a Browser on the Personal Computer provides access from the host]
Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5
Steps to Install Hadoop on a Personal Computer (Windows / OS X)
Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5
Step 1. Download and Install VirtualBox
Step 2. Download Appliance
Step 3. Import Appliance
Step 4. Configure Virtual Machine (VM)
Step 5. Start Virtual Machine (VM)
Step 6. Test Connection From Host
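Step 6 can also be checked from a script instead of a browser. The sketch below only tests whether a TCP port on the host accepts connections. The address 127.0.0.1 and port 8888 are assumptions for illustration; the actual forwarded port depends on how the appliance and the virtual machine's networking are configured.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

if __name__ == "__main__":
    # Assumed values: the VM forwards a web port to the host at 127.0.0.1:8888.
    # Check the appliance's own documentation for the real address and port.
    print(can_connect("127.0.0.1", 8888))
```

A result of `True` only shows that something is listening on that port; opening the page in a browser, as in the slides, confirms that it is the Hadoop appliance.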
VirtualBox
https://www.virtualbox.org/
Hortonworks Sandbox
Hortonworks Sandbox: The easiest way to get started with Enterprise Hadoop
http://hortonworks.com/products/hortonworks-sandbox/#install
Get started on Hadoop with these tutorials based on the Hortonworks Sandbox
http://hortonworks.com/tutorials/
Apache Hadoop
http://hadoop.apache.org/
Apache Hadoop
http://hadoop.apache.org/releases.html#Download
Apache Hadoop
Source: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/releasenotes.html
Apache Hadoop 2.7.2
Source: http://hadoop.apache.org/docs/r2.7.2/
Hadoop: Setting up a Single Node Cluster
Source: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html
Hadoop Cluster Setup
Source: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html
Apache Hadoop YARN
Source: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
Apache Spark
http://spark.apache.org/
References
• EMC Education Services (2015), Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, Wiley
• Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing
• Mike Frampton (2015), Mastering Apache Spark, Packt Publishing
• Deepak Ramanathan (2014), SAS Modernization architectures - Big Data Analytics, http://www.slideshare.net/deepakramanathan/sas-modernization-architectures-big-data-analytics