Fundamental Big Data

Page 1: Fundamental Big Data

Social Computing and Big Data Analytics

社群運算與大數據分析

1042SCBDA03 MIS MBA (M2226) (8628)

Wed, 8, 9 (15:10-17:00) (Q201)

Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem

(大數據基礎:MapReduce 典範、Hadoop 與 Spark 生態系統)

Min-Yuh Day 戴敏育

Assistant Professor 專任助理教授

Dept. of Information Management, Tamkang University 淡江大學 資訊管理學系

http://mail.tku.edu.tw/myday/

2016-03-02

Tamkang University

Page 2: Fundamental Big Data

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
1   2016/02/17   Course Orientation for Social Computing and Big Data Analytics (社群運算與大數據分析課程介紹)
2   2016/02/24   Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data (資料科學與大數據分析:探索、分析、視覺化與呈現資料)
3   2016/03/02   Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem (大數據基礎:MapReduce 典範、Hadoop 與 Spark 生態系統)

Page 3: Fundamental Big Data

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
4   2016/03/09   Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka (大數據處理平台 SMACK:Spark, Mesos, Akka, Cassandra, Kafka)
5   2016/03/16   Big Data Analytics with Numpy in Python (Python Numpy 大數據分析)
6   2016/03/23   Finance Big Data Analytics with Pandas in Python (Python Pandas 財務大數據分析)
7   2016/03/30   Text Mining Techniques and Natural Language Processing (文字探勘分析技術與自然語言處理)
8   2016/04/06   Off-campus study (教學行政觀摩日)

Page 4: Fundamental Big Data

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
9    2016/04/13   Social Media Marketing Analytics (社群媒體行銷分析)
10   2016/04/20   Midterm Project Report (期中報告)
11   2016/04/27   Deep Learning with Theano and Keras in Python (Python Theano 和 Keras 深度學習)
12   2016/05/04   Deep Learning with Google TensorFlow (Google TensorFlow 深度學習)
13   2016/05/11   Sentiment Analysis on Social Media with Deep Learning (深度學習社群媒體情感分析)

Page 5: Fundamental Big Data

課程大綱 (Syllabus)

週次 (Week)   日期 (Date)   內容 (Subject/Topics)
14   2016/05/18   Social Network Analysis (社會網絡分析)
15   2016/05/25   Measurements of Social Network (社會網絡量測)
16   2016/06/01   Tools of Social Network Analysis (社會網絡分析工具)
17   2016/06/08   Final Project Presentation I (期末報告 I)
18   2016/06/15   Final Project Presentation II (期末報告 II)

Page 6: Fundamental Big Data

2016/03/02 Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem

(大數據基礎:MapReduce 典範、Hadoop 與 Spark 生態系統)

Page 7: Fundamental Big Data

Architecture of Big Data Analytics

• Big Data Sources: internal and external; multiple formats, multiple locations, multiple applications
• Big Data Transformation: Middleware; Extract, Transform, Load; Data Warehouse; Traditional Format (CSV, Tables)
• Big Data Platforms & Tools: Hadoop, MapReduce, Pig, Hive, Jaql, Zookeeper, HBase, Cassandra, Oozie, Avro, Mahout, others
• Big Data Analytics Applications: Queries, Reports, OLAP, Data Mining
• Flow labels in the figure: Raw Data, Transformed Data, Big Data Analytics

Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
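The Extract, Transform, Load stage named in this architecture can be illustrated with a minimal sketch. It is not taken from the cited source: the sample data, the function names (`extract`, `transform`, `load`) and the plain list standing in for a data warehouse are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical raw export: inconsistent case, stray spaces, one malformed row.
RAW_CSV = """user,amount
 Alice ,10.5
BOB,3
carol,not-a-number
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, convert amounts, drop malformed rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append({"user": row["user"].strip().lower(), "amount": amount})
    return clean


def load(rows, warehouse):
    """Load: append cleaned rows to the target store (a list stands in here)."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(extract(RAW_CSV)), warehouse)
print(loaded, warehouse)
```

Keeping the three stages as separate functions mirrors the figure's layering: the raw data and the transformed data are distinct artifacts, so either side can be swapped out (a different source, a different store) without touching the other.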

Page 8: Fundamental Big Data

Business Intelligence (BI) Infrastructure

Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson.

Page 9: Fundamental Big Data

Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem

Page 10: Fundamental Big Data

Source: https://www.thalesgroup.com/en/worldwide/big-data/big-data-big-analytics-visual-analytics-what-does-it-all-mean

Page 11: Fundamental Big Data

MapReduce Paradigm

Page 12: Fundamental Big Data

MapReduce Paradigm

[Figure: Big Data is split across parallel Map tasks (Map 0, Map 1, Map 2, Map 3); their results feed Reduce tasks (Reduce 0, Reduce 1, Reduce 2, Reduce 3), which produce the Output Data.]
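The map and reduce stages in the figure can be sketched in plain Python, without any cluster. This is only an illustration under stated assumptions: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample chunks are hypothetical, and a real framework would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    # Each chunk is mapped independently, which is what allows parallelism.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into chunks (one per map task).
chunks = ["big data map", "map reduce big", "data data"]
counts = map_reduce(chunks)
print(counts)
```

The shuffle step is not drawn in the figure, but it is what connects the two stages: every value for a given key must reach the same reduce call.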

Page 13: Fundamental Big Data

Hadoop Ecosystem

Page 14: Fundamental Big Data

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

Source: http://hadoop.apache.org/

Page 15: Fundamental Big Data

MapReduce: Processing

HDFS: Storage

Source: http://hadoop.apache.org/

Page 16: Fundamental Big Data

Big Data with Hadoop Architecture

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 17: Fundamental Big Data

Big Data with Hadoop Architecture

Logical Architecture

Processing: MapReduce

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 18: Fundamental Big Data

Big Data with Hadoop Architecture

Logical Architecture

Storage: HDFS

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
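How HDFS stores a file can be made concrete with a little arithmetic. The sketch below assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3. The function names and the round-robin placement are hypothetical simplifications, since real HDFS placement is rack-aware.

```python
import math

BLOCK_SIZE_MB = 128  # assumed: default HDFS block size in Hadoop 2.x
REPLICATION = 3      # assumed: default HDFS replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION):
    """Return (number of blocks, total block replicas, raw storage in MB)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies its actual size, so raw storage is
    # the file size times the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement of block replicas on distinct nodes."""
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


print(hdfs_footprint(1000))
print(place_blocks(2, ["n1", "n2", "n3", "n4"]))
```

Splitting files into blocks is what lets the processing layer schedule one map task per block, close to where a replica of that block is stored.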

Page 19: Fundamental Big Data

Big Data with Hadoop Architecture

Process Flow

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 20: Fundamental Big Data

Big Data with Hadoop Architecture

Hadoop Cluster

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 21: Fundamental Big Data

Hadoop Ecosystem

Source: Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing

Page 22: Fundamental Big Data

HDP (Hortonworks Data Platform)

A Complete Enterprise Hadoop Data Platform

Source: http://hortonworks.com/hdp/

Page 23: Fundamental Big Data

Apache Hadoop

Hortonworks Data Platform

Source: http://hortonworks.com/hdp/

Page 24: Fundamental Big Data

Hadoop and Data Analytics Tools

Source: http://hortonworks.com/hdp/

Page 25: Fundamental Big Data

Hadoop 1 → Hadoop 2

Source: http://hortonworks.com/hadoop/tez/

Page 26: Fundamental Big Data

Big Data Solution

Source: http://www.newera-technologies.com/big-data-solution.html

Page 27: Fundamental Big Data

Traditional ETL Architecture

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 28: Fundamental Big Data

Offload ETL with Hadoop (Big Data Architecture)

Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf

Page 29: Fundamental Big Data

Spark Ecosystem

Page 30: Fundamental Big Data

Apache Spark is a fast and general engine for large-scale data processing.

Lightning-fast cluster computing

Source: http://spark.apache.org/

Page 31: Fundamental Big Data

Logistic regression in Hadoop and Spark

Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

Source: http://spark.apache.org/
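Logistic regression is a natural benchmark here because it is iterative: every training step scans the whole data set again. The sketch below shows that access pattern in plain Python. It is not the benchmark code behind the figure quoted above, and the toy data, learning rate and iteration count are assumptions chosen only so the example runs quickly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(points, iterations=200, learning_rate=0.5):
    """Batch gradient descent; each iteration scans every point once."""
    dim = len(points[0][0])
    w = [0.0] * dim
    for _ in range(iterations):
        gradient = [0.0] * dim
        for x, y in points:  # full pass over the data set
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(dim):
                gradient[j] += error * x[j]
        w = [wi - learning_rate * g / len(points)
             for wi, g in zip(w, gradient)]
    return w


def predict(w, x):
    """Classify x as True (label 1) or False (label 0)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5


# Hypothetical toy data: ([bias term, feature], label).
points = [([1.0, -2.0], 0), ([1.0, -1.0], 0),
          ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
w = train_logistic_regression(points)
print(w, [predict(w, x) for x, _ in points])
```

With 200 iterations the inner loop reads `points` 200 times. When each pass has to reload the data from disk, that rereading dominates the run time, which is the cost that keeping the data in memory avoids.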

Page 32: Fundamental Big Data

Ease of Use

• Write applications quickly in Java, Scala, Python, R.

Source: http://spark.apache.org/

Page 33: Fundamental Big Data

Word count in Spark's Python API

# `spark` is assumed to be a SparkContext (the PySpark shell names it `sc`)
text_file = spark.textFile("hdfs://...")

text_file.flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

Source: http://spark.apache.org/

Page 34: Fundamental Big Data

Spark and Hadoop

Source: http://spark.apache.org/

Page 35: Fundamental Big Data

Spark Ecosystem

Source: http://spark.apache.org/

Page 36: Fundamental Big Data

Spark Ecosystem

• Core engine: Spark
• Spark modules: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph)
• Related systems: Kafka, Flume, H2O, Hive, Cassandra, Titan, HBase, HDFS

Source: Mike Frampton (2015), Mastering Apache Spark, Packt Publishing

Page 37: Fundamental Big Data

Hadoop vs. Spark

[Figure: two iterative pipelines (Input → Iter. 1 → Iter. 2) compared, with HDFS read and HDFS write steps marked around the iterations.]

Source: Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing
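The contrast this comparison is usually drawn to show is how often an iterative job touches storage. The toy sketch below counts reads and writes for two styles of execution. `CountingStorage` is a hypothetical stand-in for HDFS, and `step` is a placeholder for one iteration of any algorithm; neither comes from the cited source.

```python
class CountingStorage:
    """Hypothetical stand-in for HDFS that counts reads and writes."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def step(values):
    """One iteration of some algorithm; here it just halves every value."""
    return [v / 2 for v in values]


def run_mapreduce_style(storage, iterations):
    # Each iteration is a separate job: read from storage, write back.
    for _ in range(iterations):
        storage.write(step(storage.read()))
    return storage.data


def run_in_memory_style(storage, iterations):
    # Read once, keep intermediate results in memory, write the final result.
    values = storage.read()
    for _ in range(iterations):
        values = step(values)
    storage.write(values)
    return storage.data


disk = CountingStorage([8.0, 16.0])
memory = CountingStorage([8.0, 16.0])
print(run_mapreduce_style(disk, 3), disk.reads, disk.writes)
print(run_in_memory_style(memory, 3), memory.reads, memory.writes)
```

Both styles compute the same result. Only the number of storage round trips differs, and it grows with the iteration count in the first style while staying constant in the second.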

Page 38: Fundamental Big Data

Steps to Install Hadoop on a Personal Computer (Windows / OS X)

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 39: Fundamental Big Data

Hadoop: Linux-Based Software

[Figure: four nodes, each labeled LINUX.]

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 40: Fundamental Big Data

Appliance

Layers shown in the figure:
• Hadoop
• Linux
• Virtual Machine (VirtualBox / VMware)
• Personal Computer (Windows / OS X)

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 41: Fundamental Big Data

Connection to Hadoop

Layers shown in the figure:
• Hadoop
• Linux
• Virtual Machine (VirtualBox / VMware)
• Personal Computer (Windows / OS X)
• Browser: access from host

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Page 42: Fundamental Big Data

Steps to Install Hadoop on a Personal Computer (Windows / OS X)

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Step 1. Download and Install VirtualBox
Step 2. Download Appliance
Step 3. Import Appliance
Step 4. Configure Virtual Machine (VM)
Step 5. Start Virtual Machine (VM)
Step 6. Test Connection From Host
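The final step, testing the connection from the host, can also be scripted instead of done in a browser. A minimal sketch follows, using only Python's standard library. The address `127.0.0.1` and port `8888` are assumptions for illustration; use whatever address and forwarded port your virtual machine is actually configured with.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed values: replace with the address and port your VM exposes.
    HOST, PORT = "127.0.0.1", 8888
    print(f"{HOST}:{PORT} reachable: {can_connect(HOST, PORT)}")
```

A successful TCP connection only shows that something is listening on that port. Opening the page in a browser is still the way to confirm that the Hadoop web interface itself is up.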

Page 43: Fundamental Big Data

VirtualBox

https://www.virtualbox.org/

Page 44: Fundamental Big Data

Steps to Install Hadoop on a Personal Computer (Windows / OS X)

Source: https://www.youtube.com/watch?v=rO-V1mxhzcM&list=PLyZEf-TOnZen8E5m5TIpIsdok2fyKDNRa&index=5

Step 1. Download and Install VirtualBox
Step 2. Download Appliance
Step 3. Import Appliance
Step 4. Configure Virtual Machine (VM)
Step 5. Start Virtual Machine (VM)
Step 6. Test Connection From Host

Hortonworks Sandbox

Page 45: Fundamental Big Data

Hortonworks Sandbox

The easiest way to get started with Enterprise Hadoop

http://hortonworks.com/products/hortonworks-sandbox/#install

Page 46: Fundamental Big Data

Get started on Hadoop with these tutorials based on the Hortonworks Sandbox

http://hortonworks.com/tutorials/

Page 47: Fundamental Big Data

Apache Hadoop

http://hadoop.apache.org/

Page 48: Fundamental Big Data

Apache Hadoop

http://hadoop.apache.org/releases.html#Download

Page 49: Fundamental Big Data

Apache Hadoop

Source: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/releasenotes.html

Page 50: Fundamental Big Data

Apache Hadoop 2.7.2

Source: http://hadoop.apache.org/docs/r2.7.2/

Page 51: Fundamental Big Data

Hadoop: Setting up a Single Node Cluster

Source: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html

Page 52: Fundamental Big Data

Hadoop Cluster Setup

Source: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html

Page 53: Fundamental Big Data

Apache Hadoop YARN

Source: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

Page 54: Fundamental Big Data

Apache Spark

http://spark.apache.org/

Page 55: Fundamental Big Data

References

• EMC Education Services (2015), Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, Wiley
• Shiva Achari (2015), Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop, Packt Publishing
• Mike Frampton (2015), Mastering Apache Spark, Packt Publishing
• Deepak Ramanathan (2014), SAS Modernization architectures - Big Data Analytics, http://www.slideshare.net/deepakramanathan/sas-modernization-architectures-big-data-analytics