100062108 李智宇、100062116 林威宏、100062220 施閔耀
Outline
Introduction
Architecture of Hadoop
HDFS
MapReduce
Comparison
Why Hadoop
Conclusion
What is Hadoop?
Open-source software framework
Processes and stores big data
Easy to use and implement, economical, flexible
Runs across many nodes (servers)
Written in Java
Free license (Apache License 2.0)
Created by Doug Cutting and Mike Cafarella in 2005
Advantages of Interpreted Languages
Cross-platform (e.g., Windows, Ubuntu, Mac OS X)
Smaller executable program size
Easier to modify during both development and execution
Architecture of Hadoop
Hadoop in Enterprise
The Dell representation of the Hadoop ecosystem.
Who is using Hadoop?
More than half of the Fortune 50 were using Hadoop by 2013
HDFS: Hadoop Distributed File System
Client: the user
Name node: manages and stores the metadata and the file namespace
Data node: stores the file data
Each data node periodically sends its status to the name node
HDFS: Writing data in HDFS
Each file is divided into blocks (64 MB or 128 MB in size), and each block is stored as three copies on different data nodes.
The client asks the name node for a list of data nodes sorted by distance and sends the file to the nearest one; that data node then forwards the file to the remaining nodes.
When the operation above is done, the data node sends “done” to the name node.
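The write steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not Hadoop's actual code: the `DataNode` and `NameNode` classes, the `distance` field, and the `write_file` helper are hypothetical names, and real HDFS chooses replica locations with a rack-aware placement policy rather than simply taking the three nearest nodes.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, one of the block sizes named on the slide
REPLICATION = 3                # three copies per block, as on the slide


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


@dataclass
class DataNode:
    name: str
    distance: int  # hypothetical network distance from the client
    blocks: dict = field(default_factory=dict)

    def store(self, block_id, block, pipeline):
        """Keep a copy, then forward the block to the next node in the pipeline."""
        self.blocks[block_id] = block
        if pipeline:
            pipeline[0].store(block_id, block, pipeline[1:])


@dataclass
class NameNode:
    data_nodes: list
    block_map: dict = field(default_factory=dict)  # block id -> names of holders

    def choose_targets(self, replication=REPLICATION):
        """Return the nearest data nodes, sorted by distance (simplified policy)."""
        return sorted(self.data_nodes, key=lambda node: node.distance)[:replication]

    def report_done(self, block_id, node_names):
        """Record where the block ended up once the write has finished."""
        self.block_map[block_id] = node_names


def write_file(name_node, filename, data, block_size=BLOCK_SIZE):
    """Client side: split the file, send each block to the nearest node only."""
    for index, block in enumerate(split_into_blocks(data, block_size)):
        block_id = f"{filename}#{index}"
        targets = name_node.choose_targets()
        targets[0].store(block_id, block, targets[1:])
        name_node.report_done(block_id, [node.name for node in targets])


# Tiny usage example with a 4-byte block size so the split is visible.
nodes = [DataNode("dn1", 3), DataNode("dn2", 1), DataNode("dn3", 2), DataNode("dn4", 4)]
name_node = NameNode(nodes)
write_file(name_node, "log.txt", b"abcdefghij", block_size=4)
```

In this sketch the client talks to only one data node per block; the copies on the other two nodes come from the node-to-node forwarding inside `store`.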
HDFS: Reading data in HDFS
The client sends the filename to the name node, which returns a list of the file's blocks, with the data nodes holding each block sorted by distance.
The client uses the list to fetch the file from the data nodes.
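The read path described above could look roughly like the following sketch. The function names (`locate_blocks`, `read_file`) and the dictionary layout are assumptions made for illustration and are not part of the HDFS API. The client tries the nearest holder of each block first and moves on to the next holder if that node does not answer.

```python
def locate_blocks(block_locations, distances, filename):
    """Name node side: list each block with its holders sorted by distance."""
    return [
        (block_id, sorted(holders, key=lambda node: distances[node]))
        for block_id, holders in block_locations[filename]
    ]


def read_file(block_list, storage):
    """Client side: fetch every block from the nearest data node that has it."""
    parts = []
    for block_id, holders in block_list:
        for node in holders:
            # A node missing from `storage` stands in for a data node that is down.
            block = storage.get(node, {}).get(block_id)
            if block is not None:
                parts.append(block)
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return b"".join(parts)


# Hypothetical cluster state: dn2 is the nearest node but is currently down.
block_locations = {
    "log.txt": [
        ("log.txt#0", ["dn1", "dn2", "dn3"]),
        ("log.txt#1", ["dn2", "dn3", "dn4"]),
    ]
}
distances = {"dn1": 3, "dn2": 1, "dn3": 2, "dn4": 4}
storage = {
    "dn1": {"log.txt#0": b"hello "},
    "dn3": {"log.txt#0": b"hello ", "log.txt#1": b"world"},
    "dn4": {"log.txt#1": b"world"},
}

block_list = locate_blocks(block_locations, distances, "log.txt")
content = read_file(block_list, storage)  # b"hello world", served by dn3
```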
HDFS: Failure
Node failure
Communication failure
Data corruption
HDFS: Handling failure
Handling write failures: the name node skips any data node that does not return an ACK.
Handling read failures: recall that when reading a file, the client gets a list of the data nodes that contain the file, so it can fall back to another data node on the list.
HDFS: Handling failure
Name node handling of node failure: the name node determines which data the failed node held, copies that data from the remaining replicas, and stores it on other data nodes.
Note that HDFS cannot guarantee that at least one copy of the data survives (for example, if every data node holding a block fails at the same time).
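The recovery procedure above can be sketched as follows. This is a simplified model under stated assumptions: the timeout-based failure detection, the function names, and the data layout are hypothetical and only mirror the idea that data nodes report their status periodically and that the name node restores missing copies from surviving ones.

```python
def find_dead_nodes(last_status, now, timeout):
    """Treat a data node as failed if its last status report is older than timeout."""
    return {node for node, seen in last_status.items() if now - seen > timeout}


def re_replicate(block_map, dead, live_nodes, replication=3):
    """Restore the replica count of each block; report blocks with no copy left."""
    new_map, lost = {}, []
    for block_id, holders in block_map.items():
        survivors = [node for node in holders if node not in dead]
        if not survivors:
            # Every copy was on a failed node, so nothing is left to copy from.
            lost.append(block_id)
            continue
        candidates = [node for node in live_nodes if node not in survivors]
        needed = max(0, replication - len(survivors))
        new_map[block_id] = survivors + candidates[:needed]
    return new_map, lost


# Hypothetical status timestamps in seconds; dn2, dn5 and dn6 have gone silent.
last_status = {"dn1": 100, "dn2": 40, "dn3": 98, "dn4": 99, "dn5": 30, "dn6": 10}
dead = find_dead_nodes(last_status, now=105, timeout=30)
live = [node for node in sorted(last_status) if node not in dead]

block_map = {
    "a#0": ["dn1", "dn2", "dn3"],  # one copy lost, can be restored
    "a#1": ["dn2", "dn5", "dn6"],  # all three copies lost
}
new_map, lost = re_replicate(block_map, dead, live)
```

In this example `a#0` gets a new third copy on `dn4`, while `a#1` ends up in `lost`, which is the case the note above warns about.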
MapReduce
Similar to divide-and-conquer
First, use “Map” to divide the work into tasks
Second, use “Shuffle” to “transfer the data from the mapper nodes to a reducer’s node and decompress if needed.”
Third, use “Reduce” to “execute the user-defined reduce function to produce the final output data.”
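The three phases above can be illustrated with a word count, a common teaching example for MapReduce. The sketch below runs in a single Python process and does not use the Hadoop API; the function names are hypothetical, and in a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input split into (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: bring all values for the same key together for one reducer."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: apply the user-defined function (a sum) to each key's values."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in order over a list of input splits."""
    mapped = [map_phase(document) for document in documents]
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["big data", "Big cluster", "data data"])
# counts == {"big": 2, "data": 3, "cluster": 1}
```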
MapReduce-Map
MapReduce-Shuffle
MapReduce-Reduce
MapReduce
Comparison
Why Hadoop? Technically
Comparison of Grep Task Result with Vertica and DBMS-X
Why Hadoop? Technically
Simple structure vs. optimization
Transaction time is not minimized
Lower performance than a parallel DBMS with the same number of nodes
No compelling technical reason to choose Hadoop
Why Hadoop? Commercially
Why Hadoop? Commercially
Cheap (buy more servers to beat a DBMS)
Flexible (in both design and deployment)
Easier to design
Easier to scale up
Can be combined with other systems to achieve better performance
Conclusion
Hadoop is much easier for users to implement and more economical
MapReduce advocates should study the techniques used in parallel DBMSs
Hybrid systems are also popular
As its performance improves, we believe Hadoop will lead the trend in big data computing