Data-Intensive Text Processing with MapReduce (ch1, ch2)

Post on 27-Jun-2015

Category:

Technology

DESCRIPTION

This document covers Chapters 1 and 2 of "Data-Intensive Text Processing with MapReduce". It introduces the basics of the MapReduce programming model. The next document will cover more advanced topics.

TRANSCRIPT

  • 1. Data-Intensive Text Processing with MapReduce (ch1, ch2) 2010/09/23 shiumachi http://d.hatena.ne.jp/shiumachi/ http://twitter.com/shiumachi

  • 2. Agenda 1 2 MapReduce 2.1 2.2 mapper reducer 2.3 2.4 Partitioner Combiner 2.5 2.6 Hadoop
  • 3. Jimmy Lin, Chris Dyer, Data-Intensive Text Processing with MapReduce 3 shiumachi, marqs
  • 4. Chapter 1. Introduction 1
  • 5. MapReduce NLP IR Hadoop Hadoop Hadoop [3]
  • 6. Big Data Big Data (TB) Big Data Big Data etc...
  • 7. Big Data Google 2008 20PB MapReduce eBay 10PB Facebook 2010 15PB [1]
  • 8. Big Data Why How
  • 9. [2] Everything as a Service
  • 10. IaaS, PaaS, SaaS MapReduce IaaS PaaS, SaaS MapReduce
  • 11. IaaS, MapReduce IaaS MapReduce (etc...)
  • 12. MapReduce Ap
  • 13. MapReduce MapReduce PRAM, LogP, BSP, etc... MapReduce MapReduce
  • 14. Chapter 2. MapReduce Basics 2 MapReduce
  • 15. Divide and Conquer
  • 16. MapReduce (2.1) mapper, reducer (2.2) (2.3) partitioner, combiner (2.4) (2.5) Hadoop (2.6)
  • 17. 2.1
  • 18. MapReduce 2 map, reduce 2 map, fold
  • 19. 2.1 map fold map(f) fold(g)
  • 20. MapReduce 1. 2.1.
  • 21. MapReduce 1. MapReduce 2. MapReduce 3. MapReduce Google MapReduce (GMR), Hadoop CPU GPGPU CELL Hadoop
  • 22. 2.2 mapper reducer
  • 23. MapReduce MapReduce Protocol Buffers, Thrift, Avro Message Pack = [(url, html)] = [(id, [id])]
  • 24. mapper reducer map, reduce map: (k1, v1) → [(k2, v2)] reduce: (k2, [v2]) → [(k3, v3)] map reduce group by
  • 25. 2.2 MapReduce map shuffle sort reduce
  • 26.
  • 27. GMR Hadoop GMR Hadoop (Hadoop Hadoop [3]) GMR reducer Hadoop
  • 28. mapper, reducer map (Ch.3) (4.4, 6.5)
  • 29. mapreduce BigTable HBase MapReduce pi
  • 30. 2.3
  • 31. mapper + reducer (+ combiner, partitioner) + Hadoop
  • 32. / reduce map
  • 33. 2.4 Partitioner Combiner
  • 34. Partitioner map reducer mod Hadoop HashPartitioner Partitioner Hadoop
  • 35. Combiner map reducer reducer reducer map Hadoop (0)
  • 36. 2.4 MapReduce 2.2 combiner partitioner Hadoop
  • 37. 2.5
  • 38. HPC NAS/SAN 10GbE InfiniBand MapReduce (DFS) DFS MR
  • 39. DFS 64MB Google File System (GFS) GFS Master Hadoop Distributed File System (HDFS) namenode
  • 40. HDFS 2.5
  • 41. 2.5 HDFS
  • 42. Hadoop 3 2 RF 4 1
  • 43. DFS (GB) POSIX Hadoop 0.21 Kerberos Experimental
  • 44. DFS CAP C P DFS GFS, HDFS A C SPOF
  • 45. 2.6 Hadoop
  • 46. 2.6 Hadoop
  • 47. Hadoop (JobTracker) TaskTracker Map Reduce mapreduce Hadoop API (Java)
  • 48.
  • 49.
      1. Facebook has the world's largest Hadoop cluster!, http://hadoopblog.blogspot.com/2010/05/facebook-has-worlds-largest-hadoop.html
      2. Wikipedia, http://ja.wikipedia.org/wiki/%E3%83%A6%E3%83%BC%E3%83%86%E3%82%A3%E3%83%AA%E3%83%86%E3%82%A3%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%83%86%E3%82%A3%E3%83%B3%E3%82%B0
      3. Hadoop, Tom White, 2009
  • 50. Thank you!
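
The signatures on slide 24, map: (k1, v1) to [(k2, v2)] and reduce: (k2, [v2]) to [(k3, v3)], together with the partitioner and combiner from slides 34 and 35, can be illustrated with a small in-memory word count. This is only a sketch and not Hadoop code: the names mapper, combiner, reducer, partition, group_by_key and run_job are hypothetical, the CRC32 hash is just a deterministic stand-in for a "hash mod number of reducers" partitioner, and the whole job runs in a single process.

```python
import zlib
from collections import defaultdict


def mapper(doc_id, text):
    """map: (k1, v1) -> [(k2, v2)]. Emits (word, 1) for every word."""
    for word in text.lower().split():
        yield word, 1


def combiner(word, counts):
    """Local aggregation on one map task's output; same types in and out."""
    yield word, sum(counts)


def reducer(word, counts):
    """reduce: (k2, [v2]) -> [(k3, v3)]. Sums the partial counts."""
    yield word, sum(counts)


def partition(key, num_reducers):
    """Assumed hash-mod partitioner: picks the reducer for a key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def group_by_key(pairs):
    """Groups values by key, standing in for the shuffle's group-by."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def run_job(splits, num_reducers=2):
    """Runs map, combine, partition, sort and reduce in one process.

    `splits` is a list of input splits; each split is a list of
    (doc_id, text) pairs handled by one simulated map task.
    """
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:
        map_output = [kv for doc_id, text in split
                      for kv in mapper(doc_id, text)]
        # The combiner sees only this map task's output.
        for key, values in group_by_key(map_output).items():
            for out_key, out_value in combiner(key, values):
                buckets[partition(out_key, num_reducers)].append(
                    (out_key, out_value))
    result = {}
    for bucket in buckets:
        grouped = group_by_key(bucket)
        for key in sorted(grouped):  # each reducer sees its keys in sorted order
            for out_key, out_value in reducer(key, grouped[key]):
                result[out_key] = out_value
    return result


if __name__ == "__main__":
    splits = [[("d1", "a b a")], [("d2", "b c")]]
    print(run_job(splits))
```

With the two splits above, the combiner turns the first map task's two (a, 1) pairs into a single (a, 2) before anything is handed to a reducer, which is the saving the combiner is meant to provide. Because summing is associative and commutative, the final counts are the same with or without the combiner.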
