Building a Hadoop Connector
DESCRIPTION
This presentation was given at the HUG London Meetup: SQL and NoSQL on Hadoop – A look at performance. Speakers: Alex Bordei, Techie Product Manager at Bigstep; Calin Burloiu, Big Data Engineer at Avira; and Radu Pastia, Big Data Team Leader at Avira. We worked with Avira to show how much throughput can be squeezed from a Hadoop connector. Together we benchmarked Couchdoop for performance and talked about the behavior you can expect and the tweaks that can improve the performance of your big data setup. If you have any questions, we will be glad to provide additional information.
TRANSCRIPT
pastiaro.wordpress.com
@rpastia
Building a connector – The Wrong Way
Mapper Reducer
Building a connector – The Right Way
Mapper Reducer Partitioner
InputSplit
InputFormat
RecordReader
RecordWriter
OutputFormat
The InputFormat: From Input to Mapper
--range 2014-09-01;2014-09-20
--number_of_mappers 4
2014-09-01 2014-09-02 2014-09-03
2014-09-04
2014-09-05
… … …
2014-09-06
2014-09-20
2014-09-01
2014-09-02
2014-09-05
.
.
.
Input Split 1
(2014-09-01-A; record A)
(2014-09-01-B; record B)
(2014-09-01-…; record …)
(2014-09-02-A; record A)
(2014-09-02-B; record B)
(2014-09-02-…; record …)
(2014-09-05-A; record A)
(2014-09-05-B; record B)
(2014-09-05-…; record …)
Record Reader 1
Mapper
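The slide above turns a date range and a mapper count into input splits. A minimal sketch of that planning step in plain Java follows. It is not Couchdoop code: the class name DateRangeSplitter and its split method are hypothetical, and the choice to give each mapper a contiguous, near-equal run of dates is an assumption, since the slide does not state the distribution rule. In a real connector this logic would sit in the getSplits method of a custom InputFormat, with a RecordReader then emitting the (key; record) pairs for each date in a split.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Couchdoop): plans which dates each mapper reads.
class DateRangeSplitter {

    // Splits the inclusive range [start, end] into at most numMappers
    // contiguous groups of dates whose sizes differ by at most one.
    // Contiguous grouping is an assumption made for this sketch.
    static List<List<LocalDate>> split(LocalDate start, LocalDate end, int numMappers) {
        if (numMappers < 1 || end.isBefore(start)) {
            throw new IllegalArgumentException("need numMappers >= 1 and start <= end");
        }
        long totalDays = ChronoUnit.DAYS.between(start, end) + 1;
        // Never create more splits than there are dates to read.
        int splitCount = (int) Math.min(numMappers, totalDays);
        long base = totalDays / splitCount;
        long remainder = totalDays % splitCount;

        List<List<LocalDate>> splits = new ArrayList<>();
        LocalDate next = start;
        for (int i = 0; i < splitCount; i++) {
            // The first `remainder` splits take one extra date.
            long size = base + (i < remainder ? 1 : 0);
            List<LocalDate> dates = new ArrayList<>();
            for (long d = 0; d < size; d++) {
                dates.add(next);
                next = next.plusDays(1);
            }
            splits.add(dates);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Values taken from the slide: --range 2014-09-01;2014-09-20, --number_of_mappers 4
        List<List<LocalDate>> splits =
            split(LocalDate.of(2014, 9, 1), LocalDate.of(2014, 9, 20), 4);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("Input Split " + (i + 1) + ": " + splits.get(i));
        }
    }
}
```

With the slide's parameters this produces four splits of five dates each. Capping the split count at the number of dates avoids scheduling mappers that would have nothing to read.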