stream processing in go

1
Stream Processing In Go
Khosrow Afroozeh, Sunil Sayyaparaju

2
Streams are the Norm
● The need for business analytics generates endless streams of data
● Horizontal scaling adds to the number of streams
● Stream variety is on the rise
● Streams need to be composed and co-processed

3
Stream
● Arrays
● Slices
● Channels
● Buffers
● Files
● Database Queries
● ...

4
Stream Elements
No generics in Go, so stream elements are boxed objects:
interface{}
● There is no type safety for generic stream processing.
● Not a big deal, really: schemaless data sources return interfaces anyway.
● It can be easily managed by runtime type checking in the first step of the pipeline.
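The runtime check mentioned in the last bullet could look roughly like the sketch below. The function name toInts and the choice of int as the element type are assumptions made for illustration; they do not come from the slides.

```go
package main

import "fmt"

// toInts is a hypothetical first pipeline stage: it checks the dynamic type
// of every boxed element once, so later stages can rely on it.
func toInts(stream []interface{}) ([]int, error) {
	out := make([]int, 0, len(stream))
	for i, v := range stream {
		n, ok := v.(int) // type assertion on the boxed value
		if !ok {
			return nil, fmt.Errorf("element %d: expected int, got %T", i, v)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	ints, err := toInts([]interface{}{1, 2, 3})
	fmt.Println(ints, err) // [1 2 3] <nil>

	_, err = toInts([]interface{}{1, "two"})
	fmt.Println(err) // element 1: expected int, got string
}
```

Checking once at the entry point keeps the rest of the chain free of defensive code, at the cost of failing at run time rather than at compile time.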

5
Classic Collections

6
Traditional Compositions 1
stream 1 <Record>
stream 2 <Cloud>
stream1.Join(stream2).Filter(...)
API Interface Problem

7
Traditional Compositions 2
stream 1 <Record>
stream 2 <Cloud>
Join(stream1, stream2) → stream3
Filter(stream3, ...)
Lots of Gophers Needed for Pipelining
Signature Problem Still Unsolved
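To make the "lots of gophers" point concrete, here is a minimal sketch of function-style composition over channels, where every stage runs in its own goroutine and hands values to the next stage through a channel. The helpers source, filterChan and collect are hypothetical names for this illustration, and Join is left out to keep the sketch short.

```go
package main

import "fmt"

// source emits the given values on a channel from its own goroutine.
func source(values ...interface{}) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for _, v := range values {
			out <- v
		}
	}()
	return out
}

// filterChan starts one more goroutine that forwards only the values
// for which keep returns true.
func filterChan(in <-chan interface{}, keep func(interface{}) bool) <-chan interface{} {
	out := make(chan interface{})
	go func() {
		defer close(out)
		for v := range in {
			if keep(v) {
				out <- v
			}
		}
	}()
	return out
}

// collect drains a channel into a slice.
func collect(in <-chan interface{}) []interface{} {
	var out []interface{}
	for v := range in {
		out = append(out, v)
	}
	return out
}

func main() {
	isEven := func(v interface{}) bool { return v.(int)%2 == 0 }
	isBig := func(v interface{}) bool { return v.(int) > 2 }

	stream3 := filterChan(source(1, 2, 3, 4, 5, 6), isEven)
	fmt.Println(collect(filterChan(stream3, isBig))) // [4 6]
}
```

Three goroutines and three channel hand-offs are involved for two filters; each extra stage adds one more of each.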

8
Problem
● Don’t want to code¹ unless absolutely necessary
● Don’t want to repeat ourselves
● More code leads to more maintenance and testing
¹ Not on company hours at least! YMMV.

9
Abstraction Goals
● Data processing should be decoupled from data structures.
● Compositions should happen on data, not data structures.
Note: <T> denotes a type. This is not valid Go code.
Note: f and m are functions, e.g.:
f(value interface{}) bool
m(value interface{}) interface{}
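As an illustration of these two goals, the sketch below defines one function with the shape of f and one with the shape of m, and reuses the same per-value logic over a slice and over a channel. All names here (isEven, double, process, overSlice, overChannel) are assumptions made for the example, not part of the talk.

```go
package main

import "fmt"

// isEven has the shape of f: it decides whether a value is kept.
func isEven(value interface{}) bool { return value.(int)%2 == 0 }

// double has the shape of m: it turns one value into another.
func double(value interface{}) interface{} { return value.(int) * 2 }

// process composes f and m on a single value. It knows nothing about
// the data structure the value came from.
func process(value interface{}) (interface{}, bool) {
	if !isEven(value) {
		return nil, false
	}
	return double(value), true
}

// overSlice applies process to a slice-backed stream.
func overSlice(values []interface{}) []interface{} {
	var out []interface{}
	for _, v := range values {
		if r, ok := process(v); ok {
			out = append(out, r)
		}
	}
	return out
}

// overChannel applies the same process to a channel-backed stream.
func overChannel(in <-chan interface{}) []interface{} {
	var out []interface{}
	for v := range in {
		if r, ok := process(v); ok {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	data := []interface{}{1, 2, 3, 4}
	fmt.Println(overSlice(data)) // [4 8]

	ch := make(chan interface{}, len(data))
	for _, v := range data {
		ch <- v
	}
	close(ch)
	fmt.Println(overChannel(ch)) // [4 8]
}
```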

10
Abstraction Goals Cont’d
● Data should not be transported during transformation unless necessary.

11
Transducers¹
¹ Idea inspired by Clojure. Fair enough, they got inspired by channels ;)

12
Transducers Impl.

13
Reducer
● Responsible for chaining of the pipeline:
stream → t1 → t2 → … → tn → reducer → result
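The implementation slides did not survive in this transcript, so the following is only a sketch of the chain above in the Clojure-style formulation, where a transducer wraps the next reducing step. The types Reducer and Transducer and the helpers Filter, Map, Chain and Transduce are assumed names and may differ from the speakers' code.

```go
package main

import "fmt"

// Reducer folds one value into an accumulator.
type Reducer func(acc, value interface{}) interface{}

// Transducer wraps the next step of the chain with extra behaviour.
type Transducer func(next Reducer) Reducer

// Filter passes a value on only when f accepts it.
func Filter(f func(interface{}) bool) Transducer {
	return func(next Reducer) Reducer {
		return func(acc, value interface{}) interface{} {
			if f(value) {
				return next(acc, value)
			}
			return acc
		}
	}
}

// Map passes m(value) on to the next step.
func Map(m func(interface{}) interface{}) Transducer {
	return func(next Reducer) Reducer {
		return func(acc, value interface{}) interface{} {
			return next(acc, m(value))
		}
	}
}

// Chain wires t1 → t2 → … → tn → reducer into a single step function.
func Chain(reducer Reducer, ts ...Transducer) Reducer {
	for i := len(ts) - 1; i >= 0; i-- {
		reducer = ts[i](reducer)
	}
	return reducer
}

// Transduce drives a slice-backed stream through the chained step.
func Transduce(stream []interface{}, init interface{}, step Reducer) interface{} {
	acc := init
	for _, v := range stream {
		acc = step(acc, v)
	}
	return acc
}

func main() {
	sum := func(acc, value interface{}) interface{} { return acc.(int) + value.(int) }
	step := Chain(sum,
		Filter(func(v interface{}) bool { return v.(int)%2 == 0 }),
		Map(func(v interface{}) interface{} { return v.(int) * 10 }),
	)
	fmt.Println(Transduce([]interface{}{1, 2, 3, 4}, 0, step)) // 60
}
```

Each value travels through the whole chain as plain function calls, so no intermediate collection or channel is created between stages.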

14
Transducers Impl. Example

15
Transduction
● Flush is used when some function in the chain would like to eject the operation.
● When all the data in the stream has been processed or a flush has been requested, the method Complete() is called to capture the state in the stateful reducers.
The functions in the chain call each other:
f, m => m(f(val))
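One possible reading of the flush and Complete() behaviour is sketched below. Only the method name Complete() comes from the slides; the Stage interface, the boolean flush signal returned by Step, and the types filterStage and sumReducer are assumptions made for this example.

```go
package main

import "fmt"

// Stage is a hypothetical link in the chain (not the speakers' actual API).
// Step returns false to request a flush, i.e. to stop feeding the chain.
type Stage interface {
	Step(value interface{}) bool
	Complete()
}

// filterStage forwards only the values accepted by f.
type filterStage struct {
	f    func(interface{}) bool
	next Stage
}

func (s *filterStage) Step(value interface{}) bool {
	if !s.f(value) {
		return true // value dropped, keep going
	}
	return s.next.Step(value)
}

func (s *filterStage) Complete() { s.next.Complete() }

// sumReducer is a stateful reducer that requests a flush once its running
// sum reaches limit. Complete captures the state into Result.
type sumReducer struct {
	limit, sum int
	Result     int
}

func (r *sumReducer) Step(value interface{}) bool {
	r.sum += value.(int)
	return r.sum < r.limit
}

func (r *sumReducer) Complete() { r.Result = r.sum }

// run feeds the stream through the chain until the data ends or a flush is
// requested, then calls Complete. It returns how many elements were read.
func run(stream []interface{}, chain Stage) int {
	read := 0
	for _, v := range stream {
		read++
		if !chain.Step(v) {
			break
		}
	}
	chain.Complete()
	return read
}

func main() {
	isEven := func(v interface{}) bool { return v.(int)%2 == 0 }
	stream := []interface{}{1, 2, 3, 4, 5, 6}

	r := &sumReducer{limit: 6}
	read := run(stream, &filterStage{f: isEven, next: r})
	fmt.Println(r.Result, read) // 6 4
}
```

In this run the reducer requests a flush after the fourth element, so the last two elements of the stream are never read.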

16
Example

17
Observations
● Cons
– No compile-time type safety
– Tricky to parallelize
● Pros
– Fewer goroutines for long pipelines
– Fewer synchronizations for channels
– Potentially uses less memory
– Decoupled processing logic from data structures
– Better compositions
– More readable

18
Thank You
Khosrow Afroozeh:
● @parshua
Sunil Sayyaparaju