
Stream Processing Engine
王岩 2011-12-8

Agenda
• Architecture
• Multi-threading
• I/O
• Further work

Purpose
• Users can query provenance
[Diagram: stream1, stream2, stream3 and stream4 flow into the stream processing engine, which produces the result.]
Provenance: the data in the original incoming streams from which a result is generated.
Why query provenance? When a user gets a strange result, he may be interested in its provenance.
Requirement: if the provenance of a tuple has not been saved to disk, the tuple must not be sent to the user.

Architecture: layered architecture
• SPE layer: stream processing engine
• Buffer layer: buffer for provenance
• File layer: disk I/O
The layer below provides services to the layer above; the layer above invokes the interface provided by the layer below.

SPE layer: component view
• Metadata: stores the metadata of streams, including CQL statements and data types
• CQL parser: parses a CQL statement and generates the query plan tree
• Query plan processor: processes tuples along the query plan tree
• Utility: provides common services

Query plan tree entity
[Diagram: query plan tree with four leaf operators at the bottom feeding select and join operators, which feed a single root operator at the top.]
Operator types: leaf operator, select operator, join operator, root operator.

Operator class diagram
• OperatorEntity (base class): list<QueueEntity*> queueInput; list<QueueEntity*> queueOutput; string id; ...
• LeafOperatorEntity, SelectOperatorEntity: derived from OperatorEntity
• JoinOperatorEntity: RelationWindow* relationWindow1; RelationWindow* relationWindow2; ...
• RootOperatorEntity: list<RelationTuple*> waitTupleList; ...
Each operator of the query plan tree (leaf, select, join, root) is an instance of the corresponding entity class, identified by its id.
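A minimal C++ sketch of this hierarchy, assuming placeholder forward declarations for QueueEntity, RelationTuple and RelationWindow; member names follow the class diagram, everything else is illustrative:

#include <list>
#include <string>

struct QueueEntity;      // defined on the queue slides
struct RelationTuple;    // defined on the tuple slide
struct RelationWindow;   // window over a relation, used by joins

// Base class shared by every node of the query plan tree.
struct OperatorEntity {
    std::list<QueueEntity*> queueInput;   // upstream queues
    std::list<QueueEntity*> queueOutput;  // downstream queues
    std::string id;                       // operator identifier
    virtual ~OperatorEntity() {}
};

struct LeafOperatorEntity : OperatorEntity {};

struct SelectOperatorEntity : OperatorEntity {};

struct JoinOperatorEntity : OperatorEntity {
    RelationWindow* relationWindow1;      // window over one input
    RelationWindow* relationWindow2;      // window over the other input
};

struct RootOperatorEntity : OperatorEntity {
    std::list<RelationTuple*> waitTupleList;  // tuples waiting for their provenance to be stored
};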

Query plan tree entity: queues
[Diagram: the same query plan tree, with queues connecting the operators.]
Queue types: common queue, storage queue, transportation queue.

Queue class diagram
• QueueEntity (base class): OperatorEntity* operatorInput; OperatorEntity* operatorOutput; RelationSchema* relationSchema; list<RelationTuple*> tupleList; ... void Push(); RelationTuple& pop()
• StorageQueueEntity: void Push()
• TransportationQueueEntity: void Push()
• StorageTransportationQueueEntity: void Push()
Data flow example schema: attribute1: integer, attribute2: integer, attribute3: string.

Queue entity: memory management
• A block of contiguous memory is used as the buffer of the queue.
• Head: the head of the tuples in the buffer. Tail: the tail of the tuples in the buffer.
• We do not allocate memory for each tuple; we allocate memory for the whole queue, and the tuples are saved in the buffer of the queue.
• When the queue is initialized, head and tail both point to the beginning address of the buffer.
• When a tuple arrives, the head moves forward by the length of one tuple.
• When a tuple leaves, the tail moves forward by the length of one tuple.
• When there is no space for a new tuple, an exception is thrown; a load-shedding algorithm is needed here.
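A minimal sketch of this scheme, assuming fixed-length tuples and a simple non-wrapping buffer as described above; all names and sizes are illustrative:

#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// One contiguous buffer per queue; head advances on push, tail on pop.
class TupleBuffer {
public:
    TupleBuffer(size_t capacityBytes, size_t tupleLength)
        : buffer(capacityBytes), tupleLength(tupleLength), head(0), tail(0) {}

    // A tuple arrives: copy it at head and move head forward one tuple length.
    void push(const char* tupleBytes) {
        if (head + tupleLength > buffer.size())
            throw std::runtime_error("no space: load shedding needed");
        std::memcpy(&buffer[head], tupleBytes, tupleLength);
        head += tupleLength;
    }

    // A tuple leaves: return its address and move tail forward one tuple length.
    const char* pop() {
        if (tail == head)
            throw std::runtime_error("queue empty");
        const char* t = &buffer[tail];
        tail += tupleLength;
        return t;
    }

private:
    std::vector<char> buffer;  // contiguous memory of the queue
    size_t tupleLength;        // fixed tuple length
    size_t head;               // write position (tuples enter here)
    size_t tail;               // read position (tuples leave here)
};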

Tuple entity
RelationTuple members:
• BYTE* bytes: the beginning address of the buffer. If the tuple is in a queue, it uses the buffer of the queue; if not, it creates its own buffer.
• int tuplePosition: the offset of the tuple in the buffer.
• int tupleLength: the tuple length.
• TIME timestamp: the timestamp of the tuple.
• RelationSchema* relationSchema: the relation schema of the tuple.
• map<string, list<int> > idMap: the map that saves the provenance of the tuple, e.g. idMap["s1"] = {id1, id2}, idMap["s2"] = {id4}.
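A compact C++ sketch of RelationTuple with the members listed above; BYTE, TIME and RelationSchema are assumed to be defined elsewhere in the system and are stubbed here for illustration:

#include <list>
#include <map>
#include <string>

typedef unsigned char BYTE;   // assumed raw-byte type
typedef long long TIME;       // assumed timestamp type
struct RelationSchema;        // declared elsewhere (attribute names and types)

struct RelationTuple {
    BYTE* bytes;                       // buffer start: the queue's buffer, or the tuple's own buffer
    int tuplePosition;                 // offset of this tuple inside the buffer
    int tupleLength;                   // length of the tuple in bytes
    TIME timestamp;                    // timestamp of the tuple
    RelationSchema* relationSchema;    // schema describing the attributes
    std::map<std::string, std::list<int> > idMap;  // provenance: stream name -> source tuple ids
};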

Buffer layer
The BufferControl class provides the interface of the layer. The upper layer does not need to interact with any other class in the buffer layer, so if we change the implementation of the buffer layer we do not need to change the code of the layer above, as long as we keep the same interface.
Design patterns: facade and singleton.
BufferControl members: instance (singleton), ...
BufferControl interface: getInstance(), insert(), delete(), toBeStored(), storing(), isStored(), query().
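A minimal sketch of the facade/singleton shape of BufferControl; the method names come from the slide, but all parameter types and bodies are illustrative placeholders (delete is renamed remove because it is a C++ keyword):

#include <string>

// Facade of the buffer layer: the SPE layer talks only to this class.
class BufferControl {
public:
    // Singleton access point.
    static BufferControl& getInstance() {
        static BufferControl instance;   // created once, on first use
        return instance;
    }

    // Interface listed on the slide; signatures and bodies are illustrative stubs.
    void insert(const std::string&, int, const char*, int) { /* copy the tuple into the buffer layer */ }
    void remove(const std::string&, int) { /* release provenance pages that are no longer needed */ }
    void toBeStored(const std::string&, int) { /* remember this id for the storing thread */ }
    void storing() { /* scan pending ids and write their provenance to the file layer */ }
    bool isStored(const std::string&, int) { return false; /* would consult the bitmap pages */ }
    const char* query(const std::string&, int) { return 0; /* would search tuple/query pages, then disk */ }

private:
    BufferControl() {}                                  // no public construction
    BufferControl(const BufferControl&);                // non-copyable (C++03 style)
    BufferControl& operator=(const BufferControl&);
};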

Provenance life cycle
The life cycle is driven through the BufferControl interface:
1. The provenance arrives at the system and is pushed into a queue. This is a storage queue: every tuple pushed into it must be stored in memory. The insert function is called and the tuple is copied into memory.
2. The tuple is then processed along the query plan tree.
3. When the tuple arrives at a transportation queue for the stream, the toBeStored function is called, which inserts the tuple's id into a map of tuples to be stored.
4. At some time another thread, the storing thread, calls the storing function. It scans the map to see which provenance should be stored and writes that provenance to the file.
5. When the tuple reaches the root operator, the isStored function is called to check whether its provenance has been saved. Only if the provenance has been stored is the tuple output to the client.
6. At some time the system calls the delete function, and the provenance may be deleted from memory.
7. Another thread may call the query function to query provenance.

Page
A page is a block of contiguous memory (for example 4 KB, 16 KB, ..., 56 KB). In this system, pages are used to store two kinds of objects:
• Pages for tuples: each page holds a number of tuples.
• Pages for bitmaps: each bit marks the state of one tuple, 0 = not saved, 1 = saved.
Why use a bitmap? Because we must record the state (saved / not saved) of every tuple, and with a bitmap one bit per tuple is enough. For a stream of 10 KB/s with 8-byte tuples, that is 10*1024/8 = 1280 tuples per second; compared with one byte per tuple, the bitmap saves 0.875 bytes per tuple, i.e. 1280 * 0.875 = 1120 bytes/s ≈ 1.12 KB/s.
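A minimal bitmap sketch along these lines, one bit per tuple; the names are illustrative:

#include <cstddef>
#include <vector>

// One bit per tuple: 0 = provenance not saved yet, 1 = saved.
class SavedBitmap {
public:
    explicit SavedBitmap(size_t tupleCount)
        : bits((tupleCount + 7) / 8, 0) {}

    void markSaved(size_t index)       { bits[index / 8] |= (1u << (index % 8)); }
    bool isSaved(size_t index) const   { return (bits[index / 8] >> (index % 8)) & 1u; }

private:
    std::vector<unsigned char> bits;   // in the real system these bytes live inside a bitmap page
};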

Architecture of the buffer layer
[Diagram: a hash table maps stream names (s1, s2, ...) to vectors of page pointers; the pages hold tuples and bitmaps; the pages themselves come from a global buffer with an unused page list and a used page list.]
• Each stream name is hashed to a vector.
• Each vector stores the pointers of the pages that hold the data of one stream.
• The hash table has its own buffer of pages.
• The global buffer: every page that is needed is allocated from here.
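A rough C++ sketch of these structures, using std::map as a stand-in for the real hash table and ignoring locking; the type and member names are illustrative:

#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <vector>

struct Page {
    std::vector<unsigned char> bytes;          // contiguous page memory
    explicit Page(size_t size) : bytes(size) {}
};

// Global buffer: all pages are pre-allocated and handed out on demand.
struct GlobalBuffer {
    std::list<Page*> unused;                   // free pages
    std::list<Page*> used;                     // pages currently in use

    Page* getOnePageToUse() {
        if (unused.empty()) return 0;          // out of pages
        Page* p = unused.front();
        unused.pop_front();
        used.push_back(p);
        return p;
    }
};

// Per-stream vector of page pointers, plus the id of the first tuple it covers.
struct StreamVector {
    std::vector<Page*> pages;
    int firstId;                               // identifier of the first tuple held by pages[0]
};

// Hash table: stream name -> vector of page pointers (std::map used as a stand-in).
typedef std::map<std::string, StreamVector> PageHashTable;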

Architecture of the buffer layer: insert tuple, O(1)
[Diagram: the same hash table, per-stream vectors, buffer and global buffer as above.]
Suppose a page is 100 bytes and a tuple of stream1 is 10 bytes, so a page can store 10 tuples. Now a tuple from stream1 with id 21 arrives.
1. First, look up the hash table to find the vector for stream1.
2. Second, check whether the last page in the vector has space for the tuple. If yes, insert the tuple into that page.
3. In this example each page holds only 10 tuples and the two existing pages already hold 20 tuples, so there is no space for the tuple with id 21. A new page is therefore allocated from the buffer.
4. The buffer of the hash table allocates the page from the global buffer; the global buffer simply moves a page from the unused list to the used list and returns it.
5. The page is added to the buffer and to the vector, and the tuple is inserted into it. The same is done for the bitmap.

Buffer layer: sequence diagram for inserting a tuple
Participants: BufferControl, PageHashTable, PageVector, ProvenanceBuffer, StreamBuffer.
Calls: ifStreamExist(streamName) (returns true), getInsertablePage(int id), getMorePage(), getOnePageToUse(), getPage(int id), push(data); each page request returns a page back up the call chain.
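A sketch of the O(1) insert path described on the last two slides, assuming 100-byte pages, fixed-length tuples and a std::map standing in for the hash table; the global buffer and the bitmap handling are omitted for brevity:

#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

struct Page {
    std::vector<unsigned char> bytes;
    size_t usedBytes;
    explicit Page(size_t size) : bytes(size), usedBytes(0) {}
    bool hasSpace(size_t tupleLen) const { return usedBytes + tupleLen <= bytes.size(); }
    void push(const unsigned char* tuple, size_t tupleLen) {
        std::memcpy(&bytes[usedBytes], tuple, tupleLen);
        usedBytes += tupleLen;
    }
};

struct StreamVector { std::vector<Page*> pages; int firstId; };

// O(1) insert: find the stream's vector, use its last page if it has space,
// otherwise take a fresh page (new'ed here; the real system would take it
// from the global buffer's unused list instead).
void insertTuple(std::map<std::string, StreamVector>& table,
                 const std::string& stream,
                 const unsigned char* tuple, size_t tupleLen) {
    StreamVector& v = table[stream];
    if (v.pages.empty() || !v.pages.back()->hasSpace(tupleLen))
        v.pages.push_back(new Page(100));      // page size taken from the example
    v.pages.back()->push(tuple, tupleLen);
}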

Architecture of the buffer layer: find tuple, O(1)
[Diagram: the same hash table, per-stream vectors, buffer and global buffer as above.]
Suppose a page is 100 bytes, a tuple of stream1 is 10 bytes, and the first id stored in the vector is 31. Now we want to find the tuple with identifier 45.
• (45 - 31) / 10 = 1: this is the index of the page in the vector.
• 45 - 31 - 10 * 1 = 4: this is the offset of the tuple within that page.
So the tuple is in the page with index 1 in the vector, at offset 4 within the page. The bitmap is located the same way.
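The lookup arithmetic as a small function, with the tuples-per-page value derived from the page and tuple sizes in the example (100-byte page / 10-byte tuple = 10); the names are illustrative:

#include <cstdio>

struct TupleLocation {
    int pageIndex;     // index of the page in the stream's vector
    int tupleOffset;   // index of the tuple slot within that page
};

// O(1) lookup: compute where tuple `id` lives, given the first id held by the
// vector and how many tuples fit in one page.
TupleLocation locateTuple(int id, int firstId, int tuplesPerPage) {
    TupleLocation loc;
    int delta = id - firstId;
    loc.pageIndex = delta / tuplesPerPage;
    loc.tupleOffset = delta - loc.pageIndex * tuplesPerPage;
    return loc;
}

int main() {
    TupleLocation loc = locateTuple(45, 31, 10);
    std::printf("page %d, offset %d\n", loc.pageIndex, loc.tupleOffset);  // page 1, offset 4
    return 0;
}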

Release the memory
• If we never release the memory used for saving provenance, memory would run out quickly.
• We do not release memory one tuple at a time; we release memory one page at a time.
• We look at every provenance identifier still referenced in the query plan tree; these identifiers are considered useful and all others useless. A page that contains no useful tuples can then be released.
[Diagram: the query plan tree (leaf, select, join, root operators) whose operators still reference provenance identifiers.]

Architecture of the buffer layer: delete tuples, O(nmp)
[Diagram: the same hash table, per-stream vectors, buffer and global buffer as above.]
Suppose a page is 100 bytes and a tuple of stream1 is 10 bytes. To release memory we scan along the query plan tree and find that the useful identifiers of stream1 are 13, 14 and 16, while the first id of the vector is 1.
• The first page of the vector (ids 1..10) therefore contains no useful tuples, so we release it.
• We simply flush the page and move it from the used page list back to the unused list of the global buffer.
• Then we delete the page pointer from the buffer and the vector and update the first id of the vector. The bitmap pages are handled the same way.
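A sketch of releasing the leading pages that hold no useful tuples, reusing the illustrative StreamVector/global-buffer shapes sketched earlier (redeclared minimally here); the smallest id still referenced by the query plan tree is passed in, and flushing to disk is omitted:

#include <list>
#include <vector>

struct Page { /* contiguous page memory, omitted */ };

struct StreamVector {
    std::vector<Page*> pages;   // pages of one stream, oldest first
    int firstId;                // id of the first tuple held by pages[0]
};

// Release every leading page whose tuples are all older than the smallest id
// still referenced by the query plan tree. Released pages go back to the
// global buffer's unused list.
void releaseUselessPages(StreamVector& v, int minUsefulId, int tuplesPerPage,
                         std::list<Page*>& unusedPages) {
    while (!v.pages.empty() && v.firstId + tuplesPerPage <= minUsefulId) {
        unusedPages.push_back(v.pages.front());          // page is free again
        v.pages.erase(v.pages.begin());                  // drop it from the vector
        v.firstId += tuplesPerPage;                      // vector now starts one page later
    }
}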

User query for provenance: data flow diagram
Three cases:
• The user does not query for provenance.
• The user queries for provenance, and the provenance is in the pages in memory.
• The user queries for provenance, and the provenance is not in the pages in memory.

Architecture of the buffer layer: query provenance
[Diagram: the tuple pages, the bitmap pages, and a separate set of query pages that cache data read back from disk.]
• When we query the provenance with identifier 31 in stream1, we first check whether it is still in the buffer of tuple pages; if so, we read it from that page.
• If not, we check whether it is in one of the query pages. The buffer for query pages has a fixed size; for example, we can set it to at most 5 pages.
• If it is in neither, we must read it from disk. We always read one full page of data at a time.
• If the query buffer is full, one page must be evicted. The strategy may be LFU (least frequently used): for example, we flush the last page in the query buffer, read the data from disk into it, and then put that page at the beginning of the buffer.
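A sketch of the small query-page cache with the eviction behaviour described above (evict the page at the back, load from disk, put the loaded page at the front); the disk read is stubbed out and the 5-page capacity comes from the example:

#include <cstddef>
#include <list>
#include <vector>

struct QueryPage {
    int firstId;                          // first tuple id covered by this page
    std::vector<unsigned char> bytes;     // one page of provenance read from disk
};

class QueryPageCache {
public:
    explicit QueryPageCache(size_t capacity) : capacity(capacity) {}

    // Return the cached page covering `id`, loading it from disk if necessary.
    QueryPage* get(int id, int tuplesPerPage) {
        for (std::list<QueryPage>::iterator it = pages.begin(); it != pages.end(); ++it)
            if (id >= it->firstId && id < it->firstId + tuplesPerPage)
                return &*it;                              // cache hit
        if (pages.size() == capacity)
            pages.pop_back();                             // evict the page at the back
        pages.push_front(readPageFromDisk(id, tuplesPerPage));  // new page goes to the front
        return &pages.front();
    }

private:
    // Stub: the real system reads one page of provenance from the file layer.
    QueryPage readPageFromDisk(int id, int tuplesPerPage) {
        QueryPage p;
        p.firstId = (id / tuplesPerPage) * tuplesPerPage;
        return p;
    }

    size_t capacity;              // e.g. 5 pages
    std::list<QueryPage> pages;   // front = most recently loaded
};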

Buffer layer: abstract factory design pattern
The client code does not need to know the implementation details of the tuple, bitmap and query pages.
• AbstractBufferFactory: Page* createPage(); PageHashTable* createPageHashTable(); PageVector* createPageVector(); ProvenanceBuffer* createProvenanceBuffer()
• Abstract products: AbstractPage, AbstractPageHashTable, AbstractPageVector, AbstractProvenanceBuffer
• Concrete factories: TupleBufferFactory, BitMapBufferFactory, QueryFactory
• Concrete products: TuplePage, TuplePageHashTable, TupleVector, TupleProvenanceBuffer; BitMapPage, BitMapPageHashTable, BitMapPageVector, BitMapProvenanceBuffer; QueryPage, QueryPageHashTable, QueryPageVector, QueryProvenanceBuffer
BufferControl (singleton: getInstance(), insert(), delete(), toBeStored(), storing(), isStored(), query()) uses these factories.
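A trimmed-down C++ sketch of the abstract factory: only the Page product family is shown, the other products follow the same pattern, and all class bodies are placeholders:

// Abstract product: the client only sees this interface.
struct AbstractPage {
    virtual ~AbstractPage() {}
};

// Concrete products: one page type per buffer family.
struct TuplePage  : AbstractPage {};
struct BitMapPage : AbstractPage {};
struct QueryPage  : AbstractPage {};

// Abstract factory: creates a whole family of related objects.
struct AbstractBufferFactory {
    virtual ~AbstractBufferFactory() {}
    virtual AbstractPage* createPage() = 0;
    // createPageHashTable(), createPageVector(), createProvenanceBuffer()
    // would be declared here in the same way.
};

struct TupleBufferFactory : AbstractBufferFactory {
    AbstractPage* createPage() { return new TuplePage(); }
};

struct BitMapBufferFactory : AbstractBufferFactory {
    AbstractPage* createPage() { return new BitMapPage(); }
};

struct QueryFactory : AbstractBufferFactory {
    AbstractPage* createPage() { return new QueryPage(); }
};

// Client code (e.g. BufferControl) works only with the abstract types:
//   AbstractBufferFactory* factory = new TupleBufferFactory();
//   AbstractPage* page = factory->createPage();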

Multi-threading
• Main thread: does most of the work, including receiving data from the streams.
• Storing thread: saves provenance.
• I/O thread: handles I/O with clients, including registering streams, registering CQLs and querying provenance.
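A minimal sketch of this three-thread split using C++11 std::thread; the three loop bodies are placeholders for the work described above:

#include <thread>

void mainLoop()    { /* receive stream data, process tuples along the query plan tree */ }
void storingLoop() { /* periodically call the storing function to save provenance */ }
void ioLoop()      { /* register streams and CQLs, answer provenance queries from clients */ }

int main() {
    std::thread storingThread(storingLoop);   // thread 2: saves provenance
    std::thread ioThread(ioLoop);             // thread 3: client I/O
    mainLoop();                               // thread 1: the main thread itself
    storingThread.join();
    ioThread.join();
    return 0;
}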

Locks: insert (thread 1), read-write locks, tuple pages
Data structures and their types: ProvenanceMap (map), hash table (map), vector (vector), buffer (list), global buffer (list), page (unsigned char[]). These structures are thread-unsafe, so each access is protected by a read-write lock.
[Table: per-structure sequence of lock operations (initialized, read/~read, write/~write) taken while inserting a tuple.]

Locks: to be stored (thread 1), read-write lock
Same data structures and types as above.
[Table: a single write/~write lock pair is taken for this operation.]

Locks: is tuple stored (thread 1), read-write locks, bitmap pages
Same data structures and types as above.
[Table: per-structure sequence of read/~read lock operations taken while checking the bitmap.]

Locks: delete (thread 1), read-write locks, tuple pages
Same data structures and types as above.
[Table: per-structure sequence of read/~read and write/~write lock operations taken while releasing pages.]

Locks: storing (thread 2), read-write locks, tuple and bitmap pages
Same data structures and types as above.
[Table: per-structure sequence of read/~read lock operations, plus a trywrite/~write pair, taken while storing provenance.]

Locks: query (thread 3), read-write locks, tuple and query pages
Same data structures and types as above.
[Table: per-structure sequence of read/~read (including tryread) and write/~write lock operations taken while answering a query.]

Lock optimization
• We should reduce the cost of lock management while increasing concurrency.
• The lock on the buffer is unnecessary because the threads never conflict on it; we can get rid of it.
• The lock on the global buffer can be changed to a mutex.
• For some less important operations we can just use trylock (tryread / trywrite).

Lock performance analysis
For the read-write lock we use:
• It allows concurrent access by multiple reading threads.
• It restricts write access to a single thread.
• It is write-preferring.
• The smallest locking granularity is one page.
Where performance is lost: whenever we need to operate on a single page.
• Pages for tuples: readers are the storing thread and the query thread; the writer is the main thread.
• Pages for bitmaps: the reader is the main thread; the writer is the storing thread.
• Pages for query: everything is done in the I/O thread.
Conclusion: likely to improve performance, but experiments are needed.
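A small sketch of page-granularity read-write locking using POSIX pthread_rwlock_t; whether the lock is write-preferring depends on platform-specific attributes, whereas the slide assumes a write-preferring lock, and the LockedPage type is illustrative:

#include <pthread.h>
#include <vector>

// One read-write lock per page: readers may share it, a writer gets it alone.
struct LockedPage {
    std::vector<unsigned char> bytes;
    pthread_rwlock_t lock;

    explicit LockedPage(size_t size) : bytes(size) {
        pthread_rwlock_init(&lock, 0);   // default attributes
    }
    ~LockedPage() { pthread_rwlock_destroy(&lock); }
};

// Reader side (e.g. the storing or query thread scanning a tuple page).
void readPage(LockedPage& page) {
    pthread_rwlock_rdlock(&page.lock);
    // ... read tuples from page.bytes ...
    pthread_rwlock_unlock(&page.lock);
}

// Writer side (e.g. the main thread inserting a tuple into the page).
void writePage(LockedPage& page) {
    pthread_rwlock_wrlock(&page.lock);
    // ... append a tuple to page.bytes ...
    pthread_rwlock_unlock(&page.lock);
}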

Concurrency control: still being studied.

File layer
• When writing a tuple to the file: get the offset of the tail of the file, append the tuple at the tail, flush the buffer, and add the offset and the tuple identifier to the index.
• A partitioned hash is used to implement the two-dimensional index.
[Diagram: a tuple being appended to the disk file.]
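A sketch of the append-and-index step; the two-dimensional (stream, identifier) index is stubbed as a simple std::map instead of the partitioned hash the design calls for, and the signatures are illustrative:

#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (stream name, tuple id) -> byte offset in the provenance file.
typedef std::map<std::pair<std::string, int>, long> ProvenanceIndex;

// Append one tuple's provenance at the tail of the file and record its offset.
bool writeTuple(std::FILE* file, ProvenanceIndex& index,
                const std::string& stream, int id,
                const std::vector<unsigned char>& bytes) {
    if (bytes.empty()) return true;
    if (std::fseek(file, 0, SEEK_END) != 0) return false;   // go to the tail of the file
    long offset = std::ftell(file);                          // remember where the tuple starts
    if (std::fwrite(&bytes[0], 1, bytes.size(), file) != bytes.size()) return false;
    std::fflush(file);                                       // flush the buffer to disk
    index[std::make_pair(stream, id)] = offset;              // add (stream, id) -> offset
    return true;
}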

I/O
[Diagram: the system receives stream1..stream4 and handles client requests for registering streams, registering CQLs and querying provenance.]
• We implement this I/O in the main thread, so it must be non-blocking I/O.
• We do not use one thread per I/O connection; all of them are handled in one thread. With I/O multiplexing, that thread can simply block when there is nothing to read or write.

What is I/O multiplexing?
• It is used when an application needs to handle multiple I/O descriptors at the same time and I/O on any one descriptor could block.
• The application blocks until any of the registered I/O descriptors becomes ready to read, ready to write, or raises an exception.

epoll
• epoll is a scalable I/O event notification mechanism in Linux.
• It is meant to replace the older POSIX select and poll system calls.
[Diagram: select represents the file descriptors of interest (fd 0..4) as read and write bitmaps, one bit per descriptor.]
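A minimal Linux epoll sketch for the I/O thread; the listening-socket setup is omitted, and listenFd and the event-array size are placeholders:

#include <sys/epoll.h>
#include <unistd.h>
#include <cstdio>

// Wait on many descriptors with one call instead of one thread per connection.
void ioLoop(int listenFd) {
    int epfd = epoll_create1(0);                       // create the epoll instance
    if (epfd < 0) { std::perror("epoll_create1"); return; }

    epoll_event ev;
    ev.events = EPOLLIN;                               // interested in readability
    ev.data.fd = listenFd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listenFd, &ev);     // register the listening socket

    epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);      // block until some fd is ready
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (fd == listenFd) {
                // accept() a new client, register it with epoll_ctl(ADD), then
                // read its request: register stream, register CQL, or query provenance.
            } else {
                // handle the client request on this descriptor.
            }
        }
    }
}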

Further work
• Implement the multi-threading design: use a separate thread to save the provenance.
• Implement the file layer design: add an index over the provenance saved in the file.
• Implement the I/O design.

Thank you