SPUD: A Distributed High-Performance Publish-Subscribe Cluster
Uriel Peled and Tal Kol
Guided by Edward Bortnikov
Software Systems Laboratory, Faculty of Electrical Engineering, Technion
Project Goal
Design and implement a general-purpose publish-subscribe server
Push past traditional implementations to global-scale performance demands:
- 1 million concurrent clients
- Millions of concurrent topics
- High transaction rate
Demonstrate the server's abilities with a fun client application
What is Pub/Sub?
A client subscribes to a topic such as topic://traffic-jams/ayalon; when anyone publishes a message to that topic ("accident in hashalom"), every subscriber receives a copy.
What Can We Do With It?
- Collaborative web browsing: browsing events published to a shared topic reach all the other participants
- Instant messaging: a "Hi buddy!" published to a conversation topic is delivered to the buddy on the other end
Seems Easy To Implement, But…
- "I'm behind a NAT, I can't connect!" Not all client setups are server-friendly.
- "Server is too busy, try again later?!" 1 million concurrent clients is simply too much.
- "The server is so slow!!!" Service time grows exponentially with load.
- "A server crashed, everything is lost!" Single points of failure will eventually fail.
Naïve Implementation (example 1)
Simple UDP for client-server communication:
- No need for sessions, since we just send messages
- Very low cost per client
Sounds perfect? Not once a NAT sits between the client and the server.
NAT Traversal
Option 1: UDP hole punching
- The NAT will accept a UDP reply only for a short window (our measurements: 15-30 seconds)
- So each client must keep pinging over UDP every 15 s to hold the mapping open
Option 2: days-long TCP sessions
- The NAT remembers current sessions for replies; if the WWW works, we should work
- Dramatically increases the cost per client
- Our research: all IMs do exactly this
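The hole-punching option can be sketched as a keepalive-interval problem: the client must ping no slower than the worst-case mapping lifetime it measured. The helper names and the tick-based simulation below are illustrative assumptions, not SPUD's code.

```cpp
#include <algorithm>

// Pick a ping interval that keeps the NAT's UDP mapping alive.
// Measurements put mapping lifetime at 15-30 s, so we must assume
// the worst case (15 s) and subtract an optional safety margin.
int PingIntervalSeconds(int worstCaseMappingSeconds, int marginSeconds) {
    return std::max(1, worstCaseMappingSeconds - marginSeconds);
}

// Simulate (in 1 s ticks) whether pinging every `interval` seconds keeps
// a mapping with lifetime `mappingLifetime` open for `total` seconds.
bool MappingStaysOpen(int interval, int mappingLifetime, int total) {
    int lastPing = 0;  // each ping refreshes the mapping
    for (int t = 1; t <= total; ++t) {
        if (t - lastPing > mappingLifetime) return false;  // mapping expired
        if (t % interval == 0) lastPing = t;
    }
    return true;
}
```

With a 15 s worst-case lifetime, a 15 s ping interval survives indefinitely, while a 20 s interval lets the mapping expire between pings.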
Naïve Implementation (example 2)
Blocking I/O with one thread per client:
- The basic model for most servers (the Java default)
- Traditional UNIX: fork for every client
Sounds perfect? Not with a million clients.
[Diagram: groups of 500 clients, each group served by its own thread]
Network I/O Internals
Blocking I/O: one thread per client
- With a 2 MB stack per thread, 1 GB of virtual address space is enough for only 512 threads (!)
Non-blocking I/O: select
- Linear fd searches are very slow
Asynchronous I/O: completion ports
- A thread pool handles request completions
- Our measurements: 30,000 concurrent clients!
What is the bottleneck then?
- The number of locked pages (addressed with zero-byte receives)
- Non-paged pool allocations by the TCP/IP kernel driver
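The 512-thread figure above is simple address-space arithmetic, which a sketch can make explicit (the function name is illustrative):

```cpp
#include <cstdint>

// Back-of-the-envelope limit for thread-per-client servers: how many
// threads fit in a given amount of virtual address space when each
// thread reserves a fixed-size stack?
std::uint64_t MaxThreads(std::uint64_t virtualSpaceBytes,
                         std::uint64_t stackBytesPerThread) {
    return virtualSpaceBytes / stackBytesPerThread;
}
```

Plugging in the slide's numbers, 1 GB of virtual space divided by a 2 MB default stack reservation yields exactly 512 threads, long before CPU or memory bandwidth becomes the issue.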
Scalability
- Scale up: buy a bigger box
- Scale out: buy more boxes
Which one to do? Both!
- Push each box to its hardware maximum; thousands of servers are impractical
- Add the relevant kind of box as load increases (the Google way: farms of cheap PC servers)
Identify Our Load Factors
Concurrent TCP clients
- Scale up: async I/O, zero-byte receives, larger non-paged pool
- Scale out: dedicate boxes to handling clients => Connection Server (CS)
High transaction throughput (topic load)
- Scale up: software optimizations
- Scale out: dedicate boxes to handling topics => Topic Server (TS)
Design the cluster accordingly
Network Architecture
[Diagram: three rooms, each containing Connection Servers (C1-C3) and Topic Servers (T1-T3), fronted by two Client Load Balancers (CLB1, CLB2)]
Client Load Balancing
[Sequence: a client, a CLB, connection servers CS1-CS3, topic servers TS1-TS2]
1. The client asks the CLB for a CS ("request CS")
2. The CLB load-balances on user location and per-CS client load
3. The CLB answers with a server ("given CS2")
4. The client connects to that CS for login, subscribe, and publish
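The CLB decision described above (location first, then CS client load) can be sketched as a two-key comparison. The struct fields and function name are assumptions for illustration, not SPUD's actual API:

```cpp
#include <string>
#include <vector>

// Minimal model of what the CLB knows about each connection server.
struct CsInfo {
    std::string name;
    int room;        // which room the CS lives in (user-location proxy)
    int clientLoad;  // current number of connected clients
};

// Prefer a CS in the client's own room; among equally-local servers,
// prefer the one with the lowest client load.
const CsInfo* ChooseCs(const std::vector<CsInfo>& servers, int clientRoom) {
    const CsInfo* best = nullptr;
    for (const CsInfo& cs : servers) {
        if (!best) { best = &cs; continue; }
        bool csLocal = cs.room == clientRoom;
        bool bestLocal = best->room == clientRoom;
        if (csLocal != bestLocal) {
            if (csLocal) best = &cs;          // location wins first
        } else if (cs.clientLoad < best->clientLoad) {
            best = &cs;                        // then lower load wins
        }
    }
    return best;
}
```

A client in room 1 choosing among {CS1: room 0, load 900}, {CS2: room 1, load 500}, {CS3: room 1, load 700} would be given CS2.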
Topic Load Balancing: Static
The CS hashes the topic name and takes the result modulo the number of topic servers to find the owning TS.
[Diagram: a CS in Room 0 routing among TS0-TS3]
Example: subscribe to "traffic" => hash = 923481; 923481 % 4 = 1 => TS1
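The static scheme above is a one-line routing rule once the topic is hashed. The polynomial string hash below is an ordinary choice for illustration; SPUD's real CalcTopicHash may differ, but the modulo routing matches the slide:

```cpp
#include <string>

// Illustrative topic hash: an ordinary 31-multiplier polynomial
// string hash (not necessarily SPUD's actual CalcTopicHash).
unsigned CalcTopicHash(const std::string& topic) {
    unsigned h = 0;
    for (char c : topic)
        h = h * 31 + static_cast<unsigned char>(c);
    return h;
}

// Static routing rule from the slide: owning TS = hash % #topic-servers.
unsigned GetTsFromHash(unsigned hash, unsigned numTopicServers) {
    return hash % numTopicServers;
}
```

With the slide's hash value of 923481 and four topic servers, the subscribe is routed to TS1. The appeal of the static scheme is that every CS computes the same owner with no coordination; the drawback is that it ignores actual per-TS load.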
Topic Load Balancing: Dynamic
[Diagram: a CS in Room 0; replicas of TS1 in Rooms 0, 1, and 2]
The subscribe is forwarded from room to room, collecting each room's TS load along the way:
1. Room 0 records its load: R0: 345K, R1: ?, R2: ?
2. Room 1 adds its load: R0: 345K, R1: 278K, R2: ?
3. Room 2 adds its load: R0: 345K, R1: 278K, R2: 301K
4. The least-loaded replica (R1: 278K) handles the subscribe
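Once every room has appended its load, the dynamic decision reduces to an arg-min over the collected values. The vector-of-loads representation is an assumption for illustration:

```cpp
#include <cstddef>
#include <vector>

// After the subscribe has circulated through all rooms, pick the room
// whose TS replica reported the lowest subscription load.
std::size_t LeastLoadedRoom(const std::vector<long>& loadPerRoom) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < loadPerRoom.size(); ++i)
        if (loadPerRoom[i] < loadPerRoom[best])
            best = i;
    return best;
}
```

For the slide's loads {R0: 345K, R1: 278K, R2: 301K}, room 1 wins, matching the sequence above. The cost of the dynamic scheme, relative to the static hash, is the extra inter-room hops needed to gather the loads.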
Performance Pitfalls
Data copies
- Single instance with reference counting (REF_BLOCK)
- Multi-buffer messages (MESSAGE: header, body, tail)
Context switches
- Flexible module-execution foundation (MODULE)
- Thread pools sized to the number of processors
Memory allocation
- MM: custom memory pools (POOL, POOL_BLOCK)
- Fine-grained locking, pre-allocation, batching, single-size blocks
Lock contention
- EVENT, MUTEX, RW_MUTEX, interlocked API
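The REF_BLOCK idea above avoids copying a published message once per subscriber: the buffer is allocated once and shared by reference count. The method names mirror the class diagram (RefcountInc, RefcountDec), but the body below is a sketch, not SPUD's implementation; the `freed` flag stands in for returning the block to its memory pool:

```cpp
#include <atomic>

// Sketch of a reference-counted message block shared by many
// subscribers instead of copied for each one.
struct RefBlock {
    std::atomic<int> Refcount{1};  // creator holds the first reference
    int freed = 0;                 // stand-in for release to the pool

    void RefcountInc() {
        Refcount.fetch_add(1, std::memory_order_relaxed);
    }
    void RefcountDec() {
        // The thread that drops the last reference releases the block;
        // acq_rel ordering makes prior writes visible to that thread.
        if (Refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
            freed = 1;
    }
};
```

The interlocked increments and decrements are also an example of the lock-contention point: the shared count is updated without taking a mutex.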
Class Diagram (Application)
[UML summary of the flattened diagram:
- General::Module (Init, Cleanup, ReloadConfig, Start, ThreadMain) is the execution base; it holds a parent Module, a completion port, and a thread count
- General::Application (ShowHelp, HandleCommand; ServerType) aggregates one each of General::Config (ReloadFile, GetStringParam, GetDwordParam, GetBooleanParam, GetIpParam), General::Log (Log, DebugLog, Assert), and General::Stats (UpdateValue, GetStats, WriteStatsToFile, RequestStatsFromAllServers, PrintStatsString)
- Memory subsystem: Pool (AllocBlock, AddFreeBlocks; per-size free lists) manages Memory::PoolBlock (custom operator new/delete), Memory::HeaderBlock, Memory::BodyBlock, and Memory::RefBlock (RefcountInc, RefcountDec, Free)
- CLBSpecific::ClbServer, CSSpecific::CsServer, and TSSpecific::TsServer are the concrete applications]
Class Diagram (TS, CS): Topic Server side
[UML summary of the flattened diagram:
- IOStack layer: IOStack::IO (StartServer, ConnectSocket, DoSendOperation, DoReceiveOperation) with IOStack::TcpIO and IOStack::UdpIO subclasses; IOStack::ProtocolHandler (RegisterMessageHandler, SendMessage, HandleSendCompleted, HandleServerStarted, HandleNewConnection, HandleHeaderReceived, HandleBodyReceived, HandleDatagramReceived, HandleIOFailure) dispatches to IOStack::MessageHandler subclasses
- Message handlers: MsgHandlers::PingHandler, UserAckHandler, StatsHandler, CacheHandler, TsNotifyHandler, TsRequestHandler, ReplicationHandler
- TS components under TSSpecific::TsServer: TsLoadBalancer (ChooseTs), TopicCache (SearchCache, UpdateCache, RemoveCache), TopicDatabase (IsTopicSelfOwned, AddTopic, RemoveTopic, SubscribeUserToTopic, UnsubscribeUserToTopic, AddTopicReplica, GetTopicSubscriberList, GetLoad; SQL-backed), UserDatabase (UpdateUser, GetUser, RemoveUser, GetTotalUsers; SQL-backed)
- Shared infrastructure: General::ServerDb (GetNumRooms, GetNumServersInRoom, GetServerByIndex), General::PeerDb, Types::Server, Types::Peer, CSSpecific::FindTs (CalcTopicHash, GetTsFromHash, GetTsFromTopic, GetNextTsInRing)]
Class Diagram (TS, CS): Connection Server side
[UML summary of the flattened diagram:
- The same General (Module, Application, ServerDb, PeerDb) and IOStack (IO, TcpIO, UdpIO, ProtocolHandler, MessageHandler) infrastructure, under CSSpecific::CsServer
- Message handlers: MsgHandlers::LoadHandler, PingHandler, LoginHandler, CsRequestHandler, CsNotifyHandler, UserAckHandler, TsNotifyHandler
- CS components: CSSpecific::ClientDb (Add, Remove, AllocClient, FreeClient, GetLoad; holds Clients and a ClientPingInterval) over Types::Client (GeoParam), plus CSSpecific::FindTs for topic-to-TS routing]
Stress Testing
[Chart: client load test; x-axis client load (K), 1 to 31; y-axis turnaround time (ms), 0 to 1800]
Measure publish-notify turnaround time:
- 1 ms resolution using the multimedia timer, averaged over 30 samples
- Increasing client and/or topic load; several room topologies examined
Results:
- Exponential-like climb of turnaround time with load
- More TSs: better turnaround times
- More CSs: higher maximum client count, but turnaround time not improved