    Efficient and Adaptive Stateful Replication for Stream Processing Engines in High-Availability Cluster

    Yi-Hsuan Feng, Nen-Fu Huang, Senior Member, IEEE, and Yen-Min Wu

    Abstract: Stateful stream processing engines in high-availability clusters (HACs) track a large number of concurrent flow states and replicate them to backups to provide reliable functionality. Under high traffic loads, existing solutions in such HACs are expensive because they rely on precise stateful replication. This work presents two novel methods to address this issue: a randomized representation for replication, and a replication scheme designed for when the system becomes overloaded. A hashing structure called the Multilevel Counting Bloom Filter (MLCBF) is proposed as a low-resource solution for stateful replication. Its performance and tradeoffs are then evaluated based on theoretical analysis and extensive trace-based tests. Trace-based simulation reveals that MLCBF typically reduces the network and memory requirements of replication by over 90 percent for URL categorization. Most importantly, MLCBF is simple and practical to implement and maintain. Moreover, an adaptive scheme called dynamic lazy insertion is designed to prevent replication from continuously overloading the system and to optimize the throughput of the HAC. Testbed evaluation demonstrates its feasibility and effectiveness in an overloaded HAC.

    Index Terms: Multiple hash functions, Bloom filters, adaptive method, high availability, replication.


    High-availability clusters (HACs) are widely deployed on the highly valuable links of enterprise, campus, and ISP networks. The most important goal of an HAC is to remove single points of failure. As Fig. 1 shows, an HAC consists of pairs of stateful stream processing engines (SPEs) [1] that provide functionalities such as TCP tracking and URL categorization. These SPEs continuously process input streams (e.g., assembled TCP segments or HTTP requests), perform stateful tracking with predetermined finite state machines (FSMs), and produce output in real time (e.g., a decision to drop a packet, or a warning HTTP response to a web client). For example, the SPEs for TCP tracking monitor the state transitions of all TCP flows to ensure their compliance with the TCP specification. To monitor flow behavior, an SPE requires a key-and-state storage (referred to as a state table) that manages precise keys (e.g., TCP four-tuples and URLs) and their current states. If such key-and-state data are lost, the SPE will probably not return the expected output.
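    To make the state table concrete, the following is a minimal sketch, assuming a Python dict keyed by the TCP four-tuple and a deliberately simplified TCP-tracking FSM. The state names, the `TRANSITIONS` map, and the `StateTable.process` helper are hypothetical illustrations, not the paper's FSM or implementation.

```python
from typing import Dict, Tuple

# Hypothetical flow key: the TCP four-tuple (src_ip, src_port, dst_ip, dst_port).
FlowKey = Tuple[str, int, str, int]

# Deliberately simplified TCP-tracking FSM (illustrative, far from the full TCP spec):
# (current_state, observed_event) -> next_state. "NONE" means "no entry yet".
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("NONE", "SYN"): "SYN_SENT",
    ("SYN_SENT", "SYN_ACK"): "SYN_RECV",
    ("SYN_RECV", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "FIN_WAIT",
    ("FIN_WAIT", "ACK"): "CLOSED",
}


class StateTable:
    """Key-and-state storage: precise flow key -> current FSM state."""

    def __init__(self) -> None:
        self.table: Dict[FlowKey, str] = {}

    def process(self, key: FlowKey, event: str) -> bool:
        """Advance the flow's FSM; return False if the event is non-compliant."""
        current = self.table.get(key, "NONE")
        nxt = TRANSITIONS.get((current, event))
        if nxt is None:
            return False  # e.g., the SPE could decide to drop this packet
        if nxt == "CLOSED":
            self.table.pop(key, None)  # flow finished: free its entry
        else:
            self.table[key] = nxt
        return True
```

    If `table` is lost (for example, when the SPE fails), later packets of existing flows no longer match any transition and are judged non-compliant, which is why the key-and-state data must be kept on a backup.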

    For redundancy, if an SPE on the operating pass-through link goes out of service, pass-through traffic (e.g., TCP flows) is immediately passed to the backup link (i.e., a failover). SPEs of identical functionality in an HAC must therefore maintain key-and-state consistency among themselves to ensure consistent service in case of a system or network failure. In Fig. 1, an SPE synchronizes keys and state changes to its backup through a replication link.
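    The synchronization step can be sketched as precise replication in which every local state change on the active SPE produces one update message carrying the full key and the new state, and the backup applies these messages to its own table. The `UpdateMsg` layout, the in-memory `outbox` standing in for the replication link, and the `sync` helper are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FlowKey = Tuple[str, int, str, int]  # hypothetical TCP four-tuple key


@dataclass
class UpdateMsg:
    """Hypothetical precise update message: full key plus new state (None = delete)."""
    key: FlowKey
    state: Optional[str]


@dataclass
class SPE:
    table: Dict[FlowKey, str] = field(default_factory=dict)
    outbox: List[UpdateMsg] = field(default_factory=list)  # simulated replication link

    def set_state(self, key: FlowKey, state: Optional[str]) -> None:
        """Active side: change local state and queue one update for the backup."""
        if state is None:
            self.table.pop(key, None)
        else:
            self.table[key] = state
        self.outbox.append(UpdateMsg(key, state))

    def apply(self, msg: UpdateMsg) -> None:
        """Backup side: apply a replicated update without re-emitting it."""
        if msg.state is None:
            self.table.pop(msg.key, None)
        else:
            self.table[msg.key] = msg.state


def sync(primary: SPE, backup: SPE) -> int:
    """Flush the primary's queued updates to the backup; return how many were sent."""
    sent = len(primary.outbox)
    for msg in primary.outbox:
        backup.apply(msg)
    primary.outbox.clear()
    return sent
```

    Every state transition costs one message in this scheme, so the message count grows with both the flow rate and the number of states per flow.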

    However, the efficiency and flexibility of the replication mechanism are critical to the performance of the SPEs and of the entire HAC. First, existing replication solutions that use precise update messages can incur considerable resource costs, including CPU, memory, and bandwidth requirements. Second, because SPEs of different functionalities are connected in series, the pass-through throughput of an HAC (defined as bits per second measured on a pass-through link) is limited to the lowest performance among the SPEs on that link. The pass-through processing of all SPEs must therefore be optimized, particularly when an SPE becomes overloaded.
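    The series bound can be written out as follows. Let $T_i$ denote the sustainable pass-through throughput of the $i$-th of $n$ SPEs on one pass-through link (symbols introduced here for illustration; they are not the paper's notation). The throughput of that link then satisfies

$$T_{\text{link}} \le \min_{1 \le i \le n} T_i .$$

    For example, with three SPEs sustaining 4, 2.5, and 3 Gb/s (hypothetical figures), the link is capped at 2.5 Gb/s, so replication work that slows the weakest SPE lowers the throughput of the whole link.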

    This work provides efficient key-and-state replication among the SPEs in an HAC. Two types of stateful replication are considered: state replication and membership replication (e.g., [2], [3], [4]). State replication refers to synchronizing the keys and state transitions of an active flow (or item) to the backup. In membership replication, the information as to whether or not a flow is in a set is sent.
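    The difference between the two replication types can be sketched with two hypothetical message kinds: a state-replication message carries a key and its new state, while a membership-replication message carries only whether a key belongs to a set. The class names and fields below are illustrative assumptions, not the paper's wire format.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass(frozen=True)
class StateReplication:
    """Key plus its current state (e.g., a TCP flow and its FSM state)."""
    key: str
    state: str


@dataclass(frozen=True)
class MembershipReplication:
    """Key plus a set-membership bit (e.g., whether a URL is in a set)."""
    key: str
    present: bool


class Backup:
    def __init__(self) -> None:
        self.states: Dict[str, str] = {}
        self.members: Set[str] = set()

    def apply(self, msg: Union[StateReplication, MembershipReplication]) -> None:
        if isinstance(msg, StateReplication):
            self.states[msg.key] = msg.state
        elif msg.present:
            self.members.add(msg.key)
        else:
            self.members.discard(msg.key)
```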

    Y.-H. Feng and N.-F. Huang are with the Department of Computer Science, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan 30013, R.O.C. E-mail: {dr918302, nfhuang}@cs.nthu.edu.tw.

    Y.-M. Wu is with IBM, 5F, No. 17, Aly. 2, Ln. 244, Sec. 3, Roosevelt Rd., Zhongzheng Dist., Taipei 100, Taiwan, R.O.C. E-mail: g9562564@cs.nthu.edu.tw.

    Manuscript received 6 Apr. 2009; revised 27 Feb. 2010; accepted 22 Nov. 2010; published online 10 Mar. 2011. Recommended for acceptance by H. Jiang. For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDS-2009-04-0156. Digital Object Identifier no. 10.1109/TPDS.2011.83.

    1045-9219/11/$26.00 © 2011 IEEE Published by the IEEE Computer Society

    A compact data structure called the Multilevel Counting Bloom Filter (MLCBF) is designed, and our results demonstrate how this randomized data representation can be used to reduce the costs of stateful replication. By employing d-left hashing [5], the d-left CBF (DLCBF) [6], [7] is considered a simple and feasible alternative [8] to the legacy Counting Bloom Filter (CBF) [2], [3]. MLCBF can be viewed as a modification of DLCBF: we introduce skewness into the filter levels and adopt a different insertion strategy to improve filter performance. Based on theoretical analysis and extensive experiments, the properties of MLCBF, along with its replication efficiency, are evaluated by several metrics, e.g., accuracy, resource consumption, and operational latency.
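    As background for the structure MLCBF modifies, here is a minimal sketch of a simplified d-left counting Bloom filter in the spirit of DLCBF. It is not MLCBF: it has no level skewness and uses plain least-loaded, leftmost-tie insertion. It also departs from DLCBF in one known way: it derives one shared fingerprint and independent bucket indices from a single SHA-256 digest instead of using per-subtable permutations, so deletions under fingerprint collisions are not guaranteed to be exact. The parameters (`d`, `buckets`, `cells`, `fp_bits`) and the `Cell` triple sent to the backup are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int, int]  # (subtable index, bucket index, fingerprint)


class DLeftCBF:
    """Simplified d-left counting Bloom filter (illustrative background, not MLCBF)."""

    def __init__(self, d: int = 4, buckets: int = 1024, cells: int = 8, fp_bits: int = 12) -> None:
        assert 1 <= d <= 7, "one SHA-256 digest provides at most 7 bucket indices here"
        self.d, self.buckets, self.cells, self.fp_bits = d, buckets, cells, fp_bits
        # tables[i][b] maps fingerprint -> counter for bucket b of subtable i.
        self.tables: List[List[Dict[int, int]]] = [
            [dict() for _ in range(buckets)] for _ in range(d)
        ]

    def _locate(self, key: str) -> Tuple[int, List[int]]:
        """Derive one fingerprint and one candidate bucket per subtable from one hash."""
        digest = hashlib.sha256(key.encode()).digest()
        fp = int.from_bytes(digest[:4], "big") & ((1 << self.fp_bits) - 1)
        idx = [int.from_bytes(digest[4 + 4 * i: 8 + 4 * i], "big") % self.buckets
               for i in range(self.d)]
        return fp, idx

    def insert(self, key: str) -> Optional[Cell]:
        """Insert key; return the cell a backup needs to replay it, or None on overflow."""
        fp, idx = self._locate(key)
        for i, b in enumerate(idx):  # fingerprint already present: bump its counter
            if fp in self.tables[i][b]:
                self.tables[i][b][fp] += 1
                return (i, b, fp)
        # Otherwise use the least-loaded candidate bucket; min() breaks ties leftmost.
        i = min(range(self.d), key=lambda j: len(self.tables[j][idx[j]]))
        bucket = self.tables[i][idx[i]]
        if len(bucket) >= self.cells:
            return None  # every candidate bucket is full
        bucket[fp] = 1
        return (i, idx[i], fp)

    def delete(self, key: str) -> bool:
        """Decrement the first matching fingerprint among the candidate buckets."""
        fp, idx = self._locate(key)
        for i, b in enumerate(idx):
            bucket = self.tables[i][b]
            if fp in bucket:
                bucket[fp] -= 1
                if bucket[fp] == 0:
                    del bucket[fp]
                return True
        return False

    def contains(self, key: str) -> bool:
        """Approximate membership test; fingerprint collisions cause false positives."""
        fp, idx = self._locate(key)
        return any(fp in self.tables[i][b] for i, b in enumerate(idx))

    def apply_insert(self, cell: Cell) -> None:
        """Backup side: replay a replicated insert without knowing the original key."""
        i, b, fp = cell
        self.tables[i][b][fp] = self.tables[i][b].get(fp, 0) + 1

    def cell_bits(self) -> int:
        """Bits to encode one Cell: ceil(log2 d) + ceil(log2 buckets) + fp_bits."""
        return (self.d - 1).bit_length() + (self.buckets - 1).bit_length() + self.fp_bits
```

    With the defaults, replicating one insert costs a 24-bit cell regardless of how long the key (e.g., a URL) is. This illustrates, but does not reproduce, the kind of saving that approximate replication targets.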

    Experimental results indicate that the proposed method significantly reduces the network and memory costs of replication and provides replication with a small, constant latency. Hereinafter, stateful replication using precise key and state values is referred to as precise replication, and replication by hashing representation is called imprecise or approximate replication.

    Next, an adaptive method is developed to prevent system overload caused by the replication of TCP flows, which make up most Internet traffic. The proposed method prioritizes pass-through processing over replication when the system is overloaded, to maintain optimal throughput dynamically. Testbed evaluation demonstrates its feasibility and effectiveness in an overloaded HAC.

    The rest of this paper is organized as follows: Section 2 describes the model of stateful replication and HAC, our motivations, and the design goals. Section 3 introduces MLCBF and its properties and explains its use for stateful replication. Section 4 describes an adaptive mechanism to dynamically control TCP replication. Section 5 evaluates the feasibility of the proposed methods with trace-based and testbed-based experiments. Following a discussion of related work in Section 6, conclusions are drawn in Section 7.


    This work considers a generic HAC in which two sequences of SPEs are connected by two pass-through links. The SPEs process pass-through traffic and simultaneously replicate flow states to their backups through replication links.

    Two distinct HA schemes are generally available. In the active/backup (AB) scheme shown in Fig. 1, all traffic is directed to the primary pass-through link during normal operation. If an SPE on the primary link goes out of service, a failover occurs and the traffic is then directed to the backup link.
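    The AB link-selection rule can be sketched as a small function, assuming that per-SPE health flags are supplied by some monitoring mechanism not described here. The function name and the returned labels are hypothetical.

```python
from typing import List


def select_link(primary_spes_up: List[bool], backup_spes_up: List[bool]) -> str:
    """Return which pass-through link carries traffic under the AB scheme.

    Each list holds the health of the SPEs chained on that link; because the
    SPEs are in series, one failed SPE takes the whole link out of service.
    """
    if all(primary_spes_up):
        return "primary"  # normal operation
    if all(backup_spes_up):
        return "backup"   # failover
    return "none"         # neither link can carry traffic
```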

    In the active/active (AA) scheme, edge switches attempt to balance traffic across the pass-through links. In both HA schemes, SPEs rely on replication for reliable service in the face of failures and of flow migration due to load balancing [9]. In the AB scheme, two SPEs of the same functionality act in the primary and backup roles, respectively; in the AA scheme, each SPE plays both roles at the same time. For simplicity, this work focuses mainly on the AB scheme. However, the proposed methods apply equally to an HAC using the AA scheme, as in the testbed tests in Section 5.5.
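    Why the AA scheme still depends on replication can be sketched with a hypothetical edge-switch policy that hashes each flow onto one of the live pass-through links. When a link fails, its flows migrate to the surviving link, whose SPEs can track them correctly only if their states were replicated beforehand. The CRC32-based `pick_link` policy is an assumption for illustration, not the switch behavior used in the paper.

```python
import zlib
from typing import List, Tuple

FlowKey = Tuple[str, int, str, int]  # hypothetical TCP four-tuple key


def pick_link(key: FlowKey, links_up: List[bool]) -> int:
    """Hash a flow onto one of the live pass-through links (illustrative policy)."""
    live = [i for i, up in enumerate(links_up) if up]
    if not live:
        raise RuntimeError("no pass-through link available")
    h = zlib.crc32(repr(key).encode())  # deterministic hash of the flow key
    return live[h % len(live)]
```

    With two links, every flow that was hashed onto a failed link moves to the survivor while the others stay put, so only the migrated flows need replicated state on the surviving SPEs.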

    Fig. 1 schematically shows the SPE architecture of existing precise replication solutions such as OpenBSD pfsync and Linux ct_sync. Our preliminary tests analyze replication bottlenecks using TCP state replication (a modified version of Linux ct_sync) with six states for each flow, as described in Section 5.3.2. This work is motivated largely by the following observations.

    First, long-lasting flows replicated from another SPE may occupy considerable table entries, although these entries are of use only when needed (e.g., upon failover). Second, existing precise replication imposes considerable costs on SPEs and replication links under high-rate traffic. Assume that the steady TCP flow rate is 20k connections per second (cps), that a replication message is 100 bits in size, and that the update interval is 30 seconds. One update interval then introduces $20\text{k cps} \times 6\text{ states} \times 30\text{ s} = 3{,}600\text{k}$ messages, i.e., 360 Mb of memory and network costs for replication.
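    The cost estimate above can be checked with a few lines of arithmetic, assuming (as the example implies) one 100-bit message per state per flow in each update interval, and taking 1 Mb as $10^6$ bits. The `replication_cost` helper is hypothetical.

```python
from typing import Tuple


def replication_cost(flow_rate_cps: int, states_per_flow: int,
                     interval_s: int, msg_bits: int) -> Tuple[int, int]:
    """Messages and bits produced by precise replication in one update interval."""
    messages = flow_rate_cps * states_per_flow * interval_s
    return messages, messages * msg_bits


msgs, bits = replication_cost(20_000, 6, 30, 100)
print(f"{msgs:,} messages, {bits / 1e6:.0f} Mb")  # prints "3,600,000 messages, 360 Mb"
```

    Because the cost is a product of the flow rate, the number of states, and the message size, shrinking the per-message size (as a hashing representation does) reduces the total proportionally.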

    Third, when CBF is used as the data representation for stateful replication, the bandwidth cost is even higher than that of precise replication for certain applications. Finally, CPU load is dominated by the number of incoming pass-through packets and replication tasks. For an overloaded system, replication should be deprioritized to achieve optimal pass-through throughput.

    In sum, this work focuses on the following design goals: 1) an architectural separation of pass-through and replication processing; 2) the design of a hashing structure for stateful replication at very low runtime cost; and 3) the development of a dynamic scheme that prioritizes pass-through tasks over replication tasks for optimal pass-through throughput under system overload.
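    Design goals 1) and 3) can be illustrated with a toy scheduler that keeps pass-through and replication work in separate queues and serves replication only from budget left over after pass-through packets. This is a strict-priority sketch under an assumed fixed per-tick budget (`budget_per_tick`, a stand-in for CPU capacity); it is not the paper's dynamic lazy insertion, which is adaptive.

```python
from collections import deque
from typing import Deque, List, Tuple


class SPEScheduler:
    """Toy scheduler separating pass-through work from replication work."""

    def __init__(self, budget_per_tick: int) -> None:
        self.budget = budget_per_tick           # tasks processable per tick
        self.pass_q: Deque[str] = deque()       # pass-through packets
        self.repl_q: Deque[str] = deque()       # replication tasks

    def tick(self) -> Tuple[List[str], List[str]]:
        """Serve pass-through first; replication only gets leftover budget."""
        done_pass: List[str] = []
        done_repl: List[str] = []
        remaining = self.budget
        while remaining > 0 and self.pass_q:
            done_pass.append(self.pass_q.popleft())
            remaining -= 1
        while remaining > 0 and self.repl_q:
            done_repl.append(self.repl_q.popleft())
            remaining -= 1
        return done_pass, done_repl
```

    Under overload, replication tasks simply wait in `repl_q` instead of competing with pass-through packets. The tradeoff is that sustained overload can starve replication indefinitely, which is one reason an adaptive scheme is preferable to fixed strict priority.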
