
GestureFlow: QoE-Aware Streaming of Multi-Touch Gestures (liu.zimu.ca/files/zliu-jsac12.pdf)

GestureFlow: QoE-Aware Streaming of Multi-Touch Gestures in Interactive Multimedia Applications

Yuan Feng, Student Member, IEEE, Zimu Liu, Student Member, IEEE, and Baochun Li, Senior Member, IEEE

Abstract—With the proliferation of multi-touch mobile devices, such as smartphones and tablets, users interact with devices in non-conventional gesture-intensive ways. As a new way to interact with mobile devices, gestures have been proven to be intuitive and natural with a minimal learning curve, and can be used in interactive multimedia applications. In order for multiple users to collaborate in an interactive manner, we propose that gestures can be streamed in multiple broadcast sessions, with each session corresponding to one of the users as the source of a gesture stream. During the interactive session, the Quality of Experience (QoE) of mobile users hinges upon delays from when gestures are entered by the source to when they are recognized by each of the receivers, which we refer to as gesture recognizing delays. In this paper, we present the design of GestureFlow, a gesture broadcast protocol designed specifically for concurrent gesture streams in multiple broadcast sessions, such that the gesture recognizing delay in each session is minimized. We motivate the effectiveness and practicality of using inter-session network coding, and address challenges introduced by the linear dependence of coded packets. We evaluate our protocol design using an extensive array of real-world experiments on mobile devices, involving a new gesture-intensive interactive multimedia application, called MusicScore, that we developed from scratch.

Index Terms—Multi-Touch Streaming, Inter-session Network Coding, Mobile Framework.

I. INTRODUCTION

NEW mobile devices with multi-touch displays have brought revolutionary changes to the ways users interact with wireless devices, with multi-touch gestures used as the primary means of interaction. In particular, interactive multimedia applications on mobile devices have made it possible to use gestures intensively to create and consume artistic or musical content in an interactive and collaborative fashion, since gestures are frequently needed to create and manipulate artistic strokes or musical notes. With such media authoring applications, it is desirable, if feasible, to support collaboration among multiple participating users. As an example, it would certainly be exciting if music composition hobbyists could collaborate in real time to work on a musical piece.

To support such collaboration among multiple users in real time, we propose that gestures are streamed in a broadcast fashion from one user to all participating users, in a broadcast session. Streaming gestures themselves, rather than application-specific data, has made it possible to optimize the design and implementation of a gesture broadcast protocol that can be reused by any mobile multimedia application that

Manuscript received on August 8, 2011; revised on January 23, 2012. Y. Feng, Z. Liu, and B. Li are with the Department of Electrical and Computer Engineering, University of Toronto, Canada (e-mail: {yfeng, zimu, bli}@eecg.toronto.edu).

needs to support multi-party collaboration. Clearly, it is a more elegant and reusable solution to serve the needs of an entire category of gesture-intensive media applications. Once received, gestures can be recognized and rendered in real time by a live instance of the same application on a receiver. To take such broadcast of gestures a step further, multiple gesture broadcast sessions need to be supported concurrently, so that any participating user can be the source of a gesture stream.

With such gesture streaming broadcast sessions in place, a high-quality user experience within an interactive application hinges upon an important Quality of Experience (QoE) metric: the time it takes for a gesture to be recognized at each of the receivers, starting from the time it is recognized at the source of the session. Referred to as the gesture recognizing delay, such a delay is an application-layer QoE metric that directly affects the user-perceived quality of an application session.

In this paper, we present GestureFlow, a new QoE-aware gesture streaming protocol specifically designed for multiple concurrent broadcast sessions of gestures, with the objective of minimizing gesture recognizing delays in these broadcast sessions. Since only a subset of raw touch events can be recognized as multi-touch gestures, the source of a broadcast session needs to send raw touch events, which will be recognized at each of the receivers. Each recognized gesture will consist of a number of network-layer packets, all of which need to be received for the gesture to be correctly recognized. Unlike traditional media streams, gesture streams typically incur low yet bursty bit rates, but packet losses are not tolerable, since each lost packet will severely affect the accuracy of the gesture recognizer on a receiver.

In order to support multiple broadcast sessions while minimizing gesture recognizing delays and guaranteeing reliable packet delivery, we present a detailed design that uses inter-session network coding, and addresses a number of open challenges in our design that have not been discussed in the literature: what the best size of the coding window should be, how the coding window should be advanced, and how packets from different sessions can be coded together.

To validate our design, we have developed a real-world implementation of GestureFlow, as well as an interactive music composition application, called MusicScore, using Objective-C from scratch on the iPad. MusicScore takes full advantage of our implementation of the GestureFlow framework to allow composers to enjoy a live collaborative session. During extensive experiments presented in this paper, we have discovered new challenges in the use of network coding within the GestureFlow implementation. It turns out that coded blocks are linearly dependent with one another with an alarmingly


high probability, leading to a much higher overhead than what we originally anticipated. We found that this is due to the fact that the coding window size is typically very small, which is required to satisfy stringent delay requirements. We propose to use systematic Reed-Solomon codes at the source to mitigate the overhead due to such linear dependence, and show that the revised QoE-aware gesture streaming protocol has met our needs with the smallest possible gesture recognizing delay.

The remainder of this paper is organized as follows. Sec. II discusses the motivation and challenges of QoE-aware gesture streaming. In Sec. III, we describe our detailed system design in GestureFlow, using inter-session network coding. In Sec. IV, we present a thorough analysis of measurement results using our implementation of GestureFlow and MusicScore, and observe that coded blocks are linearly dependent with an alarmingly high probability. In Sec. V, we propose our solution to mitigate such linear dependence among coded blocks, and evaluate its performance. We discuss related work and conclude the paper in Sec. VI and Sec. VII, respectively.

II. GESTURE STREAMING: MOTIVATION AND QoE

Multi-touch allows users to interact with user interface elements directly with their fingers using gestures, and has been proven to be the most intuitive interface for a wide variety of applications. These gestures can be as simple as a one-finger tap, or as complicated as a three-finger swipe. As a running example and experimental testbed, we have designed and implemented an iPad application for music composition using multi-touch gestures, called MusicScore and shown in Fig. 1, from scratch. MusicScore allows a user to create musical notes with double taps, to change the pitch of notes by dragging them vertically, and to select a group of notes by dragging a rectangle around them.

Fig. 1. A screenshot of running MusicScore on the iPad.

A. Streaming Multi-touch Gestures

In MusicScore, there are situations in which a teacher gives her student a tutorial on music composition while she is traveling, or composition hobbyists collaborate to compose a piece of music without being physically together. These examples show a clear need to facilitate spontaneous sharing of user experience among multiple users, which substantially improves the utility of multi-touch applications. In a nutshell, rather than streaming application-specific data, we propose that multi-touch gestures are streamed instead, regardless of what the application may be. With the same application running on the multi-touch devices belonging to all participating users, the streamed gestures can be precisely rendered on a receiving device, as if they are entered live by the local user.

By streaming multi-touch gestures, we immediately gain a number of important benefits. In contrast to the design of customized state exchange protocols for specific collaborative media applications, such as musical symbols in a score and artistic objects in a canvas, it is much more generic to stream multi-touch gestures as representatives of user interactions. By handling the replay of streamed gestures as user input, application states are updated correctly at the receiving device. This implies that gesture streaming can serve as an underlying framework that supports any media application that desires multi-party collaboration. Furthermore, streaming multi-touch gestures makes it easier for multiple users to interact with the same set of application states at the same time. In MusicScore, this implies that multiple users are able to compose the same piece of music together, by composing different voices or musical instruments in the piece.

Since all participating users are able to affect the state of the application, it is a must that everyone can see the exact changes made by other users. While some multi-touch gestures can easily be replayed after being streamed to a different user, such as adding a note in MusicScore, other gestures only change the views of the local user, such as zooming, and do not affect the state of the application. If participating users have different views, it will be impossible to show all of them on one display. We solve this problem by adopting a “picture in picture” design: a user’s own gestures interact with the native view using the full-screen display, and the views of participating users are displayed in their respective overlay windows. In the example shown in Fig. 2, both Alice and Bob are able to work on their preferred views, and to observe the view of the other party at the same time.

Fig. 2. A “picture in picture” design as multi-touch gestures are streamed between two users, who are collaborating to compose the same piece of music.

B. Quality of Experience

Our ultimate design objective is to design a reusable framework, called GestureFlow, from the ground up to stream multi-touch gestures to multiple participating users, with the best possible Quality of Experience achieved. The framework uses a shared set of well-designed presentation and transport mechanisms to support a variety of interactive multimedia applications, including MusicScore. There are a number of challenges when designing the GestureFlow framework.


First, multi-touch gesture streams have a very low, yet bursty, bit rate. In multi-touch applications, it is usually the case that users interact with their devices frequently for a while and then stay idle most of the time. For example, a music composer touches the display in MusicScore to add or remove notes only when she is inspired. Fig. 3 shows the bit rates of a typical gesture broadcast session in MusicScore over time. We can observe that the peak bit rate reaches 11 kbps, while the average bit rate is no more than 3 kbps.

Fig. 3. Bit rates of a typical gesture broadcast session in MusicScore over time.

Second, multi-touch gestures need to be streamed in an in-order, lossless and error-free fashion, as any lost or erroneously transmitted gesture to any of the participating users makes it difficult to precisely render and reconstruct application states at the receiver. This is different from typical media streams, where a loss or an error is considered an inconvenience that degrades playback quality, but not a catastrophic event.

Third, gesture streaming has a stringent delay requirement. Media applications that need the support of a gesture streaming framework are interactive in nature, and demand the smallest possible gesture recognizing delay, from when gestures are recognized by the source, to when they are eventually received and recognized by each of the receivers.

Finally, once the replay of streamed gestures has started at a receiver, the interval between the replay of two consecutive gestures has to be kept identical to the difference between their original timestamps when they are generated at the sender. Otherwise, the rendered states of an application may be different from the original. This implies that each gesture has to be recognized at the receiver on time, i.e., before its scheduled replaying time, despite the fact that each gesture may be received with different end-to-end delays over the Internet, and as a result experience different gesture recognizing delays. Similar to live media streaming, an initial startup delay, with a corresponding application buffer at the receiver, can be used to mask varying gesture recognizing delays.

Using our “Bob and Alice” example, we illustrate delays in the session from Alice to Bob in Fig. 4. Alice’s gestures are received by Bob with end-to-end delays τ1, τ2, τ3, ..., and recognized by Bob’s application with gesture recognizing delays v1, v2, v3, .... Though a gesture may be recognized by Bob’s application, its replay may be delayed by an initial startup delay δ, such that the intervals between gestures, Δt1, Δt2, ..., are kept precisely the same during replay. In order to make sure all gestures are recognized on time for replaying, the initial startup delay δ has to be no shorter than the longest gesture recognizing delay, vmax. Therefore, to achieve the best possible Quality of Experience with a short δ, gesture recognizing delays need to be minimized.
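To make the relation between δ and the gesture recognizing delays concrete, the following sketch (our illustration, not part of the GestureFlow implementation; times are in milliseconds) computes the smallest startup delay that keeps every gesture on schedule while preserving the original inter-gesture intervals Δti:

```python
def replay_schedule(entry_times, recog_delays):
    """entry_times[i]: time (ms) gesture i was entered at the source;
    recog_delays[i]: its gesture recognizing delay v_i at the receiver.
    Returns (delta, replay_times): the smallest startup delay delta = v_max,
    and replay times that keep the original gaps between gestures."""
    delta = max(recog_delays)  # delta must be no shorter than v_max
    t0 = entry_times[0]
    # Replaying gesture i at t0 + delta + (entry_times[i] - t0) keeps every
    # interval delta_t_i identical to the interval at the source.
    replay_times = [t0 + delta + (t - t0) for t in entry_times]
    # Each gesture is recognized (entry time + v_i) no later than its replay.
    assert all(t + v <= r
               for t, v, r in zip(entry_times, recog_delays, replay_times))
    return delta, replay_times
```

With a larger δ the schedule would still be feasible, but the user-perceived lag grows; hence the goal of minimizing the recognizing delays themselves.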

Fig. 4. The replay of gestures from Alice to Bob, using an initial startup delay δ to mask varying end-to-end delays τi over the Internet, as well as gesture recognizing delays vi in the application.

With these unique characteristics, the design of the GestureFlow framework is more challenging than that of conventional media streaming systems. It needs to be designed so that a bursty and low bit-rate stream from each user can be transmitted to all participating users in a reliable and timely fashion, in a set of broadcast communication sessions.

C. Presenting Multi-touch Gestures

We are now ready to present a detailed design of our GestureFlow framework. The first natural question is how gestures should be presented and packetized, in preparation for streaming to multiple participating users.

In multi-touch applications, gesture recognizers are instances that analyze a raw stream of touch objects in a sequence, and determine the intention of users based on the properties of each gesture. A recognizer analyzes the number of touches and the number of taps from the raw stream, and compares them with the required values stored in the recognizer to make a decision. Table I shows descriptions and examples in MusicScore for a collection of useful gestures.
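As an illustration of this matching process, here is a toy tap recognizer (a hypothetical sketch for exposition; the actual iOS recognizers and the MusicScore code are not structured this way). It counts fingers and taps in a raw event sequence and compares them with the required values:

```python
class TapRecognizer:
    """Matches a raw touch-event sequence against required finger/tap counts."""

    def __init__(self, required_fingers, required_taps):
        self.required_fingers = required_fingers
        self.required_taps = required_taps

    def recognize(self, events):
        """events: list of (event_type, finger_count) from the raw stream."""
        # Each touch-end marks the completion of one tap.
        taps = sum(1 for etype, _ in events if etype == "touch-end")
        # The gesture's finger count is the largest seen in the sequence.
        fingers = max((n for _, n in events), default=0)
        return taps == self.required_taps and fingers == self.required_fingers
```

For example, a one-finger double tap matches TapRecognizer(1, 2) but not TapRecognizer(2, 1).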

What information should be streamed for a precise playback at a receiver? There are two alternatives. The first is to use a raw stream, represented by a successive sequence of touch events (e.g., finger-down, finger-up, or location update of a touch), and the second is to stream a sequence of gestures recognized by gesture recognizers. Intuitively, one may think that streaming recognized gestures would be sufficient for mobile applications. Unfortunately, GestureFlow needs to use the first alternative, and to stream raw touch events rather than recognized gestures. This is because interactive media applications typically include a mixture of raw touch events and recognized gestures. Take a drawing application as an example: while scaling or moving an object in the canvas can be done with gestures, artistic drawing requires a raw stream of touch events to track every movement of the fingers. Table II summarizes examples of multi-touch operations in typical interactive applications, which require both raw touch events and recognized gestures.

As soon as the raw stream is received, gestures can be recognized by the same application on the receiver, as illustrated in Fig. 5. As a gesture is essentially a sequence of raw touch events, before the last touch event arrives at the receiver


TABLE I
EXAMPLES OF MULTI-TOUCH GESTURES IN GestureFlow.

Gesture type   | Description | Example in MusicScore | Information needed to replay
Tap            | Tap a view with one or more fingers, possibly multiple times | Select a note or chord | The number of fingers used, the number of taps, and the location in a given view
Swipe          | One or more fingers moving towards a direction for a distance | Scroll up or down | The location of the first touch, the direction of the swipe, its velocity and the distance
Touch and hold | Touch with one or more fingers and hold for a short period of time | Trigger a pop-up menu to change the note duration or to add accidentals | The location in a given view

TABLE II
EXAMPLES OF MULTI-TOUCH OPERATIONS IN TYPICAL INTERACTIVE APPLICATIONS.

Interactive Application | Brushes (Artistic drawing) | MusicScore (Music composition)
Touch Events        | Draw a curve with varying speed | Free play on the virtual piano keyboard
Recognized Gestures | Move a layer in the canvas using panning | Create a note using double-tapping

and leads to a successfully recognized gesture, all preceding touch events have already been streamed to the receiver. In other words, the recognizer in the receiving application has been progressively receiving “partial” information about this gesture; once the last raw touch event is received, the gesture is immediately recognized. In contrast, if a gesture is streamed only after it is fully recognized by the sender, the receiver has to wait for the transmission of an entire gesture, which is typically larger than a touch event, i.e., the delay of recognizing a gesture in the receiving application will be higher.

Fig. 5. Streaming a raw stream of touch objects.

To packetize and transmit a raw stream of touch objects (Fig. 5), we propose to use a simple binary format, due to the fact that there are only four types of events involved (touch-begin, touch-move, touch-end and touch-cancel). With a raw stream, touch objects are continuously generated as a user interacts with her device, and transmitted with a compact form of presentation so that the bit rate is minimized. Each of these touch events is accompanied by its sequence number and timestamp. At a receiver, sequence numbers are used to detect out-of-order delivery and losses in the event stream, and timestamps help to replay events with precisely the time intervals at which they were originally generated: any time interval Δti can be computed as the difference between the timestamp of the ith event and that of the (i−1)th.
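A compact presentation of this kind can be sketched as follows (the field widths here are our assumptions; the paper does not specify the exact wire format). Each event is packed into a fixed 21-byte record:

```python
import struct

# The four event types described above.
EVENT_TYPES = {"touch-begin": 0, "touch-move": 1, "touch-end": 2, "touch-cancel": 3}
EVENT_NAMES = {v: k for k, v in EVENT_TYPES.items()}

# Assumed layout: 1-byte type, 4-byte sequence number, 8-byte timestamp (ms),
# and two 4-byte floats for the touch location; 21 bytes, network byte order.
FMT = "!BIQff"

def pack_event(etype, seq, ts_ms, x, y):
    return struct.pack(FMT, EVENT_TYPES[etype], seq, ts_ms, x, y)

def unpack_event(data):
    t, seq, ts_ms, x, y = struct.unpack(FMT, data)
    return EVENT_NAMES[t], seq, ts_ms, x, y
```

The receiver detects losses and reordering from gaps in the sequence numbers, and recovers each replay interval Δti as the timestamp difference of consecutive events.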

III. TRANSPORTING GESTURE STREAMS

When transporting gestures, we wish to achieve the best possible Quality of Experience, in that gesture recognizing delays are to be as short as possible. It is intuitive to conceive a design where a TCP connection is established between each pair of users, forming a complete overlay graph. Although TCP guarantees the reliable and in-order delivery of a stream of bytes, the realistic nature of traffic on the Internet dictates that overlay links based on TCP connections exhibit a wide range of delays, which vary significantly over time as well. Further, since TCP uses retransmissions to guarantee reliable delivery, delays may escalate with a slightly more congested link, leading to high delay jitter.

To minimize the end-to-end delays of delivering gestures to receivers with guaranteed reliability, we present two approaches that take advantage of the “all-to-all” broadcast nature of GestureFlow, where every participating node is the source node of a broadcast session to all others, and multiple broadcast sessions exist concurrently in the complete overlay graph connecting all users. First, we propose to use random network coding, which streams coded blocks using UDP flows rather than TCP, and allows possible relaying nodes to relay blocks that they receive after recoding. Second, instead of the direct connections used in the case of TCP, multiple paths between the source and each receiver are used to minimize the end-to-end delays of delivering gestures. Since all relaying nodes are receivers themselves, no additional bandwidth is consumed to take advantage of relay paths. The essence of our transport solution is to use network coding as a rateless erasure code for all broadcast sessions to guarantee reliable delivery, tightly coupled with the use of multiple paths to minimize end-to-end delays.

A. GestureFlow: Design Overview

In GestureFlow, we have designed a custom-tailored protocol to utilize random network coding in all of the concurrent broadcast sessions, each streaming gesture events from one of the participating nodes. The reliable delivery of original data blocks is guaranteed by the erasure correction nature of random network coding. Should a particular coded block be lost, subsequent coded blocks received are equally innovative and useful.

Data blocks flow conceptually from each source in a coded form that is mixed with other blocks, via multiple single-hop or two-hop paths to each destination, with each two-hop path using one participating node as a relay. With multiple paths, original data blocks will arrive at a receiver via the path


with the lowest delay, yet in a coded form. Fig. 6 illustrates an example showing how blocks from Node 1’s session are transmitted in coded form along different paths.

Fig. 6. Streaming coded blocks from Node 1 to all other participating nodes along multiple paths. Data blocks from Node 1 are being transmitted to Node 4 in coded form, either using a direct link, or relayed by Node 2 and Node 3 after being recoded with their own data blocks.

When participating nodes are used to relay data blocks, they are also producing their own original blocks. What should each node do with these data blocks belonging to multiple broadcast sessions? Since a node is capable of recoding all coded blocks it has received before transmitting them to others, shall we allow for recoding across multiple broadcast sessions? If we do, a participating node would then serve the dual role of being a source node and a relaying node. Referred to as inter-session network coding in the theoretical literature, this has not yet been adopted in any practical system using network coding.

In the GestureFlow framework, we have made the decision that all nodes are to perform network coding across multiple broadcast sessions. If each node is allowed to mix all incoming blocks with original blocks produced by itself, there is no longer a need to allocate outgoing bandwidth to multiple concurrent sessions, or to schedule outgoing blocks belonging to different sessions competing for outgoing bandwidth. With inter-session network coding, every node only needs to transmit as many coded blocks as the outgoing bandwidth allows, without considering the sessions they belong to.

In the four-node example shown in Fig. 6, if we consider all 4 broadcast sessions from the 4 users concurrently, the inter-session network coding engine in node 2 is shown in Fig. 7. As we can see, node 2 produces coded blocks covering all 4 sessions, each of which carries the necessary information other nodes require to decode, such as the sequence numbers of the original blocks, all random coefficients, and the coded payload, so that it is self-contained.
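A self-contained coded block of this kind might be serialized as follows (a sketch under assumed field widths; the actual GestureFlow header layout is not given in this section):

```python
import struct

def pack_coded_block(session_ranges, coefficients, payload):
    """session_ranges: {session_id: (start_seq, end_seq)} for every covered
    session; coefficients: one GF(2^8) coefficient byte per original block;
    payload: the coded payload bytes."""
    out = struct.pack("!B", len(session_ranges))  # number of sessions
    for sid, (start, end) in sorted(session_ranges.items()):
        out += struct.pack("!BII", sid, start, end)  # per-session seq range
    out += struct.pack("!H", len(coefficients)) + bytes(coefficients)
    return out + payload

def unpack_coded_block(data):
    nsessions, off = data[0], 1
    ranges = {}
    for _ in range(nsessions):
        sid, start, end = struct.unpack_from("!BII", data, off)
        ranges[sid] = (start, end)
        off += 9
    (clen,) = struct.unpack_from("!H", data, off)
    off += 2
    coefficients = list(data[off:off + clen])
    return ranges, coefficients, data[off + clen:]
```

Because every block carries its own session ranges and coefficients, a receiver can decode it without any out-of-band state.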

Fig. 7. The inter-session network coding engine in node 2, in the context of the four-node topology shown in Fig. 6, with 4 “all-to-all” broadcast sessions considered.

B. GestureFlow: Protocol Design

Now that we have presented an overview of GestureFlow, we are ready to discuss more details of our protocol design.

Basics of random network coding. Random network coding has been well established in recent research literature [1], [2], and has been shown to maximize throughput in multicast sessions. With random network coding, k original data blocks b = [b1, b2, ..., bk]^T, each with s bytes, are to be transmitted from the source to multiple receivers in a network topology. A source of a broadcast session generates coded blocks x_j = Σ_{i=1}^{k} c_{ji} · b_i as a linear combination of original data blocks in a finite field (typically GF(2^8)), where the set of coding coefficients c_j = [c_{j1}, c_{j2}, ..., c_{jk}] is randomly chosen. A relaying node is able to perform similar random linear combinations on received coded blocks with random coefficients. Coding coefficients related to original blocks b_i are transmitted together with a coded block.

A receiver is able to decode all k data blocks when it has received k linearly independent coded blocks x = [x1, x2, ..., xk]^T, either from the source or from a relay. It first forms a k × k coefficient matrix C, in which each row corresponds to the coefficients of one coded block. It then decodes the original blocks b = [b1, b2, ..., bk]^T as b = C^{-1} x. Such a decoding process can even be performed progressively as coded blocks arrive, using Gauss-Jordan elimination to reduce C to its reduced row-echelon form (RREF).
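The encode and decode operations just described fit in a few lines of Python. For readability this sketch works over the prime field GF(257) rather than the GF(2^8) arithmetic used in practice, and represents each block as a list of field elements; the structure of the computation is the same: random coefficients at the encoder, Gauss-Jordan elimination at the decoder.

```python
import random

P = 257  # a small prime field stands in for GF(2^8) in this sketch

def encode(blocks, k):
    """Produce one coded block: a random linear combination of k originals."""
    coeffs = [random.randrange(P) for _ in range(k)]
    payload = [sum(c * b[t] for c, b in zip(coeffs, blocks)) % P
               for t in range(len(blocks[0]))]
    return coeffs, payload

def decode(coded, k):
    """Recover the originals from k coded blocks (coeffs, payload) by
    reducing the augmented matrix [C | X] to its RREF mod P.
    Raises StopIteration if the coefficient rows are linearly dependent."""
    m = [list(c) + list(x) for c, x in coded]
    for col in range(k):
        piv = next(r for r in range(col, k) if m[r][col] % P != 0)
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], -1, P)            # modular inverse of the pivot
        m[col] = [v * inv % P for v in m[col]]
        for r in range(k):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % P for a, b in zip(m[r], m[col])]
    return [row[k:] for row in m]
```

A receiver that has gathered k linearly independent coded blocks calls decode to recover b1, ..., bk; with random coefficients over a large enough field, k received blocks are independent with high probability.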

Although inter-session network coding is conceptually simple to perform, its real-world design and implementation have brought a number of challenges to the spotlight. In what follows, we illustrate our design choices as we address these challenges.

The size of an original data block. When determining the value of s, the size of an original data block, we have discovered in our experiments that the size of one gesture event in the stream is very small: less than 512 bytes. In GestureFlow, each original data block contains one touch event if any touch event is produced. When original blocks of different sizes are coded, they are padded with zeros to the size of 512 bytes. Since gesture events do not vary substantially in size and the streaming bit rate is very low, the overhead introduced by such padding is not a concern.

Cumulative acknowledgments. How should receivers acknowledge the source node of a broadcast session in which an original data block has been correctly decoded? The first intuitive idea is to selectively acknowledge each of the data blocks as soon as they become decoded, even if they are not consecutive to one another. While it is certainly possible for the source node to remove any of the original blocks from the coding window when it is acknowledged by all the receivers, it requires a coded block to carry the sequence numbers of all original blocks that are coded. Since sequence numbers of original blocks are not consecutive, it is no longer feasible to carry only the starting sequence number of the earliest original block. Since the additional overhead and complexity may not be justified, we propose to use cumulative acknowledgments, which are much simpler.

With cumulative acknowledgments of decoded data blocks, a receiver uses Gauss-Jordan elimination to reduce the coefficient matrix of all coded blocks it has received so far to its RREF, and finds out which block has just been completely decoded. Instead of acknowledging a newly decoded data block immediately, the receiver sends an acknowledgment for a decoded block only if all earlier blocks with smaller sequence numbers have been decoded. As an example shown in Fig. 8, the receiver does not acknowledge b3 even though it has been decoded after receiving two coded blocks x1 and x3. It waits until receiving another coded block x4, which renders all three data blocks, b1, b2, and b3, decoded.


Fig. 8. A receiver sends a cumulative acknowledgment only when all earlier blocks have been decoded.
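The receiver-side rule illustrated in Fig. 8 can be captured in a few lines. This is a minimal sketch (class and method names are our own, not from the GestureFlow implementation): an acknowledgment is released only when the decoded blocks form a contiguous prefix of the sequence space.

```python
class CumulativeAck:
    """Tracks decoded block sequence numbers and emits a cumulative ACK
    only when every block with a smaller sequence number has also been
    decoded; otherwise the acknowledgment is held back."""

    def __init__(self):
        self.decoded = set()
        self.acked = 0  # highest sequence number acknowledged so far

    def on_decoded(self, seq):
        self.decoded.add(seq)
        ack = None
        # advance past every contiguous decoded block
        while self.acked + 1 in self.decoded:
            self.acked += 1
            ack = self.acked
        return ack  # None means: hold the acknowledgment back
```

Replaying the example of Fig. 8: decoding b3 first returns None; once b1 and b2 are decoded, a single cumulative ACK = 3 is emitted.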

Basics of the coding window at a source node with no relaying. In the theory of random network coding, it is assumed that k data blocks are to be coded, and if more data blocks are being transmitted, they are divided into groups of k blocks, and coding is to be performed within each group. If k is fixed, it corresponds to a fixed number of blocks to be coded. However, a fixed group size k may negatively affect the QoE metric in gesture streaming: due to inherently bursty traffic when gestures are streamed, a fixed group size may increase the gesture recognizing delays. As an example, consider the case where 4 blocks are to be coded at a source node, yet only 3 are received or produced, followed by a long idle period. With a fixed group size, the source node would have to wait for all 4 blocks to become available.

To address this challenge, blocks are to be coded within a sliding window in GestureFlow, referred to as the coding window. To explain the basic idea of a coding window, let us first consider the simplified scenario where a node does not relay coded blocks from other broadcast sessions, i.e., only original blocks are coded by the source node of a broadcast session. In this case, as a new original block containing a new gesture event is produced, it is added to the coding window at the source node. A maximum size of the coding window, W, is imposed to guarantee successful decoding at receivers, and it corresponds to the maximum number of original blocks that can be coded to produce an outgoing coded block. The source node performs random network coding on original blocks within the coding window, and sends coded blocks to all the receivers as newer blocks are being added to the coding window. The coding window advances itself by removing the earliest data block from the window when the source node has received acknowledgments from all the receivers in the broadcast session.

Fig. 9 shows the basic idea of the coding window at a source node. At time t1, the coding window grows to 3 as block 5 enters, and then reaches the maximum coding window size W (4 blocks in this example) at time t2. Note that even though blocks 7 and 8 have already been produced containing new multi-touch events (and buffered), they are not added to the coding window since it has already reached its maximum size. After a few coded blocks are received, the receiver acknowledges that blocks 3-5 have been successfully decoded. At time t3, the coding window at the source node advances itself by removing acknowledged blocks, and then blocks 7 and 8 enter the coding window. By adopting the sliding window mechanism, during bursty periods when touch events are produced back-to-back, the coding window expands to cover new events, so that they can be received and decoded by receivers in time. During idle times when touch events are scarce, the size of the coding window is naturally reduced as acknowledgments are being received.


Fig. 9. The coding window at a source node advances itself over time.
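The window mechanics of Fig. 9 reduce to two operations: admit a new block if there is room (otherwise buffer it), and advance on a cumulative ACK. A minimal model in Python (names and the deque-based representation are our own; the real source node additionally codes the window's contents):

```python
from collections import deque

class CodingWindow:
    """Sketch of the source-side sliding coding window: new original
    blocks enter until the maximum size W is reached; cumulative ACKs
    advance the window and admit buffered blocks."""

    def __init__(self, W=4):
        self.W = W
        self.window = deque()   # sequence numbers currently being coded
        self.buffer = deque()   # produced blocks waiting for room

    def produce(self, seq):
        if len(self.window) < self.W:
            self.window.append(seq)
        else:
            self.buffer.append(seq)

    def on_ack(self, seq):
        # remove every block up to and including the acknowledged one,
        # then admit buffered blocks until the window is full again
        while self.window and self.window[0] <= seq:
            self.window.popleft()
        while self.buffer and len(self.window) < self.W:
            self.window.append(self.buffer.popleft())
```

Replaying Fig. 9 with W = 4: after blocks 3-8 are produced, the window holds [3, 4, 5, 6] and blocks 7, 8 are buffered; the ACK for blocks 3-5 advances the window to [6, 7, 8].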

Shrinking the maximum size of the coding window on demand. Since raw touch events are streamed directly and a gesture consists of multiple raw touch events, with a fixed maximum size for the coding window, blocks containing information to recover one gesture may be split into separate coding windows, incurring longer gesture recognizing delays at the receivers. Longer gesture recognizing delays can be reduced if we can send out a coded block immediately when the recognizer on the source node recognizes a gesture.

In our design, the source node will always "shrink" the maximum size of the coding window to the current block whenever a gesture is recognized, so that no other blocks can enter the current coding window. By doing so, the source node will not need to hold itself back and wait for new original blocks after a gesture is already recognized; it is also easier for receivers to receive enough coded blocks to recover the gesture since fewer blocks are contained in a coded block. Fig. 10 shows an illustration of our maximum coding window size adjustment mechanism with the previous example in Fig. 9. When a gesture is recognized by the recognizer at time t4, the source node will adjust its maximum size of the coding window to 2 blocks, and send out these coded blocks immediately. When the source node receives acknowledgments confirming that blocks belonging to this gesture are successfully decoded by all receivers, the source node advances its sliding window, and the maximum window size is then restored to its original value (4 blocks).
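The shrink-and-restore rule can be modeled separately from the window itself. The sketch below is our own minimal interpretation of the mechanism (not the paper's implementation): on gesture recognition the maximum window size is capped at the blocks already admitted, and the default is restored once every block of the gesture is acknowledged by all receivers.

```python
class AdaptiveWindow:
    """Sketch of on-demand shrinking of the maximum coding window size:
    when the recognizer fires, cap the window so no new blocks enter and
    the gesture's blocks ship immediately; restore W once all of the
    gesture's blocks are acknowledged."""

    def __init__(self, W=4):
        self.default_W = W
        self.max_W = W
        self.pending_gesture_seq = None  # last block of a recognized gesture

    def on_gesture_recognized(self, window):
        # window: sequence numbers currently in the coding window
        self.max_W = len(window)         # shrink: no new blocks may enter
        self.pending_gesture_seq = window[-1] if window else None

    def on_all_acked(self, seq):
        # all receivers have decoded blocks up to sequence number seq
        if self.pending_gesture_seq is not None and seq >= self.pending_gesture_seq:
            self.max_W = self.default_W  # restore the original maximum
            self.pending_gesture_seq = None
```

In the Fig. 10 scenario, recognizing a gesture while blocks 5 and 6 are in the window caps the maximum at 2; the cap is lifted only when the acknowledgment covers block 6.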

Fig. 10. The source node adjusts its maximum coding window size W when a gesture is recognized by its gesture recognizer. The number in the circle indicates the size of the coding window; the dark circle indicates that the maximum coding window size has been reached.

The coding window with inter-session network coding. Unfortunately, the design of the coding window in GestureFlow becomes more complicated with inter-session network coding, where a source node of a broadcast session also serves as a relaying node for other sessions. The initial complexity comes from the computation of the coding window size. Even though a source node also serves as a relay and mixes incoming coded blocks from other sessions, these incoming coded blocks should not affect the computation of the coding window size. In other words, the coding window size should still be the number of original blocks that the source node itself has produced, and the source node will not include new original blocks in its coding window if it has reached its maximum size.

As we mix blocks from multiple broadcast sessions, what is the set of blocks that is to be coded on a node for an outgoing block to be produced? Of course, if an original data block is already decoded at a downstream receiver, it should not be included in the recoding process. As a result, with inter-session network coding, the coding window at each node should selectively remove original data blocks within other broadcast sessions, if they are no longer useful to all the receivers. But how does a node in its role as a relay know which original block is already decoded at receivers?

The short answer to this question is: the relaying node does not know directly, but the source node knows, since receivers send cumulative acknowledgments to the source node of a session, acknowledging the latest original block that has been decoded. Of course, these acknowledgments are sent directly from a receiver to the source node of a session, and are not sent to any of the relaying nodes. Nevertheless, after the source of a session advances its coding window by removing its earliest original block, all relaying nodes will easily detect such an advance, as the sequence number of the earliest original block is embedded within a coded block.

It only remains to see when a node removes a block from its coding window. As a coded block arrives or as an original block is produced, it adds its coefficient row to the existing coefficient matrix, and reduces the new matrix to its RREF. After eliminating original data blocks from its own broadcast session that are acknowledged by all receivers, it simply recodes all rows in the existing matrix, even if an original data block from another broadcast session is completely decoded when reducing the coefficient matrix to its RREF. The node does not remove the block from its coding window immediately, since doing so introduces the risk that subsequent original blocks may not be decoded. Instead, a node, in its role as a relay, waits until the source node of a broadcast session advances its own coding window. The relaying node removes an original block from its coding window only when its sequence number is smaller than the starting sequence number in a newly received coded block from the source node of a session. Since the source node only advances its coding window when all receivers have decoded an original block, recoding such a block will no longer benefit any of the receivers.


Fig. 11. Removing data blocks from the coding window of a node in its role as a relay (node 4).

To illustrate the design of the coding window with inter-session network coding, in the context of our four-node example given in Fig. 7, Fig. 11 shows the coefficient matrix at node 4. b^(j)_i represents an original data block i in the jth broadcast session, S_j. In the left-side matrix in RREF, we observe that node 4 has decoded b^(1)_1 and b^(2)_1, from S1 and S2 respectively, by receiving the first three coded blocks, and the fourth coefficient row corresponds to b^(4)_1 from node 4 itself for inter-session network coding. In a newly received coded block (the coefficients of this block are in the last row), any information related to b^(1)_1 and b^(2)_1 is no longer included, which indicates that the coding windows at node 1 and node 2 have advanced beyond these original data blocks. In its role as a relay, node 4 now removes coefficient rows that correspond to these two blocks (circled by dashed rectangles) from its coding window, within which it produces outgoing coded blocks in the future.
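The pruning rule just described, where a relay drops an original block only once that block's sequence number falls below the start sequence number advertised by the session's source in a newly received coded block, reduces to a one-line filter. A sketch with illustrative names (the per-session start numbers would come from the coded block's self-contained header):

```python
def prune_relay_window(window, starts):
    """window: list of (session_id, seq) pairs for original blocks a relay
    is currently recoding. starts: {session_id: start_seq} taken from the
    header of a newly received coded block. Keep a block only if its
    session's source has not advanced past it."""
    return [(sid, seq) for sid, seq in window
            if seq >= starts.get(sid, 0)]
```

Replaying Fig. 11: if a newly received block advertises that sessions S1 and S2 now start at sequence number 2, the relay drops b^(1)_1 and b^(2)_1 but keeps everything else.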

In summary, Algorithm 1 describes the GestureFlow protocol using inter-session network coding.

IV. EXPERIENCES WITH GESTUREFLOW

We dedicate this section to investigations of how GestureFlow performs in real-world systems. We implemented MusicScore, a collaborative music composition application, from scratch with the iPad Programming SDK, consisting of over 59,000 lines of code. Users interact with MusicScore to compose music using only multi-touch gestures. MusicScore takes full advantage of the GestureFlow framework to stream gesture events among multiple participating users, such that composers can enjoy a live collaborative experience. Both MusicScore and the GestureFlow framework have been implemented in Objective-C in the Xcode programming environment. Fig. 12 shows a scenario of a live MusicScore composition session on the iPad, in which collaboration is achieved using GestureFlow.

To minimize the computational load on the iPad, we have included an optimized implementation of random network coding in the GestureFlow framework. Our implementation of network coding is able to progressively decode incoming coded blocks using Gauss-Jordan elimination, while taking full advantage of SIMD instructions available in the ARM v7 architecture, used by the CPUs powering the iPad (all generations) and the iPhone (including the 3GS, 4 and 4S). The GestureFlow implementation itself contains over 8,000 lines of code.

Algorithm 1 GestureFlow running on the source node of session Sj.

Event: Received a multi-touch event
1: Encapsulate the new multi-touch event into an original block, b^(j)_i, with proper zero-padding.
2: if the maximum size of the coding window has not been reached then
3:   Include b^(j)_i into the coding window
4:   Increment the size of the coding window
5: end if

Event: Received an ACK from a receiver in the session Sj
6: Compute the smallest sequence number r from all ACKs received so far from receivers.
7: Advance the coding window by removing all original blocks before b^(j)_r (inclusive).
8: Include more buffered original blocks into the coding window, if any, until the maximum size of the coding window has been reached.
9: Recompute the size of the coding window based on the number of original blocks included.

Event: Received a coded block
10: Add the coded block to the coding window.
11: Reduce the coefficient matrix (corresponding to blocks in the coding window) to its RREF using Gauss-Jordan elimination.
12: if b^(q)_p (q ≠ j) and earlier blocks can be decoded and b^(q)_{p+1} cannot be decoded then
13:   Decode blocks till b^(q)_p (inclusive)
14:   Send ACK containing p to the source node of Sq
15: end if
16: if b^(q)_i (i ≤ p, q ≠ j) is not included in the received coded block then
17:   Remove blocks associated with b^(q)_i from the coding window
18: end if

Event: The network is ready for a block to be transmitted
19: Produce and transmit a linear combination of all blocks in the coding window with randomly generated coefficients.

Fig. 12. MusicScore in action: two users are collaboratively composing a musical piece with support from the GestureFlow framework.

A. Performance Analysis

As our primary QoE metric, we first present measurement results with respect to the gesture recognizing delays. In each run of our experiments, we measure the gesture recognizing delay in a collaborative music composition session between a pair of iPads running MusicScore, and the corresponding CDF curve is derived from multiple runs of our experiments. Our experiments are performed in both Wi-Fi networks and 3G cellular networks to better capture the performance of GestureFlow. Given a type of Internet connectivity (Wi-Fi or 3G), the two iPads are connected to the Internet via two different ISPs, reflecting a more dynamic network condition. We have also implemented a traditional TCP-based streaming protocol, named TCP Relay, as a baseline for our comparisons. For fairness, TCP Relay also transmits data blocks through both a direct TCP link and two-hop relay paths to minimize both end-to-end delays and delay jitter.

As shown in Fig. 13 and Fig. 14, when GestureFlow is used, the averages of gesture recognizing delays are 102.6 msec and 253.3 msec for Wi-Fi and 3G users, respectively. In contrast, TCP Relay suffers from much longer gesture recognizing delays: 183.6 msec and 485.1 msec on average, for Wi-Fi and 3G users, respectively. The shorter delays achieved by GestureFlow can be attributed to both the adoption of inter-session network coding and a sliding coding window, specifically designed for gesture streaming. Besides, since devices in cellular networks (3G or EDGE) cannot directly connect with each other via TCP, due to NAT restrictions, they have to connect to the same set of relays with publicly accessible IP/port (e.g., dedicated relay servers for users in cellular networks) for data exchange. This results in longer end-to-end delays between 3G users, as shown in Fig. 14. In contrast, UDP-based GestureFlow can easily achieve NAT traversal in cellular networks, and as a result achieves shorter end-to-end delays.


Fig. 13. CDF of gesture recognizing delays between Wi-Fi users using GestureFlow and TCP Relay.


Fig. 14. CDF of gesture recognizing delays between 3G users using GestureFlow and TCP Relay.

Furthermore, we would like to evaluate the performance of GestureFlow in a real-world scenario with four users in "all-to-all" broadcast sessions, with one iPad user connecting to the Internet through campus Wi-Fi, one iPad user using a household Wi-Fi access point, and two iPhone 4S users connecting to the Internet through 3G and EDGE, respectively. Similar to the two-user experiments, these four devices are connected to different ISPs and located in different locations in the same city. Table III summarizes the average gesture recognizing delays in both GestureFlow and TCP Relay between each pair of devices, over 20 runs of experiments. It is clear that GestureFlow achieves better performance: gesture recognizing delays are 23-52% shorter compared to those in TCP Relay.

TABLE III
COMPARISONS OF GESTURE RECOGNIZING DELAYS (MSEC) USING GESTUREFLOW AND TCP RELAY.

GestureFlow / TCP Relay | Wi-Fi 1 | Wi-Fi 2 | 3G       | EDGE
Wi-Fi 1                 | -       | 103/178 | 192/376  | 309/517
Wi-Fi 2                 | 89/184  | -       | 167/257  | 274/415
3G                      | 224/391 | 188/294 | -        | 364/493*
EDGE                    | 347/487 | 301/428 | 398/519* | -

Note: there is no direct TCP connection between cellular devices due to NAT restrictions. Relay paths have been used as a result (entries marked with *).

Next, we evaluate an important design choice adopted in GestureFlow: the use of multiple paths between the source and each receiver to minimize end-to-end delays. Fig. 15 shows the average percentage of blocks a node receives from relaying nodes in both GestureFlow and TCP Relay in the four-user "all-to-all" streaming scenario, along with the 95% confidence interval. We observe that in GestureFlow, Wi-Fi and 3G users receive more than 10% of their blocks from relaying nodes, and up to 30% of received blocks come from relaying nodes for the EDGE user, due to longer network delays on direct EDGE links. As a result, EDGE users rely more on relay paths that have shorter delays than the direct ones. However, only a small percentage of blocks are observed from relaying nodes in TCP Relay, especially for the EDGE user, which indicates that it fails to take full advantage of multiple paths as GestureFlow does.


Fig. 15. The percentage of blocks from relaying nodes over all received blocks in GestureFlow and TCP Relay.


Fig. 16. The average of gesture recognizing delays with different network sizes using GestureFlow and TCP Relay.

To investigate the scalability of GestureFlow, we further study the correlation between the gesture recognizing delay and the number of participating users. Note that all participating nodes are Wi-Fi users in this experiment. As shown in Fig. 16, as the number of nodes increases, the average of gesture recognizing delays in GestureFlow varies mildly around 100 ms. We can even observe a slight decrease in the gesture recognizing delay when the number of nodes is large in GestureFlow, e.g., 14 nodes, which is due to an increased number of relay paths that may provide shorter end-to-end delays. In contrast, the average of gesture recognizing delays in TCP Relay increases significantly when the system scales up, mainly due to congested TCP connections that are overwhelmed by relayed blocks. Such an observation implies that a set of complex relay selection and rate control algorithms would be required in TCP-based gesture streaming, as opposed to the simpler design of inter-session network coding in GestureFlow.

We also investigate the bandwidth overhead in GestureFlow by evaluating the difference between the gesture streaming bit rate, which is computed as the average of the four broadcast sessions, and the upload bit rate per user, which is defined as the average upload bit rate each user devotes to every broadcast session. As shown in Fig. 17, the gap between these two curves becomes wider as the streaming bit rate becomes higher. The reason is that during bursty periods, every user has to contribute more bandwidth to upload coded blocks containing blocks from other sessions, which introduces more bandwidth overhead. Yet, the bandwidth overhead for each user is less than 5 kbps in general, which is reasonable. It is critical to point out that even with overhead considered, the upload bit rate per user is only about 8 kbps on average, which is fairly low in streaming systems. This verifies our design philosophy that bandwidth is not a major concern in GestureFlow.


Fig. 17. Bandwidth overhead per user.

B. User Experience Evaluation

Although our experimental results have so far shown that GestureFlow is able to provide a better Quality of Experience for gesture streaming in interactive media applications by providing shorter gesture recognizing delays, it may not yet be fully convincing. We are more interested in the actual feedback from real-world users when GestureFlow is in use, since it reflects the Quality of Experience directly. As a result, we have conducted a series of experiments to capture user feedback. We invite users to use MusicScore in a two-user interactive music composition session over Wi-Fi hotspots. They are asked to rate the gesture recognizing delays they have experienced in a 5-min interactive collaboration session, selecting from 4 categories: 1) delay is too long, i.e., not usable from a user's perspective; 2) delay is long, but still tolerable; 3) satisfying, but with a noticeable delay; 4) excellent.


Fig. 18. User experience ratings of different gesture recognizing delays in MusicScore using GestureFlow.


Fig. 19. User experience ratings of different gesture recognizing delays in MusicScore using TCP Relay.


In Fig. 18 and Fig. 19, we show gesture recognizing delays of collaborative sessions and their corresponding user experience ratings, when GestureFlow and TCP Relay are in use through Wi-Fi connections, respectively. Clearly, GestureFlow is able to garner higher user experience ratings with shorter gesture recognizing delays compared to TCP Relay. Similar trends are also observed with other connection types. While most Wi-Fi users reported better collaboration experiences when gesture recognizing delays are shorter than 150 msec, users are observed to be more tolerant of longer delays in 3G networks. Our evaluation results show that users tend to give a rating of 4 even though their experienced delays are around 300 msec in 3G networks. This can be explained by the fact that users usually expect 3G networks to be slower than Wi-Fi. Validated by both shorter gesture recognizing delays and higher user experience ratings, we believe that GestureFlow is able to achieve a satisfactory Quality of Experience.

C. Performance of Network Coding

Since we apply network coding in GestureFlow, it is important to justify this design choice. Fig. 20 shows the relationship between the maximum coding window size W and the gesture recognizing delay. We observe that the gesture recognizing delay increases when W gets either smaller or larger, and reaches its minimum when W equals 8. The underlying reason is that when W is set too small, the source needs acknowledgments for almost every block to advance the coding window. Subsequent blocks have to wait a longer time before they can be coded and transmitted, which increases the delay, especially in bursty periods. On the other hand, if W is too large, the received coded blocks always contain coding coefficients for newly coded blocks, which increases the delay in the decoding process.


Fig. 20. The average delays along with 95% confidence intervals in different experiment settings.

As the adaptive maximum coding window size adjustment mechanism is specifically designed for QoE-aware gesture streaming, we would also like to verify its effectiveness through performance evaluation. In Fig. 21, we compare the gesture recognizing delays with the adaptive maximum coding window size adjustment mechanism turned on and off among Wi-Fi users. Clearly, benefiting from the shrunken maximum coding window when gestures are recognized, the average gesture recognizing delay is reduced from 121.5 msec to 102.6 msec. Similar results are also observed in measurements among users with other Internet connection types.

Having evaluated the coding window size and its adaptive adjustment mechanism, we now proceed to observe the actual number of blocks to be coded at each node. We plot the CDF of the number of original blocks and the number of relayed blocks at each node in Fig. 22 and Fig. 23, respectively. From Fig. 22, we can see that 90% of the time there are no more than 5 original blocks to be coded at a node. This indicates that most of the time there is very little delay added in both the encoding and decoding processes, as blocks do not have to wait too long to be transmitted or relayed. The underlying reason is that GestureFlow has very bursty traffic. Since users remain idle most of the time, the actual coding window size is naturally reduced. Similarly, Fig. 23 shows that 90% of the coding windows have a size of no more than 11 blocks, with an average of around 4 blocks. This indicates that, in general, only one or two blocks from each broadcast session need to be recoded at the relaying node, which justifies the use of inter-session network coding. By mixing a limited number of coded blocks from multiple sessions together, recoded blocks generated by relaying nodes are useful to downstream receivers with high probability.

Another concern when applying network coding is its CPU load and memory usage, which are mainly introduced by Gauss-Jordan elimination in the decoding process. We have measured the CPU load and memory usage over time at an iPhone 3GS node, with results shown in Fig. 25. As we can see, the average CPU usage is 8.4%, with peaks corresponding to bursty bit rates in Fig. 17. The dashed line shows the memory usage over time, which is 2.4% on average. iPad nodes have even lower CPU loads as they enjoy a higher CPU frequency in their Cortex A8 architecture, and the same memory usage as the iPhone 3GS (both have 256 MB of main memory). As such, the CPU load and memory usage of network coding in GestureFlow are acceptable.

It is critical to point out that coded blocks in networkcoding, either from source nodes or relaying nodes, are con-sidered useful only when they arelinearly independentwithone another, or else they are regarded as redundant blocks.The ratio of linear dependence among coded blocks withdifferent coding window sizeW is also investigated, whenevaluating the performance of network coding. For a specificdata block, its linear dependence is computed as the percentageof linearly dependent blocks over the total number of coded

Fig. 21. A comparison of gesture recognizing delays between GestureFlow with and without the adaptive coding window size adjustment mechanism (mean: 102.6 ms with vs. 121.5 ms without).

Fig. 22. CDF of the number of original blocks to be coded at a node with different choices of the maximum coding window size (W = 8, 10, 20).

Fig. 23. CDF of the number of relayed blocks to be coded at a node (mean = 3.8, median = 2.0; 90th percentile at 11 blocks).

Fig. 24. CDF of linear dependence in different settings of the maximum coding window size (W = 8, 10, 20).

Fig. 25. The CPU load and memory usage of network coding in an iPhone 3GS device.

blocks involving the data block. By plotting the CDF of linear dependence of all coded blocks in our experiment in Fig. 24, we find that in 90% of them, around 15% of blocks are linearly dependent, which is an alarmingly high percentage. The percentage of linear dependence is even higher when the coding window size becomes smaller, e.g., it reaches 18% when W = 8.
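As a toy illustration of how such a ratio can be measured, the sketch below works over GF(2) rather than the GF(256) used in GestureFlow: each coefficient vector is packed into an integer bitmask, and a coded block counts as linearly dependent exactly when it fails to enlarge the basis collected so far. All names and parameters here are hypothetical.

```python
import random

def insert_into_basis(basis, v):
    """Try to add vector v (a bitmask over GF(2)) to the basis.

    basis maps a pivot bit position to the basis vector with that
    leading bit. Returns True iff v was linearly independent.
    """
    while v:
        hb = v.bit_length() - 1
        if hb not in basis:
            basis[hb] = v
            return True
        v ^= basis[hb]      # reduce v by the basis vector with this pivot
    return False            # v reduced to zero: it lies in the span

def dependence_ratio(window, n_blocks, seed=0):
    """Fraction of randomly coded blocks that are linearly dependent."""
    rng = random.Random(seed)
    basis, dependent = {}, 0
    for _ in range(n_blocks):
        v = rng.randrange(1, 1 << window)   # random non-zero coefficients
        if not insert_into_basis(basis, v):
            dependent += 1
    return dependent / n_blocks
```

Once the basis reaches full rank (the coding window size), every further block is necessarily redundant, which is why long-running sessions with naive random coding accumulate dependent blocks.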

A high percentage of linear dependence among coded blocks implies a large portion of redundant blocks, which unnecessarily consume bandwidth. Though gesture streams typically incur very low bit rates, they are highly bursty as well: as shown in Fig. 3, the bursty bit rate reaches 10 kbps in a session. The bandwidth waste due to linearly dependent blocks may escalate with concurrent broadcast sessions. More importantly, a high percentage of linear dependence may result in longer gesture recognizing delays, as nodes have to wait for more useful blocks to decode a gesture. With QoE awareness, we need to carefully analyze and address the challenge of linear dependence.

V. THE PROBLEM OF LINEAR DEPENDENCE

A. Analyzing the Effects of Linear Dependence on QoE

In this section, we offer theoretical insights on how linear dependence among coded blocks negatively affects the Quality of Experience of users by increasing their gesture recognizing delays.

We first describe our system model formally. Assume that N users participate in a gesture broadcast session. Each of them is not only a source that generates gestures, but also a receiver and a relay of blocks from other sessions. The network is modelled as a directed acyclic graph G = (V, E). V is the set of network nodes that represent participating users, i.e., |V| = N. E is the set of network links. Each link (i, j) ∈ E between nodes i and j in V is characterized by its total capacity e_{ij}, expressed in blocks per second, and its average packet loss rate π_{ij}. We denote the average percentage of linear dependence among coded blocks by ld.

It is obvious that the gesture recognizing delay depends on the average packet loss rate and the percentage of linear dependence among coded blocks. To be exact, the expected delay observed at each node can be computed by estimating the average number of blocks that it receives before it can decode those gestures. Let D_i be the average delay observed at node i for receiving a sufficient number of blocks such that it can decode gestures in coding windows of all other users. D_i has the form of

D_i = d_i \sum_{k=(N-1)W}^{\infty} k P_i(k).

In this equation, k is the number of blocks that node i receives before it can decode the gestures; P_i(k) denotes the probability of decoding these gestures after receiving exactly k blocks; the constant d_i denotes the average delay for receiving one block, and can be approximated as

d_i = \frac{1}{\sum_{j \in V_{-i}} e_{ji}},

where V_{-i} is the set of nodes in V without node i, i.e., V_{-i} = V \ {i}. Note that k includes all coded blocks, which can be either linearly dependent or independent from other blocks.

Since at most W original blocks, including all received coded blocks from the other N − 1 broadcast sessions, are allowed to be coded together to produce a new coded block at each node, the minimum number of blocks needed for decoding gestures from all other users equals (N − 1)W. That is to say, the probability of decoding with fewer than (N − 1)W blocks equals 0. Hence, the probability P_i(k) of decoding gestures from all other users with exactly k blocks corresponds to the probability of forming a full-rank system upon receiving the kth block, but not before that. Analytically,

P_i(k) = \binom{k-1}{k-(N-1)W} p_i^{(N-1)W} (1 - p_i)^{k-(N-1)W},

where p_i represents the probability that a useful block arrives at node i.
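As a sanity check of the expression above, P_i(k) is a negative-binomial probability and should sum to one over k ≥ (N−1)W. The snippet below verifies this numerically with hypothetical values N = 4, W = 8, and p_i = 0.85:

```python
from math import comb

def P(k, N, W, p):
    """P_i(k): full rank is reached exactly upon the k-th received block."""
    r = (N - 1) * W                       # useful blocks needed to decode
    return comb(k - 1, k - r) * p**r * (1 - p)**(k - r)

N, W, p = 4, 8, 0.85                      # hypothetical parameters
r = (N - 1) * W
# Truncate the infinite sum; the tail beyond k = 2000 is negligible here.
total = sum(P(k, N, W, p) for k in range(r, 2000))
```

The truncated sum is numerically indistinguishable from 1, confirming that P_i(k) is a proper distribution over the number of received blocks.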

Since a block is considered useful if it is not lost due to packet erasures and it is linearly independent of the coded blocks that a node has already received, the probability p_i can be expressed in terms of the link capacities, the packet loss rate on each link, as well as the average percentage of linear dependence among coded blocks. It takes the following form:

p_i = (1 - ld) \frac{\sum_{j \in V_{-i}} e_{ji} (1 - \pi_{ji})}{\sum_{j \in V_{-i}} e_{ji}}.

More formally, the probability that a useful block arrives at each node is the fraction of useful blocks arriving over its incoming bandwidth capacity.

From our analysis, we can see that when the percentage of linear dependence among coded blocks, ld, increases, the probability p_i of a useful block arriving at each node decreases. This in turn increases the probability P_i(k), since \partial P_i(k) / \partial p_i < 0. As a consequence, the average gesture recognizing delay D_i observed at each node will increase.
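The conclusion above can be checked numerically. Since P_i(k) is negative binomial, the sum defining D_i collapses to D_i = d_i (N−1)W / p_i, the mean of the distribution. The sketch below uses hypothetical parameters (N = 4 nodes, W = 8, uniform incoming link capacities of 50 blocks/s with 5% loss) and shows that D_i grows with ld:

```python
N, W = 4, 8          # hypothetical session size and coding window size
e, loss = 50.0, 0.05 # hypothetical per-link capacity (blocks/s) and loss rate

def delay(ld):
    """Average gesture recognizing delay D_i for a dependence ratio ld."""
    incoming = (N - 1) * e            # total incoming capacity at node i
    d_i = 1.0 / incoming              # average delay to receive one block
    p_i = (1 - ld) * (1 - loss)       # probability a received block is useful
    r = (N - 1) * W                   # useful blocks needed to decode
    return d_i * r / p_i              # mean of the negative binomial, scaled
```

With these numbers, raising ld from 0 to 15% lengthens the expected delay proportionally to 1/(1 − ld), i.e., by roughly 18%.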


B. Mitigating Linear Dependence

To mitigate the high percentage of linearly dependent blocks that incur longer gesture recognizing delays, we are inspired by systematic Reed-Solomon codes, and propose to generate coding coefficients for the original blocks at each node based on the Vandermonde matrix in GestureFlow.

With coding coefficients generated by the Vandermonde matrix, each node codes original blocks first, rather than coding coded blocks belonging to its own session from the onset. These original blocks can be seen as a special case of coded blocks, with coding coefficients as rows of an identity matrix. After coding all original blocks, a node starts to generate and code coded blocks from its own session. In order to code k original blocks using an (n, k) Vandermonde code over a Galois field F_q, a node is able to generate up to n − k coded blocks after the original blocks are coded. In GestureFlow, an (n − k) × k Vandermonde matrix G [3] of the following form is used to generate these coded blocks:

G = \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & 2 & 3 & \cdots & k \\
1 & 2^2 & 3^2 & \cdots & k^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 2^{n-k-1} & 3^{n-k-1} & \cdots & k^{n-k-1}
\end{pmatrix}.

Since matrix G is a Vandermonde matrix, it is easy to see that any k × k submatrix of G has a non-zero determinant and is nonsingular, and as a result every subset of k rows of G is guaranteed to be linearly independent. As such, linear independence among all original blocks coded from the source is guaranteed with the use of the Vandermonde matrix. These blocks from the source are always innovative once received.
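The nonsingularity argument can be checked directly for small parameters. The sketch below (over the rationals for illustration; the paper operates over GF(256)) stacks the k × k identity on top of the Vandermonde generator with hypothetical k = 4 and n − k = 4, and verifies that every choice of k rows yields a nonsingular matrix:

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, d = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)      # no pivot in this column: singular
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d                  # row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return d

k, extra = 4, 4                     # hypothetical small parameters
identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
vander = [[(j + 1) ** i for j in range(k)] for i in range(extra)]
stacked = identity + vander         # n x k generator, n = k + extra

# Every choice of k rows must be nonsingular for decoding to succeed.
all_ok = all(det([stacked[r] for r in rows]) != 0
             for rows in combinations(range(len(stacked)), k))
```

Over the rationals this holds because the Vandermonde matrix with distinct positive nodes is totally positive; over a finite field the corresponding claim is the MDS property of the code, which is what guarantees that source blocks are always innovative.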

Compared with random linear codes, one difference of using the Vandermonde matrix at the source is that it is not rateless. With the (n − k) × k Vandermonde matrix G, a maximum of n − k coded blocks can be coded, in addition to the original blocks. In contrast, with a randomized generation of code vectors, random network coding is able to produce a practically infinite number of coded blocks to ensure successful decoding over any erasure channel.

In practice, though, this is not a serious limitation in GestureFlow. We have shown in Sec. IV-C that the optimal coding window size, W, is 8, which implies that k ≤ 8, and the receiver is able to decode successfully as long as k linearly independent blocks (original or coded) are received. Since W is set to be so small, even if a standard Galois field size of q = 256 is used, with arithmetic performed over GF(256) during coding, n can still be chosen to be as large as q − 1 = 255, which means that the code used is a (255, k) code, where k ≤ W. This is indeed a linear code with a very low rate, and implies that decoding will be successful with high probability. In situations where the packet loss rate is so high that fewer than k linearly independent blocks are received, the session is considered to be terminated.

Other parts of the transport protocol in GestureFlow remain the same, in the sense that cumulative acknowledgments, progressive decoding, relay paths, and inter-session network coding are still adopted. Note that each node still uses random linear network coding to generate coding coefficients when recoding received blocks, so that no restrictions are imposed on the actual number of coded blocks in the coding process.

C. Evaluating the Use of the Vandermonde Matrix

Fig. 26. CDF of linear dependence with the Vandermonde matrix used at the source nodes (W = 8).

Fig. 27. Linear dependence with different coding window sizes.

The effectiveness of using the Vandermonde matrix at the source in GestureFlow is evaluated with additional experiments with MusicScore. Fig. 26 compares the CDFs of linear dependence between the original GestureFlow design and the new design that uses the Vandermonde matrix to mitigate linear dependence. The maximum coding window size W is set to 8. It is clear that the 90th percentile of linear dependence is significantly reduced by the new design, from 18% to 6%. Since blocks coded using the Vandermonde matrix at the source are guaranteed to be linearly independent of each other, the remaining linear dependence is caused by the recoding process at relaying nodes, and is acceptably low. The ratio of linear dependence with different coding window sizes is explored in Fig. 27. We can see that the ratio of linear dependence decreases as the coding window size increases, and that the ratio with the Vandermonde matrix used at the source is much smaller than with the original GestureFlow design.

Since the most critical design objective in GestureFlow is to satisfy a stringent delay requirement, we compare gesture recognizing delays and their standard deviations from the Wi-Fi 1 user to the other three users using the new and original GestureFlow designs, respectively, as shown in Table IV. We can observe that by using the Vandermonde matrix as coding coefficients at the source, average gesture recognizing delays through different kinds of connections have all been evidently reduced. Since the redundancy due to linear dependence is mitigated with the Vandermonde matrix at the source, a received block can be used to decode with a higher probability, which reduces the decoding delay.

TABLE IV
GESTURE RECOGNIZING DELAYS (MSEC) AT THE WI-FI 1 USER WITH THE NEW GESTUREFLOW DESIGN.

(x, s)       Wi-Fi 2      3G           EDGE
New          (89, 23)     (177, 87)    (287, 168)
Original     (103, 48)    (192, 104)   (309, 191)


In summary, our experiments in Sec. IV and Sec. V have evaluated our important design decisions made in GestureFlow, with the objective of reducing gesture recognizing delays. Our results have confirmed that gesture recognizing delays are effectively reduced with our proposed protocol in GestureFlow, and that it scales well as the number of participating nodes increases.

VI. RELATED WORK

With the inception of network coding [4] and random network coding [1], [2] in information theory, the topic has attracted a substantial amount of research attention. Analytical studies [2], [4] have shown that network coding is able to maximize information flow rates in multicast sessions in directed acyclic graphs. In more practical systems, Gkantsidis et al. [5], [6] have shown that the use of random network coding in peer-to-peer file sharing systems can reduce the time to download files. Annapureddy et al. [7], [8] have evaluated the use of network coding in experimental peer-to-peer on-demand streaming systems, and have shown that network coding helps to achieve good performance with respect to the sustainable playback rate and system throughput. UUSee, Inc. has successfully adopted network coding in its commercial peer-assisted on-demand streaming protocol [9]. Different from these applications, the use of network coding in GestureFlow is specifically designed for streaming low bit-rate traffic from gesture events, and for ensuring reliable delivery with low delays.

With respect to the design of transport protocols, RTP [10] and RTSP [11] are able to provide one-to-all delivery of data with real-time properties over IP multicast. Since RTP and RTSP were originally designed for IP unicast, reliable multicast protocols [12], [13] were proposed to improve the performance of multi-party streaming, by reducing ACK/NAK implosion in back traffic and optimizing retransmissions on multicast channels. In addition, network coding has been incorporated into transport protocols. For example, CodeCast, presented by Park et al. [14], is a network-coding-based multicast protocol for low-latency multimedia streaming. Sundararajan et al. [15] have proposed a modified acknowledgment mechanism to incorporate network coding into TCP, with the objective of providing better support to a unicast session. In their solution, the number of blocks involved in the sender's sliding window is completely controlled by TCP. The receiver acknowledges the degrees of freedom of the coefficient matrix of coded blocks received so far. Network coding is used in a separate underlying layer as a rateless erasure code, and is decoupled from window-based flow control in TCP.

In comparison, GestureFlow is remarkably different. It is designed specifically for multiple interactive broadcast sessions, each involving a stream of gesture events, over regular IP unicast. Acknowledgments in GestureFlow serve the purpose of indicating to the source when an original block has been correctly decoded by all receivers in a broadcast session. Rather than leaving the control of the sliding window at the source to TCP, the GestureFlow design dictates very specific rules about how the coding window advances itself, and about what the maximum window size is. Receivers are more conservative in that they only acknowledge blocks that are completely decoded, and in a cumulative fashion.

The performance improvement brought by inter-session network coding has drawn some recent research attention in the literature. Eryilmaz et al. provide a theoretical framework in which a dynamic routing-scheduling-coding strategy is proposed to decide whether blocks from two sessions should be coded together at a node [16]. Yang et al. propose to divide multiple sessions into groups and construct a linear network code for each group, with consideration of improving the system's benefits on bandwidth and throughput [17]. Focusing on directed networks with two multicast sessions, Wang et al. discuss various aspects of pairwise inter-session network coding, including the sufficiency of linear codes and the complexity advantages of identifying coding opportunities [18]. I2NC combines inter-session and intra-session network coding to improve throughput in lossy wireless environments [19]. In contrast to previous research, the primary objective in GestureFlow is to reduce the gesture recognizing delay as a QoE metric in interactive multimedia applications, which has not been the focus of study in previous work.

VII. CONCLUDING REMARKS

We are firm believers that gestures represent a new paradigm for users to interact with mobile devices, and that social and collaborative aspects of gesture-intensive applications will usher in an era of streaming gesture events live, so that applications do not need to design and implement custom-tailored solutions. We are intrigued by the very low yet bursty bit rates when streaming gesture events over the Internet, as shown in a real-world application, MusicScore, that we have developed from scratch to compose music collaboratively on mobile devices such as the iPad. Such low streaming bit rates, coupled with the need for guaranteed reliability, low gesture recognizing delays, and multiple concurrent broadcast sessions when multiple users are involved, have brought us brand new but very practical challenges that need to be addressed with a new transport solution.

While designing the GestureFlow framework, we have tried a number of alternative designs, governed by the principles of simplicity and practicality. This paper presents our design of using random network coding with multiple paths, allowing for recoding across multiple concurrent sessions. We intend to present not only how our design in GestureFlow works, but also why we have chosen such a design. The use of network coding has simplified our design and implementation, making them more practical. In closing, we hope that this paper represents only the first step towards a mature framework that facilitates the streaming of gestures, so that users can interact with one another in a simple and transparent fashion to create or consume multimedia content, wherever they may be around the world.

REFERENCES

[1] P. Chou, Y. Wu, and K. Jain, "Practical Network Coding," in Proc. Allerton Conference on Communications, Control and Computing, 2003, pp. 40–49.


[2] T. Ho, R. Koetter, M. Medard, D. R. Karger, and M. Effros, "The Benefits of Coding over Routing in a Randomized Setting," in Proc. IEEE Int'l Symposium on Information Theory (ISIT), 2003, p. 442.

[3] R. M. Roth, Introduction to Coding Theory. Cambridge UniversityPress, 2006.

[4] R. Ahlswede, N. Cai, S.-Y. Li, and R. W. Yeung, “Network InformationFlow,” IEEE Trans. Info. Theory, vol. 46, no. 4, pp. 1204–1216, 2000.

[5] C. Gkantsidis, J. Miller, and P. Rodriguez, "Comprehensive View of a Live Network Coding P2P System," in Proc. Internet Measurement Conference (IMC '06), 2006, pp. 177–188.

[6] C. Gkantsidis and P. Rodriguez, "Network Coding for Large Scale Content Distribution," in Proc. IEEE INFOCOM, vol. 4, 2005, pp. 2235–2245.

[7] S. Annapureddy, S. Guha, C. Gkantsidis, D. Gunawardena, and P. Rodriguez, "Is High-Quality VoD Feasible Using P2P Swarming?" in Proc. Int'l WWW Conference, 2007, pp. 903–912.

[8] ——, "Exploring VoD in P2P Swarming Systems," in Proc. IEEE INFOCOM, pp. 2571–2575.

[9] Z. Liu, C. Wu, B. Li, and S. Zhao, "UUSee: Large-Scale Operational On-Demand Streaming with Random Network Coding," in Proc. IEEE INFOCOM, 2010, pp. 1–9.

[10] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: A Transport Protocol for Real-Time Applications. RFC 1889, 1996.

[11] H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol (RTSP). RFC 2326, 1998.

[12] S. Floyd, V. Jacobson, C.-G. Liu, S. McCanne, and L. Zhang, "A Reliable Multicast Framework for Light-Weight Sessions and Application Level Framing," IEEE/ACM Trans. Networking, vol. 5, pp. 784–803, 1997.

[13] S. Paul, K. Sabnani, J.-H. Lin, and S. Bhattacharyya, "Reliable Multicast Transport Protocol (RMTP)," IEEE J. Sel. Areas Comm., vol. 15, no. 3, pp. 407–421, 1997.

[14] J.-S. Park, M. Gerla, D. Lun, Y. Yi, and M. Medard, "CodeCast: A Network-Coding-Based Ad Hoc Multicast Protocol," IEEE Wireless Communications, vol. 13, no. 5, pp. 76–81, 2006.

[15] J. K. Sundararajan, D. Shah, M. Medard, M. Mitzenmacher, and J. Barros, "Network Coding Meets TCP," in Proc. IEEE INFOCOM, 2009, pp. 280–288.

[16] A. Eryilmaz and D. S. Lun, “Control for Inter-Session Network Coding,”in Proc. Workshop on Network Coding (NetCod), 2007.

[17] M. Yang and Y. Yang, "A Linear Inter-Session Network Coding Scheme for Multicast," in Proc. 7th IEEE Int'l Symposium on Network Computing and Applications, 2008, pp. 177–184.

[18] C.-C. Wang and N. Shroff, "Pairwise Inter-Session Network Coding on Directed Networks," IEEE Trans. Info. Theory, vol. 56, no. 8, pp. 3879–3900, 2010.

[19] H. Seferoglu, A. Markopoulou, and K. Ramakrishnan, "I2NC: Intra- and Inter-Session Network Coding for Unicast Flows in Wireless Networks," in Proc. IEEE INFOCOM, 2011, pp. 1035–1043.

Yuan Feng received her B.Engr. from the School of Telecommunications, Xidian University, Xi'an, China, in 2008, and her M.A.Sc. degree from the Department of Electrical and Computer Engineering, University of Toronto, Canada, in 2010. She is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering, University of Toronto. Her research interests include optimization and design of large-scale distributed systems and cloud services. She is a student member of IEEE.

Zimu Liu received his B.Engr. degree from the Department of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China, in 2007, and his M.A.Sc. degree from the Department of Electrical and Computer Engineering, University of Toronto, Canada, in 2009. He is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering, University of Toronto. His research interests include design and implementation of reusable mobile frameworks, measurement studies on large-scale peer-to-peer streaming systems, and applications of network coding. He is a student member of IEEE.

Baochun Li received the B.Engr. degree from the Department of Computer Science and Technology, Tsinghua University, China, in 1995, and the M.S. and Ph.D. degrees from the Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, in 1997 and 2000. Since 2000, he has been with the Department of Electrical and Computer Engineering at the University of Toronto, where he is currently a Professor. He held the Nortel Networks Junior Chair in Network Architecture and Services from October 2003 to June 2005, and has held the Bell University Laboratories Endowed Chair in Computer Engineering since August 2005. His research interests include large-scale multimedia systems, cloud computing, peer-to-peer networks, applications of network coding, and wireless networks. Dr. Li was the recipient of the IEEE Communications Society Leonard G. Abraham Award in the Field of Communications Systems in 2000. In 2009, he was a recipient of the Multimedia Communications Best Paper Award from the IEEE Communications Society, and a recipient of the University of Toronto McLean Award. He is a member of ACM and a senior member of IEEE.