
Managed CDN – Optimizing the Behavior of Hierarchical VOD

Robert Duzett

Principal Architect

Arris Group

1825 NW 167th Pl., Beaverton, OR 97006

503‐495‐9267

Jeremy Craven

Senior Architect

Arris Group

1825 NW 167th Pl., Beaverton, OR 97006

503‐495‐9349


Introduction

Deep Content

The growth in VOD over many years has precipitated continuous architectural evolution to absorb huge increases in content libraries and streaming demand.

Content libraries today are generally several thousand hours of pre‐packaged VOD titles, combined increasingly with short‐term, time‐shifted content from services like StartOver. A rapidly expanding ratio of HD content, a much wider spectrum of licensed titles, and a diversification of content categories (regional, local, TV shows, & viral sources) are all pushing the total content library size into the tens of thousands of hours. This trend is expected to continue unabated and even accelerate to levels of hundreds of thousands of hours of SD‐equivalent content in the future.

The prevalent industry approach to accommodating this growth involves large CDN libraries, wide‐ranging hierarchy (tiers of storage and servers of various kinds and sizes), and 3‐screen delivery.

Architectural Tools

Architectural principles that will facilitate and enable deep content libraries, and will be developed to some degree in this paper, are:

• Hierarchical content – an architectural approach to content management that involves partitioning or copying portions of a content library across multiple tiers of storage, often split across geographical or local boundaries. For example, national or regional central libraries may be shared by, or cached at, multiple metro head‐ends and/or further distributed to local edge servers.

• Heterogeneous content – a method of storing unique video programs at multiple disparate servers or edge sites, such that, in its extreme/strictest application, every server has unique content and every video program is stored at only one server. In its practical application, the need for redundancy and balanced loads results in some content overlap across servers.

• Network hierarchies; Content Delivery Networks (CDNs); Caching and Content Management


Storage & networks

Throughout this deep‐content, large‐VOD architectural space, at all levels of hierarchy and design, one continuously runs into a major axis of freedom – storage vs. network tradeoffs – that seems to drive every major decision. Many of these tradeoffs are referenced at various points in this paper.

Three areas of focus – optimizing hierarchical VOD

This paper will focus on identifying and applying architectural principles, design tradeoffs, and nuts‐and‐bolts lessons and tools to three general areas where they can significantly impact the efficiencies of hierarchical VOD solutions. The three areas of focus are mapped to three separate chapters:

1) Optimizing Cache Behavior
2) Optimizing Network Behavior
3) Applying Storage and Network tradeoffs to multi‐tier design

Simulations ‐ In each of the three areas, real‐world data applied to simulations will quantify and illustrate the highlighted approaches. The data driving the simulations are very accurate representations of real subscriber behavior because they come from detailed, session‐by‐session play logs and asset directories taken over 7–14 days from multiple active VOD deployments in the field. This data can be applied in many ways against any number of architectural scenarios and configurations to give an accurate picture of design consequences.

Chapter 1: Optimizing Cache Behavior

Understanding how Usage Profiles change with Library Growth

A common and overused rule of thumb for characterizing the VOD content usage profile says that 80% of streams will come from the most popular 20% of the content library. However, as libraries have grown over the years, this handy rule has proven less and less useful. When a library doubles in size, its profile shape does not scale linearly with it. There are three possible dynamics driving the change in profile in two opposite directions:

1) The most significant growth in a content library could be niche content. A smaller portion of the total library (but close to the same absolute number of titles) is significantly popular while a larger portion is “long‐tail”, i.e. programs that are rarely watched. This dynamic tends to keep the front‐end of the profile curve static (against absolute content hours).

2) The increase in library size tends to broaden the choice for individual subscribers, creating additional demand for long‐tail niche content while reducing somewhat the demand for commonly popular titles. This dynamic tends to flatten out the front‐end of the profile curve (against absolute content hours).

3) The increase in HD mix, as well as additional title versions in 3‐screen formats, tends to flatten the curve simply because the average title is larger (we’re assuming here that the profile curve is mapped against SD‐equivalent hours, not bit‐rate invariant hours).


It is important to know which of these dynamics bears the most sway and by how much. In order to safely project caching efficiencies into the future, one must project changes in content usage profiles, which can only be achieved by understanding profile sensitivity to library growth.

Content libraries have grown rapidly in the last few years, from a couple thousand hours in 2006 to 15,000–20,000 hours in 2010. Projections into the future anticipate rapid growth to 100,000 hours and more. The graph below models content usage profiles for a series of ever‐larger libraries.

[Figure: Profile trends as Content Library grows. Y-axis: % of streams (0% to 100%). X-axis: content hours, sorted by popularity (0 to 4000). Series: 1.7k hrs (2006), 5k hrs (2008), 10k hrs (2010), 15k hrs (2010).]

As libraries continue to grow, profile curves continue to flatten in absolute terms, but not linearly with the magnitude of the library growth. For example, if the most popular 20% of a 1000‐hour library (i.e. 200 hours) sources 80% of the streams, it does not follow that a full 20% of a 5000‐hour library (i.e. 1000 hours) is required to source 80% of its streams. Instead, 13% (i.e. 650 hours) may be sufficient. For the profiles shown above, the “80/20” rule is actually as follows (keeping the “80” numerator fixed):

1700 hours (2006): 80/17 (80% of plays from 17% of the content)
5000 hours (2008): 80/13 (80% of plays from 13% of the content)
10000 hours (2010): 80/10 (80% of plays from 10% of the content)
15000 hours (2010): 80/9 (80% of plays from 9% of the content)

So, in absolute terms the profiles continue to flatten (after all, the 80% streaming mark grew from 200 hours to 650 hours in the example), but not linearly with library growth (library grew 5x, 80% mark increased by 3.25x). There is insufficient data to say whether the flattening is starting to slow as libraries become quite huge, but it is likely there is an asymptote beyond which library growth will have little or no effect on absolute content hours required to source 80% of the streams. Eventually one might expect the ever‐growing long tail to be increasingly filled with “almost‐never‐viewed” titles. The number of viewers, the daily viewing time, the average viewing bit‐rate, and the portion of viewing time that is devoted to niche content will all eventually stabilize even though content libraries could continue to grow. More sophisticated models of subscriber video‐viewing behavior in the presence of huge libraries, 3‐screen capabilities, and so on are required to make any reasonable guesses as to the location of such an asymptote. Could it occur at library sizes of 20,000 hours? 50,000 hours? When does dynamic #1 fully overtake dynamic #2? Dynamic #3 (HD mix, other formats) will obviously take care of itself as HD & 3‐screen penetrations approach near‐saturation levels.
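The arithmetic behind the "80/x" figures listed earlier can be checked with a short script. This is only a sketch: the library sizes and percentages are the ones quoted in the text, while the function and variable names are illustrative.

```python
# (library hours, percent of content that sources 80% of plays), as quoted in the text
profiles = [(1700, 17), (5000, 13), (10000, 10), (15000, 9)]


def hours_for_80_percent(library_hours, percent_of_content):
    """Absolute content hours needed to source 80% of the streams."""
    return library_hours * percent_of_content / 100


marks = [hours_for_80_percent(hours, pct) for hours, pct in profiles]

# Compare library growth with growth of the 80% mark, relative to the 2006 library.
base_hours, base_mark = profiles[0][0], marks[0]
growth = [(hours / base_hours, mark / base_mark)
          for (hours, _), mark in zip(profiles, marks)]

for (hours, pct), mark, (lib_x, mark_x) in zip(profiles, marks, growth):
    print(f"{hours:>6} h library: 80% mark at {mark:.0f} h "
          f"(library x{lib_x:.2f}, mark x{mark_x:.2f})")
```

In every row the 80% mark grows more slowly than the library itself, which is the non-linear flattening described above.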

Another important factor in usage profile dynamics is marketing. There is anecdotal evidence that relatively simple changes in a VOD marketing campaign can have a big impact on both take‐rates and profile shape.

Choosing appropriate Cache update/replacement mechanisms

The content usage profiles examined above are static representations of a library’s typical usage over some large period of time – 1 day or 1 week for example. Calculated over smaller periods of time more in line with the dynamics of actual VOD streaming behavior, the content library’s usage profile is seen to change constantly, day by day and hour by hour – the most popular programs right now are not necessarily so popular tomorrow or even later today, and newly available or suddenly relevant programs could become popular rather quickly. In order to maintain high cache effectiveness (hit‐rate), the cache content must be updated regularly, according to the logic of a cache replacement mechanism. A cache replacement mechanism encompasses algorithms and policies that attempt to optimize caching behavior and/or cost. For this analysis, we will consider three different cache update mechanisms.

• LRU replacement – LRU (“Least‐Recently‐Used”) is a simple and popular approach that inserts a program into the cache immediately on first‐miss, making room for it by replacing one or more programs, currently in the cache, that were “least‐recently‐used”. This algorithm is very responsive to changes in content usage behavior, but threatens to “thrash” or “churn” the cache content, especially for relatively small caches.

• LFU replacement – LFU (“Least‐Frequently‐Used”) maintains usage counts for all the programs and keeps the most popular ones in the cache. A program is inserted into the cache when its count reaches that of the “least‐frequently‐used” program (the one with the lowest count) currently resident in the cache. The responsiveness of this algorithm to changing behavior can be tuned by adjusting any “re‐biasing” mechanism included in the algorithm.

• Periodic update – in this approach, a usage profile of the entire library is kept over a fixed period of time. At the end of the period, the sorted list of popular programs is compared against those residing in the cache and any differences are corrected by transferring, in popularity order, newly‐popular programs to the cache as quickly as possible while deleting newly‐cold programs. The fixed period for this update is programmable, but for this analysis a period of one day is used, so this cache update mechanism is called “Day‐cache”.
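The difference between the LRU and LFU policies described above can be illustrated with a minimal sketch. It is not the simulator used for this paper: it assumes equal-length titles (capacity is counted in titles rather than hours), omits any LFU re-biasing mechanism, and all class and function names are hypothetical.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Insert on every miss; evict the least-recently-used title to make room."""

    def __init__(self, capacity_titles):
        self.capacity = capacity_titles      # assumed >= 1
        self.titles = OrderedDict()          # oldest use first, newest last
        self.insertions = 0                  # each insertion is one title transfer

    def request(self, title):
        """Return True on a cache hit; a miss always inserts the title."""
        if title in self.titles:
            self.titles.move_to_end(title)   # mark as most recently used
            return True
        if len(self.titles) >= self.capacity:
            self.titles.popitem(last=False)  # drop the least recently used
        self.titles[title] = True
        self.insertions += 1
        return False


class LFUCache:
    """Count every play; admit a title once its count reaches the coldest resident's."""

    def __init__(self, capacity_titles):
        self.capacity = capacity_titles      # assumed >= 1
        self.counts = Counter()              # usage counts for all titles, cached or not
        self.titles = set()
        self.insertions = 0

    def request(self, title):
        """Return True on a cache hit; a miss inserts only if the title is popular enough."""
        hit = title in self.titles
        self.counts[title] += 1
        if hit:
            return True
        if len(self.titles) < self.capacity:
            self.titles.add(title)
            self.insertions += 1
        else:
            coldest = min(self.titles, key=lambda t: self.counts[t])
            if self.counts[title] >= self.counts[coldest]:
                self.titles.remove(coldest)
                self.titles.add(title)
                self.insertions += 1
        return False


def count_hits(cache, plays):
    """Replay a list of title requests and return the number of cache hits."""
    return sum(cache.request(title) for title in plays)


plays = ["a", "a", "b", "c", "a", "b", "d", "a"]   # made-up request sequence
lru, lfu = LRUCache(2), LFUCache(2)
lru_hits, lfu_hits = count_hits(lru, plays), count_hits(lfu, plays)
print("LRU:", lru_hits, "hits,", lru.insertions, "insertions")
print("LFU:", lfu_hits, "hits,", lfu.insertions, "insertions")
```

On this tiny made-up sequence the LRU cache inserts on every miss and keeps evicting the one repeatedly requested title, a small-scale picture of the "thrash" risk noted for small caches. The sequence is illustrative only and says nothing about the relative hit rates measured in the simulations.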

The chart below shows effective cache hit rates, reflecting peak cache‐miss streaming traffic, for the three caching methods across three different cache sizes. This chart reflects results from a simulation model driven with log data from a typical real‐world VOD site. The content library size is 5400 hours.

[Figure: edge cache: hit rates. Y-axis: effective hit rate (75% to 100%). X-axis: cache size (hours of content): 1,000, 2,000, 3,000. Series: lru, lfu, day‐cache.]

Studying only this chart, one is tempted to conclude that LRU and LFU are about the same, and both are far superior to the day‐cache approach. However, it is important to remember that a key objective of the cache is to reduce network load, not necessarily to achieve high cache hit‐rates. Other side effects of a cache replacement algorithm besides miss‐rate can impact the resulting load on the network. It is, therefore, more relevant to measure and compare actual network loads for these three approaches rather than just their hit‐rates. This will be illustrated in the simulation results and analysis in Chapter 2: Optimizing Network Behavior – further below.

Other comparisons:

• Algorithm simplicity ‐ LRU is the simplest of the three replacement algorithms; day‐cache is quite straightforward; LFU is more complex.

• Management ‐ LRU can be completely managed locally; all required content state is internal to the cache, since every cache miss becomes a cache insertion; day‐cache is managed completely external to the edge server, making it the simplest for the edge server; LFU requires maintaining some global content state (counts) but can be managed locally, globally, or in combination.

• More comparisons are explored in Chapter 2.

Optimizing Caching at the Edge Site

Probably the first and most fundamental driver for establishing an edge site configuration is the cache model, or cache streaming profile – the dynamic interaction of streaming sessions with the content library. The cache streaming profile defines the cache efficiency in terms of the highest expected miss traffic relative to the peak streaming rate:

effective‐cache‐hit‐rate = 1 – (maximum‐miss‐rate / maximum‐streaming‐rate)
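As a small worked example, the formula above translates directly into code. The input figures below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the simulations.

```python
def effective_cache_hit_rate(max_miss_rate, max_streaming_rate):
    """1 - (maximum miss rate / maximum streaming rate), both in the same units."""
    if max_streaming_rate <= 0:
        raise ValueError("maximum streaming rate must be positive")
    return 1 - max_miss_rate / max_streaming_rate


# Hypothetical site: 40 Gbps peak streaming, 5 Gbps peak miss traffic.
example_rate = effective_cache_hit_rate(max_miss_rate=5.0, max_streaming_rate=40.0)
print(f"effective cache hit rate: {example_rate:.1%}")
```

Note that both maxima are peaks, which need not occur at the same moment, so this is a provisioning figure rather than an average hit rate.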

The following data is extracted from simulations that were driven by real‐world field data taken in July 2010 from a major cable operator’s VOD service in a metropolitan region (results were verified by comparing to a second, totally different and independent deployment with a similar‐sized cache). Peak streaming demand was over 41 Gbps sourced from a 10,000‐hour library, serving 1881 service groups. These service groups could naturally map to 5, 10, or 20 edge locations if desired, which is the basis of some of the analysis that follows. For these runs, the simulator was configured to use LRU pull‐through.

[Figure: network b/w, for 3 cache sizes. Y-axis: peak miss traffic (Mbps), 0 to 12,000. X-axis: cache size, in hours: 2000, 3000, 5000.]

Choosing an appropriate cache size – Upon inspection of the above graph, a cache of 2000 hours seems too small to be very effective (peak miss transfer rate exceeds a 10 Gb pipe). The jump to 3000 hours makes a sizable improvement in network load while a further jump to 5000 hours makes another huge dent in required network bandwidth. These bandwidths can all potentially be improved by applying the network optimizations examined in Chapter 2. Keeping that in mind, 3000 and 5000 hour cache sizes both merit continued consideration, although 5000 hours might seem a little large since it is half the size of the library. Appropriate tradeoffs should be made between storage and network costs.

Identifying potential edge configurations – The streaming load that drives the simulations comes from a fairly large metropolitan area with a central (regional) site and many outlying areas supporting a total of roughly 1800 service groups (the 1881 service groups noted above). It is conceivable to spread these service groups among 5, 10, or 20 edge sites for example, each edge site including streaming server(s) with cache storage. Each site would typically source a fraction of the total streaming load corresponding to its fraction of the total number of sites. Based on the simulation results above, one would tentatively configure each server with 3000‐5000 hours of cache.
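A rough per-site sizing follows from the paragraph above if the load is assumed to split evenly across sites. The sketch uses 41 Gbps as a round stand-in for the "over 41 Gbps" peak quoted earlier; the function name is illustrative.

```python
PEAK_STREAMING_GBPS = 41.0   # round figure for the "over 41 Gbps" peak quoted in the text


def per_site_load_gbps(total_gbps, num_sites):
    """Peak streaming load per edge site, assuming an even split across sites."""
    return total_gbps / num_sites


loads = {n: per_site_load_gbps(PEAK_STREAMING_GBPS, n) for n in (5, 10, 20)}
for n, gbps in loads.items():
    print(f"{n:>2} edge sites: about {gbps:.2f} Gbps of peak streaming per site")
```

Real deployments will not divide this evenly, so these figures are a starting point for provisioning rather than a prediction.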

Determining heterogeneity across edge caches – Given multiple independent edge sites serving exclusive service groups, it can be expected that although much of what is popularly viewed will be similar across all the sites, there will be some variety in viewing profiles such that the content residing in the various edge caches will include some unique titles not shared by all. It could be speculated that this heterogeneity of content could lead to a reduction in total miss traffic over the network since the sites are not sharing a single cache but are free to populate their individual caches according to more localized viewing profiles. One might expect the “effective” cache size of the aggregate caches to be somewhat larger than a single one of the caches. In a simulation of this scenario, the amount of unique content found across all the edge sites during peak streaming periods was measured for each physical cache size (2000, 3000, 5000 hours), as shown here:

[Figure: unique cached content, for 'n' edge caches. Y-axis: unique aggregate content, in hours (0 to 8,000). X-axis: 1 edge, 5 edges, 10 edges, 20 edges. Series: 2000 hrs, 3000 hrs, 5000 hrs.]

The heterogeneity factor for these cases is quite large. However, the total miss traffic across the network from these multiple edge sites does not show a sizable effect from the heterogeneity factor. The graph below shows that content transfers in response to cache misses are just as high for a configuration of multiple independent edge caches as they are for a single edge cache serving all service groups.

[Figure: network b/w, for 'n' edges, 3 cache sizes. Y-axis: peak miss traffic (0 to 14,000). Series: 2000 hrs, 3000 hrs, 5000 hrs.]


While the aggregate unique content across the edge sites was larger than the nominal cache size, the unique content at any individual site was not available for sharing by any other edge site, thus negating the heterogeneity effect to a great degree. However, heterogeneity does have a major impact on servers with shared or overlapping service group affinities.
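The distinction between aggregate unique content and what any one site can actually serve can be sketched as follows. The cache contents and title lengths are made up for illustration, and the function name is hypothetical.

```python
def aggregate_unique_hours(edge_caches, title_hours):
    """Total hours of distinct titles held across all edge caches (set union)."""
    unique_titles = set().union(*edge_caches)
    return sum(title_hours[title] for title in unique_titles)


# Hypothetical title lengths (hours) and per-site cache contents.
title_hours = {"news": 1, "movie1": 2, "movie2": 2, "kids": 1, "sports": 3}
edge_caches = [
    {"news", "movie1", "kids"},      # site 0
    {"news", "movie1", "sports"},    # site 1
    {"news", "movie2", "kids"},      # site 2
]

per_site_hours = [sum(title_hours[t] for t in cache) for cache in edge_caches]
aggregate_hours = aggregate_unique_hours(edge_caches, title_hours)
print("per-site cached hours:", per_site_hours)
print("aggregate unique hours:", aggregate_hours)

# Without shared service-group affinity, a request at site 0 for a title
# that only site 1 holds is still a miss at site 0.
site0_has_sports = "sports" in edge_caches[0]
print("site 0 can serve 'sports' from cache:", site0_has_sports)
```

The union is larger than any single cache, yet each site still misses on titles held only elsewhere, which is why the larger aggregate does not by itself reduce miss traffic.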

Affinities and sharing – There are situations in which it is highly advantageous for multiple streaming servers to share overlapping access to the same service groups, i.e. service group affinity. Three of these possible affinity scenarios are:

1) Multiple servers make up an edge site for reasons of scaling capacities and/or redundancy.
2) Multiple edge sites have some service group overlaps.
3) A large head‐end or regional site shares service group accessibility with nearby peer sites or satellite edge locations, and stream‐around capability is in place.

In any of these situations, the effectiveness of the servers’ aggregate cache storage can be greatly enhanced by ensuring that the content across those caches is as heterogeneous as possible. In this case, the caches’ common access to one another’s service groups makes their heterogeneity very powerful at creating an effective cache size that is much larger than any single cache, i.e. virtually global.

Simulation results reflected in the two graphs below show the effectiveness of using affinity opportunities to advantage. Note that the no‐shared‐affinity case (affinity=1) and the paired affinity case (affinity=2) were both simulated with individual caches of 3K hours each, while the affinity group of 5 was simulated with individual caches of only 1K hours. Even so, the affinity‐5 case still yielded significant additional miss‐rate reduction.

[Figure: Affinity sharing: effective cache size, 10 regions. Y-axis: aggregate unique content, hrs (0 to 8,000). X-axis: affinity=1, 3K cache; affinity=2, 3K cache; affinity=5, 1K cache.]

[Figure: Affinity sharing: peak miss traffic, 10 regions. Y-axis: peak network load, Mbps (0 to 7,000). X-axis: affinity=1, 3K cache; affinity=2, 3K cache; affinity=5, 1K cache.]

The effective cache size of an affinity group of edge caches increases with the group membership, reflecting a significant level of content heterogeneity across those caches. This effective cache size is an important factor in provisioning storage and network bandwidth in a tiered system.

Chapter 2: Network Efficiencies

Simplistically, there are two kinds of streams – those transmitting “hot” content and those transmitting “cold” content. Hot streams are sourced by caches, while cold streams have to be delivered across an inter‐tier network. This chapter is about optimizing the efficiency of cold stream delivery over networks.

When the cache misses – Stream‐around or Pull‐through?

Given a tiered architecture, it follows that some of the stream requests will be targeted at video content that is not present in the cache of the corresponding edge server. A mechanism must be provided to expeditiously deliver this piece of “cold” or “long‐tail” content to the subscriber, from a higher‐tier server that has a copy (e.g. from the central library, the regional cache, etc.). There are two ways to do this – either a higher‐tier server streams the program directly to the subscriber, “streaming‐around” the edge server over a managed network that is switched into the delivery network at the edge; or the edge server “pulls‐through” the program content from the central library server and streams it out itself, alongside the streams sourced from its cache. A term that is often heard is “Stream‐thru”, which is generally just a form of what we are here calling “pull‐through” but with some vendor‐specific differences.

[Figure: Pull‐Through block diagram. Labeled blocks: Central Server (Central Library, content delivery engine); Edge Server (staging engine & buffers, Edge Cache, streaming engine); switch.]

Pull‐through

The pull‐through approach sources all streams from the caching edge server, so programs requested for streaming, but not resident in the cache, must be “pulled” into the edge server from the central library, over an intervening network. See the block diagram above. A separate streaming network between the center and the edge is not required. Since the pulled data is not “streamed” by the central server nor paced in any formal way, the edge server must pull it “faster‐than‐real‐time” (FTRT) in order to ensure timely and well‐paced streaming of the program to the subscriber. If the program associated with the pulled‐through stream is deemed to now be cacheable, it can be inserted into the cache as it is being pulled‐through. Many streaming sessions deliver only a portion of the program content. If the cache (or the pull‐through logic) is title‐based, all remaining portions of the program must also be pulled‐in to complete the cache insertion. Pull‐through network optimizations related to FTRT pacing and block‐based approaches will be examined in depth further below.

Stream‐around

In the stream‐around approach, the caching edge server streams only those programs already stored in its cache. See the block diagram above. Programs not stored in the edge server’s cache are streamed directly from the central library server and switched into the streaming transport infrastructure at the edge, in effect streaming “around” the edge server. The central server must, therefore, have streaming capability with capacity to cover the worst‐case cache‐miss streaming load. In addition, the central server must be ready to deliver updates to the edge server’s cache when requested. The cache will request updates according to its replacement algorithm and policies. Once the cache logic determines that a given title from the library should be stored at the edge, that title is transmitted as quickly as possible from the central library server to the edge server’s cache, over an intervening content distribution network (call this “churn” traffic to distinguish it from streaming traffic). Since the central server is fully capable of streaming a program before it is fully available in the edge cache, there is no strict minimum transmission rate required of any cache update (churn) transmission, and thus no “faster‐than‐real‐time” (FTRT) requirement as there is for pull‐through. The content loading logic at the edge server and the content delivery engine at the central server are accordingly simpler than for the pull‐through model. In summary, the stream‐around model entails network support, between the central and edge servers, for two types of traffic: 1) tightly‐paced direct streaming and 2) unrestricted, loosely‐paced cache churn traffic. Depending on the network resources, usage profiles, and management policies for a given deployment, these two traffic types can be supported with separate and independent networks or treated as logically separate traffic flows across a combined physical network (the two clouds in the diagram become one).

Comparing Network Loads

Since one of the primary objectives of a tiered VOD architecture is to reduce the network throughput loads, it will be helpful to compare the network requirements of stream‐around versus pull‐through. In the interest of achieving an apples‐to‐apples comparison, a single cache replacement mechanism will be applied, initially, to both methods, across a range of cache sizes.

Since LRU turns every “miss” into a cache insertion, it is a natural candidate for pull‐through but less‐so for stream‐around, since every miss would invoke a transfer of the program to the cache anyway in addition to the stream‐around.

Conversely, the periodic‐update approach (“day‐cache”) is an excellent candidate for stream‐around but not so for pull‐through, since its content update traffic is remarkably un‐dynamic and small relative to the miss traffic.

Fortunately, the LFU algorithm offers a good balance of being 1) dynamic in updating the cache while 2) treating miss events separately from cache insertion events. Therefore, LFU replacement is chosen as likely to best illuminate the stream‐around versus pull‐through differences.

(The cache replacement mechanisms – LRU, periodic‐update, and LFU – were described in Chapter 1: Optimizing Cache Behavior).

Simulations

Data – analysis is based on simulations driven by two independent sets of session logs and asset directories from the field:


1) “Dataset A” ‐ 14 days of log data (August 2008) from a 5400‐hour, 5700‐stream deployment owned by a domestic MSO, tagged “Site A”;

2) “Dataset B” ‐ 7 days of log data (April 2010) from a 7100‐hour, 4900‐stream deployment owned by a different domestic MSO, tagged “Site B”.

These two samples were taken two years apart, have very different content libraries, and serve widely disparate geographies and dissimilar demographics, yet the simulation results were essentially the same.

Simulation and Analysis – The log data was used to drive simulators modeling a tiered architecture, i.e. a central server connected to a caching edge server. The analysis compares pull‐through and stream‐around architectures, both (initially) using an LFU cache replacement algorithm. A faster‐than‐real‐time multiplier of 1.2 and a trick‐play overhead of 20% are applied to pull‐through traffic and cache‐update traffic.
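The effect of the two overheads named above on a single transfer can be sketched as below. The text does not say how the 1.2 multiplier and the 20% overhead combine, so the multiplicative combination here is an assumption, as is the 3.75 Mbps example stream rate.

```python
FTRT_MULTIPLIER = 1.2        # faster-than-real-time multiplier, from the text
TRICK_PLAY_OVERHEAD = 0.20   # trick-play overhead, from the text


def transfer_rate_mbps(stream_rate_mbps):
    """Inter-tier rate for one pull-through or cache-update transfer.

    Assumption: the two overheads are applied multiplicatively.
    """
    return stream_rate_mbps * FTRT_MULTIPLIER * (1 + TRICK_PLAY_OVERHEAD)


example_stream_mbps = 3.75   # hypothetical stream rate, not from the paper
example_transfer_mbps = transfer_rate_mbps(example_stream_mbps)
print(f"{example_stream_mbps} Mbps stream -> "
      f"{example_transfer_mbps:.2f} Mbps on the inter-tier network")
```

Under this assumption each pulled or updated title costs about 44% more inter-tier bandwidth than the same title streamed in real time.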

The chart below shows peak network loads (in Gbps) for the network connecting the higher‐tier server to the edge server; these loads include both streaming and cache update (“churn”) components for the two methods – pull‐through and stream‐around ‐ across three cache sizes (1000 hours, 2000 hours, and 3000 hours). The 3000‐hour cache is probably beyond a practical size given that it begins to approach the size of the library itself (5400+ hours), but it serves to illuminate the point of diminishing returns. When provisioning cache, storage and network costs should both be carefully considered and traded‐off appropriately. When studying the chart below to make those tradeoffs, keep in mind that with no edge caching at all, the streaming load across this network would be over 21 Gbps (5700 streams).

[Figure: Edge Cache: inter‐tier network load. Y-axis: peak b/w (Gbps), 0.0 to 6.0. X-axis: cache size (hrs of content): 1000, 2000, 3000. Series: pull‐thru lfu, strm‐arnd lfu.]

LFU – It is apparent from this data that, at least for an LFU replacement approach, stream‐around and pull‐through put similar loads on the inter‐tier network. Stream‐around is slightly better for smaller caches, while pull‐through is marginally better for mid‐ to large‐size caches. So, in terms of total network bandwidth, it doesn’t make a huge difference whether a VOD system is deployed using stream‐around or pull‐through. This is not surprising, since the programs for all the stream requests that “missed” the cache have to be transmitted to the edge regardless of whether it’s done stream‐around or pull‐through; and a program that is newly inserted into the cache has to be transported from the central library to the edge cache, regardless of stream‐around or pull‐through. The major differences, which apparently tend to cancel one another, are:

• When a “miss” causes a title to be both streamed and inserted into the cache, both operations are accomplished with one full‐title pull‐through, but stream‐around involves two separate network operations in this case – the stream‐around and the cache load (although the stream‐around could comprise less than a full title if the user stops the session early, which happens frequently).

• Because of the FTRT and trick‐play overhead and associated buffering, pull‐through always uses a little more bandwidth for a given stream.
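The two opposing effects in the list above can be made concrete with a per-miss volume comparison. This is a simplified sketch under stated assumptions: volumes are in content-hours, only the 20% trick-play overhead is counted because FTRT pacing changes transfer rate rather than volume, and the overhead is applied to pull-through and cache-load transfers but not to direct streams. Function names and the example title are hypothetical.

```python
TRICK_PLAY_OVERHEAD = 0.20   # from the simulation settings described earlier


def pull_through_hours(title_hours, viewed_hours, inserted):
    """Content-hours moved for one miss: the full title if cached, else the viewed part."""
    moved = title_hours if inserted else viewed_hours
    return moved * (1 + TRICK_PLAY_OVERHEAD)


def stream_around_hours(title_hours, viewed_hours, inserted):
    """Content-hours moved for one miss: the direct stream plus a separate cache load."""
    stream = viewed_hours
    churn = title_hours * (1 + TRICK_PLAY_OVERHEAD) if inserted else 0.0
    return stream + churn


# Hypothetical miss: a 2-hour title of which the viewer watches 30 minutes.
for inserted in (True, False):
    pt = pull_through_hours(2.0, 0.5, inserted)
    sa = stream_around_hours(2.0, 0.5, inserted)
    print(f"inserted={inserted}: pull-through {pt:.2f} h, stream-around {sa:.2f} h")
```

When the miss also triggers a cache insertion, stream-around moves more data; when it does not, pull-through moves slightly more, so across a mix of misses the two tend to offset each other.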

Recall that when LFU was selected for the network loading analysis, it was presumed that 1) LRU is particularly well‐suited for pull‐through and 2) periodic‐update (Day‐cache) is particularly well‐suited for stream‐around. It should therefore be instructive to see how network loads would be affected by these algorithms. The following chart compares all four combinations (including the two already shown) – pull‐through LRU, pull‐through LFU, stream‐around LFU, and stream‐around Day‐cache – as simulated for both sites.

[Figure: edge‐cache: inter‐tier network load. Y-axis: peak b/w (Gbps), 0.0 to 9.0. Bars: pull‐thru lru, pull‐thru lfu, strm‐arnd lfu, s‐a day‐cache, grouped by 1K‐hour cache, 2K‐hour cache, and 3K‐hour cache.]

LRU – Note that pull‐through LRU is very inefficient for the small (1000‐hr) cache, yet compares favorably with LFU for mid‐ to large‐size caches. Although LRU exhibits the same cache hit‐rate as LFU at all three cache sizes (see cache replacement hit‐rate data in Chapter 1: Optimizing Cache Behavior), its network load for the 1k‐hour cache is much higher than for LFU. Pull‐thru LRU pulls through a full title to update the cache for every miss, while pull‐thru LFU pulls through the full title only on cache updates. For a small cache, LFU cache updates are significantly less‐frequent than misses (high play‐counts are required to reach the LFU threshold of the cache) but for larger caches, cache updates and misses are close to the same (a single play of a title often results in a cache insertion). Hence the big LRU/LFU network load disparity for the small cache, yet near‐equal results for the two larger caches.

Day‐cache – Also note that day‐cache is the most network‐efficient approach at the small cache size, even though its cache hit‐rate is significantly inferior to the other replacement algorithms. Every miss results in a stream‐around, so streaming network traffic is high, but a cache update occurs only once a day so churn traffic is very low. The more dynamic algorithms (LRU & LFU) tend to move titles in and out of cache many times a day, so their churn traffic is always much higher than that of the day‐cache algorithm. However, as the cache size increases, the hit‐rates for LRU and LFU continue to improve significantly while that of the day‐cache improves much more gradually. Consequently, the day‐cache network load decreases marginally while the LRU and LFU network loads drop substantially. The stagnation of the day‐cache hit‐rate across increasing cache size can be attributed to its lack of dynamic response. The upper 1k hours of popular content may change little from day‐to‐day, but the next lower levels of popularity – to 2k or 3k hours – will change much more dynamically, and so the more dynamic replacement algorithms will overtake the day‐cache in cache effectiveness as the cache size grows. Implementing an update period shorter than a day (e.g. “hour‐cache”) could improve the results, but a managed update at short intervals could prove impractical (see “future investigation” at the end of this report). Keep in mind that the churn traffic (cache updates) for the day‐cache is much lower than for the other two approaches. Because of this lower and simpler ingest load, a day‐cache edge server is inherently less expensive and of higher streaming capacity.
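The once-a-day update behind the low churn figure can be sketched as a comparison between the day's most-played titles and the current cache contents. This is an illustrative sketch, not the deployed logic: it assumes equal-length titles, ranks by raw play count over one day, and uses hypothetical names and data.

```python
from collections import Counter


def day_cache_update(cached_titles, day_play_log, capacity_titles):
    """Return (titles to load, titles to delete) for one end-of-day update.

    Titles to load are listed in popularity order, most popular first.
    """
    cached = set(cached_titles)
    counts = Counter(day_play_log)
    target = [title for title, _ in counts.most_common(capacity_titles)]
    to_load = [title for title in target if title not in cached]
    to_delete = sorted(cached - set(target))
    return to_load, to_delete


# Hypothetical day: "d" has become popular and "c" has gone cold.
cached_now = {"a", "b", "c"}
play_log = ["a"] * 5 + ["d"] * 4 + ["b"] * 3 + ["e"] * 2 + ["c"] * 1
to_load, to_delete = day_cache_update(cached_now, play_log, capacity_titles=3)
print("load:", to_load, "delete:", to_delete)
```

However many misses occur during the day, the churn is limited to the titles in `to_load`, which is why the update traffic stays small while the stream-around traffic carries every miss.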

Note that pull‐thru LRU constitutes an undesirable outlier for small caches and stream‐around Day‐cache constitutes an undesirable outlier for larger caches. LFU gives consistently good results across the spectrum.

Studying network characteristics to appropriately apply stream‐around or pull‐through There are three key characteristics of the tiered VOD architecture that will be examined here: 1) network bandwidth, 2) network quality, and 3) overall simplicity.

The empirical results achieved above are sufficient to characterize network bandwidth loads, but they are not alone sufficient to make judgments about the network’s QoS requirements or about architectural simplicity. Therefore, some factual and logical observations about the differences among the choices and how they affect the architecture are made here.

Network bandwidth – it has been demonstrated above that the total throughput loads on the inter‐tier network of a 2‐tier system are effectively the same for pull‐through and stream‐around. Network bandwidth has less to do with stream‐around vs. pull‐through than it does with the underlying cache replacement algorithm.

LRU is significantly less network‐efficient than other methods for smaller caches yet increasingly efficient for medium to large caches, while a day‐cache is relatively more efficient for smaller caches yet relatively inefficient for larger caches. Before choosing an algorithm based on cache size, however, one should consider that the boundary between small and medium caches can be elusive: it depends on library size and usage profile, both of which change over time and across deployments. A cache considered to be medium‐sized today may behave more like a small cache a year from now. Also keep in mind that the LRU algorithm, because of its aggressive cache insertion policy, reacts to abnormal or short‐lived events by excessively thrashing (“churning”) the cache. The only cache replacement algorithm (of the three studied) that exhibits both stability and consistently good network efficiency as the cache size grows is LFU.

Network quality – stream‐around and pull‐through place different requirements on the traffic‐shaping and QoS policies of the inter‐tier network.

Stream‐around – The cache update traffic to support the stream‐around model consists of full‐title transfers that need not be closely paced or necessarily tightly dead‐lined. However, to execute the direct streaming function of stream‐around, the inter‐tier network must support a strictly‐paced streaming traffic without risk of significant congestion or interruption, just like the streaming network that it will “switch‐into” at the edge. The network’s streaming traffic must be well‐managed but, because of its closely‐paced fixed‐size packets, will consequently be well‐behaved. In fact, if the network capacity headroom is sufficiently generous, one could argue that the well‐mannered behavior of the streaming traffic is conducive to merging other fairly‐well‐behaved critical traffic on the same network, for example cache updates. If additional traffic is allowed that is not well‐controlled or appropriately limited, however, the network cannot be used reliably for direct streaming traffic.

Pull‐through – One of the arguments in favor of a pull‐through architecture is the expectation that the intervening network can be less tightly managed because it doesn’t have to support direct streaming to the edge. The intervening network is simply part of the content delivery system/network, and a separate streaming network between the center and the edge is not required. However, the demands on the inter‐tier network are not as relaxed as one might hope. Since the pulled data is not “streamed” by the central server nor paced in any formal way, the edge server must pull it faster than real time (“FTRT”) in order to ensure timely and well‐paced streaming of the program to the subscriber. While the goal may be to get as close as possible to “just‐in‐time” delivery of this content from the center, the non‐uniform, non‐paced behavior of the central server and the overall network traffic will cause uneven transmit rates, which the edge server must absorb with initial latency and buffers. In addition, tag, index, and trick‐play files must be pulled in (or regenerated in the server) along with the video content, which also contributes to the FTRT multiplier. The initial latency, the buffer depth, and the FTRT multiplier are all sensitive to the worst‐case behavior expected of the central server and the intervening network with its QoS policies. Any trick‐play operations, chapter jumps, or other non‐linear behavior of the stream will add to the uncertainty and complexity of the pull‐through logic and the real‐time demands on the network. There is a real risk of uncertainty in the resulting streaming quality. Extra bandwidth will generally need to be provisioned for this kind of traffic – it’s reasonable to rate usable bandwidth at 50‐70% of line rate, depending on various factors.

In the end, to maintain an acceptable level of video quality for the subscriber, the inter‐tier network could very well require not only additional bandwidth, but a level of quality that is close to the quality and isolation of a managed streaming network anyway. This QoS level could be reduced somewhat if a lower standard of quality can be accepted for “cold” (i.e. not in the edge cache) content. This would seem a dangerous policy, however, even if the cache miss rate is as small as 5%, for example. In such a situation, one out of 20 streams could be at risk of visible degradation. Promising methods for optimizing the network behavior of pull‐through are examined later in this chapter.

Overall simplicity – architectural choices about pull‐through vs. stream‐around have significant impacts on the overall simplicity/complexity of the solution, as well as the cost of server resources.

• Pull‐through does not require a streaming engine at the central library server, while stream‐around does.

• Pull‐through does not require strict pacing of direct‐streaming traffic across the inter‐tier network, while stream‐around does.

• Pull‐through may imply a higher level of complexity at the content delivery engine at the central server, to manage the FTRT demands and bit‐rate characteristics of many parallel pull‐through streams.

• Pull‐through implies that every cold‐title stream is moved across two network segments – once as it’s pulled across the inter‐tier network and then again as it’s paced onto the streaming network. Both of these functions comprise extra provisioning and capabilities at the network and at the edge server (more below).

• Stream‐around implies only one content source for the streaming pump at the edge server, while pull‐thru implies two different content sources for the streaming pump – the cache and the inter‐tier network. Transfers from these sources must be meshed – pull‐thru requires the edge server to manage the meshing of pull‐thru traffic with the pace of streaming, including:

  o Latency and buffer management,

  o FTRT network pacing, and

  o Dealing with the impact of stream discontinuities and tag files, etc. on the pull‐thru behavior (more on trick‐play later in this chapter).

• A significant and perhaps overriding advantage for stream‐around is that it allows operators to leverage existing h/w, s/w and architectures. They could bring deep content and hierarchy to market without any of the complexity of new s/w, h/w, integrations, and architectures.

CDN futures – As long as network facilities are sufficient to support stream‐around, it is the solution that provides the best overall viewing experience and offers operators maximum quality control, while allowing them to use hardware and software already in place. However, because tiered solutions may come to rely on long‐distance, highly‐shared, lightly‐managed network trunks, a reliable pull‐through capability will be essential, along with a research effort to characterize and improve its reliability.

Network intelligence; regulated traffic In any tiered architecture, content must frequently be delivered to an edge server, across a network, from a higher tier of storage. If the content is being pulled in order to stream it right away (pull‐through cache miss), the network transmission must maintain a faster‐than‐real‐time (FTRT) rate in order to keep up with the streaming rate. Any characteristic bursts and stalls on the network have to be accommodated with initial load waiting (extra initial latency) and buffering. The content delivery behavior depends on how deterministic the network is and how much of it is shared with unknown or poorly regulated agents.

There are various ways of regulating the data transmission across the network. The most straight‐forward approach is for the edge server to simply request the full title then receive the data as the entire contents of the program are transmitted as fast as the network can naturally make it happen. This approach runs the risk of creating a bursty, contentious network environment. Hopefully, the resulting transmissions are always fast enough to maintain the FTRT requirement. If the intent is to place the title into the cache as well as stream it, the transmitted data can simply be written to its allocated space in the cache as it is received, and the video pump can stream it from there at the desired streaming rate, decoupled from the content receive rate. If the title is not destined for cache insertion but only for streaming (i.e. the cache algorithm is not LRU), there must be enough buffering to fully decouple the rates. The buffer size would need to absorb the maximum difference in rates between the streaming rate and the network transmission rate for the longest possible transmission.

If the content being transmitted across the network is intended only for cache insertion (i.e. not to be streamed immediately, which happens only in a stream‐around scenario), there is no strict FTRT requirement and no need for rate‐decoupling buffers.

Another way of regulating the network transmission is to request the entire title, but at some capped FTRT rate, for example at 2.0x the streaming rate. This approach assumes some rate management intelligence at either or both ends of the network segment and a protocol that supports it (e.g. TCP or other rate‐regulated protocol). This approach will smooth out the traffic across the network, reducing the bursts and stalls as well as increasing its utilization and efficiency (beware of TCP retransmissions & windowing). It also relaxes somewhat the buffering requirements for an active stream pull‐through.
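The effect of an FTRT cap on buffering can be made concrete with a small calculation. The sketch below is an idealized model under stated assumptions, not a sizing rule from this paper: the title is buffered rather than cached, both rates are constant, and streaming starts with the first received byte (no initial latency or stalls). The function name is hypothetical. Under those assumptions the transfer of a title finishes after 1/m of its play time at a multiplier of m, by which point 1/m of the title has been streamed, so the buffer peaks at (1 - 1/m) of the title size.

```python
def peak_buffer_bytes(title_bytes, ftrt_multiplier):
    """Peak buffer occupancy for a title pulled at `ftrt_multiplier` times
    the streaming rate while it is being streamed (idealized model)."""
    if ftrt_multiplier < 1.0:
        raise ValueError("transfer must be at least real-time")
    # Fraction of the title still unplayed when the transfer completes.
    return title_bytes * (1.0 - 1.0 / ftrt_multiplier)
```

For example, at a 2.0x cap the buffer peaks at half the title, whereas an uncapped transfer that happens to run at 10x would need to hold about 90% of it.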

Keep in mind that almost all transfer networks (i.e. not CBR streaming networks) require capacity well above the average transfer rate. Typically, the maximum payload utilization is 50‐70% of line rate.
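As a rough provisioning illustration, the 50‐70% utilization figure and a per‐transfer FTRT cap together bound how many concurrent pull‐through transfers a link can carry. The sketch below is only arithmetic under assumed inputs: the function name, the 10 Gbps link, and the 3.75 Mbps stream rate are hypothetical values chosen for the example, not figures from the deployments studied here.

```python
from fractions import Fraction


def max_pull_through_streams(line_rate_bps, usable_fraction,
                             stream_rate_bps, ftrt_multiplier):
    """Number of concurrent FTRT-capped transfers that fit in the usable
    share of a link. Fractions avoid floating-point rounding surprises."""
    usable_bps = Fraction(str(usable_fraction)) * line_rate_bps
    per_transfer_bps = Fraction(str(ftrt_multiplier)) * stream_rate_bps
    return int(usable_bps // per_transfer_bps)


# Hypothetical example: 10 Gbps link, 3.75 Mbps streams, 2.0x FTRT cap.
low = max_pull_through_streams(10_000_000_000, 0.5, 3_750_000, 2.0)
high = max_pull_through_streams(10_000_000_000, 0.7, 3_750_000, 2.0)
```

With these assumed inputs the link supports between 666 and 933 concurrent cold‐title pulls, depending on where in the 50‐70% range the usable bandwidth falls.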

Block‐ and title‐based pull‐through; further regulation The above‐mentioned approaches to regulating network transmissions were all title‐based, in that a full content title was always requested for delivery across the network. Other approaches are block‐based, in which fractional portions of the title (blocks) are individually requested and independently transmitted. Assume these blocks are all fixed‐size.

One block‐based approach is similar to the title‐based approach that caps transmission at a fixed FTRT rate. However, in the block‐based version, the network can be allowed to transmit each block at the maximum available network rate. The edge server itself maintains the FTRT rate by requesting blocks at FTRT intervals, so that no additional rate management intelligence is required at the other end. The FTRT rate and program size determine the maximum buffer depth.
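The request pacing described above amounts to spacing block requests so that the average pull rate equals the FTRT multiple of the streaming rate. A minimal sketch follows, assuming fixed‐size blocks and a constant streaming bit rate; the function and parameter names are hypothetical.

```python
def block_request_times(num_blocks, block_bytes, stream_rate_bps,
                        ftrt_multiplier):
    """Times (in seconds from session start) at which the edge server
    would request each block to hold an average FTRT pull rate."""
    # Seconds of wall-clock time allotted to each block at the capped rate.
    interval = (block_bytes * 8) / (stream_rate_bps * ftrt_multiplier)
    return [i * interval for i in range(num_blocks)]
```

For instance, with 1,000,000‐byte blocks, a 4 Mbps stream, and a 2.0x multiplier, a block is requested once per second; each block may still arrive as fast as the network allows.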

Another block‐based approach is to request a burst of blocks at maximum available network rate, to fill a receive queue, and then to request additional blocks at network rates only as needed to maintain the queue. This approach minimizes the requisite buffer depth, but transmission rate inconsistencies still have to be accommodated.
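The queue‐maintenance behavior can be illustrated with a deliberately idealized tick‐based model. It assumes that the streaming pump consumes one block per tick and that every request is fulfilled within the tick in which it is issued, which is exactly the kind of consistency a real network will not guarantee. The function name is hypothetical.

```python
def simulate_queue_fill(total_blocks, queue_depth):
    """Return the number of blocks requested in each tick when the edge
    server tops up a receive queue of `queue_depth` blocks."""
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    queued = 0          # blocks currently waiting in the receive queue
    next_block = 0      # index of the next block not yet requested
    requests = []
    while next_block < total_blocks or queued > 0:
        # Request only what is needed to refill the queue.
        want = min(queue_depth - queued, total_blocks - next_block)
        requests.append(want)
        queued += want
        next_block += want
        queued -= 1     # the pump streams one block per tick
    return requests
```

A six‐block title with a three‐block queue produces an initial burst of three requests followed by single‐block top‐ups, and the buffer never holds more than the queue depth.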

The cache itself can also be block‐based rather than title‐based. In a block‐based cache, individual blocks of a title are cached. The more popular portions of a title may thus be found in the cache while other portions are left out. Block‐based caches tend to be more efficient because of their finer‐grained use of the storage space, but they are also more complex and require more processing overhead to manage the large number of blocks and partial titles.
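A block‐based cache can be pictured as a store keyed by title and block index rather than by title alone. The sketch below is one possible shape for such a structure, not the implementation evaluated in this paper. The class and method names are hypothetical, and LRU eviction at block granularity is an assumption chosen for brevity.

```python
from collections import OrderedDict


class BlockCache:
    """Fixed-capacity cache of (title, block_index) -> data with LRU
    eviction at block granularity (an assumed policy)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self._blocks = OrderedDict()

    def get(self, title, index):
        key = (title, index)
        if key not in self._blocks:
            return None                      # miss: caller pulls the block
        self._blocks.move_to_end(key)        # mark as most recently used
        return self._blocks[key]

    def put(self, title, index, data):
        key = (title, index)
        self._blocks[key] = data
        self._blocks.move_to_end(key)
        while len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict least recently used

    def cached_blocks(self, title):
        """Indices of the blocks of `title` currently held (a partial title)."""
        return sorted(i for (t, i) in self._blocks if t == title)
```

Because eviction works on individual blocks, the popular opening blocks of a title can stay resident while its rarely viewed later blocks are dropped. The per‐block bookkeeping is also where the extra management overhead noted above comes from.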

If the edge server is pull‐through with a block‐based cache, network transmissions invoked by a cache miss need only transmit those blocks that are actually streamed. It turns out that a large number of streaming sessions deliver only a fraction of the full title, so this is a significant network bandwidth optimization.
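The size of this saving for a single session is simple to estimate. The following sketch assumes fixed‐size blocks and a constant streaming bit rate; the function name and the example figures are hypothetical.

```python
def blocks_to_transfer(viewed_seconds, stream_rate_bps, block_bytes):
    """Blocks that must cross the network for a session that plays only
    the first `viewed_seconds` of an uncached title."""
    viewed_bytes = viewed_seconds * stream_rate_bps // 8
    return -(-viewed_bytes // block_bytes)   # ceiling division


# Hypothetical example: 4 Mbps title, 2,000,000-byte blocks.
partial = blocks_to_transfer(600, 4_000_000, 2_000_000)    # 10-minute view
full = blocks_to_transfer(7200, 4_000_000, 2_000_000)      # whole 2-hour title
```

In this example a ten‐minute viewing of a two‐hour title moves 150 blocks instead of the 1,800 a title‐based transfer would move.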

Stream‐around could also take advantage of block‐based cache to reduce transfers for cache updates. Otherwise, stream‐around invokes very few of the same problems and is less needful of any of the above approaches for either cache content load or for the stream‐around itself. Cache updates are not real‐time‐constrained, stream‐around operations only stream those portions of the content viewed by the user, and trick‐play is already taken care of by the streaming pump. Direct streaming on a network is generally UDP constant‐bit‐rate, so the traffic is inherently smooth and high utilizations are easily attained (>90%). Even when streaming is http with adjustable bit rates, the bandwidths should stack up efficiently to a relatively smooth, optimized utilization. For all streaming scenarios, the streaming network must be high‐quality and well‐controlled, however.

Simulations As explained in the introduction, all the simulations supporting this paper are driven by session‐by‐session play logs and asset directories taken from active VOD deployments in the field. The following simulations were driven with data from a deployment with a 10,000 hour library, 40 Gbps peak streaming, and 1800 service groups.

Several simulations were performed to illustrate some of the network optimizations available from the various approaches described above. The three graphs below show, at the same scale for the same 3‐day period of streaming sessions, a network‐rate (max 9 Gbps) title‐based pull‐through model, an FTRT‐capped title‐based pull‐through model, and an FTRT‐capped block‐based pull‐through model (the cache is also block‐based in this model). The per‐transfer FTRT rate was set at 2.0x the streaming bit‐rate. Cache size is 3000 hours and cache replacement is LRU for all three scenarios. As shown, the peak network utilizations for the three models are 9 Gbps, 5.2 Gbps, and 0.5 Gbps respectively, and improvements in network behavior are readily apparent.


Title‐based pull‐through‐fast:

Title‐based pull‐through with transfer cap at FTRT=2.0x:

Block‐based pull‐through with transfer cap at FTRT=2.0x; block‐based cache:


Trick‐play When designing a pull‐through system, a significant challenge is responding to trick‐play operations and other discontinuities while pulling through content for a live stream, in such a way as to preserve optimum experience for the user. Solving this is rather tricky and fraught with risk. Several approaches are possible:

1) One can simply pull ahead as fast as possible, buffering all content, and respond to trick‐play events to the extent content is available in the buffers or cache. This is probably risky, depending as it does on a bursty and potentially contentious network, made so by this very model.

2) One can respond to trick‐play events dynamically by invoking trick‐play pull‐through intelligence at the edge server and/or the storage server to deliver only the video assets necessary to keep up with the trick‐play at the appropriate speed (e.g., use HTTP range requests and byte offsets to collect scattered blocks or I‐frames).

3) One can implement trick‐play by pulling video data in a scattered fashion (as needed) from manifest files of clips and video segments making up the title. Timely delivery of these could be tricky.
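For the second approach, fetching a scattered block with an HTTP range request comes down to computing an inclusive byte range for the `Range` request header. The helper below is a sketch under the assumption of fixed‐size blocks within a single asset file; the function name is hypothetical, and how block or I‐frame offsets are discovered (e.g. from an index file) is outside its scope.

```python
def block_range_header(block_index, block_bytes, file_bytes):
    """Build the HTTP Range header for one fixed-size block of a file.
    HTTP byte ranges are inclusive at both ends."""
    first = block_index * block_bytes
    if block_index < 0 or first >= file_bytes:
        raise ValueError("block lies outside the file")
    last = min(first + block_bytes, file_bytes) - 1  # final block may be short
    return {"Range": f"bytes={first}-{last}"}
```

The resulting dictionary can be passed as request headers to whatever HTTP client the edge server uses; a server that honors the range replies with status 206 (Partial Content).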

For most trick‐play approaches, the operation is handled dynamically using the normal video asset, perhaps with an assist from auxiliary index or view files. The auxiliary files for an asset being transferred due to a cache miss should be pulled in their entirety into the edge server as quickly as possible after the transfer starts. These tend to be very small files so they will transfer quickly.

Chapter 3: Applying Storage and Network Tradeoffs to Multi‐tier Design

Determining the appropriate characteristics and structure for the hierarchical elements of a VOD solution requires the anticipation and trading‐off of:

• VOD requirements – concurrent streaming, content library, content usage profiles, service topologies, demographics

• Storage base costs – hard drives, solid‐state media, enclosures, infrastructure, management software

• Storage complexity – content hierarchy, management, distribution

• Network base costs – infrastructure, bandwidth, software

• Network complexity – topology, QoS, management, troubleshooting

These parameters help drive the tiered structure of the solution – i.e., how many tiers; the location and size of library elements, caches, and streaming servers; network topology & capacities; etc. A lot of what drives all of this is the nature of an individual edge site and its relationship to the next‐higher‐tier, whether that is a central library server, a regional library or cache server, or another peer edge site.

Network and Storage tradeoffs – Tradeoffs can be made between the size of storage at each tier and the transport infrastructure between the tiers. Generally speaking, the more storage capacity applied at the edge, the less transport bandwidth is needed to support streaming and content churn. The particular topology of a given deployment, combined with its content library size and expected demand profile, will determine the appropriate ranges of provisioning to consider when trading off edge storage resources against network bandwidth capacities. The appropriate mix of edge storage and network bandwidth will be chosen based on cost efficiency, reliability, and/or practical resource limitations (e.g. existing or budgeted transport infrastructure).

The graph below illustrates the tradeoff of network bandwidth requirements and storage requirements. One extreme is fully centralized storage (no cache or other storage at the edge); the other extreme is fully distributed (edge cache=full library of 10,000 hours); a hierarchical, tiered approach (central library + cached edge server) covers the range between the two extremes.

Extreme measures: fully‐centralized – Fully‐centralized systems have a single (modulo redundancy) centralized (e.g. national or regional) library store, and no caches at the edge. The central server has to source and transmit all the streams for the subscribers at the edge, and the intervening network has to be provisioned and managed to support the full streaming bandwidth and pacing protocol (or stream‐through protocol if edges are buffered non‐cache streamers). Note in the Network vs Storage chart the full streaming load on the network when there are no edge servers (cache size = 0). This is a reasonable approach for some operators with well‐provisioned networks and a desire to keep operational complexity and storage costs to a minimum. Storage and streaming are readily scaled at the center, but upgrades in streaming capacity impinge directly on the network bandwidth and sooner or later force network upgrades.

[Figure: Network vs Storage ‐ 10,000‐hr library. Center‐to‐edge b/w (Gbps, 0 to 50) vs. cache size (hours, 0 to 10,000); series: b/w savings, total b/w.]

Extreme measures: fully‐distributed – Fully‐distributed systems maintain complete copies of the library at every edge site. In effect, each edge site has a “cache” of the full library (minus any content irrelevant to that local site). These edge sites would be networked to a central (CDN?) library. Note from the chart above that if the full library is replicated at the edge site (“cache” size = 10,000 hours), the only transport bandwidth required from the center is that necessary to “churn” (update) the content library each day (new titles loaded into the library, some older titles retired). Increases in streaming demand are easily met with purely local upgrades of streaming capacity at the edge site. Each site must, of course, be provisioned with the storage capacity for a full library, and any growth in the library size must be reflected at every site with a storage upgrade. Also, any increase in the rate of library churn would have to be reflected in additional network bandwidth. Finally, the prospect of populating and managing multiple, widely dispersed instances of a large library gives pause, although how much easier it may be to manage edge caches has not been well established either.

Obviously, a key concern about library growth, whether for a fully‐distributed model or just libraries in general, is the storage cost of the (replicated?) library.

Sidebar: A look at Storage Costs – One way to get a perspective on the cost of storage, no matter where the library might be placed, is to project the capacity of a fixed number of drives over future years, as hard drive capacities increase. As an example, the graph below shows the content capacity of 72 high‐capacity Enterprise hard drives, starting with ~30,000 hours in early 2009 (in a RAID5 layout), using 1TB drives, then doubling every 18 months as higher‐capacity drives become (economically) available. Seventy‐two is an easily manageable quantity of drives that easily fits within 12 rack units or less, and is relatively inexpensive. It’s also reasonable to expect that the cost of this 72‐drive array and its infrastructure (including for fault resilience) will stay relatively constant (modulo inflation) across generations of disk drive introductions. The point is that content libraries of increasingly huge capacities can be accommodated with a small number of drives at nominal cost. One may not need to take extreme measures to architect around the prospect of a large future library, once one considers the advances in capacity that come as a natural consequence of Moore’s law and is willing to patiently match that curve. In terms of storage cost, the ideas of 1) growing a central library to immense proportions, 2) replicating a large library at several regional centers, or 3) replicating a large portion of the library at many edge sites, remain viable and practical options to trade off against other design factors, as long as we don’t get ahead of ourselves.

One must also fully consider the costs of alternative approaches. For example, edge caches built from large quantities of solid‐state storage can be very expensive. The point of those caches is to get much more throughput in a denser package by 1) investing in a smaller, faster (usually solid‐state) storage medium, and 2) taking on the complexities and risks inherent in making it an integral and efficient element of the storage hierarchy. As the size or even existence of this cache is considered, the costs and benefits of this investment and risk must be traded off against the costs and benefits of the network bandwidth the cache hopes to displace, and of other storage alternatives.

[Figure: Content capacity ‐ 72 HDDs (RAID5). Content hours by date: 2009‐03 (1TB drives): 31,304; 2010‐09 (2TB): 62,609; 2012‐03 (4TB): 125,217; 2013‐09 (8TB): 250,435.]

Of course, a full swap‐out of a library every 18 months in order to double capacity could be considered impractical and costly. A more practical approach might involve periodic incremental upgrades such as the following model.

The chart below shows how fast the storage capacity of an edge site could grow by adding a single array of 24 higher‐capacity enterprise hard drives every 18 months (exploiting HDD capacity doublings), and retiring the oldest array at every upgrade once a 72‐drive capacity (<= 12 rack units) is reached (i.e. every 24‐drive array is kept for a service lifetime of 4.5 years). If a deployment started with a single 24‐unit array of 1TB drives in 2009, to provision a library of 5,000–10,000 hours, and then followed the proposed upgrade schedule, the library would now be over 30,000 hours and would grow to over 70,000 in early 2012 and 140,000 in late 2013. This is just an illustration, of course, and larger libraries can be attained ahead of this schedule by simply adding an appropriate number of drive arrays. The lesson of this discussion is that periodic, incremental, low‐cost upgrades can yield very steep capacity growths. This model requires that a library server support independent, mixed‐capacity volumes and hassle‐free storage upgrades.
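Both growth models reduce to simple arithmetic on capacity doublings, which the following sketch reproduces. It assumes that one 24‐drive array of 1TB drives holds roughly 10,435 hours and that the 72‐drive array starts at 31,304 hours (the first data points of the two charts), and that capacity per array exactly doubles with each 18‐month drive generation. The function names are hypothetical.

```python
def swap_out_capacity(base_hours, generations):
    """Full swap-out model: the whole array is replaced each generation,
    so total capacity simply doubles every 18 months."""
    return [base_hours * 2 ** g for g in range(generations)]


def rolling_capacity(base_hours, generations, max_arrays=3):
    """Incremental model: add one array per generation (each twice the
    capacity of the last) and retire the oldest beyond `max_arrays`."""
    arrays = []
    totals = []
    for g in range(generations):
        arrays.append(base_hours * 2 ** g)   # newest, highest-capacity array
        if len(arrays) > max_arrays:
            arrays.pop(0)                    # retire the oldest array
        totals.append(sum(arrays))
    return totals


swap = swap_out_capacity(31_304, 4)      # 72 drives replaced every 18 months
rolling = rolling_capacity(10_435, 4)    # one 24-drive array added each time
```

With these inputs the rolling model yields 10,435, 31,305, 73,045, and 146,090 hours for 2009‐03 through 2013‐09. These agree with the charted totals (10,435; 31,304; 73,043; 146,087) to within the rounding of the assumed per‐array figure.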

[Figure: constant‐cost storage upgrades (24 HDDs every 18 mo.). Total content storage (hrs): 2009‐03 (1TB): 10,435; 2010‐09 (1TB/2TB): 31,304; 2012‐03 (1TB/2TB/4TB): 73,043; 2013‐09 (2TB/4TB/8TB): 146,087.]

A Happy Medium – Hierarchical Storage and Caching – There are two main motivations for implementing an on‐demand service with a tiered, caching approach. First, the tiered architecture greatly reduces, compared to the fully‐centralized system, the network bandwidth requirements between the central library and the edge. Second, it reduces the total storage requirements, as compared to the fully‐distributed model.

As demonstrated in Chapter 1: Optimizing Cache Behavior ‐ even a relatively small cache can off‐load a tremendous amount of network bandwidth and latency, and further increases in cache size provide additional, though diminishing, network savings. Pulling back from a fully‐distributed implementation, an appropriate cache size can be chosen to optimally balance network and storage resources. The incremental costs and physical constraints of cache storage and network capacities must be traded‐off to find the (near‐) optimum location and balance of cache and network resources for a given tiered deployment.

Pushing regional cache out to the edge – If all the streaming demand for a region is serviced by a single regional cached server (i.e. there are no edge servers, the regional server streams directly to all edge sites) fed by a central library, it is equivalent to a single edge server servicing the entire region (see “1 edge” in the chart below). If the region’s same streaming load is pushed out instead to ‘n’ edges provisioned with the same sized cache that was at the regional hub, and these edges are fed directly from a central server (not through a regional cache, but perhaps through a regional network hub), the total network load between the central library and the edge tier is the same as it was for the central library to the regional cache.

[Figure: network b/w, for 'n' edges, 3 cache sizes. Peak miss traffic (axis 0 to 14,000) by number of edges; series: 2000 hrs, 3000 hrs, 5000 hrs.]

Pulling edge caches in to the regional center – Note from the chart above (copied from Chapter 1) that a single edge cache, i.e. a regional cache, of 3000–5000 hours could displace all the caches of any number of edges (i.e. there would be no edge caches or servers, the regional center would stream directly to all edges) for a similar low network load to the central library. This means, in the case of five edge sites, only 5000 hours of storage need be provisioned rather than 5 x 5000 = 25000 hours, for example. This would work fine as long as sufficient streaming network bandwidth can be provided between the regional center and the edge sites. This illustrates the traditional tension and tradeoffs that used to be applied to “centralized library” vs. “fully distributed library” VOD deployments (see corresponding discussions above), which could now be applied similarly to “centralized regional cache” vs. “fully distributed edge cache”. It’s all about the trade‐off of storage and network resources as well as operational efficiencies.

Building a 3 tier deployment – The simulation results reflected in the chart above can also be used to provision a 3‐tier system consisting of a central library, a regional cache, and a collection of cached edge sites. For example, if five edge sites are deployed, each with a 3000‐hour cache, then a regional cache of at least 5000 hours would support the total effective cache size of the edge servers (see heterogeneity chart in Chapter 1), acting as a shared, global cache for all the connected edge sites, amplifying the overall hit rate as seen from the central server and center‐to‐region network. The regional cache only has to load a popular title once into its own cache for the many insertions of that same title across all the edge caches. However, keep in mind that the caches at the edge, if provisioned and used appropriately, greatly reduce the network load from the library on their own and source a very large portion of the streams directly from their caches.

Two tiers is plenty – For all but the largest of MSOs, it’s not clear that there is a lot of benefit to a caching middle tier that can’t be more easily and cheaply obtained by provisioning more network bandwidth from the center, increasing the size of the edge caches, applying some of the affinity principles discussed in Chapter 1, or simply making the regional center the bottom tier by removing the edge caches and provisioning the region‐to‐edge network for full streaming to the edge (see “pulling edge caches in to the regional center” above). Again, it’s a storage vs. network vs. complexity tradeoff.

As CDN libraries grow, the portion of streams populated by rarely‐viewed long‐tail content will undoubtedly rise (at least for a while – see conjecture about a library profile asymptote in Chapter 1). These will tend to pollute caches and reduce their effectiveness. This can be mitigated somewhat with improved cache algorithms (transition from LRU to LFU or better), but in the end the best solution will be to stream these cold titles directly from the library so as not to pollute any caches. The deteriorating library profile has effectively made these caches look “small” and, as shown in Chapter 2, a periodic cache update (e.g. “day‐cache”) of only the truly popular content, pushed down to the edge from the library, is the most effective replacement approach for small caches. Also, as shown in other sections of this paper, stream‐around makes better utilization of the network resources when delivering cold (long‐tail) content.

There could also be merit to eliminating edge streaming servers in certain situations, placing a cached streaming server at a regional location instead to act as a shared source of popular content for all the edge sites. This server would be treated similarly to a cached edge server in that the central library server would push popular content to the regional cache and stream around the regional center to stream cold titles directly to the edges.

In summary, tiered VOD appears to reach maximum effectiveness at two tiers, not three. Where network quality and capacity are plentiful, the high tier is a central library with capabilities to push down popular content to the lower tier’s caches while streaming cold titles around the lower tier directly to the edge streaming network. Where networks are less conducive to stream‐around, the lower‐tier caches will pull through from the central library, applying the optimizations of Chapter 2 as appropriate to maximize the efficiency of the inter‐tier networks. The lower tier consists of caching servers, either at edge sites or at regional centers.

Summary and Further Research

Observations, Conclusions, Recommendations This paper examined many architectural options and tradeoffs, demonstrating that the challenges of achieving efficient and satisfying access to deep and varied content still have many open channels of exploration. Also, various possibilities exist for optimizing key elements of the solution space. Some of the more important results from this paper include:

Chapter 1:

• Content usage profiles are evolving further away from the 80/20 rule, and some conjecture that, even as libraries continue to grow, profile flattening is approaching an asymptote.

• Groups of independent edge servers can be analyzed to obtain a heterogeneous content factor to reflect unique content across their aggregate caches.

• Sharing of service group affinities by multiple edge servers can be exploited to increase effective cache size and thus significantly increase cache efficiencies.

Chapter 2:

• Inter‐tier network loads are effectively the same for pull‐through and stream‐around on their own, but they diverge considerably when the two methods are combined with certain cache replacement algorithms.

• Pull‐through and stream‐around each have significant tradeoffs and relative merits with respect to network efficiency, quality, and overall simplicity.

• Significant network efficiencies can be gained by application of rate regulation methods and block‐based transmissions in a pull‐through environment.

Chapter 3:

• Given a flexible approach to storage volume upgrades, rapid content library growth can be accommodated by an inexpensive incremental storage upgrade model.

• The regional center and the edge can profitably be seen as competitors for placement of the lowest cache tier; several tradeoffs can be made. Two‐tier solutions could be the most effective answer for most deployments.

Future investigation

• Research a subscriber model that could help predict where/when the profile curves stop flattening.

• Investigate improvements to the periodic‐update cache replacement method by shortening the update period.

• Investigate other cache replacement algorithms in the context of VOD caching.

• Investigate other methods of heterogeneously aggregating edge caches.

• Investigate network rate‐regulation ideas and opportunities in more detail.

• Improve on pull‐through trick‐play behavior and optimizations.

• Perform more modeling and analysis of 2‐tier vs. 3‐tier solutions.
