CDN For High Definition Video

Specs: 22 M subscribers, 12 Tbps at peak, 10 ms max jitter, 20 ms to first byte

Product Vision & Solution

Low Cost, Short Time To Market

Solution: COTS hardware running open-source software

The legacy video CDN was built on proprietary hardware and software from blue-chip companies. It was expensive and took 10 years to deploy. Commercial Off The Shelf (COTS) hardware would keep costs down while scaling with Moore’s law. Open-source software would give us a large community to support it

We used Apache Traffic Server (ATS), RIAK KV, a small amount of custom code and an ingenious way to keep things simple

Seamless Elasticity

We needed to be able to add servers, upgrade drives and add new clusters of caches frequently

Solution: Client-side Routing + Distributed Topology Using RIAK

We reduced the configuration to a very small set of parameters. Each server advertised a set of tuples

Server Configuration = array of { server-name, volume-name, number of 128 GB buckets }

We distributed these configurations automatically using RIAK KV. A client could query ANY server to download the topology. It would hash the filename of the requested content and use the hash to pick the right { server-name, volume-name } pair
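The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the production code: the server and volume names, the use of SHA-256, and the choice to place one ring point per 128 GB bucket are all assumptions made for the example. It only shows how a client holding the topology could map a filename to a { server-name, volume-name } pair on its own.

```python
import hashlib
from bisect import bisect_right

# Hypothetical topology, as a client might download it from any server:
# (server-name, volume-name, number of 128 GB buckets)
TOPOLOGY = [
    ("cache-01", "vol-a", 4),
    ("cache-01", "vol-b", 2),
    ("cache-02", "vol-a", 8),
]

def _hash(key: str) -> int:
    """Map a string to a 64-bit integer (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def build_ring(topology):
    """Place one point on the ring per bucket, so bigger volumes own more keys."""
    entries = []
    for server, volume, buckets in topology:
        for i in range(buckets):
            entries.append((_hash(f"{server}/{volume}/{i}"), server, volume))
    entries.sort()
    points = [point for point, _, _ in entries]
    owners = [(server, volume) for _, server, volume in entries]
    return points, owners

def pick(ring, filename: str):
    """Return the (server-name, volume-name) pair responsible for a filename."""
    points, owners = ring
    # First ring point clockwise from the filename's hash, wrapping around.
    idx = bisect_right(points, _hash(filename)) % len(points)
    return owners[idx]

ring = build_ring(TOPOLOGY)
server, volume = pick(ring, "movies/example-title/segment-0001.ts")
```

Because every client runs the same deterministic function over the same topology, no request router is needed in the data path.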

Fault Tolerant to the Failure of ANY Number of Servers

Since all content was available from one of many origins, we could withstand any number of failed caches. Of course, our capacity would be impacted. But then again, the number of servers deployed reflected peak Mother’s Day demand, so we could withstand some failures. Multiple failures are extremely rare

Solution: Uniform RESTful interfaces

If a cache failed, the client could always get content from the origin. Because the southbound interfaces of both the ATS cache and the origin were the same, a client could fail over to the origin if needed. Using consistent hashing, the client could also get the content from another ATS node
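As a rough sketch of that failover order, assuming the client already has an ordered list of candidate caches: the function name, the host names and the caller-supplied `fetch` callable are all hypothetical. The point it illustrates is that the same path is requested from every host, which is what a uniform interface makes possible.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

def fetch_with_failover(
    path: str,
    caches: Iterable[str],
    origin: str,
    fetch: Callable[[str, str], bytes],
    blacklist: Optional[Set[str]] = None,
) -> Tuple[str, bytes]:
    """Try each cache in order, then the origin; return (host, body).

    `fetch(host, path)` is a caller-supplied function (hypothetical here)
    that performs the request and raises OSError on connection failure.
    """
    if blacklist is None:
        blacklist = set()
    for host in caches:
        if host in blacklist:
            continue  # skip caches already known to be broken
        try:
            return host, fetch(host, path)
        except OSError:
            blacklist.add(host)  # remember the failure for later requests
    # Same path, same interface: the origin is just one more host to ask.
    return origin, fetch(origin, path)
```

Passing the blacklist in from the caller lets one failure benefit every later request from the same client.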

No Central Management System

Central systems are hard to keep in sync with reality

Solution: RIAK KV

Configuration was distributed to all nodes through RIAK KV. Combined with consistent hashing and blacklisting of broken nodes, this eliminated the need for central management servers
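To illustrate why blacklisting needs no coordinator, here is a small sketch. It uses rendezvous (highest-random-weight) hashing, a close relative of consistent hashing chosen only because it is short; it is not necessarily the scheme the CDN used, and the node and file names are made up. It shows that when a client blacklists one node, only the files that node owned are reassigned.

```python
import hashlib

def score(node: str, filename: str) -> int:
    """Deterministic per-(node, file) weight; SHA-256 is an assumption."""
    digest = hashlib.sha256(f"{node}|{filename}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def owner(nodes, filename, blacklist=frozenset()):
    """Pick the live node with the highest score for this file."""
    live = [n for n in nodes if n not in blacklist]
    if not live:
        return None  # no cache left; the caller would go to the origin
    return max(live, key=lambda n: score(n, filename))

NODES = ["cache-01", "cache-02", "cache-03", "cache-04"]
FILES = [f"movie-{i}.ts" for i in range(1000)]

before = {f: owner(NODES, f) for f in FILES}
after = {f: owner(NODES, f, blacklist={"cache-02"}) for f in FILES}

# Only files that used to live on the blacklisted node change owner.
moved = [f for f in FILES if before[f] != after[f]]
```

Each client can reach this result independently from the shared topology and its own list of broken nodes.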

Simple Location Architecture
