Quik – An Elastic Object Store For Video

Why Yet Another Object Store?

We didn’t think we needed one, either. But Ceph, Swift, Cleversafe, Riak CS and numerous other object stores failed to meet the extraordinary requirements of cable-television-quality video on demand.

Specs For High Definition Video On Demand for 22 M subscribers

Spec        Value              Notes
Jitter      50 nanoseconds     The lower the better
Bitrate     15 Mbps            Higher is better, but not useful beyond a point
Bandwidth   12 terabits/sec    Ever-increasing demand. Never enough!

See the sidebar for an analogy that explains those specs
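
As a rough sanity check on those specs, the bandwidth and bitrate figures together imply a ceiling on concurrent streams. This is a back-of-the-envelope sketch only. It assumes the 12 terabits/sec figure is aggregate delivery bandwidth and that every stream runs at the full 15 Mbps; neither assumption is stated in the specs.

```python
# Back-of-the-envelope capacity check using the figures from the specs.
# Assumption: 12 terabits/sec is aggregate delivery bandwidth and every
# stream uses the full 15 Mbps bitrate.

BANDWIDTH_BPS = 12 * 10**12   # 12 terabits/sec
BITRATE_BPS = 15 * 10**6      # 15 megabits/sec per stream
SUBSCRIBERS = 22 * 10**6      # 22 million subscribers

concurrent_streams = BANDWIDTH_BPS // BITRATE_BPS
share_of_subscribers = concurrent_streams / SUBSCRIBERS

print(f"Concurrent streams: {concurrent_streams:,}")
print(f"Share of subscribers served at once: {share_of_subscribers:.1%}")
```

Under those assumptions the ceiling is 800,000 simultaneous streams, or roughly 3.6% of the subscriber base watching at the same moment.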

Product Vision

A No-Frills, Multi-site, High Performance, Easy To Manage Elastic Storage for Video

No-frills was key. Experience showed that more features and frills made solutions complicated and buggy, often making them unusable

No Frills

It’s no good to be a jack of all trades and a master of none

It takes guts and clear articulation to evangelize a simple, no-frills solution. Naysayers will say ‘but it can’t do this’. A good product manager needs patience and tact: anticipate objections and concerns before you present your vision.

Quik eliminates the following frills:

  1. Arbitrary-sized files: Our file sizes ranged from 1 GB to 120 GB. There was no need to support terabyte-sized files.
  2. Content classes: All our video had practically the same QoS. No video could be served worse or slower than the others.
  3. Storage classes: We had caches (CDN) that prioritized SSD over spinning disks based on required bandwidth. There was no need to replicate that feature in the origin.
  4. Consistency (see the CAP theorem): This is a non-issue with video files, because video files don’t change. Even if we ingested a bad or wrong video file, there is a system outside the origin that tells the clients to use a different file. In fact, we need a system that invalidates multiple layers of caches, and that function doesn’t belong in the origin. Hence, there was no need to enforce consistency.

See framespeed.com for a full-featured, advanced-capability object store that has an entirely different value proposition. It is also proudly my brainchild.

Multi Site Performance

Every object store we tried performed worse when a client downloaded content from a secondary site. For video on demand, the requirements are absolute. If you want freeze-free, high-quality video, you must meet the performance specs 100% of the time.

Easy To Manage

When we tried other solutions, we had to hire consultants and experts who knew the particular object store well. And when we realized an object store didn’t do well, we had to retrain the entire staff or hire new consultants for the next one. This was exhausting, frustrating and a waste of money.

Solution: What if we could use existing tools like Linux file system commands and simple scripting with Python, bash or any language of the team’s choice to manage the object store? That’s exactly what we did, with clever architecture. See below.
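
To illustrate the idea, here is a minimal sketch of the kind of script a team could write with nothing but the standard library. It assumes each object is stored as an ordinary file under a per-disk mount point. The `report_disk` name and that directory layout are hypothetical, not Quik's actual layout.

```python
import os
import shutil


def report_disk(mount_point):
    """Summarize one data disk using only standard filesystem calls.

    Assumes (hypothetically) that every object is a plain file somewhere
    under mount_point.
    """
    object_count = 0
    object_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            object_count += 1
            object_bytes += os.path.getsize(os.path.join(dirpath, name))

    usage = shutil.disk_usage(mount_point)  # total, used, free in bytes
    return {
        "objects": object_count,
        "object_bytes": object_bytes,
        "free_fraction": usage.free / usage.total,
    }
```

The same numbers are available on the command line from `find`, `du` and `df`, which is the point: no store-specific tooling has to be learned.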

Elastic

An object store is synonymous with elasticity. This was base functionality, and it was non-negotiable. For example, Riak CS needs servers of the same size and kind, has restrictions on file sizes and is a nightmare to maintain. It uses an exotic functional programming language called Erlang. Erlang was great for telephones. It sucks for object stores.

The Insight

We found that content is almost always exactly at a predictable location. Disk fragmentation wasn’t an issue either, because we always kept at least 20% spare room for growth. Together, this means that the control paths very rarely had to be executed.

The Solution

The solution had three parts to it

Balance

Balance is the crux of an object store. It ensures that there are enough copies of each file to withstand some failures. It places those copies on different servers in different failure zones. It makes sure disks are evenly loaded, so that no disk, server or data center is overcrowded.
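
Those three duties can be written down as simple checks. The sketch below is illustrative only: the function names, the replica count of 3 and the 10-point load spread threshold are assumptions, not Quik's actual parameters.

```python
def placement_is_safe(replica_servers, zone_of, min_copies=3):
    """True if a file has enough copies, each in a different failure zone.

    replica_servers: servers currently holding a copy of the file.
    zone_of: mapping of server name -> failure zone (hypothetical layout).
    min_copies: assumed replica count; the real value is not stated.
    """
    zones = {zone_of[s] for s in replica_servers}
    enough_copies = len(replica_servers) >= min_copies
    return enough_copies and len(zones) == len(replica_servers)


def load_is_even(used_fraction, max_spread=0.10):
    """True if the fullest and emptiest disks differ by at most max_spread.

    used_fraction: mapping of disk name -> fraction of capacity used.
    """
    values = list(used_fraction.values())
    return max(values) - min(values) <= max_spread


zone_of = {"s1": "zone-a", "s2": "zone-b", "s3": "zone-c", "s4": "zone-a"}
print(placement_is_safe(["s1", "s2", "s3"], zone_of))  # three copies, three zones
print(placement_is_safe(["s1", "s2", "s4"], zone_of))  # s1 and s4 share zone-a
print(load_is_even({"d1": 0.62, "d2": 0.58, "d3": 0.65}))
```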

Federation

Quik federates this function to every server. Every server can figure out where a file is supposed to be. If the file is misplaced, any server can initiate a transfer of that file to the right location, without hurting current sessions or stepping on other servers that are trying to move the same file.

The federation scheme used was an elegant yet simple proprietary algorithm
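
The proprietary algorithm itself is not described here. As a stand-in, the sketch below uses rendezvous (highest-random-weight) hashing, a well-known public technique that has the property the text relies on: any server can compute on its own where a file is supposed to be. The server names, zone map and replica count are hypothetical, and the sketch does not cover how servers avoid moving the same file at once.

```python
import hashlib


def ideal_servers(file_name, zone_of, copies=3):
    """Pick `copies` servers for a file, at most one per failure zone.

    Rendezvous hashing: score every (file, server) pair with a hash and
    take the highest scores. Every server that runs this with the same
    inputs gets the same answer.
    """
    def score(server):
        digest = hashlib.sha256(f"{file_name}|{server}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    chosen, used_zones = [], set()
    for server in sorted(zone_of, key=score, reverse=True):
        if zone_of[server] not in used_zones:
            chosen.append(server)
            used_zones.add(zone_of[server])
        if len(chosen) == copies:
            break
    return chosen


def is_misplaced(file_name, holder, zone_of, copies=3):
    """True if `holder` has a copy of the file but is not an ideal location."""
    return holder not in ideal_servers(file_name, zone_of, copies)


zone_of = {"s1": "a", "s2": "a", "s3": "b", "s4": "b", "s5": "c", "s6": "c"}
targets = ideal_servers("movie-0001.ts", zone_of)
print(targets)
```

A server that finds a file for which `is_misplaced` returns true knows, without asking anyone, which servers should receive it.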

Control Plane

The same client-centric location scheme used in the CDN (described separately) was repurposed for Quik. One difference is that there is no ‘origin’ to fetch content from, since Quik is the origin. Even servers had ‘client’ processes: the organizer, the replicator and the tombstone reaper had to find the ideal location of the content just like the end clients.

Data Plane

The data path had to be fast and have high bandwidth with the least jitter possible. We shortened the data path dramatically, as shown below.

In other object stores, the long data paths increase jitter, create bottlenecks and require more CPU.

 
