We at Zynga have been busy building a game server that can scale to support hundreds of millions of users. Although the industry has a decade of collective experience in deploying massive web applications that scale, new applications like social gaming continue to push the envelope in terms of the size and scope of data, number of users, write throughput, etc.
Typically, most web applications (e-commerce, search, blogs) are read-heavy. In contrast, social games are write-heavy: they depend on interactions between the user and the game state, and between users themselves. In a social game, the ratio of reads to writes can approach 1:1. Sustaining that level of write throughput is challenging, and it drove us to take a fresh approach to storage design in our game platform.
We’ve refined this approach over the course of two major iterations.
Iteration 1: A key-value store using Memcache and MySQL
Around the time when we launched FarmVille in 2009, we decided to store the game data in key-value format in a distributed cache built on memcache. This alleviated the database/disk latency. All writes and reads were delegated to a memcache layer with a separate queueing process that persisted data into a MySQL database. In order to better handle the load, memcache, MySQL and the queueing process were run on separate machines.
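The write path described above can be sketched as a write-behind store: all reads and writes hit an in-memory cache, while a separate queue drains each write to a SQL database. This is a minimal illustration only, with sqlite standing in for MySQL and a dict standing in for the memcache tier; all names are hypothetical, not Zynga's actual code.

```python
import queue
import sqlite3
import threading

class WriteBehindStore:
    def __init__(self, db_path=":memory:"):
        self.cache = {}                      # stands in for the memcache tier
        self.q = queue.Queue()               # pending persistence work
        self.db = sqlite3.connect(db_path, check_same_thread=False)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def set(self, key, value):
        self.cache[key] = value              # write is acknowledged from memory
        self.q.put((key, value))             # persistence happens asynchronously

    def get(self, key):
        return self.cache.get(key)           # reads never touch the database

    def _drain(self):
        # Runs in its own thread, like the separate queueing process above.
        while True:
            key, value = self.q.get()
            self.db.execute(
                "INSERT INTO kv (k, v) VALUES (?, ?) "
                "ON CONFLICT(k) DO UPDATE SET v = excluded.v",
                (key, value))
            self.db.commit()
            self.q.task_done()

store = WriteBehindStore()
store.set("farm:42", "pumpkin")
print(store.get("farm:42"))                  # served from memory
store.q.join()                               # wait for the queue to persist
```

In the real deployment the cache, the queueing process, and MySQL ran on separate machines; here they share a process only to keep the sketch self-contained.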
Distributed storage and hashing
We experimented with several hashing algorithms to distribute keys uniformly across the memcache server clusters. We chose a simple and efficient technique, modulo selection, instead of the more complicated ketama consistent hashing. It gave us fast, O(1) server selection and a relatively uniform distribution of keys without hot spots.
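Modulo selection can be sketched in a few lines: hash the key, then take the hash modulo the number of servers. The server names and the choice of md5 here are illustrative assumptions, not the actual implementation.

```python
import hashlib

SERVERS = ["mc01", "mc02", "mc03", "mc04"]

def server_for(key: str, servers=SERVERS) -> str:
    # A stable hash (md5 here) so every client picks the same server for a key.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]         # O(1) server selection

print(server_for("user:1001:farm"))
```

The trade-off versus ketama is that changing the server count remaps most keys, which is why the scaling approach described later matters.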
We built in redundancy by giving each MySQL master one or more slaves. If a master failed, a slave was quickly promoted to master, reducing the impact of the outage.
Iteration 2: Integrated storage in a node: Membase
Although our first iteration was able to withstand the load of FarmVille, the largest social game of the time, we realized there was significant room for optimization. There were others in the industry, including Couchbase Inc., who were thinking just as hard about solving the storage requirements of a write-heavy web application. So, we partnered with developers at Couchbase in an open source environment to build a new storage solution.
The first problem we addressed from our first iteration was the number of moving parts and processes. Persistence and replication were moved up the stack, and a single node combining the memcache API with persistence was built: enter Membase.
A single node provided the memcached protocol compatibility to applications with an extension that supported persistence and replication.
Persistence and replication are natively supported in Membase, and the persistence engine has some unique qualities. In most memcached deployments, memcache is used to reduce read latency; in Membase we optimize write throughput as well as reads.
With social games, some data may be updated hundreds or thousands of times per second. It’s not necessary to persist every possible mutation of this item, but it’s important to persist it on a regular basis. The persistence logic is controlled by two tunable parameters – minimum data age and queue capacity. By tuning these two parameters to what is appropriate for the application, we can set the safety margin for the items. Owing to this capability, we refer to the Membase engine as the ‘eventually persistent’ engine.
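The interplay of the two tunables can be sketched as follows: an item is flushed once it has been dirty for at least the minimum data age, or earlier if the dirty queue grows past its capacity. The class, parameter names, and structure below are a simplified illustration of the idea, not the actual Membase engine code.

```python
import time

MIN_DATA_AGE = 1.0      # seconds an item must sit dirty before flushing
QUEUE_CAPACITY = 1000   # flush early if this many items are dirty

class EventuallyPersistentQueue:
    def __init__(self, now=time.monotonic):
        self.now = now
        self.dirty = {}          # key -> time it first became dirty

    def mark_dirty(self, key):
        # Repeated mutations coalesce: only the first dirtying time matters,
        # so an item updated thousands of times a second is persisted once
        # per age window rather than once per mutation.
        self.dirty.setdefault(key, self.now())

    def keys_to_flush(self):
        if len(self.dirty) > QUEUE_CAPACITY:
            due = list(self.dirty)                    # over capacity: drain
        else:
            cutoff = self.now() - MIN_DATA_AGE
            due = [k for k, t in self.dirty.items() if t <= cutoff]
        for k in due:
            del self.dirty[k]
        return due
```

Raising the minimum data age trades a wider window of possible data loss for far fewer disk writes, which is the safety margin the text describes.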
For replication we created a protocol named TAP atop the existing memcached binary protocol. TAP allows us to stream mutations from a storage node. With MySQL deployments, we’re accustomed to talking about replication in “seconds behind master,” but because of the simplification and efficiencies in TAP, our replication is almost always within milliseconds.
Scaling the storage layer
The very nature of a social game means it will quickly spread throughout the social graph. We need to be able to respond accordingly by scaling the storage layer without interruption.
In early deployments of Membase, we had a very simple approach. By relying on the hashing algorithm and our replication capabilities, we could simply double the size of the cluster by promoting existing slaves as masters, add the newly promoted nodes to the hashing and remove items that no longer hash to the old masters. This gave us very simple elasticity all without interrupting the application layer.
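Why doubling works with modulo hashing can be shown in a short worked example. If the slave of master i is promoted to become node i+N, then for any hash h, h mod 2N is either h mod N (the old master) or h mod N plus N (its promoted slave), so every key already lives on the node that now owns it; only the stale copies need removal. The node names below are illustrative.

```python
N = 4
masters = [f"node{i}" for i in range(N)]
slaves = {f"node{i}": f"node{i + N}" for i in range(N)}   # slave of node i

def old_owner(h):
    return masters[h % N]            # before the doubling

def new_owner(h):
    return f"node{h % (2 * N)}"      # after promoting slaves to nodes N..2N-1

for h in range(100):
    # Every key's new owner is either its old master or that master's
    # former slave, which already holds a replica of the data.
    assert new_owner(h) in (old_owner(h), slaves[old_owner(h)])
```

This is what makes the doubling transparent to the application layer: no key ever hashes to a node that lacks its data.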
Membase also provides a more sophisticated elastic scaling technique using a concept called vbuckets. This is discussed extensively by Dustin Sallings in his blog.
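The vbucket idea can be sketched briefly: keys hash to a fixed number of virtual buckets, and a separate table maps each vbucket to a server. Rebalancing rewrites entries in that table and moves whole vbuckets between servers, while the key-to-vbucket function never changes. The bucket count, hash choice, and names here are illustrative assumptions; see Dustin Sallings’ post for the real design.

```python
import hashlib

NUM_VBUCKETS = 1024

def vbucket_for(key: str) -> int:
    # Fixed for the life of the cluster; never changes during rebalancing.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_VBUCKETS

# The vbucket map is configuration, not a function of the key: scaling the
# cluster means reassigning vbuckets here, not rehashing keys.
vbucket_map = {vb: f"server{vb % 3}" for vb in range(NUM_VBUCKETS)}

def node_for(key: str) -> str:
    return vbucket_map[vbucket_for(key)]
```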
While social games drive a somewhat different workload than other large-scale applications, working on the Membase project in an open source environment has let us simplify our operations and scale our data storage to meet that workload. For more information on this project, visit the membase community site, and please share any thoughts in the comments below.