How do you manage configuration files across a wide array of games, played by millions of users and served out of multiple data centers?
Over the years Zynga has found that the right solution to this problem varies for each unique service and product. I’d like to present one of these solutions today: Apache ZooKeeper.
Apache ZooKeeper allows Zynga to update thousands of configuration files in roughly a second.
Game servers at Zynga need to interact with various levels of persistent and volatile storage. For better performance, game nodes talk directly to these storage nodes using IP addresses and domain names stored in a local configuration file.
When a storage node fails, it needs to be replaced by a hot spare or replicated slave. This update needs to happen as soon as possible, because a failed node can affect a critical aspect of the user experience. As games and services grow, related storage pools typically need to grow in parallel. When this expansion occurs, every node that accesses that storage pool needs to update its configuration file at the same time. Failure to keep these configuration files in sync during any of these events could lead to severe data corruption.
Apache ZooKeeper (http://zookeeper.apache.org/) is an open source project that provides highly reliable distributed coordination through an ensemble of servers. It is designed for read-heavy, write-light systems, and teams typically use it for tasks such as configuration management, leader election, presence protocols, and group services.
ZooKeeper in its purest form is a key/value storage system that saves data objects (referred to as zNodes) into its synchronized ensemble. Clients open a socket connection to one of the servers in the ensemble and register a “watch” on a particular zNode. When a change is made to that zNode, the ZooKeeper servers synchronize with each other and then notify each watching client so that appropriate downstream action can be taken.
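The watch-and-notify flow described above can be sketched with a toy in-memory model. This is not the ZooKeeper client API: `ToyEnsemble`, its `get` and `set` methods, and the `/config/storage` path are hypothetical names chosen for illustration. The sketch assumes the one-shot behavior of ZooKeeper's standard watches, where a watch fires once and the client re-registers it when it reads the new data.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

WatchCallback = Callable[[str], None]


class ToyEnsemble:
    """Toy stand-in for a ZooKeeper ensemble (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[WatchCallback]] = defaultdict(list)

    def get(self, path: str, watch: Optional[WatchCallback] = None) -> bytes:
        """Read a zNode and optionally register a one-shot watch on it."""
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path, b"")

    def set(self, path: str, value: bytes) -> None:
        """Store new data, then notify and clear the watches on that path."""
        self._data[path] = value
        callbacks = self._watches.pop(path, [])
        for callback in callbacks:
            callback(path)


def demo() -> List[bytes]:
    """Register a watch, change the zNode twice, and collect what was seen."""
    ensemble = ToyEnsemble()
    ensemble.set("/config/storage", b"10.0.0.1")
    seen: List[bytes] = []

    def on_change(path: str) -> None:
        # Re-read the zNode and re-register the watch in the same call.
        seen.append(ensemble.get(path, watch=on_change))

    ensemble.get("/config/storage", watch=on_change)
    ensemble.set("/config/storage", b"10.0.0.2")
    ensemble.set("/config/storage", b"10.0.0.3")
    return seen
```

Because the callback re-registers itself on every read, `demo()` observes both updates. A callback that only read the data without passing `watch=` again would be notified of the first change and miss the second.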
Zynga has extended the basic client/server functionality of ZooKeeper so that our servers can take the data from a zNode and apply business logic on top of it. This could take the form of updating a configuration file, interacting with a system process, triggering a downstream event, or even building tools for operations personnel.
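As one illustration of the "updating a configuration file" case, here is a minimal sketch of a handler that turns zNode data into a local file. It is not Zynga's implementation: the function name `apply_config` is hypothetical, and the sketch assumes the zNode payload is UTF-8 JSON. The file is written to a temporary path and then swapped in with `os.replace`, so a process reading the configuration never sees a half-written file.

```python
import json
import os
import tempfile


def apply_config(znode_data: bytes, config_path: str) -> dict:
    """Parse zNode data as JSON and atomically replace the local config file."""
    # Assumption: the payload is UTF-8 encoded JSON. Invalid data raises
    # ValueError here, before the existing file is touched.
    config = json.loads(znode_data.decode("utf-8"))

    # Create the temp file in the target directory so the final rename
    # stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(config_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, config_path)  # swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return config
```

In a real client this function would be called from the watch callback with the freshly read zNode data. Rejecting bad payloads before writing means a malformed update leaves the previous configuration in place.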
ZooKeeper has proven itself at scale time and time again. In roughly a second, we can push out a configuration change to many thousands of subscribed client connections from a single ZooKeeper ensemble.
Critical business logic and validations then kick off to ensure that configurations were updated correctly and services were properly adjusted to the new data.
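One simple form such a validation could take is a digest comparison. The sketch below is an assumption for illustration, not a description of Zynga's actual checks: each node is imagined to report a SHA-256 digest of its local configuration file, and the hypothetical `out_of_sync` helper lists the nodes whose digest differs from that of the data that was pushed.

```python
import hashlib
from typing import Dict, List


def config_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a configuration payload."""
    return hashlib.sha256(data).hexdigest()


def out_of_sync(expected: bytes, reported: Dict[str, str]) -> List[str]:
    """Return the sorted names of nodes whose reported digest is stale.

    `reported` maps a node name to the digest of that node's local config.
    """
    want = config_digest(expected)
    return sorted(node for node, got in reported.items() if got != want)
```

For example, if `game-01` reports the digest of the new payload and `game-02` still reports the digest of the old one, `out_of_sync` returns `["game-02"]`, which operations tooling could use to retry or raise an alert.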
Are you using ZooKeeper or other tools to solve similar problems? Please share any thoughts or questions in the comments section below.