Clustered Akka, Building Akka 2.2 Today, Part 2

Last time we discussed some of the shortcomings of the current version of Akka. In this post we will see how, using existing open source frameworks, we can build Akka 2.2 functionality using Akka 2.0.

The first step towards a clustered system is cluster membership management. In a master/slave setup, rather than a truly clustered system, this is an easy task: new slave nodes start up and connect to the master node, which is responsible for keeping track of work items and sending them out to the slaves. The cluster, in this setup, is whoever the master thinks is in the cluster. In a system with no single point of failure, however, there is no physical machine that can be guaranteed to be available, and therefore no easy way for the cluster to agree on who’s in and who’s out.

Forming a shared view of which nodes are currently healthy and reachable in a cluster running on unreliable, unpredictable network infrastructure is a research-level problem, and even implementing a known algorithm for it is no easy task. Thankfully, we can borrow from the open source world and base at least the initial discovery phase on an existing library such as JGroups, which can serve as the cluster node membership framework and solve the problem of creating a shared view of the cluster across nodes. By having each node in our Akka cluster announce, on startup, the network address and port where it can be reached via Akka remoting, we can generate an address for any actor with a well-known local path within a remote actor system and use Akka for the rest of the communication. Akka clustering support step 1, done.
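To make the discovery step concrete, here is a minimal sketch of what such a membership layer could look like, assuming JGroups 3.x and a simple convention of broadcasting each node's Akka host and port over the JGroups channel. The cluster name, the payload format and the hard-coded endpoint are illustrative assumptions, not part of JGroups or Akka.

```scala
// Minimal membership sketch, assuming JGroups 3.x on the classpath.
import org.jgroups.{JChannel, Message, ReceiverAdapter, View}
import scala.collection.JavaConverters._

class MembershipListener(onViewChange: Seq[String] => Unit) extends ReceiverAdapter {
  // Called by JGroups whenever the shared cluster view changes.
  override def viewAccepted(view: View): Unit =
    onViewChange(view.getMembers.asScala.map(_.toString))

  // Each node broadcasts the host:port where its Akka remoting listens;
  // here we only print it, a real implementation would keep a registry.
  override def receive(msg: Message): Unit =
    println("Akka endpoint announced: " + new String(msg.getBuffer, "UTF-8"))
}

object Membership extends App {
  val channel = new JChannel()                 // default JGroups protocol stack
  channel.setReceiver(new MembershipListener(view =>
    println("Current cluster view: " + view.mkString(", "))))
  channel.connect("akka-cluster")              // join (or form) the named cluster

  // Announce where this node's remote actor system can be reached (assumed endpoint).
  val akkaEndpoint = "mynode.example.com:2552"
  channel.send(new Message(null, akkaEndpoint.getBytes("UTF-8")))
}
```

Every node keeps the latest view plus the announced endpoints, which is all we need later on to build remote actor paths.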

Knowing the list of available servers in a cluster is not enough, however. In any meaningful clustered system, we also need to coordinate work across the currently running instances. By assuming that all servers in the cluster are capable of performing any service (though they may not all be doing the same thing at any given point in time), we reduce this problem to allocating n partitions in our problem space across the m nodes in the cluster. The number n must be chosen for the specific problem, but for a system running a large number of independent but identical operations, such as serving web clients, n should be large enough that the partitions can be spread evenly across the m servers likely to be operating in the cluster.
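As a rough illustration of this n-over-m reduction, the sketch below fixes n at 256 and assigns partitions round-robin over the sorted member list; both the number and the assignment scheme are assumptions made for the example, and any deterministic function of the view would do.

```scala
// Sketch of a partition-to-node map; 256 and the modulo assignment are assumptions.
object PartitionMap {
  val PartitionCount = 256  // n: chosen large relative to the expected node count

  /** Assign each of the n partitions to one of the m members. `members` must be
   *  identically sorted on every node, so every node derives the same map. */
  def allocate(members: Seq[String]): Map[Int, String] =
    (0 until PartitionCount).map(p => p -> members(p % members.size)).toMap

  /** The partitions a given node is responsible for under the current view. */
  def ownedBy(self: String, members: Seq[String]): Set[Int] =
    allocate(members).collect { case (p, owner) if owner == self => p }.toSet
}
```

With 256 partitions and four nodes, each node owns 64 partitions. A smarter deterministic scheme, such as consistent hashing, would move fewer partitions when a node joins or leaves; the modulo assignment is simply the smallest thing that satisfies the "same map everywhere" requirement.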

As JGroups provides a membership list that is known to be sorted the same way on every member of the cluster, the allocation of partitions to cluster nodes can be done independently on each node. As long as a deterministic algorithm is used (one that may include aspects such as giving certain nodes a larger share of the partition space, or a preference for a specific set of partitions), each node will arrive at the same partition allocation map at “almost” the same time. When a node concludes that it should be responsible for a partition, it fires up an Akka actor subsystem that can be reached from the rest of the cluster. Similarly, if a node finds itself hosting a partition it concludes it should no longer own, it shuts down the subsystem representing that partition.
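The start/stop reaction to a view change could then look roughly like this, assuming Akka 2.0-era APIs. The PartitionService actor, the PartitionManager wrapper and the "partition-n" naming scheme are invented for the sketch, and the owned-partition set is assumed to come from a deterministic allocation such as the one above.

```scala
// Sketch of reacting to a new partition allocation, assuming Akka 2.0-era APIs.
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

class PartitionService(partition: Int) extends Actor {
  // Work for one partition would be handled here; ignored in the sketch.
  def receive = { case _ => }
}

class PartitionManager(system: ActorSystem) {
  private var running = Map.empty[Int, ActorRef]

  /** Called with the freshly computed set of partitions this node should host. */
  def rebalance(shouldOwn: Set[Int]): Unit = {
    // Start subsystems for partitions we gained...
    for (p <- shouldOwn -- running.keySet)
      running += p -> system.actorOf(Props(new PartitionService(p)), "partition-" + p)
    // ...and stop the ones we no longer own.
    for (p <- running.keySet -- shouldOwn) {
      system.stop(running(p))
      running -= p
    }
  }
}
```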

We now have a set of partitions running on a set of cluster nodes, but how do they communicate with each other, and what about location transparency? We achieve this by resolving an actor reference from a combination of a well-known path, representing the path of an actor relative to its partition, and a routing key. The routing key may be a string or a numeric value; either way it is reduced to an integer and mapped to one of the n available partitions. The routing key may be static, representing a singleton service that needs to live somewhere in the cluster, or it may vary per request, for example to route a request from an external user to the partition currently handling that particular user.
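A sketch of that resolution step, assuming the Akka 2.0 actorFor lookup, the partition naming used above, and a registry (here reduced to a function) mapping a partition to the host:port its owner announced; the system name "cluster" and the hashing scheme are illustrative.

```scala
// Sketch of resolving a location-transparent reference, assuming Akka 2.0's actorFor.
import akka.actor.{ActorRef, ActorSystem}

class ClusterRefResolver(system: ActorSystem,
                         partitionCount: Int,
                         endpointFor: Int => String /* partition -> "host:port" */) {

  /** Reduce any routing key to a partition index in [0, partitionCount). */
  def partitionOf(routingKey: String): Int =
    math.abs(routingKey.hashCode) % partitionCount  // good enough for a sketch

  /** Build a remote actor path from a well-known relative path and a routing key. */
  def resolve(wellKnownPath: String, routingKey: String): ActorRef = {
    val partition = partitionOf(routingKey)
    val hostPort  = endpointFor(partition)
    system.actorFor("akka://cluster@" + hostPort + "/user/partition-" + partition + "/" + wellKnownPath)
  }
}
```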

When an actor ref for a particular well-known path and partition is resolved, a local actor is created that is responsible for forwarding messages to wherever the partition is currently executing. This actor subscribes to changes broadcast by the cluster membership layer and immediately switches to the new physical network path when a change is detected. To improve reliability (though delivery is still not guaranteed), a layer can be added on top of the JGroups cluster membership view that waits until a remote system is known to be running and reachable before messages are forwarded to it after a reallocation of partitions.
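The local forwarder can be sketched as a plain actor, assuming the membership layer notifies it with a (hypothetical) PartitionMoved message carrying the new remote path whenever its partition is reallocated.

```scala
// Sketch of the local forwarding actor; PartitionMoved is an invented notification.
import akka.actor.{Actor, ActorRef}

case class PartitionMoved(newRemotePath: String)

class PartitionForwarder(initialRemotePath: String) extends Actor {
  private var target: ActorRef = context.actorFor(initialRemotePath)

  def receive = {
    // Switch immediately to the new physical location of the partition.
    case PartitionMoved(path) => target = context.actorFor(path)
    // Everything else is forwarded best-effort to wherever the partition lives now.
    case msg                  => target forward msg
  }
}
```

Messages sent while a partition is in flight between nodes can still be lost, which is exactly why the "wait until the remote system is reachable" layer mentioned above is worth adding.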

Although likely not as feature-complete as the Akka clustering solution will one day be, the above has been designed in detail, implemented, and tested in about a week, and it supports dynamic cluster membership, true location transparency, work distribution, and best-effort delivery to remote systems with failover. In addition, restarting stateful actors on new physical machines has been demonstrated by using write-through distributed caches for state changes. Though we are still excited about what Akka will look like in a year’s time, all of this can be achieved today with limited effort!