As demand for efficient concurrency is increasing, the need for simple asynchronous programming is becoming more and more important. In the multithreaded world, the driving factors is the realization that threads are best used for scheduling units of computation rather than units of work from a business logic perspective. In the JavaScript world, even with JavaScript on the server (Node.js), asynchronous IO operations is the only way to achieve any form of multitasking, and in fact the lack of blocking IO operations is often quoted as a strength as it forces programmers to be good.

Asynchronous programming is tightly linked to the concept of continuations, registering a function with surrounding closure as a callback when some operation completes successfully or fails. Although it is possible to do this ad hoc, reinventing the wheel with a slightly different shape for each use, the pattern of providing callbacks in order to handle some future results has become common and uniform enough to deserve its own concept in most programming language. Indeed, the word Future is often used to describe exactly this concept.

So what is the benefit of having a Future? For many years, both .NET, JavaScript and the Java platform survived perfectly well without such a formalized concept. JavaScript used callback functions for AJAX, Java (when asynchronous operations were attempted at all) used various callback interfaces and .NET relied on an unholy mix of .NET events, callback functions, the Begin/End Pattern and the extremely misguided Event Based Asynchronous Pattern. All these approaches were sufficient for simple continuations, but lacked one important quality: composability.

Imagine writing the code for asynchronously downloading a file using callback style asynchronous programming. Not too bad, right? Depending on the language you may end up with some ugly anonymous types and explicit state, or you may be able to use anonymous functions and stateful closures – either way you pretty much know how to get it right. Now imagine downloading n files in parallel, or downloading a file from n locations using only the first one that completed, or retrying a download n times if it failed before giving up completely.  Like the look of the code you had to write?

Requirements for the Future

Having en entity explicitly representing the future allows you to solve patterns like the above once, correctly, then just applying the building blocks you need when you need them. Let us take a look at what other qualities we may want from such an entity before we dig into the way it has been implemented on the various platforms. Our ideal Future should strive to meet the following requirements:

  • Composable – as discussed above, this is the may reason for making the future a thing rather than a pattern. Composable means that our Future should be transformable using chaining, filtering, multiplexing and demultiplexing, with each operation generating a new Future just as generic as the first.
  • Minimal – following the Principle of least privelage, the Future should contain only operations required for its purpose – accessing and transforming a future result (or error). This means, as an example, that the action of completing the Future should be separate from the Future itself.
  • Thread safe – while somewhat irrelevant in a single threaded environment, Futures will be used for concurrent as well as asynchronous programming where multithreading is supported. Thread safe means that all consumers must see the same result of the Future, and that the order in which the Future gets a result and the result being accessed must not affect the semantics. Thread safe also holds for the producing side, there may be a race between for example setting the result of the Future and marking it as Cancelled, our implementation must ensure that, once set, the Future keeps its result.

Let us take a look at some implementations of Futures in .NET, JavaScript and Java and see how well they meet the above.

Futures in JavaScript:

Although there is no concept of a Future in the JavaScript standard library, jQuery – all JavaScript developers best friend, has introduced the concept of a Deferred. The use of Deferred in JavaScript, originally a pretty wrapper around the jQuery ajax call, has been made a lot more predominant in larger scale JavaScript applications. How well does the jQuery deferred meet our requirements?

  • Composable – the deferred allows for chaining and error handling by adding callbacks for successful results, failures or both. The pipe function allows for transformations of resulting values, either using a synchronous or asynchronous function. The built in when function allows for hassle free fork/join patterns where a number of asynchronous operations need to complete before another operation can be initiated. Conclusion: Excellent.
  • Minimal – Although the jQuery Deferred is a read/write object that allows for the result of the Future to be set by anyone, jQuery also provides a read only view called Promise that only allows for the result of the future to be observerd. Conclusion: Excellent.
  • Thread safe – While JavaScript is single threaded, there is still potentially a race between a Future being given a value and callbacks being registered. The jQuery Deferred guarantees that the callback will be invoked regardless of whether the result was set before or after the operation. Completing the Future more than once is also safe, only the last status will be used, as is subscribing multiple times. Conclusion: Excellent.
Futures in .NET:

Microsoft, true to form, introduced a completely new name for the Future in .NET 4, called a Task. Tasks can be created from operations running on a background thread using a task factory, or any asynchronous operation by using creating a task from a Task Completion Source. Tasks allow for continuation callbacks to be registered and allow for fine grained control over how (same thread or using a scheduler). In addition, it is possible to block until a result is available, optionally providing a timeout. The .NET task implementation has rich support for cancellation, decoupling the ability to trigger cancellation from both the task itself and the operation being cancelled. As for our requirements:

  • Composable – continuations for all the completion modes allow for sane composability. In addition, a Task provides a WaitHandle that allows for Select style wait operations over multiple tasks. Built in support allows for combining multiple tasks and waiting for one out of many tasks to complete. C# 5 takes Tasks to a completely different level with built in language support for writing asynchronous code in a traditional imperative style and having the compiler do the heavy lifting. Conclusion: Excellent.
  • Minimal – the .NET task does suffer from some compatibility issues, having to implement a WaitHandle and carry asynchronous state to support the older IAsyncResult interface used in the Begin/End pattern. Setting the result of the Task is nicely separated from consuming it, but Task also carries two really ugly additions – Start() and RunSynchronously(). While these make some sense for CPU bound operations meant to execute on a background thread, scheduling should be a separate concern from consuming a Future, and the methods are completely meaningless for asynchronous IO operations. Conclusion: Acceptable.
  • Thread safe – the Task is safe to use in a multithreaded environment, all threads are guaranteed to see the same final state and there is no dangerous race between adding continuations and generating a result. The Task Completion Source even provides TrySet method that both safely updates the state of the related Task but also reports back whether or not a state was accepted, allowing for tasks to be cancelled from a thread different from the one producing a result. Conclusion: Excellent.
Futures in Java
The situation in Java, or rather, the JVM, is somewhat more complicated. In addition to the Java Future interface (from java.util.concurrent), a number of frameworks and languages provide their own version. A quick look at the Java Future will tell us why:
  • Composable – not really, the Java Future only allows for blocking until a result is generated or polling for a result, neither which allows operations to compose gracefully. The Java Future is acceptable for fork/join parallelism where a number of results need to be collected at the end of an operation, but not much more, and definitely not asynchronous IO operations. Conclusion: Poor.
  • Minimal – although too minimal in some respects, the Java Future adds cancellation support. Cancellation of Futures can be done well, and has been in the .NET Task and CancellationToken, but allowing everyone with (what should be) a read only view of a future result to cancel the operation (affecting all other consumers), is a clear violation of the Separation of Concerns principle as well as the Principle of least privelage. Conclusion: Poor.
  • Thread safe – java.util.concurrent.Future is an interface and as such can’t guarantee correct behavior of an implementation. The implementation provided by the framework, FutureTask, is however thread safe when it comes to concurrent updates. Conclusion: Acceptable.
As the Java Future implementation is so obviously flawed, everyone has come up with their own version. Indeed, just looking at the Scala standard library we find scala.actors.Future that, although providing a composable representation of a future is weirdly tied into the scala actor concept. Another Scala incarnation is the horribly crippled unloved scala.parallel.Future. Scala based framework such as Twitter and Akka reinvents this wheel again and rolls their own. The only Future that stands a chance to get any wide spread acceptance in the JVM world is the future scala.concurrent.Future of the upcoming Scala 2.10. Let us take look at what it provides:
  • Composability – the new scala Future allows for blocking as well as continuation passing operations and has a rich set of functions for composing futures, treating Future as a monad. The Scala Traits concept allows for a rich set of higher order operations on futures accessible directly on the type without requiring more than a handful of operations to be explicitly implemented by a custom Future. While the traits will not be as nice to use from Java when implementing custom Futures, a Promise can be used to create one and (eventually) give it a result. Conclusion: Excellent.
  • Minimal – although providing a rich set of methods for operating on Futures, there is a clean separation of concern and the Future is a pure read only view of an eventual result. The new Scala Future has no concept of cancellation however, which means that cancellation will be built in a number of different flavours on top of this construct. Conclusion: Good.
  • Thread safe – when tasks are created from an accompanying Promise, the Future as well as the Promise itself is fully thread safe. The Promise allows for tryset operations that allow for multiple threads to attempt to set a status concurrently with predictable safe semantics. The Future being a Trait (or interface when exposed to Java) means however that the consumer of a Future cannot know with full certainty that the Future will follow sane multi threaded semantics. Conclusion: Good.
Conclusion

Futures, though apparently trivial, come in many different flavours and too often in not very tasty ones. JavaScript, ironically, has one of the purest implementations of a Future in the jQuery Deferred/Promise, and the Future of the JVM, if anything, is the upcoming scala.concurrent.Future.

 

Last time we discussed some of the shortcomings of the current version of Akka. In this post we will see how, using existing open source frameworks, we can build Akka 2.2 functionality using Akka 2.0.

The first step towards a clustered system is cluster membership management. In a master/slave setup rather than a truly clustered system this is an easy task, new slave nodes start up and try to connect to the master node that is responsible for keeping track of work items and sending them out to slave nodes. The cluster, in this setup, is whoever the master thinks is in the cluster. However, in a system with no single point of failure, there is no physical machine that can be guaranteed to be available and therefor no easy way to for the cluster to agree on who’s in and who’s out.

Forming a shared view of which nodes are currently healthy and reachable in a cluster running on an unreliable and unpredictable network infrastructure is a research level problem, and even implementing a known algorithm for it is no easy task. Thankfully, we can borrow from the open source world and base at least the initial discovery phase on an existing library such as JGroups. This library can be used as a cluster node membership framework, solving the problem of creating a shared view of the cluster across nodes. By having each node in our Akka cluster start up and announce the network address and port where it can be reached via Akka, we know how to generate an address for any actor with a well known local path within the remote actor system and can use Akka for the rest of the communication. Akka clustering support step 1, done.

Knowing a list of available servers in a cluster is not enough, however. In any form of meaningful clustered system, we also need to coordinate work tasks across the currently running instances. By assuming that all servers in the cluster are capable of performing any service (though they may not all be doing the same thing at any given point in time), we reduce this problem to allocating n partitions in our problem space across m nodes in the cluster. The number n must be chosen depending on the specific problem, but for a system running a large number of independent but identical operations, such as serving web clients, n should be a number large enough to be distributed evenly across the m servers that are likely to be operating in the cluster.

As JGroups provides a list of servers known to be sorted the same way across the members in the cluster, the allocation of partitions to cluster nodes can be done independently in each node. As long as a deterministic algorithm is used (that may include aspects such as giving certain nodes a larger part of the partition space or a preference for a specific set of partitions) each node will come up with the same partition allocation map at “almost” the same time. When a node reaches the conclusion that it should be responsible for a partition, it fires up an Akka actor subsystem that can be reached from the rest of the cluster. Similarly, if a node find itself currently hosting a partition but reaching the conclusion that it shouldn’t, it shuts down the sub system representing that partition.

We now have a set of partitions running on a set of clusters, but how do they communicate between each other – what about location transparency? We archive this by resolving an actor reference using a combination of a well known path, representing the path of an actor relative to its partition, and a routing key. The routing key may be a string or a numeric value, regardless it will be reduced down to an integer and mapped to a partition out of the n available ones. The routing key may either be static, representing a singleton service that needs to live somewhere in the cluster, or can vary per request in order to for example route a request from an external user to the partition currently handling that particular user.

When an actor ref for a particular well known path and partition is resolved, a local actor is created responsible for sending messages on to wherever the partition is currently executing. The actor will be subscribed to changes broadcast by the cluster membership layer and immediately switch to sending to a new physical network path when a change is detected. In order to improve reliability (though delivery is still not guaranteed), a layer on top of the JGroups cluster membership view can be added that waits until a remote system is guaranteed to be running and reachable before messages are forwarded to it after a reallocation of partitions.

Although likely not as feature complete as the Akka clustering solution will one day be, the above has been designed in detail, implemented and tested in about a week, and does support dynamic cluster membership, true location transparency, work distribution and best effort delivery to remote systems with failover. In addition, restarting of stateful actors on new physical machines has been demonstrated by using write-through distributed caches for state changes. Though still excited about what Akka will be like in a years time, all of this can be achieved today with limited effort!

 

This blog has already mentioned message passing framework such as Erlang and Akka a number of times and hinted at the benefits of using them. Using message passing frameworks is a paradigm that puts Actors and Messages at the center of the stage rather than, more traditionally, Objects and Messages. As quoted by the Akka web site, the main advantages compared to a traditional OOP style software architecture are:

  • Concurrency – Multiple actors can execute concurrently on the same machine without having to resort to traditional multithreading constructs such as locking and condition variables, mutable state is local to a (single threaded) actor.
  • Distributed Computing – By making everything asynchronous and based on immutable messages, there is no impedance mismatch when messages are sent to remote actors. Unlike OOP style remoting, the asynchronous and unreliable nature of network communication is already present in the model.
  • Error Handling – By treating exceptions as expected in a large scale long running system, and allowing for select subsystems of an application to be restarted, Erlang and Akka makes it possible to build self healing software systems.

While Akka is an impressive framework that does indeed deliver a lot of the above compared to traditional OOP based approaches to large systems programming, it is far from done yet, and in some aspects far from as done as the documentation suggests. For example, while Akka does have mature support for remoting, the documentation also claims to support location transparency. As noted in the Wikipedia page linked to, location transparency allows for resources to be used independent of both the user’s location and the resource location.

Location transparency is a very valuable quality in a distributed, fault tolerant system. For example, location transparency can allow an application to send a message to “an email sending service” or “the server currently responsible for user X”, regardless of which physical machine that currently happen to run that specific service. Akka, in version 2.0.2 being the current version at the time of writing, does not support location transparency – finding any resource is done using explicit network paths provided in either code or configuration.

True location transparency requires yet another feature currently missing from Akka – cluster membership support. In a system with a large number of nodes, or a system were uninterrupted reliable service is a priority, it must be possible to stop and start physical servers without having to restart and reconfigure the other members of the cluster. The Akka team is obviously aware of the importance of cluster membership support and are busy hacking away at it for the upcoming version, but for now we will have to live with Akka being a distributed rather than clustered system.

Next time, how to build Akka 2.2 in a week based on Akka 2.0.

 

A virtual function is a very important concept in Object Oriented Programming (OOP) language, and forms the main method for polymorphism. In short, virtual functions are the glue that allows for the behaviour of a method to change depending on the runtime type of a specific object instance, not just the static type known or declared at compile time.

As virtual functions require an additional level of indirection to be invoked, languages that focus on performance without compromises such as C++ default to non-virtual method functions and require you to opt in to using virtual functions using the C++ keyword virtual. At the time Java was released, the primary focus was reliability and ease of use over raw performance and as such virtual was the default, requiring methods to be explicitly marked as non-overridable using the keyword final. The reasoning behind this decision was, presumably, that since virtual functions are more powerful than non-virtual ones, they must also be “better” and should be used everywhere.

C# on the other hand decided to follow the C++ line rather than the Java one and make non-virtual the default. Not only that, Microsoft also came up with a whole family of keywords (new, abstract, virtual, override, and sealed) to do the job of the one keyword virtual in C++. It turns out that the language gurus at Microsoft did not do this just to be different from C++ and Java. In fact, the new keywords all address specific problems with the previous approaches.

OOP constructs is sometimes used by frameworks to allow for customisation and extensibility. The most complex such example in the .NET framework is the type hierarchies used by Windows Forms and Windows Presentation Foundation. Although there are many advantages to using in OOP this way, it also poses great challenges when it comes to versioning.

As an extreme example, if a class in such a framework introduces a new abstract function, then all existing sub classes will be incompatible with the new version of the framework without modification. A more likely scenario is an addition of a new function with the same name as an existing, accepting different parameter types. If such a function is called with parameter types that do not match the function arguments exactly, such as passing an int to a method accepting long, adding a new overload may make the method call ambiguous and break backwards compatibility. You would think, however, that adding a new function with a name never previously used in the framework cannot break compatibility. As we will see, that may not always be the case.

To illustrate the problem, we will use a Java snippet with some extremely poorly chosen method names. We will use a simplified version of a Stream class, supporting reading and writing from and to a resource such as a file or network connection. We have a base class Stream provided in a framework, and a special subclass TopSecretStream for reading and writing from a top secret storage mechanism.

class Stream
{
    public abstract String read();
    public abstract void write(String str);

    // Other useful functions implemented using the above abstract ones ...
}

class TopSecretStream extends Stream
{
    public String read() {
        // Reads from top secret storage.
    }

    public void write(final String str) {
        // Writes to top secret storage.
    }

    public void foo()
    {
        // Deletes the files and triggers the explosives.
    }
}

The TopSecretStream acts as a Stream, reading and writing text, but also has the very badly named method foo with some nasty side effects.

Now, what happens if in a future version of the framework someone decides to add an extra method to Stream:

class Stream
{
    public abstract String read();
    public abstract void write(String str);
    public void foo(); // Flush pending changes, called every 2 seconds

    // Other useful functions implemented using the above abstract ones ...
}

Disaster! Just by running the application against a new version of the framework, we have now accidentally deleted all the files and triggered the explosives! Although TopSecretStream never meant for Stream to call foo(), the unfortunate naming caused Java to confuse the two methods, interpreting the version in TopSecretStream as meant to override the first. Surely, this is not how it is meant to work?

C# is different, and allows for much more control over how methods are selected. Given classes Foo, Bar and Baz where Foo is the base class, Bar subclasses Foo and Baz subclasses Bar – each method declaration in Bar needs to specify both whether the method should be called when an instance of Bar is statically typed as Foo, and whether it should be possible for a method in Baz to be called instead of the one declared in Bar when an instance of Baz is statically typed as Bar. Let’s take a look at a few combinations of the C# keywords and what they do:

  • no modifiers – The method cannot be called when an instance of Bar is typed as Foo, Baz cannot change which method that gets called when an instance of Baz is typed as Bar.
  • new – As above, explicitly stated. In addition, the compiler will not complain of Foo already has a method of the same name
  • virtual – The method cannot be called when an instance of Bar is typed as Foo, but Baz can change which method that gets called when an instance of Baz is typed as Bar.
  • abstract – As virtual, but Baz, or a sub class of Baz, must provide an overriding implementation of the function.
  • override – Foo (or a base class of Foo) must already declare a method with the same name, marked as abstract, virtual or override but not sealed. This method will get called when an instance of Bar is typed as Foo. Baz can change which method that gets called when an instance of Baz is typed as Bar.
  • override sealed – As override, but Baz cannot change which method that gets called when an instance of Baz is typed as Bar.

In addition, the new keyword can be prepended to virtual and abstract to explicitly state that a declaration in a base class should be ignored, and silence any compiler warnings complaining about a name clash.

Armed with our new found understanding of the above keywords, what would our example above have looked like in C#?

class Stream
{
    public abstract String Read();
    public abstract void Write(String str);

    // Other useful functions implemented using the above abstract ones ...
}

class TopSecretStream extends Stream
{
    public override String Read(
    {
        // Reads from top secret storage.
    }

    public override void Write(String str)
    {
        // Writes to top secret storage.
    }

    public void Foo()
    {
        // Deletes the files and triggers the explosives.
    }
}

… and after adding the method to Stream:

class Stream
{
    public abstract String Read();
    public abstract void Write(String str);
    public virtual void Foo(); // Flush pending changes, called every 2 seconds

    // Other useful functions implemented using the above abstract ones ...
}

As Foo in TopSecretStream not marked with override, .NET knows not to confuse it with Foo in Stream. If we recompile our application against a new version of the framework, the compiler will warn about a name clash and we can chose to either accept it by adding the new keyword to Foo in TopSecretStream or, even better, renaming it to something more sensible!

 

In part 1 we discussed two reasons for why remote method calls have never been successfully hidden under the covers of normal OOP style programming: Remote method calls may be orders of magnitudes slower than local ones. They may also fail catastrophically without the calling system doing so. Alternatives such as Erlang, Akka and the MailboxProcessor in F# have popped up that make the potentially remote and asynchronous nature of these types of operations explicit using message passing. However, are such frameworks at all relevant for systems running on a single computer?

A “single computer” used to refer to a system with one execution unit (processor). Even though multi processor systems have been used for a long time in heavy duty servers, they have belonged to a specialist domain and usually used to serve a large number of users, rather than running a single application quickly. When the PC world went multi core, all that changed and programmers everywhere are forced to care. It is rare to find a computer with a single logical processor, high end Intel chips such as the i7 processor includes 6 cores and offers 12 logical execution units using hyperthreading. The term “single computer” will therefore have to be redefined. It can be seen as a system of processing units all connected to a shared, “fast” temporary storage (RAM memory). However, as we shall see, fast and shared may be competing goals.

Let us examine what the real cost of shared memory is. To avoid being fooled by optimisers, garbage collectors and other mechanisms that are usually our friends in high level languages, we will have to turn to C++ and even (*gasp*) assembler. To measure the cost of using shared memory, a very basic computation will be used: incrementing a counter 1,000,000,000 times, either using a single processor or using multiple cores concurrently. First, we need a simple profiling function:

void runWorkers(
    void (*worker)(uint32_t*, uint32_t),
    int concurrency,
    uint32_t iterations)
{
    uint32_t counter = 0;
    boost::timer::auto_cpu_timer t;
    vector threads(concurrency);
    for (int i = 0; i < concurrency; i++) {
        threads[i] = new boost::thread(worker, &counter, iterations);
    }
    for (int i = 0; i < concurrency; i++) {
        threads[i]->join();
        delete threads[i];
    }
    cout << "All done! Count is now " << counter << endl;;
}

(Don’t you just love C++ function pointer type declarations?)

What the above achieves is to take some function to execute across multiple threads. When started, the function will be passed the number of iterations to execute and a pointer to a location in memory to work on. Finally, it waits for each thread to complete and, using boost::auto_cpu_timer, reports timings for the whole sequence. Next, let’s throw some work at it.

void incrementShared(uint32_t *counter, uint32_t iterations)
{
    __asm {
        mov ecx, iterations // Count the iterations
        mov edx, counter    // Remember pointer to shared location
$loop:  inc DWORD PTR [edx] // In each iteration, increment variable in memory
        loop $loop          // Repeat until ECX is 0
    }
}

As promised, there’s assembler, in this case Intel style x86 assembler embedded into Visual C++. Ignoring the setup instruction, we end up with a tight loop that runs the inc (increment) operation the specified number of times, incrementing our variable in memory in each iteration. Let us take it for a spin, starting out in the traditional way with a single execution unit:

    cout << "Single worker, shared increment, 1000000000 iterations ..." << endl;
    runWorkers(incrementShared, 1, 1000000000);

which in the test environment used results in:

Single worker, shared increment, 1000000000 iterations ...
All done! Count is now 1000000000
 2.700709s wall, 2.558416s user + 0.000000s system = 2.558416s CPU (94.7%)

2.7 seconds to add one a billion times may be impressive in its own right, but with all those cores available to us, we should be able to do better. The test environment happens to have four cores, making four threads the ideal number for parallelising CPU bound work loads. We will split the one billion iterations into equal quarters:

    cout << "4 workers, shared increment, 250000000 iterations each ..." << endl;
    runWorkers(incrementShared, 4, 250000000);

which outputs:

4 workers, shared increment, 250000000 iterations each ...
All done! Count is now 401385749
 1.318384s wall, 3.541223s user + 0.000000s system = 3.541223s CPU (268.6%)

You may (or may not) be surprised by the results. We did get a speedup, but only roughly a doubling of the throughput, not quite the theoretical four times that we were hoping for. Worse, we appear to have lost our ability to count! That harmless “inc”, even though it is a single assembler instruction, does not execute atomically with regards to updating memory. Instead, the inner workings of the processor has pipelined it into a series of operations – reading from memory, incrementing the value and writing it back into memory. Store buffers, speculative execution, caching and the phase of the moon work together to ensure that each core “sometimes” increment the value in memory correctly but often overwrite it with a value that is already stale, cancelling the updates made by other cores.

To solve this problem, we need some way of protecting us from changes to our variable in memory while we increment and update it. For larger chunks of code, this may be done using monitors, mutexes, critical sections or similar threading primitives. The cheapest way in our case, however is to go straight to the processor, asking it to do it for us using the lock prefix. This is the underlying primitive of Interlocked.Increment in .NET, AtomicInteger in Java and std::atomic in C++11. Let’s try it on our problem:

void incrementSharedLocked(uint32_t *counter, uint32_t iterations)
{
    __asm {
        mov ecx, iterations      // Count the iterations
        mov edx, counter         // Remember pointer to shared location
$loop:  lock inc DWORD PTR [edx] // Increment, atomically this time
        loop $loop               // Repeat until ECX is 0
    }
}

Note that the only difference between this and the previous version is the lock prefix, simply telling the processor that we care about concurrent updates to the shared memory address. The output when running this, in the same style as above:

4 workers, locked increment, 250000000 iterations
All done! Count is now 1000000000
 77.033589s wall, 282.50221s user + 0.03120s system = 282.53341s CPU (366.8%)

We can count again, which is good, but look at the timings! What previously took a second now takes 77! Obviously that innocent looking lock prefix wasn’t that harmless after all if it could slow us down by a factor of close to a hundred, two orders of magnitude. Let’s try running the exact same thing but on a single thread a billion times instead of 4 threads with a quarter of a billion each:

Single worker, locked increment, 1000000000 iterations
All done! Count is now 1000000000
 10.354197s wall, 10.046464s user + 0.000000s system = 10.046464s CPU (97.0%)

Yes, the lock prefix does appear to cost us a bit, but now we’re seeing a slowdown of a factor four, not a hundred. Something else is going on here. Next time: What exactly is going on, why should you care about cache coherency and finally – how does this all relate to message passing?

 

Ask a developer to explain how programming works and they will probably show you examples of print statements and maths. Next, maybe conditional statements and loops. After the immediate basics people tend to venture into object oriented programming (OOP). No one can deny that OOP has been a successful idea in software development, and hundreds of thousands of successful software projects are around to prove it. However, even though many find it hard to think about software development without OOP, there are alternatives.

The basic idea of OOP is that logic and data should be bundled together in a unit called an object. This has the useful property that the exact structure of the data can be hidden behind an interface and that changes to this structure will only require changes to the logic in the same unit; allowing for safe refactorings and optimisations. The sometimes less fortunate side effect is that mutable data, state, can be passed around freely between parts of the system, and potentially used by multiple execution paths (threads and processes) concurrently.

The assumption that state can be accessed efficiently and reliably across a software system is a fallacy born out of the decades of single processor computing. At least for PCs, there was one processor and one type of  random-access memory (RAM). Even though operating systems were able to give a convincing experience of multitasking, there was no true concurrency. This assumption no longer holds. Computers are more connected than ever, it is rare to have a game or application that does not perform any type of network access and many depend on a connection to the internet to function at all. It is also rare to find a processor architecture with a single processing unit, even mobile phones are now multi core!

To deal with distributed systems, i.e. systems where the individual parts do not have direct access to shared data storage, it is inevitable to use a message passing, directly, or indirectly. Various attempts have been made to hide the fact that messages are being exchanged and allow the programmer to think in terms of objects, CORBA, DCOM, .NET Remoting, and Java RMI being a few examples. Though these framework provide a somewhat more familiar software development experience, and to some degree allow flexibility in how components in a software system are deployed physically, they have inherent problems.

By hiding the fact that an operation is (or may be) performed remotely and through massage passing under the hood, the programmer may write code that, though efficient if running as a local application, can run orders of magnitude slower when parts of the system are executing remotely. A harmless looking operation such as accessing a counter that normally would complete in microseconds if performed as a local method call may take milliseconds if running on another computer on a local network or even seconds if accessed over the Internet over a congested connection.

Another problem, closely related, is the fact that a remote operation may not complete at all! A system running on a single computer and accessing only local resources will usually fail or work as a whole. If the RAM memory or a execution unit fails, you don’t normally expect a software system to continue gracefully. If your network connection is temporarily disconnected or a remote machine stops functioning however, you would expect something more than the entire system crashing or freezing. Hiding the fact that some method calls may fail completely again makes it likely that the programmer will not handle such catastrophic errors.

To address these issues, some software systems opt to embrace the message passing paradigm rather than hiding it under a thin object oriented layer. Erlang is an example where message passing as well as partial catastrophic failure are first class concepts. Originally design and used for building telecom switches, it has proven itself as a platform that allows for massive scalability and reliability, even claiming somewhat ridiculous records of “nine nines” (99.9999999% availability). Other systems have been inspired by the same ideas, such as the MailboxProcessor shipped with F# and the Akka message passing framework running on the JVM.

In the next post, finally, we will see how, when a multi threaded application is running on a modern multi core processor, you are already using message passing under the hood and how it has the same type of implications for software architecture as it has for distributed systems.

Update May 16 You can now continue reading in part 2.

 

In 1995, Netscape announced a nifty little scripting language called JavaScript. Why JavaScript? Because Java was hip, cool and trendy (believe it or not). A language with no real connection to Java other than having a vaguely C-like syntax (you got to have those curly brackets), borrowed some of Java’s fame and glory by using its name.

JavaScript quickly found its place on the web and support was added in more and more web browsers. So far, it was used only for the lesser task of “scripting” however, not “real programming”. Validating an email entered in a form looked vaguely like an email address or that two passwords matched was the normal level of use in a web page, and if you wanted to be really fancy you could build something like a web chat! But no one would dream of doing anything so crazy as building an enterprise level office suite using it.

If you wanted to build an application on the web, rather than a web page, you could use Java Applets, though no more than a handful ever bothered. Flash appeared on the scene and became popular for animations and games, later on for streaming video. Microsoft launched Silverlight and offered a slightly more business focused touch to web applications. While all this was good, and used by many, Internet kept moving.

Google launched Gmail and were so unhappy with the performance of it that they had to build their own web browser, Google Chrome, beating all other browsers at the time in terms of JavaScript performance. Web 1.0 turned into web 2.0, and even if no one could agree on exactly what it is, it sure appeared to use a lot of JavaScript. People started talking about HTML5, the semantic web, software as a service. Microsoft confused themselves and everyone around them by saying that their vision for the web was HTML and JavaScript (but rushed to say that Silverlight is nice too when people started asking questions). Adobe said “no you take it” and washed their hands of Flex. It was clear that JavaScript was the future of the web.

Then something funny happened.

Not only was it enough that JavaScript now owned the web, it had to start digging into the realm of Applications. Not Web Applications, not “Rich Internet Applications”, but Applications. Google added its own framework for offline applications in Chrome, software projects like PhoneGap came on the scene and promised “native” apps for mobile devices built on HTML and JavaScript, without sacrificing the advantages of touch, tight platform integration and the app monetization system called the app store. Microsoft took it even one step further and announced that JavaScript would be a first class citizen in the new Windows 8 Metro world, with as much access to the Windows Runtime as .NET or C++. And Node.js took JavaScript to the server!

It is nice, though somewhat ironic, to see that JavaScript, the bastard half brother of the Java once promising “write once, run everywhere” is now much closer to fulfilling that promise than its big brother ever was. But looking at JavaScript as a language and the historical accidents leading up to its success one cannot help but wondering “could we not have done better?”.

 

Events, callbacks, observers, whatever you want to call them – the need to dynamically inject a block of executable code into another is a problem that arises in most software systems. Different programming languages and frameworks have addressed this differently. Java, for example, lacking anything like a function pointer does everything through callback interfaces. .NET on the other hand based it around the idea of a Delegate, a type safe function pointer that can be attached to an Event.

An Event looks very much like a property of a delegate type, and omitting the event keyword will actually work just fine. The only thing event does (when not using custom add and remove overrides) is to prevent anyone outside of the class to set the delegate directly. You are free to add and remove event handlers, but you can’t, say, remove all event handlers in one operation.

interface IMailbox
{
  event EventHandler MessageReceived;
}

class MailboxNotifier
{
  public MailboxNotifier(IMailbox mailbox)
  {
    mailbox.MessageReceived += HandleMessageReceived;
  }

  private void HandleMessageReceived(object sender, EventArgs e)
  {
    MessageBox.Show("You've got mail");
  }
}

Easy! Maybe a bit too easy. The number one cause of memory leaks in the .NET world is event handlers that have not been unregistered. For this specific case, we also most likely want a way of stopping this thing popping up annoying message boxes every time you get an email. To fix this, we unregister from the event, in the same style as when we added our handler.

class MailboxNotifier : IDisposable
{
  private readonly IMailbox mailbox;

  public MailboxNotifier(IMailbox mailbox)
  {
    this.mailbox = mailbox;
    mailbox.MessageReceived += HandleMessageReceived;
  }

  public void Dispose()
  {
    mailbox.MessageReceived -= HandleMessageReceived;
  }

  private void HandleMessageReceived(object sender, EventArgs e)
  {
    MessageBox.Show("You've got mail");
  }
}

Ignoring the fact that I cheated and didn’t properly implement the Dispose Pattern, this looks just fine doesn’t it? Well, it works, but we have been forced to keep track of the instance of the mailbox. This means that 1) even if nothing is using the mailbox, it will still not be garbage collected as long as we keep this reference and 2) some rather mundane typing just for the sake of being good citizens and cleaning up after ourselves.

It gets even worse though. .NET 2 (yes, that was ages ago) brought anonymous functions to the language, C# 3 further refined the syntax. Using the C# 3 lambda syntax, the example above can be rewritten as

class MailboxNotifier
{
  public MailboxNotifier(IMailbox mailbox)
  {
    mailbox.MessageReceived += (sender, e) => {
      MessageBox.Show("You've got mail");
    };
  }
}

Awesome, less is more and we are happy. But wait, if this is the starting point, how do we clean up after ourselves?

class MailboxNotifier : IDisposable
{
  private readonly IMailbox mailbox;

  public MailboxNotifier(IMailbox mailbox)
  {
    this.mailbox = mailbox;
    mailbox.MessageReceived += (sender, e) => {
      MessageBox.Show("You've got mail");
    };
  }

  public void Dispose()
  {
    mailbox.MessageReceived -= ???
  }
}

As the event handler is no longer named, we can’t unregister from our event! The only way to do this properly is to keep a reference to the event handler, effectively naming it and losing the benefit of the shorter syntax:

class MailboxNotifier : IDisposable
{
  private readonly IMailbox mailbox;
  private readonly EventHandler messageReceivedHandler;

  public MailboxNotifier(IMailbox mailbox)
  {
    this.mailbox = mailbox;
    this.messageReceivedHandler = (sender, e) => {
      MessageBox.Show("You've got mail");
    };
    mailbox.MessageReceived += messageReceivedHandler;
  }

  public void Dispose()
  {
    mailbox.MessageReceived -= messageReceivedHandler;
  }
}

We just want to do one basic thing but end up having to carry around an instance of the mailbox as well as an instance of our message handler, only used for cleaning up!

Luckily, some bright guys over at Microsoft thought about events and asynchronous operations and realized that IEnumerable, being a sequence of items in “space”, was very much like what they called IObservable, being a sequence of items in time. This opened up a whole bag of interesting operations and was released under the name Reactive Extensions. Among the many benefits of using IObservable over event is the fact that events/observables are now “things” that can be passed around and transformed. For this post, the important part is however that they offered a new solution to handling subscriptions.

IObservable has a method, Subscribe(IObserver observer). This is very much like the += syntax used to add an event handler in the traditional .NET model. However, there is no Unsubscribe(IObserver observer). Instead, Subscribe() returns another “thing” representing the subscription. The specific type chosen is IDisposable, which makes a good choice, but it might as well have been say just an instance of Action. The key here is that to unsubscribe, you need one thing, something representing the subscription. You do not need to keep track of either what you subscribe to, or what is handling events from the subscription. The final version of our example then, using IObservable instead of .NET events:

interface IMailbox
{
  public IObservable<Message> MessageReceived;
}

class MailboxNotifier : IDisposable
{
  private readonly IDisposable mailboxSubscription;

  public MailboxNotifier(IMailbox mailbox)
  {
    this.mailboxSubscription = mailbox.Subscribe(_ => {
      MessageBox.Show("You've got mail");
    });
  }

  public void Dispose()
  {
    this.mailboxSubscription.Dispose();
  }
}

We are happy.

© 2012 SoftMemes Suffusion theme by Sayontan Sinha