
    It wasn’t a race, but . . .

    May 6th, 2009

    This DICE study I have been working on is as much about scalability – what happens to throughput when you add an additional unit of compute power – as about raw performance.  Nevertheless, testing eight different solutions to the same problem did provide some insight into overall performance characteristics.  I’ll tell you how the eight approaches stacked up in terms of speed, but first let me summarize the problem and the eight solutions.

    The problem was to update objects (“counters”) in a shared data area.  Each update was represented by an object (an “updater”) containing a reference to the counter to be updated.  The updater objects also had to be written to the shared data area.  High ratios of updaters to counters promoted contention for access to the counters, creating classic hot spots.
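
    To make the shape of the problem concrete, here is a minimal, library-neutral sketch in Python.  The class and function names (Counter, Updater, SharedArea, build_workload) are my own illustrations, not the code used in the study, and the updater refers to its counter by an id rather than by an object reference.

```python
from dataclasses import dataclass, field


@dataclass
class Counter:
    counter_id: int
    value: int = 0


@dataclass
class Updater:
    counter_id: int  # assumed: an id stands in for the reference to the counter
    delta: int = 1


@dataclass
class SharedArea:
    counters: dict = field(default_factory=dict)
    updaters: list = field(default_factory=list)

    def apply(self, updater):
        # Update the referenced counter, then store the updater itself,
        # since the updaters also have to land in the shared data area.
        self.counters[updater.counter_id].value += updater.delta
        self.updaters.append(updater)


def build_workload(num_counters, num_updaters):
    # Spread updaters round-robin over the counters; a high ratio of
    # updaters to counters means many updaters hit the same counter.
    return [Updater(counter_id=i % num_counters) for i in range(num_updaters)]
```

    With two counters and ten updaters, each counter is the target of five updates, which is the kind of ratio that produces a hot spot once several processes apply updaters at the same time.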

    Three solutions used Terracotta.  The first had several instances of a simple program that processed a list of updaters.  The instances tussled for access to the counters.  The second solution was like the first except that each instance of the program had a list of updaters that referred to a distinct subset of the counters.  With this approach there was no competition for the counters.  The third Terracotta solution had each instance of an updating program fed by a private queue.  Each queue contained updaters that referred to a distinct set of counters so, again, there was no competition for the counters.
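
    The idea behind the second and third solutions, giving each instance its own subset of counters, can be sketched as follows.  This is only an illustration under my own assumptions (updaters as (counter_id, delta) pairs, integer counter ids, modulo routing); it does not use the Terracotta API and is not the partitioning scheme from the study.

```python
from collections import defaultdict


def partition_updaters(updaters, num_instances):
    """Group (counter_id, delta) pairs so each counter belongs to one instance."""
    partitions = defaultdict(list)
    for counter_id, delta in updaters:
        # Every updater for a given counter is routed to the same instance.
        partitions[counter_id % num_instances].append((counter_id, delta))
    return dict(partitions)


def counters_touched(partition):
    """Return the set of counter ids that one instance will update."""
    return {counter_id for counter_id, _ in partition}
```

    Because no counter id appears in more than one partition, the instances never need the same counter at the same time.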

    Among these three approaches, the second proved to be the fastest overall, achieving over 5,000 updates per second when run with two or four instances of the updating program.  The first (and simplest) approach was the second fastest.  Its best performance came with only a single updating program running, when it achieved over 3,400 updates per second.  Overall throughput declined precipitously as additional instances were added.

    The third approach was the slowest, but got faster consistently as the number of instances was increased from one to two to four to eight (about the limit of my test environment).  With one instance of the updating program running, throughput was 576 updates per second.  At eight instances, throughput was 1,282 updates per second.
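
    Since the study is about scalability, it is worth turning those two figures into a speedup and a per-instance efficiency.  The helper names below are my own; the inputs are the 576 and 1,282 updates per second reported above.

```python
def speedup(base_throughput, scaled_throughput):
    """How many times faster the scaled configuration is than the baseline."""
    return scaled_throughput / base_throughput


def scaling_efficiency(base_throughput, scaled_throughput, instances):
    """Speedup divided by the number of instances; 1.0 would be linear scaling."""
    return speedup(base_throughput, scaled_throughput) / instances


third_speedup = speedup(576, 1282)                   # roughly 2.2
third_efficiency = scaling_efficiency(576, 1282, 8)  # roughly 0.28
```

    In other words, eight instances delivered a little over twice the throughput of one: steady improvement, but far from linear.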

    The five GigaSpaces solutions worked as follows:

    1. A simple non-PU client that connects to a partitioned space and executes reads  and writes against the space.

    2. Clients send updater objects to a partitioned space to be processed.  Updates are performed by PUs against local space instances (space-based architecture); the PUs use polling containers to detect the arrival of new updater objects.  Clients use writeMultiple() to improve throughput.

    3.  Just like no. 2, but using FIFO features to preserve ordering of updates per counter.

    4. Clients invoke remote methods advertised by PUs to update counters.  Updaters are passed as arguments.  PUs do work against local space instances.

    5. Clients send update requests to spaces as Task objects.  Spaces execute the tasks.
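
    The pattern shared by solutions two and three, a batched write on the client side and a poller that applies updates next to the data, can be sketched without any GigaSpaces code.  Everything here is a stand-in of my own: a Python queue.Queue plays the role of a space partition, write_multiple is a hypothetical helper (not the GigaSpaces writeMultiple() call), and poll_once imitates a single pass of a polling loop.  The sketch shows the control flow only; it does not reproduce the reduction in network round trips that batching provides in a real deployment.

```python
import queue


def write_multiple(space, updaters):
    """Hypothetical batched write: hand over many updaters in one call."""
    for updater in updaters:
        space.put(updater)


def poll_once(space, counters, max_batch=100):
    """Take up to max_batch updaters in arrival (FIFO) order and apply them."""
    processed = 0
    while processed < max_batch:
        try:
            counter_id, delta = space.get_nowait()
        except queue.Empty:
            break  # nothing left to process on this pass
        counters[counter_id] = counters.get(counter_id, 0) + delta
        processed += 1
    return processed
```

    A client would call write_multiple once per batch, while the process that owns the counters calls poll_once repeatedly.  Because queue.Queue is first-in, first-out, updates to any one counter are applied in the order they were written, which is the property solution three is after.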

    Among these five approaches, numbers two and three, both of which use writeMultiple() and polling containers, were the fastest by substantial margins.  Number two delivered over 30,000 updates per second with two clients and four updaters.  Number three came close to 17,000 updates per second with one client and two updaters.

    Next fastest was number five at about 3,800 updates per second with one client and two updaters.  Number four peaked at around 2,400 updates per second with two clients and two updaters.  Slowest of the five was number one, which reached between 1,000 and 1,100 updates per second in a variety of configurations.

    Analyzing these results and explaining the differences in performance are topics too large for this post.  A few things are clear, however, from even the simple set of results presented above:

    1. The concept of locality – which decomposes into the related concepts of proximity and exclusivity – is profoundly important in designing solutions to this class of problem.

    2. A very wide range of results is possible depending on the solution architecture.

    3. For raw speed, GigaSpaces’ polling container construct offers a significant advantage over any of the other choices examined here.