    It wasn’t a race, but . . .

    May 6th, 2009

    This DICE study I have been working on is as much about scalability – what happens to throughput when you add an additional unit of compute power – as about raw performance.  Nevertheless, testing eight different solutions to the same problem did provide some insight into overall performance characteristics.  I’ll tell you how the eight approaches stacked up in terms of speed, but first let me summarize the problem and the eight solutions.

    The problem was to update objects (“counters”) in a shared data area. Each update was represented by an object (an “updater”) containing a reference to the counter to be updated. The updater objects also had to be written to the shared data area. High ratios of updaters to counters promoted contention for access to the counters, creating classic hot spots.
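    To make the setup concrete, here is a minimal plain-Java sketch of the data model. The class and method names (Counter, Updater, CounterModel, makeUpdaters) are hypothetical rather than taken from the DICE code, and ordinary Java objects stand in for the shared data area.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data model for the counter/updater problem (names are illustrative).
class Counter {
    final int id;
    long value;

    Counter(int id) {
        this.id = id;
    }

    // Synchronized because several updating programs may reach the same
    // counter at once; that contention is what creates a hot spot.
    synchronized void add(long delta) {
        value += delta;
    }
}

class Updater {
    final Counter target; // reference to the counter this updater modifies
    final long delta;

    Updater(Counter target, long delta) {
        this.target = target;
        this.delta = delta;
    }

    void apply() {
        target.add(delta);
    }
}

class CounterModel {
    static List<Counter> makeCounters(int n) {
        List<Counter> counters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            counters.add(new Counter(i));
        }
        return counters;
    }

    // Spreads updaterCount updaters round-robin over the counters; a high
    // updaterCount / counters.size() ratio means many updaters share each counter.
    static List<Updater> makeUpdaters(List<Counter> counters, int updaterCount) {
        List<Updater> updaters = new ArrayList<>();
        for (int i = 0; i < updaterCount; i++) {
            updaters.add(new Updater(counters.get(i % counters.size()), 1));
        }
        return updaters;
    }
}
```

    With 100 updaters spread over 4 counters, each counter receives 25 updates. Raising that ratio is what turns individual counters into hot spots once several programs apply updaters concurrently.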

    Three solutions used Terracotta. The first had several instances of a simple program that processed a list of updaters. The instances tussled for access to the counters. The second solution was like the first except that each instance of the program had a list of updaters that referred to a distinct subset of the counters. With this approach there was no competition for the counters. The third Terracotta solution had each instance of an updating program fed by a private queue. Each queue contained updaters that referred to a distinct set of counters so, again, there was no competition for the counters.
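    The difference between the three Terracotta approaches comes down to which program is allowed to touch which counter. The sketch below is a single-JVM analogue of the third approach, not Terracotta code: each worker thread drains its own queue, and updaters are routed by counter id so that no two workers ever share a counter. The routing rule (counter id modulo worker count) and the class name QueuePerWorker are assumptions for illustration. Handing each worker a prebuilt list instead of a queue would correspond to the second approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Single-JVM analogue of the third Terracotta approach (not Terracotta code):
// each worker drains a private queue, and every updater for a given counter
// is routed to the same queue, so no two workers ever touch the same counter.
class QueuePerWorker {
    static final int POISON = -1; // sentinel that tells a worker to stop

    // Applies one increment per entry in updaterCounterIds and returns the counters.
    static long[] run(int[] updaterCounterIds, int counterCount, int workers) {
        long[] counters = new long[counterCount];
        List<BlockingQueue<Integer>> queues = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            BlockingQueue<Integer> q = new LinkedBlockingQueue<>();
            queues.add(q);
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        int id = q.take();
                        if (id == POISON) {
                            return;
                        }
                        // No lock needed: only this worker ever receives counter `id`.
                        counters[id]++;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }

        // Hypothetical routing rule: counter id modulo worker count.
        for (int id : updaterCounterIds) {
            queues.get(id % workers).add(id);
        }
        for (BlockingQueue<Integer> q : queues) {
            q.add(POISON);
        }
        try {
            for (Thread t : threads) {
                t.join(); // join() also makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counters;
    }
}
```

    Because each counter has exactly one owner, the increment needs no lock. In the first approach, by contrast, every instance could reach every counter, so every update had to be synchronized.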

    Among these three approaches, the second proved to be the fastest overall, achieving over 5,000 updates per second when run with two or four instances of the updating program. The first (and simplest) approach was the second fastest. Its best performance came with only a single updating program running, when it achieved over 3,400 updates per second. Overall throughput declined precipitously as additional instances were added.

    The third approach was the slowest, but got faster consistently as the number of instances was increased from one to two to four to eight (about the limit of my test environment). With one instance of the updating program running, throughput was 576 updates per second. With eight instances, throughput was 1,282 updates per second.
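    One way to read the third approach's numbers is as speedup and parallel efficiency relative to a single instance. The helper below just does that arithmetic on the two figures reported above; the class and method names are my own.

```java
// Scaling arithmetic for the third Terracotta approach, using the reported
// throughput figures (576 updates/s with one instance, 1,282 with eight).
class ScalingMath {
    // Speedup relative to the single-instance baseline.
    static double speedup(double base, double scaled) {
        return scaled / base;
    }

    // Parallel efficiency: speedup divided by the number of instances.
    static double efficiency(double base, double scaled, int instances) {
        return speedup(base, scaled) / instances;
    }

    public static void main(String[] args) {
        double s = speedup(576, 1282);
        double e = efficiency(576, 1282, 8);
        System.out.printf("speedup %.2fx, efficiency %.0f%%%n", s, e * 100);
    }
}
```

    Running main prints a speedup of about 2.23x and an efficiency of about 28%: eight instances delivered a little over twice the throughput of one, so the approach scaled consistently but far from linearly.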

    The five GigaSpaces solutions worked as follows:

    1. A simple non-PU client that connects to a partitioned space and executes reads  and writes against the space.

    2. Clients send updater objects to a partitioned space to be processed. Updates are performed by PUs against local space instances (space-based architecture), using polling containers to detect the arrival of new updater objects. Clients use writeMultiple() to improve throughput.

    3.  Just like no. 2, but using FIFO features to preserve ordering of updates per counter.

    4. Clients invoke remote methods advertised by PUs to update counters.  Updaters are passed as arguments.  PUs do work against local space instances.

    5. Clients send update requests to spaces as Task objects.  Spaces execute the tasks.
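    Solutions 2 and 3 combine routing, batched writes, and per-partition polling. The following is an in-memory simulation of that shape in plain Java, not GigaSpaces code: queues stand in for space partitions, handing each partition a single batch plays the role of writeMultiple(), and a thread that calls poll() with a timeout stands in for a polling container. The routing rule and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// In-memory simulation of GigaSpaces solution 2 (not the GigaSpaces API):
// queues stand in for partitions, one batch per partition stands in for
// writeMultiple(), and a timed poll() loop stands in for a polling container.
class PartitionedPolling {
    // Hypothetical routing rule: counter id modulo partition count.
    static int route(int counterId, int partitions) {
        return Math.floorMod(counterId, partitions);
    }

    // Returns, per partition, the counters that partition owns and their totals.
    static List<Map<Integer, Long>> run(int[] updaterCounterIds, int partitions) {
        // Client side: group updaters by the partition that owns their counter.
        List<List<Integer>> batches = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            batches.add(new ArrayList<>());
        }
        for (int id : updaterCounterIds) {
            batches.get(route(id, partitions)).add(id);
        }

        List<BlockingQueue<int[]>> inboxes = new ArrayList<>();
        List<Map<Integer, Long>> localCounters = new ArrayList<>();
        List<Thread> pollers = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            BlockingQueue<int[]> inbox = new LinkedBlockingQueue<>();
            Map<Integer, Long> local = new HashMap<>();
            int expected = batches.get(p).size();
            inboxes.add(inbox);
            localCounters.add(local);
            // Partition side: poll for new updaters and apply them to local counters.
            Thread t = new Thread(() -> {
                int applied = 0;
                try {
                    while (applied < expected) {
                        int[] batch = inbox.poll(100, TimeUnit.MILLISECONDS);
                        if (batch == null) {
                            continue; // nothing arrived yet; poll again
                        }
                        for (int id : batch) {
                            local.merge(id, 1L, Long::sum);
                            applied++;
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pollers.add(t);
            t.start();
        }

        // One bulk write per partition instead of one write per updater.
        for (int p = 0; p < partitions; p++) {
            int[] batch = batches.get(p).stream().mapToInt(Integer::intValue).toArray();
            inboxes.get(p).add(batch);
        }
        try {
            for (Thread t : pollers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for pollers", e);
        }
        return localCounters;
    }
}
```

    Because a given counter always routes to the same partition and each inbox is a FIFO queue, updates to a counter are applied in the order the client sent them. That is roughly the ordering guarantee solution 3 obtains from GigaSpaces' FIFO features.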

    Among these five approaches, numbers two and three, both of which use writeMultiple() and polling containers, were the fastest by substantial margins. Number two delivered over 30,000 updates per second with two clients and four updaters. Number three came close to 17,000 updates per second with one client and two updaters.

    Next fastest was number five, at about 3,800 updates per second with one client and two updaters. Number four peaked at around 2,400 updates per second with two clients and two updaters. Slowest of the five was number one, which reached between 1,000 and 1,100 updates per second in a variety of configurations.

    Analyzing these results and explaining the differences in performance are topics too large for this post. A few things are clear, however, even from the simple set of results presented above:

    1. The concept of locality – which decomposes into the related concepts of proximity and exclusivity – is profoundly important in designing solutions to this class of problem.

    2. A very wide range of results is possible depending on the solution architecture.

    3. For raw speed, GigaSpaces’ polling container construct offers a significant advantage over any of the other choices examined here.


    Digging Deeper into Terracotta

    May 5th, 2009

    I sent a draft of my DICE paper on techniques for updating a distributed dataset to a contact at Terracotta for his comments (Terracotta features prominently in the paper).  He wrote back with two observations.  The first had to do with some clumsiness in my explanation of how Terracotta provides high availability.  His point is well made, and I’ll be revising the draft to reflect his ideas.

    The second point will require more research.  I had observed that, although using Terracotta doesn’t require learning any new APIs, it does require strong skills in programming for concurrency.  My contact readily concedes this, but he asked me to consider the impact of Terracotta’s integration modules (TIMs) on the programmer’s learning curve.

    As he pointed out, my DICE work focused on using Terracotta with native Java and home-rolled applications. According to him, the preconfigured integrations of Terracotta with popular third-party packages such as Spring and EHCache allow developers who use those packages to gain the benefits of Terracotta without having to acknowledge or manage any new concurrency issues. This is an interesting perspective that I will have to evaluate through experimentation.

    If you have experience adopting Terracotta, with or without the TIMs I mentioned, I would like to hear your impressions of the learning curve.


    GigaSpaces Distributed Transaction Performance

    May 4th, 2009

    I’ve done some quick performance testing of GigaSpaces’ distributed transactions using their Mahalo implementation. These are GigaSpaces-only transactions, as opposed to transactions involving GigaSpaces and some other persistent store, in which case JTA/XA would be required.

    As a reminder, GigaSpaces considers a transaction to be distributed if it involves more than one primary space partition.  So a transaction that operates on two or more partitions of a partitioned space would be distributed, as would a transaction that operates on two or more different spaces.  A transaction that operates on only one partition of one space is not distributed even if that space is replicated.
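    The rule above is simple enough to express as a predicate. In this sketch a transaction is a list of operations, each naming a space and carrying a routing value, and the partition is assumed to be the routing value modulo the partition count. The Op class and the routing rule are illustrative assumptions, not the GigaSpaces API. Backups never enter the calculation, matching the point that replication alone does not make a transaction distributed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Classifies a transaction as local or distributed: it is distributed when
// its operations touch more than one primary partition, counting partitions
// of different spaces separately. Backups are deliberately ignored.
class TxClassifier {
    // Hypothetical description of one operation: target space and routing value.
    static final class Op {
        final String space;
        final int routingValue;

        Op(String space, int routingValue) {
            this.space = space;
            this.routingValue = routingValue;
        }
    }

    // Assumed routing rule: routing value modulo the space's partition count.
    static int partitionOf(Op op, int partitionsPerSpace) {
        return Math.floorMod(op.routingValue, partitionsPerSpace);
    }

    static boolean isDistributed(List<Op> ops, int partitionsPerSpace) {
        // Identify each primary partition by (space, partition number).
        Set<String> primaries = new HashSet<>();
        for (Op op : ops) {
            primaries.add(op.space + "#" + partitionOf(op, partitionsPerSpace));
        }
        return primaries.size() > 1;
    }
}
```

    For example, with two partitions, writes routed with values 0 and 1 make the transaction distributed, while values 0 and 2 both land on partition 0 and do not. Two writes to different spaces are distributed regardless of their routing values.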

    I set my test up as follows:

    • A non-PU client acquires a proxy to a clustered space (remote, of course) and writes POJOs to that clustered space.
    • The writes are single-threaded, one-at-a-time, and synchronous. (GigaSpaces offers other choices that would undoubtedly be faster).
    • The POJOs are routed, so the writes end up going to more than one partition.

    I ran two GSCs on two virtual hosts. When I ran without backups, each GSC managed one partition. When I ran with backups, each GSC managed two partitions. When backups were used, the primary for each partition ran on a different host than its backup.

    The client ran on the physical host.

    Ping times on my network are about 0.19 ms.

    Each test consists of 10,000 operations, each made up of two writes (to primaries). Without backups and without transactions, I got 2,000 operations per second. With transactions, that dropped to 322 operations per second.

    With backups and without transactions, I got 714 operations per second. With transactions, that dropped to 208 operations per second.

    These performance figures have little to do with fully optimized GigaSpaces performance.  As I mentioned above, there are faster ways to do these writes than the simple approach I used for these tests.  What the results do indicate, however, is that you can expect to pay a 3x – 6x performance cost for using distributed transactions over independent writes.
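    Because the client was single-threaded and synchronous, throughput converts directly into average time per operation. The sketch below works through the four reported figures; the class and method names are my own.

```java
// Works through the reported figures: slowdown from using transactions and
// the implied average time per two-write operation (single-threaded,
// synchronous client, so time per op = 1 / throughput).
class TxCost {
    static double slowdown(double withoutTx, double withTx) {
        return withoutTx / withTx;
    }

    static double msPerOp(double opsPerSecond) {
        return 1000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        double[][] runs = { {2000, 322}, {714, 208} }; // {no tx, tx} in ops/sec
        String[] labels = { "no backups", "with backups" };
        for (int i = 0; i < runs.length; i++) {
            double plain = runs[i][0];
            double tx = runs[i][1];
            System.out.printf("%s: %.1fx slower, %.2f ms -> %.2f ms per op%n",
                    labels[i], slowdown(plain, tx), msPerOp(plain), msPerOp(tx));
        }
    }
}
```

    It reports slowdowns of about 6.2x without backups and 3.4x with backups, the two ends of the 3x to 6x range. Without backups, the average two-write operation grows from 0.50 ms to about 3.11 ms, many times the 0.19 ms ping time.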

    Of course transactions have different characteristics than do independent writes, and those characteristics may justify the performance cost.  As a rule, though, it is clear that distributed transactions should be avoided because of their performance implications unless you have a compelling need for transactional behaviour.


    Rolling the DICE

    May 3rd, 2009

    For weeks I’ve been working on a comparison of techniques for updating a distributed data set using Terracotta and GigaSpaces. This weekend I finally got a draft out for review. It is mostly OK, but I got results that I can’t explain when I tested two of the GigaSpaces implementations (there are eight implementations in total – three with Terracotta and five with GigaSpaces). I’ve asked my friends at the two vendor organizations to take a look at the draft and give me their comments. Maybe my GigaSpaces contact can help me resolve those two mysteries. Either way, the paper is about done and I’m starting to move on to some new work.

    Next on the agenda:

    • I have spent some time fiddling with transactional techniques with GigaSpaces.  I’m planning to take that work a bit further and write up my results.
    • GemFire has been on my list of products to investigate for a while. As of today I have it installed on my test network, and I expect to start working with it next week.

    Let me know if you are interested in the DICE paper or the GigaSpaces transactional work.