
    An Even Briefer Look at Distributed Transactions in GigaSpaces

    May 25th, 2009

    A couple of weeks ago I posted a quick example and explanation of a GigaSpaces local transaction.  You can find the post here and get the code here.

    In today’s short post I will extend that example to use a distributed transaction.  We’ll do this in two steps: first we’ll break the example; then we’ll fix it.

    As a reminder, a GigaSpaces distributed transaction is any transaction that operates on more than one primary space.  In the example code, our client program executed a local transaction when it wrote two instances of TestClass to a remote space with a single instance.

    In that example routing was not a concern because we created the space as unpartitioned, and we did not declare a space routing field. Behind the scenes, however, GigaSpaces selected one (the id field) for us.  You can check this on the Space Browser tab by expanding the GSSimpleTranExample space node, clicking on “Classes”, then clicking on “TestClass”.  The name of the Routing Field will appear on the Classes Info tab just above the table showing the fields (only one in our case) in the class.

    Now drop the space using Undeploy Application on the Cluster Runtime tab.  Then recreate it as a partitioned space with two partitions and no backups.  Rerun the client application, and it will fail with this error message:

    Exception in thread "main" org.openspaces.core.TransactionDataAccessException: Invalid operation - local transaction spans over multiple spaces - [GSSimpleTranExample_container2:GSSimpleTranExample, GSSimpleTranExample_container1:GSSimpleTranExample] !
    You might be using hash based load balancing (partitioned schema) while writing data into multiple spaces and not into a single node.
    Please Use Jini Transaction manager with your operations.
    ; nested exception is net.jini.core.transaction.TransactionException: Invalid operation - local transaction spans over multiple spaces - [GSSimpleTranExample_container2:GSSimpleTranExample, GSSimpleTranExample_container1:GSSimpleTranExample] !
    You might be using hash based load balancing (partitioned schema) while writing data into multiple spaces and not into a single node.
    Please Use Jini Transaction manager with your operations.

    The reason is that GigaSpaces attempted to route the two writes to different partitions, which turned our local transaction into a distributed transaction.  Because we configured the application with a local transaction manager, which can only coordinate work within a single space, the transaction fails.
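
    To see why the two writes can land in different partitions, here is a minimal sketch of hash-based routing.  It assumes the partition is chosen from the routing value's hash code modulo the number of partitions, which is a simplification and not the actual GigaSpaces implementation.  The class name, the method name, and the ids 1 and 2 are hypothetical; they are not part of the example code.

```java
// Sketch only: simplified hash-based routing, not the GigaSpaces implementation.
class RoutingSketch {

    // Assumption: partition index = hash of the routing value modulo the partition count.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(Object routingValue, int partitionCount) {
        return Math.floorMod(routingValue.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 2;      // the two-partition space created in this post
        Integer firstId = 1;     // hypothetical id values for the two TestClass instances
        Integer secondId = 2;

        int p1 = partitionFor(firstId, partitions);
        int p2 = partitionFor(secondId, partitions);

        System.out.println("id " + firstId + " -> partition " + p1);
        System.out.println("id " + secondId + " -> partition " + p2);
        System.out.println("spans more than one partition: " + (p1 != p2));
    }
}
```

    With a partition count of 1 every routing value maps to partition 0, which is consistent with the unpartitioned version of the example never hitting this error.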

    To fix the application we need to specify a distributed transaction manager instead of a local one.  Here’s how:

    Find the line in the Spring application context file, GSSimpleTranExample.xml, in which we specify a transaction manager:

    <os-core:local-tx-manager id="transactionManager" space="gSSimpleTranExample"/>

    and replace it with a line that looks like this:

    <os-core:distributed-tx-manager id="transactionManager" />

    Note that the distributed transaction manager, unlike a local transaction manager, is not associated with a particular space.

    Now run the client application.  This time it should work.  You can confirm the transactional behaviour using the techniques described in the earlier post.


    GigaSpaces Distributed Transaction Performance

    May 4th, 2009

    I’ve done some quick performance testing of GigaSpaces’ distributed transactions using their Mahalo implementation.  These are GigaSpaces-only transactions, as opposed to transactions involving GigaSpaces and some other persistent store, in which case JTA/XA would be required.

    As a reminder, GigaSpaces considers a transaction to be distributed if it involves more than one primary space partition.  So a transaction that operates on two or more partitions of a partitioned space would be distributed, as would a transaction that operates on two or more different spaces.  A transaction that operates on only one partition of one space is not distributed even if that space is replicated.
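
    The rule above can be sketched as a small check.  This is an illustration only: the "spaceName:partitionId" labels, the space names, and the class and method names are all hypothetical, and nothing here calls the GigaSpaces API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: classifies a transaction by the primary partitions it touches.
class TxScopeSketch {

    // Each target is a hypothetical "spaceName:partitionId" label for a primary partition.
    // Backups are deliberately not listed: replication alone does not make a transaction distributed.
    static boolean isDistributed(List<String> primaryTargets) {
        Set<String> distinct = new HashSet<>(primaryTargets);
        return distinct.size() > 1;
    }

    public static void main(String[] args) {
        // Two writes to the same partition of one space: not distributed.
        System.out.println(isDistributed(Arrays.asList("orders:1", "orders:1")));
        // Two partitions of one space: distributed.
        System.out.println(isDistributed(Arrays.asList("orders:1", "orders:2")));
        // Two different spaces: distributed.
        System.out.println(isDistributed(Arrays.asList("orders:1", "audit:1")));
    }
}
```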

    I set my test up as follows:

    • A non-PU client (one that does not run inside a processing unit) acquires a proxy to a (remote of course) clustered space and writes POJOs to that clustered space.
    • The writes are single-threaded, one-at-a-time, and synchronous.  (GigaSpaces offers other choices that would undoubtedly be faster.)
    • The POJOs are routed, so the writes end up going to more than one partition.

    I ran two GSCs on two virtual hosts.  When I ran without backups each GSC managed one partition.  When I ran with backups each GSC managed two partitions.  When backups were used, the primary for each partition ran on a different host than its backup.

    The client ran on the physical host.

    Ping times on my network run at about 0.19 ms.

    Each test consists of 10,000 operations of two writes (to primaries) each.  When I ran without backups and without transactions, I got 2,000 operations per second.  With transactions, that dropped to 322 operations per second.

    Working with backups and without transactions, I got 714 operations per second.  With transactions, that dropped to 208 operations per second.

    These performance figures have little to do with fully optimized GigaSpaces performance.  As I mentioned above, there are faster ways to do these writes than the simple approach I used for these tests.  What the results do indicate, however, is that you can expect to pay roughly a 3x to 6x performance cost for using distributed transactions over independent writes.
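
    For anyone who wants to check the arithmetic, here is a small sketch that derives the slowdown factors and per-operation times from the throughput figures above.  The class and method names are mine; the four throughput numbers are the only inputs taken from the tests.

```java
import java.util.Locale;

// Sketch only: derives slowdown factors from the measured throughput figures.
class TransactionCostSketch {

    // Ratio of throughput without transactions to throughput with transactions.
    static double slowdown(double opsWithoutTx, double opsWithTx) {
        return opsWithoutTx / opsWithTx;
    }

    // Average wall-clock time per operation for single-threaded, synchronous writes.
    static double millisPerOperation(double opsPerSecond) {
        return 1000.0 / opsPerSecond;
    }

    static void report(String label, double opsWithoutTx, double opsWithTx) {
        System.out.printf(Locale.ROOT,
                "%s: %.1fx slower, %.2f ms -> %.2f ms per operation%n",
                label,
                slowdown(opsWithoutTx, opsWithTx),
                millisPerOperation(opsWithoutTx),
                millisPerOperation(opsWithTx));
    }

    public static void main(String[] args) {
        report("no backups", 2000, 322);   // measured operations per second
        report("with backups", 714, 208);
    }
}
```

    The ratios come out at about 6.2x without backups and about 3.4x with backups, which is where the 3x to 6x range comes from.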

    Of course transactions have different characteristics than do independent writes, and those characteristics may justify the performance cost.  As a rule, though, it is clear that distributed transactions should be avoided because of their performance implications unless you have a compelling need for transactional behaviour.