    A Simple Illustration of Terracotta Locking Strategies

    June 28th, 2009

    If you are familiar with Terracotta, then you know that its hallmark feature is data sharing between Java programs running on different virtual machines. Making use of this very powerful capability requires that the programs sharing the data use Java synchronization, so that conflicting operations on the shared data cannot corrupt it or return incomplete results. Terracotta data sharing also requires that Terracotta be configured to detect and honour the Java synchronization instructions. In this post we refer to the combination of Java synchronization and Terracotta configuration as a “locking strategy”.

    The choice of locking strategy can have a profound impact on a Terracotta application’s performance. Even in a very simple application, there may be several points of data contention at which locking strategies are required, and several possible locking strategies for each contention point. This post provides a very brief illustration of how locking strategies are implemented with Terracotta, and of the impact that a small change in locking strategy can have on application performance.

    This post barely scratches the surface of Terracotta’s remarkable data sharing capabilities, and it completely bypasses many other powerful and important features of the product. Readers are advised to regard this post as a very simple illustration of some basic principles of working with Terracotta, and nothing more.

    Readers should also understand that this post demonstrates Terracotta’s low-level concurrency features. Many Terracotta users will instead rely on Terracotta Integration Modules (TIMs), which provide out-of-the-box integration with productivity frameworks such as Hibernate and Spring. These modules appear to be intended to shield their users as much as possible from the kinds of low-level concurrency concerns that feature prominently in this post.

    As a starting point, we create two Java programs from three classes:
    A – a data POJO.
    TCLockingExampleMain – a program that creates an instance of A and updates A’s data field.
    TCLockingExampleReporter – a program that indicates whether or not our data is shareable.

    Here is the source code for all three classes prior to implementing any locking strategy:

    package tcTLockingExample;

    public class A {
        int primInt;

        public void primIntInc() {
            this.primInt++;
        }
    }

    package tcTLockingExample;

    public class TCLockingExampleMain {
        static A aInstance = new A();

        /**
         * @param args
         */
        public static void main(String[] args) {
            for (int x = 0; x < 10; x++) {
                aInstance.primIntInc();
                System.out.println("aInstance.primInt: " + aInstance.primInt);
            }
        }
    }

    package tcTLockingExample;

    import java.util.Date;

    public class TCLockingExampleReporter {

        /**
         * @param args
         */
        public static void main(String[] args) {
            System.out.println(new Date() + " TCLockingChoicesMain.aInstance.primInt:" + TCLockingExampleMain.aInstance.primInt);
        }
    }

    We will be tinkering with the first two classes as the example progresses.

    The first step is to establish that the programs work as expected when run without Terracotta.  As the following output illustrates, both programs run:

    aInstance.primInt: 1
    aInstance.primInt: 2
    aInstance.primInt: 3
    aInstance.primInt: 4
    aInstance.primInt: 5
    aInstance.primInt: 6
    aInstance.primInt: 7
    aInstance.primInt: 8
    aInstance.primInt: 9
    aInstance.primInt: 10

    but, of course, there is no data sharing between them:

    Sun Jun 07 17:48:16 BST 2009 TCLockingChoicesMain.aInstance.primInt:0

    Next we run each program as a Terracotta application, but without configuring Terracotta to share data between them. The results are the same:

    2009-06-07 18:06:48,029 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 18:06:48,334 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 18:06:48,494 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 18:06:49,907 INFO – Connection successfully established to server at 192.168.1.20:9510
    aInstance.primInt: 1
    aInstance.primInt: 2
    aInstance.primInt: 3
    aInstance.primInt: 4
    aInstance.primInt: 5
    aInstance.primInt: 6
    aInstance.primInt: 7
    aInstance.primInt: 8
    aInstance.primInt: 9
    aInstance.primInt: 10

    and from TCLockingExampleReporter:

    2009-06-07 18:09:03,467 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 18:09:03,772 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 18:09:03,905 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 18:09:05,199 INFO – Connection successfully established to server at 192.168.1.20:9510
    Sun Jun 07 18:09:05 BST 2009 TCLockingChoicesMain.aInstance.primInt:0

    Next we configure Terracotta to be aware of all three classes, and set a trap for ourselves by declaring the instance of A in TCLockingExampleMain as a root (a Terracotta root is an object that is identified as shared by the application’s Terracotta configuration):

    <dso>
      <instrumented-classes>
        <include>
          <class-expression>tcTLockingExample.A</class-expression>
        </include>
        <include>
          <class-expression>tcTLockingExample.TCLockingExampleMain</class-expression>
        </include>
        <include>
          <class-expression>tcTLockingExample.TCLockingExampleReporter</class-expression>
        </include>
      </instrumented-classes>
      <roots>
        <root>
          <field-name>tcTLockingExample.TCLockingExampleMain.aInstance</field-name>
        </root>
      </roots>
    </dso>

    This is a trap because, having established TCLockingExampleMain.aInstance as a shared object, we are obliged to implement a locking strategy wherever we write to it in our code. Because we have not done this, TCLockingExampleMain fails:

    2009-06-07 18:13:48,927 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 18:13:49,240 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 18:13:49,371 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 18:13:50,845 INFO – Connection successfully established to server at 192.168.1.20:9510
    com.tc.object.tx.UnlockedSharedObjectException:
    *********************************************************************
    Attempt to access a shared object outside the scope of a shared lock.
    All access to shared objects must be within the scope of one or more
    shared locks defined in your Terracotta configuration.

    Caused by Thread: main in VM(0)
    Shared Object Type: tcTLockingExample.A

    The cause may be one or more of the following:
    * Terracotta locking was not configured for the shared code.
    * The code itself does not have synchronization that Terracotta
    can use as a boundary.
    * The class doing the locking must be included for instrumentation.
    * The object was first locked, then shared.

    For more information on how to solve this issue, see:

    http://www.terracotta.org/usoe

    *********************************************************************

    at com.tc.object.tx.ClientTransactionManagerImpl.getTransaction(ClientTransactionManagerImpl.java:360)
    at com.tc.object.tx.ClientTransactionManagerImpl.fieldChanged(ClientTransactionManagerImpl.java:653)
    at com.tc.object.TCObjectImpl.objectFieldChanged(TCObjectImpl.java:317)
    at com.tc.object.TCObjectImpl.intFieldChanged(TCObjectImpl.java:357)
    at tcTLockingExample.A.__tc_setprimInt(A.java)
    at tcTLockingExample.A.primIntInc(A.java:7)
    at tcTLockingExample.TCLockingExampleMain.main(TCLockingExampleMain.java:11)
    Exception in thread “main” com.tc.object.tx.UnlockedSharedObjectException:
    *********************************************************************
    Attempt to access a shared object outside the scope of a shared lock.
    All access to shared objects must be within the scope of one or more
    shared locks defined in your Terracotta configuration.

    Caused by Thread: main in VM(0)
    Shared Object Type: tcTLockingExample.A

    The cause may be one or more of the following: . . .

    To fix this, we implement our first locking strategy by synchronizing the method in A that writes to the data member of the shared instance:

    package tcLockingExample;

    public class A {
        int primInt;

        public synchronized void primIntInc() {
            this.primInt++;
        }
    }

    and configuring Terracotta to apply its locking to that method:

    <dso>
      <instrumented-classes>
        <include>
          <class-expression>tcLockingExample.A</class-expression>
        </include>
        <include>
          <class-expression>tcLockingExample.TCLockingExampleMain</class-expression>
        </include>
        <include>
          <class-expression>tcLockingExample.TCLockingExampleReporter</class-expression>
        </include>
      </instrumented-classes>
      <roots>
        <root>
          <field-name>tcLockingExample.TCLockingExampleMain.aInstance</field-name>
        </root>
      </roots>
      <locks>
        <autolock>
          <method-expression>void tcLockingExample.A.primIntInc()</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>

    Now TCLockingExampleMain works:

    2009-06-07 18:24:22,965 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 18:24:23,284 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 18:24:23,414 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 18:24:25,324 INFO – Connection successfully established to server at 192.168.1.20:9510
    aInstance.primInt: 1
    aInstance.primInt: 2
    aInstance.primInt: 3
    aInstance.primInt: 4
    aInstance.primInt: 5
    aInstance.primInt: 6
    aInstance.primInt: 7
    aInstance.primInt: 8
    aInstance.primInt: 9
    aInstance.primInt: 10

    and TCLockingExampleReporter shows that the data is being shared:

    2009-06-07 18:27:21,486 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 18:27:21,804 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 18:27:21,933 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 18:27:23,271 INFO – Connection successfully established to server at 192.168.1.20:9510
    Sun Jun 07 18:27:23 BST 2009 TCLockingChoicesMain.aInstance.primInt:10
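What Terracotta contributes here is to extend ordinary Java monitor semantics across JVMs. Within a single JVM, the same synchronized incrementer is what keeps concurrent increments from being lost. The plain-Java sketch below illustrates this. It assumes that threads in one JVM can stand in for the separate JVMs Terracotta would coordinate. The class and method names are hypothetical, and no Terracotta API is used.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: threads in one JVM stand in for the separate JVMs that
// Terracotta would coordinate. No Terracotta classes are used.
public class SynchronizedMethodDemo {

    // Same shape as the post's class A, with the synchronized incrementer.
    static class A {
        int primInt;

        public synchronized void primIntInc() {
            this.primInt++;
        }
    }

    // Starts `threads` workers that each call primIntInc() `perThread` times
    // on one shared instance, then returns the final count.
    static int run(int threads, int perThread) {
        A shared = new A();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.primIntInc();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() makes each worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.primInt;
    }

    public static void main(String[] args) {
        // With the method synchronized, no increments are lost.
        System.out.println("primInt: " + run(4, 250_000)); // prints "primInt: 1000000"
    }
}
```

If synchronized is removed from primIntInc(), the total would typically come up short, because ++ is a read-modify-write sequence that is not atomic.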

    Next we try a different locking strategy. We remove the synchronization from A’s method and instead synchronize TCLockingExampleMain’s write operation on the root object:

    package tcLockingExample;

    public class TCLockingExampleMain {
        static A aInstance = new A();

        /**
         * @param args
         */
        public static void main(String[] args) {
            for (int x = 0; x < 10; x++) {
                synchronized (aInstance) {
                    aInstance.primIntInc();
                }
                System.out.println("aInstance.primInt: " + aInstance.primInt);
            }
        }
    }

    We also update the Terracotta configuration. We apply a Terracotta autolock to TCLockingExampleMain’s main() method instead of A’s incrementer method. An autolock means that Terracotta adds its locking wherever it sees Java synchronization within the matched methods.

    <dso>
      <instrumented-classes>
        <include>
          <class-expression>tcLockingExample.A</class-expression>
        </include>
        <include>
          <class-expression>tcLockingExample.TCLockingExampleMain</class-expression>
        </include>
      </instrumented-classes>
      <roots>
        <root>
          <field-name>tcLockingExample.TCLockingExampleMain.aInstance</field-name>
        </root>
      </roots>
      <locks>
        <autolock>
          <method-expression>void tcLockingExample.TCLockingExampleMain.main(java.lang.String[])</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>

    This strategy also works:
    2009-06-07 18:45:39,099 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 18:45:39,425 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 18:45:39,555 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 18:45:41,441 INFO – Connection successfully established to server at 192.168.1.20:9510
    aInstance.primInt: 1
    aInstance.primInt: 2
    aInstance.primInt: 3
    aInstance.primInt: 4
    aInstance.primInt: 5
    aInstance.primInt: 6
    aInstance.primInt: 7
    aInstance.primInt: 8
    aInstance.primInt: 9
    aInstance.primInt: 10

    and data sharing is enabled:
    2009-06-07 18:44:17,498 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 18:44:17,816 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 18:44:17,951 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 18:44:19,710 INFO – Connection successfully established to server at 192.168.1.20:9510
    Sun Jun 07 18:44:19 BST 2009 TCLockingChoicesMain.aInstance.primInt:10
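In plain Java terms, the second strategy moves the lock from the callee to the caller. A synchronized instance method acquires the monitor of this. As a result, synchronized (aInstance) in the caller takes the same monitor that the first strategy's synchronized method took. The sketch below checks both points. As before, threads in one JVM stand in for separate JVMs, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of caller-side locking; threads stand in for JVMs.
public class CallerLockDemo {

    // Like the second strategy's A: the incrementer itself is not synchronized.
    static class A {
        int primInt;

        void primIntInc() {
            this.primInt++;
        }

        // A synchronized instance method holds the monitor of `this`.
        synchronized boolean syncMethodHoldsThis() {
            return Thread.holdsLock(this);
        }
    }

    // Each worker synchronizes on the shared instance around every call.
    static int run(int threads, int perThread) {
        A shared = new A();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (shared) {
                        shared.primIntInc();
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.primInt;
    }

    public static void main(String[] args) {
        System.out.println("primInt: " + run(4, 250_000)); // prints "primInt: 1000000"
        System.out.println(new A().syncMethodHoldsThis()); // prints "true"
    }
}
```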

    Finally, we’ll take a quick look at how the choice of locking strategy can affect performance. To see this, we return to our first locking strategy: the synchronized primIntInc() method in A, autolocked in the Terracotta configuration. We then change TCLockingExampleMain so that it increments A’s integer member a million times instead of ten. We also add some code to report how long the million iterations took:

    package tcLockingExample;

    import java.util.Date;

    public class TCLockingExampleMain {
        static A aInstance = new A();

        /**
         * @param args
         */
        public static void main(String[] args) {
            Date startTime = new Date();
            for (int x = 0; x < 1000000; x++) {
                aInstance.primIntInc();
            }
            Date endTime = new Date();
            System.out.println("startTime: " + startTime + " endTime: " + endTime
                    + " elapsed: "
                    + ((endTime.getTime() - startTime.getTime()) / 1000)
                    + " seconds");
            System.out.println("aInstance.primInt: " + aInstance.primInt);
        }
    }
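The elapsed-time calculation above divides milliseconds by 1000 using integer arithmetic. It therefore reports whole seconds and drops any fraction, so a run shorter than one second prints an elapsed time of 0 seconds. If finer resolution is wanted, something along the lines of the sketch below could be used instead. System.nanoTime() is the JDK clock intended for measuring elapsed time. The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Sketch: whole-second truncation versus millisecond timing with nanoTime().
public class ElapsedTiming {

    // Mirrors the post's calculation: whole seconds, fraction truncated.
    static long wholeSeconds(long elapsedMillis) {
        return elapsedMillis / 1000;
    }

    // Times a task with the monotonic System.nanoTime() clock, in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int[] counter = {0}; // array so the lambda can update it
        long ms = timeMillis(() -> {
            for (int x = 0; x < 1_000_000; x++) {
                counter[0]++;
            }
        });
        System.out.println("elapsed: " + ms + " ms, counter: " + counter[0]);
        System.out.println(wholeSeconds(999)); // prints 0
    }
}
```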

    When we run this program we see that the million iterations take around 26 seconds:

    2009-06-07 19:16:09,091 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 19:16:09,410 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 19:16:09,544 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 19:16:11,022 INFO – Connection successfully established to server at 192.168.1.20:9510
    startTime: Sun Jun 07 19:16:11 BST 2009 endTime: Sun Jun 07 19:16:37 BST 2009 elapsed: 26 seconds
    aInstance.primInt: 1000000

    Next we try our second locking strategy with a million iterations:

    package tcLockingExample;

    import java.util.Date;

    public class TCLockingExampleMain {
        static A aInstance = new A();

        /**
         * @param args
         */
        public static void main(String[] args) {
            Date startTime = new Date();
            for (int x = 0; x < 1000000; x++) {
                synchronized (aInstance) {
                    aInstance.primIntInc();
                }
            }
            Date endTime = new Date();
            System.out.println("startTime: " + startTime + " endTime: " + endTime
                    + " elapsed: "
                    + ((endTime.getTime() - startTime.getTime()) / 1000)
                    + " seconds");
            System.out.println("aInstance.primInt: " + aInstance.primInt);
        }
    }

    It takes about the same amount of time:

    2009-06-07 19:21:08,279 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 19:21:08,602 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 19:21:08,737 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 19:21:10,266 INFO – Connection successfully established to server at 192.168.1.20:9510
    startTime: Sun Jun 07 19:21:10 BST 2009 endTime: Sun Jun 07 19:21:37 BST 2009 elapsed: 26 seconds
    aInstance.primInt: 1000000

    For our last test we move the synchronization statement outside of the loop, meaning that only one lock is required instead of a million (one per iteration):

    package tcLockingExample;

    import java.util.Date;

    public class TCLockingExampleMain {
        static A aInstance = new A();

        /**
         * @param args
         */
        public static void main(String[] args) {
            Date startTime = new Date();
            synchronized (aInstance) {
                for (int x = 0; x < 1000000; x++) {
                    aInstance.primIntInc();
                }
            }
            Date endTime = new Date();
            System.out.println("startTime: " + startTime + " endTime: " + endTime
                    + " elapsed: "
                    + ((endTime.getTime() - startTime.getTime()) / 1000)
                    + " seconds");
            System.out.println("aInstance.primInt: " + aInstance.primInt);
        }
    }

    Execution time drops from 26 seconds to less than one second:

    2009-06-07 19:23:52,409 INFO – Terracotta 3.0.0, as of 20090410-200435 (Revision 12431 by cruise@su10mo5 from 3.0)
    2009-06-07 19:23:52,727 INFO – Configuration loaded from the file at ‘/home/dan/workspace/TCLockingExample/tc-config.xml’.
    2009-06-07 19:23:52,861 INFO – Log file: ‘/home/dan/workspace/TCLockingExample/terracotta/client-logs/terracotta-client.log’.
    2009-06-07 19:23:54,335 INFO – Connection successfully established to server at 192.168.1.20:9510
    startTime: Sun Jun 07 19:23:54 BST 2009 endTime: Sun Jun 07 19:23:55 BST 2009 elapsed: 0 seconds
    aInstance.primInt: 1000000
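The difference between the per-iteration runs and this one comes down to how many times the lock is acquired. The sketch below counts acquisitions of a local java.util.concurrent ReentrantLock for the two loop shapes. It assumes that a local lock is a fair stand-in for counting purposes. Acquiring a Terracotta clustered lock presumably costs far more than acquiring a local lock, but that is an assumption here and was not measured in this post. Names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: counts lock acquisitions for fine-grained versus coarse-grained loops.
public class LockGranularityDemo {

    private final ReentrantLock lock = new ReentrantLock();
    private int primInt;
    private long acquisitions; // only modified while `lock` is held

    private void acquire() {
        lock.lock();
        acquisitions++;
    }

    // Like the first two timed runs: lock and unlock around every increment.
    long lockPerIteration(int n) {
        for (int x = 0; x < n; x++) {
            acquire();
            try {
                primInt++;
            } finally {
                lock.unlock();
            }
        }
        return acquisitions;
    }

    // Like the final run: one lock held around the whole loop.
    long lockAroundLoop(int n) {
        acquire();
        try {
            for (int x = 0; x < n; x++) {
                primInt++;
            }
        } finally {
            lock.unlock();
        }
        return acquisitions;
    }

    int value() {
        return primInt;
    }

    public static void main(String[] args) {
        LockGranularityDemo fine = new LockGranularityDemo();
        System.out.println(fine.lockPerIteration(1_000_000) + " acquisitions, primInt " + fine.value());
        // prints "1000000 acquisitions, primInt 1000000"
        LockGranularityDemo coarse = new LockGranularityDemo();
        System.out.println(coarse.lockAroundLoop(1_000_000) + " acquisition, primInt " + coarse.value());
        // prints "1 acquisition, primInt 1000000"
    }
}
```

The trade-off is that the coarse version holds the lock for the whole loop. No other thread (or, under Terracotta, no other JVM) can acquire it until the loop completes.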

    As these very simple examples show, there are often several choices for how to implement locking in a Terracotta application, and the selection of a strategy can have profound implications for application performance.  For a developer who is new to Terracotta, selecting appropriate locking strategies may involve significant amounts of trial and error.  As the developer’s understanding of Terracotta grows with experience, locking strategy selection becomes easier and less experimentation is required.


    It wasn’t a race, but . . .

    May 6th, 2009

    This DICE study I have been working on is as much about scalability – what happens to throughput when you add an additional unit of compute power – as about raw performance.  Nevertheless, testing eight different solutions to the same problem did provide some insight into overall performance characteristics.  I’ll tell you how the eight approaches stacked up in terms of speed, but first let me summarize the problem and the eight solutions.

    The problem was to update the objects (“counters”) in a shared data area. Each update was represented by an object (an “updater”) containing a reference to the counter object to be updated. The updater objects also had to be written to the shared data area. High ratios of updaters to counters promoted contention for access to the counters, creating classic hot spots.
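A minimal sketch of the problem's data model follows. The class names (Counter, Updater) and the random assignment of updaters to counters are hypothetical, and the shared data area itself is not modelled. With many more updaters than counters, each counter is targeted by many updaters, which is what produces the hot spots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of counters and updaters; not the study's actual classes.
public class CounterModel {

    // A shared counter, synchronized so concurrent updaters lose no increments.
    static final class Counter {
        private long count;

        synchronized void increment() {
            count++;
        }

        synchronized long get() {
            return count;
        }
    }

    // An updater carries a reference to the counter it should update.
    static final class Updater {
        final Counter target;

        Updater(Counter target) {
            this.target = target;
        }

        void apply() {
            target.increment();
        }
    }

    // Builds `counterCount` counters and `updaterCount` updaters with random
    // (seeded) targets, applies every updater, and returns the summed counts.
    static long runSequential(int counterCount, int updaterCount, long seed) {
        List<Counter> counters = new ArrayList<>();
        for (int i = 0; i < counterCount; i++) {
            counters.add(new Counter());
        }
        Random rnd = new Random(seed);
        List<Updater> updaters = new ArrayList<>();
        for (int i = 0; i < updaterCount; i++) {
            updaters.add(new Updater(counters.get(rnd.nextInt(counterCount))));
        }
        for (Updater u : updaters) {
            u.apply();
        }
        long total = 0;
        for (Counter c : counters) {
            total += c.get();
        }
        return total;
    }

    public static void main(String[] args) {
        // 4 counters and 10,000 updaters: about 2,500 updaters per counter.
        System.out.println("total updates: " + runSequential(4, 10_000, 42L)); // prints "total updates: 10000"
    }
}
```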

    Three solutions used Terracotta. The first had several instances of a simple program that processed a list of updaters. The instances tussled for access to the counters. The second solution was like the first, except that each instance of the program had a list of updaters that referred to a distinct subset of the counters. With this approach there was no competition for the counters. The third Terracotta solution had each instance of an updating program fed by a private queue. Each queue contained updaters that referred to a distinct set of counters, so, again, there was no competition for the counters.
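The idea behind the second Terracotta solution is to give each program instance updaters for its own distinct subset of counters. That can be sketched as a simple partitioning step. The assignment rule used here (counter id modulo the number of instances) and all names are assumptions for illustration. The post does not say how the subsets were chosen.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split updaters (represented by their target counter ids) so that
// each worker only ever touches its own counters. The modulo rule is assumed.
public class PartitionedUpdaters {

    // Which worker owns a given counter.
    static int ownerOf(int counterId, int workers) {
        return Math.floorMod(counterId, workers);
    }

    // Returns one list of target counter ids per worker; no id appears in two lists.
    static List<List<Integer>> partition(List<Integer> updaterTargets, int workers) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            lists.add(new ArrayList<>());
        }
        for (int target : updaterTargets) {
            lists.get(ownerOf(target, workers)).add(target);
        }
        return lists;
    }

    public static void main(String[] args) {
        List<Integer> targets = List.of(0, 1, 2, 3, 4, 5, 6, 7);
        System.out.println(partition(targets, 2)); // prints [[0, 2, 4, 6], [1, 3, 5, 7]]
    }
}
```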

    Among these three approaches, the second proved to be the fastest overall, achieving over 5,000 updates per second when run with two or four instances of the updating program. The first (and simplest) approach was the second fastest. Its best performance came with only a single updating program running, when it achieved over 3,400 updates per second. Overall throughput declined precipitously as additional instances were added.

    The third approach was the slowest, but it got consistently faster as the number of instances was increased from one to two to four to eight (about the limit of my test environment). With one instance of the updating program running, throughput was 576 updates per second. With eight instances, throughput was 1,282 updates per second.

    The five GigaSpaces solutions worked as follows:

    1. A simple non-PU client that connects to a partitioned space and executes reads and writes against the space.

    2. Clients send updater objects to a partitioned space to be processed. Updates are performed by PUs against local space instances (space-based architecture) that use polling containers to detect the arrival of new updater objects. Clients use writeMultiple() to improve throughput.

    3.  Just like no. 2, but using FIFO features to preserve ordering of updates per counter.

    4. Clients invoke remote methods advertised by PUs to update counters.  Updaters are passed as arguments.  PUs do work against local space instances.

    5. Clients send update requests to spaces as Task objects.  Spaces execute the tasks.
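Solutions two and three use writeMultiple() to improve throughput. Batching helps because many objects travel in one call, so the per-call network round trip is paid once per batch instead of once per object. The sketch below models only that round-trip count. FakeRemoteStore and its methods are hypothetical stand-ins and not the GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: counts round trips for one-at-a-time writes versus batched writes.
public class BatchingSketch {

    // Hypothetical stand-in for a remote space: every call is one round trip.
    static final class FakeRemoteStore {
        final List<Object> stored = new ArrayList<>();
        int roundTrips;

        void write(Object o) {
            roundTrips++;
            stored.add(o);
        }

        void writeMultiple(List<?> batch) {
            roundTrips++;
            stored.addAll(batch);
        }
    }

    static List<Integer> items(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // One call per object.
    static int oneAtATime(int n) {
        FakeRemoteStore store = new FakeRemoteStore();
        for (Integer item : items(n)) {
            store.write(item);
        }
        return store.roundTrips;
    }

    // One call per batch of up to batchSize objects.
    static int batched(int n, int batchSize) {
        FakeRemoteStore store = new FakeRemoteStore();
        List<Integer> all = items(n);
        for (int from = 0; from < all.size(); from += batchSize) {
            int to = Math.min(from + batchSize, all.size());
            store.writeMultiple(all.subList(from, to));
        }
        return store.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println(oneAtATime(10_000));   // prints 10000
        System.out.println(batched(10_000, 100)); // prints 100
    }
}
```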

    Among these five approaches, numbers two and three, both of which use writeMultiple() and polling containers, were the fastest by substantial margins. Number two delivered over 30,000 updates per second with two clients and four updaters. Number three came close to 17,000 updates per second with one client and two updaters.

    Next fastest was number five, at about 3,800 updates per second with one client and two updaters. Number four peaked at around 2,400 updates per second with two clients and two updaters. Slowest of the five was number one, which reached between 1,000 and 1,100 updates per second in a variety of configurations.

    Analyzing these results and explaining the differences in performance are topics too large for this post. A few things are clear, however, even from the simple set of results presented above:

    1. The concept of locality – which decomposes into the related concepts of proximity and exclusivity – is profoundly important in designing solutions to this class of problem.

    2. A very wide range of results is possible depending on the solution architecture.

    3. For raw speed, GigaSpaces’ polling container construct offers a significant advantage over any of the other choices examined here.


    GigaSpaces Distributed Transaction Performance

    May 4th, 2009

    I’ve done some quick performance testing of GigaSpaces’ distributed transactions using their Mahalo implementation. These are GigaSpaces-only transactions, as opposed to transactions involving GigaSpaces and some other persistent store, in which case JTA/XA would be required.

    As a reminder, GigaSpaces considers a transaction to be distributed if it involves more than one primary space partition.  So a transaction that operates on two or more partitions of a partitioned space would be distributed, as would a transaction that operates on two or more different spaces.  A transaction that operates on only one partition of one space is not distributed even if that space is replicated.
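The rule above can be expressed as a small check. Collect the primary partitions that a transaction's objects route to, and call the transaction distributed when there is more than one. The sketch assumes a single partitioned space and hash-based routing (the routing value's hash code modulo the partition count). That is a simplification for illustration, not GigaSpaces' actual routing code. Transactions spanning different spaces are not modelled, and backups are ignored because only primaries count. Names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: decide whether a transaction would be distributed, ASSUMING
// hash-modulo routing over the primary partitions of one space.
public class DistributedTxCheck {

    static int partitionFor(Object routingValue, int partitions) {
        return Math.floorMod(routingValue.hashCode(), partitions);
    }

    static Set<Integer> partitionsTouched(Iterable<?> routingValues, int partitions) {
        Set<Integer> touched = new TreeSet<>();
        for (Object v : routingValues) {
            touched.add(partitionFor(v, partitions));
        }
        return touched;
    }

    // Distributed if the objects land on more than one primary partition.
    static boolean isDistributed(Iterable<?> routingValues, int partitions) {
        return partitionsTouched(routingValues, partitions).size() > 1;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself: with 2 partitions, 1 -> 1 and 2 -> 0.
        System.out.println(isDistributed(List.of(1, 2), 2)); // prints true
        System.out.println(isDistributed(List.of(2, 4), 2)); // prints false
    }
}
```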

    I set my test up as follows:

    • A non-PU client acquires a proxy to a (remote, of course) clustered space and writes POJOs to that clustered space.
    • The writes are single-threaded, one-at-a-time, and synchronous. (GigaSpaces offers other choices that would undoubtedly be faster.)
    • The POJOs are routed, so the writes end up going to more than one partition.

    I ran two GSCs on two virtual hosts. When I ran without backups, each GSC managed one partition. When I ran with backups, each GSC managed two partitions. When backups were used, the primary for each partition ran on a different host than its backup.

    The client ran on the physical host.

    Ping times on my network run at about 0.19 ms.

    Each test consists of 10,000 operations of two writes (to primaries) each. When I ran without backups and without transactions, I got 2,000 operations per second. Using transactions, that dropped to 322 operations per second.

    Working with backups and without transactions, I got 714 operations per second. Using transactions, that dropped to 208 operations per second.

    These performance figures have little to do with fully optimized GigaSpaces performance.  As I mentioned above, there are faster ways to do these writes than the simple approach I used for these tests.  What the results do indicate, however, is that you can expect to pay a 3x – 6x performance cost for using distributed transactions over independent writes.
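The 3x to 6x range follows directly from the throughput figures above. Converting throughput to average time per operation (1000 ms divided by operations per second) puts the same numbers in latency terms:

```latex
\frac{2000}{322} \approx 6.2 \ \text{(no backups)}, \qquad
\frac{714}{208} \approx 3.4 \ \text{(with backups)}

t_{\text{op}} = \frac{1000\ \text{ms}}{\text{ops/s}}: \quad
0.50\ \text{ms} \rightarrow 3.11\ \text{ms} \ \text{(no backups)}, \qquad
1.40\ \text{ms} \rightarrow 4.81\ \text{ms} \ \text{(with backups)}
```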

    Of course transactions have different characteristics than do independent writes, and those characteristics may justify the performance cost.  As a rule, though, it is clear that distributed transactions should be avoided because of their performance implications unless you have a compelling need for transactional behaviour.