Random Post: NCache Concurrency Controls
RSS .92| RSS 2.0| ATOM 0.3
  • Home
  • About
  •  

    NCache Concurrency Controls

    March 8th, 2012

    Recently I have been looking into .NET-friendly in-memory data grid (IMDG) software. One of the products I have been experimenting with is Alachisoft’s NCache.

    When I analyze a distributed caching product, I like to focus first on concurrency control. Before examining performance, scalability, or advanced architectural features, it is valuable to confirm that the software can be used to construct applications that remain reliable when data is shared and subject to updates. To do this I usually construct a test case involving extreme data hot spots, and exercise the features offered by the product for managing data access in these heavy contention situations.

    If the product I’m working with offers features for dividing work into ACID style transactions then I explore those features as part of my analysis of concurrent data access. NCache does not provide features for using ACID transactions, so my work with NCache has focused on its locking features.

    NCache provides both optimistic and pessimistic concurrency control. The pessimistic locking model turns out to be unusual, at least in my experience. Although the documentation on this feature is quite sparse, with help from the (very responsive) tech. support team at Alachisoft I was able to get an understanding of how NCache’s pessimistic locking works and how to use the NCache API to control it.

    Here are some key points about NCache’s pessimistic locking:

    1. The NCache data access API provides specialized methods for acquiring locks, releasing locks and accessing items in the cache under locking constraints.

    2. Each lock represents the state of an item of data in the cache.

    3. Locks are not granted to a thread or a process. Every lock is equally visible to and manipulable by all cache client threads, regardless of which thread originally requested the lock.

    4. Locking is completely cooperative. If locking is used to control access to a data item then, to ensure that exclusive access to that item will be maintained until the routine that requested the lock relinquishes it, every potentially concurrent access to that item must be made using the locking form of the NCache data access API. Using a non-locking API method to access a locked item will bypass the lock.

    5. In at least one case, using a non-locking API method will not only bypass the lock but will remove it.

    6. NCache’s locks will never cause a client thread to block. Data access methods that access a locked item in the cache will return right away. Clients must include logic to interpret the results of the method invocation and retry it if appropriate.

    7. In some cases, to determine why a method has not been successful the side effects of the method must be examined.

    This final point requires a fuller explanation. Most of the the pessimistic locking API methods take a reference to an object called a LockHandle as one of their arguments. A LockHandle stores the information that identifies a lock on a data item. The values stored in the LockHandle you pass in may be changed by the method. Depending on which method you use and the state (locked or unlocked) of the item you are trying to access, to interpret the results of your locking call it may be necessary to inspect the values in the LockHandle you passed in. Here is an example:

    The Get() method takes a key and returns the associated cached item if there is one. The locking form of the Get() method takes three additional arguments, one of which is a LockHandle reference. Depending on the values of those arguments, the method can be used to lock an object and retrieve it in a single call. In this case the method will return null if there is no object in the cache with the specified key value, or if the object exists but is already locked. To distinguish between these two cases the caller must examine the post-call values in the LockHandle.

    NCache’s concurrency features work as intended, and with them I was able to maintain the data consistency required by my test scenario. Both the optimistic and the pessimistic features delivered correct results when data items were subject to vigorous concurrent access. However, when compared with the concurrency features provided by well known competitors, NCache’s concurrency controls are quirky and not particularly developer-friendly. Its vulnerability to pessimistic locks being accidentally bypassed or removed by clients that don’t conform to the cooperative lock management protocol makes applications that depend on exclusive data access prone to bugs. The non-blocking approach may be welcome in some situations, but the lack of blocking calls is likely to lead to increased client code complexity and reduced performance in many cases. The lack of support for ACID transactions poses a significant obstacle to building applications that must maintain strict data consistency. For some use cases NCache’s pessimistic locking or its optimistic concurrency features may suffice, but developers designing a system with shared access to writable data should study its concurrency control feature set carefully before choosing NCache.


    Four Coherence-based Solutions to a Data Hot Spot Problem

    August 31st, 2011

    Summary

    Leveraging the DICE framework I developed last year (write-up available under “Papers and Articles” at http://www.scapps.co.uk), I developed four different solutions to a simplified (non-distributed) variant of the DICE computing problem using Coherence 3.7, each using a different approach to concurrency management. I then measured the throughput of each implementation using DICE configurations that promote a high rate of concurrent requests for access to shared data objects.

    The implementation based on Coherence’s Entry Processors feature showed the highest throughput.

    Two of the four implementations provide ACID transactional guarantees. The transactional implementations provided lower throughput than the non-transactional implementations.

    One of the two transactional implementations uses a Coherence transactionality feature which is now deprecated. The other uses Coherence’s new Transactions Framework. The transactional implementation based on the deprecated Coherence feature provided greater throughput than the implementation based on the new Transaction Framework.

    Simplified DICE Computing Problem

    There are two domain classes:

    • Counters, each with a unique identifier and a counter value.
    • Incrementers, each containing a unique identifier and a reference to a Counter, and a flag that is used to distinguish between Incrementers that have and have not been processed. Each Incrementer represents an event to be processed.

    Processing an Incrementer event involves the following steps:

    1. The Incrementer object is interrogated to obtain the id of the Counter object to which it refers.
    2. The Counter object is located.
    3. The value of the Counter object’s counter field is incremented.
    4. The Incrementer object’s processed flag is set.
    5. The new state of the Counter is saved in the shared data area.
    6. The Incrementer is written to the shared data area. The throughput being measured is the rate at which these update operations (DICE ops.) are performed.

    This problem is a simplified version of the original DICE problem, which required that domain data and processing be distributed across more than one host. In the simplified version presented here, all data is stored and all processing is done on a single host. Note, however, that the solutions presented here can also work without modification with distributed clients, data or both.

    The Four Implementations

    The client program in each of the four implementations is a cluster member without local storage. The cache(s) needed by each implementation are established and managed by another cluster member which is started before the client is run.

    The four implementations differ in the ways they perform DICE ops:

    • Client non-transactional – The client’s worker threads read from and write to the caches, using explicit locks to manage concurrency. Two caches are used, one per domain type.
    • EntryProcessor non-transactional – The client’s worker threads send Entry Processors to the cluster member that hosts the data. That cluster member performs the updates, taking advantage of the Entry Processors’ inherent concurrency controls. One cache is used for both domain types so that a single Entry Processor can be used for each DICE op.
    • Client transactional deprecated – The client’s worker threads read from and write to the caches using the (deprecated) TransactionMap interface. Pessimistic mode is used to manage concurrency. Two caches are used, one per domain type.
    • Client transactional Transaction Framework – The client’s worker threads read from and writes to the caches using the Transaction Framework’s OptimisticNamedCache interface. When updates fail for concurrency-related reasons (either a required lock is unavailable or the optimistic assumption proves false), retries are issued automatically. Two caches are used, one per domain type.

    Each DICE Op. Includes two writes to the cache: a Counter is updated, and an Incrementer is inserted. In the two transactional implementations, these writes are performed as an atomic operation. In the non-transactional implementation they are not.

    DICE Configuration

    Here are some key points from the DICE application configuration used for these tests:

    • The number of Counter objects was set to two for one series of tests, and to ten for another series (as shown in the table of results).
    • One client instance was run for each test execution.
    • The client spawned five worker threads that initiated DICE ops.
    • Each thread was instructed to execute 2,000 DICE ops.
      (The total number of DICE ops. per test was therefore 5 x 2,000 = 10,000.)
    • Counter objects each contain a string field that can be populated to increase the size of each instance. For these tests a string of 4,096 bytes was used.

    Coherence Cache Configuration

    Here are some key points from the Coherence cache configuration used for these tests:

    A distributed scheme managed by the DistributedCache service was used for the two non-transactional implementations and the transactional implementation based on the deprecated TransactionMap interface.

    • A transactional scheme managed by the TransactionalCache service was used for the TransactionMap implementation.
    • In both cases, thread-count was set to two; lease-granularity was set to ‘lease’; and an unlimited backing map local scheme was used.

    Observed Throughput

    The Coherence license under which I am working does not allow me to “disclose results of any program benchmark tests without our prior consent”. Because I don’t have Oracle’s consent to publish my observations, I am reporting results on a relative basis, with the throughput rates I observed indexed to the lowest value.

    Several iterations of the test were run with each implementation. The table that follows shows the best throughput achieved by each implementation with two and five Counters respectively, indexed to the throughput of the lowest observed rate (Transaction Framework with two Counters).

    implementation

    no. Counters

    indexed throughput

    EntryProcessor non-transactional

    2

    11.7

    client non-transactional

    2

    3.7

    client transactional deprecated

    2

    2.4

    client transactional Transaction Framework

    2

    1.0

    EntryProcessor non-transactional

    10

    12.1

    client non-transactional

    10

    5.3

    client transactional deprecated

    10

    3.1

    client transactional Transaction Framework

    10

    2.1

    When considering these results it is important to remember that:

    • Throughput was measured using a DICE configuration designed to produce very heavy contention for access to a small number of data elements.

    • Neither the execution environment nor the Coherence configuration were tuned to optimize the throughput of any of the implementations.

    Conclusions and Interpretation

    Higher rates of throughput were achieved by the non-transactional implementations than by the transactional implementations. Presumably this is because of the overhead required to support the atomicity and isolation properties of transactions.

    Of the two non-transactional implementations, the implementation that used EntryProcessors provided greater throughput by a factor of more than two. This advantage is likely attributable to the smaller amount of inter-process communication required by the Entry Processors compared with the non-transactional client implementation.

    Of the two transactional implementations, the one based on the deprecated TransactionMap feature showed greater throughput. This is probably because its pessimistic mode is more efficient than the Transaction Framework’s optimistic concurrency control for this use-case, which is write-intensive and contention-intensive by design.

    As shown in the following table, each implementation achieved greater throughput when the number of Counters was increased (although in the case of the EntryProcessor implementation the increase was only about 4%). The reduced rate of contention for each Counter probably explains the increases.

    implementation

    % change

    EntryProcessor non-transactional

    4%

    client non-transactional

    42%

    client transactional deprecated

    33%

    client transactional Transaction Framework

    134%

    The percentage improvement in throughput when the number of counters was increased was greatest (134%) for the Transaction Framework implementation. The likely explanation is that the cost of the retries used by this implementation to recover from concurrency conflicts is very high. With the updates spread across five times as many Counters the number of retries was substantially reduced.

    As the following table show, the throughput advantage of the implementation using the deprecated TransactionMap feature over the implementation using the newer Transaction Framework became much smaller percentage-wise when the number of Counters was increased from two to ten. This observation suggests that the retries employed by the Transaction Framework implementation to resolve concurrency conflicts account for most of the difference in throughput rates between the old and new transactionality features.

    Counters

    % increase

    Two Counters

    136%

    Ten Counters

    34%