Why do we need bulk loading?
Bulk loading helps us reduce the time required for warming up large caches.
Basic Idea:
Bulk loading becomes fast when lock acquisition time and transaction overhead are reduced. To do this, switch the cache from "coherent" mode to "incoherent" mode for the duration of the bulk load. In incoherent mode, the cache is non-coherent and concurrent locks mark Terracotta transaction boundaries. Multiple puts are batched automatically and sent as one Terracotta transaction.
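To make the effect of batching concrete, here is a minimal, self-contained sketch. It does not use Ehcache or Terracotta at all: CountingServer, BatchingBuffer and BatchingDemo are hypothetical names invented for illustration, and the "server" only counts what it receives. It assumes a fixed batch size and shows how grouping puts reduces the number of transactions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for the server: it only counts the transactions it receives. */
class CountingServer {
    int transactions = 0;
    int elements = 0;

    void commit(List<String> batch) {
        transactions++;
        elements += batch.size();
    }
}

/** Hypothetical local buffer that groups puts into fixed-size batches. */
class BatchingBuffer {
    private final CountingServer server;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    BatchingBuffer(CountingServer server, int batchSize) {
        this.server = server;
        this.batchSize = batchSize;
    }

    void put(String key) {
        pending.add(key);
        if (pending.size() >= batchSize) {
            flush(); // a full batch goes out as one transaction
        }
    }

    void flush() {
        if (!pending.isEmpty()) {
            server.commit(new ArrayList<>(pending));
            pending.clear();
        }
    }
}

class BatchingDemo {
    /** Loads n keys with the given batch size and returns the transaction count. */
    static int transactionsFor(int n, int batchSize) {
        CountingServer server = new CountingServer();
        BatchingBuffer buffer = new BatchingBuffer(server, batchSize);
        for (int i = 0; i < n; i++) {
            buffer.put("key-" + i);
        }
        buffer.flush(); // send any partial final batch
        return server.transactions;
    }

    public static void main(String[] args) {
        System.out.println("one put per transaction: " + transactionsFor(1000, 1));
        System.out.println("batches of 100:          " + transactionsFor(1000, 100));
    }
}
```

In this toy model, 1000 puts cost 1000 transactions when sent one at a time and 10 transactions in batches of 100. The real saving depends on the per-transaction overhead in your cluster.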
Implementation details:
The Cache interface (in ehcache-core) now exposes a few more methods that make bulk loading simple to use. Here are basic descriptions of those methods:
- setNodeCoherent(boolean) -- When false is passed, this call makes the cache incoherent. When true is passed, it makes the cache coherent again: the call stops the local buffer and waits until all the Terracotta transactions have been acked back.
- waitUntilClusterCoherent() -- This call will wait until all the nodes in the cluster are coherent.
- isNodeCoherent() -- Returns true if the current node is coherent.
- isClusterCoherent() -- Returns true if the cluster is coherent.
A simple example here:
// This call makes the cache incoherent and enables buffering and will use concurrent locks
cache.setNodeCoherent(false);
// Start loading
startLoading();
// This makes the cache coherent again and this will wait
// until all Terracotta transactions have been acked
// by the server
cache.setNodeCoherent(true);
cache.waitUntilClusterCoherent();
These methods are all that is needed to start using bulk load.
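If the loading code can throw, the cache would stay incoherent unless the mode is restored explicitly. One way to guard against that is a try/finally wrapper, sketched below. CoherenceControl, FakeCache and BulkLoadHelper are hypothetical names: the interface only mirrors the method names listed above, and the fake exists so the sketch runs without a Terracotta server. With a real cache you would call the same methods on the cache object directly.

```java
/** Hypothetical stand-in exposing only the coherence methods described above. */
interface CoherenceControl {
    void setNodeCoherent(boolean coherent);
    boolean isNodeCoherent();
    void waitUntilClusterCoherent();
}

/** In-memory fake so the sketch runs without a cluster. */
class FakeCache implements CoherenceControl {
    private boolean coherent = true;

    public void setNodeCoherent(boolean coherent) { this.coherent = coherent; }
    public boolean isNodeCoherent() { return coherent; }
    public void waitUntilClusterCoherent() { /* nothing to wait for in the fake */ }
}

class BulkLoadHelper {
    /** Runs the loader in incoherent mode and always restores coherent mode. */
    static void bulkLoad(CoherenceControl cache, Runnable loader) {
        cache.setNodeCoherent(false);
        try {
            loader.run();
        } finally {
            // Runs even if the loader throws, so the node is never left incoherent.
            cache.setNodeCoherent(true);
            cache.waitUntilClusterCoherent();
        }
    }

    public static void main(String[] args) {
        FakeCache cache = new FakeCache();
        bulkLoad(cache, () -> System.out.println("coherent while loading? " + cache.isNodeCoherent()));
        System.out.println("coherent after loading? " + cache.isNodeCoherent());
    }
}
```

The same shape works for the example above: replace the Runnable body with the call to startLoading().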
Performance
The sample app attached shows bulk load time reduced from 134 seconds to 10 seconds when bulk loading 100,000 objects.
Advanced tuning parameters
If you want to tune the bulk loading further, then here are some more tuning parameters:
- ehcache.incoherent.putsBatchTimeInMillis -- Time, in milliseconds, that the flush thread sleeps between successive flushes of the local buffer to the server.
- ehcache.incoherent.putsBatchSize -- Number of elements that are batched together in one (Terracotta concurrent) transaction.
- ehcache.incoherent.throttlePutsAtSize -- Number of elements allowed in the local buffer. Beyond this size, app threads calling put() sleep until the local buffer size drops below this number.
- ehcache.incoherent.logging -- If true, enables minimal logging (for example, when the cache switches between coherent and incoherent mode), which is useful for debugging.
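To see how the first three parameters interact, the following toy model may help. It is not the Ehcache implementation: BulkLoadTuningModel is a hypothetical class, the bounded queue only approximates the throttling behaviour, and the flush thread here sends a single batch per wake-up, which is an assumption made to keep the sketch short. The method parameters are named after the tuning properties purely for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class BulkLoadTuningModel {
    /** Simulates a bulk load and returns the size of each flushed batch. */
    static List<Integer> simulate(int puts, int throttlePutsAtSize,
                                  int putsBatchSize, long putsBatchTimeInMillis) {
        // Bounded buffer: put() blocks once throttlePutsAtSize elements are waiting.
        BlockingQueue<Integer> localBuffer = new ArrayBlockingQueue<>(throttlePutsAtSize);
        List<Integer> batchSizes = new ArrayList<>(); // written only by the flush thread
        AtomicBoolean loading = new AtomicBoolean(true);

        Thread flushThread = new Thread(() -> {
            List<Integer> batch = new ArrayList<>();
            try {
                while (loading.get() || !localBuffer.isEmpty()) {
                    batch.clear();
                    // Take at most putsBatchSize elements as one simulated transaction.
                    localBuffer.drainTo(batch, putsBatchSize);
                    if (!batch.isEmpty()) {
                        batchSizes.add(batch.size());
                    }
                    Thread.sleep(putsBatchTimeInMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        flushThread.start();

        try {
            for (int i = 0; i < puts; i++) {
                localBuffer.put(i); // the "app thread" waits here when throttled
            }
            loading.set(false);
            flushThread.join(); // after join, batchSizes is safe to read
        } catch (InterruptedException e) {
            loading.set(false);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during simulated load", e);
        }
        return batchSizes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = simulate(1000, 100, 50, 1L);
        System.out.println("simulated transactions: " + sizes.size());
    }
}
```

In this model a larger batch size means fewer transactions, a shorter sleep means the buffer drains faster, and a larger throttle size lets the loading thread run further ahead of the flush thread at the cost of more memory.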
Express installation steps
Here is the sample code, along with the ehcache.xml and tc-config.xml:
Sample code