
com.gemstone.gemfire.internal.cache





This package contains internal GemStone classes for implementing caching on top of the GemFire Distributed System.

Local Regions

LocalRegion implements the basic caching mechanism and allows subclasses to perform message distribution and other specializations of LocalRegion functionality. A LocalRegion is an implementation of the java.util.Map interface that supports expiration, callbacks, server cache communication, and so on.

LocalRegion has a RegionMap that holds the actual data for the region.

Most changes to an entry in a LocalRegion are performed in three steps (sketched in the example below):
* The entry is modified under synchronization using an EntryEventImpl. The event is also queued for later callback invocation under this synchronization.

* Distribution is allowed to occur outside of synchronization.

* Synchronization is again obtained on the entry and callbacks are invoked.
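
The pattern can be pictured with the following minimal sketch; the SketchEntry and SketchEvent types and the helper methods here are placeholders for illustration, not the actual LocalRegion, RegionMap, or EntryEventImpl internals.

// Hypothetical sketch of the three-step update pattern described above.
class EntryUpdateSketch {
    static class SketchEntry {
        Object value;
    }
    static class SketchEvent {
        final SketchEntry entry;
        final Object newValue;
        SketchEvent(SketchEntry entry, Object newValue) {
            this.entry = entry;
            this.newValue = newValue;
        }
    }

    private final java.util.Queue<SketchEvent> callbackQueue = new java.util.ArrayDeque<>();

    void updateEntry(SketchEntry entry, Object newValue) {
        SketchEvent event;
        synchronized (entry) {              // step 1: modify under entry synchronization
            entry.value = newValue;
            event = new SketchEvent(entry, newValue);
            callbackQueue.add(event);       // event queued for callbacks while still synchronized
        }
        distribute(event);                  // step 2: distribution happens outside synchronization
        synchronized (entry) {              // step 3: re-synchronize and invoke callbacks
            invokeCallbacks(event);
        }
    }

    void distribute(SketchEvent event) { /* send the change to other members */ }
    void invokeCallbacks(SketchEvent event) { /* fire CacheListener callbacks */ }
}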

A LocalRegion may also have a DiskRegion associated with it for persistence or overflow to disk.

Distributed Regions

DistributedRegion is a subclass of LocalRegion that interacts with locking and the DistributedSystem to implement distributed caching. Most DistributedRegion operations are carried out using subclasses of DistributedCacheOperation.

Partitioned Regions

The contents of a partitioned region are spread evenly across multiple members of a distributed system. From the user's standpoint, each member hosts a partition of the region, and data is moved from partition to partition in order to provide scalability and high availability. The actual implementation of partitioned regions divides each partition into sub-partitions named "buckets". A bucket may be moved from one partition to another partition in a process called "migration" when GemFire determines that the partitioned region's data is not spread evenly across all members. When a bucket reaches a maximum size, it is split in two and may be migrated to a different partition.

Data is split among buckets using the Extensible Hashing algorithm, which hashes data based upon the lower-order bits (the "mask") of the data's value (the Region entry's key, in the case of GemFire). All partitions of a given region share a directory that maintains a mapping between a mask and information about the bucket that holds data matching that mask. When an entry is placed into a partitioned region, the bucket directory is consulted to determine which member(s) of the distributed system should be updated. The Extensible Hashing algorithm is useful when a bucket fills up with data and needs to be split. Other hashing algorithms require a complete rebalancing of the partitioned region when a bucket is full. Extensible Hashing, however, only requires that the full bucket be split into two, thus allowing the other buckets to be accessed without delay. The diagram below demonstrates bucket splitting with extensible hashing.
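
As an illustration only, the sketch below shows how low-order bits of a key's hash can select a directory slot and how the directory doubles when the global depth grows; the class and field names are hypothetical and do not mirror the real bucket directory code.

// Hypothetical sketch of mask-based bucket selection and directory doubling.
class BucketDirectorySketch {
    // globalDepth determines how many low-order bits of the hash are used as the mask.
    private int globalDepth = 1;
    private Object[] directory = new Object[2];   // 2^globalDepth slots, each referring to a bucket

    // Select a directory slot from the low-order bits of the key's hash.
    int slotFor(Object key) {
        int mask = (1 << globalDepth) - 1;
        return key.hashCode() & mask;
    }

    // When a bucket's local depth would exceed the global depth, the directory
    // doubles in size; existing slots are duplicated so unrelated buckets are unaffected.
    void growDirectory() {
        globalDepth++;
        Object[] bigger = new Object[1 << globalDepth];
        for (int i = 0; i < bigger.length; i++) {
            bigger[i] = directory[i & (directory.length - 1)];
        }
        directory = bigger;
    }
}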

A BucketInfo contains metadata about a bucket (most importantly the locations of all copies of the bucket) that is distributed to members that access the partitioned region. Changes to the BucketDirectory metadata are coordinated through GemFire's distributed lock service. Inside a region partition are a number of Buckets that hold the values for keys matching the bucket's mask, as shown in the diagram below.

The total size (in bytes) of a bucket is maintained as key/value pairs are added. It is not necessary for the bucket to store the value of a region entry as an actual object, so the bucket stores the value in its serialized byte form. This takes up less space in the VM's heap and allows its size to be calculated accurately. The entry's key, however, is used when looking up data in the bucket and must be deserialized. As an estimate, the size of the key object is assumed to be the size of the object's serialized bytes. When an entry's value is replaced via an update operation, the size of the old value is subtracted from the total size before the size of the new value is added in. It is assumed that the key does not change size.
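
A minimal sketch of this size accounting, assuming a hypothetical serializedSize() helper in place of GemFire's actual serialization of the value:

// Hypothetical sketch of per-bucket size accounting.
class BucketSizeSketch {
    private long totalBytes;   // running total of the bucket's size in bytes

    // Placeholder estimate; the real implementation works with the value's
    // already-serialized byte form.
    private long serializedSize(Object o) {
        return o == null ? 0 : o.toString().getBytes().length;
    }

    // On a create, the estimated key size and serialized value size are added.
    void entryCreated(Object key, Object value) {
        totalBytes += serializedSize(key) + serializedSize(value);
    }

    // On an update, the old value's size is subtracted before the new value's
    // size is added; the key is assumed not to change size.
    void entryUpdated(Object oldValue, Object newValue) {
        totalBytes -= serializedSize(oldValue);
        totalBytes += serializedSize(newValue);
    }

    long totalBytes() {
        return totalBytes;
    }
}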

When a bucket's size exceeds the "maximum bucket size", it is split in two based on the extensible hashing algorithm: a new Bucket is created and populated with the key/value pairs that match its mask, the bucket's local depth is incremented by 1, and the global depth is updated if the new local depth exceeds the current global depth. The splitting process is repeated while all of the following conditions are met: the size of either bucket continues to exceed the "maximum bucket size", the full bucket has more than one element, and the global depth is less than the "maximum global depth".
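
An illustrative sketch of that splitting loop follows; BucketSketch and its methods are stand-ins for the real bucket internals, not the actual implementation.

// Hypothetical sketch of the repeated bucket split described above.
class BucketSplitSketch {
    // Placeholder for a bucket; the real class tracks entries, their serialized
    // sizes, and the bucket's mask.
    static class BucketSketch {
        int localDepth;
        long sizeInBytes() { return 0; }
        int elementCount() { return 0; }
        BucketSketch splitInTwo() { return new BucketSketch(); }  // moves entries matching the new mask
    }

    private int globalDepth = 1;

    // Repeat the split while the over-full bucket is still too large, still has
    // more than one element, and the global depth is below its maximum.
    void splitWhileNeeded(BucketSketch full, long maxBucketSize, int maxGlobalDepth) {
        while (full.sizeInBytes() > maxBucketSize
                && full.elementCount() > 1
                && globalDepth < maxGlobalDepth) {
            BucketSketch fresh = full.splitInTwo();
            full.localDepth++;                        // both halves are one bit "deeper"
            fresh.localDepth = full.localDepth;
            if (full.localDepth > globalDepth) {
                globalDepth = full.localDepth;        // the directory doubles when the global depth grows
            }
            // continue with whichever half is still over the limit
            full = (full.sizeInBytes() >= fresh.sizeInBytes()) ? full : fresh;
        }
    }
}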

Primary Bucket

One bucket instance is selected as the primary. All bucket operations target the primary, which then passes them on to the backup copies.
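
As a rough illustration (the real coordination involves BucketAdvisor and the partitioned region messaging classes), the routing could be sketched as:

// Hypothetical sketch of primary-first routing for bucket operations.
class PrimaryRoutingSketch {
    // Placeholder member handle; in GemFire this is a distributed member id.
    interface Member {
        void send(String operation, Object key, Object value);
    }

    private Member primary;                       // current primary for this bucket
    private final java.util.List<Member> backups = new java.util.ArrayList<>();

    // All operations on the bucket are sent to the primary.
    void put(Object key, Object value) {
        primary.send("put", key, value);
    }

    // Executed on the member hosting the primary: apply locally, then forward
    // the change to each backup copy.
    void applyOnPrimary(Object key, Object value) {
        for (Member backup : backups) {
            backup.send("put", key, value);
        }
    }
}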

Identification of the primary is tracked using metadata in BucketAdvisor. The following diagram shows the standard state transitions of the BucketAdvisor:

Partitioned Region Cache Listeners

User CacheListeners are registered on the PartitionedRegion. Activity in the Buckets may fire callbacks on the PartitionedRegion's CacheListeners. The following figures demonstrate the logic and sequence involved.

Definition of Participants
pr_A1            a pure accessor
pr_B1_pri        a datastore which hosts the primary for bucket B1
pr_B1_c1         a datastore which hosts copy 1 for bucket B1
pr_B1_c2         a datastore which hosts copy 2 for bucket B1
pr_A2_listener,
pr_A3_bridge,
pr_A4_gateway    pure accessors with a CacheListener, Bridge, or Gateway

Fig. 1 (Flow of Put to CacheListeners)
Columns: pr_A1 | pr_B1_pri | pr_B1_c1 | pr_B1_c2 | pr_A2_listener / pr_A3_bridge / pr_A4_gateway
putMessage1 -->operateOnPartitionRegion()
sync entry
  update entry
  UpdateOperation.distribute
    if bucket add adjunct.recips
    send ---------------------->--> see Fig. 2
<------- reply
    send ---------------------->--------->--> see Fig. 2
<------- reply
    if adjunct.recips > 0
      PutMessage2 (notificationOnly == true)
        send -------------------------->--------->--------->--> CacheListener fires on pr iff InterestPolicy != CACHE_CONTENT
<------- reply
  waitForReplies (from all above msgs)
  fire local CacheListener on pr
release entry sync

Fig. 2 (Processing of UpdateOperation by non-Primary Bucket Host)
sync entry
  update entry
  CacheListener fires on pr iff InterestPolicy != CACHE_CONTENT
release entry sync
reply
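
Purely as a sketch of the rule shown in Figs. 1 and 2 (the actual check lives in the partitioned region message processing, and the names below are placeholders), callbacks on the PartitionedRegion's listeners are gated on the interest policy:

// Hypothetical sketch: listeners registered on the PartitionedRegion fire for
// bucket activity only when the member's interest policy is not CACHE_CONTENT.
enum InterestPolicySketch { ALL, CACHE_CONTENT }

class ListenerDispatchSketch {
    private final InterestPolicySketch policy;

    ListenerDispatchSketch(InterestPolicySketch policy) {
        this.policy = policy;
    }

    void deliver(Runnable fireCacheListeners) {
        if (policy != InterestPolicySketch.CACHE_CONTENT) {
            fireCacheListeners.run();   // callbacks fire on the PartitionedRegion's listeners
        }
    }
}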

Migration

Buckets are "migrated" to other members of the distributed system to ensure that the contents of a partitioned region are spread evenly across all members of the distributed system that wish to host partitioned data. After a bucket is split, a migration operation is triggered. Migration may also occur when a Cache exceeds its maxPartitionedData threshold and when a new member that can host partitioned data joins the distributed system. Each member is consulted to determine how much partitioned region data it is currently hosting and the maximum amount of partitioned region data it can host (see com.gemstone.gemfire.cache.Cache#getMaxPartitionedData). The largest bucket hosted by the VM is migrated to the member with the largest percentage of space available for partitioned data. This ensures that data is spread evenly among members of the distributed system and that each member's space available for partitioned region data fills consistently. Migration continues until the amount of partitioned data hosted by the member initiating the migration falls below the average for all members. When a member that hosts partitions closes its Cache (see com.gemstone.gemfire.cache.Cache#close), the partitions are migrated to other hosts.
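
The target-selection rule described above could be sketched roughly as follows; the Member type, its fields, and the helper methods are assumptions for illustration, not the actual migration code.

import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of choosing a migration target and deciding whether to continue.
class MigrationSketch {
    // Placeholder view of a member's capacity for partitioned data.
    static class Member {
        long hostedBytes;     // partitioned region data currently hosted
        long maxBytes;        // maximum partitioned data this member may host

        double percentFree() {
            return maxBytes == 0 ? 0.0 : 1.0 - ((double) hostedBytes / maxBytes);
        }
    }

    // The largest local bucket is migrated to the member with the largest
    // percentage of space available for partitioned data.
    static Member chooseTarget(List<Member> candidates) {
        return candidates.stream()
                .max(Comparator.comparingDouble(Member::percentFree))
                .orElse(null);
    }

    // Migration repeats until the initiating member hosts no more than the
    // average amount of partitioned data.
    static boolean shouldContinue(Member self, List<Member> all) {
        double average = all.stream().mapToLong(m -> m.hostedBytes).average().orElse(0.0);
        return self.hostedBytes > average;
    }
}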

High Availability

The high availability (redundancy; see com.gemstone.gemfire.cache.PartitionAttributes#getRedundancy) feature of partitioned regions affects the implementation in a number of ways. When a bucket is created, the implementation uses the migration algorithm to determine the location(s) of any redundant copies of the bucket. A warning is logged if there is not enough room (or not enough members) to guarantee the redundancy of the partitioned region. When an entry is put into a redundant partitioned region, the key/value pair is distributed to each bucket according to the consistency specified by the region's scope. That is, if the region is DISTRIBUTED_ACK, the put operation does not return until it has received an acknowledgment from each bucket. When a get is performed on a partitioned region and the value is not already in the partitioned region's local cache, a targeted netSearch is performed. When there are redundant copies of the region's buckets, the netSearch chooses one bucket at random from which to fetch the value. If the bucket does not respond within a given timeout, the process is repeated on another randomly chosen redundant bucket. If the bucket has been migrated to another member, the member operating on the region re-consults its metadata and retries the operation. When redundant buckets are migrated from one machine to another, the implementation is careful to ensure that multiple copies of a bucket are not hosted by the same member.
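
A sketch of the targeted netSearch retry, assuming a hypothetical fetch-with-timeout helper rather than GemFire's actual messaging:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of retrying a targeted netSearch across redundant bucket copies.
class RedundantNetSearchSketch {
    // Placeholder handle on one copy of a bucket; returns null on timeout.
    interface BucketCopy {
        Object fetch(Object key, long timeoutMillis);
    }

    // Pick one redundant copy at random; if it does not respond within the
    // timeout, repeat on another randomly chosen copy.
    static Object netSearch(List<BucketCopy> copies, Object key, long timeoutMillis) {
        List<BucketCopy> remaining = new ArrayList<>(copies);
        Collections.shuffle(remaining);
        for (BucketCopy copy : remaining) {
            Object value = copy.fetch(key, timeoutMillis);
            if (value != null) {
                return value;
            }
            // timed out (or the bucket migrated): the caller re-consults its
            // metadata and retries against another copy
        }
        return null;
    }
}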

System Properties

All of the system properties used by GemFire are discussed here.



