

bigdata®





bigdata® is a scale-out data and computing fabric designed for commodity
hardware. Scale-out is achieved using key-range partitioned B+Trees and
distributed computing.  The architecture supports both
embedded and scale-out database applications.  Unisolated transactions
are supported and provide for extremely high read-write concurrency
when used as a sparse row store.  In addition, read-committed,
read-only, and fully isolated read-write transactions are supported
using Multi-Version Concurrency Control (MVCC).



Services



The bigdata architecture is broken down into several services: 


data service
The data service provides an API for reading
and writing on index partitions.

metadata service
The metadata service manages and locates
index partitions.

transaction service
The transaction service coordinates
transaction starts and commits and provides the integration point for
both unisolated and isolated transactions.

map/reduce service
The map/reduce service provides an API
for decomposing a problem across the distributed database.
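
As an illustration of how a key-range partitioned index can be located, here is a minimal sketch in Java (9 or later). It is not the bigdata metadata service API: the class `PartitionLocator`, its methods, and the data service names are hypothetical. It assumes each index partition is identified by its inclusive left separator key and that keys are ordered as unsigned byte[]s.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: maps a key to the index partition whose key range contains it. */
final class PartitionLocator {
    // Keys are compared byte by byte, treating each byte as unsigned.
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Each partition is registered under its inclusive left separator key.
    private final TreeMap<byte[], String> partitions = new TreeMap<>(UNSIGNED);

    /** Registers a partition that starts at leftSeparatorKey (assumed naming). */
    void addPartition(byte[] leftSeparatorKey, String dataServiceName) {
        partitions.put(leftSeparatorKey.clone(), dataServiceName);
    }

    /** Returns the data service holding the partition that covers the key. */
    String locate(byte[] key) {
        // The covering partition has the greatest separator key <= the probe key.
        Map.Entry<byte[], String> e = partitions.floorEntry(key);
        if (e == null) {
            throw new IllegalStateException("no partition covers the key");
        }
        return e.getValue();
    }
}
```

Registering the first partition under an empty byte[] makes it cover every key below the next separator, so a lookup always finds a partition.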



The current release supports distributed services using JINI, but
interest has been expressed in supporting other distributed
application architectures as well, including OSGi and Service
Component Architecture (SCA).





Note: Readers familiar with Google's research publications or with the
Apache Hadoop effort will recognize some similarities and some
differences.  For example, both Google and Hadoop use a
distributed file system for failover.  While bigdata may be deployed
in a similar manner using a third party distributed file system, it
also offers a store-level media replication strategy for addressing
failover.



Packages





{@link com.bigdata.cache}
A set of utility classes for
creating weak reference object caches.

{@link com.bigdata.io}
A set of utility classes for I/O.

{@link com.bigdata.util}
A set of utility classes.

{@link com.bigdata.rawstore}
A set of interfaces and
utility classes defining the low-level protocol for operations on a
persistence store.  Operations at this level are expressed in terms of
byte[] records and an "address" combining both the offset
at which a record was written and the length of the
record.
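
One way to combine an offset and a length into a single "address" is to pack them into a long. The sketch below is an assumption for illustration only: the class `RecordAddress`, its method names, and the 32/32 bit split are hypothetical and are not taken from the rawstore package.

```java
/** Hypothetical sketch: packs a record's offset and length into one long address. */
final class RecordAddress {
    private RecordAddress() {}

    /** Assumed layout: the high 32 bits hold the offset, the low 32 bits the length. */
    static long toAddr(int offset, int length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    /** Recovers the offset at which the record was written. */
    static int getOffset(long addr) {
        return (int) (addr >>> 32);
    }

    /** Recovers the length of the record in bytes. */
    static int getLength(long addr) {
        return (int) addr;
    }
}
```

Because the address carries the length, a reader can fetch the whole record in a single read without consulting a separate size table.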

{@link com.bigdata.btree}
This package provides
implementations of both a mutable B+Tree and a read-only B+Tree.  The
mutable {@link com.bigdata.btree.BTree} supports variable length
byte[] keys, a copy-on-write strategy for nodes and leaves which is
used to support transactional isolation, and remains balanced under
both insert and delete operations.  A B+Tree may be exported into a
read-only {@link com.bigdata.btree.IndexSegment} using an efficient
bulk index build utility.

{@link com.bigdata.isolation}
This package provides
specialized B+Tree classes designed to support transactional
isolation. This builds on the features of the base B+Tree package,
which already supports copy-on-write semantics, and on the Journal
package, which already supports a policy in which valid data are never
overwritten. The primary contribution of this package is a set of
extensions and wrapper classes that manage {@link
com.bigdata.isolation.IValue} objects wrapping application data
values. Each {@link com.bigdata.isolation.IValue} encapsulates a
version counter, which is used to detect write-write conflicts, and a
deleted flag, which is used to mark keys that have been deleted until
a full compacting merge can be performed.
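
The role of the version counter and the deleted flag can be sketched as follows. This is a simplified illustration, not the {@link com.bigdata.isolation.IValue} interface: the class `VersionedValue` and its methods are hypothetical, and the conflict rule (a conflict exists when the committed version differs from the version the transaction originally read) is an assumption.

```java
/** Hypothetical sketch: a value wrapper with a version counter and a deleted flag. */
final class VersionedValue {
    final long version;     // incremented on every write to the key
    final boolean deleted;  // true marks a deleted key until a compacting merge
    final byte[] value;     // null when the key is marked deleted

    private VersionedValue(long version, boolean deleted, byte[] value) {
        this.version = version;
        this.deleted = deleted;
        this.value = value;
    }

    /** First write of a key starts at version 0. */
    static VersionedValue insert(byte[] value) {
        return new VersionedValue(0L, false, value.clone());
    }

    /** Overwrites the value and advances the version counter. */
    VersionedValue update(byte[] newValue) {
        return new VersionedValue(version + 1, false, newValue.clone());
    }

    /** Marks the key deleted instead of removing it, and advances the version. */
    VersionedValue delete() {
        return new VersionedValue(version + 1, true, null);
    }

    /** Assumed rule: a write-write conflict exists if the committed version moved. */
    static boolean conflicts(VersionedValue readByTx, VersionedValue committedNow) {
        return readByTx.version != committedNow.version;
    }
}
```

At validation time a transaction would compare the version it read against the currently committed version for each key it wrote. A mismatch means another transaction committed a write to that key in the meantime.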

{@link com.bigdata.sparse}
This package provides a sparse
row store data model similar to Google's Bigtable or the HBase
component in the Apache Hadoop project.  A sparse row store is a data
model in which the B+Tree keys are formed as:

[schemaName][primaryKey][columnName][timestamp]
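
A key of this shape could be assembled as in the sketch below. The encoding is an assumption made for illustration and is not the format used by {@link com.bigdata.sparse}: each string component is written as UTF-8 followed by a 0x00 terminator, and the timestamp is written as an 8-byte big-endian long. The class `SparseRowKey` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: builds a [schemaName][primaryKey][columnName][timestamp] key. */
final class SparseRowKey {
    private SparseRowKey() {}

    static byte[] encode(String schemaName, String primaryKey,
                         String columnName, long timestamp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : new String[] {schemaName, primaryKey, columnName}) {
            byte[] b = part.getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            out.write(0); // assumed terminator separating the components
        }
        // ByteBuffer is big-endian by default, so non-negative timestamps
        // sort in numeric order under unsigned byte comparison.
        byte[] ts = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
        out.write(ts, 0, ts.length);
        return out.toByteArray();
    }
}
```

With this layout, all column values for one logical row are adjacent in the B+Tree, and successive versions of the same column are ordered by timestamp, so a key-range scan over the `[schemaName][primaryKey]` prefix reads a whole row.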



{@link com.bigdata.journal}
This package provides a fast
append-only persistence store.  The journal is designed to minimize
disk head movement and maximize the opportunity for sequential IO.
Typically, multiple indices are mapped onto the same journal in order
to minimize the number of distinct disk files and disk seeks on a server
platform.
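
The append-only write pattern can be sketched with an in-memory buffer standing in for the journal file. This is not the {@link com.bigdata.journal} API: the class `AppendOnlyStore` and its methods are hypothetical, and a real journal would write to disk and manage commit points.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical sketch: an append-only record store backed by an in-memory buffer. */
final class AppendOnlyStore {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    /** Appends the record at the end of the store and returns its offset. */
    int write(byte[] record) {
        int offset = buf.size();
        buf.write(record, 0, record.length);
        return offset;
    }

    /** Reads back length bytes starting at offset; existing data is never modified. */
    byte[] read(int offset, int length) {
        byte[] all = buf.toByteArray();
        if (offset < 0 || length < 0 || offset + length > all.length) {
            throw new IllegalArgumentException("range is outside the written extent");
        }
        return Arrays.copyOfRange(all, offset, offset + length);
    }
}
```

Every write lands at the current end of the store, which is what allows a disk-backed version to favor sequential IO over seeks.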

{@link com.bigdata.service}
This package realizes the
services for a distributed scale-out database.  The basic components
of the scale-out architecture are the {@link
com.bigdata.service.IDataService} and {@link com.bigdata.service.IMetadataService}.