/**

Copyright (C) SYSTAP, LLC DBA Blazegraph 2006-2016.  All rights reserved.

Contact:
     SYSTAP, LLC DBA Blazegraph
     2501 Calvert ST NW #106
     Washington, DC 20008
     licenses@blazegraph.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/
/*
 * Created on Oct 18, 2006
 */

package com.bigdata.journal;

import java.nio.ByteBuffer;
import java.util.UUID;

import com.bigdata.io.writecache.WriteCache;
import com.bigdata.quorum.Quorum;
import com.bigdata.rawstore.WormAddressManager;

/**
 * Interface for a root block on the journal. The root block provides metadata
 * about the journal. The journal has two root blocks. The root blocks are
 * written in an alternating order according to the Challis algorithm. Each root
 * block includes a field at the head and tail whose value is strictly
 * increasing. This field is often referred to as the root block "timestamp",
 * but in practice we use the commit counter. On restart, the root block is
 * chosen (a) whose head and tail fields agree; and (b) whose value on those
 * fields is greater. This protects against both crashes and partial writes of
 * the root block itself.
 * 
 * The commit counter is a store local strictly increasing non-negative long
 * integer (commit counters are distinct for each store regardless of whether
 * they are part of the same distributed database). The commit counters MUST be
 * strictly increasing (a) so that they place the commit records into a total
 * ordering; (b) so that the more current root block may be chosen by comparing
 * the value of the field in each of the two root blocks; and (c) so that a
 * partial write of a root block may be detected by the presence of different
 * values for the field at the head and tail of a given root block. The commit
 * counter is also used as the field written at the head and tail of each root
 * block according to the Challis algorithm. If those fields are the same then
 * the root block is assumed to have been completely written.
 * <p>
 * Note that random data may still result in an identical value during a partial
 * write. This possibility is guarded against by storing the checksum of the
 * root block.
 * <p>
 * The first and last commit times are persisted in each root block in order to
 * support both unisolated commits and transactions, whether in a local or a
 * distributed database. These "times" are generated by the appropriate
 * {@link ITransactionManagerService} service, which is responsible both for assigning
 * transaction start times (which are in fact the transaction identifier) and
 * transaction commit times, which are stored in the root blocks of the various
 * stores that participate in a given database and reported via
 * {@link #getFirstCommitTime()} and {@link #getLastCommitTime()}. While these
 * do not strictly speaking have to be "times" they do have to be assigned using
 * the same measure as the transaction identifiers, so either a coordinated time
 * server or a strictly increasing counter. Regardless, we need to know "when" a
 * transaction commits as well as "when" it starts whether we measure "when"
 * using a counter or a clock. Also note that we need to assign "commit times"
 * even when the operation is unisolated. This means that we have to coordinate
 * an unisolated commit on a store that is part of a distributed database with
 * the centralized transaction manager. This should be done as part of the group
 * commit since we are waiting at that point anyway to optimize IO by minimizing
 * syncs to disk.
 * <p>
 * Note that some file systems or disks can re-order the writes issued by the
 * application and write the data in a more efficient order. This can cause the
 * root blocks to be written before the application data is stable on disk. The
 * {@link Options#DOUBLE_SYNC} option exists to defeat this behavior and ensure
 * restart-safety for such systems.
 * 
 * @author Bryan Thompson
 * @version $Id$
 */
public interface IRootBlockView {

    /**
     * Throws an exception unless the root block is valid. Conditions tested
     * include the root block MAGIC and the root block timestamps (there are
     * two and they must agree).
     * 
     * @throws RootBlockException
     *             if the root block is not valid.
     */
    public void valid() throws RootBlockException;

    /**
     * There are two root blocks and they are written in an alternating order.
     * For the sake of distinction, the first one is referred to as "rootBlock0"
     * while the 2nd one is referred to as "rootBlock1". This method indicates
     * which root block is represented by this view based on metadata supplied
     * to the constructor (the distinction is not persistent on disk).
     * 
     * @return True iff the root block view was constructed from "rootBlock0".
     */
    public boolean isRootBlock0();

    /**
     * The root block version number.
     */
    public int getVersion();

    /**
     * The next offset at which a data item would be written on the store.
     * 
     * FIXME The RWStore has different semantics for this field. Document those
     * semantics and modify {@link AbstractJournal} so we can directly decide
     * how many bytes were "written" (for the WORM) or were "allocated" (for the
     * RWStore, in which case it should probably be the net of the bytes
     * allocated and released). Update all the locations in the code which rely
     * on {@link #getNextOffset()} to compute the #of bytes written onto the
     * store.
     */
    public long getNextOffset();

    /**
     * The database wide timestamp of the first commit on the store or 0L iff
     * there have been no commits. In a local database, this timestamp is
     * generated by a local timestamp service. In a distributed database, this
     * timestamp is generated by a shared timestamp service. The timestamps
     * returned by this method are strictly increasing for a given store and for
     * a given database.
     * 
     * @return The timestamp of the first commit on the store or 0L iff there
     *         have been no commits.
     */
    public long getFirstCommitTime();
    
    /**
     * The database wide timestamp of the most recent commit on the store or 0L
     * iff there have been no commits. In a local database, this timestamp is
     * generated by a local timestamp service. In a distributed database, this
     * timestamp is generated by a shared timestamp service. The timestamps
     * returned by this method are strictly increasing for a given store and for
     * a given database.
     * 
     * @return The timestamp of the most recent commit on the store or 0L iff
     *         there have been no commits.
     */
    public long getLastCommitTime();
    
    /**
     * The commit counter is a non-negative long integer that is strictly local to
     * the store. The commit counter is used to avoid problems with timestamps
     * generated by different machines or when time goes backwards or other
     * nasty stuff. The correct root block is chosen by selecting the valid
     * root block with the larger commit counter (the value of the commit counter
     * is reused by the {@link #getChallisField() Challis field}).
     * 
     * @return The commit counter.
     */
    public long getCommitCounter();
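
    /*
    Illustrative sketch only; it is not part of this interface or of the
    Blazegraph code base. Under simplifying assumptions, it shows how the
    current root block might be selected on restart: discard any block whose
    head and tail fields disagree or whose checksum fails, then prefer the
    larger commit counter. The names RootBlockChooserSketch, Block,
    headCounter, tailCounter and checksumOk are hypothetical.

```java
import java.util.Optional;

// Hypothetical, simplified model of root block selection; not the real implementation.
final class RootBlockChooserSketch {

    // Minimal stand-in for a root block: the Challis field at head and tail plus a checksum flag.
    static final class Block {
        final long headCounter;
        final long tailCounter;
        final boolean checksumOk;

        Block(long headCounter, long tailCounter, boolean checksumOk) {
            this.headCounter = headCounter;
            this.tailCounter = tailCounter;
            this.checksumOk = checksumOk;
        }

        // Usable only if the head and tail fields agree and the checksum verifies.
        boolean isValid() {
            return headCounter == tailCounter && checksumOk;
        }
    }

    // Returns the valid block with the larger commit counter, or empty if neither is valid.
    static Optional<Block> choose(Block rb0, Block rb1) {
        boolean ok0 = rb0 != null && rb0.isValid();
        boolean ok1 = rb1 != null && rb1.isValid();
        if (ok0 && ok1) {
            return Optional.of(rb0.headCounter >= rb1.headCounter ? rb0 : rb1);
        }
        if (ok0) {
            return Optional.of(rb0);
        }
        if (ok1) {
            return Optional.of(rb1);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Block older = new Block(41, 41, true);
        Block torn = new Block(42, 41, true); // partial write: head updated, tail stale
        System.out.println(choose(older, torn).get().headCounter); // prints 41
    }
}
```

    In the sketch, a torn write of the newer block falls back to the older
    block, which is the protection against partial root block writes that the
    class documentation describes.
    */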
    
    /**
     * Return the address at which the {@link ICommitRecord} for this root block
     * is stored. The {@link ICommitRecord}s are stored separately from the
     * root block so that they may be indexed by the commit timestamps. This is
     * necessary in order to be able to quickly recover the root addresses for a
     * given commit timestamp, which is a feature used to support transactional
     * isolation.
     * <p>
     * Note: When a logical journal may overflow onto more than one physical
     * journal then the address of the {@link ICommitRecord} MAY refer to a
     * historical physical journal and care MUST be exercised to resolve the
     * address against the appropriate journal file. [This paragraph is probably
     * not valid.  Verify and remove if it is not true.]
     * 
     * @return The address at which the {@link ICommitRecord} for this root
     *         block is stored.
     */
    public long getCommitRecordAddr();

    /**
     * The address of the root of the {@link CommitRecordIndex}. The
     * {@link CommitRecordIndex} contains the ordered addresses of the
     * historical {@link ICommitRecord}s on the {@link Journal}. The address
     * of the {@link CommitRecordIndex} is stored directly in the root block
     * rather than the {@link ICommitRecord} since we can not obtain this
     * address until after we have formatted and written the
     * {@link ICommitRecord}.
     */
    public long getCommitRecordIndexAddr();
    
    /**
     * The unique journal identifier.
     */
    public UUID getUUID();

    /*
     * @todo Consider putting the logical service UUID into the root blocks. It
     * is already in the service Entry[] (and the file system path) for
     * scale-out.
     */
//    /**
//     * The unique identifier for the logical service to which this journal
//     * belongs. All physical services for the same logical service will have the
//     * same logical service {@link UUID}. The logical service {@link UUID} is
//     * generated when the quorum leader creates the initial journal for a
//     * service and is written into the root blocks. From the root blocks it is
//     * replicated to the {@link Quorum} followers.
//     * <p>
//     * Note: The physical service UUID is NOT stored in the root blocks since
//     * that would make the root blocks incompatible when they are replicated to
//     * other nodes in the same logical service and high availability maintains
//     * binary compatibility when replicating a journal.
//     */
//    public UUID getLogicalServiceUUID();
    
    /**
     * The #of bits in a 64-bit long integer address that are dedicated to the
     * byte offset into the store.
     * 
     * @see WormAddressManager
     */
    public int getOffsetBits();
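
    /*
    Illustrative sketch only; it is not the WormAddressManager API. It assumes
    that a 64-bit address keeps the byte offset in the high offsetBits bits and
    the record's byte count in the remaining low bits. That layout, and the
    names OffsetBitsSketch, toAddr, getOffset and getByteCount, are assumptions
    made for illustration.

```java
// Hypothetical address codec; the bit layout is an assumption, not the documented format.
final class OffsetBitsSketch {

    private final int offsetBits;
    private final int byteCountBits;
    private final long byteCountMask;

    OffsetBitsSketch(int offsetBits) {
        if (offsetBits < 1 || offsetBits > 63) {
            throw new IllegalArgumentException("offsetBits must be in [1:63]");
        }
        this.offsetBits = offsetBits;
        this.byteCountBits = 64 - offsetBits;
        this.byteCountMask = (1L << byteCountBits) - 1;
    }

    // Packs the offset into the high bits and the byte count into the low bits.
    long toAddr(long offset, long byteCount) {
        if (offset < 0 || (offset >>> offsetBits) != 0) {
            throw new IllegalArgumentException("offset does not fit in " + offsetBits + " bits");
        }
        if (byteCount < 0 || (byteCount & ~byteCountMask) != 0) {
            throw new IllegalArgumentException("byteCount does not fit in " + byteCountBits + " bits");
        }
        return (offset << byteCountBits) | byteCount;
    }

    // Recovers the byte offset from the high bits of the address.
    long getOffset(long addr) {
        return addr >>> byteCountBits;
    }

    // Recovers the byte count from the low bits of the address.
    long getByteCount(long addr) {
        return addr & byteCountMask;
    }

    public static void main(String[] args) {
        OffsetBitsSketch am = new OffsetBitsSketch(42);
        long addr = am.toAddr(1_000_000L, 512L);
        System.out.println(am.getOffset(addr) + " " + am.getByteCount(addr)); // prints "1000000 512"
    }
}
```

    Under this assumption, more offset bits allow a larger store at the cost
    of a smaller maximum record size, and the reverse.
    */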
    
    /**
     * The timestamp assigned as the creation time for the journal.
     */
    public long getCreateTime();

    /**
     * The timestamp assigned as the time at which writes were disallowed for
     * the journal.
     */
    public long getCloseTime();

    /**
     * The type of the backing store: either a journal (log-structured store or
     * WORM) or a read-write store. Only two values are defined at present. As a
     * byte value, ZERO (0) is a WORM and ONE (1) is a read/write store.
     */
    public StoreTypeEnum getStoreType();

    /**
     * For the {@link StoreTypeEnum#RW} store, the address from which the
     * metadata bits (metabits) are read when the store starts up. This is a
     * byte offset into the file and is stored as a long integer. Normal
     * addresses are calculated with reference to the allocation blocks. The
     * value for a WORM store is ZERO (0).
     */
    public long getMetaBitsAddr();

    /**
     * For the {@link StoreTypeEnum#RW} store, the start of the area of the file
     * where the allocation blocks are allocated. This is also a byte offset
     * into the file and is stored as a 64-bit integer. It is called
     * metaStartAddr because that is the offset that is used with the
     * metaBitsAddr to determine how to find the allocation blocks. The value
     * for a WORM store is ZERO (0).
     */
    public long getMetaStartAddr();

    /**
     * The {@link Quorum} token associated with this commit point or
     * {@link Quorum#NO_QUORUM} if there was no quorum.
     * 
     * Note: If commit points are part of the resynchronization protocol, they
     * MUST NOT use the current quorum token unless the service is synchronized
     * with the quorum at that commit point.
     */
    public long getQuorumToken();

    /**
     * Return the #of {@link WriteCache} blocks that have been written out as
     * part of the current write set. This value is origin ZERO (0) and is reset
     * to ZERO (0) after each commit or abort.
     */
    public long getBlockSequence();

    /**
     * The value used for {@link #getBlockSequence()} for both historical stores
     * and for stores that do not support this concept.
     */
    long NO_BLOCK_SEQUENCE = 0L;
    
    /**
     * A read-only buffer whose contents are the root block. The position,
     * limit, and mark will be independent for each {@link ByteBuffer} that is
     * returned by this method.
     */
    public ByteBuffer asReadOnlyBuffer();
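
    /*
    Illustrative sketch only; it is not how any Blazegraph class implements
    this method. It uses the standard java.nio.ByteBuffer#asReadOnlyBuffer()
    call to show what "independent position, limit, and mark" means for two
    views over the same bytes. The class ReadOnlyViewSketch and the helper
    rejectsWrites are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

// Hypothetical holder that hands out independent read-only views over the same bytes.
final class ReadOnlyViewSketch {

    private final ByteBuffer data;

    ReadOnlyViewSketch(byte[] bytes) {
        // Defensive copy so callers cannot change the bytes behind the views.
        this.data = ByteBuffer.wrap(bytes.clone());
    }

    // Each call returns a new view with its own position, limit and mark.
    ByteBuffer asReadOnlyBuffer() {
        return data.asReadOnlyBuffer();
    }

    // Returns true iff a write through the view is rejected.
    static boolean rejectsWrites(ByteBuffer view) {
        try {
            view.put(0, (byte) 1);
            return false;
        } catch (ReadOnlyBufferException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        ReadOnlyViewSketch block = new ReadOnlyViewSketch(new byte[16]);
        ByteBuffer a = block.asReadOnlyBuffer();
        ByteBuffer b = block.asReadOnlyBuffer();
        a.getLong(); // advances only the position of view a
        System.out.println(a.position() + " " + b.position()); // prints "8 0"
        System.out.println(rejectsWrites(a)); // prints "true"
    }
}
```

    Because each caller gets its own view, concurrent readers can decode the
    same bytes without coordinating buffer positions.
    */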

    /**
     * Return a version of the caller's root block that is flagged as either
     * rootBlock0 or rootBlock1 as indicated by the argument. Root blocks are
     * immutable, so this will either return your argument or return a new
     * {@link IRootBlockView} in which the sense of the {@link #isRootBlock0()}
     * flag is now correct.
     * 
     * @param rootBlock0
     *            Whether you want rootBlock0 or rootBlock1.
     * 
     * @return The root block.
     */
    public IRootBlockView asRootBlock(final boolean rootBlock0);

}