/*
 * package-info.java
 * Written by Gil Tene of Azul Systems, and released to the public domain,
 * as explained at http://creativecommons.org/publicdomain/zero/1.0/
 */

/**
 * 

A High Dynamic Range (HDR) Histogram Package

*

* An HdrHistogram histogram supports the recording and analysis of sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.

*

* In contrast to traditional histograms that use linear, logarithmic, or arbitrary sized bins or buckets, HdrHistograms use a fixed storage internal data representation that simultaneously supports an arbitrarily high dynamic range and arbitrary precision throughout that dynamic range. This capability makes HdrHistograms extremely useful for tracking and reporting on the distribution of percentile values with high resolution and across a wide dynamic range -- a common need in latency behavior characterization.

*

* The HdrHistogram package was specifically designed with latency and performance sensitive applications in mind. Experimental u-benchmark measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. All Histogram variants can maintain a fixed cost in both space and time. When not configured to auto-resize, a Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.

* NOTE: Histograms can optionally be configured to auto-resize their dynamic range as a convenience feature. When configured to auto-resize, recording operations that need to expand a histogram will auto-resize its dynamic range to include recorded values as they are encountered. Note that recording calls that cause auto-resizing may take longer to execute, and that resizing incurs allocation and copying of internal data structures.

*

* The combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded value count information is kept in high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data value samples.

*

* An HdrHistogram histogram is usually configured to maintain value count data with a resolution good enough to support a desired precision in post-recording analysis and reporting on the collected data. Analysis can include the computation and reporting of distribution by percentiles, linear or logarithmic arbitrary value buckets, mean and standard deviation, as well as any other computations that can be supported using the various iteration techniques available on the collected value count data. In practice, precision levels of 2 or 3 significant digits are most commonly used, as they maintain a value accuracy of +/- ~1% or +/- ~0.1% respectively for derived distribution statistics.

*

* A good example of HdrHistogram use would be tracking of latencies across a wide dynamic range, e.g. from a microsecond to an hour. A Histogram can be configured to track and later report on the counts of observed integer usec-unit latency values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000 and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB (see "Footprint estimation" below).
* Code for this use example would include these basic elements:
*

 * 
 * {@link org.HdrHistogram.Histogram} histogram = new {@link org.HdrHistogram.Histogram}(3600000000L, 3);
 * .
 * .
 * .
 * // Repeatedly record measured latencies:
 * histogram.{@link org.HdrHistogram.AbstractHistogram#recordValue(long) recordValue}(latency);
 * .
 * .
 * .
 * // Report histogram percentiles, expressed in msec units:
 * histogram.{@link org.HdrHistogram.AbstractHistogram#outputPercentileDistribution(java.io.PrintStream, Double) outputPercentileDistribution}(histogramLog, 1000.0);
 * 
 * 
* Specifying 3 significant digits of precision in this example guarantees that value quantization within the value range will be no larger than 1/1,000th (or 0.1%) of any recorded value. This example Histogram can therefore be used to track, analyze and report the counts of observed latencies ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond (or better) up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

Histogram variants and internal representation

* The HdrHistogram package includes multiple implementations of the {@link org.HdrHistogram.AbstractHistogram} class:
  • {@link org.HdrHistogram.Histogram}, which is the commonly used Histogram form and tracks value counts in long fields.
  • {@link org.HdrHistogram.IntCountsHistogram} and {@link org.HdrHistogram.ShortCountsHistogram}, which track value counts in int and short fields respectively, and are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial (e.g. systems where tens of thousands of in-memory histograms are being tracked).
  • {@link org.HdrHistogram.AtomicHistogram}, {@link org.HdrHistogram.ConcurrentHistogram} and {@link org.HdrHistogram.SynchronizedHistogram}, which support concurrent access (see "Synchronization and concurrent access" below).

* Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating point number representation: using an exponent and a (non-normalized) mantissa to support a wide dynamic range at a high but varying (by exponent value) resolution. AbstractHistogram uses exponentially increasing bucket value ranges (the parallel of the exponent portion of a floating point number), with each bucket containing a fixed number (per bucket) of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and numberOfSignificantValueDigits controlling resolution.
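* The exponent/mantissa idea can be illustrated with a small self-contained sketch. It is not the library's implementation: the class and method names are hypothetical, a lowest discernible value of 1 is assumed, and the sub-bucket count is assumed to be 2 * 10^numberOfSignificantValueDigits rounded up to a power of two (as in "Footprint estimation" below).

```java
// Illustrative sketch only: hypothetical names, not the HdrHistogram API.
final class BucketIndexSketch {
    private final int subBucketCountMagnitude; // log2 of the sub-buckets per bucket
    private final long subBucketMask;

    BucketIndexSketch(int numberOfSignificantValueDigits) {
        long largestValueWithSingleUnitResolution = 2;
        for (int i = 0; i < numberOfSignificantValueDigits; i++) {
            largestValueWithSingleUnitResolution *= 10;
        }
        // Round up to a power of two so that all index math reduces to shifts.
        subBucketCountMagnitude =
                64 - Long.numberOfLeadingZeros(largestValueWithSingleUnitResolution - 1);
        subBucketMask = (1L << subBucketCountMagnitude) - 1;
    }

    // The "exponent": how many times the unit size has doubled for this value.
    int bucketIndex(long value) {
        return (64 - subBucketCountMagnitude)
                - Long.numberOfLeadingZeros(value | subBucketMask);
    }

    // The "non-normalized mantissa": the linear position within that bucket.
    int subBucketIndex(long value) {
        return (int) (value >>> bucketIndex(value));
    }

    public static void main(String[] args) {
        BucketIndexSketch sketch = new BucketIndexSketch(3);
        long[] samples = {1L, 1_000L, 2_048L, 1_000_000L, 3_600_000_000L};
        for (long v : samples) {
            System.out.println(v + " -> bucket " + sketch.bucketIndex(v)
                    + ", sub-bucket " + sketch.subBucketIndex(v));
        }
    }
}
```

* Both indices are computed directly from the value with a bit count and a shift, which is consistent with the constant-time, search-free recording described earlier.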

*

Synchronization and concurrent access

* In the interest of keeping value recording cost to a minimum, the commonly used {@link org.HdrHistogram.Histogram} class and its {@link org.HdrHistogram.IntCountsHistogram} and {@link org.HdrHistogram.ShortCountsHistogram} variants are NOT internally synchronized, and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries against Histogram objects should either take care to externally synchronize and/or order their access, or use the {@link org.HdrHistogram.ConcurrentHistogram}, {@link org.HdrHistogram.AtomicHistogram}, or {@link org.HdrHistogram.SynchronizedHistogram} variants.

* A common pattern seen in histogram value recording involves recording values in a critical path (multi-threaded or not), coupled with a non-critical path reading the recorded data for summary/reporting purposes. When such continuous non-blocking recording operation (concurrent or not) is desired even when sampling, analyzing, or reporting operations are needed, consider using the {@link org.HdrHistogram.Recorder} and {@link org.HdrHistogram.SingleWriterRecorder} variants that were specifically designed for that purpose. Recorders provide a recording API similar to Histogram, and internally maintain and coordinate active/inactive histograms such that recording remains wait-free in the presence of accurate and stable interval sampling.

*

* It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread non-synchronized histograms or {@link org.HdrHistogram.SingleWriterRecorder}s, and to have a summary/reporting thread perform histogram aggregation math across time and/or threads.
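* A minimal sketch of this per-thread pattern, using plain long[] count arrays as stand-ins for histograms. The class and method names are hypothetical, and the slot chosen for each sample is arbitrary; only the additive aggregation step is the point.

```java
// Illustrative sketch only: long[] arrays stand in for per-thread histograms.
final class AdditiveCountsSketch {

    static long[] recordAndAggregate(int threads, int samplesPerThread, int slots) {
        long[][] perThreadCounts = new long[threads][slots];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final long[] mine = perThreadCounts[t];
            final int offset = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < samplesPerThread; i++) {
                    // No synchronization needed: only this thread writes to 'mine'.
                    mine[(offset + i) % slots]++;
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join() makes each worker's writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        // The reporting thread aggregates by adding counts slot by slot.
        long[] total = new long[slots];
        for (long[] counts : perThreadCounts) {
            for (int i = 0; i < slots; i++) {
                total[i] += counts[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] total = recordAndAggregate(4, 1_000, 10);
        long sum = 0;
        for (long count : total) {
            sum += count;
        }
        System.out.println("total samples: " + sum);
    }
}
```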

*

Iteration

* Histograms support multiple convenient forms of iterating through the histogram data set, including linear, logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or each possible value level. The iteration mechanisms all provide {@link org.HdrHistogram.HistogramIterationValue} data points along the histogram's iterated data set, and are available via the following methods:
  • {@link org.HdrHistogram.AbstractHistogram#percentiles percentiles}: An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram.HistogramIterationValue}{@literal >} through the histogram using a {@link org.HdrHistogram.PercentileIterator}
  • {@link org.HdrHistogram.AbstractHistogram#linearBucketValues linearBucketValues}: An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram.HistogramIterationValue}{@literal >} through the histogram using a {@link org.HdrHistogram.LinearIterator}
  • {@link org.HdrHistogram.AbstractHistogram#logarithmicBucketValues logarithmicBucketValues}: An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram.HistogramIterationValue}{@literal >} through the histogram using a {@link org.HdrHistogram.LogarithmicIterator}
  • {@link org.HdrHistogram.AbstractHistogram#recordedValues recordedValues}: An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram.HistogramIterationValue}{@literal >} through the histogram using a {@link org.HdrHistogram.RecordedValuesIterator}
  • {@link org.HdrHistogram.AbstractHistogram#allValues allValues}: An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram.HistogramIterationValue}{@literal >} through the histogram using a {@link org.HdrHistogram.AllValuesIterator}

* Iteration is typically done with a for-each loop statement. E.g.:


 * for (HistogramIterationValue v : histogram.percentiles(percentileTicksPerHalfDistance)) {
 *     ...
 * }
 * 
* or *

 * for (HistogramIterationValue v : histogram.linearBucketValues(valueUnitsPerBucket)) {
 *     ...
 * }
 * 
 * 
* The iterators associated with each iteration method are resettable, such that a caller that would like to avoid allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the histogram. This iterator re-use usually takes the form of an explicit loop using the Iterator's hasNext() and next() methods:
*
 * 
 * PercentileIterator iter = new PercentileIterator(histogram, percentileTicksPerHalfDistance);
 * ...
 * iter.reset(percentileTicksPerHalfDistance);
 * while (iter.hasNext()) {
 *     HistogramIterationValue v = iter.next();
 *     ...
 * }
 * 
 * 
*

Equivalent Values and value ranges

*

* Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. Histogram provides methods for determining the lowest and highest equivalent values for any given value, as well as determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting).
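* The notion of equivalence can be sketched as follows, assuming 3 significant digits (2048 linear sub-buckets per bucket) and a lowest discernible value of 1. The method names are chosen to read like the operations described above, but the bodies are an illustration rather than the library's code.

```java
// Illustrative sketch only: assumes 3 significant digits and unit size 1.
final class EquivalentValuesSketch {
    private static final int SUB_BUCKET_COUNT_MAGNITUDE = 11; // 2^11 = 2048
    private static final long SUB_BUCKET_MASK = (1L << SUB_BUCKET_COUNT_MAGNITUDE) - 1;

    // log2 of the width of the equivalent-value range that contains 'value'.
    static int rangeShift(long value) {
        return (64 - SUB_BUCKET_COUNT_MAGNITUDE)
                - Long.numberOfLeadingZeros(value | SUB_BUCKET_MASK);
    }

    static long lowestEquivalentValue(long value) {
        int shift = rangeShift(value);
        return (value >>> shift) << shift;
    }

    static long nextNonEquivalentValue(long value) {
        return lowestEquivalentValue(value) + (1L << rangeShift(value));
    }

    static long highestEquivalentValue(long value) {
        return nextNonEquivalentValue(value) - 1;
    }

    static boolean valuesAreEquivalent(long a, long b) {
        return lowestEquivalentValue(a) == lowestEquivalentValue(b);
    }

    public static void main(String[] args) {
        // Visit each distinct value level exactly once, without double-counting.
        long levels = 0;
        for (long v = 0; v <= 3_600_000_000L; v = nextNonEquivalentValue(v)) {
            levels++;
        }
        System.out.println("distinct value levels up to one hour in usec: " + levels);
    }
}
```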

*

Raw vs. corrected recording

*

* Regular, raw value data recording into an HdrHistogram is achieved with the {@link org.HdrHistogram.AbstractHistogram#recordValue(long) recordValue()} method.

* Histogram variants also provide an auto-correcting {@link org.HdrHistogram.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()} form in support of a common use case found when histogram values are used to track response time distribution in the presence of Coordinated Omission - an extremely common phenomenon found in latency recording systems. This correcting form is useful in [e.g. load generator] scenarios where measured response times may exceed the expected interval between issuing requests, leading to the "omission" of response time measurements that would typically correlate with "bad" results. This coordinated (non random) omission of source data, if left uncorrected, will then dramatically skew any overall latency stats computed on the recorded information, as the recorded data set itself will be significantly skewed towards good results.

*

* When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, recorded histogram data will reflect an appropriate number of additional values, linearly decreasing in steps of expectedIntervalBetweenValueSamples, down to the last value that would still be higher than expectedIntervalBetweenValueSamples.

*

* To illustrate why this corrective behavior is critically needed in order to accurately represent value distribution when large value measurements may lead to missed samples, imagine a system for which response time samples are taken once every 10 msec to characterize response time distribution. The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample showing a 1msec response time value. The hypothetical system then encounters a 100 sec pause during which only a single sample is recorded (with a 100 second value). A normally recorded (uncorrected) data histogram collected for such a hypothetical system (over the 200 second scenario above) would show ~99.99% of results at 1msec or below, which is obviously "not right". In contrast, a histogram that records the same data using the auto-correcting {@link org.HdrHistogram.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()} method with the knowledge of an expectedIntervalBetweenValueSamples of 10msec will correctly represent the real world response time distribution of this hypothetical system. Only ~50% of results will be at 1msec or below, with the remaining 50% coming from the auto-generated value records covering the missing increments spread between 10msec and 100 sec.
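* The arithmetic of this scenario can be reproduced with a small self-contained simulation. It is a sketch under stated assumptions, not the library's code: values are kept in a plain list rather than a histogram, all names are hypothetical, and the back-fill is assumed to continue down to (and including) the expected interval itself, matching the "10msec to 100 sec" spread described above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a list of values stands in for a histogram.
final class CoordinatedOmissionSketch {

    // Records 'value', then back-fills value - k * expectedInterval for as long
    // as the back-filled value is at least expectedInterval (assumed stopping rule).
    static void recordValueWithExpectedInterval(List<Long> recorded, long value,
                                                long expectedInterval) {
        recorded.add(value);
        if (expectedInterval <= 0) {
            return;
        }
        for (long missing = value - expectedInterval;
             missing >= expectedInterval;
             missing -= expectedInterval) {
            recorded.add(missing);
        }
    }

    // 10,000 samples of 1 msec followed by a single 100 sec (100,000 msec) sample.
    static List<Long> recordScenario(boolean corrected) {
        final long expectedIntervalMsec = 10;
        List<Long> recorded = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            record(recorded, 1, expectedIntervalMsec, corrected);
        }
        record(recorded, 100_000, expectedIntervalMsec, corrected);
        return recorded;
    }

    private static void record(List<Long> recorded, long value, long interval,
                               boolean corrected) {
        if (corrected) {
            recordValueWithExpectedInterval(recorded, value, interval);
        } else {
            recorded.add(value);
        }
    }

    static double fractionAtOrBelow(List<Long> recorded, long threshold) {
        long matching = 0;
        for (long v : recorded) {
            if (v <= threshold) {
                matching++;
            }
        }
        return (double) matching / recorded.size();
    }

    public static void main(String[] args) {
        System.out.println("uncorrected fraction at or below 1 msec: "
                + fractionAtOrBelow(recordScenario(false), 1));
        System.out.println("corrected fraction at or below 1 msec: "
                + fractionAtOrBelow(recordScenario(true), 1));
    }
}
```

* Under these assumptions the uncorrected data set holds 10,001 values and the corrected one 20,000, which is where the ~99.99% and ~50% figures come from.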

*

* Data sets recorded with {@link org.HdrHistogram.AbstractHistogram#recordValue(long) recordValue()} and with {@link org.HdrHistogram.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()} will differ only if at least one value recorded was greater than its associated expectedIntervalBetweenValueSamples parameter. Data sets recorded with {@link org.HdrHistogram.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()} will be identical to ones recorded with {@link org.HdrHistogram.AbstractHistogram#recordValue(long) recordValue()} if all values recorded were smaller than their associated expectedIntervalBetweenValueSamples parameters.

*

* In addition to the at-recording-time correction option, Histogram variants also provide the post-recording correction methods {@link org.HdrHistogram.AbstractHistogram#copyCorrectedForCoordinatedOmission(long) copyCorrectedForCoordinatedOmission()} and {@link org.HdrHistogram.AbstractHistogram#addWhileCorrectingForCoordinatedOmission(AbstractHistogram, long) addWhileCorrectingForCoordinatedOmission()}. These methods can be used for post-recording correction, and are useful when the expectedIntervalBetweenValueSamples parameter is estimated to be the same for all recorded values. However, it is important to note that only one correction method (during or post recording) should be used on a given histogram data set, since applying both would correct the same data twice.

*

* When used for response time characterization, the recording with the optional expectedIntervalBetweenValueSamples parameter will tend to produce data sets that would much more accurately reflect the response time distribution that a random, uncoordinated request would have experienced.

*

Floating point values and DoubleHistogram variants

* The above discussion relates to integer value histograms (the various subclasses of {@link org.HdrHistogram.AbstractHistogram} and their related supporting classes). HdrHistogram supports floating point value recording and reporting with a similar set of classes, including the {@link org.HdrHistogram.DoubleHistogram}, {@link org.HdrHistogram.ConcurrentDoubleHistogram} and {@link org.HdrHistogram.SynchronizedDoubleHistogram} histogram classes. Support for floating point value iteration is provided with {@link org.HdrHistogram.DoubleHistogramIterationValue} and related iterator classes ({@link org.HdrHistogram.DoubleLinearIterator}, {@link org.HdrHistogram.DoubleLogarithmicIterator}, {@link org.HdrHistogram.DoublePercentileIterator}, {@link org.HdrHistogram.DoubleRecordedValuesIterator}, {@link org.HdrHistogram.DoubleAllValuesIterator}). Support for interval recording is provided with {@link org.HdrHistogram.DoubleRecorder} and {@link org.HdrHistogram.SingleWriterDoubleRecorder}.

Auto-ranging in floating point histograms

* Unlike integer value based histograms, the specific value range tracked by a {@link org.HdrHistogram.DoubleHistogram} (and variants) is not specified upfront. Only the dynamic range of values that the histogram can cover is (optionally) specified. E.g. when a {@link org.HdrHistogram.DoubleHistogram} is created to track a dynamic range of 3600000000000 (enough to track values from a nanosecond to an hour), values could be recorded into it in any consistent unit of time as long as the ratio between the highest and lowest non-zero values stays within the specified dynamic range, so recording in units of nanoseconds (1.0 thru 3600000000000.0), milliseconds (0.000001 thru 3600000.0), seconds (0.000000001 thru 3600.0), or hours (1/3.6E12 thru 1.0) will all work just as well.

Footprint estimation

* Due to its dynamic range representation, Histogram is relatively efficient in memory space requirements given the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved for a given highestTrackableValue and numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by its data value recording counts array. The total footprint can be conservatively estimated by:

 *     largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits);
 *     subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);

 *     expectedHistogramFootprintInBytes = 512 +
 *          ({primitive type size} / 2) *
 *          (log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
 *          subBucketSize
 *
 * 
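* The estimate above can be worked through in code. The following is a direct transcription of the formula into a self-contained sketch; the class and method names are hypothetical, and integer division is assumed for highestTrackableValue / subBucketSize.

```java
// Illustrative sketch only: a transcription of the footprint formula above.
final class FootprintEstimateSketch {

    static long roundedUpToNearestPowerOf2(long x) {
        return x <= 1 ? 1 : 1L << (64 - Long.numberOfLeadingZeros(x - 1));
    }

    static int log2RoundedUp(long x) {
        return x <= 1 ? 0 : 64 - Long.numberOfLeadingZeros(x - 1);
    }

    static long estimateFootprintInBytes(long highestTrackableValue,
                                         int numberOfSignificantValueDigits,
                                         int primitiveTypeSizeInBytes) {
        long largestValueWithSingleUnitResolution = 2;
        for (int i = 0; i < numberOfSignificantValueDigits; i++) {
            largestValueWithSingleUnitResolution *= 10;
        }
        long subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);
        long bucketTerm = log2RoundedUp(highestTrackableValue / subBucketSize) + 2;
        // Multiply before halving so that odd primitive sizes do not truncate early.
        return 512 + (primitiveTypeSizeInBytes * bucketTerm * subBucketSize) / 2;
    }

    public static void main(String[] args) {
        // The latency example: one hour in usec, 3 significant digits, long counts.
        long bytes = estimateFootprintInBytes(3_600_000_000L, 3, Long.BYTES);
        System.out.println(bytes + " bytes (~" + (bytes / 1024.0) + " KB)");
    }
}
```

* For the latency example (highestTrackableValue of 3,600,000,000, 3 significant digits, 8-byte long counts) this gives subBucketSize = 2048 and 512 + 4 * 23 * 2048 = 188,928 bytes, which is the "around 185KB" figure quoted earlier.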
* A conservative (high) estimate of a Histogram's footprint in bytes is available via the {@link org.HdrHistogram.AbstractHistogram#getEstimatedFootprintInBytes() getEstimatedFootprintInBytes()} method.
 */
package org.HdrHistogram;



