com.google.common.hash.HashFunction Maven / Gradle / Ivy

The Adobe Experience Manager SDK
/*
 * Copyright (C) 2011 The Guava Authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
 * in compliance with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software distributed under the License
 * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
 * or implied. See the License for the specific language governing permissions and limitations under
 * the License.
 */
package com.google.common.hash;

import com.google.common.annotations.Beta;
import com.google.common.primitives.Ints;
import java.nio.charset.Charset;

/**
 *  A hash function is a collision-averse pure function that maps an arbitrary block of
 *  data to a number called a hash code.
 *
 *  <h3>Definition</h3>
 *
 *  <p>Unpacking this definition:
 *
 *  <ul>
 *  <li><b>block of data:</b> the input for a hash function is always, in concept, an
 *      ordered byte array. This hashing API accepts an arbitrary sequence of byte and
 *      multibyte values (via {@link Hasher}), but this is merely a convenience; these are
 *      always translated into raw byte sequences under the covers.
 *
 *  <li><b>hash code:</b> each hash function always yields hash codes of the same fixed bit
 *      length (given by {@link #bits}). For example, {@link Hashing#sha1} produces a
 *      160-bit number, while {@link Hashing#murmur3_32()} yields only 32 bits. Because a
 *      {@code long} value is clearly insufficient to hold all hash code values, this API
 *      represents a hash code as an instance of {@link HashCode}.
 *
 *  <li><b>pure function:</b> the value produced must depend only on the input bytes, in
 *      the order they appear. Input data is never modified. {@link HashFunction} instances
 *      should always be stateless, and therefore thread-safe.
 *
 *  <li><b>collision-averse:</b> while it can't be helped that a hash function will
 *      sometimes produce the same hash code for distinct inputs (a "collision"), every
 *      hash function strives to some degree to make this unlikely. (Without this
 *      condition, a function that always returns zero could be called a hash function. It
 *      is not.)
 *  </ul>
 *
 *  <p>Summarizing the last two points: "equal yield equal always; unequal yield
 *  unequal often." This is the most important characteristic of all hash functions.
 *
 *  <h3>Desirable properties</h3>
 *
 *  <p>A high-quality hash function strives for some subset of the following virtues:
 *
 *  <ul>
 *  <li><b>collision-resistant:</b> while the definition above requires making at least
 *      some token attempt, one measure of the quality of a hash function is how
 *      well it succeeds at this goal. Important note: it may be easy to achieve the
 *      theoretical minimum collision rate when using completely random sample
 *      input. The true test of a hash function is how it performs on representative
 *      real-world data, which tends to contain many hidden patterns and clumps. The goal
 *      of a good hash function is to stamp these patterns out as thoroughly as possible.
 *
 *  <li><b>bit-dispersing:</b> masking out any single bit from a hash code should
 *      yield only the expected twofold increase to all collision rates. Informally,
 *      the "information" in the hash code should be as evenly "spread out" through the
 *      hash code's bits as possible. The result is that, for example, when choosing a
 *      bucket in a hash table of size 2^8, any eight bits could be consistently
 *      used.
 *
 *  <li><b>cryptographic:</b> certain hash functions such as {@link Hashing#sha512} are
 *      designed to make it as infeasible as possible to reverse-engineer the input that
 *      produced a given hash code, or even to discover any two distinct inputs that
 *      yield the same result. These are called cryptographic hash functions. But,
 *      whenever it is learned that either of these feats has become computationally
 *      feasible, the function is deemed "broken" and should no longer be used for secure
 *      purposes. (This is the likely eventual fate of all cryptographic hashes.)
 *
 *  <li><b>fast:</b> perhaps self-explanatory, but often the most important consideration.
 *      We have published microbenchmark results for many
 *      common hash functions.
 *  </ul>
 *
 *  <h3>Providing input to a hash function</h3>
 *
 *  <p>The primary way to provide the data that your hash function should act on is via a
 *  {@link Hasher}. Obtain a new hasher from the hash function using {@link #newHasher},
 *  "push" the relevant data into it using methods like {@link Hasher#putBytes(byte[])},
 *  and finally ask for the {@code HashCode} when finished using {@link Hasher#hash}. (See
 *  an {@linkplain #newHasher example} of this.)
 *
 *  <p>If all you want to hash is a single byte array, string or {@code long} value, there
 *  are convenient shortcut methods defined directly on {@link HashFunction} to make this
 *  easier.
 *
 *  <p>{@code Hasher} accepts primitive data types, but can also accept any Object of type
 *  {@code T} provided that you implement a {@link Funnel Funnel<T>} to specify how to
 *  "feed" data from that object into the function. (See {@linkplain Hasher#putObject an
 *  example} of this.)
 *
 *  <p><b>Compatibility note:</b> Throughout this API, multibyte values are always
 *  interpreted in <i>little-endian</i> order. That is, hashing the byte array
 *  {@code {0x01, 0x02, 0x03, 0x04}} is equivalent to hashing the {@code int} value
 *  {@code 0x04030201}. If this isn't what you need, methods such as
 *  {@link Integer#reverseBytes} and {@link Ints#toByteArray} will help.
 *
 *  <h3>Relationship to {@link Object#hashCode}</h3>
 *
 *  <p>Java's baked-in concept of hash codes is constrained to 32 bits, and provides no
 *  separation between hash algorithms and the data they act on, so alternate hash
 *  algorithms can't be easily substituted. Also, implementations of {@code hashCode} tend
 *  to be poor-quality, in part because they end up depending on other existing
 *  poor-quality {@code hashCode} implementations, including those in many JDK classes.
 *
 *  <p>{@code Object.hashCode} implementations tend to be very fast, but have weak
 *  collision prevention and no expectation of bit dispersion. This leaves them
 *  perfectly suitable for use in hash tables, because extra collisions cause only a slight
 *  performance hit, while poor bit dispersion is easily corrected using a secondary hash
 *  function (which all reasonable hash table implementations in Java use). For the many
 *  uses of hash functions beyond data structures, however, {@code Object.hashCode} almost
 *  always falls short -- hence this library.
 *
 *  @author Kevin Bourrillion
 *  @since 11.0
 *
 * @deprecated The Google Guava Core Libraries are deprecated and will not be part of the AEM SDK after April 2023
 */
@Beta
@Deprecated(since = "2022-12-01")
public interface HashFunction {

    /**
     * Begins a new hash code computation by returning an initialized, stateful {@code
     * Hasher} instance that is ready to receive data. Example: <pre>   {@code
     *
     *   HashFunction hf = Hashing.md5();
     *   HashCode hc = hf.newHasher()
     *       .putLong(id)
     *       .putBoolean(isActive)
     *       .hash();}</pre>
     */
    Hasher newHasher();

    /**
     * Begins a new hash code computation as {@link #newHasher()}, but provides a hint of the
     * expected size of the input (in bytes). This is only important for non-streaming hash
     * functions (hash functions that need to buffer their whole input before processing any
     * of it).
     */
    Hasher newHasher(int expectedInputSize);

    /**
     * Shortcut for {@code newHasher().putInt(input).hash()}; returns the hash code for the given
     * {@code int} value, interpreted in little-endian byte order. The implementation might
     * perform better than its longhand equivalent, but should not perform worse.
     *
     * @since 12.0
     */
    HashCode hashInt(int input);

    /**
     * Shortcut for {@code newHasher().putLong(input).hash()}; returns the hash code for the
     * given {@code long} value, interpreted in little-endian byte order. The implementation
     * might perform better than its longhand equivalent, but should not perform worse.
     */
    HashCode hashLong(long input);

    /**
     * Shortcut for {@code newHasher().putBytes(input).hash()}. The implementation
     * might perform better than its longhand equivalent, but should not perform
     * worse.
     */
    HashCode hashBytes(byte[] input);

    /**
     * Shortcut for {@code newHasher().putBytes(input, off, len).hash()}. The implementation
     * might perform better than its longhand equivalent, but should not perform
     * worse.
     *
     * @throws IndexOutOfBoundsException if {@code off < 0} or {@code off + len > input.length}
     *   or {@code len < 0}
     */
    HashCode hashBytes(byte[] input, int off, int len);

    /**
     * Shortcut for {@code newHasher().putUnencodedChars(input).hash()}. The implementation
     * might perform better than its longhand equivalent, but should not perform worse.
     * Note that no character encoding is performed; the low byte and high byte of each {@code char}
     * are hashed directly (in that order).
     *
     * @since 15.0 (since 11.0 as hashString(CharSequence)).
     */
    HashCode hashUnencodedChars(CharSequence input);

    /**
     * Shortcut for {@code newHasher().putUnencodedChars(input).hash()}. The implementation
     * might perform better than its longhand equivalent, but should not perform worse.
     * Note that no character encoding is performed; the low byte and high byte of each {@code char}
     * are hashed directly (in that order).
     *
     * @deprecated Use {@link HashFunction#hashUnencodedChars} instead. This method is scheduled for
     *     removal in Guava 16.0.
     */
    @Deprecated
    HashCode hashString(CharSequence input);

    /**
     * Shortcut for {@code newHasher().putString(input, charset).hash()}. Characters are encoded
     * using the given {@link Charset}. The implementation might perform better than its
     * longhand equivalent, but should not perform worse.
     */
    HashCode hashString(CharSequence input, Charset charset);

    /**
     * Shortcut for {@code newHasher().putObject(instance, funnel).hash()}. The implementation
     * might perform better than its longhand equivalent, but should not perform worse.
     *
     * @since 14.0
     */
    <T> HashCode hashObject(T instance, Funnel<? super T> funnel);

    /**
     * Returns the number of bits (a multiple of 32) that each hash code produced by this
     * hash function has.
     */
    int bits();
}
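
The compatibility note and the hashUnencodedChars documentation above both describe how values become bytes before they are hashed. The following is a minimal sketch of those two byte layouts using only the JDK, so that it runs without Guava on the classpath. The class name ByteOrderSketch and its two helper methods are hypothetical and are not part of Guava or the AEM SDK; the sketch only reproduces the layouts the Javadoc states and does not compute any hash.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper (not part of Guava): shows the byte layouts described in the Javadoc. */
public class ByteOrderSketch {

    /** Bytes of an int in little-endian order, the order the Javadoc states for multibyte values. */
    static byte[] littleEndianBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    /** Bytes of a CharSequence with no charset encoding: low byte, then high byte, per char. */
    static byte[] unencodedCharBytes(CharSequence chars) {
        byte[] out = new byte[chars.length() * 2];
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            out[2 * i] = (byte) c;             // low byte first
            out[2 * i + 1] = (byte) (c >>> 8); // then high byte
        }
        return out;
    }

    public static void main(String[] args) {
        // 0x04030201 laid out little-endian is the byte array {1, 2, 3, 4}.
        System.out.println(Arrays.toString(littleEndianBytes(0x04030201)));

        // Integer.reverseBytes yields the value whose little-endian layout is {4, 3, 2, 1}.
        System.out.println(Arrays.toString(littleEndianBytes(Integer.reverseBytes(0x04030201))));

        // 'A' is 0x0041 and 'é' is 0x00E9; each char contributes two bytes.
        System.out.println(Arrays.toString(unencodedCharBytes("A\u00e9")));
    }
}
```

If the Javadoc is accurate, passing 0x04030201 to hashInt and passing the array from littleEndianBytes(0x04030201) to hashBytes should give the same HashCode for a given hash function. That equivalence is taken from the documentation above and is not verified by this sketch.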