net.algart.matrices.spectra.SeparableFastHartleyTransform Maven / Gradle / Ivy

Go to download
Show more of this group Show more artifacts with this name
Show all versions of algart Show documentation
Open-source Java libraries, supporting generalized smart arrays and matrices with elements of any types, including a wide set of 2D-, 3D- and multidimensional image processing and other algorithms, working with arrays and matrices.
There is a newer version: 1.4.23
Show newest version
/*
 * The MIT License (MIT)
 *
 * Copyright (c) 2007-2024 Daniel Alievsky, AlgART Laboratory (http://algart.net)
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

package net.algart.matrices.spectra;

import net.algart.arrays.*;

import java.util.Objects;

/**
 * Fast Hartley transform (FHT)
 * (in multidimensional case — the separable fast Hartley transform).
 * This class implements standard one-dimensional FHT algorithm over an abstract {@link SampleArray}.
 * It is generalized to multidimensional case by the simplest way, implemented in
 * {@link AbstractSpectralTransform} class (applying the transform separably to each dimension);
 * the resulting transformation for 2- or multidimensional {@link Matrix AlgART numeric matrices} is usually called
 * separable fast Hartley transform (SFHT).
 * The samples, processed by this class, can be both real or complex ({@link #areComplexSamplesRequired()} method
 * returns false). This class is especially useful in a case of real samples. In this case
 * it is performed faster than the classic FFT, {@link FastFourierTransform} class, because
 * there are no needs to allocate and process arrays of imaginary parts.
 * The simple relation between Hartley and Fourier transform (see below) allows to use this transform
 * almost in all areas where Fourier transform is applicable.
 *
 * More precisely, this class implements the classic fast "butterfly" algorithm (FHT) for calculating
 * discrete Hartley transform (DHT), described at
 * http://en.wikipedia.org/wiki/Discrete_Hartley_transform.
 *
 * Namely, let x₀,x₁,...,x_N−1 are some
 * real or complex samples (represented by abstract {@link SampleArray}), and
 * H₀,H₁,...,H_N−1 are their Hartley spectrum:
 * the result of DHT. Let's designate cas θ = cos θ + sin θ.
 * This class implements two possible definitions of DHT:
 *
 * 
 * direct transform is
 * H_k =
 * ∑_(0≤n<N)
 * x_n cas(2knπ/N),
 * inverse transform is
 * x_n = N⁻¹
 * ∑_(0≤k<N)
 * H_k cas(2knπ/N).
 * 
 *
 * direct transform is
 * H_k = N⁻¹
 * ∑_(0≤n<N)
 * x_n cas(2knπ/N),
 * inverse transform is
 * x_n =
 * ∑_(0≤k<N)
 * H_k cas(2knπ/N).
 * 
 * 
 *
 * The only difference is when to normalize the result: while inverse transform (case 1) or direct transform
 * (case 2). The Wikipedia offers formulas of the 1st case. This class allows to calculate both variants:
 * the 1st case is chosen if the normalizeDirectTransform argument
 * of the constructors is false
 * or if this class is created by a constructor without this argument (it is the default behavior),
 * the 2nd case is chosen  if the normalizeDirectTransform
 * argument of the constructors is true.
 *
 * The very useful feature of DHT is that for real samples x_k the Hartley spectrum
 * H_k is also real — unlike DFT, when even real samples lead to complex spectrum.
 * As a result, the transformation algorithms in this class can process real arrays and matrices,
 * without imaginary parts. In this case, they work in two and even more times faster than FFT algorithms,
 * implemented in {@link FastFourierTransform}, and do not require allocating additional memory
 * for storing imaginary parts of the complex numbers.
 *
 * The formulas above correspond to one-dimensional transforms and specify the results of
 * {@link #directTransform directTransform} / {@link #inverseTransform inverseTransform} methods.
 * They are generalized to multidimensional case by default algorithms, implemented in
 * {@link AbstractSpectralTransform} class, i.e. by applying the transform separably to each dimension.
 * It leads to so-called multidimensional separable discrete Hartley transformations (SDHT). Below are
 * the formulas for 2-dimensional separable discrete Hartley transformation of the matrix x_ij
 * (0≤i<M, 0≤j<N) for the case 1 (normalizing the inverse transform):
 *
 * 
 * direct:
 * H_ij =
 * ∑_(0≤m<M)
 * ∑_(0≤n<N)
 * x_mn cas(2imπ/M) cas(2jnπ/N),

 * inverse:
 * x_mn = (MN)⁻¹
 * ∑_(0≤i<M)
 * ∑_(0≤j<N)
 * x_ij cas(2imπ/M) cas(2jnπ/N).
 * 
 *
 * There is the simple relation between classic DFT (discrete Fourier transform) and
 * SDHT (separable discrete Hartley transform).
 *
 * Let's consider one-dimensional case (usual DHT).
 * Let x₀,x₁,...,x_N−1 are some
 * real or complex samples,
 * F₀,F₁,...,F_N−1 are their Fourier spectrum and
 * H₀,H₁,...,H_N−1 are their Hartley spectrum.
 * Let i is the usual imaginary unit.
 * For simplicity, let's consider that F_−k=F_N−k,
 * H_−k=H_N−k, k=1,2,... Then:
 *
 * 
 * F_k = (H_k+H_−k)/2
 * − i (H_k−H_−k)/2,

 * H_k = (F_k+F_−k)/2
 * + i (F_k−F_−k)/2,

 * in a case of real samples: H_k =
 * Re F_k − Im F_k
 * 
 *
 * (of course, we consider the same definition, 1 or 2, for both {@link FastFourierTransform DFT} and
 * {@link SeparableFastHartleyTransform SDHT} spectra).
 *
 * In 2-dimensional case, the relation between DFT and SDHT is the following (we similarly suppose that
 * F_−i, j=F_M−i, j,
 * F_i,−j=F_i, N−j,
 * H_−i, j=H_M−i, j,
 * H_i,−j=H_i, N−j):
 *
 * 
 * F_i, j = (H_i,−j+H_−i, j)/2
 * − i (H_i, j−H_−i,−j)/2,

 * H_i, j = (F_i,−j+F_−i, j)/2
 * + i (F_i, j−F_−i,−j)/2,

 * in a case of real samples: H_i, j =
 * Re F_i,−j − Im F_i, j.
 * 
 *
 * In the common n-dimensional case, there are similar formulas, which express
 * F_i, j,...,k through a linear combination of 2ⁿ numbers
 * H_{± i,± j,...,± k} and, vice versa, express
 * H_i, j,...,k through a linear combination of 2ⁿ numbers
 * F_{± i,± j,...,± k}.
 *
 * This class contains the ready for use methods, allowing to convert n-dimensional
 * separable Hartley spectrum to Fourier one and vice versa, n=1,2,3,...:
 *
 * 
 * {@link #separableHartleyToFourier(ArrayContext context, Matrix fRe, Matrix fIm, Matrix h)}
 * converts (real) Hartley spectrum of the real matrix to its Fourier spectrum;
 * {@link #separableHartleyToFourier(ArrayContext context, Matrix fRe, Matrix fIm, Matrix hRe, Matrix hIm)}
 * converts (complex) Hartley spectrum of the complex matrix to its Fourier spectrum;
 * {@link #fourierToSeparableHartley(ArrayContext context, Matrix h, Matrix fRe, Matrix fIm)}
 * converts Fourer spectrum of the real matrix to its (real) Hartley spectrum;
 * {@link #fourierToSeparableHartley(ArrayContext context, Matrix hRe, Matrix hIm, Matrix fRe, Matrix fIm)}
 * converts Fourer spectrum of the complex matrix to its (complex) Hartley spectrum.
 * 
 *
 * If it is necessary to get the Fourier spectrum of some real matrix, probably process it and transform
 * the Fourier spectrum back to the real matrix, you can use a combination of SHFT,
 * provided by this class, and the conversion methods listed above (cases of real matrices).
 * But if all that you need is to calculate a convolution of two real matrices, there is a better way: see below.
 *
 * One-dimensional Hartley transform, defined by the formulas 1 and 2 above, complies with the
 * convolution theorem. Namely, let
 * p₀,p₁,...,p_N−1 is the first complex or real
 * numeric function and q₀,q₁,...,q_N−1 is
 * the second complex or real function, and
 * c₀,c₁,...,c_N−1 is
 * their (complex or real) convolution, defined as:
 *
 * 
 * c_k =
 * ∑_(0≤n<N)
 * p_nq_k−n
 * 
 *
 * (here and below we consider that Z_−k=Z_N−k for all samples
 * and spectra).
 * Also, let P₀,P₁,...,P_N−1,
 * Q₀,Q₁,...,Q_N−1 and
 * C₀,C₁,...,C_N−1 are
 * Hartley spectra of these functions. Then:
 *
 * 

 * C_k =
 * (P_kQ_−k+P_−kQ_k)/2 +
 * (P_kQ_k−P_−kQ_−k)/2,
 * if the spectra were calculated according formula 1 above (default method);
 *
 * C_k = N
 * ((P_kQ_−k+P_−kQ_k)/2 +
 * (P_kQ_k−P_−kQ_−k)/2),
 * if the spectra were calculated according formula 2 above.
 * 
 *
 * There are similar formulas in the common n-dimensional case, allowing to express the separable
 * Hartley spectrum of the convolution of two n-dimensional matrices via the spectra of the source matrices.
 * In particular, in the 2-dimensional case:
 *
 * 
 * C_i, j =
 * ((P_i, j+P_−i,−j)
 * (Q_i, j+Q_−i,−j) −
 * (P_i,−j−P_−i, j)
 * (Q_i,−j−Q_−i, j) +
 * (P_i, j−P_−i,−j)
 * (Q_i,−j+Q_−i, j) +
 * (P_i,−j+P_−i, j)
 * (Q_i, j−Q_−i,−j))/4,
 * if the spectra were calculated according formula 1 above (default method);
 *
 * C_i, j = the same expression multiplied by MN
 * (M and N are the dimensions of the matrices),
 * if the spectra were calculated according formula 2 above.
 * 
 *
 * This class contains the ready for use methods, allowing to calculate a spectrum of convolution C
 * on the base of the given spectra P and Q of two source numeric matrices x and y
 * according the formulas A, C and their generalization for any number of dimensions:
 *
 * 
 * {@link
 * #spectrumOfConvolution(ArrayContext context, Matrix cRe, Matrix cIm, Matrix pRe, Matrix pRe, Matrix qRe, Matrix qRe)}
 * calculates separable Hartley spectrum of the convolution C on the base of separable Hartley spectra
 * P and Q of two source complex matrices;
 * {@link #spectrumOfConvolution(ArrayContext context, Matrix c, Matrix p, Matrix q)} calculates
 * separable Hartley spectrum of the convolution C on the base of separable Hartley spectra
 * P and Q of two source real matrices.
 * 
 *
 * So, if you need to calculate a convolution of some real matrices, for example, for goals of linear filtering,
 * you can use the SFHT transform and the
 * {@link #spectrumOfConvolution(ArrayContext context, Matrix c, Matrix p, Matrix q) spectrumOfConvolution}
 * method, provided by this class: it is much better idea than using {@link FastFourierTransform} class.
 *
 * Please note: in the one-dimensional case, the spectral transofmation algorithms, implemented by
 * {@link #directTransformMatrix directTransformMatrix} / {@link #inverseTransformMatrix inverseTransformMatrix}
 * methods of this class, work with normal (i.e. high) performance only if
 * the passed one-dimensional AlgART matrices are stored in {@link SimpleMemoryModel}
 * (more precisely, if they are {@link DirectAccessible directly accessible}).
 * In another case, each access to every sample leads to calling accessing methods
 * {@link PArray#getDouble(long) getDouble} and {@link UpdatablePArray#setDouble(long, double) setDouble},
 * which can work slowly in non-simple memory models like {@link LargeMemoryModel}. There is the same problem for
 * {@link #directTransform directTransform} / {@link #inverseTransform inverseTransform} methods, if the passed
 * sample arrays are created via {@link RealScalarSampleArray#asSampleArray RealScalarSampleArray.asSampleArray}
 * or {@link ComplexScalarSampleArray#asSampleArray ComplexScalarSampleArray.asSampleArray} methods on the base of
 * updatable AlgART arrays, created by memory model other than {@link SimpleMemoryModel}.
 *
 * For n-dimensional matrices (n≥2), this problem usually does not occur at all, even for non-simple
 * memory models, if you use standard implementations of
 * {@link #directTransformMatrix directTransformMatrix} / {@link #inverseTransformMatrix inverseTransformMatrix}
 * from {@link AbstractSpectralTransform} class: these implementations automatically download necessary parts
 * of the matrix into {@link SimpleMemoryModel}. This problem also does not occur while using
 * conversion methods {@link #separableHartleyToFourier(ArrayContext, Matrix, Matrix, Matrix)},
 * {@link #separableHartleyToFourier(ArrayContext, Matrix, Matrix, Matrix, Matrix)},
 * {@link #fourierToSeparableHartley(ArrayContext, Matrix, Matrix, Matrix)},
 * {@link #fourierToSeparableHartley(ArrayContext, Matrix, Matrix, Matrix, Matrix)} and
 * methods of calculation of the spectrum of convolution
 * {@link #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix)} and
 * {@link #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix, Matrix, Matrix, Matrix)},
 * if all processed matrices have the same float or double element types.
 */
public class SeparableFastHartleyTransform extends AbstractSpectralTransform implements SpectralTransform {

    final boolean normalizeDirectTransform;

    /**
     * Creates a new instance of this class, performing separable Hartley transform according to the formula 1 from
     * the {@link SeparableFastHartleyTransform comments to this class}.
     * Equivalent to {@link #SeparableFastHartleyTransform(boolean) SeparableFastHartleyTransform(false)}.
     *
     * @see #SeparableFastHartleyTransform(long)
     */
    public SeparableFastHartleyTransform() {
        this.normalizeDirectTransform = FastFourierTransform.DEFAULT_NORMALIZE_DIRECT_TRANSFORM;
    }

    /**
     * Creates a new instance of this class, performing separable Hartley transform according to the formula 1 from
     * the {@link SeparableFastHartleyTransform comments to this class}.
     *
     * The maxTempJavaMemory argument specifies the amount of Java memory (heap),
     * that can be used by methods of this class for internal needs. It is passed to the corresponding
     * constructor of {@link AbstractSpectralTransform}: see
     * {@link AbstractSpectralTransform#AbstractSpectralTransform(long) comments to that constructor}.
     *
     * @param maxTempJavaMemory desired maximal amount of Java memory, in bytes, allowed for allocation
     *                          by methods of this class for internal needs.
     * @see #SeparableFastHartleyTransform()
     */
    public SeparableFastHartleyTransform(long maxTempJavaMemory) {
        super(maxTempJavaMemory);
        this.normalizeDirectTransform = FastFourierTransform.DEFAULT_NORMALIZE_DIRECT_TRANSFORM;
    }

    /**
     * Creates a new instance of this class, performing separable Hartley transform according either to the formula 1
     * from the {@link SeparableFastHartleyTransform comments to this class},
     * if normalizeDirectTransform argument is false,
     * or to the formula 2, if this argument is true.
     * The default value, used by the constructors without normalizeDirectTransform argument,
     * is false.
     *
     * 
Please note: the value of normalizeDirectTransform argument affects only the transformation
     * methods {@link #directTransform directTransform}, {@link #inverseTransform inverseTransform},
     * {@link #directTransformMatrix directTransformMatrix}, {@link #inverseTransformMatrix inverseTransformMatrix}.
     * This value does not matter in other methods of this class: conversions between Hartley and Fourier spectrum,
     * {@link #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix)} and
     * {@link #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix, Matrix, Matrix, Matrix)}.
     *
     * @param normalizeDirectTransform true if you want to perform normalization
     *                                 (division by the number of
     *                                 samples N) after the direct transform,
     *                                 false (the default value)
     *                                 if you want to perform normalization after the inverse transform.
     * @see #SeparableFastHartleyTransform(boolean, long)
     */
    public SeparableFastHartleyTransform(boolean normalizeDirectTransform) {
        this.normalizeDirectTransform = normalizeDirectTransform;
    }

    /**
     * Creates a new instance of this class, performing separable Hartley transform according either to the formula 1
     * from the {@link SeparableFastHartleyTransform comments to this class},
     * if normalizeDirectTransform argument is false,
     * or to the formula 2, if this argument is true.
     * The default value, used by the constructors without normalizeDirectTransform argument,
     * is false.
     *
     * 
Please note: the value of normalizeDirectTransform argument affects only the transformation
     * methods {@link #directTransform directTransform}, {@link #inverseTransform inverseTransform},
     * {@link #directTransformMatrix directTransformMatrix}, {@link #inverseTransformMatrix inverseTransformMatrix}.
     * This value does not matter in other methods of this class: conversions between Hartley and Fourier spectrum,
     * {@link #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix)} and
     * {@link #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix, Matrix, Matrix, Matrix)}.
     *
     * 
The maxTempJavaMemory argument specifies the amount of Java memory (heap),
     * that can be used by methods of this class for internal needs. It is passed to the corresponding
     * constructor of {@link AbstractSpectralTransform}: see
     * {@link AbstractSpectralTransform#AbstractSpectralTransform(long) comments to that constructor}.
     *
     * @param maxTempJavaMemory        desired maximal amount of Java memory, in bytes, allowed for allocating
     *                                 by methods of this class for internal needs.
     * @param normalizeDirectTransform true if you want to perform normalization
     *                                 (division by the number of
     *                                 samples N) after the direct transform,
     *                                 false (the default value)
     *                                 if you want to perform normalization after the inverse transform.
     * @see #SeparableFastHartleyTransform(boolean)
     */
    public SeparableFastHartleyTransform(boolean normalizeDirectTransform, long maxTempJavaMemory) {
        super(maxTempJavaMemory);
        this.normalizeDirectTransform = normalizeDirectTransform;
    }

    /**
     * Converts the separable Hartley spectrum H of some real n-dimensional matrix into
     * the (complex) Fourier spectrum F of the same matrix.
     * See the {@link SeparableFastHartleyTransform comments to this class} about the relation formulas between
     * separable Hartley and Fourier spectra.
     *
     * 
The complex matrix F is represented as a pair of AlgART matrices (fRe,fIm):
     * the corresponding elements of these 2 matrices contain the real and imaginary parts
     * of the corresponding elements of the complex matrix F.
     * The real matrix H is passed as an AlgART matrix h.
     *
     * 
All matrices, passed to this method, must have {@link Matrix#dimEquals(Matrix) equal dimensions}.
     * The {@link Matrix#elementType() element type} of the passed matrices can be different, but we recommend
     * using the same float or double element type for all matrices.
     * There are no restrictions for the dimensions of the passed matrices:
     * {@link #isLengthAllowed(long)} method is not used here.
     *
     * 
This method works correctly, if you pass the same matrix as
     * fRe / fIm and h.
     *
     * 
If you need to convert spectrum in a case of one-dimensional
     * numeric AlgART arrays, you just need to convert them into one-dimensional AlgART matrices by
     * {@link Matrices#matrix(Array, long...)} call, for example:
     * {@link Matrices#matrix(Array, long...) Matrices.matrix}(array, array.length()).
     *
     * @param context the context that will be used by this algorithm; can be {@code null}
     *                (see comments to {@link SpectralTransform}).
     * @param fRe     the real parts of the elements of the resulting matrix (Fourier spectrum).
     * @param fIm     the imaginary parts of the elements of the resulting matrix (Fourier spectrum).
     * @param h       the source real matrix (separable Hartley spectrum).
     * @throws NullPointerException  if one of fRe, fIm,
     *                               h arguments is {@code null}.
     * @throws SizeMismatchException if some of the passed matrices have different dimensions.
     * @see #separableHartleyToFourier(ArrayContext, Matrix, Matrix, Matrix, Matrix)
     */
    public void separableHartleyToFourier(
            ArrayContext context,
            Matrix fRe, Matrix fIm,
            Matrix h) {
        Objects.requireNonNull(fRe, "Null fRe argument");
        Objects.requireNonNull(fIm, "Null fIm argument");
        Objects.requireNonNull(h, "Null h argument");
        if (!fRe.dimEquals(fIm)) {
            throw new SizeMismatchException("fRe and fIm dimensions mismatch: fRe is " + fRe + ", fIm " + fIm);
        }
        if (!h.dimEquals(fRe)) {
            throw new SizeMismatchException("h and fRe dimensions mismatch: h is " + h + ", fRe " + fRe);
        }
        ThreadPoolFactory tpf = Arrays.getThreadPoolFactory(context);
        Conversions.separableHartleyToFourierRecoursive(context, maxTempJavaMemory(),
                fRe.array(), fIm.array(), h.array(), null,
                fRe.dimensions(), Math.max(1, tpf.recommendedNumberOfTasks()));
    }

    /**
     * Converts the separable Hartley spectrum H of some complex n-dimensional matrix into
     * the (complex) Fourier spectrum F of the same matrix.
     * See the {@link SeparableFastHartleyTransform comments to this class} about the relation formulas between
     * separable Hartley and Fourier spectra.
     *
     * 
The complex matrix F is represented as a pair of AlgART matrices (fRe,fIm):
     * the corresponding elements of these 2 matrices contain the real and imaginary parts
     * of the corresponding elements of the complex matrix F.
     * Similarly, the complex matrix H is represented as a pair of AlgART matrices (hRe,hIm).
     *
     * 
All matrices, passed to this method, must have {@link Matrix#dimEquals(Matrix) equal dimensions}.
     * The {@link Matrix#elementType() element type} of the passed matrices can be different, but we recommend
     * using the same float or double element type for all matrices.
     * There are no restrictions for the dimensions of the passed matrices:
     * {@link #isLengthAllowed(long)} method is not used here.
     *
     * 
This method works correctly, if you pass the same complex matrix as F and H.
     * So, you can calculate and return the result in the source matrices.
     *
     * 
If you need to convert spectrum in a case of one-dimensional
     * numeric AlgART arrays, you just need to convert them into one-dimensional AlgART matrices by
     * {@link Matrices#matrix(Array, long...)} call, for example:
     * {@link Matrices#matrix(Array, long...) Matrices.matrix}(array, array.length()).
     *
     * @param context the context that will be used by this algorithm; can be {@code null}
     *                (see comments to {@link SpectralTransform}).
     * @param fRe     the real parts of the elements of the resulting matrix (Fourier spectrum).
     * @param fIm     the imaginary parts of the elements of the resulting matrix (Fourier spectrum).
     * @param hRe     the real parts of the elements of the source matrix (separable Hartley spectrum).
     * @param hIm     the imaginary parts of the elements of the source matrix (separable Hartley spectrum).
     * @throws NullPointerException  if one of fRe, fIm, hRe, hIm
     *                               arguments is {@code null}.
     * @throws SizeMismatchException if some of the passed matrices have different dimensions.
     * @see #separableHartleyToFourier(ArrayContext, Matrix, Matrix, Matrix)
     */
    public void separableHartleyToFourier(
            ArrayContext context,
            Matrix fRe, Matrix fIm,
            Matrix hRe, Matrix hIm) {
        Objects.requireNonNull(fRe, "Null fRe argument");
        Objects.requireNonNull(fIm, "Null fIm argument");
        Objects.requireNonNull(hRe, "Null hRe argument");
        Objects.requireNonNull(hIm, "Null hIm argument");
        if (!fRe.dimEquals(fIm)) {
            throw new SizeMismatchException("fRe and fIm dimensions mismatch: fRe is " + fRe + ", fIm " + fIm);
        }
        if (!hRe.dimEquals(fRe)) {
            throw new SizeMismatchException("hRe and fRe dimensions mismatch: hRe is " + hRe + ", fRe " + fRe);
        }
        if (!hIm.dimEquals(fRe)) {
            throw new SizeMismatchException("hIm and fRe dimensions mismatch: hIm is " + hIm + ", fRe " + fRe);
        }
        ThreadPoolFactory tpf = Arrays.getThreadPoolFactory(context);
        Conversions.separableHartleyToFourierRecoursive(context, maxTempJavaMemory(),
                fRe.array(), fIm.array(), hRe.array(), hIm.array(),
                fRe.dimensions(), Math.max(1, tpf.recommendedNumberOfTasks()));
    }

    /**
     * Converts the Fourier spectrum F of some real n-dimensional matrix into
     * the (real) separable Hartley  spectrum H of the same matrix.
     * See the {@link SeparableFastHartleyTransform comments to this class} about the relation formulas between
     * separable Hartley and Fourier spectra.
     * If the passed Fourier spectrum is not a spectrum of a real matrix (in other words,
     * if the inverse Fourier transform of F matrix contains nonzero imaginary parts),
     * then this method still correctly calculates the real parts of the separable Hartley spectrum H.
     *
     * 
The complex matrix F is represented as a pair of AlgART matrices (fRe,fIm):
     * the corresponding elements of these 2 matrices contain the real and imaginary parts
     * of the corresponding elements of the complex matrix F.
     * The real matrix H (or the real parts of H, if the passed F matrix
     * is not a spectrum of a real matrix) is passed as an AlgART matrix h.
     *
     * 
All matrices, passed to this method, must have {@link Matrix#dimEquals(Matrix) equal dimensions}.
     * The {@link Matrix#elementType() element type} of the passed matrices can be different, but we recommend
     * using the same float or double element type for all matrices.
     * There are no restrictions for the dimensions of the passed matrices:
     * {@link #isLengthAllowed(long)} method is not used here.
     *
     * 
This method works correctly, if you pass the same matrix as
     * fRe / fIm and h.
     *
     * 
If you need to convert spectrum in a case of one-dimensional
     * numeric AlgART arrays, you just need to convert them into one-dimensional AlgART matrices by
     * {@link Matrices#matrix(Array, long...)} call, for example:
     * {@link Matrices#matrix(Array, long...) Matrices.matrix}(array, array.length()).
     *
     * @param context the context that will be used by this algorithm; can be {@code null}
     *                (see comments to {@link SpectralTransform}).
     * @param h       the resulting real matrix (separable Hartley spectrum).
     * @param fRe     the real parts of the elements of the source matrix (Fourier spectrum).
     * @param fIm     the imaginary parts of the elements of the source matrix (Fourier spectrum).
     * @throws NullPointerException  if one of h, fRe, fIm arguments is {@code null}.
     * @throws SizeMismatchException if some of the passed matrices have different dimensions.
     * @see #fourierToSeparableHartley(ArrayContext, Matrix, Matrix, Matrix, Matrix)
     */
    public void fourierToSeparableHartley(
            ArrayContext context,
            Matrix h,
            Matrix fRe, Matrix fIm) {
        Objects.requireNonNull(h, "Null h argument");
        Objects.requireNonNull(fRe, "Null fRe argument");
        Objects.requireNonNull(fIm, "Null fIm argument");
        if (!fRe.dimEquals(fIm)) {
            throw new SizeMismatchException("fRe and fIm dimensions mismatch: fRe is " + fRe + ", fIm " + fIm);
        }
        if (!h.dimEquals(fRe)) {
            throw new SizeMismatchException("h and fRe dimensions mismatch: h is " + h + ", fRe " + fRe);
        }
        ThreadPoolFactory tpf = Arrays.getThreadPoolFactory(context);
        Conversions.fourierToSeparableHartleyRecursive(context, maxTempJavaMemory(),
                h.array(), null, fRe.array(), fIm.array(),
                fRe.dimensions(), Math.max(1, tpf.recommendedNumberOfTasks()));
    }

    /**
     * Converts the Fourier spectrum F of some complex n-dimensional matrix into
     * the (complex) separable Hartley spectrum H of the same matrix.
     * See the {@link SeparableFastHartleyTransform comments to this class} about the relation formulas between
     * separable Hartley and Fourier spectra.
     *
     * 
The complex matrix F is represented as a pair of AlgART matrices (fRe,fIm):
     * the corresponding elements of these 2 matrices contain the real and imaginary parts
     * of the corresponding elements of the complex matrix F.
     * Similarly, the complex matrix H is represented as a pair of AlgART matrices (hRe,hIm).
     *
     * 
All matrices, passed to this method, must have {@link Matrix#dimEquals(Matrix) equal dimensions}.
     * The {@link Matrix#elementType() element type} of the passed matrices can be different, but we recommend
     * using the same float or double element type for all matrices.
     * There are no restrictions for the dimensions of the passed matrices:
     * {@link #isLengthAllowed(long)} method is not used here.
     *
     * 
This method works correctly, if you pass the same complex matrix as F and H.
     * So, you can calculate and return the result in the source matrices.
     *
     * 
If you need to convert spectrum in a case of one-dimensional
     * numeric AlgART arrays, you just need to convert them into one-dimensional AlgART matrices by
     * {@link Matrices#matrix(Array, long...)} call, for example:
     * {@link Matrices#matrix(Array, long...) Matrices.matrix}(array, array.length()).
     *
     * @param context the context that will be used by this algorithm; can be {@code null}
     *                (see comments to {@link SpectralTransform}).
     * @param hRe     the real parts of the elements of the resulting matrix (separable Hartley spectrum).
     * @param hIm     the imaginary parts of the elements of the resulting matrix (separable Hartley spectrum).
     * @param fRe     the real parts of the elements of the source matrix (Fourier spectrum).
     * @param fIm     the imaginary parts of the elements of the source matrix (Fourier spectrum).
     * @throws NullPointerException  if one of hRe, hIm, fRe, fIm
     *                               arguments is {@code null}.
     * @throws SizeMismatchException if some of the passed matrices have different dimensions.
     * @see #fourierToSeparableHartley(ArrayContext, Matrix, Matrix, Matrix)
     */
    public void fourierToSeparableHartley(
            ArrayContext context,
            Matrix hRe, Matrix hIm,
            Matrix fRe, Matrix fIm) {
        Objects.requireNonNull(hRe, "Null hRe argument");
        Objects.requireNonNull(fRe, "Null fRe argument");
        Objects.requireNonNull(fIm, "Null fIm argument");
        Objects.requireNonNull(hIm, "Null hIm argument");
        if (!fRe.dimEquals(fIm)) {
            throw new SizeMismatchException("fRe and fIm dimensions mismatch: fRe is " + fRe + ", fIm " + fIm);
        }
        if (!hRe.dimEquals(fRe)) {
            throw new SizeMismatchException("hRe and fRe dimensions mismatch: hRe is " + hRe + ", fRe " + fRe);
        }
        if (!hIm.dimEquals(fRe)) {
            throw new SizeMismatchException("hIm and fRe dimensions mismatch: hIm is " + hIm + ", fRe " + fRe);
        }
        ThreadPoolFactory tpf = Arrays.getThreadPoolFactory(context);
        Conversions.fourierToSeparableHartleyRecursive(context, maxTempJavaMemory(),
                hRe.array(), hIm.array(), fRe.array(), fIm.array(),
                fRe.dimensions(), Math.max(1, tpf.recommendedNumberOfTasks()));
    }

    /**
     * Calculates C, the separable Hartley spectrum of the convolution of some two real matrices,
     * on the base of P and Q — the separable Hartley spectra of these two real matrices.
     *
     * 
The real matrices P, Q, C are passed as AlgART matrices
     * p, q, c.
     *
     * 
All matrices, passed to this method, must have {@link Matrix#dimEquals(Matrix) equal dimensions}.
     * The {@link Matrix#elementType() element type} of the passed matrices can be different, but we recommend
     * using the same float or double element type for all matrices.
     * There are no restrictions for the dimensions of the passed matrices:
     * {@link #isLengthAllowed(long)} method is not used here.
     *
     * 
This method works correctly, if you pass the same complex matrix as P and Q,
     * or as P and C, or as Q and C, or even as all three matrices.
     * So, you can calculate and return the result in one of the source matrices.
     *
     * 
If you need to calculate the Hartley spectrum of convolution for a case of one-dimensional
     * numeric AlgART arrays, you just need to convert them into one-dimensional AlgART matrices by
     * {@link Matrices#matrix(Array, long...)} call, for example:
     * {@link Matrices#matrix(Array, long...) Matrices.matrix}(array, array.length()).
     *
     * @param context the context that will be used by this algorithm; can be {@code null}
     *                (see comments to {@link SpectralTransform}).
     * @param c       the resulting matrix (spectrum of the convolution).
     * @param p       the spectrum of the 1st matrix.
     * @param q       the spectrum of the 2nd matrix.
     * @throws NullPointerException  if one of c, p, q arguments is {@code null}.
     * @throws SizeMismatchException if some of the passed matrices have different dimensions.
     * @see #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix, Matrix, Matrix, Matrix)
     */
    public void spectrumOfConvolution(
            ArrayContext context,
            Matrix c,
            Matrix p,
            Matrix q) {
        Objects.requireNonNull(c, "Null c argument");
        Objects.requireNonNull(p, "Null p argument");
        Objects.requireNonNull(q, "Null q argument");
        if (!p.dimEquals(c)) {
            throw new SizeMismatchException("c and p dimensions mismatch: c is " + c + ", p " + p);
        }
        if (!q.dimEquals(c)) {
            throw new SizeMismatchException("c and q dimensions mismatch: c is " + c + ", q " + q);
        }
        ThreadPoolFactory tpf = Arrays.getThreadPoolFactory(context);
        SpectraOfConvolution.separableHartleySpectrumOfConvolution(context, maxTempJavaMemory(),
                c.array(), null, p.array(), null, q.array(), null,
                c.dimensions(), Math.max(1, tpf.recommendedNumberOfTasks()));
    }

    /**
     * Calculates C, the separable Hartley spectrum of the convolution of some two complex matrices,
     * on the base of P and Q — the separable Hartley spectra of these two complex matrices.
     *
     * 
The complex matrix P is represented as a pair of AlgART matrices (pRe,pIm):
     * the corresponding elements of these 2 matrices contain the real and imaginary parts
     * of the corresponding elements of the complex matrix P.
     * Similarly, the complex matrix Q is represented as a pair of AlgART matrices (qRe,qIm),
     * and the complex matrix C is represented as a pair of AlgART matrices (cRe,cIm).
     *
     * 
All matrices, passed to this method, must have {@link Matrix#dimEquals(Matrix) equal dimensions}.
     * The {@link Matrix#elementType() element type} of the passed matrices can be different, but we recommend
     * using the same float or double element type for all matrices.
     * There are no restrictions for the dimensions of the passed matrices:
     * {@link #isLengthAllowed(long)} method is not used here.
     *
     * 
This method works correctly, if you pass the same complex matrix as P and Q,
     * or as P and C, or as Q and C, or even as all three matrices.
     * So, you can calculate and return the result in one of the source matrices.
     *
     * If you need to calculate the Hartley spectrum of convolution for a case of one-dimensional
     * numeric AlgART arrays, you just need to convert them into one-dimensional AlgART matrices by
     * {@link Matrices#matrix(Array, long...)} call, for example:
     * {@link Matrices#matrix(Array, long...) Matrices.matrix}(array, array.length()).
     *
     * @param context the context that will be used by this algorithm; can be {@code null}
     *                (see comments to {@link SpectralTransform}).
     * @param cRe     the real parts of the elements of the resulting matrix (spectrum of the convolution).
     * @param cIm     the imaginary parts of the elements of the resulting matrix (spectrum of the convolution).
     * @param pRe     the real parts of the elements of the spectrum of the 1st matrix.
     * @param pIm     the imaginary parts of the elements of the spectrum of the 1st matrix.
     * @param qRe     the real parts of the elements of the spectrum of the 2nd matrix.
     * @param qIm     the imaginary parts of the elements of the spectrum of the 2nd matrix.
     * @throws NullPointerException  if one of cRe, cIm, pRe, pIm,
     *                               qRe, qIm arguments is {@code null}.
     * @throws SizeMismatchException if some of the passed matrices have different dimensions.
     * @see #spectrumOfConvolution(ArrayContext, Matrix, Matrix, Matrix)
     */
    public void spectrumOfConvolution(
            ArrayContext context,
            Matrix cRe, Matrix cIm,
            Matrix pRe, Matrix pIm,
            Matrix qRe, Matrix qIm) {
        Objects.requireNonNull(cRe, "Null cRe argument");
        Objects.requireNonNull(cIm, "Null cIm argument");
        Objects.requireNonNull(pRe, "Null pRe argument");
        Objects.requireNonNull(pIm, "Null pIm argument");
        Objects.requireNonNull(qRe, "Null qRe argument");
        Objects.requireNonNull(qIm, "Null qIm argument");
        if (!cRe.dimEquals(cIm)) {
            throw new SizeMismatchException("cRe and cIm dimensions mismatch: cRe is " + cRe + ", cIm " + cIm);
        }
        if (!pRe.dimEquals(cRe)) {
            throw new SizeMismatchException("cRe and pRe dimensions mismatch: cRe is " + cRe + ", pRe " + pRe);
        }
        if (!pIm.dimEquals(cRe)) {
            throw new SizeMismatchException("cRe and pIm dimensions mismatch: cRe is " + cRe + ", pIm " + pIm);
        }
        if (!qRe.dimEquals(cRe)) {
            throw new SizeMismatchException("cRe and qRe dimensions mismatch: cRe is " + cRe + ", qRe " + qRe);
        }
        if (!qIm.dimEquals(cRe)) {
            throw new SizeMismatchException("cRe and qIm dimensions mismatch: cRe is " + cRe + ", qIm " + qIm);
        }
        ThreadPoolFactory tpf = Arrays.getThreadPoolFactory(context);
        SpectraOfConvolution.separableHartleySpectrumOfConvolution(context, maxTempJavaMemory(),
                cRe.array(), cIm.array(), pRe.array(), pIm.array(), qRe.array(), qIm.array(),
                cRe.dimensions(), Math.max(1, tpf.recommendedNumberOfTasks()));
    }

    @Override
    public final boolean isLengthAllowed(long length) {
        return (length & (length - 1)) == 0;
    }

    @Override
    public boolean areComplexSamplesRequired() {
        return false;
    }

    @Override
    protected String unallowedLengthMessage() {
        return "FHT algorithm can process only 2^k elements";
    }

    @Override
    protected final void transform(ArrayContext context, SampleArray samples, boolean inverse) {
        assert isLengthAllowed(samples.length());
        boolean normalize = normalizeDirectTransform ? !inverse : inverse;

//        long t1 = System.nanoTime();
        ReverseBits.reorderForButterfly(context == null ? null : context.part(0.0, 0.2), samples);

//        long t2 = System.nanoTime();
        fhtMainLoop(context == null ? null : context.part(0.2, normalize ? 0.95 : 1.0), samples);

//        long t3 = System.nanoTime();
        if (normalize) {
            FastFourierTransform.fftNormalize(context == null ? null : context.part(0.95, 1.0), samples);
        }
//        long t4 = System.nanoTime();
//        System.out.printf("FHT time (common branch) %.3f ms reorder, %.3f ms main, %.3f normalizing%n",
//            (t2 - t1) * 1e-6, (t3 - t2) * 1e-6, (t4 - t3) * 1e-6);
    }

    private static final int LOG_ANGLE_STEP = 4, ANGLE_STEP = 1 << LOG_ANGLE_STEP;
    private static final double SQRT2 = StrictMath.sqrt(2.0);
    private static final double HALF_SQRT2 = 0.5 * SQRT2;
    private static final int
            S01 = 0, D01 = 1, S23 = 2, D23 = 3,
            S45 = 0, D45 = 1, S67 = 2, D67 = 3, // reusing
            SS0123 = 4, SD0123 = 5, DS0123 = 6, DD0123 = 7,
            SS4567 = 8, DS4567 = 9,
            R1 = 0, R2 = 1,
            L1 = 2,
            L2 = 2, // reusing,
            CAS = 3,
            NUMBER_OF_WORK_VARIABLES = 10;

    private static void fhtMainLoop(ArrayContext context, SampleArray samples) {
        final int logN = 63 - Long.numberOfLeadingZeros(samples.length());
        if (logN <= 0) {
            return; // nothing to do: this check little simplifies further checks in the recursive procedure
        }

        if (samples instanceof RealScalarSampleArray.DirectZeroOffsetsRealFloatSampleArray
                || samples instanceof RealScalarSampleArray.DirectRealFloatSampleArray) {
            fhtJavaFloatMainLoop(context, samples, 0, logN, logN);
        } else if (samples instanceof RealScalarSampleArray.DirectZeroOffsetsRealDoubleSampleArray
                || samples instanceof RealScalarSampleArray.DirectRealDoubleSampleArray) {
            fhtJavaDoubleMainLoop(context, samples, 0, logN, logN);
// The functions below are anti-optimization in the large applications in the current version of 32-bit Java 1.7
//        } else if (samples instanceof RealVectorSampleArray.DirectRealFloatVectorSampleArray) {
//            fhtJavaFloatMultidimensionalMainLoop(context,
//                (RealVectorSampleArray.DirectRealFloatVectorSampleArray)samples, 0, logN, logN,
//                ((RealVectorSampleArray.DirectRealFloatVectorSampleArray)samples).
//                    newCompatibleSamplesArray(NUMBER_OF_WORK_VARIABLES));
//        } else if (samples instanceof RealVectorSampleArray.DirectRealDoubleVectorSampleArray) {
//            fhtJavaDoubleMultidimensionalMainLoop(context,
//                (RealVectorSampleArray.DirectRealDoubleVectorSampleArray)samples, 0, logN, logN,
//                ((RealVectorSampleArray.DirectRealDoubleVectorSampleArray)samples).
//                    newCompatibleSamplesArray(NUMBER_OF_WORK_VARIABLES));
        } else {
            fhtCommonMainLoop(context, samples,
                    0, logN, logN, samples.newCompatibleSamplesArray(NUMBER_OF_WORK_VARIABLES));
        }
    }

    private static void fhtCommonMainLoop(
            ArrayContext context, SampleArray samples,
            final long pos, final int logN, final int originalLogN, SampleArray work) {
        switch (logN) { // in comments below xK, rK, yK means the source, the reordered source and the result
            case 1: {
                work.sub(0, samples, pos, pos + 1);
                samples.add(pos, pos, pos + 1);             // y0 = x0 + x1
                samples.copy(pos + 1, work, 0);             // y1 = x0 - x1
                break;
            }
            case 2: {
                work.sub(D01, samples, pos, pos + 1);
                work.add(S01, samples, pos, pos + 1);
                work.sub(D23, samples, pos + 2, pos + 3);
                work.add(S23, samples, pos + 2, pos + 3);
                samples.add(pos, work, S01, S23);           // y0 = r0 + r1 + r2 + r3 = x0 + x2 + x1 + x3
                samples.add(pos + 1, work, D01, D23);       // y1 = r0 - r1 + r2 - r3 = x0 - x2 + x1 - x3
                samples.sub(pos + 2, work, S01, S23);       // y2 = r0 + r1 - r2 - r3 = x0 + x2 - x1 - x3
                samples.sub(pos + 3, work, D01, D23);       // y3 = r0 - r1 - r2 + r3 = x0 - x2 - x1 + x3
                break;
            }
            case 3: {
                work.sub(D01, samples, pos, pos + 1);
                work.add(S01, samples, pos, pos + 1);
                work.sub(D23, samples, pos + 2, pos + 3);
                work.add(S23, samples, pos + 2, pos + 3);
                work.sub(DS0123, S01, S23);
                work.add(SS0123, S01, S23);
                work.sub(DD0123, D01, D23);
                work.add(SD0123, D01, D23); // S01, S23, D01, D23 will not be used more and can be reused
                work.add(S45, samples, pos + 4, pos + 5);
                work.add(S67, samples, pos + 6, pos + 7);
                work.sub(D45, samples, pos + 4, pos + 5);
                work.sub(D67, samples, pos + 6, pos + 7);
                work.sub(DS4567, S45, S67);
                work.add(SS4567, S45, S67);
                samples.sub(pos + 4, work, SS0123, SS4567); // y4 = r0 + r1 + r2 + r3 - r4 - r5 - r6 - r7
                samples.add(pos, work, SS0123, SS4567);     // y0 = r0 + ... + r7 = x0 + ... + x7
                samples.sub(pos + 6, work, DS0123, DS4567); // y6 = r0 + r1 - r2 - r3 - r4 - r5 + r6 + r7
                //    = x0 + x4 - x2 - x6 - x1 - x5 + x3 + x7
                samples.add(pos + 2, work, DS0123, DS4567); // y2 = r0 + r1 - r2 - r3 + r4 + r5 - r6 - r7
                //    = x0 + x4 - x2 - x6 + x1 + x5 - x3 - x7
                work.multiplyByRealScalar(D45, SQRT2);
                work.multiplyByRealScalar(D67, SQRT2);
                samples.sub(pos + 5, work, SD0123, D45);    // y5 = r0 - r1 + r2 - r3 - sqrt(2)*r4 + sqrt(2)*r5
                //    = x0 - x4 + x2 - x6 - sqrt(2)*x1 + sqrt(2)*x5
                samples.add(pos + 1, work, SD0123, D45);    // y1 = r0 - r1 + r2 - r3 + sqrt(2)*r4 - sqrt(2)*r5
                //    = x0 - x4 + x2 - x6 + sqrt(2)*x1 - sqrt(2)*x5
                samples.sub(pos + 7, work, DD0123, D67);    // y7 = r0 - r1 - r2 + r3 - sqrt(2)*r6 + sqrt(2)*r7
                //    = x0 - x4 - x2 + x6 - sqrt(2)*x3 + sqrt(2)*x7
                samples.add(pos + 3, work, DD0123, D67);    // y3 = r0 - r1 - r2 + r3 + sqrt(2)*r6 - sqrt(2)*r7
                //    = x0 - x4 - x2 + x6 + sqrt(2)*x3 - sqrt(2)*x7
                break;
            }
            default: {
                assert logN > 3;
                final long nDiv8 = 1L << (logN - 3);
                final long nDiv4 = nDiv8 * 2;
                final long nDiv2 = nDiv4 * 2;
                fhtCommonMainLoop(context, samples, pos, logN - 1, originalLogN, work);
                fhtCommonMainLoop(context, samples, pos + nDiv2, logN - 1, originalLogN, work);

                final boolean allAnglesInCache = logN - 1 <= RootsOfUnity.LOG_CACHE_SIZE + LOG_ANGLE_STEP;
                final double rotationAngle = StrictMath.PI / nDiv2;
                final double sin0Half = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN]; //sin(0.5 * rotationAngle);
                final double cos0M1 = -2.0 * sin0Half * sin0Half; // cos(rotationAngle)-1
                final double sin0 = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN - 1]; //sin(rotationAngle)
                double cos = 1.0 + cos0M1;
                double sin = sin0;
                long j1, j2;
                for (long j = 1; j < nDiv8; j++) {
                    // (cos, sin) = (cos(j*2*PI/N),  sin(j*2*PI/N))
                    j1 = pos + j;
                    j2 = pos + nDiv2 - j;
                    //[[Repeat.SectionStart CommonButterfly]]
                    work.copy(R1, samples, nDiv2 + j1);
                    work.copy(R2, samples, nDiv2 + j2);
                    work.combineWithRealMultipliers(CAS, R1, cos, R2, sin);
                    work.copy(L1, samples, j1);
                    samples.add(j1, work, L1, CAS);
                    samples.sub(nDiv2 + j1, work, L1, CAS);
                    work.combineWithRealMultipliers(CAS, R1, sin, R2, -cos);
                    work.copy(L2, samples, j2);
                    samples.add(j2, work, L2, CAS);
                    samples.sub(nDiv2 + j2, work, L2, CAS);
                    //[[Repeat.SectionEnd CommonButterfly]]

                    j1 = pos + nDiv4 - j;
                    j2 = pos + nDiv4 + j; // below is the same code as above with exchanging sin<->cos
                    //[[Repeat(INCLUDE_FROM_FILE, THIS_FILE, CommonButterfly)
                    //  cos ==> TTTT;; sin ==> cos;; TTTT ==> sin   !! Auto-generated: NOT EDIT !! ]]
                    work.copy(R1, samples, nDiv2 + j1);
                    work.copy(R2, samples, nDiv2 + j2);
                    work.combineWithRealMultipliers(CAS, R1, sin, R2, cos);
                    work.copy(L1, samples, j1);
                    samples.add(j1, work, L1, CAS);
                    samples.sub(nDiv2 + j1, work, L1, CAS);
                    work.combineWithRealMultipliers(CAS, R1, cos, R2, -sin);
                    work.copy(L2, samples, j2);
                    samples.add(j2, work, L2, CAS);
                    samples.sub(nDiv2 + j2, work, L2, CAS);
                    //[[Repeat.IncludeEnd]]

                    if ((j & (ANGLE_STEP - 1)) == ANGLE_STEP - 1) {
                        if (allAnglesInCache) {
                            int angleIndex = (int) (j + 1) >> LOG_ANGLE_STEP
                                    << (RootsOfUnity.LOG_CACHE_SIZE - (logN - 1) + LOG_ANGLE_STEP);
                            // (j + 1) * CACHE_SIZE/nDiv2
                            // cos = RootsOfUnity.quickCos(angleIndex);
                            // sin = RootsOfUnity.quickSin(angleIndex);
                            cos = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.HALF_CACHE_SIZE - angleIndex] :
                                    -RootsOfUnity.SINE_CACHE[angleIndex - RootsOfUnity.HALF_CACHE_SIZE];
                            sin = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[angleIndex] :
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.CACHE_SIZE - angleIndex];
                        } else {
                            double angle = (j + 1) * rotationAngle;
                            double sinHalf = Math.sin(0.5 * angle);
                            cos = 1.0 - 2.0 * sinHalf * sinHalf; // = cos(angle)
                            sin = Math.sin(angle);
                        }
                    } else {
                        // on modern JVM and CPU, the following 4 multiplications and 4 additions work faster
                        // then extracting cos and sin from the table
                        double temp = cos;
                        cos = cos * cos0M1 - sin * sin0 + cos;
                        sin = sin * cos0M1 + temp * sin0 + sin;
                    }
                }
                j1 = pos + nDiv8;
                j2 = pos + nDiv2 - nDiv8;
                //[[Repeat(INCLUDE_FROM_FILE, THIS_FILE, CommonButterfly)
                //  (cos|sin) ==> HALF_SQRT2;;
                //  \s\s\s\s(work\.|samples\.|$) ==> $1   !! Auto-generated: NOT EDIT !! ]]
                work.copy(R1, samples, nDiv2 + j1);
                work.copy(R2, samples, nDiv2 + j2);
                work.combineWithRealMultipliers(CAS, R1, HALF_SQRT2, R2, HALF_SQRT2);
                work.copy(L1, samples, j1);
                samples.add(j1, work, L1, CAS);
                samples.sub(nDiv2 + j1, work, L1, CAS);
                work.combineWithRealMultipliers(CAS, R1, HALF_SQRT2, R2, -HALF_SQRT2);
                work.copy(L2, samples, j2);
                samples.add(j2, work, L2, CAS);
                samples.sub(nDiv2 + j2, work, L2, CAS);
                //[[Repeat.IncludeEnd]]

                work.sub(0, samples, pos, pos + nDiv2);
                samples.add(pos, pos, pos + nDiv2);         // r[0] + r[n/2]
                samples.copy(pos + nDiv2, work, 0);         // r[0] - r[n/2]

                work.sub(0, samples, pos + nDiv4, pos + nDiv2 + nDiv4);
                samples.add(pos + nDiv4, pos + nDiv4, pos + nDiv2 + nDiv4); // r[n/4] + r[3*n/4]
                samples.copy(pos + nDiv2 + nDiv4, work, 0);                 // r[n/4] - r[3*n/4]
            }

            if (context != null && (pos + (1 << logN) == samples.length())) {
                context.checkInterruptionAndUpdateProgress(null, logN, originalLogN);
            }
        }
    }

    //[[Repeat() Float ==> Double;;
    //           float ==> double;;
    //           \(double\)\s*\(([^)]+)\) ==> $1 ]]
    private static void fhtJavaFloatMainLoop(
            ArrayContext context, SampleArray samples,
            final int pos, final int logN, final int originalLogN) {
        final float[] values = samples instanceof RealScalarSampleArray.DirectZeroOffsetsRealFloatSampleArray ?
                ((RealScalarSampleArray.DirectZeroOffsetsRealFloatSampleArray) samples).samples :
                ((RealScalarSampleArray.DirectRealFloatSampleArray) samples).samples;
        final int ofs = pos + (samples instanceof RealScalarSampleArray.DirectZeroOffsetsRealFloatSampleArray ? 0 :
                ((RealScalarSampleArray.DirectRealFloatSampleArray) samples).ofs);
        switch (logN) { // in comments below xK, rK, yK means the source, the reordered source and the result
            case 1: {
                float temp = values[ofs] - values[ofs + 1];
                values[ofs] += values[ofs + 1];             // y0 = x0 + x1
                values[ofs + 1] = temp;                     // y1 = x0 - x1
                break;
            }
            case 2: {
                float d01 = values[ofs] - values[ofs + 1];
                float s01 = values[ofs] + values[ofs + 1];
                float d23 = values[ofs + 2] - values[ofs + 3];
                float s23 = values[ofs + 2] + values[ofs + 3];
                values[ofs] = s01 + s23;                    // y0 = r0 + r1 + r2 + r3 = x0 + x2 + x1 + x3
                values[ofs + 1] = d01 + d23;                // y1 = r0 - r1 + r2 - r3 = x0 - x2 + x1 - x3
                values[ofs + 2] = s01 - s23;                // y2 = r0 + r1 - r2 - r3 = x0 + x2 - x1 - x3
                values[ofs + 3] = d01 - d23;                // y3 = r0 - r1 - r2 + r3 = x0 - x2 - x1 + x3
                break;
            }
            case 3: {
                float d01 = values[ofs] - values[ofs + 1];
                float s01 = values[ofs] + values[ofs + 1];
                float d23 = values[ofs + 2] - values[ofs + 3];
                float s23 = values[ofs + 2] + values[ofs + 3];
                float ds0123 = s01 - s23;
                float ss0123 = s01 + s23;
                float dd0123 = d01 - d23;
                float sd0123 = d01 + d23;
                float s45 = values[ofs + 4] + values[ofs + 5];
                float s67 = values[ofs + 6] + values[ofs + 7];
                float d45 = values[ofs + 4] - values[ofs + 5];
                float d67 = values[ofs + 6] - values[ofs + 7];
                float ds4567 = s45 - s67;
                float ss4567 = s45 + s67;
                values[ofs + 4] = ss0123 - ss4567;          // y4 = r0 + r1 + r2 + r3 - r4 - r5 - r6 - r7
                values[ofs] = ss0123 + ss4567;              // y0 = r0 + ... + r7 = x0 + ... + x7
                values[ofs + 6] = ds0123 - ds4567;          // y6 = r0 + r1 - r2 - r3 - r4 - r5 + r6 + r7
                //    = x0 + x4 - x2 - x6 - x1 - x5 + x3 + x7
                values[ofs + 2] = ds0123 + ds4567;          // y2 = r0 + r1 - r2 - r3 + r4 + r5 - r6 - r7
                //    = x0 + x4 - x2 - x6 + x1 + x5 - x3 - x7
                d45 *= SQRT2;
                d67 *= SQRT2;
                values[ofs + 5] = sd0123 - d45;             // y5 = r0 - r1 + r2 - r3 - sqrt(2)*r4 + sqrt(2)*r5
                //    = x0 - x4 + x2 - x6 - sqrt(2)*x1 + sqrt(2)*x5
                values[ofs + 1] = sd0123 + d45;             // y1 = r0 - r1 + r2 - r3 + sqrt(2)*r4 - sqrt(2)*r5
                //    = x0 - x4 + x2 - x6 + sqrt(2)*x1 - sqrt(2)*x5
                values[ofs + 7] = dd0123 - d67;             // y7 = r0 - r1 - r2 + r3 - sqrt(2)*r6 + sqrt(2)*r7
                //    = x0 - x4 - x2 + x6 - sqrt(2)*x3 + sqrt(2)*x7
                values[ofs + 3] = dd0123 + d67;             // y3 = r0 - r1 - r2 + r3 + sqrt(2)*r6 - sqrt(2)*r7
                //    = x0 - x4 - x2 + x6 + sqrt(2)*x3 - sqrt(2)*x7
                break;
            }
            default: {
                assert logN > 3;
                final int nDiv8 = 1 << (logN - 3);
                final int nDiv4 = nDiv8 * 2;
                final int nDiv2 = nDiv4 * 2;
                fhtJavaFloatMainLoop(context, samples, pos, logN - 1, originalLogN);
                fhtJavaFloatMainLoop(context, samples, pos + nDiv2, logN - 1, originalLogN);

                final boolean allAnglesInCache = logN - 1 <= RootsOfUnity.LOG_CACHE_SIZE + LOG_ANGLE_STEP;
                final double rotationAngle = StrictMath.PI / nDiv2;
                final double sin0Half = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN]; //sin(0.5 * rotationAngle);
                final double cos0M1 = -2.0 * sin0Half * sin0Half; // cos(rotationAngle)-1
                final double sin0 = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN - 1]; //sin(rotationAngle)
                double cos = 1.0 + cos0M1;
                double sin = sin0;
                int j1, j2;
                float r1, r2, cas, l1, l2;
                for (int j = 1; j < nDiv8; j++) {
                    // (cos, sin) = (cos(j*2*PI/N),  sin(j*2*PI/N))
                    j1 = ofs + j;
                    j2 = ofs + nDiv2 - j;
                    //[[Repeat.SectionStart JavaFloatButterfly]]
                    r1 = values[nDiv2 + j1];
                    r2 = values[nDiv2 + j2];
                    cas = (float) (r1 * cos + r2 * sin);
                    l1 = values[j1];
                    values[j1] = l1 + cas;
                    values[nDiv2 + j1] = l1 - cas;
                    cas = (float) (r1 * sin - r2 * cos);
                    l2 = values[j2];
                    values[j2] = l2 + cas;
                    values[nDiv2 + j2] = l2 - cas;
                    //[[Repeat.SectionEnd JavaFloatButterfly]]

                    j1 = ofs + nDiv4 - j;
                    j2 = ofs + nDiv4 + j; // below is the same code as above with exchanging sin<->cos
                    //[[Repeat(INCLUDE_FROM_FILE, THIS_FILE, JavaFloatButterfly)
                    //  cos ==> TTTT;; sin ==> cos;; TTTT ==> sin !! Generated by Repeater: DO NOT EDIT !! ]]
                    r1 = values[nDiv2 + j1];
                    r2 = values[nDiv2 + j2];
                    cas = (float) (r1 * sin + r2 * cos);
                    l1 = values[j1];
                    values[j1] = l1 + cas;
                    values[nDiv2 + j1] = l1 - cas;
                    cas = (float) (r1 * cos - r2 * sin);
                    l2 = values[j2];
                    values[j2] = l2 + cas;
                    values[nDiv2 + j2] = l2 - cas;
                    //[[Repeat.IncludeEnd]]

                    if ((j & (ANGLE_STEP - 1)) == ANGLE_STEP - 1) {
                        if (allAnglesInCache) {
                            int angleIndex = (j + 1) >> LOG_ANGLE_STEP
                                    << (RootsOfUnity.LOG_CACHE_SIZE - (logN - 1) + LOG_ANGLE_STEP);
                            // (j + 1) * CACHE_SIZE/nDiv2
                            // cos = RootsOfUnity.quickCos(angleIndex);
                            // sin = RootsOfUnity.quickSin(angleIndex);
                            cos = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.HALF_CACHE_SIZE - angleIndex] :
                                    -RootsOfUnity.SINE_CACHE[angleIndex - RootsOfUnity.HALF_CACHE_SIZE];
                            sin = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[angleIndex] :
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.CACHE_SIZE - angleIndex];
                        } else {
                            double angle = (j + 1) * rotationAngle;
                            double sinHalf = Math.sin(0.5 * angle);
                            cos = 1.0 - 2.0 * sinHalf * sinHalf; // = cos(angle)
                            sin = Math.sin(angle);
                        }
                    } else {
                        // on modern JVM and CPU, the following 4 multiplications and 4 additions work faster
                        // then extracting cos and sin from the table
                        double temp = cos;
                        cos = cos * cos0M1 - sin * sin0 + cos;
                        sin = sin * cos0M1 + temp * sin0 + sin;
                    }
                }
                j1 = ofs + nDiv8;
                j2 = ofs + nDiv2 - nDiv8;
                //[[Repeat(INCLUDE_FROM_FILE, THIS_FILE, JavaFloatButterfly)
                //  (cos|sin) ==> HALF_SQRT2;;
                //  \s\s\s\s(r1\s|r2\s|cas\s|l1\s|l2\s|values\b|$) ==> $1 !! Generated by Repeater: DO NOT EDIT !! ]]
                r1 = values[nDiv2 + j1];
                r2 = values[nDiv2 + j2];
                cas = (float) (r1 * HALF_SQRT2 + r2 * HALF_SQRT2);
                l1 = values[j1];
                values[j1] = l1 + cas;
                values[nDiv2 + j1] = l1 - cas;
                cas = (float) (r1 * HALF_SQRT2 - r2 * HALF_SQRT2);
                l2 = values[j2];
                values[j2] = l2 + cas;
                values[nDiv2 + j2] = l2 - cas;
                //[[Repeat.IncludeEnd]]

                float temp = values[ofs] - values[ofs + nDiv2];
                values[ofs] += values[ofs + nDiv2];         // r[0] + r[n/2]
                values[ofs + nDiv2] = temp;                 // r[0] - r[n/2]

                temp = values[ofs + nDiv4] - values[ofs + nDiv2 + nDiv4];
                values[ofs + nDiv4] += values[ofs + nDiv2 + nDiv4];         // r[n/4] + r[3*n/4]
                values[ofs + nDiv2 + nDiv4] = temp;                         // r[n/4] - r[3*n/4]
            }

            if (context != null && (pos + (1 << logN) == samples.length())) {
                context.checkInterruptionAndUpdateProgress(null, logN, originalLogN);
            }
        }
    }

    //[[Repeat.AutoGeneratedStart !! Auto-generated: NOT EDIT !! ]]
    private static void fhtJavaDoubleMainLoop(
            ArrayContext context, SampleArray samples,
            final int pos, final int logN, final int originalLogN) {
        final double[] values = samples instanceof RealScalarSampleArray.DirectZeroOffsetsRealDoubleSampleArray ?
                ((RealScalarSampleArray.DirectZeroOffsetsRealDoubleSampleArray) samples).samples :
                ((RealScalarSampleArray.DirectRealDoubleSampleArray) samples).samples;
        final int ofs = pos + (samples instanceof RealScalarSampleArray.DirectZeroOffsetsRealDoubleSampleArray ? 0 :
                ((RealScalarSampleArray.DirectRealDoubleSampleArray) samples).ofs);
        switch (logN) { // in comments below xK, rK, yK means the source, the reordered source and the result
            case 1: {
                double temp = values[ofs] - values[ofs + 1];
                values[ofs] += values[ofs + 1];             // y0 = x0 + x1
                values[ofs + 1] = temp;                     // y1 = x0 - x1
                break;
            }
            case 2: {
                double d01 = values[ofs] - values[ofs + 1];
                double s01 = values[ofs] + values[ofs + 1];
                double d23 = values[ofs + 2] - values[ofs + 3];
                double s23 = values[ofs + 2] + values[ofs + 3];
                values[ofs] = s01 + s23;                    // y0 = r0 + r1 + r2 + r3 = x0 + x2 + x1 + x3
                values[ofs + 1] = d01 + d23;                // y1 = r0 - r1 + r2 - r3 = x0 - x2 + x1 - x3
                values[ofs + 2] = s01 - s23;                // y2 = r0 + r1 - r2 - r3 = x0 + x2 - x1 - x3
                values[ofs + 3] = d01 - d23;                // y3 = r0 - r1 - r2 + r3 = x0 - x2 - x1 + x3
                break;
            }
            case 3: {
                double d01 = values[ofs] - values[ofs + 1];
                double s01 = values[ofs] + values[ofs + 1];
                double d23 = values[ofs + 2] - values[ofs + 3];
                double s23 = values[ofs + 2] + values[ofs + 3];
                double ds0123 = s01 - s23;
                double ss0123 = s01 + s23;
                double dd0123 = d01 - d23;
                double sd0123 = d01 + d23;
                double s45 = values[ofs + 4] + values[ofs + 5];
                double s67 = values[ofs + 6] + values[ofs + 7];
                double d45 = values[ofs + 4] - values[ofs + 5];
                double d67 = values[ofs + 6] - values[ofs + 7];
                double ds4567 = s45 - s67;
                double ss4567 = s45 + s67;
                values[ofs + 4] = ss0123 - ss4567;          // y4 = r0 + r1 + r2 + r3 - r4 - r5 - r6 - r7
                values[ofs] = ss0123 + ss4567;              // y0 = r0 + ... + r7 = x0 + ... + x7
                values[ofs + 6] = ds0123 - ds4567;          // y6 = r0 + r1 - r2 - r3 - r4 - r5 + r6 + r7
                //    = x0 + x4 - x2 - x6 - x1 - x5 + x3 + x7
                values[ofs + 2] = ds0123 + ds4567;          // y2 = r0 + r1 - r2 - r3 + r4 + r5 - r6 - r7
                //    = x0 + x4 - x2 - x6 + x1 + x5 - x3 - x7
                d45 *= SQRT2;
                d67 *= SQRT2;
                values[ofs + 5] = sd0123 - d45;             // y5 = r0 - r1 + r2 - r3 - sqrt(2)*r4 + sqrt(2)*r5
                //    = x0 - x4 + x2 - x6 - sqrt(2)*x1 + sqrt(2)*x5
                values[ofs + 1] = sd0123 + d45;             // y1 = r0 - r1 + r2 - r3 + sqrt(2)*r4 - sqrt(2)*r5
                //    = x0 - x4 + x2 - x6 + sqrt(2)*x1 - sqrt(2)*x5
                values[ofs + 7] = dd0123 - d67;             // y7 = r0 - r1 - r2 + r3 - sqrt(2)*r6 + sqrt(2)*r7
                //    = x0 - x4 - x2 + x6 - sqrt(2)*x3 + sqrt(2)*x7
                values[ofs + 3] = dd0123 + d67;             // y3 = r0 - r1 - r2 + r3 + sqrt(2)*r6 - sqrt(2)*r7
                //    = x0 - x4 - x2 + x6 + sqrt(2)*x3 - sqrt(2)*x7
                break;
            }
            default: {
                assert logN > 3;
                final int nDiv8 = 1 << (logN - 3);
                final int nDiv4 = nDiv8 * 2;
                final int nDiv2 = nDiv4 * 2;
                fhtJavaDoubleMainLoop(context, samples, pos, logN - 1, originalLogN);
                fhtJavaDoubleMainLoop(context, samples, pos + nDiv2, logN - 1, originalLogN);

                final boolean allAnglesInCache = logN - 1 <= RootsOfUnity.LOG_CACHE_SIZE + LOG_ANGLE_STEP;
                final double rotationAngle = StrictMath.PI / nDiv2;
                final double sin0Half = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN]; //sin(0.5 * rotationAngle);
                final double cos0M1 = -2.0 * sin0Half * sin0Half; // cos(rotationAngle)-1
                final double sin0 = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN - 1]; //sin(rotationAngle)
                double cos = 1.0 + cos0M1;
                double sin = sin0;
                int j1, j2;
                double r1, r2, cas, l1, l2;
                for (int j = 1; j < nDiv8; j++) {
                    // (cos, sin) = (cos(j*2*PI/N),  sin(j*2*PI/N))
                    j1 = ofs + j;
                    j2 = ofs + nDiv2 - j;

                    r1 = values[nDiv2 + j1];
                    r2 = values[nDiv2 + j2];
                    cas = r1 * cos + r2 * sin;
                    l1 = values[j1];
                    values[j1] = l1 + cas;
                    values[nDiv2 + j1] = l1 - cas;
                    cas = r1 * sin - r2 * cos;
                    l2 = values[j2];
                    values[j2] = l2 + cas;
                    values[nDiv2 + j2] = l2 - cas;


                    j1 = ofs + nDiv4 - j;
                    j2 = ofs + nDiv4 + j; // below is the same code as above with exchanging sin<->cos

                    r1 = values[nDiv2 + j1];
                    r2 = values[nDiv2 + j2];
                    cas = r1 * sin + r2 * cos;
                    l1 = values[j1];
                    values[j1] = l1 + cas;
                    values[nDiv2 + j1] = l1 - cas;
                    cas = r1 * cos - r2 * sin;
                    l2 = values[j2];
                    values[j2] = l2 + cas;
                    values[nDiv2 + j2] = l2 - cas;


                    if ((j & (ANGLE_STEP - 1)) == ANGLE_STEP - 1) {
                        if (allAnglesInCache) {
                            int angleIndex = (j + 1) >> LOG_ANGLE_STEP
                                    << (RootsOfUnity.LOG_CACHE_SIZE - (logN - 1) + LOG_ANGLE_STEP);
                            // (j + 1) * CACHE_SIZE/nDiv2
                            // cos = RootsOfUnity.quickCos(angleIndex);
                            // sin = RootsOfUnity.quickSin(angleIndex);
                            cos = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.HALF_CACHE_SIZE - angleIndex] :
                                    -RootsOfUnity.SINE_CACHE[angleIndex - RootsOfUnity.HALF_CACHE_SIZE];
                            sin = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[angleIndex] :
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.CACHE_SIZE - angleIndex];
                        } else {
                            double angle = (j + 1) * rotationAngle;
                            double sinHalf = Math.sin(0.5 * angle);
                            cos = 1.0 - 2.0 * sinHalf * sinHalf; // = cos(angle)
                            sin = Math.sin(angle);
                        }
                    } else {
                        // on modern JVM and CPU, the following 4 multiplications and 4 additions work faster
                        // then extracting cos and sin from the table
                        double temp = cos;
                        cos = cos * cos0M1 - sin * sin0 + cos;
                        sin = sin * cos0M1 + temp * sin0 + sin;
                    }
                }
                j1 = ofs + nDiv8;
                j2 = ofs + nDiv2 - nDiv8;

                r1 = values[nDiv2 + j1];
                r2 = values[nDiv2 + j2];
                cas = r1 * HALF_SQRT2 + r2 * HALF_SQRT2;
                l1 = values[j1];
                values[j1] = l1 + cas;
                values[nDiv2 + j1] = l1 - cas;
                cas = r1 * HALF_SQRT2 - r2 * HALF_SQRT2;
                l2 = values[j2];
                values[j2] = l2 + cas;
                values[nDiv2 + j2] = l2 - cas;


                double temp = values[ofs] - values[ofs + nDiv2];
                values[ofs] += values[ofs + nDiv2];         // r[0] + r[n/2]
                values[ofs + nDiv2] = temp;                 // r[0] - r[n/2]

                temp = values[ofs + nDiv4] - values[ofs + nDiv2 + nDiv4];
                values[ofs + nDiv4] += values[ofs + nDiv2 + nDiv4];         // r[n/4] + r[3*n/4]
                values[ofs + nDiv2 + nDiv4] = temp;                         // r[n/4] - r[3*n/4]
            }

            if (context != null && (pos + (1 << logN) == samples.length())) {
                context.checkInterruptionAndUpdateProgress(null, logN, originalLogN);
            }
        }
    }

    //[[Repeat.AutoGeneratedEnd]]


    //[[Repeat() Float ==> Double;;
    //           float ==> double;;
    //           \(double\)\s*\(([^)]+)\) ==> $1 ]]
    private static void fhtJavaFloatMultidimensionalMainLoop(
            ArrayContext context,
            RealVectorSampleArray.DirectRealFloatVectorSampleArray samples,
            final int pos, final int logN, final int originalLogN,
            RealVectorSampleArray.DirectRealFloatVectorSampleArray work) {
        final float[] values = samples.samples;
        final int step = (int) samples.vectorStep;
        final int ofs = pos * step + samples.ofs;
        switch (logN) { // in comments below xK, rK, yK means the source, the reordered source and the result
            case 1: {
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    float temp = values[k] - values[k + step];
                    values[k] += values[k + step];          // y0 = x0 + x1
                    values[k + step] = temp;                // y1 = x0 - x1
                }
                break;
            }
            case 2: {
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    float d01 = values[k] - values[k + step];
                    float s01 = values[k] + values[k + step];
                    float d23 = values[k + 2 * step] - values[k + 3 * step];
                    float s23 = values[k + 2 * step] + values[k + 3 * step];
                    values[k] = s01 + s23;                  // y0 = r0 + r1 + r2 + r3 = x0 + x2 + x1 + x3
                    values[k + step] = d01 + d23;           // y1 = r0 - r1 + r2 - r3 = x0 - x2 + x1 - x3
                    values[k + 2 * step] = s01 - s23;       // y2 = r0 + r1 - r2 - r3 = x0 + x2 - x1 - x3
                    values[k + 3 * step] = d01 - d23;       // y3 = r0 - r1 - r2 + r3 = x0 - x2 - x1 + x3
                }
                break;
            }
            case 3: {
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    float d01 = values[k] - values[k + step];
                    float s01 = values[k] + values[k + step];
                    float d23 = values[k + 2 * step] - values[k + 3 * step];
                    float s23 = values[k + 2 * step] + values[k + 3 * step];
                    float ds0123 = s01 - s23;
                    float ss0123 = s01 + s23;
                    float dd0123 = d01 - d23;
                    float sd0123 = d01 + d23;
                    float s45 = values[k + 4 * step] + values[k + 5 * step];
                    float s67 = values[k + 6 * step] + values[k + 7 * step];
                    float d45 = values[k + 4 * step] - values[k + 5 * step];
                    float d67 = values[k + 6 * step] - values[k + 7 * step];
                    float ds4567 = s45 - s67;
                    float ss4567 = s45 + s67;
                    values[k + 4 * step] = ss0123 - ss4567; // y4 = r0 + r1 + r2 + r3 - r4 - r5 - r6 - r7
                    values[k] = ss0123 + ss4567;            // y0 = r0 + ... + r7 = x0 + ... + x7
                    values[k + 6 * step] = ds0123 - ds4567; // y6 = r0 + r1 - r2 - r3 - r4 - r5 + r6 + r7
                    //    = x0 + x4 - x2 - x6 - x1 - x5 + x3 + x7
                    values[k + 2 * step] = ds0123 + ds4567; // y2 = r0 + r1 - r2 - r3 + r4 + r5 - r6 - r7
                    //    = x0 + x4 - x2 - x6 + x1 + x5 - x3 - x7
                    d45 *= SQRT2;
                    d67 *= SQRT2;
                    values[k + 5 * step] = sd0123 - d45;    // y5 = r0 - r1 + r2 - r3 - sqrt(2)*r4 + sqrt(2)*r5
                    //    = x0 - x4 + x2 - x6 - sqrt(2)*x1 + sqrt(2)*x5
                    values[k + step] = sd0123 + d45;        // y1 = r0 - r1 + r2 - r3 + sqrt(2)*r4 - sqrt(2)*r5
                    //    = x0 - x4 + x2 - x6 + sqrt(2)*x1 - sqrt(2)*x5
                    values[k + 7 * step] = dd0123 - d67;    // y7 = r0 - r1 - r2 + r3 - sqrt(2)*r6 + sqrt(2)*r7
                    //    = x0 - x4 - x2 + x6 - sqrt(2)*x3 + sqrt(2)*x7
                    values[k + 3 * step] = dd0123 + d67;    // y3 = r0 - r1 - r2 + r3 + sqrt(2)*r6 - sqrt(2)*r7
                    //    = x0 - x4 - x2 + x6 + sqrt(2)*x3 - sqrt(2)*x7

                }
                break;
            }
            default: {
                assert logN > 3;
                final int nDiv8 = 1 << (logN - 3);
                final int nDiv4 = nDiv8 * 2;
                final int nDiv2 = nDiv4 * 2;
                final int nDiv8Step = nDiv8 * step;
                final int nDiv4Step = nDiv8Step * 2;
                final int nDiv2Step = nDiv4Step * 2;
                fhtJavaFloatMultidimensionalMainLoop(context, samples, pos, logN - 1, originalLogN, work);
                fhtJavaFloatMultidimensionalMainLoop(context, samples, pos + nDiv2, logN - 1, originalLogN, work);

                final boolean allAnglesInCache = logN - 1 <= RootsOfUnity.LOG_CACHE_SIZE + LOG_ANGLE_STEP;
                final double rotationAngle = StrictMath.PI / nDiv2;
                final double sin0Half = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN]; //sin(0.5 * rotationAngle);
                final double cos0M1 = -2.0 * sin0Half * sin0Half; // cos(rotationAngle)-1
                final double sin0 = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN - 1]; //sin(rotationAngle)
                double cos = 1.0 + cos0M1;
                double sin = sin0;
                int j1, j2, j1Step, j2Step;
                float r1, r2, cas, l1, l2;
                for (int j = 1, jStep = step; j < nDiv8; j++, jStep += step) {
                    // (cos, sin) = (cos(j*2*PI/N),  sin(j*2*PI/N))
                    for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                        int j1Step1 = k + jStep;
                        int j2Step1 = k + nDiv2Step - jStep;
                        //[[Repeat.SectionStart JavaFloatMultidimensionalButterfly]]
                        r1 = values[nDiv2Step + j1Step1];
                        r2 = values[nDiv2Step + j2Step1];
                        cas = (float) (r1 * cos + r2 * sin);
                        l1 = values[j1Step1];
                        values[j1Step1] = l1 + cas;
                        values[nDiv2Step + j1Step1] = l1 - cas;
                        cas = (float) (r1 * sin - r2 * cos);
                        l2 = values[j2Step1];
                        values[j2Step1] = l2 + cas;
                        values[nDiv2Step + j2Step1] = l2 - cas;
                        //[[Repeat.SectionEnd JavaFloatMultidimensionalButterfly]]

                        j1Step1 = k + nDiv4Step - jStep;
                        j2Step1 = k + nDiv4Step + jStep;
                        //[[Repeat(INCLUDE_FROM_FILE, THIS_FILE, JavaFloatMultidimensionalButterfly)
                        //  cos ==> TTTT;; sin ==> cos;; TTTT ==> sin !! Generated by Repeater: DO NOT EDIT !! ]]
                        r1 = values[nDiv2Step + j1Step1];
                        r2 = values[nDiv2Step + j2Step1];
                        cas = (float) (r1 * sin + r2 * cos);
                        l1 = values[j1Step1];
                        values[j1Step1] = l1 + cas;
                        values[nDiv2Step + j1Step1] = l1 - cas;
                        cas = (float) (r1 * cos - r2 * sin);
                        l2 = values[j2Step1];
                        values[j2Step1] = l2 + cas;
                        values[nDiv2Step + j2Step1] = l2 - cas;
                        //[[Repeat.IncludeEnd]]
                    }
                    if ((j & (ANGLE_STEP - 1)) == ANGLE_STEP - 1) {
                        if (allAnglesInCache) {
                            int angleIndex = (j + 1) >> LOG_ANGLE_STEP
                                    << (RootsOfUnity.LOG_CACHE_SIZE - (logN - 1) + LOG_ANGLE_STEP);
                            // (j + 1) * CACHE_SIZE/nDiv2
                            // cos = RootsOfUnity.quickCos(angleIndex);
                            // sin = RootsOfUnity.quickSin(angleIndex);
                            cos = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.HALF_CACHE_SIZE - angleIndex] :
                                    -RootsOfUnity.SINE_CACHE[angleIndex - RootsOfUnity.HALF_CACHE_SIZE];
                            sin = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[angleIndex] :
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.CACHE_SIZE - angleIndex];
                        } else {
                            double angle = (j + 1) * rotationAngle;
                            double sinHalf = Math.sin(0.5 * angle);
                            cos = 1.0 - 2.0 * sinHalf * sinHalf; // = cos(angle)
                            sin = Math.sin(angle);
                        }
                    } else {
                        // on modern JVM and CPU, the following 4 multiplications and 4 additions work faster
                        // then extracting cos and sin from the table
                        double temp = cos;
                        cos = cos * cos0M1 - sin * sin0 + cos;
                        sin = sin * cos0M1 + temp * sin0 + sin;
                    }
                }
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    j1Step = k + nDiv8Step;
                    j2Step = k + nDiv2Step - nDiv8Step;
                    //[[Repeat(INCLUDE_FROM_FILE, THIS_FILE, JavaFloatMultidimensionalButterfly)
                    //  (cos|sin) ==> HALF_SQRT2;;
                    //  \s\s\s\s(r1\s|r2\s|cas\s|l1\s|l2\s|values\b|$) ==> $1 !! Generated by Repeater: DO NOT EDIT !! ]]
                    r1 = values[nDiv2Step + j1Step];
                    r2 = values[nDiv2Step + j2Step];
                    cas = (float) (r1 * HALF_SQRT2 + r2 * HALF_SQRT2);
                    l1 = values[j1Step];
                    values[j1Step] = l1 + cas;
                    values[nDiv2Step + j1Step] = l1 - cas;
                    cas = (float) (r1 * HALF_SQRT2 - r2 * HALF_SQRT2);
                    l2 = values[j2Step];
                    values[j2Step] = l2 + cas;
                    values[nDiv2Step + j2Step] = l2 - cas;
                    //[[Repeat.IncludeEnd]]

                    float temp = values[k] - values[k + nDiv2Step];
                    values[k] += values[k + nDiv2Step];         // r[0] + r[n/2]
                    values[k + nDiv2Step] = temp;               // r[0] - r[n/2]

                    temp = values[k + nDiv4Step] - values[k + nDiv2Step + nDiv4Step];
                    values[k + nDiv4Step] += values[k + nDiv2Step + nDiv4Step]; // r[n/4] + r[3*n/4]
                    values[k + nDiv2Step + nDiv4Step] = temp;                   // r[n/4] - r[3*n/4]
                }
            }
            if (context != null && (pos + (1 << logN) == samples.length())) {
                context.checkInterruptionAndUpdateProgress(null, logN, originalLogN);
            }
        }
    }

    //[[Repeat.AutoGeneratedStart !! Auto-generated: NOT EDIT !! ]]
    private static void fhtJavaDoubleMultidimensionalMainLoop(
            ArrayContext context,
            RealVectorSampleArray.DirectRealDoubleVectorSampleArray samples,
            final int pos, final int logN, final int originalLogN,
            RealVectorSampleArray.DirectRealDoubleVectorSampleArray work) {
        final double[] values = samples.samples;
        final int step = (int) samples.vectorStep;
        final int ofs = pos * step + samples.ofs;
        switch (logN) { // in comments below xK, rK, yK means the source, the reordered source and the result
            case 1: {
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    double temp = values[k] - values[k + step];
                    values[k] += values[k + step];          // y0 = x0 + x1
                    values[k + step] = temp;                // y1 = x0 - x1
                }
                break;
            }
            case 2: {
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    double d01 = values[k] - values[k + step];
                    double s01 = values[k] + values[k + step];
                    double d23 = values[k + 2 * step] - values[k + 3 * step];
                    double s23 = values[k + 2 * step] + values[k + 3 * step];
                    values[k] = s01 + s23;                  // y0 = r0 + r1 + r2 + r3 = x0 + x2 + x1 + x3
                    values[k + step] = d01 + d23;           // y1 = r0 - r1 + r2 - r3 = x0 - x2 + x1 - x3
                    values[k + 2 * step] = s01 - s23;       // y2 = r0 + r1 - r2 - r3 = x0 + x2 - x1 - x3
                    values[k + 3 * step] = d01 - d23;       // y3 = r0 - r1 - r2 + r3 = x0 - x2 - x1 + x3
                }
                break;
            }
            case 3: {
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    double d01 = values[k] - values[k + step];
                    double s01 = values[k] + values[k + step];
                    double d23 = values[k + 2 * step] - values[k + 3 * step];
                    double s23 = values[k + 2 * step] + values[k + 3 * step];
                    double ds0123 = s01 - s23;
                    double ss0123 = s01 + s23;
                    double dd0123 = d01 - d23;
                    double sd0123 = d01 + d23;
                    double s45 = values[k + 4 * step] + values[k + 5 * step];
                    double s67 = values[k + 6 * step] + values[k + 7 * step];
                    double d45 = values[k + 4 * step] - values[k + 5 * step];
                    double d67 = values[k + 6 * step] - values[k + 7 * step];
                    double ds4567 = s45 - s67;
                    double ss4567 = s45 + s67;
                    values[k + 4 * step] = ss0123 - ss4567; // y4 = r0 + r1 + r2 + r3 - r4 - r5 - r6 - r7
                    values[k] = ss0123 + ss4567;            // y0 = r0 + ... + r7 = x0 + ... + x7
                    values[k + 6 * step] = ds0123 - ds4567; // y6 = r0 + r1 - r2 - r3 - r4 - r5 + r6 + r7
                    //    = x0 + x4 - x2 - x6 - x1 - x5 + x3 + x7
                    values[k + 2 * step] = ds0123 + ds4567; // y2 = r0 + r1 - r2 - r3 + r4 + r5 - r6 - r7
                    //    = x0 + x4 - x2 - x6 + x1 + x5 - x3 - x7
                    d45 *= SQRT2;
                    d67 *= SQRT2;
                    values[k + 5 * step] = sd0123 - d45;    // y5 = r0 - r1 + r2 - r3 - sqrt(2)*r4 + sqrt(2)*r5
                    //    = x0 - x4 + x2 - x6 - sqrt(2)*x1 + sqrt(2)*x5
                    values[k + step] = sd0123 + d45;        // y1 = r0 - r1 + r2 - r3 + sqrt(2)*r4 - sqrt(2)*r5
                    //    = x0 - x4 + x2 - x6 + sqrt(2)*x1 - sqrt(2)*x5
                    values[k + 7 * step] = dd0123 - d67;    // y7 = r0 - r1 - r2 + r3 - sqrt(2)*r6 + sqrt(2)*r7
                    //    = x0 - x4 - x2 + x6 - sqrt(2)*x3 + sqrt(2)*x7
                    values[k + 3 * step] = dd0123 + d67;    // y3 = r0 - r1 - r2 + r3 + sqrt(2)*r6 - sqrt(2)*r7
                    //    = x0 - x4 - x2 + x6 + sqrt(2)*x3 - sqrt(2)*x7

                }
                break;
            }
            default: {
                assert logN > 3;
                final int nDiv8 = 1 << (logN - 3);
                final int nDiv4 = nDiv8 * 2;
                final int nDiv2 = nDiv4 * 2;
                final int nDiv8Step = nDiv8 * step;
                final int nDiv4Step = nDiv8Step * 2;
                final int nDiv2Step = nDiv4Step * 2;
                fhtJavaDoubleMultidimensionalMainLoop(context, samples, pos, logN - 1, originalLogN, work);
                fhtJavaDoubleMultidimensionalMainLoop(context, samples, pos + nDiv2, logN - 1, originalLogN, work);

                final boolean allAnglesInCache = logN - 1 <= RootsOfUnity.LOG_CACHE_SIZE + LOG_ANGLE_STEP;
                final double rotationAngle = StrictMath.PI / nDiv2;
                final double sin0Half = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN]; //sin(0.5 * rotationAngle);
                final double cos0M1 = -2.0 * sin0Half * sin0Half; // cos(rotationAngle)-1
                final double sin0 = RootsOfUnity.LOGARITHMICAL_SINE_CACHE[logN - 1]; //sin(rotationAngle)
                double cos = 1.0 + cos0M1;
                double sin = sin0;
                int j1, j2, j1Step, j2Step;
                double r1, r2, cas, l1, l2;
                for (int j = 1, jStep = step; j < nDiv8; j++, jStep += step) {
                    // (cos, sin) = (cos(j*2*PI/N),  sin(j*2*PI/N))
                    for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                        int j1Step1 = k + jStep;
                        int j2Step1 = k + nDiv2Step - jStep;

                        r1 = values[nDiv2Step + j1Step1];
                        r2 = values[nDiv2Step + j2Step1];
                        cas = r1 * cos + r2 * sin;
                        l1 = values[j1Step1];
                        values[j1Step1] = l1 + cas;
                        values[nDiv2Step + j1Step1] = l1 - cas;
                        cas = r1 * sin - r2 * cos;
                        l2 = values[j2Step1];
                        values[j2Step1] = l2 + cas;
                        values[nDiv2Step + j2Step1] = l2 - cas;


                        j1Step1 = k + nDiv4Step - jStep;
                        j2Step1 = k + nDiv4Step + jStep;

                        r1 = values[nDiv2Step + j1Step1];
                        r2 = values[nDiv2Step + j2Step1];
                        cas = r1 * sin + r2 * cos;
                        l1 = values[j1Step1];
                        values[j1Step1] = l1 + cas;
                        values[nDiv2Step + j1Step1] = l1 - cas;
                        cas = r1 * cos - r2 * sin;
                        l2 = values[j2Step1];
                        values[j2Step1] = l2 + cas;
                        values[nDiv2Step + j2Step1] = l2 - cas;

                    }
                    if ((j & (ANGLE_STEP - 1)) == ANGLE_STEP - 1) {
                        if (allAnglesInCache) {
                            int angleIndex = (j + 1) >> LOG_ANGLE_STEP
                                    << (RootsOfUnity.LOG_CACHE_SIZE - (logN - 1) + LOG_ANGLE_STEP);
                            // (j + 1) * CACHE_SIZE/nDiv2
                            // cos = RootsOfUnity.quickCos(angleIndex);
                            // sin = RootsOfUnity.quickSin(angleIndex);
                            cos = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.HALF_CACHE_SIZE - angleIndex] :
                                    -RootsOfUnity.SINE_CACHE[angleIndex - RootsOfUnity.HALF_CACHE_SIZE];
                            sin = angleIndex < RootsOfUnity.HALF_CACHE_SIZE ?
                                    RootsOfUnity.SINE_CACHE[angleIndex] :
                                    RootsOfUnity.SINE_CACHE[RootsOfUnity.CACHE_SIZE - angleIndex];
                        } else {
                            double angle = (j + 1) * rotationAngle;
                            double sinHalf = Math.sin(0.5 * angle);
                            cos = 1.0 - 2.0 * sinHalf * sinHalf; // = cos(angle)
                            sin = Math.sin(angle);
                        }
                    } else {
                        // on modern JVM and CPU, the following 4 multiplications and 4 additions work faster
                        // then extracting cos and sin from the table
                        double temp = cos;
                        cos = cos * cos0M1 - sin * sin0 + cos;
                        sin = sin * cos0M1 + temp * sin0 + sin;
                    }
                }
                for (int k = ofs, kMax = k + samples.vectorLen; k < kMax; k++) {
                    j1Step = k + nDiv8Step;
                    j2Step = k + nDiv2Step - nDiv8Step;

                    r1 = values[nDiv2Step + j1Step];
                    r2 = values[nDiv2Step + j2Step];
                    cas = r1 * HALF_SQRT2 + r2 * HALF_SQRT2;
                    l1 = values[j1Step];
                    values[j1Step] = l1 + cas;
                    values[nDiv2Step + j1Step] = l1 - cas;
                    cas = r1 * HALF_SQRT2 - r2 * HALF_SQRT2;
                    l2 = values[j2Step];
                    values[j2Step] = l2 + cas;
                    values[nDiv2Step + j2Step] = l2 - cas;


                    double temp = values[k] - values[k + nDiv2Step];
                    values[k] += values[k + nDiv2Step];         // r[0] + r[n/2]
                    values[k + nDiv2Step] = temp;               // r[0] - r[n/2]

                    temp = values[k + nDiv4Step] - values[k + nDiv2Step + nDiv4Step];
                    values[k + nDiv4Step] += values[k + nDiv2Step + nDiv4Step]; // r[n/4] + r[3*n/4]
                    values[k + nDiv2Step + nDiv4Step] = temp;                   // r[n/4] - r[3*n/4]
                }
            }
            if (context != null && (pos + (1 << logN) == samples.length())) {
                context.checkInterruptionAndUpdateProgress(null, logN, originalLogN);
            }
        }
    }

    //[[Repeat.AutoGeneratedEnd]]
}