dsiutils.2.6.0.source-code.CHANGES Maven / Gradle / Ivy
Go to download
Show more of this group Show more artifacts with this name
Show all versions of dsiutils Show documentation
Show all versions of dsiutils Show documentation
The DSI utilities are a mishmash of classes accumulated during the last twenty years in projects developed at the DSI (Dipartimento di Scienze dell'Informazione, i.e., Information Sciences Department), now DI (Dipartimento di Informatica, i.e., Informatics Department), of the Universita` degli Studi di Milano.
2.6.0
- New, slightly faster PRNGs using the ++ scrambler.
2.5.5
- Faster, no-test versions of Fast.int2nat() and Fast.nat2int().
- ByteBufferLongBigList instances are now mutable.
2.5.4
- Fixed old-standing bug in LongArrayBitVector.{next|previous}{One|Zero}().
2.5.3
- Removed boundary checks from LongArrayBitVector.
2.5.2
- Fixed dependence from SLF4J.
2.5.1
- Improved jump()/longJump() methods for generating non-overlapping
sequences. Thanks to Danilo Fabiani and Tim Mattox for reporting
problems and suggesting solutions.
2.5.0
- New PRNGs.
- ObjectParser now has default factory methods getInstance,
newInstance and valueOf.
2.4.2
- Fixed bug in nextInt(1) call for xoroshiro128+ and xorshift1024*φ
implementations.
2.4.1
- xorshift1024*φ generators
- More Java 8-ish code.
- Eliminated spaces inside ( and [.
- Removed multiplication-free nextDouble()/nextFloat() from all PRNG
implementations. Now they all generate floating-point numbers using 53
and 24 significant bits, respectively. The difference is only
in the least significant bit of half the values generated (previously
it would have always been zero). A new methods with a Fast suffix
have been added to obtain the old behavior.
- In xoroshiro128+ and xorshift1024*φ implementations, nextInt()
implementations now return the upper half (instead of the lower half)
and the shortcut for for powers of two in nextInt() and nextLong() now
returns the high bits instead of the low bit.
2.4.0
- Java 8-only, fastutil 8.
2.3.6
- Added ReorderingBlockingQueue.
2.3.5
- Moved logback-classic dependency from compile to runtime.
- Fixed licence in Kahan-summation classes.
- Fast constructor for LongArrayBitVector instances of given length.
2.3.4
- Now all RandomGenerator implementations implement Serializable,
so all generators can be serialized and restarted in the same
state.
2.3.3
- xoroshiro128+ implementations and new JMH-based speed measurements.
- SplitMix64 now provides a truly uniform nextInt(int): the generated
sequence, however, will not be the same.
- SplitMix64 and XorShift1024Star will now return different results for
nextLong(long)/nextInt(int) when the argument is a power of 2 (but more
quickly).
2.3.2
- New JSAP StringParser for enums.
2.3.1
- Fixed bug in PrefixFreeTransformationStrategy.
2.3.0
- New transformation strategies for byte arrays.
- Several new, high-performance raw transformation strategies that do
not reverse bit order.
2.2.8
- New class FileLinesByteArrayCollection.
2.2.7
- New high-performance raw transformation strategy for byte arrays.
2.2.6
- Fixed jump() function of xorshift1024* generators.
2.2.5
- All pseudorandom number generators now use a smart, faster conversion to
float/double, SplitMix64 is used everywhere to generate a large state
from a seed, and the shifts of XorShift128Plus have been slightly
modified to obtain a better polynomial weight. The generators should be
pretty stable now (hopefully).
2.2.4
- New InputBitStream.copyto()/OutputBitStream.copyFrom() methods.
2.2.3
- Improved speed for Fast.select() thanks to an idea of Giuseppe
Ottaviano.
- New SplitMix64Random PRNG that instantiates Java 8's SplittableRandom
and provides the fastest generator passing BigCrush using only 64 bits
of state.
- Now all nextBoolean() methods in PRNGs check for nextLong)() < 0, which
is faster. 1024-bit generators with a large state space are now
initialized using a SplitMix64RandomGenerator, which will cause, given a
seed, a different sequence to be generated.
- XorShift64Random[Generator] has been deprecated, as
SplitMix64Random[Generator] is better with every respect.
- MutableString's method that read UTF-8 strings are now deprecated,
as they don't handle properly surrogate pairs. The self-delimiting
methods have not been deprecated, but a warning has been added to
the documentation.
2.2.2
- Fixed Maven dependencies.
2.2.1
- Some jackknife BigDecimal operations were not propagating the math
context.
- Slightly improved speed of InputBitStream.readUnary() methods (thanks to
Giuseppe Ottaviano for the tip).
2.2.0
- Removed all deprecated dependencies from Log4J.
- Removed deprecated methods in Fast.
- Removed OfflineIterable.length() method.
- Removed Utf16TransformationStrategy, IntHyperLogLogCounterArray, XorShiftStarRandom, XorShiftStarRandomGenerator.
2.1.10
- Fixed wrong computation of local speed upon restart of a ProgressLogger.
- New XorShift128PlusRandom generators. These are currently the fastest
generators that have no systematic failure in BigCrush, albeit the state
space is good only for single-thread usage.
2.1.9
- Fixed iterator bug in FrontCodedStringList, which moreover now implements
java.util.RandomAccess.
- Made HyperLogLogCounterArray.counterLongwords public.
2.1.8
- Many utility methods previously in WebGraph's HyperBall class have been
moved to HyperLogLogCounterArray.
2.1.7
- Fixed (again, sorry, I think it will be the last time :) the parameters
of xorshift* generators.
- XorShiftStarRandom[Generator] has been deprecated in favour of
XorShift64StarRandom[Generator] 2.1.6. The new class has slightly
improved parameters and a seeding procedure that avoids "zeroland"
transient behaviour when seeding with small numbers (I do it all the
time...).
- SummaryStats is now thread safe.
2.1.5
- Further replaced BloomFilter methods (hopefully for the last time).
- Switched to standard vByte coding in ByteDiskQueue (the previous
coding, mistakenly called Group Varint, was slower).
2.1.4
- Replaced BloomFilter.contains(Hasher) with
BloomFilter.contains(HashCode). The first method is now deprecated, as
it calls Hasher.hash(), which is not idempotent.
- Fixed bug in the value returned by LongArrayBitVector.set(int,boolean).
2.1.3
- Replaced BloomFilter.add(Hasher) with BloomFilter.add(HashCode). The
first method is now deprecated, as it calls Hasher.hash(), which is
not idempotent.
2.1.2
- New class for Kahan summations.
2.1.1
- New method InputBitStream.position().
2.1
- Decided the final values, finally, for all constants in XorShiftStar
pseudorandom number generators, included the new XorShift1024Star
generator. These classes are no longer experimental.
- Added isEmpty() method to MutableString.
- Added slower UTF-32 transformation strategy that handles surrogate pairs.
- New PermutedFrontCodedStringList has the option of passing a text file
for the permutation, and instructions on how to build the permutation
using just UN*X commands (suggested by Peter Mika).
- Added file: functionality to ObjectParser.
- BloomFilter now supports custom hashers.
- Util.format() was localized, leading to inconsistent formatting. Thanks to
Nicola Tonellotto for reporting this bug.
- New ByteDiskQueue for handling large queues.
- We no longer accept repeated characters in the string specifying the
additional word constituents of a FastBufferedReader.
- New ProgressLogger.start() methods that make it possible to start
ahead of current time.
- Fixed very old set of pernicious bugs in reading/writing coded longs in
bit streams due to missing L's in constant literals.
- IntHyperLogLogCounterArray has been deprecated in favour of HyperLogLogCounterArray,
which supports long indices and values.
2.0.15
- OfflineIterable instances can now be reused.
- DelimitedWordReader now implements copy() correctly.
2.0.14
- ProgressLogger can now fix the time units used for speed and time per
item.
- The one-argument constructor of ByteBufferLongBigList was not computing
the parameters passed to the chained constructor correctly. Thanks to
Tim Potter for reporting and fixing this bug.
2.0.13
- New LongBigArraySignedStringMap for signing very large maps
and keep the signatures on disk, memory mapped.
- More integer overflow bugs fixes (thanks to Tim Potter).
2.0.12
- The fix to it.unimi.dsi.big.util.ShiftAddXorSignedStringMap was
not complete.
2.0.11
- Removed VOID_FUNNEL from BloomFilter (it wasn't really necessary).
- We did not *really* switch to SLF4J for logging. We now do. In the
process, we lost serialization compatibility for some classes, and some
loggers that were useful only in the main method are no logger cached in
static fields.
- it.unimi.dsi.big.util.ShiftAddXorSignedStringMap was not implementing
Size64 and not using size64(). Now it does. Thanks to Tim Potter for
reporting this problem.
- it.unimi.dsi.big.util.LiterallySignedStringMap was not implementing
Size64. Now it does.
2.0.10
- LongArrayBitVector now exploits the fact that shift operations
use just the lower necessary bits of their right operand, avoiding
a few masking operations.
- BloomFilter has been significantly enhanced and it is not longer
binary compatible with its previous version.
- We switched to SLF4J for logging.
- ProgressLogger can now optionally display the local speed, that is, the
speed shown by the process since the last log. It now also based on
SLF4J and Log4J constructors have been deprecated.
- All Log4J-related methods in Util have been deprecated. If
ConsoleAppender cannot be found (e.g., because you are using a bridging
package that mimics Log4J) a warning is issued.
- Util used to share a static number format without synchronization.
Now old format methods are synchronized, and there are new equivalent
method to which you can supply your own NumberFormat.
2.0.9
- Fixed again (hopefully in the right way this time) the Ivy dependency
file.
2.0.8
- Several methods in Fast now use equivalent methods in Integer/Long or
have just been deprecated in favour of their copies, as recent JVMs
intrinsify such methods.
- The Ivy dependency file has been Maven-normalized (now we have a
"compile" scope instead of a "build" scope). Thanks to Kazó Csaba for
reporting this issue.
2.0.7
- New InputBitStream methods for multiple coded writing (e.g.,
writeGammas()).
- New static methods in it.unimi.dsi.big.util.StringMaps offer
big views of standard string and prefix maps.
2.0.6
- New NullLogger for throwing away all logging.
2.0.4
- Fixed jar building issues.
2.0.3
- new Jackknife class to compute arbitrary-precision jackknife estimates.
- Fixed bug in SummaryStats.max().
- Now property files use "=" instead of " = " as separator, thanks to the
new setter in PropertiesConfigurationLayout, and are thus sourceable.
- Now XorShiftStarRandom uses the right number of bits to generate doubles
and floats: the previous version could occasionally generate 1.
- New ByteBufferLongBigList class (along the lines of
ByteBufferInputStream) that exposes arbitrarily large binary files of
longs (with settable endianness) as a big list.
- We now use Ivy to handle jar dependencies.
2.0.2
- Now ByteBufferInputStream initialises copies lazily, which should help
when mapping large files with concurrent access (e.g., WebGraph's).
- Now XorShiftStarRandom uses 63 bits to generate doubles, and provides
truly uniform nextInt(int) and nextLong(long) methods.
- Fixed computation of singular names in ProgressLogger.
- Now IntHyperLogLogCounterArray has a count() method that returns the
estimate of a single counter.
2.0.1
- Now there are permutation methods for big arrays.
- New SummaryStats class for basic statistics bookkeeping (I know,
there are billions of those, but we wanted something really
simple, fast, and supporting >2^31 values).
2.0
- WARNING: This release has minor binary incompatibilities with previous
releases, mainly due to the move from the interface
it.unimi.dsi.util.LongBigList to the now standard
it.unimi.dsi.fasutil.longs.LongBigList. It is part of a parallel release
of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that were
all modified to fit the new interface, and that prepare the way for our
"big" versions, that is, supporting >2^31 entries in arrays (simulated),
elements in lists, terms, documents, nodes, etc. Please read our (short)
"Moving Java to Big Data" document (JavaBig.pdf) for details.
- We now require Java 6.
- it.unimi.dsi.util.LongBigList is dead. Long live to
it.unimi.dsi.fastutil.longs.LongBigList. We're sorry for the
nuisance--adapting the code should be very easy (and we warned you
anyway :).
- New it.unimi.dsi.big namespace for classes that need redesign to handle
collections with more than Integer.MAX_VALUE elements. Presently we
provide an adaptation of all classes built around StringMap. The classes
have the same name of the non-big counterparts, so watch out.
- Now you can save directly a synchronised ImmutableExternalPrefixMap.
- Instances of an ImmutableExternalPrefixMap contain now a small cache
that will store the most recent answers.
- Transformation strategies now have a length method that computes the
length of a representation (possibly without actually computing the
representation).
- New method BitVector.equals(v,from,to) that compares segments of
two bit vectors.
- A ProgressLogger will now print the time per item (besides the
number of items per time unit). Some patterns compute the
correct singular from the plural item name.
- The automatic Log4J configuration system will now log on stderr
instead of stdout.
- Fixed very old bug of Interval: the implemented sorted set was
returning iterators with an off-by-one starting point.
- Fixed very old bug in OutputBitStream.position() for wrapped output
streams. Thanks to Soumen Chakrabarti for reporting this bug.
1.0.12
- The DSI utilities are now distributed under the GNU Lesser General Public
License 3.
- We are no longer dependent on COLT, thanks to the new sorting stuff in
fastutil (which, in fact, is partially borrowed from COLT) and
java.security.SecureRandom.
- New IntHyperLogLogCounterArray that computes estimates of unique elements in
streams using HyperLogLog counters.
- LongArrayBitVectors now check that its length does not exceed 2^37.
- Now ProgressLogger makes it possible to update with a given value, or
to update and force display. Moreover, times per time interval are
scaled so slow processes gets items per minute, hour or day.
- New Util methods to manipulate and generate permutations.
- New Util.randomSeed()/Util.randomSeedBytes() methods that generates a
reasonable seed using System.nanoTime() and MurmurHash3.
- We now check that the buffer of a FastBufferedReader is of nonzero length.
- Fast flip() methods in LongArrayBitVector.
1.0.11
- BitVectors.readFast()/BitVectors.writeFast() now use more liberal interfaces.
- Equality in AbstractBitVector is computed quickly word by word, and not
scanning bit by bit (ouch!).
- The testForPosition boolean flag in InputBitStream and OutputBitStream
constructors now just inhibits just the very slow reflective test for
the existence of a getChannel() method. Conformance to the
RepositionableStream interface is always checked (requested by Bryan
Thompson).
- {Input,Output}BitStream.close() was throwing a null pointer exception in
certain cases (thanks to Bryan Thompson for finding and fixing this bug).
- CanonicalFast64CodeWordDecoder now works with 0 or 1 symbols (thanks to
Bryan Thompson for letting me notice this problem).
- IntBloomFilter was initialising incorrectly the field m (thanks to David
Greenspan for finding and fixing this bug).
- NullOutputStream is now a RepositionableSteam. In this way, the creation
of an OutputBitStream wrapping a NullOutputStream is much quicker.
- Fixed integer overflow bugs in LongArrayBitVector.wrap() and elsewhere.
1.0.10
- Fixed return type of writeLongZeta to int.
- New DelimitedWordReader class for tokenising streams by
delimiters (rather than by word constituents).
- Fixed a small bug in ObjectParser: string arguments were
trimmed even if delimited by quotes.
- ObjectParser will now accept null contexts.
- FastBufferedReader/DelimitedWordReader have toString()/toSpec()
methods.
- New MutableString.skipSelfDelimUTF8() method.
1.0.9
- New ByteBufferInputStream.map() method that will map any file into a
memory-mapped array of ByteBuffers and expose them as a measurable,
repositionable InputStream. Very useful to work around the 2GiB limit
of FileChannel.map().
- HuTuckerCodec can now get frequencies as a vector of longs.
HuTuckerTransformationStrategy uses the new constructor.
- Code in replace() and clear() methods was not using the fact
that unused parts of a LongArrayBitVector are maintained
clear. This caused a significant slowdown when replacing
or clearing vectors with a large backing array.
- New OfflineIterable class that makes it possible to dump and
reread quickly temporary data.
1.0.8
- InputBitStream and OutputBitStream have new constructors that make it
possible to skip the reflective test that are necessary to support
position(). This feature was requested by Bryan Thompson, who had to
create often a large number of bit streams. Additionally, there is a
constructor accepting a file input/output stream that invokes
getChannel() without using reflection.
- New Util.getDebugLogger() method for getting more detailed
debugging programmatically (e.g., when testing). More generally,
now it is possible to configure the log level when autoconfiguring
Log4J.
- BitVector.hashCode() has been finally nailed down and documented. The
chosen function seems to be reasonable and quick. Sorry if some
serialised classes out there depended on it...
- Fixed old bug in ImmutableBinaryTrie: a root with a non-empty
compacted path would have returned an incorrect approximated
interval on prefixes of that path. Thanks to Peni Nissani for
finding this bug.
- Slight impromevent to Fast.select(), which was using a few redundant
instruction when doing broadword comparisons.
1.0.7
- Fixed obnoxious bug in LongArrayBitVector.replace(BitVector). In
some cases bit vectors with garbage after length could have been
generated.
- Fixed bug in LongArrayBitVector.longestCommonPrefixLength(). When
comparing a string to one of its prefix the returned length was
sometimes incorrect.
- New methods BitVector.firstZero/BitVector.lastZero() with
fast implementations.
- New methods in BitVectors that dump on disk and expose
as an iterable object the bit vectors returned by an iteartor.
1.0.6
- New transformation strategy that remaps bit vectors so that they
are prefix free.
- Fixed a horrendous bug: AbstractBitVector was implementing equals(),
but not hashCode(). This fact was causing a number of problems with
collections. Since I had to fix it anyway, I built in a better
hash function, too.
1.0.5
- WARNING: ObjectParser now throws more accurate exception. As
an unpleasing side effect, it throws more checked exceptions
than it used to, so your code might not compile.
- New (obviously missing) interval comparators in Intervals.
- Fixed absolutely stupid bug in CanonicalFast64CodeWordDecoder:
the size of the lastCodeWordPlusOne array would have been
the number of codewords, and not the number of codeword
lengths. The resulting array was an order of magnitude larger
than necessary, which wasn't properly a bug, but very annoying
nonetheless (in particular when you generate thousands of
decoders, as we do for our large time-aware graphs).
- New optional context object in ObjectParser.
1.0.4
- FastBufferedReader has now a more flexible approach to word
segmentation.
1.0.3
- ShiftAddXorSignedStringMap moved here from Sux4J. It can
sign any given function (usually one from Sux4J).
- Fixed bug in LongArrayBitVector.length(long) (the
underlying long array was not cleared properly).
- Optimised implementations for LongArrayBitVector.{fill,flip}().
- Fixed very stupid but pernicious bug in ImmutableExternalPrefixMap,
which was actually a bug in AbstractPrefixMap: the default
return value of the underlying function should have been
set to -1.
1.0.2
- Fixed very old bug in MutableString.print(PrintStream).
- New methods in BitVector. In particular, fast().
- Fixed bug in LongArrayBitVector.replace(LongArrayBitVector).
- Fixed bug in LongArrayBitVector.copy(LongArrayBitVector).
- New ISO-8859-1 transformation strategy.
- Double (CharSequence/MutableString) implementation for
UTF16/ISO-8859-1 transformation strategies.
- Several performance enhancements to bit vector classes.
- A series of wrappers in TransformationStrategies apply
transforms to all the elements of collection-like objects.
1.0.1
- New method BitVector.append(BitVector).
- PrefixCoderTransformationStrategy and UTF16TransformationStrategy
now are thread-safe, as they return a new instance at each call.
They now both support prefix-free and non-prefix-free encodings.
- UTF16TransformationStrategy has been now replaced by two singleton
instances in TransformationStrategies.
- Significantly optimised LongArrayBitVector.getLong().
1.0
- First release.