dsiutils.2.6.9.source-code.CHANGES Maven / Gradle / Ivy

Go to download
Show more of this group Show more artifacts with this name
Show all versions of dsiutils Show documentation
The DSI utilities are a mish mash of classes accumulated during the last ten years in projects developed at the DSI (Dipartimento di Scienze dell'Informazione, i.e., Information Sciences Department), now DI (Dipartimento di Informatica, i.e., Informatics Department), of the Universita` degli Studi di Milano.
There is a newer version: 2.7.3
Show newest version
2.6.9

- Added automatic module name.

2.6.8

- Fixed missing data files from tests (thanks to Pierre Gruet for
  reporting this bug).

2.6.7

- New methods for composing permutations.

- Big versions of front-coded string lists.

2.6.6

- Fixed SLF4J dependencies (again).

2.6.5

- Fixed erroneous casting in LongBigArrayBitVector. Thanks to Louie
  Helm for reporting this bug.

2.6.4

- New LongBigArrayBitVector class for bit vectors longer than 2^37.

2.6.3

- Fixed major bug in splitting methods for pseudrandom number
  generators: they were not avancing the generator state, which
  means that multiple calls to split() would have returned
  identical generators.

2.6.2

- Fixed build.xml: we now use the classifier in naming artifacts.

2.6.0

- New, slightly faster PRNGs using the ++ scrambler.

2.5.5

- Faster, no-test versions of Fast.int2nat() and Fast.nat2int().

- ByteBufferLongBigList instances are now mutable.

2.5.4

- Fixed old-standing bug in LongArrayBitVector.{next|previous}{One|Zero}().

2.5.3

- Removed boundary checks from LongArrayBitVector.

2.5.2

- Fixed dependence from SLF4J.

2.5.1

- Improved jump()/longJump() methods for generating non-overlapping
  sequences. Thanks to Danilo Fabiani and Tim Mattox for reporting
  problems and suggesting solutions.

2.5.0

- New PRNGs.

- ObjectParser now has default factory methods getInstance,
  newInstance and valueOf.

2.4.2

- Fixed bug in nextInt(1) call for xoroshiro128+ and xorshift1024*φ
  implementations.

2.4.1

- xorshift1024*φ generators

- More Java 8-ish code.

- Eliminated spaces inside ( and [.

- Removed multiplication-free nextDouble()/nextFloat() from all PRNG
  implementations. Now they all generate floating-point numbers using 53
  and 24 significant bits, respectively. The difference is only
  in the least significant bit of half the values generated (previously
  it would have always been zero). A new methods with a Fast suffix
  have been added to obtain the old behavior.
 
- In xoroshiro128+ and xorshift1024*φ implementations, nextInt()
  implementations now return the upper half (instead of the lower half)
  and the shortcut for for powers of two in nextInt() and nextLong() now
  returns the high bits instead of the low bit.

2.4.0

- Java 8-only, fastutil 8.

2.3.6

- Added ReorderingBlockingQueue.

2.3.5

- Moved logback-classic dependency from compile to runtime.

- Fixed licence in Kahan-summation classes.

- Fast constructor for LongArrayBitVector instances of given length.

2.3.4

- Now all RandomGenerator implementations implement Serializable,
  so all generators can be serialized and restarted in the same 
  state.

2.3.3

- xoroshiro128+ implementations and new JMH-based speed measurements.

- SplitMix64 now provides a truly uniform nextInt(int): the generated
  sequence, however, will not be the same.

- SplitMix64 and XorShift1024Star will now return different results for
  nextLong(long)/nextInt(int) when the argument is a power of 2 (but more
  quickly).

2.3.2

- New JSAP StringParser for enums.

2.3.1

- Fixed bug in PrefixFreeTransformationStrategy.

2.3.0

- New transformation strategies for byte arrays.

- Several new, high-performance raw transformation strategies that do
  not reverse bit order.

2.2.8

- New class FileLinesByteArrayCollection.

2.2.7

- New high-performance raw transformation strategy for byte arrays.

2.2.6

- Fixed jump() function of xorshift1024* generators.

2.2.5

- All pseudorandom number generators now use a smart, faster conversion to
  float/double, SplitMix64 is used everywhere to generate a large state
  from a seed, and the shifts of XorShift128Plus have been slightly
  modified to obtain a better polynomial weight. The generators should be
  pretty stable now (hopefully).

2.2.4

- New InputBitStream.copyto()/OutputBitStream.copyFrom() methods.

2.2.3

- Improved speed for Fast.select() thanks to an idea of Giuseppe 
  Ottaviano.

- New SplitMix64Random PRNG that instantiates Java 8's SplittableRandom
  and provides the fastest generator passing BigCrush using only 64 bits
  of state.

- Now all nextBoolean() methods in PRNGs check for nextLong)() < 0, which
  is faster. 1024-bit generators with a large state space are now
  initialized using a SplitMix64RandomGenerator, which will cause, given a
  seed, a different sequence to be generated.

- XorShift64Random[Generator] has been deprecated, as
  SplitMix64Random[Generator] is better with every respect.

- MutableString's method that read UTF-8 strings are now deprecated,
  as they don't handle properly surrogate pairs. The self-delimiting
  methods have not been deprecated, but a warning has been added to
  the documentation.

2.2.2

- Fixed Maven dependencies.

2.2.1

- Some jackknife BigDecimal operations were not propagating the math
  context.

- Slightly improved speed of InputBitStream.readUnary() methods (thanks to
  Giuseppe Ottaviano for the tip).

2.2.0

- Removed all deprecated dependencies from Log4J.

- Removed deprecated methods in Fast.

- Removed OfflineIterable.length() method.

- Removed Utf16TransformationStrategy, IntHyperLogLogCounterArray, XorShiftStarRandom, XorShiftStarRandomGenerator.

2.1.10

- Fixed wrong computation of local speed upon restart of a ProgressLogger.

- New XorShift128PlusRandom generators. These are currently the fastest
  generators that have no systematic failure in BigCrush, albeit the state
  space is good only for single-thread usage.

2.1.9

- Fixed iterator bug in FrontCodedStringList, which moreover now implements
  java.util.RandomAccess.

- Made HyperLogLogCounterArray.counterLongwords public.

2.1.8

- Many utility methods previously in WebGraph's HyperBall class have been
  moved to HyperLogLogCounterArray.

2.1.7

- Fixed (again, sorry, I think it will be the last time :) the parameters
  of xorshift* generators.

- XorShiftStarRandom[Generator] has been deprecated in favour of
  XorShift64StarRandom[Generator] 2.1.6. The new class has slightly
  improved parameters and a seeding procedure that avoids "zeroland"
  transient behaviour when seeding with small numbers (I do it all the
  time...).

- SummaryStats is now thread safe.

2.1.5

- Further replaced BloomFilter methods (hopefully for the last time).

- Switched to standard vByte coding in ByteDiskQueue (the previous
  coding, mistakenly called Group Varint, was slower).

2.1.4

- Replaced BloomFilter.contains(Hasher) with
  BloomFilter.contains(HashCode). The first method is now deprecated, as
  it calls Hasher.hash(), which is not idempotent.

- Fixed bug in the value returned by LongArrayBitVector.set(int,boolean).

2.1.3

- Replaced BloomFilter.add(Hasher) with BloomFilter.add(HashCode). The 
  first method is now deprecated, as it calls Hasher.hash(), which is
  not idempotent.

2.1.2

- New class for Kahan summations.

2.1.1

- New method InputBitStream.position().

2.1

- Decided the final values, finally, for all constants in XorShiftStar
  pseudorandom number generators, included the new XorShift1024Star
  generator. These classes are no longer experimental.

- Added isEmpty() method to MutableString.

- Added slower UTF-32 transformation strategy that handles surrogate pairs.

- New PermutedFrontCodedStringList has the option of passing a text file
  for the permutation, and instructions on how to build the permutation
  using just UN*X commands (suggested by Peter Mika).

- Added file: functionality to ObjectParser.

- BloomFilter now supports custom hashers.

- Util.format() was localized, leading to inconsistent formatting. Thanks to
  Nicola Tonellotto for reporting this bug.

- New ByteDiskQueue for handling large queues.

- We no longer accept repeated characters in the string specifying the
  additional word constituents of a FastBufferedReader.

- New ProgressLogger.start() methods that make it possible to start
  ahead of current time.

- Fixed very old set of pernicious bugs in reading/writing coded longs in
  bit streams due to missing L's in constant literals.

- IntHyperLogLogCounterArray has been deprecated in favour of HyperLogLogCounterArray,
  which supports long indices and values.

2.0.15

- OfflineIterable instances can now be reused.

- DelimitedWordReader now implements copy() correctly.

2.0.14

- ProgressLogger can now fix the time units used for speed and time per
  item.

- The one-argument constructor of ByteBufferLongBigList was not computing
  the parameters passed to the chained constructor correctly. Thanks to
  Tim Potter for reporting and fixing this bug.

2.0.13

- New LongBigArraySignedStringMap for signing very large maps
  and keep the signatures on disk, memory mapped.

- More integer overflow bugs fixes (thanks to Tim Potter).

2.0.12

- The fix to it.unimi.dsi.big.util.ShiftAddXorSignedStringMap was
  not complete.

2.0.11

- Removed VOID_FUNNEL from BloomFilter (it wasn't really necessary).

- We did not *really* switch to SLF4J for logging. We now do. In the
  process, we lost serialization compatibility for some classes, and some
  loggers that were useful only in the main method are no logger cached in
  static fields.

- it.unimi.dsi.big.util.ShiftAddXorSignedStringMap was not implementing
  Size64 and not using size64(). Now it does. Thanks to Tim Potter for
  reporting this problem.

- it.unimi.dsi.big.util.LiterallySignedStringMap was not implementing
  Size64. Now it does.

2.0.10

- LongArrayBitVector now exploits the fact that shift operations
  use just the lower necessary bits of their right operand, avoiding
  a few masking operations.

- BloomFilter has been significantly enhanced and it is not longer
  binary compatible with its previous version.

- We switched to SLF4J for logging.

- ProgressLogger can now optionally display the local speed, that is, the
  speed shown by the process since the last log. It now also based on
  SLF4J and Log4J constructors have been deprecated.

- All Log4J-related methods in Util have been deprecated. If
  ConsoleAppender cannot be found (e.g., because you are using a bridging
  package that mimics Log4J) a warning is issued.

- Util used to share a static number format without synchronization.
  Now old format methods are synchronized, and there are new equivalent
  method to which you can supply your own NumberFormat.

2.0.9

- Fixed again (hopefully in the right way this time) the Ivy dependency
  file.

2.0.8

- Several methods in Fast now use equivalent methods in Integer/Long or
  have just been deprecated in favour of their copies, as recent JVMs
  intrinsify such methods.

- The Ivy dependency file has been Maven-normalized (now we have a
  "compile" scope instead of a "build" scope). Thanks to Kazó Csaba for
  reporting this issue.

2.0.7

- New InputBitStream methods for multiple coded writing (e.g.,
  writeGammas()).

- New static methods in it.unimi.dsi.big.util.StringMaps offer
  big views of standard string and prefix maps.

2.0.6

- New NullLogger for throwing away all logging.

2.0.4

- Fixed jar building issues.

2.0.3

- new Jackknife class to compute arbitrary-precision jackknife estimates.

- Fixed bug in SummaryStats.max().

- Now property files use "=" instead of " = " as separator, thanks to the
  new setter in PropertiesConfigurationLayout, and are thus sourceable.

- Now XorShiftStarRandom uses the right number of bits to generate doubles
  and floats: the previous version could occasionally generate 1.

- New ByteBufferLongBigList class (along the lines of
  ByteBufferInputStream) that exposes arbitrarily large binary files of
  longs (with settable endianness) as a big list.

- We now use Ivy to handle jar dependencies.

2.0.2

- Now ByteBufferInputStream initialises copies lazily, which should help
  when mapping large files with concurrent access (e.g., WebGraph's).

- Now XorShiftStarRandom uses 63 bits to generate doubles, and provides
  truly uniform nextInt(int) and nextLong(long) methods.

- Fixed computation of singular names in ProgressLogger.

- Now IntHyperLogLogCounterArray has a count() method that returns the
  estimate of a single counter.

2.0.1

- Now there are permutation methods for big arrays.

- New SummaryStats class for basic statistics bookkeeping (I know,
  there are billions of those, but we wanted something really
  simple, fast, and supporting >2^31 values).

2.0

- WARNING: This release has minor binary incompatibilities with previous
  releases, mainly due to the move from the interface
  it.unimi.dsi.util.LongBigList to the now standard
  it.unimi.dsi.fasutil.longs.LongBigList. It is part of a parallel release
  of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that were
  all modified to fit the new interface, and that prepare the way for our
  "big" versions, that is, supporting >2^31 entries in arrays (simulated),
  elements in lists, terms, documents, nodes, etc. Please read our (short)
  "Moving Java to Big Data" document (JavaBig.pdf) for details.

- We now require Java 6.

- it.unimi.dsi.util.LongBigList is dead. Long live to
  it.unimi.dsi.fastutil.longs.LongBigList. We're sorry for the
  nuisance--adapting the code should be very easy (and we warned you
  anyway :).

- New it.unimi.dsi.big namespace for classes that need redesign to handle
  collections with more than Integer.MAX_VALUE elements. Presently we
  provide an adaptation of all classes built around StringMap. The classes
  have the same name of the non-big counterparts, so watch out.

- Now you can save directly a synchronised ImmutableExternalPrefixMap.

- Instances of an ImmutableExternalPrefixMap contain now a small cache
  that will store the most recent answers.

- Transformation strategies now have a length method that computes the
  length of a representation (possibly without actually computing the
  representation).

- New method BitVector.equals(v,from,to) that compares segments of
  two bit vectors.

- A ProgressLogger will now print the time per item (besides the
  number of items per time unit). Some patterns compute the
  correct singular from the plural item name.

- The automatic Log4J configuration system will now log on stderr
  instead of stdout.

- Fixed very old bug of Interval: the implemented sorted set was
  returning iterators with an off-by-one starting point.

- Fixed very old bug in OutputBitStream.position() for wrapped output
  streams. Thanks to Soumen Chakrabarti for reporting this bug.

1.0.12

- The DSI utilities are now distributed under the GNU Lesser General Public
  License 3.

- We are no longer dependent on COLT, thanks to the new sorting stuff in
  fastutil (which, in fact, is partially borrowed from COLT) and
  java.security.SecureRandom.

- New IntHyperLogLogCounterArray that computes estimates of unique elements in
  streams using HyperLogLog counters.

- LongArrayBitVectors now check that its length does not exceed 2^37.

- Now ProgressLogger makes it possible to update with a given value, or
  to update and force display. Moreover, times per time interval are
  scaled so slow processes gets items per minute, hour or day.

- New Util methods to manipulate and generate permutations.

- New Util.randomSeed()/Util.randomSeedBytes() methods that generates a
  reasonable seed using System.nanoTime() and MurmurHash3.

- We now check that the buffer of a FastBufferedReader is of nonzero length.

- Fast flip() methods in LongArrayBitVector.

1.0.11

- BitVectors.readFast()/BitVectors.writeFast() now use more liberal interfaces.

- Equality in AbstractBitVector is computed quickly word by word, and not
  scanning bit by bit (ouch!).

- The testForPosition boolean flag in InputBitStream and OutputBitStream
  constructors now just inhibits just the very slow reflective test for
  the existence of a getChannel() method. Conformance to the
  RepositionableStream interface is always checked (requested by Bryan
  Thompson).

- {Input,Output}BitStream.close() was throwing a null pointer exception in
  certain cases (thanks to Bryan Thompson for finding and fixing this bug).

- CanonicalFast64CodeWordDecoder now works with 0 or 1 symbols (thanks to
  Bryan Thompson for letting me notice this problem).

- IntBloomFilter was initialising incorrectly the field m (thanks to David
  Greenspan for finding and fixing this bug).

- NullOutputStream is now a RepositionableSteam. In this way, the creation
  of an OutputBitStream wrapping a NullOutputStream is much quicker.

- Fixed integer overflow bugs in LongArrayBitVector.wrap() and elsewhere.

1.0.10

- Fixed return type of writeLongZeta to int.

- New DelimitedWordReader class for tokenising streams by
  delimiters (rather than by word constituents).

- Fixed a small bug in ObjectParser: string arguments were
  trimmed even if delimited by quotes.

- ObjectParser will now accept null contexts.

- FastBufferedReader/DelimitedWordReader have toString()/toSpec()
  methods.

- New MutableString.skipSelfDelimUTF8() method.

1.0.9

- New ByteBufferInputStream.map() method that will map any file into a
  memory-mapped array of ByteBuffers and expose them as a measurable,
  repositionable InputStream. Very useful to work around the 2GiB limit
  of FileChannel.map().

- HuTuckerCodec can now get frequencies as a vector of longs.
  HuTuckerTransformationStrategy uses the new constructor.

- Code in replace() and clear() methods was not using the fact
  that unused parts of a LongArrayBitVector are maintained
  clear. This caused a significant slowdown when replacing
  or clearing vectors with a large backing array.

- New OfflineIterable class that makes it possible to dump and
  reread quickly temporary data.

1.0.8

- InputBitStream and OutputBitStream have new constructors that make it
  possible to skip the reflective test that are necessary to support
  position(). This feature was requested by Bryan Thompson, who had to
  create often a large number of bit streams. Additionally, there is a
  constructor accepting a file input/output stream that invokes
  getChannel() without using reflection.

- New Util.getDebugLogger() method for getting more detailed
  debugging programmatically (e.g., when testing). More generally,
  now it is possible to configure the log level when autoconfiguring
  Log4J.

- BitVector.hashCode() has been finally nailed down and documented. The
  chosen function seems to be reasonable and quick. Sorry if some
  serialised classes out there depended on it...

- Fixed old bug in ImmutableBinaryTrie: a root with a non-empty
  compacted path would have returned an incorrect approximated
  interval on prefixes of that path. Thanks to Peni Nissani for
  finding this bug.

- Slight impromevent to Fast.select(), which was using a few redundant
  instruction when doing broadword comparisons.

1.0.7

- Fixed obnoxious bug in LongArrayBitVector.replace(BitVector). In
  some cases bit vectors with garbage after length could have been
  generated.

- Fixed bug in LongArrayBitVector.longestCommonPrefixLength(). When
  comparing a string to one of its prefix the returned length was
  sometimes incorrect.

- New methods BitVector.firstZero/BitVector.lastZero() with
  fast implementations.

- New methods in BitVectors that dump on disk and expose
  as an iterable object the bit vectors returned by an iteartor.

1.0.6

- New transformation strategy that remaps bit vectors so that they
  are prefix free.

- Fixed a horrendous bug: AbstractBitVector was implementing equals(),
  but not hashCode(). This fact was causing a number of problems with
  collections. Since I had to fix it anyway, I built in a better
  hash function, too.

1.0.5

- WARNING: ObjectParser now throws more accurate exception. As
  an unpleasing side effect, it throws more checked exceptions
  than it used to, so your code might not compile.

- New (obviously missing) interval comparators in Intervals.

- Fixed absolutely stupid bug in CanonicalFast64CodeWordDecoder:
  the size of the lastCodeWordPlusOne array would have been
  the number of codewords, and not the number of codeword
  lengths. The resulting array was an order of magnitude larger
  than necessary, which wasn't properly a bug, but very annoying
  nonetheless (in particular when you generate thousands of
  decoders, as we do for our large time-aware graphs).

- New optional context object in ObjectParser.

1.0.4

- FastBufferedReader has now a more flexible approach to word
  segmentation.

1.0.3

- ShiftAddXorSignedStringMap moved here from Sux4J. It can
  sign any given function (usually one from Sux4J).

- Fixed bug in LongArrayBitVector.length(long) (the
  underlying long array was not cleared properly).

- Optimised implementations for LongArrayBitVector.{fill,flip}().

- Fixed very stupid but pernicious bug in ImmutableExternalPrefixMap,
  which was actually a bug in AbstractPrefixMap: the default
  return value of the underlying function should have been
  set to -1.

1.0.2

- Fixed very old bug in MutableString.print(PrintStream).

- New methods in BitVector. In particular, fast().

- Fixed bug in LongArrayBitVector.replace(LongArrayBitVector).

- Fixed bug in LongArrayBitVector.copy(LongArrayBitVector).

- New ISO-8859-1 transformation strategy.

- Double (CharSequence/MutableString) implementation for
  UTF16/ISO-8859-1 transformation strategies.
  
- Several performance enhancements to bit vector classes.

- A series of wrappers in TransformationStrategies apply
  transforms to all the elements of collection-like objects.

1.0.1

- New method BitVector.append(BitVector).

- PrefixCoderTransformationStrategy and UTF16TransformationStrategy
  now are thread-safe, as they return a new instance at each call.
  They now both support prefix-free and non-prefix-free encodings.

- UTF16TransformationStrategy has been now replaced by two singleton
  instances in TransformationStrategies.

- Significantly optimised LongArrayBitVector.getLong().

1.0

- First release.