MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java.
5.1.1 -> 5.2

- End-of-the-line release. With this release, the big version becomes
  the official version of MG4J. For some time, bugs found in this version
  will still be fixed.

5.1 -> 5.1.1

- Fixed very subtle bug in documents returned from HtmlDocumentFactory.
  Unparsed documents coming from streaming sources would have accessed the
  data source during finalization due to toString() returning the document
  title. This was causing random errors when reading, say, from WArc streams,
  if a document was not closed properly. Added blurb to AbstractDocument that
  warns about this issue.

- Fixed a bug in dynamic class naming ("Payload " was used instead of
  "Payload"). Thanks to Dmitri Portnov for fixing this bug.

- Switched to SLF4J for logging.

5.0 -> 5.1

- A small revolution is taking place in MG4J: now most classes handling
  indices have an IOFactory parameter that makes it possible to open files
  in alternative filesystems, such as HDFS. Beware--the feature is very
  pervasive and there might be missing spots. Thanks to Tim Potter for
  useful discussions and for testing this new feature.

- InputStreamDocumentSequence was not behaving correctly in case of
  keyboard input (two EOFs were necessary).

- The Maven artifacts did not contain the Velocity templates. Thanks
  to Andrew MacKinlay for reporting this issue.
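
As a rough illustration of the IOFactory idea in the 5.1 notes, the sketch
below routes every file access through a factory object, so that code
handling an index never opens files by itself. ToyIOFactory, InMemoryFactory,
writeTerms() and readTerms() are hypothetical names invented for this
example; they are not the MG4J IOFactory API, whose actual methods should be
checked in the Javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class IOFactorySketch {
    /** Hypothetical abstraction (not the MG4J interface): all file access goes through it. */
    interface ToyIOFactory {
        OutputStream getOutputStream(String name) throws IOException;
        InputStream getInputStream(String name) throws IOException;
    }

    /** A toy "alternative filesystem" that keeps files in memory. */
    static final class InMemoryFactory implements ToyIOFactory {
        private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

        public OutputStream getOutputStream(String name) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            files.put(name, out);
            return out;
        }

        public InputStream getInputStream(String name) throws IOException {
            ByteArrayOutputStream out = files.get(name);
            if (out == null) throw new IOException("No such file: " + name);
            return new ByteArrayInputStream(out.toByteArray());
        }
    }

    /** Index-handling code receives the factory as a parameter instead of opening files itself. */
    static void writeTerms(ToyIOFactory factory, String basename, String... terms) throws IOException {
        try (OutputStream out = factory.getOutputStream(basename + ".terms")) {
            for (String t : terms) out.write((t + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    static String readTerms(ToyIOFactory factory, String basename) throws IOException {
        try (InputStream in = factory.getInputStream(basename + ".terms")) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Round trip through the in-memory factory; checked exceptions are wrapped for convenience. */
    static String demo() {
        try {
            ToyIOFactory factory = new InMemoryFactory();
            writeTerms(factory, "example", "alpha", "beta");
            return readTerms(factory, "example");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Swapping InMemoryFactory for an implementation backed by another filesystem
would leave writeTerms() and readTerms() untouched, which is the point of
passing the factory around.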

4.0.4 -> 5.0

- WARNING: this release has source and binary incompatibilities with
  previous releases. Watch out. 

- nextDocument() now returns DocumentIterator.END_OF_LIST instead of -1 to
  denote list exhaustion. To avoid confusion and ease the transition, the
  package prefix of MG4J is now it.unimi.di.*, following the change of
  name of our department.

- it.unimi.di.mg4j.search.DocumentIterator is now strictly lazy; in
  particular, it does not implement java.util.Iterator. Please replace
  calls to DocumentIterator.hasNext() with a check to
  DocumentIterator.nextDocument() != DocumentIterator.END_OF_LIST, or try
  whether the semantics of DocumentIterator.mayHaveNext() suits you. The
  change aligns the behaviour of the two versions of MG4J.

- The plethora of methods that accessed the positions of a term in an
  IndexIterator have been replaced by the single lazy nextPosition() call,
  which returns IndexIterator.END_OF_POSITIONS when the positions are
  exhausted. Some static methods in IndexIterators should help with the
  transition.

- MG4J is no longer based on gap-based indices. Classical interleaved indices
  are used for incremental index construction and high-performance indices
  are still supported for historical reasons, but all new indices are by
  default built using the new quasi-succinct format.

- DiskBasedIndex.getInstance() now returns an Index instead of a
  BitStreamIndex. Old code should check with a reflective call whether the
  result is a BitStreamIndex and act accordingly, as now it might be a
  QuasiSuccinctIndex, too.
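
The lazy iteration style described in the 5.0 notes can be sketched as
follows. ToyPostings is a hypothetical stand-in written for this example,
and the sentinel values are assumptions; real code should use the constants
DocumentIterator.END_OF_LIST and IndexIterator.END_OF_POSITIONS rather than
literal values.

```java
import java.util.ArrayList;
import java.util.List;

class LazyIterationSketch {
    // Assumed sentinel values, for illustration only.
    static final int END_OF_LIST = Integer.MAX_VALUE;
    static final int END_OF_POSITIONS = Integer.MAX_VALUE;

    /** A toy posting list (hypothetical class): document pointers, each with its positions. */
    static final class ToyPostings {
        private final int[] documents;
        private final int[][] positions;
        private int doc = -1; // index of the current document, -1 before the first call
        private int pos;      // index of the next position to return

        ToyPostings(int[] documents, int[][] positions) {
            this.documents = documents;
            this.positions = positions;
        }

        /** Advances to the next document, or returns END_OF_LIST on exhaustion. */
        int nextDocument() {
            if (doc + 1 >= documents.length) {
                doc = documents.length;
                return END_OF_LIST;
            }
            pos = 0;
            return documents[++doc];
        }

        /** Returns the next position in the current document, or END_OF_POSITIONS. */
        int nextPosition() {
            if (doc < 0 || doc >= documents.length) throw new IllegalStateException("no current document");
            int[] p = positions[doc];
            return pos < p.length ? p[pos++] : END_OF_POSITIONS;
        }
    }

    /** There is no hasNext(): both loops just compare the returned value with a sentinel. */
    static List<String> scan(ToyPostings it) {
        List<String> out = new ArrayList<>();
        int d;
        while ((d = it.nextDocument()) != END_OF_LIST) {
            int p;
            while ((p = it.nextPosition()) != END_OF_POSITIONS) out.add(d + ":" + p);
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPostings it = new ToyPostings(new int[] { 3, 7 }, new int[][] { { 0, 4 }, { 2 } });
        System.out.println(scan(it)); // prints [3:0, 3:4, 7:2]
    }
}
```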

4.0.1 -> 4.0.4

- Fixed SimpleParser.parse(MutableString), which was throwing a
  NullPointerException.

- Now DocumentRankScorer can load score files of any type.

4.0 -> 4.0.1

- We now force the number of documents of a virtual index to be equal
  to that specified by the resolver. Collections in which the last few
  documents were not referred would have generated virtual indices with
  fewer documents than the standard ones.

- DocumentSequenceImmutableGraph is now part of MG4J. Building graphs
  out of web documents should be quite easy.

- Fixed a small bug in the equals method of Term.

- Fixed bug in the equals and hashCode methods of Select (before, only 
  Index was taken into account, and not the actual subquery).

- Fixed several small inconsistencies in the Scorer hierarchy.

- Added the SubsetDocumentSequence class to extract a subset of documents
  from a given sequence.

- The default target for skipping structures is now 1%.

- Now ConsecutiveDocumentIterator has specialized code for non-gapped
  phrases containing just terms.

- Now Combine loads sizes when compressing positions using interpolative coding.

- We now use bare-bones heaps and array priority queues to increase speed.

- BM25Scorer and BM25FScorer have significantly faster ranking logic.
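
The kind of mistake fixed in the equals and hashCode methods of Select can
be pictured with the following sketch. ToySelect is a hypothetical class
with string fields, not the MG4J query node; the only point is that both
the index and the subquery take part in equality and hashing.

```java
import java.util.Objects;

class SelectEqualitySketch {
    /** Hypothetical stand-in for a query node that selects an index for a subquery. */
    static final class ToySelect {
        final String index;
        final String subquery;

        ToySelect(String index, String subquery) {
            this.index = Objects.requireNonNull(index);
            this.subquery = Objects.requireNonNull(subquery);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ToySelect)) return false;
            ToySelect s = (ToySelect) o;
            // Comparing only the index would make all queries on the same index look equal.
            return index.equals(s.index) && subquery.equals(s.subquery);
        }

        @Override
        public int hashCode() {
            // Must be consistent with equals(): both fields contribute.
            return Objects.hash(index, subquery);
        }
    }

    public static void main(String[] args) {
        ToySelect a = new ToySelect("title", "java");
        ToySelect b = new ToySelect("title", "search");
        System.out.println(a.equals(b)); // prints false
        System.out.println(a.equals(new ToySelect("title", "java"))); // prints true
    }
}
```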

3.0.1 -> 4.0

- WARNING: This release has minor binary incompatibilities with previous
  releases, mainly due to the move from the interface
  it.unimi.dsi.util.LongBigList to the now standard
  it.unimi.dsi.fastutil.longs.LongBigList. It is part of a parallel release
  of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that were
  all modified to fit the new interface, and that prepare the way for our
  "big" versions, that is, supporting >2^31 entries in arrays (simulated),
  elements in lists, terms, documents, nodes, etc. Please read our (short)
  "Moving Java to Big Data" document (JavaBig.pdf) for details.

- We now require Java 6.

- WARNING: document iterators will return FALSE, instead of TRUE, for
  indices for which there are no intervals. The actual intervals returned
  (if there are any) have not changed, but the placeholder role of TRUE
  has been taken by FALSE.

- WARNING: The semantics of TermCollectionVisitor.prepare(ReferenceSet)
  has changed slightly.

- PdfDocumentFactory has been removed. Please use the new Tika-based
  factory for PDF parsing.

- Backport from the big version of DocumentIterator.END_OF_LIST as a
  substitute for Integer.MAX_VALUE. Please use it in new code--it also
  makes transitions to the big version easier.

- Refined semantics for DocumentIterator, with new streamlined
  implementations based on AbstractDocumentIterator.

- A long-standing bug in skipTo() has been fixed thanks to a very detailed
  and replicable bug report by Soumen Chakrabarti. If the last posting
  in a list had an ordinal position that was an exact multiple of the
  quantum, skipping beyond the pointer contained in the posting would
  have erroneously returned the last pointer instead of Integer.MAX_VALUE.

- A few serious bugs of the alignment operators have been fixed. They are
  also significantly faster.

- A new set of classes interfaces with Apache's Tika to provide parsing
  of Office, RTF, etc. files.

- Many fixes to the remote classes (still experimental!).

- IdentityDocumentFactory was not using the FIELDNAME property.

- The toString() method of LowPass was erroneously printing "<" instead of
  "~".

- Query is now serializable and AbstractCompositeQuery exposes the component
  queries.

- The range operator for payloads was broken.
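
The skipTo() contract mentioned in the 4.0 notes (return the first pointer
not smaller than the target, or the end-of-list value when there is none)
can be sketched on a toy list. ToyPostingList is a hypothetical class: it
simulates skip entries by jumping a whole quantum at a time inside an
array, which is only an approximation of how real skip structures work.

```java
class SkipToSketch {
    // Assumed sentinel value, for illustration only.
    static final int END_OF_LIST = Integer.MAX_VALUE;

    /** A toy posting list that can jump ahead one quantum at a time (hypothetical class). */
    static final class ToyPostingList {
        private final int[] pointers; // strictly increasing document pointers
        private final int quantum;
        private int current;          // index of the first posting not yet skipped

        ToyPostingList(int[] pointers, int quantum) {
            if (quantum <= 0) throw new IllegalArgumentException("quantum must be positive");
            this.pointers = pointers;
            this.quantum = quantum;
        }

        /** Returns the first pointer >= target, or END_OF_LIST if there is none. */
        int skipTo(int target) {
            int i = current;
            // Coarse phase: jump a whole quantum while the landing posting is still <= target.
            while (pointers.length - i > quantum && pointers[i + quantum] <= target) i += quantum;
            // Fine phase: linear scan inside the quantum.
            while (i < pointers.length && pointers[i] < target) i++;
            current = i;
            // Exhaustion must be reported even when the list length is a multiple of the quantum.
            return i < pointers.length ? pointers[i] : END_OF_LIST;
        }
    }

    public static void main(String[] args) {
        // Four postings with quantum 2: the last posting closes a full quantum.
        ToyPostingList list = new ToyPostingList(new int[] { 2, 5, 9, 12 }, 2);
        System.out.println(list.skipTo(9));                 // prints 9
        System.out.println(list.skipTo(13) == END_OF_LIST); // prints true
    }
}
```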

3.0 -> 3.0.1

- MG4J is now distributed under the GNU Lesser General Public License 3.

- MG4J is no longer dependent on COLT or jal, but it requires at least
  fastutil 6.

- Memory usage during indexing is a bit less tight, due to the new linear
  probing hash maps in fastutil 6 which could use more RAM.

- SKEWED_GOLOMB is no longer supported for writing.

- When loading offsets in memory, the bit stream used to read them
  was not properly closed.

- Fixed bug that would cause an error when creating a single empty
  batch under Windows.

- Fixed bug that was preventing payload indices from working correctly
  (thanks to Polina Morozova for finding and fixing this bug).

- Combine and subclasses now will work even if the occurrences field
  of the component indices is not set (thanks to Soumen Chakrabarti
  for reporting this bug).

- Fixed bug in AlignDocumentIterator that was causing random
  IllegalStateExceptions (thanks to Roi Blanco for reporting this bug).

2.1.3 -> 3.0

- WARNING: Massive revamp of the DocumentIteratorVisitor subsystem. Now
  such visitors can return data, much like a QueryIteratorBuildervisitor.
  It also has a special visit method for MultiTermIndexIterators. You'll
  have to adapt your previous implementations.

- WARNING: QueryParser instances are required to provide a parse(MutableString)
  method and two new escape methods that can be used to turn a string into
  a text token. This feature is fundamental for automatic query generation
  (thanks to Hugo Zaragoza for pointing out this problem).

- WARNING: To make a few things easier, we now have explicit document
  iterators representing true and false. Their construction requires a
  reference index (contrary to what was happening with
  DocumentIterators.EMPTY_ITERATOR), so the getInstance() methods of most
  document iterators had to be updated, and DocumentIteratorVisitor
  instances need to implement two new visit() methods. The iterators are
  generated by the tokens #TRUE and #FALSE.

- WARNING: Indexing of virtual fields uses much less memory, but batches
  now have a different content: they represent actual positions in the
  final virtual document. Sizes of each batch represent the known size of
  a virtual document at the moment the batch was written. With this change,
  Paste no longer requires more memory than Concatenate.

- WARNING: A new RemappingDocumentIterator class makes it possible to
  mix results from different indices with positional operators. Since
  there is a new Remap query node, all DocumentVisitors will have
  to be updated. 

- WARNING: All deprecated classes have been removed.

- WARNING: The -B option of IndexBuilder is now aligned to Scan--it
  specifies the basename of a collection to be built at indexing
  time. It used to be the size of the Combine buffer.

- New classes for efficient document collection construction at
  indexing time. The architecture is now also very open--you can
  plug in your own builders.

- Completely restructured size handling for Combine and subclasses.
  Unless you use Golomb coding, you will not need to load sizes.
  This is true even of batches of virtual fields, as Paste now
  by default does not renumber positions, but rather expects them
  to be already renumbered. The old behaviour can be obtained
  via a flag.

- We moved to Jetty 6. Also, a few problems with Velocity not finding
  templates have been fixed.

- New, more intelligent memory handling that should be able to avoid
  completely out-of-memory errors. There is also a limit on the
  number of terms per batch that should help with garbage collection.

- Fixed a bug in collection creation: we used to provide the original
  factory, but this is wrong as we might not be indexing all fields. Now
  we generate a suitable factory that contains only the indexed fields.

- New important feature: high-performance indices may now have variable
  quanta depending on the list frequency and density. Indices now sport a
  .posnumbits file that records how many bits are used to store positions.
  It is used as a basic statistic to compute the correct quantum. You
  can ask for a percentage of the index to be used to skip towers, and
  the right quantum for each list will be computed for you. The process
  is quite empirical, so always look into .stats files to check that
  you are actually using no more than the percentage requested. In general,
  old indices will have to be rebuilt before being able to Combine them
  into an index with variable quanta, but for high-performance indices
  the tool ComputePosNumBitsPositions can be used to add the missing
  file.

- Memory mapping of indices now uses the new multiplexed approach
  implemented in ByteBufferInputStream. This means that we can
  map into memory essentially every index. Thanks to Valentin Tablan
  and Ian Roberts for suggesting this approach.

- Now we feature an implementation of the state-of-the-art BM25F ranking
  function.

- ZipDocumentCollection.getInstance() makes it possible to reliably
  load ZipDocumentCollection instances even if they are not
  in the current directory.

- New UTF-8 nice mathematical symbols for conjunction, disjunction, TRUE
  and FALSE.

- Fixed problem with too many connections open when using
  JdbcDocumentCollection.

- A new SUCCINCTSIZES URI key makes it possible to ask for loading sizes
  into an Elias-Fano compressed list. This will slow down access by
  two orders of magnitude, but it can be very useful when pasting large
  indices, as pasting needs to load a large amount of size data.

- EmptyIndexIterator instances are no longer Index-based singletons. This
  change was necessary to make it possible to run ranking algorithms that
  need to set the weight or id even of empty iterators. This should
  cause no problem.

- All document iterators now have a settable weight. The weight can
  be expressed in standard syntax using braces. Note that weights
  per se have no meaning--it is up to the scorers to use them.

- Now the metadata-only option of Combine and its implementations generates
  the file of frequencies. This is very useful as it makes it possible to
  compute the term frequencies for the virtual documents obtained by
  concatenating all fields--something that is necessary for the correct
  computation of BM25F.

- Fixed a bug in the grammar: queries such as "(a))" would have been 
  parsed as "(a)" because of a lack of check for EOF (thanks to
  Hugo Zaragoza for reporting this bug).

- The parser will now accept Unicode characters 0x2227 and 0x2228
  (the standard mathematical symbols for conjunction and disjunction)
  for AND and OR, respectively.

- Following some testing on TREC GOV2, the defaults for MAXPREANCHOR and
  MAXPOSTANCHOR in HtmlDocumentFactory have been reduced to 8 and 4,
  respectively.

- Fixed old bug in SemiExternalGammaList; readBits(0) was not called
  after numLongs estimation, leading to EOFExceptions.

- Document pointers can now be coded in unary.

- Fixed bad bug in PartitionLexically: for high-performance indices,
  the positions of the last term were not being written.

- HttpFileServer has a settable port.

- New Scorer.getWeights() method to get weights.

- Fixed a bug in TfIdf scorer that would have caused NaNs.

- Query accepts a newline-separated list of titles, besides
  the usual serialised object.
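
The grammar fix for queries such as "(a))" comes down to checking that the
whole input has been consumed once the top-level rule returns. The sketch
below shows the idea on a tiny hand-written recursive-descent parser; the
grammar and the class are hypothetical and unrelated to the actual
JavaCC-generated parser of MG4J.

```java
class EofCheckSketch {
    private final String s;
    private int i;

    private EofCheckSketch(String s) {
        this.s = s;
    }

    /** Toy grammar: query := TERM | '(' query ')', where TERM is a run of letters or digits. */
    private String query() {
        if (i < s.length() && s.charAt(i) == '(') {
            i++;
            String inner = query();
            if (i >= s.length() || s.charAt(i) != ')') throw new IllegalArgumentException("missing ) at " + i);
            i++;
            return "(" + inner + ")";
        }
        int start = i;
        while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
        if (start == i) throw new IllegalArgumentException("term expected at " + i);
        return s.substring(start, i);
    }

    /** Parses s; with requireEof set, trailing input is an error instead of being silently dropped. */
    static String parse(String s, boolean requireEof) {
        EofCheckSketch p = new EofCheckSketch(s);
        String q = p.query();
        if (requireEof && p.i != s.length()) throw new IllegalArgumentException("unexpected input at " + p.i);
        return q;
    }

    public static void main(String[] args) {
        System.out.println(parse("(a))", false)); // prints (a): the extra parenthesis is ignored
        try {
            parse("(a))", true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```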

2.1.2 -> 2.1.3

- URLMPHVirtualDocumentResolver required a sorted list, even if this was
  not in the class specification. Now you can choose between a sorted list
  (with reduced space occupancy) or a generic list (thanks to Nuno
  Cardoso for reporting this bug).

- Fixed problem with VelocityViewServlet (getTemplate() must not be
  invoked statically on Velocity for things to work properly; thanks to
  Valentin Villenave for reporting a problem with the Lilypond Snippet
  Repository which led me to fix this bug).

2.1.1 -> 2.1.2

- AlignDocumentIterator (syntax: ^) makes it possible to align
  document/interval iterators from different indices. Using this
  feature MG4J can easily support queries based on semantic tagging.

- Fixed another bug in Snowball stemmers: calling processTerm() with
  a null argument would have caused an exception.

- Now Scan and IndexBuilder accept parseable objects as sequences.
  The same happens for the WORDREADER property of some factories,
  making it possible to create a moderately command-line-configurable
  FastBufferedReader as WordReader.

- UNICODE_INPUT is now set in SimpleParser.jj, making it possible to
  write wild Unicode queries again.

- QueryServlet now forces UTF-8 for output.

- We now distribute the javacc-generated files for easier installation.

- More liberal Velocity template-resolution setup, now documented in
  the HttpQueryServer Javadoc.

- The --skips command-line option is gone. --no-skips disables skips
  for interleaved indices only. By default, all indices have skips
  that use about 2% of the index size.

- Fixed bad integer overflow bug when using large heights.

- New -i option for URLMPHVirtualDocumentResolver, mimicking
  the same option in Sux4J's functions.

2.1 -> 2.1.1

- Major fix: the Snowball stemmers would generate empty strings, and
  Combine would choke (generating empty indices) on empty strings.

- Removed obsolete PorterStemmerTermProcessor.

2.0.1 -> 2.1

- WARNING: Most utility classes have been moved to dsiutils. Old versions
  are still here and deprecated, but you'll have some problems when importing
  this version. Always check which version you're using!

- WARNING: TermMap has been replaced by StringMap (in dsiutils). PrefixMap
  exists, but the dsiutils signature is completely different from the old one.

- Lots of stemmers coming from Snowball. We actually made some improvements
  to the Java Snowball compiler to get this working at a reasonable speed.

- New (somewhat experimental) feature: you can get the terms that caused
  an interval to be emitted.

- Sequential scan was not working for high-performance indices if positions
  were not read. The problem was evident when combining high-performance indices
  specifying -cPOSITIONS:NONE.

- Fixed a couple of NullPointerExceptions in index construction (thanks to
  Marko Srdanovic for reporting these bugs).

- Fixed missing call to super.close() in AbstractIndexClusterIndexReader that
  was causing spurious warnings.

- Now Query has multiplex on by default.

- Fixed bug in MutableString.subSequence() (thanks to Espen Amble Kolstad
  for reporting this bug). MutableString is now in dsiutils.

- New QueryExpander interface for modifying queries between parsing and
  actual resolution. It can be used, for instance, to do term expansion.
  A simple abstract implementation (AbstractTermExpander) is provided
  for term expansion. Also, an implementation that multiplexes terms
  over indices (MultiIndexTermExpander) is provided.

- New allLines() method in LineIterator. LineIterator is now in dsiutils.

2.0 -> 2.0.1

- Can you believe that? Fast.leastSignificantBit() under very peculiar
  circumstances was returning random data, but apparently this was causing
  no harm. I don't wanna know.

- Better memory handling: buffer reallocation logic in index construction
  could cause out-of-memory errors. Now we retry a small reallocation after
  dumping the content in a temporary file, and record the event so the
  Scan process can dump the current batch.

- Fixed old minor bug in Combine: term files and global-counts files were
  not closed, leading to bizarre and spurious too-many-open-files errors.

- Fixed derelativisation when using FileSystemItem.

1.1.3 -> 2.0

- METAWARNING: This release has so many changes and so many new features
  that we strongly suggest reading carefully all the information below
  and the manual.

- WARNING: there are performance improvements due to fixed-point
  computation of Golomb moduli (yes, it *really* slows down things), but
  unfortunately all indices have to be rebuilt.

- WARNING: virtual fields have changed in a completely incompatible
  way, and the same happened to AnchorExtractor. This was necessary
  to finally get rid of problems with System.identityHashCode()
  (see below).

- WARNING: BitStreamIndexIterator will now throw an UnsupportedOperationException
  when positions or intervals are retrieved on an index without positions.
  Previously, getting positions would have produced the same effect, but
  getting intervals would have returned TRUE. This was causing a very confusing
  behaviour with ordered AND, consecutivity, etc., as they were returning
  false positives.

- WARNING: a great deal of work has gone into making all relevant iterators
  fully lazy. Please use DocumentIterator.nextDocument() and
  IntervalIterator.nextInterval(), after reading the related Javadoc
  documentation. The change has produced significant performance
  improvements.

- WARNING: IOExceptions are now rethrown by most index-access methods.
  Previously, they would have been caught and wrapped into
  RuntimeException, but this behaviour was slightly slowing down methods
  called very often like nextDocument().

- WARNING: The old sequential reading methods (e.g., readDocumentPointer())
  are no longer available (I guess nobody was using them anyway). They are
  replaced by an IndexReader.nextIterator() method that returns an index
  iterator on the term after the current one, until exhaustion.

- WARNING: Quanta are now restricted to powers of two.

- Completely new kind of index (high-performance). It uses the Lucene idea
  of keeping positions in a separate file, and enriches it with MG4J skip
  structures. It is now the default index type.

- Completely rewritten index reading. Now a Ruby script generates different
  readers for different combinations of flags, significantly increasing
  performance due to the reduced logic overhead. A generic class is always
  available, but for production sites wired index readers are the right
  choice. The wired, faster class is fetched automagically by reflection
  if available.

- Completely new, memory-adaptive index construction strategy. Just specify
  a number of *documents* per batch and let MG4J do the rest. Please read the
  Scan class documentation.

- New payload-based indices. Now it is possible to index dates, integers,
  or any other payload. By default we supply range queries.

- Significant improvements in performance. System.identityHashCode() turned
  out to be *deadly* slow, so we dropped reference-based open hash maps and
  started using brute-force array maps (you need fastutil >= 5.0.7)
  whenever we have to manipulate very small sets. The gains are
  surprising, in particular for queries containing frequent terms.

- Even more improvement due to parallel reimplementation of all operators for
  the special case in which all document iterators are index iterators. In
  this case all intervals have length 1 and can be retrieved eagerly. In some
  cases performance is almost doubled.

- New low-level coded-integer skipping methods have further increased performance
  in certain situations (e.g., phrasal queries containing stopwords).

- Now we use precomputed bit codes for 65536 words, uniformly. This
  requires 4MiB of memory just for precomputed words, but it almost doubles
  decoding speed (as the logic is much, much simpler).

- New bulk reading methods for integers in gamma, shifted gamma and delta
  coding. They make readDocumentPositions() several times faster as most
  decodings do not require a method call.

- Many fixes to the code involving generics.

- Fixed stupid bugs in PartitionLexically.

- Moved sizes into Index (from BitStreamIndex) and added new SIZES property
  that makes it possible to specify a global sizes file. This way, it is
  possible to use BM25 on clusters.

- Major fixes to documentally clustered document iterators.

- Fixed subtle semantic issue in LowPassDocumentIterator: TRUE iterators
  now make the iterator valid.

- Fixed subtle semantic issue in subclasses of AbstractOrderedIntervalIterator:
  now TRUE subiterators are considered as always matching (so the actual
  interval matching is performed just on non-TRUE iterators).

- Fixed bug in ScoreDocumentBoundedSizeQueue that was causing enqueuing
  of documents with score equal to the minimum.

- Improved implementation of MinimalPerfectHash. By fixing deterministically
  the perfect hash functions we reduce to virtually zero the trials during
  the construction (thanks to [email protected] for suggesting the idea).

- Fixed old copy-and-paste bug in non-scored requests to QueryEngine: offset
  was not used at all (but I guess nobody was using that method anyway).

- Completely new support for query expansion. A MultiTermIndexIterator
  behaves in all respects like an IndexIterator, but it's actually built
  by merging the index iterators of several terms. The "frequency" is
  settable so as to solve term-dependency problems in IDF-based ranking schemes.
  For debugging purposes, + can be used (instead of |) to cause the
  construction of a MultiTermIndexIterator.

- Brouwerian difference is now supported. It kills all intervals of the minuend
  that appear in the subtrahend. It can be used to search for terms while
  requiring that the context in which they are found does *not* contain some
  terms or, more generally, does not satisfy a query. It can also be used to
  modify index granularity by subtracting 2-element intervals that cross
  section boundaries.

- ConsecutiveDocumentIterator now supports gaps that can be used to match arbitrary
  words. This is particularly useful to perform phrasal queries in indices where
  some terms have not been indexed. Gaps are specifiable using $ instead
  of a term in the built-in parser.

- New methods to access the front of a subclass of AbstractUnionDocumentIterator,
  that is, the indices of the component iterators positioned on the current document.
  They are used by all union-based iterators, providing significant performance
  improvements on large unions.

- New metadata-only mode for Combine and related subclasses. Mainly useful for getting
  the global sizes, terms, etc. of a cluster.

- The array-writing methods of OutputBitStream now take a long for the
  bit length/offset, and correspondingly return a long. The old methods are
  still present, but they are deprecated (just to avoid proliferation).

- Deprecated all minimal perfect hashing constructors using the platform default
  encoding. They are just an endless cause of problems. There are now constructors
  with just a filename and an encoding (which can be null to mean the platform
  encoding, but you have to explicitly ask for it).

- Now all TermMap implementations have a constructor accepting an Iterable.

- New constructors and main method options for minimal perfect hash tables, prefix
  dictionaries and front-coded lists that support reading gzip'd files.

- Query provides a clearer selection between *no interval selection* and
  *no intervals*.

- Fixed bug in ImmutableBinaryTrie: prefixes of the first binary string
  would have generated an empty approximated interval (instead of [0]).

- Fixed bug in writeShiftedGamma()/readShiftedGamma(), and modified test.bsh
  so that it detects the bug.

- The SPIRE 2006 algorithms are by now obsolete--we have new, provably optimally
  lazy algorithms. The code reflects this.

- Lots, lots, lots of unit tests.
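
The ScoreDocumentBoundedSizeQueue fix listed in the 2.0 notes concerns what
happens when the queue is full and a new score ties with the current
minimum. A minimal sketch of the intended behaviour, using a hypothetical
TopK class built on java.util.PriorityQueue (not the MG4J implementation):

```java
import java.util.PriorityQueue;

class BoundedQueueSketch {
    /** Keeps the best `capacity` scores seen so far (hypothetical class). */
    static final class TopK {
        private final int capacity;
        // Min-heap: the head is the worst score currently kept.
        private final PriorityQueue<Double> heap = new PriorityQueue<>();

        TopK(int capacity) {
            if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
            this.capacity = capacity;
        }

        /** Returns true if the score was enqueued. */
        boolean enqueue(double score) {
            if (heap.size() < capacity) {
                heap.add(score);
                return true;
            }
            // Strict comparison: a score equal to the current minimum does not displace it.
            if (score > heap.peek()) {
                heap.poll();
                heap.add(score);
                return true;
            }
            return false;
        }

        int size() {
            return heap.size();
        }
    }

    public static void main(String[] args) {
        TopK queue = new TopK(2);
        System.out.println(queue.enqueue(1.0)); // prints true
        System.out.println(queue.enqueue(2.0)); // prints true
        System.out.println(queue.enqueue(1.0)); // prints false: ties with the minimum
        System.out.println(queue.enqueue(1.5)); // prints true: replaces 1.0
    }
}
```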

1.1.2 -> 1.1.3

- New score(digits) method for ResultItem for easier display.

- Now JdbcDocumentCollection works with factories featuring more than one field.

- Reintroduced the JavaBeans Activation Framework in dependencies.

- Fixed lack of calls to close() in some document factories, generating
  spurious warnings.

- Fixed static fields in QueryServlet.

1.1.1 -> 1.1.2

- Fixed default values of K_1 and B in BM25 scorer following
  Büttcher & Clarke's paper.

- Fixed interval methods for nonsense calls on the empty interval.

- Dumped jline--we now suggest using rlwrap.

- More sensible hash for intervals. As a consequence, the serialUID
  had to be bumped.

- Fixed serious bug in OrderedAndDocumentIterator that was dropping
  several correct intervals (thanks to Fabien Campagne for finding
  this bug).

- Fixed very old bug in InputBitStream.read(byte[], int)--reads of
  full length would have caused an ArrayIndexOutOfBoundsException
  (thanks to Kevin Dorff for finding this bug).

- OrDocumentIterator was using an indirect queue instead of a 
  semi-indirect queue, maybe for historical reasons.

- Complete rewrite of interval operators due to new algorithms, to
  be included in the revised SPIRE 2006 paper. On TREC data this led
  to an average 3% increase in speed. Now the algorithms used by MG4J
  are provably optimally lazy.

- The BulletParser now accepts element-type names with dashes, etc., and
  moreover parses correctly explicit CDATA sections (thanks to Kevin Dorff
  for finding these bugs).

- Support for unsigning signed minimal perfect hash maps.

- New Shift-Add-Xor-based signed minimal perfect hashes (even with long
  signatures). Moreover, now all signed hashes have a main() method
  generating by default instances of that hash.

- Massive speed improvements in OutputBitStream: finally we write
  precomputed words for small integers, analogously to what happens
  in InputBitStream.

1.1 -> 1.1.1

- Better loading of InputBitStream data, working also with multiple class
  loaders, and serialisability of SelectedInterval (fixed by the Twease
  people).

- AbstractAggregator was not setting up the equalisation factors when
  equalisation was not required, resulting in divisions by 0.

- CountScorer is now a DelegatingScorer (as it should have always been).

- The empty-constructor interval selector wasn't really letting out *all*
  intervals--overlapping intervals would have been discarded.

- Fixed a *very old* bug in the computation of minimal-interval semantics.
  Now the code is fully aligned with our SPIRE 2006 paper.

1.0.2 -> 1.1

- IMPORTANT: IndexWriter.close() no longer saves properties
  automagically--you have to fetch them with IndexWriter.properties().

- Java 5 only.

- Probably the largest rewrite and extension in the history of MG4J. Too
  many changes, fixes and optimisations to be described here. Almost nothing
  is backward-compatible.

- We are starting to distribute unit tests with each release. We have
  actually many more tests, but they are not cast inside JUnit and rather
  undocumented. You are welcome to donate unit tests.

1.0.1 -> 1.0.2

- Fixed bug in InputStreamDocumentCollection: the document index (and thus 
  the title) was never incremented.
- New parsing factory for the BulletParser: you decide how to parse your names (an
  idea by Fabien Campagne).
- Now we use 1.26n integers to minimally hash n words. 1.25n is in fact the
  threshold--you need something larger than that. The change should be fully
  backward-compatible.
- Now FileLinesCollection returns a Closeable FileLinesIterator.
- BloomFilter no longer implements the nonsensical size() method. add()
  is more efficient and does not return a value.

1.0.0 -> 1.0.1

- Fixed bug in Paste when the size lists have different lengths (now we extend with zeros).
- The "field" property was not propagated by Combine.
- A missing throws clause in AbstractDocumentCollection's implementation of iterator()
  was making it impossible to throw exceptions in implementing subclasses.
- New, efficient single-query iterator for JdbcDocumentCollection.
- NULLs do not generate null pointer exceptions any longer in JDBC document collections.
  They're converted to empty input streams.

0.9.2 -> 1.0.0

- Too much to be written.

0.9.1 -> 0.9.2

- IMPORTANT: To avoid clashes with List, the get() method of TermMap
  has been changed to getTerm(). We're sorry for this inconvenience.

- Now we support prefixes by means of a PrefixMap. There are easy
  (ternary search trees) and very sophisticated (semi-external tries)
  implementations. If you have a PrefixMap you can search for things
  like "foo*" (meaning "starts with foo"), provided that the terms
  starting with "foo" do not exceed a constant defined in QueryParser.

- Interval has new methods that compare to points.

- Fixed stupid bug in ClarkCormack scorer: we were comparing the document
  indices, not the scores. Ouch.

- Fixed ScoredDocumentBoundedSizeQueue: now stability is forced by making
  the order an actual order (not a preorder) so it is possible to get the
  k-th to (k+j)-th ranked documents in a consistent way. The new version
  is, unfortunately, completely incompatible with the old one.

- New CachingDocumentIterator: it decorates a DocumentIterator so that
  you can get several times its interval iterators.

- FastBufferedOutputStream was NOT flushing. The flush() method was inherited,
  but of course that didn't work. FastBuffered{In,Out}putStream are
  now deprecated as they have been moved to fastutil.

- OrDocumentIterator would have caused IllegalStateException in some circumstances
  (the array of underlying iterators was assumed to have null'd position for
  empty component iterators, but this wasn't happening).

- InputBitStream is a boolean iterator, and OutputBitStream has a method
  accepting a boolean iterator. This opens a new world of possibilities 8^).

- New replace() and delete() methods in MutableString for handling more
  easily deletions or substitution of a class of characters.

- readLine() no longer empties its argument on end of file.
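
  The ScoredDocumentBoundedSizeQueue entry above distinguishes an actual
  order from a preorder. The sketch below illustrates the idea only: it is
  not the MG4J class, all names in it are hypothetical, and breaking ties
  by document index is an assumption. With score alone, equal-scored
  documents can come out in any order; adding a tie-breaker makes the
  ranking independent of insertion order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopKSketch {
    static final class Scored {
        final int document;
        final double score;
        Scored(int document, double score) { this.document = document; this.score = score; }
    }

    /** Total order: higher score first; ties broken by smaller document index. */
    static final Comparator<Scored> RANK = (a, b) -> {
        int c = Double.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.document, b.document);
    };

    /** Returns the document indices of the k best-ranked items, best first. */
    static List<Integer> topK(List<Scored> items, int k) {
        // The head of this queue is the worst-ranked element kept so far.
        PriorityQueue<Scored> kept = new PriorityQueue<>(RANK.reversed());
        for (Scored s : items) {
            kept.add(s);
            if (kept.size() > k) kept.poll(); // drop the worst one
        }
        List<Scored> sorted = new ArrayList<>(kept);
        sorted.sort(RANK);
        List<Integer> result = new ArrayList<>();
        for (Scored s : sorted) result.add(s.document);
        return result;
    }
}
```

  Since no two distinct documents compare as equal, the k-th to (k+j)-th
  results are a well-defined slice of one fixed ranking.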

0.9.0 -> 0.9.1

- A couple of missing methods in it.unimi.dsi.mg4j.util.Fast were necessary for
  WebGraph.

0.8.2 -> 0.9.0

- IMPORTANT: Int2IntArrayMap and Int2LongArrayMap no longer exist: offsets and
  sizes now are type-specific array lists (and they can be easily generated using
  fastutil wrappers).

- Fixed stupid, stupid bug in state handling in IndexReader. Sequential
  reads of an entire index would have thrown an IllegalStateException.

- Changed a few array static methods to faster fastutil counterparts.

- Fixed small glitch in lastIndexOf() semantics--searches for the empty
  string from a negative offset would have returned 0 instead of -1.

- Bunch of new methods in MutableString ((last)indexOfAnyBut, (co)span).

- Golomb read/write methods now support modulus 0, 0 being the only valid
  argument (and the result returned upon reads).

- Minimal perfect hashes support lists with fewer than 16 elements, by
  storing them transparently in a vector.

- Signed hashes have an incompatible format (sorry).

- Minimal perfect hashes support optimal weight length computation for
  sorted term collections.

- New left/right trim methods. Moreover, trim methods preserve looseness
  and compactness.

- Literally zillions of new features, everywhere.

- Experimental support for multi-index minimal-interval semantics and
  skipping towers.
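
  The Golomb entry above (modulus 0) can be illustrated with a small
  sketch. It is not the MG4J bit-stream code: bits are kept as '0'/'1'
  characters for readability, the names are hypothetical, and the unary
  convention (q ones closed by a zero) is an assumption. It is meant for
  small moduli only.

```java
final class GolombSketch {
    /** Number of bits of the longest minimal binary codeword for modulus b. */
    private static int width(int b) {
        return b <= 1 ? 0 : 32 - Integer.numberOfLeadingZeros(b - 1);
    }

    private static void appendBits(StringBuilder out, int value, int width) {
        for (int i = width - 1; i >= 0; i--) out.append((value >>> i & 1) == 0 ? '0' : '1');
    }

    /** Appends the Golomb code of x with modulus b to out. */
    static void write(StringBuilder out, int x, int b) {
        if (x < 0 || b < 0) throw new IllegalArgumentException("negative argument");
        if (b == 0) {
            // Modulus 0: only 0 can be coded, and it takes no bits at all.
            if (x != 0) throw new IllegalArgumentException("modulus 0 only codes 0");
            return;
        }
        // Quotient in unary.
        for (int q = x / b; q > 0; q--) out.append('1');
        out.append('0');
        // Remainder in minimal binary over [0, b).
        int w = width(b);
        int shortCodes = (1 << w) - b;
        int r = x % b;
        if (r < shortCodes) appendBits(out, r, w - 1);
        else appendBits(out, r + shortCodes, w);
    }

    /** Reads one code with modulus b from bits at pos[0], advancing pos[0]. */
    static int read(String bits, int[] pos, int b) {
        if (b == 0) return 0; // nothing was written, so nothing is consumed
        int q = 0;
        while (bits.charAt(pos[0]++) == '1') q++;
        int w = width(b);
        int shortCodes = (1 << w) - b;
        int v = 0;
        for (int i = 0; i < w - 1; i++) v = v << 1 | bits.charAt(pos[0]++) - '0';
        // Long codewords need one more bit.
        if (w > 0 && v >= shortCodes) v = (v << 1 | bits.charAt(pos[0]++) - '0') - shortCodes;
        return q * b + v;
    }
}
```

  With modulus 0 the writer emits nothing and the reader returns 0 without
  moving, which matches "0 being the only valid argument".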

0.8.1 -> 0.8.2

- New methods for starting and stopping a progress meter with messages.

- New FastMultiByteArrayInputStream class that can hold 256 PiB (256 PiB = 2^28
  GiB) and expose them as a repositionable stream.
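
  The size arithmetic above (256 PiB = 2^28 GiB) can be checked with a
  short sketch that also shows one way a 64-bit position might be split
  across several byte arrays. The 1 GiB chunk size and every name below
  are assumptions made for illustration; this changelog does not say how
  FastMultiByteArrayInputStream lays out its data.

```java
final class ChunkedAddressSketch {
    static final int CHUNK_SHIFT = 30;                       // assumed: 1 GiB per array
    static final long CHUNK_MASK = (1L << CHUNK_SHIFT) - 1;

    /** Index of the array holding the given byte position. */
    static int chunkIndex(long position) {
        return (int) (position >>> CHUNK_SHIFT);
    }

    /** Offset of the given byte position inside its array. */
    static int chunkOffset(long position) {
        return (int) (position & CHUNK_MASK);
    }

    /** Total capacity, in bytes, of the given number of full arrays. */
    static long capacityInBytes(long chunks) {
        return chunks << CHUNK_SHIFT;
    }
}
```

  Under these assumptions, 2^28 arrays of 2^30 bytes give 2^58 bytes, that
  is, 256 PiB.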
 
0.8 -> 0.8.1

- Modified imports and class name for compliance with fastutil 3.0.

- Relicensed under the LGPL.

0.7.1 -> 0.8

- New NullInputStream to support new InputBitStream direct array wrapping.

- position() in InputBitStream will always work if the new position is
  within the current buffer.

- Removed unused buffer in InputBitStream, and made unget buffer allocation
  on-demand.

- Eliminated finalizers from streams.

- New debugging class.

- Fixed bug in position().

- Now the ProgressMeter gives items/s at the first printout.

- New methods for variable-length nibble coding.

- New methods for zeta coding (a new code!).

- Fixed erroneous serialisation of CRC32SignedMinimalPerfectHash.

- Completely renewed hashing scheme for minimal perfect hashing: supports
  the empty string and it is faster to compute. Moreover, MinimalPerfectHash
  now has an offline builder that never actually loads the words into RAM,
  thus making it possible to hash very large sets (albeit slowly), and checks
  a suitable system property to provide optional verbose logging. The
  serialisation, unfortunately, is incompatible.
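
  The variable-length nibble coding mentioned above can be sketched as
  follows. This is an illustration under stated assumptions, not the MG4J
  implementation: each 4-bit nibble is assumed to carry a flag bit (1 on
  the last nibble, 0 otherwise) followed by 3 data bits, most significant
  block first. Bits are kept as '0'/'1' characters and all names are
  hypothetical.

```java
final class NibbleSketch {
    /** Encodes a nonnegative integer as a sequence of 4-bit nibbles. */
    static String encode(int x) {
        if (x < 0) throw new IllegalArgumentException("negative argument");
        int bits = 32 - Integer.numberOfLeadingZeros(x); // 0 when x == 0
        int blocks = Math.max(1, (bits + 2) / 3);        // 3 data bits per nibble
        StringBuilder out = new StringBuilder();
        for (int i = blocks - 1; i >= 0; i--) {
            out.append(i == 0 ? '1' : '0');              // flag: 1 marks the last nibble
            int block = x >>> 3 * i & 7;
            for (int j = 2; j >= 0; j--) out.append((block >>> j & 1) == 0 ? '0' : '1');
        }
        return out.toString();
    }

    /** Decodes one value from the start of the given bit string. */
    static int decode(String bits) {
        int x = 0;
        for (int pos = 0;; pos += 4) {
            x = x << 3 | Integer.parseInt(bits.substring(pos + 1, pos + 4), 2);
            if (bits.charAt(pos) == '1') return x;
        }
    }
}
```

  Small values take a single nibble, and every further 3 bits of
  magnitude cost one more nibble.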


0.7 -> 0.7.1

- Removed experimental classes.

- Fixed two bad bugs introduced in 0.7 during optimisation.

0.6 -> 0.7

- IMPORTANT: MG4J now uses the new fastutil package name (i.e., no
  more fastutil). If you use parts of MG4J that require fastutil, you
  should upgrade.

- New replace() methods for MutableString that entirely replace
  the string content. New copy() method to obtain easily a compact
  copy of a mutable string.

- New RepositionableStream interface to mark streams that can
  be repositioned by bit streams.

- New FastByteArrayInputStream to read memory blocks as bit
  streams.

- New unsynchronised FastBufferedReader.

- ProgressMeter count value now has setters and getters.

- Programmable meter quantum for FirstPass.

- Some optimisations.

0.5 -> 0.6

- IMPORTANT: streams and self-delimiting string formats are not
  binary compatible with previous versions. Please read the docs.

- Fixed bug with serialisation of empty strings and set serialVersionUID.

- Too many additions to be described in a file, but in short: optimised
  indexOf() family of methods, flexible index construction, QuickSearch
  fast searches.

0.4 -> 0.5

- MutableString now has a more coherent policy for compactness and looseness.
  They are preserved by all operations.

- StringBuffer-specific methods have been killed to reduce code duplication.
  You need to recompile so that Java uses the alternative CharSequence-specific
  methods.

- Several new methods such as startsWith, endsWith etc.
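
  The removal of StringBuffer-specific methods works because StringBuffer
  implements CharSequence. The sketch below shows the idea with a
  hypothetical stand-in class (it is not MutableString). Java picks the
  overload at compile time, so code compiled against a removed
  StringBuffer-specific signature has to be recompiled to bind to the
  CharSequence one.

```java
final class TinyMutableString {
    private final StringBuilder chars = new StringBuilder();

    /** Single CharSequence method: accepts String, StringBuffer, StringBuilder, ... */
    TinyMutableString append(CharSequence s) {
        chars.append(s);
        return this; // returning this allows call chaining
    }

    @Override
    public String toString() {
        return chars.toString();
    }
}
```

  For instance, append("a"), append(new StringBuffer("b")) and
  append(new StringBuilder("c")) all resolve to the same method.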

0.3 -> 0.4

- IMPORTANT: the hash computation functions in MinimalPerfectHash have
  been changed. Please regenerate your maps.

- MinimalPerfectHash has been reimplemented to use CharSequence, so
  it is more general. Moreover, we have a new SignedMinimalPerfectHash
  that can be used to avoid false positives.

- New replace methods in mutable strings.

- Now we try to return a reference to this in all mutable string methods.

- Various fixes to documentation.
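
  The false positives mentioned for SignedMinimalPerfectHash arise because
  a minimal perfect hash maps every string, member or not, to some slot.
  The toy sketch below shows how a stored signature can reject
  non-members. It is not the MG4J construction: the brute-force seed
  search, the use of String.hashCode() as signature and all names are
  assumptions, and it is only suitable for tiny key sets.

```java
import java.util.List;

final class SignedHashSketch {
    private final int n;
    private final int seed;
    private final int[] signature;

    /** Maps a key to a slot in [0, n) for the given seed. */
    private static int slot(String key, int seed, int n) {
        int h = (key.hashCode() ^ seed) * 0x9E3779B9;
        return (h >>> 16) % n;
    }

    SignedHashSketch(List<String> keys) {
        n = keys.size();
        signature = new int[n];
        int s = 0;
        // Try seeds until no two keys share a slot.
        search:
        for (;; s++) {
            if (s == 1 << 20) throw new IllegalStateException("no suitable seed found");
            boolean[] used = new boolean[n];
            for (String key : keys) {
                int i = slot(key, s, n);
                if (used[i]) continue search;
                used[i] = true;
            }
            break;
        }
        seed = s;
        for (String key : keys) signature[slot(key, seed, n)] = key.hashCode();
    }

    /** Unsigned lookup: any string gets a slot, even if it is not a key. */
    int unsignedIndex(String key) {
        return slot(key, seed, n);
    }

    /** Signed lookup: returns -1 when the stored signature does not match. */
    int signedIndex(String key) {
        if (n == 0) return -1;
        int i = slot(key, seed, n);
        return signature[i] == key.hashCode() ? i : -1;
    }
}
```

  A non-member is still accepted if its signature happens to collide with
  the stored one, so signing reduces false positives rather than ruling
  them out.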

0.2 -> 0.3

- Introduced new class MutableString.

0.1 -> 0.2

- By mistake writeLongDelta() was really called writeDelta().