


MG4J (Managing Gigabytes for Java) is a free full-text search engine, written in Java, for large document collections. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.



  
    MG4J: Managing Gigabytes for Java
  

  

    

Bit-level I/O classes.

Package Specification

The standard Java API lacks bit-level I/O classes. To fill this gap, MG4J provides {@link it.unimi.dsi.big.mg4j.io.InputBitStream} and {@link it.unimi.dsi.big.mg4j.io.OutputBitStream}, which can wrap any corresponding standard Java stream and make it work at the bit level. They also support several useful formats (such as unary, binary, minimal binary, γ, δ and Golomb encoding).
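To make the idea of a self-delimiting format concrete, the following is a minimal sketch of γ coding that produces bit strings rather than writing to a stream. It does not use the MG4J classes: the class `GammaSketch` and its methods are hypothetical names chosen for illustration, and the sketch assumes the convention that unary writes x zeros followed by a one and that a natural number x is γ-coded by coding x + 1.

```java
public final class GammaSketch {

    /** Unary code of a natural number x (assumed convention): x zeros, then a one. */
    static String unary(int x) {
        if (x < 0) throw new IllegalArgumentException("x must be a natural number");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < x; i++) sb.append('0');
        return sb.append('1').toString();
    }

    /**
     * Gamma code of a natural number x: let y = x + 1 have k bits;
     * write k - 1 in unary, followed by the lower k - 1 bits of y.
     */
    static String gamma(int x) {
        if (x < 0 || x == Integer.MAX_VALUE)
            throw new IllegalArgumentException("x out of range for this sketch");
        int y = x + 1;
        int msb = 31 - Integer.numberOfLeadingZeros(y); // index of the top bit, i.e. k - 1
        StringBuilder sb = new StringBuilder(unary(msb));
        for (int i = msb - 1; i >= 0; i--) sb.append((y >>> i) & 1);
        return sb.toString();
    }

    /** Decodes a single gamma codeword found at the start of the given bit string. */
    static int decodeGamma(String bits) {
        int msb = bits.indexOf('1'); // number of leading zeros = k - 1
        int y = 1;                   // the implicit top bit of y
        for (int i = 1; i <= msb; i++) y = (y << 1) | (bits.charAt(msb + i) - '0');
        return y - 1;
    }

    public static void main(String[] args) {
        for (int x = 0; x < 5; x++) System.out.println(x + " -> " + gamma(x));
    }
}
```

Because the unary prefix announces how many bits follow, a decoder can tell where each codeword ends without any separator, which is what makes the format self-delimiting.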

Compression can be achieved using the self-delimiting formats supported by the classes above, or by arithmetic coding, using the classes {@link it.unimi.dsi.big.mg4j.io.ArithmeticCoder} and {@link it.unimi.dsi.big.mg4j.io.ArithmeticDecoder}. Note that arithmetic coding is not very efficient in the present implementation, as it does not allow a varying number of symbols.

Bit input and output streams also offer efficient buffering and a way to reposition the bit stream when the underlying byte stream is a file-based stream or a {@link it.unimi.dsi.fastutil.io.RepositionableStream}.

Conventions

All coding methods work on natural numbers. The encoding of zero is very natural for some techniques, and much less natural for others. To keep methods rationally organised, all methods are able to encode any natural number. If, for instance, you want to write positive numbers in unary encoding and you do not want to waste a bit, you have to decrement them first (i.e., instead of p you must encode p−1).
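The decrement convention can be illustrated with a small sketch. It is not the MG4J API: `UnaryConvention` and its methods are hypothetical, and the unary code is assumed to be x zeros followed by a one, so that a natural number x costs x + 1 bits. Encoding a positive number p directly would cost p + 1 bits, whereas encoding p − 1 costs only p bits.

```java
public final class UnaryConvention {

    /** Unary code of a natural number x (assumed convention): x zeros, then a one. */
    static String unary(int x) {
        if (x < 0) throw new IllegalArgumentException("x must be a natural number");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < x; i++) sb.append('0');
        return sb.append('1').toString();
    }

    /** Encodes a strictly positive p as the unary code of p - 1, saving one bit. */
    static String unaryPositive(int p) {
        if (p < 1) throw new IllegalArgumentException("p must be positive");
        return unary(p - 1);
    }

    /** Inverse of unaryPositive: count the leading zeros, then undo the decrement. */
    static int decodePositive(String bits) {
        return bits.indexOf('1') + 1;
    }

    public static void main(String[] args) {
        int p = 3;
        System.out.println("direct:      " + unary(p));         // p + 1 bits
        System.out.println("decremented: " + unaryPositive(p)); // p bits
    }
}
```

The reader must apply the matching increment after decoding, so both sides of a format have to agree on whether the shift is in use.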




