
org.broadinstitute.hellbender.utils.tsv.package-info

/**
 * Utility classes to read and write tab separated value (tsv) files.
 * 

File format description

* *

* Tab separated value files may contain any number of comment lines (starting with {@value org.broadinstitute.hellbender.utils.tsv.TableUtils#COMMENT_PREFIX}), a line containing the column names (a.k.a. the header line) and any number of data lines, one per record.

*

While comment lines can contain any sequence of characters, the header and data lines are divided into columns using exactly one {@value org.broadinstitute.hellbender.utils.tsv.TableUtils#COLUMN_SEPARATOR_STRING} character.

*

Blank lines are treated as having a single column with the empty string as the only value (or column name).

*

* The header line is the first non-comment line, whereas any other non-comment line after that is considered a data line. Comment lines can appear anywhere in the file and their presence is ignored by the reader ({@link org.broadinstitute.hellbender.utils.tsv.TableReader TableReader} implementations).

*

* The header line values, that is the column names, must all be different (otherwise a formatting exception is thrown), and every data line must have as many values as there are columns in the header line.
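
The layout rules above can be sketched in plain Java. This is a minimal illustration and not the TableReader implementation: it assumes `#` as the comment prefix and a tab as the column separator, it ignores quoting, and the names `TsvLayoutSketch`, `Table` and `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class TsvLayoutSketch {

    /** Hypothetical holder: the column names plus one String[] per data line. */
    static final class Table {
        final List<String> columns;
        final List<String[]> rows = new ArrayList<>();

        Table(final List<String> columns) {
            this.columns = columns;
        }
    }

    /** Returns the parsed table, or null if the input has no header line. */
    static Table parse(final List<String> lines) {
        Table table = null;
        for (final String line : lines) {
            if (line.startsWith("#")) {
                continue; // comment lines are ignored wherever they appear
            }
            // A limit of -1 keeps trailing empty values; a blank line yields one empty value.
            final String[] values = line.split("\t", -1);
            if (table == null) {
                // The first non-comment line is the header; its names must be unique.
                final List<String> names = Arrays.asList(values);
                if (new HashSet<>(names).size() != names.size()) {
                    throw new IllegalArgumentException("duplicated column name in header");
                }
                table = new Table(names);
            } else if (values.length != table.columns.size()) {
                throw new IllegalArgumentException("expected " + table.columns.size()
                        + " values but found " + values.length);
            } else {
                table.rows.add(values);
            }
        }
        return table;
    }

    public static void main(final String[] args) {
        final Table table = parse(Arrays.asList(
                "# comment 1",
                "CONTIG\tSTART\tEND",
                "# comment 2",
                "chr1\t123100\t123134",
                "chr1\t134012\t134201"));
        System.out.println(table.columns + " with " + table.rows.size() + " data lines");
    }
}
```

In this sketch a malformed header or a data line with the wrong number of values raises an `IllegalArgumentException`, standing in for the formatting exception mentioned above.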

*

Values can be quoted using {@value org.broadinstitute.hellbender.utils.tsv.TableUtils#QUOTE_STRING}. This becomes necessary when the value contains any special formatting character, such as a new-line, the quote character itself, the column separator character or the escape character {@value org.broadinstitute.hellbender.utils.tsv.TableUtils#ESCAPE_STRING}.

*

Within quotes, special characters must be escaped using the {@value org.broadinstitute.hellbender.utils.tsv.TableUtils#ESCAPE_STRING} character.
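
The quoting rule can be illustrated with a small stand-alone sketch. It assumes a double quote as the quote character and a backslash as the escape character; those values, like the names `TsvQuotingSketch`, `encode` and `decode`, are assumptions made for illustration and are not taken from TableUtils. It also reads the rule literally, so every special character inside quotes is prefixed with the escape character.

```java
public class TsvQuotingSketch {

    // Assumed characters for this sketch only; the real values are defined in TableUtils.
    static final char SEPARATOR = '\t';
    static final char QUOTE = '"';
    static final char ESCAPE = '\\';

    static boolean isSpecial(final char c) {
        return c == SEPARATOR || c == QUOTE || c == ESCAPE || c == '\n';
    }

    /** Quotes the value only if it contains a special character. */
    static String encode(final String value) {
        boolean needsQuotes = false;
        for (int i = 0; i < value.length(); i++) {
            if (isSpecial(value.charAt(i))) {
                needsQuotes = true;
                break;
            }
        }
        if (!needsQuotes) {
            return value;
        }
        final StringBuilder sb = new StringBuilder().append(QUOTE);
        for (int i = 0; i < value.length(); i++) {
            final char c = value.charAt(i);
            if (isSpecial(c)) {
                sb.append(ESCAPE); // escape special characters inside the quotes
            }
            sb.append(c);
        }
        return sb.append(QUOTE).toString();
    }

    /** Reverses encode: strips the enclosing quotes and drops the escape characters. */
    static String decode(final String field) {
        final int last = field.length() - 1;
        if (field.length() < 2 || field.charAt(0) != QUOTE || field.charAt(last) != QUOTE) {
            return field; // not quoted, nothing to undo
        }
        final StringBuilder sb = new StringBuilder();
        for (int i = 1; i < last; i++) {
            char c = field.charAt(i);
            if (c == ESCAPE && i + 1 < last) {
                c = field.charAt(++i); // take the escaped character literally
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        final String original = "say \"hi\"\tnow";
        final String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original));
    }
}
```

Values without special characters pass through `encode` unchanged, so ordinary fields such as `chr1` or `123100` are never quoted in this sketch.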

*

Example:

*
 *     # comment 1
 *     # comment 2
 *     CONTIG   START   END     NAME    SAMPLE1 SAMPLE2
 *     # comment 3
 *     chr1     123100  123134 tgt_0    100.0   102.0
 *     chr1     134012  134201 tgt_1    50      12
 *     # comment 4
 *     chr2     ...
 * 
*

Reading tsv files

* You will need to extend the class {@link org.broadinstitute.hellbender.utils.tsv.TableReader TableReader}, either as a top-level or an inner class, and override its {@link org.broadinstitute.hellbender.utils.tsv.TableReader#createRecord(DataLine) createRecord} method to map input data lines, wrapped in a {@link org.broadinstitute.hellbender.utils.tsv.DataLine DataLine}, to your row element class of choice.

* Example: a SimpleInterval reader for a tsv file with three columns, CONTIG, START and END:

*
 *
 *     ...
 *
 *     public void doWork(final File inputFile) throws IOException {
 *
 *         final TableReader<SimpleInterval> reader = new TableReader<SimpleInterval>(inputFile) {
 *
 *            // Optional (but recommended) check that the columns in the file are the ones expected:
 *            @Override
 *            protected void processColumns(final TableColumnCollection columns) {
 *                  if (!columns.containsExactly("CONTIG","START","END"))
 *                      throw formatException("Bad column names");
 *            }
 *
 *            @Override
 *            protected SimpleInterval createRecord(final DataLine dataLine) {
 *                return new SimpleInterval(dataLine.get("CONTIG"),
 *                                       dataLine.getInt("START"),
 *                                       dataLine.getInt("END"));
 *            }
 *         };
 *
 *         for (final SimpleInterval interval : reader) {
 *             // process each interval here.
 *         }
 *         reader.close();
 *         ...
 *
 *     }
 * 
*

Writing tsv files

* You will need to extend the class {@link org.broadinstitute.hellbender.utils.tsv.TableWriter TableWriter}, either as a top-level or an inner class, and override its {@link org.broadinstitute.hellbender.utils.tsv.TableWriter#composeLine composeLine} method to map your record object type to an output line, represented by a {@link org.broadinstitute.hellbender.utils.tsv.DataLine DataLine}.

* Instances of {@link org.broadinstitute.hellbender.utils.tsv.DataLine DataLine} can be obtained by calling the writer's protected parameter-less method {@link org.broadinstitute.hellbender.utils.tsv.TableWriter#composeLine composeLine}.

*

* The column names are passed in order to the constructor.

*

* Example:

*
 *     public void doWork(final File outputFile) throws IOException {
 *
 *         final TableWriter<SimpleInterval> writer =
 *              new TableWriter<SimpleInterval>(outputFile, new TableColumnCollection("CONTIG","START","END")) {
 *            @Override
 *            protected void composeLine(final SimpleInterval interval, final DataLine dataLine) {
 *                // we can use append with confidence because we know the column order.
 *                dataLine
 *                    .append(interval.getContig())
 *                    .append(interval.getStart(),interval.getEnd());
 *            }
 *         };
 *
 *         for (final SimpleInterval interval : intervalsToWrite) {
 *             writer.writeRecord(interval);
 *         }
 *         writer.close();
 *         ...
 *
 *     }
 * 
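
The shape of this pattern (column names fixed at construction time, one overridden method filling a line per record) can be mimicked with the standard library alone. The sketch below is a hypothetical stand-in and not the TableWriter API: `MiniTableWriter`, its `List<String>` line buffer and the `Interval` class are invented for illustration, and quoting is left out.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class WriterPatternSketch {

    /** Hypothetical record type standing in for SimpleInterval. */
    static final class Interval {
        final String contig;
        final int start;
        final int end;

        Interval(final String contig, final int start, final int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical mini writer: the header is written once, then one line per record. */
    abstract static class MiniTableWriter<R> implements AutoCloseable {
        private final PrintWriter out;
        private final int columnCount;

        MiniTableWriter(final Writer target, final String... columnNames) {
            this.out = new PrintWriter(target);
            this.columnCount = columnNames.length;
            out.print(String.join("\t", columnNames));
            out.print('\n');
        }

        /** Subclasses append the record's values in column order. */
        protected abstract void composeLine(R record, List<String> line);

        void writeRecord(final R record) {
            final List<String> line = new ArrayList<>(columnCount);
            composeLine(record, line);
            if (line.size() != columnCount) {
                throw new IllegalStateException("expected " + columnCount + " values but got " + line.size());
            }
            out.print(String.join("\t", line));
            out.print('\n');
        }

        @Override
        public void close() {
            out.close();
        }
    }

    public static void main(final String[] args) {
        final StringWriter buffer = new StringWriter();
        try (MiniTableWriter<Interval> writer =
                 new MiniTableWriter<Interval>(buffer, "CONTIG", "START", "END") {
                     @Override
                     protected void composeLine(final Interval interval, final List<String> line) {
                         line.add(interval.contig);
                         line.add(String.valueOf(interval.start));
                         line.add(String.valueOf(interval.end));
                     }
                 }) {
            writer.writeRecord(new Interval("chr1", 123100, 123134));
        }
        System.out.print(buffer);
    }
}
```

Checking the number of appended values in `writeRecord` keeps every data line consistent with the header, which is the same invariant the file format description imposes on readers.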
*

Readers and Writers using function composition

* {@link org.broadinstitute.hellbender.utils.tsv.TableUtils TableUtils} contains methods to create readers and writers without explicitly extending {@link org.broadinstitute.hellbender.utils.tsv.TableReader TableReader} or {@link org.broadinstitute.hellbender.utils.tsv.TableWriter TableWriter}, by specifying their behaviour through lambda functions instead.

Example of a reader:

 *     final TableReader<SimpleInterval> reader = TableUtils.reader(inputFile,
 *                (columns,formatExceptionFactory) -> {
 *                   // we check that the columns are what we expect them to be:
 *                   if (!columns.matchesExactly("CONTIG","START","END"))
 *                      throw formatExceptionFactory.apply("Bad header");
 *                   // we return the lambda to translate dataLines into intervals.
 *                   return (dataLine) -> new SimpleInterval(dataLine.get(0),dataLine.getInt(1),dataLine.getInt(2));
 *                });
 * 
*

* The lambda you need to provide may look complicated, but it is not: it receives the columns found in the input and must return another lambda that translates data lines into records given those columns.

*

* Before returning that lambda, it checks whether the columns are the expected ones and in the correct order (always recommended).

* The additional formatExceptionFactory parameter allows the reader implementation to correctly report formatting issues.
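
The "lambda that returns a lambda" shape can be tried out with the standard library alone. The following sketch is not the TableUtils API: `readAll`, the `Interval` class, and the use of `List<String>` for the columns and `String[]` for a data line are hypothetical stand-ins, and the parsing assumes `#` comments and tab separators without quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class ComposedReaderSketch {

    /** Hypothetical record type standing in for SimpleInterval. */
    static final class Interval {
        final String contig;
        final int start;
        final int end;

        Interval(final String contig, final int start, final int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Calls the factory once with the header columns and an exception factory,
     * then applies the mapper it returns to every data line.
     */
    static <R> List<R> readAll(
            final List<String> lines,
            final BiFunction<List<String>, Function<String, RuntimeException>, Function<String[], R>> factory) {
        final List<R> records = new ArrayList<>();
        Function<String[], R> mapper = null;
        int lineNumber = 0;
        for (final String line : lines) {
            lineNumber++;
            if (line.startsWith("#")) {
                continue; // skip comment lines
            }
            final String[] values = line.split("\t", -1);
            if (mapper == null) {
                final int headerLine = lineNumber;
                // The exception factory lets the caller's lambda report errors with reader context.
                mapper = factory.apply(Arrays.asList(values),
                        message -> new IllegalStateException("line " + headerLine + ": " + message));
            } else {
                records.add(mapper.apply(values));
            }
        }
        return records;
    }

    public static void main(final String[] args) {
        final List<Interval> intervals = readAll(
                Arrays.asList("# comment", "CONTIG\tSTART\tEND", "chr1\t123100\t123134"),
                (columns, formatExceptionFactory) -> {
                    // Check the columns and their order before building the mapper.
                    if (!columns.equals(Arrays.asList("CONTIG", "START", "END"))) {
                        throw formatExceptionFactory.apply("Bad header");
                    }
                    return values -> new Interval(values[0],
                            Integer.parseInt(values[1]), Integer.parseInt(values[2]));
                });
        System.out.println(intervals.size() + " interval(s) read");
    }
}
```

Because the outer lambda runs once per file and the inner one once per data line, any per-file work such as header validation or column index lookup stays out of the per-record path.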

*

Example of a writer:

*
 *     final TableWriter<SimpleInterval> writer = TableUtils.writer(outputFile,
 *                new TableColumnCollection("CONTIG","START","END"),
 *                (interval,dataLine) -> {
 *                  dataLine.append(interval.getContig())
 *                          .append(interval.getStart(),interval.getEnd());
 *                });
 * 
*

* The case of the writer is far simpler, as there is no need to report formatting errors: we are the ones producing the file.

*/ package org.broadinstitute.hellbender.utils.tsv;



