jsoup changelog

*** Release 1.15.3 [2022-Aug-24]
  * Security: fixed an issue where the jsoup cleaner may incorrectly sanitize crafted XSS attempts if
    Safelist.preserveRelativeLinks is enabled.

  * Improvement: the Cleaner will preserve the source position of cleaned elements, if source tracking is enabled in the
    original parse.

  * Improvement: the error messages output from Validate are more descriptive. Exceptions are now ValidationExceptions
    (extending IllegalArgumentException). Stack traces do not include the Validate class, to make it simpler to see
    where the exception originated. Common validation errors including malformed URLs and empty selector results have
    more explicit error messages.

  * Bugfix: the DataUtil would incorrectly read from InputStreams that returned fewer bytes per read than the requested
    size. This led to incorrect results when parsing, for example, chunked server responses.

  * Build Improvement: added implementation version and related fields to the jar manifest.
    
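The DataUtil fix above rests on a general java.io contract: InputStream.read(byte[], int, int) may return fewer bytes
than requested even before end of stream, so a reader must keep looping until read() returns -1 (or a size cap is
reached). The following is a minimal, JDK-only sketch of such a loop. It is not jsoup's DataUtil code; the names
ShortReadDemo, TrickleInputStream and readFully are hypothetical, and TrickleInputStream only simulates a chunked
network stream by handing back at most a few bytes per call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ShortReadDemo {

    /** Hypothetical test double: returns at most maxChunk bytes per read, like a chunked server response. */
    static class TrickleInputStream extends FilterInputStream {
        private final int maxChunk;

        TrickleInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately hand back a short read.
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /**
     * Reads until end of stream, or until maxSize bytes have been read if maxSize > 0.
     * Never assumes that a single read() call fills the buffer.
     */
    static byte[] readFully(InputStream in, int maxSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int remaining = maxSize;
        while (true) {
            int want = (maxSize > 0) ? Math.min(buf.length, remaining) : buf.length;
            if (want == 0) break;            // size cap reached
            int n = in.read(buf, 0, want);   // may return fewer than 'want' bytes
            if (n == -1) break;              // end of stream
            out.write(buf, 0, n);
            if (maxSize > 0) remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><body><p>Hello, chunked world</p></body></html>".getBytes(StandardCharsets.UTF_8);
        InputStream in = new TrickleInputStream(new ByteArrayInputStream(html), 7);
        System.out.println(new String(readFully(in, 0), StandardCharsets.UTF_8));
    }
}
```

Here the whole sample document is recovered even though each read() returns at most 7 bytes; code that assumed one
read() fills the buffer would have seen only the first 7-byte chunk.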

*** Release 1.15.2 [2022-Jul-04]
  * Improvement: added the ability to track the position (line, column, index) in the original input source from where
    a given node was parsed. Accessible via Node.sourceRange() and Element.endSourceRange().

  * Improvement: added Element.firstElementChild(), Element.lastElementChild(), Node.firstChild(), Node.lastChild(),
    as convenient accessors to those child nodes and elements.

  * Improvement: added Element.expectFirst(cssQuery), which works like Element.selectFirst() but, instead of returning
    null if there is no match, throws an IllegalArgumentException. This is useful if you want to simply abort
    processing when an expected match is not found.

  * Improvement: when pretty-printing HTML, doctypes are emitted on a newline if there is a preceding comment.

  * Improvement: when pretty-printing, trim the leading and trailing spaces of textnodes in block tags when possible,
    so that they are indented correctly.

  * Improvement: in Element#selectXpath(), disable namespace awareness. This makes it possible to always select elements
    by their simple local name, regardless of whether an xmlns attribute was set.

  * Bugfix: when using the readToByteBuffer method, such as in Connection.Response.body(), if the document had not
    already been parsed and had to be read fully, and any maximum buffer size was applied, only the default internal
    buffer size was read.

  * Bugfix: when serializing HTML, newlines in elements descending from a pre tag were incorrectly skipped. That caused
    what should have been preformatted output to instead be a run of text.

  * Bugfix: when pretty-print serializing HTML, newlines separating phrasing content (e.g. an inline tag within a block
    tag) would be incorrectly skipped, instead of normalized to a space. Additionally, improved space normalization
    between other end-of-line occurrences, and whitespace handling after a closing tag.

*** Release 1.15.1 [2022-May-15]
  * Change: removed previously deprecated methods and classes (including org.jsoup.safety.Whitelist; use
    org.jsoup.safety.Safelist instead).

  * Improvement: when converting jsoup Documents to W3C Documents in W3CDom, preserve HTML-valid attribute names if the
    input document is using the HTML syntax. (Previously, attribute names were always coerced using the more
    restrictive XML syntax.)

  * Improvement: added the :containsWholeText(text) selector, to match against non-normalized Element text. This can be
    useful when elements can only be distinguished by, e.g., specific case or leading whitespace.

  * Improvement: added Element#wholeOwnText() to retrieve the original (non-normalized) ownText of an Element. Also
    added the :containsWholeOwnText(text) selector, to match against that. BR elements are now treated as newlines in
    the wholeText methods.

  * Improvement: added the :matchesWholeText(regex) and :matchesWholeOwnText(regex) selectors, to match against whole
    (non-normalized, case-sensitive) element text and own text, respectively.

  * Improvement: when evaluating an XPath query against a context element, the complete document is now visible to the
    query, rather than only the context element's sub-tree. This enables queries that reach outside the element (to
    its parents or siblings), e.g. ancestor-or-self::*.

  * Improvement: allow a maxPaddingWidth on the indent level in OutputSettings when pretty printing. This defaults to 30
    to limit the indent level for very deeply nested elements, and may be disabled by setting it to -1.

  * Improvement: when cloning a Node or an Element, the clone gets a cloned OwnerDocument containing only that clone,
    so as to preserve applicable settings, such as the Pretty Print settings.

  * Improvement: added a convenience method Jsoup.parse(File).

  * Improvement: in the NodeTraversor, added default implementations for NodeVisitor.tail() and NodeFilter.tail(), so
    that code using only head() methods can be written as lambdas.

  * Improvement: in NodeTraversor, added support for removing nodes via Node.remove() during NodeVisitor.head().

  * Improvement: added Node.forEachNode(Consumer) and Element.forEach(Consumer

  * Bugfix: boolean attribute names should be case-insensitive, but were not when the parser was configured to
    preserve case.

  * Bugfix: when reading from SequenceInputStreams across the buffer, the input stream was closed too early, resulting
    in missed content.

  * Bugfix: a comment consisting only of dashes should not emit a parse error.

  * Bugfix: when throwing a SelectorParseException for an invalid selector, don't try to String.format the input, as
    that could throw an IllegalFormatException.

  * Bugfix: when serializing HTML with Pretty Print enabled, extraneous whitespace could be added on closing tags, or
    extra newlines could be added at the end of script blocks.

  * Bugfix: when copy-creating a Safelist from another, perform a deep copy of the original's settings, so that changes
    to the original after creation do not affect the copy.

  * Bugfix [Fuzz]: speed improvement when parsing constructed HTML containing very deeply incorrectly stacked
    formatting elements with many attributes.

  * Bugfix [Fuzz]: during parsing, a StackOverflowError was possible given crafted HTML with hundreds of nested table
    elements followed by invalid formatting elements.

*** Release 1.14.3 [2021-Sep-30]
  * Improvement: added native XPath support in Element#selectXpath(String).

  * Improvement: added full support for the