package net.ruippeixotog.scalascraper.model
/** A representation of an HTML document.
  *
  * This trait provides methods for retrieving the document's location and its root element, with which further queries
  * can be made. It also has methods for quick retrieval of common information and nodes, such as the title and body of
  * the page.
  *
  * Depending on the type of [[net.ruippeixotog.scalascraper.browser.Browser]] used to load `Document` objects, the
  * respective pages may or may not be dynamic. As such, there is no guarantee that the document's location is a
  * constant value or that returned [[Element]] instances will be updated as the DOM nodes are updated. The
  * documentation of each `Browser` implementation should be read for more information on the semantics of its
  * `Document` and `Element` implementations.
  */
trait Document {

  /** The type of the elements contained in this document.
    */
  type ElementType <: Element.Strict[ElementType]

  /** The location of this document.
    */
  def location: String

  /** The root element of this document.
    */
  def root: ElementType

  /** The trimmed text of the first `title` element of this document, or an empty string if there is none.
    */
  def title: String = root.select("title").headOption.fold("")(_.text.trim)

  /** The `head` element of this document.
    */
  def head: ElementType = root.select("head").head

  /** The `body` element of this document.
    */
  def body: ElementType = root.select("body").head

  /** The HTML representation of this document as a string.
    */
  def toHtml: String
}
object Document {
  type Typed[E <: Element.Strict[E]] = Document { type ElementType = E }
}
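
// A minimal usage sketch, not part of the library: `DocumentExamples` and `describe` are hypothetical names
// introduced here for illustration only. The sketch relies solely on members declared above (`location`, `title`,
// `body`) and on the `text` member of elements, which `title` already uses.
object DocumentExamples {

  /** Builds a one-line summary of a document, e.g. for logging. Since `body` takes the first match of the `body`
    * selector, this sketch assumes the document actually contains a `body` element.
    */
  def describe(doc: Document): String = {
    val bodyLength = doc.body.text.length
    s"${doc.location}: '${doc.title}' ($bodyLength characters of body text)"
  }
}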