/*
 * Copyright (c) 2013 Univocity Software Pty Ltd. All rights reserved.
 * This file is subject to the terms and conditions defined in file
 * 'LICENSE.txt', which is part of this source code package.
 */

package com.univocity.api.entity.html.builders;

/**
 * Provides the options available for adding fields to an HTML entity, which are defined with the help of
 * {@link com.univocity.api.entity.html.HtmlEntitySettings}, a {@link Group} or a {@link PartialPath} associated with
 * the given entity.
 *
 * @author Univocity Software Pty Ltd - [email protected]
 * @see Group
 * @see PartialPath
 * @see PathStart
 * @see com.univocity.api.entity.html.HtmlEntitySettings
 */
public interface FieldDefinition {
	/**
	 * Associates a regular field with an entity. Regular fields are used by the parser to retain values for a row. When
	 * all values of a row are collected, the parser submits the row to the output, and clears all values collected
	 * for all fields. If the parser collects a value for a field that already contains data, the record will be submitted
	 * to the output and the incoming value will be associated with the given field in a new row.
	 *
	 * For example, you could define a field called "headings" then match `h1` elements to get their text. When the parser
	 * runs, the text of each `h1` element found in the HTML document will be returned in the field "headings", e.g.:
	 *
	 * ```java
	 * HtmlEntityList entityList = new HtmlEntityList();
	 * entityList.configureEntity("heading")
	 *     .addField("headings")
	 *         .match("h1")
	 *         .getText();
	 * ```
	 *
	 * @param fieldName name of the field to be created. If called more than once, a new {@link PathStart} will be
	 *                  returned, allowing multiple paths to be used to collect data into the same field.
	 *
	 * @return a {@link PathStart}, so that a path to the target HTML content to be captured can be defined
	 */
	PathStart addField(String fieldName);

	/**
	 * Associates a persistent field with an entity. A persistent field is a field that retains its value until it is
	 * overwritten by the parser. When all values of a row are collected, the parser submits the row to the output,
	 * and clears the values collected for all fields, except the persistent ones, so they will be reused in subsequent
	 * records.
	 *
	 * An example of using persistent fields can be explained by viewing this HTML:
	 *
	 * ```html
	 * <div id="55">
	 *     <article>
	 *         <h1>first</h1>
	 *         <p>lorem</p>
	 *     </article>
	 *     <article>
	 *         <h1>second</h1>
	 *         <p>ipsum</p>
	 *     </article>
	 * </div>
	 * ```
	 *
	 * In this example, we want to get two rows with three columns: `[55, first, lorem]` and `[55, second, ipsum]`. The value
	 * "55" in both records should come from the `id` of the `div`. The following rules can be defined to produce this output:
	 *
	 * ```java
	 * HtmlEntityList entities = new HtmlEntityList();
	 * HtmlEntitySettings entity = entities.configureEntity("test");
	 *
	 * entity.addPersistentField("persistentID").match("div").getAttribute("id");
	 * entity.addField("title").match("h1").getText();
	 * entity.addField("text").match("p").getText();
	 * ```
	 *
	 * As the "persistentID" field was created as a persistent field, it will retain its value and the parser
	 * will reapply it into subsequent rows. If a regular {@link #addField(String)} were used instead,
	 * the output would be `[55, first, lorem]` and `[null, second, ipsum]` as the `div` and its `id` would be matched
	 * once only.
	 *
	 * **NOTE:** A persistent field is also "silent" and does not trigger new rows (see {@link #addSilentField(String)}).
	 * If a persistent field's path finds another match while processing the same record, the first value will be
	 * replaced by the new one, and no new records will be generated.
	 *
	 * A {@link RecordTrigger} can be used to force new rows to be generated.
	 *
	 * @param fieldName name of the persistent field to be created. If called more than once, a new {@link PathStart}
	 *                  will be returned, allowing multiple paths to be used to collect data into the same field.
	 *
	 * @return a {@link PathStart}, so that a path to the target HTML content to be captured can be defined
	 */
	PathStart addPersistentField(String fieldName);

	/**
	 * Associates a "silent" field with an entity. A silent field does not trigger new records when values of a field
	 * are overwritten, i.e. if the parser collects a value for a field that already contains data,
	 * and the field is silent, it won't submit a new record. The parser will simply replace the previously collected value
	 * with the newly parsed value.
	 *
	 * A {@link RecordTrigger} can be used to force new rows to be generated.
	 *
	 * A usage example of silent fields can be shown with this HTML document:
	 *
	 * ```html
	 * <div>
	 *     <article>
	 *         <h1>first</h1>
	 *         <p>lorem</p>
	 *         <h1>second</h1>
	 *     </article>
	 * </div>
	 * ```
	 *
	 * To get the text of the `p` element along with the **second** header:
	 *
	 * ```java
	 * HtmlEntityList entities = new HtmlEntityList();
	 * HtmlEntitySettings entity = entities.configureEntity("test");
	 *
	 * entity.addSilentField("silent")
	 *     .match("h1")
	 *     .containedBy("article")
	 *     .getText();
	 *
	 * entity.addField("text").match("article").match("p").getText();
	 * ```
	 *
	 * The parser will return `[second, lorem]`. When the parser finishes parsing the `p` element, the row will actually
	 * be `[first, lorem]`. As soon as the parser finds the second `h1` element, instead of creating a new row with this value,
	 * it will replace the "first" `String` with "second", generating the row `[second, lorem]`.
	 *
	 * If `addField` were used in this example instead of `addSilentField`, two rows would be produced:
	 * `[first, lorem]` and `[second, null]`.
	 *
	 * @param fieldName name of the silent field to be created. If called more than once, a new {@link PathStart}
	 *                  will be returned, allowing multiple paths to be used to collect data into the same field.
	 *
	 * @return a {@link PathStart}, so that a path to the target HTML content to be captured can be defined
	 */
	PathStart addSilentField(String fieldName);

	/**
	 * Creates a field with a specified constant value. An example of using this method can
	 * be shown with this HTML document:
	 *
	 * ```html
	 * <div>
	 *     <article>
	 *         <h1>first</h1>
	 *         <p>lorem</p>
	 *     </article>
	 *     <article>
	 *         <h1>second</h1>
	 *         <p>ipsum</p>
	 *     </article>
	 *     <article>
	 *         <h1>third</h1>
	 *         <p>lol</p>
	 *     </article>
	 * </div>
	 * ```
	 *
	 * And the following code:
	 *
	 * ```java
	 * HtmlEntityList entities = new HtmlEntityList();
	 * HtmlEntitySettings entity = entities.configureEntity("test");
	 *
	 * // creates a constant field
	 * entity.addField("constant", "cool article");
	 *
	 * // regular fields
	 * entity.addField("title").match("h1").getText();
	 * entity.addField("content").match("p").getText();
	 * ```
	 *
	 * When the parser runs, it will get the text from each article heading and `p` element. It will also attach the
	 * constant "cool article" to the first column of each row, producing:
	 *
	 * ```
	 * [cool article, first, lorem]
	 * [cool article, second, ipsum]
	 * [cool article, third, lol]
	 * ```
	 *
	 * @param fieldName     name of the field to be created
	 * @param constantValue a constant value associated with the given field
	 */
	void addField(String fieldName, String constantValue);

	/**
	 * Adds a field to this entity whose value will be populated with the value collected by the parent entity.
	 * The parent entity is the entity that "owns" a link follower directly.
	 *
	 * @param fieldName the name of the parent entity field to be added and whose value will be copied over.
	 */
	void addFieldFromParent(String fieldName);

	/**
	 * Adds a field to this entity whose value will be populated with the value collected by a parent entity.
	 * The parent entity can be any entity in the link following path.
	 *
	 * @param parentEntity the name of the parent entity that has the given field name.
	 * @param fieldName    the name of the parent entity field to be added and whose value will be copied over.
	 */
	void addFieldFrom(String parentEntity, String fieldName);
}
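
The row-submission rules described in the Javadoc above (regular, silent, and persistent fields) can be illustrated with a small stand-alone model. This is only a sketch of the documented behaviour, not the parser's implementation: `RowSemanticsSketch`, `Kind`, `field`, `collect`, and `finish` are hypothetical names that do not exist in the univocity API, and flushing the pending row at the end of the input is an assumption made so the last row appears in the output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the documented row semantics; not part of the univocity API.
class RowSemanticsSketch {
    enum Kind { REGULAR, SILENT, PERSISTENT }

    private final Map<String, Kind> kinds = new LinkedHashMap<>();
    private final Map<String, String> current = new LinkedHashMap<>();
    private final List<List<String>> rows = new ArrayList<>();
    private boolean dirty; // true when a value was collected since the last submitted row

    RowSemanticsSketch field(String name, Kind kind) {
        kinds.put(name, kind);
        current.put(name, null);
        return this;
    }

    // Simulates the parser collecting a value for a field, in document order.
    void collect(String name, String value) {
        // Only a regular field that already holds data forces the row out.
        // Silent and persistent fields are overwritten in place.
        if (current.get(name) != null && kinds.get(name) == Kind.REGULAR) {
            submit();
        }
        current.put(name, value);
        dirty = true;
    }

    // Assumption: the pending row is flushed when the input ends.
    List<List<String>> finish() {
        if (dirty) {
            submit();
        }
        return rows;
    }

    private void submit() {
        rows.add(new ArrayList<>(current.values()));
        // Clear every value except those of persistent fields.
        for (Map.Entry<String, Kind> e : kinds.entrySet()) {
            if (e.getValue() != Kind.PERSISTENT) {
                current.put(e.getKey(), null);
            }
        }
        dirty = false;
    }

    public static void main(String[] args) {
        // Persistent field example: the id "55" is matched once and reused.
        RowSemanticsSketch persistent = new RowSemanticsSketch()
                .field("persistentID", Kind.PERSISTENT)
                .field("title", Kind.REGULAR)
                .field("text", Kind.REGULAR);
        persistent.collect("persistentID", "55");
        persistent.collect("title", "first");
        persistent.collect("text", "lorem");
        persistent.collect("title", "second");
        persistent.collect("text", "ipsum");
        System.out.println(persistent.finish()); // [[55, first, lorem], [55, second, ipsum]]

        // Silent field example: the second h1 replaces the first without a new row.
        RowSemanticsSketch silent = new RowSemanticsSketch()
                .field("silent", Kind.SILENT)
                .field("text", Kind.REGULAR);
        silent.collect("silent", "first");
        silent.collect("text", "lorem");
        silent.collect("silent", "second");
        System.out.println(silent.finish()); // [[second, lorem]]
    }
}
```

Declaring the first field of the second scenario as `Kind.REGULAR` instead of `Kind.SILENT` yields `[[first, lorem], [second, null]]`, which matches the two-row outcome the `addSilentField` Javadoc describes for `addField`.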



