Jericho HTML Parser is a simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.

There is a newer version: 2.3

OutputDocument (Jericho HTML Parser 1.5-dev1)

au.id.jericho.lib.html
Class OutputDocument

java.lang.Object
  extended by au.id.jericho.lib.html.OutputDocument

public final class OutputDocument
extends java.lang.Object

Represents a modified version of an original source text.

An OutputDocument represents an original source text that has been modified by substituting segments of it with other text. Each of these substitutions is registered by adding an IOutputSegment to the OutputDocument. After all of the substitutions have been added, the modified text can be retrieved using the output(Writer) or toString() methods.

The registered output segments must not overlap each other, but they may be adjacent.
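
To make the substitution model concrete, here is a minimal plain-Java sketch that does not use Jericho at all. The class SubstitutionSketch, its nested Substitution type and the render method are hypothetical names invented for illustration. The sketch only mimics the documented behaviour (replace non-overlapping, possibly adjacent, ranges of the original text and keep everything else verbatim); it is not how OutputDocument is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the OutputDocument idea; not Jericho's implementation.
class SubstitutionSketch {

    // One substitution: replace source[begin, end) with text.
    static final class Substitution {
        final int begin;
        final int end;
        final String text;

        Substitution(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Builds the modified text. Assumes the substitutions do not overlap
    // (adjacent ones are fine), mirroring the contract described above.
    static String render(CharSequence source, List<Substitution> substitutions) {
        List<Substitution> sorted = new ArrayList<>(substitutions);
        sorted.sort((a, b) -> Integer.compare(a.begin, b.begin));
        StringBuilder out = new StringBuilder(source.length());
        int pos = 0; // first source index not yet copied
        for (Substitution s : sorted) {
            out.append(source, pos, s.begin); // unchanged text before this segment
            out.append(s.text);               // replacement text
            pos = s.end;                      // skip the replaced source text
        }
        out.append(source, pos, source.length()); // unchanged tail
        return out.toString();
    }

    public static void main(String[] args) {
        List<Substitution> subs = new ArrayList<>();
        subs.add(new Substitution(3, 4, "Y"));
        subs.add(new Substitution(1, 3, "X")); // adjacent to the one above
        System.out.println(render("abcdef", subs)); // prints aXYef
    }
}
```

Sorting by start position lets the result be assembled in a single left-to-right pass; the sketch relies on the caller honouring the no-overlap rule and does not check it.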

The following example converts all externally referenced style sheets to internal style sheets:

  OutputDocument outputDocument=new OutputDocument(htmlText);
  Source source=new Source(htmlText);
  StringBuffer sb=new StringBuffer();
  List linkStartTags=source.findAllStartTags(Tag.LINK);
  for (Iterator i=linkStartTags.iterator(); i.hasNext();) {
    StartTag startTag=(StartTag)i.next();
    Attributes attributes=startTag.getAttributes();
    String rel=attributes.getValue("rel");
    if (!"stylesheet".equalsIgnoreCase(rel)) continue;
    String href=attributes.getValue("href");
    if (href==null) continue;
    String styleSheetContent;
    try {
      styleSheetContent=CommonTools.getString(new URL(href).openStream()); // CommonTools.getString (not defined here) reads the stream into a String
    } catch (Exception ex) {
      continue; // skip this link if the URL is invalid or cannot be read
    }
    sb.setLength(0);
    sb.append("<style");
    Attribute typeAttribute=attributes.get("type");
    if (typeAttribute!=null) sb.append(' ').append(typeAttribute);
    sb.append(">\n").append(styleSheetContent).append("\n</style>");
    outputDocument.add(new StringOutputSegment(startTag,sb.toString()));
  }
  String convertedHtmlText=outputDocument.toString();
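
The example above depends on CommonTools.getString, which is not defined in this documentation. A possible stand-in is sketched below in plain Java. StreamText and its getString(InputStream, Charset) method are hypothetical; the explicit charset parameter is an assumption, since the encoding used by the original helper is not documented. Unlike the example as written, this helper also closes the stream once it has been read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for CommonTools.getString(InputStream); the charset is an assumption.
class StreamText {

    // Reads the whole stream as text and closes it, even if reading fails.
    static String getString(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "body { color: red; }".getBytes(StandardCharsets.UTF_8));
        System.out.println(getString(in, StandardCharsets.UTF_8));
    }
}
```

In the example, the call could then become StreamText.getString(new URL(href).openStream(), StandardCharsets.UTF_8), assuming the style sheets are UTF-8 encoded.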
 

See Also:
IOutputSegment, StringOutputSegment

Constructor Summary
OutputDocument(java.lang.CharSequence sourceText)
          Constructs a new OutputDocument based on the specified source text.
 
Method Summary
 void add(FormControl formControl)
          ***************************
 void add(FormFields formFields)
          ***************************
 void add(IOutputSegment outputSegment)
          Adds the specified output segment to this OutputDocument.
 java.lang.CharSequence getSourceText()
          Returns the original source text upon which this OutputDocument is based.
 void output(java.io.Writer writer)
          Outputs the final content of this OutputDocument to the specified Writer.
 java.lang.String toString()
          Returns the final content of this OutputDocument as a String.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

OutputDocument

public OutputDocument(java.lang.CharSequence sourceText)
Constructs a new OutputDocument based on the specified source text.

Note that a Source object can be passed directly as an argument to this constructor, because Source implements the CharSequence interface.

Parameters:
sourceText - the source text.
Method Detail

getSourceText

public java.lang.CharSequence getSourceText()
Returns the original source text upon which this OutputDocument is based.

Returns:
the original source text upon which this OutputDocument is based.

add

public void add(IOutputSegment outputSegment)
Adds the specified output segment to this OutputDocument.

For efficiency reasons, no exception is thrown here if the added output segment overlaps another one; instead, an OverlappingOutputSegmentsException is thrown when the output is generated.
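
The deferred check described above can be illustrated with the following plain-Java sketch. DeferredOverlapCheck is a hypothetical class, not part of Jericho, and it throws IllegalStateException where OutputDocument would throw OverlappingOutputSegmentsException. Adding a segment is a constant-time append with no validation; the overlap test is postponed until output time, where one sort followed by a linear scan over neighbouring segments finds any overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "check overlaps at output time, not at add time".
// Jericho throws OverlappingOutputSegmentsException; this sketch uses
// IllegalStateException so that it needs no library classes.
class DeferredOverlapCheck {
    private final List<int[]> segments = new ArrayList<>(); // each entry: {begin, end}

    // Constant time: just records the segment, no validation.
    void add(int begin, int end) {
        segments.add(new int[] {begin, end});
    }

    // Called when output is generated: one sort, then a linear scan.
    void checkNoOverlap() {
        List<int[]> sorted = new ArrayList<>(segments);
        // Order by begin, then by end, so the result does not depend on insertion order.
        sorted.sort((a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));
        for (int i = 1; i < sorted.size(); i++) {
            int[] prev = sorted.get(i - 1);
            int[] cur = sorted.get(i);
            // Adjacent segments (cur begin == prev end) are allowed.
            if (cur[0] < prev[1]) {
                throw new IllegalStateException("segments [" + prev[0] + "," + prev[1]
                        + ") and [" + cur[0] + "," + cur[1] + ") overlap");
            }
        }
    }

    public static void main(String[] args) {
        DeferredOverlapCheck doc = new DeferredOverlapCheck();
        doc.add(0, 5);
        doc.add(5, 8);        // adjacent: fine
        doc.checkNoOverlap();
        doc.add(7, 10);       // overlaps [5,8) but is accepted here...
        try {
            doc.checkNoOverlap(); // ...and only rejected at output time
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints segments [5,8) and [7,10) overlap
        }
    }
}
```

Comparing only neighbours in sorted order is sufficient: if every segment starts at or after the end of its predecessor, no two segments can overlap.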


add

public void add(FormControl formControl)
***************************


add

public void add(FormFields formFields)
***************************


output

public void output(java.io.Writer writer)
            throws java.io.IOException
Outputs the final content of this OutputDocument to the specified Writer.

An OverlappingOutputSegmentsException is thrown if any of the output segments overlap. For efficiency reasons, this condition is not detected when the offending output segment is added.

Throws:
java.io.IOException - if an I/O exception occurs.
OverlappingOutputSegmentsException - if any of the output segments overlap.
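
The relationship between output(Writer) and toString() can be sketched as follows. This is a hypothetical plain-Java illustration of a common pattern, not Jericho's code: WriterBackedText is an invented class whose output(Writer) streams the content piece by piece, and whose toString() reuses it with a StringWriter, converting the declared IOException into an unchecked exception. This is consistent with toString() not declaring IOException in this documentation, but the library's actual implementation is not documented here.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a toString() layered on an output(Writer) method.
class WriterBackedText {
    private final List<CharSequence> parts;

    WriterBackedText(CharSequence... parts) {
        this.parts = Arrays.asList(parts);
    }

    // Streams the content piece by piece, so no complete copy is built in memory
    // when the writer is, for example, a file or socket writer.
    void output(Writer writer) throws IOException {
        for (CharSequence part : parts) {
            writer.append(part);
        }
    }

    // StringWriter does not throw IOException in practice, but output(Writer)
    // declares it, so it is rethrown unchecked here.
    @Override
    public String toString() {
        StringWriter sw = new StringWriter();
        try {
            output(sw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) throws IOException {
        WriterBackedText text = new WriterBackedText("<p>", "hello", "</p>");
        System.out.println(text);   // built in memory via toString()
        Writer out = new OutputStreamWriter(System.out);
        text.output(out);           // streamed directly to the writer
        out.flush();                // flush, but do not close System.out
    }
}
```

Writing directly to a file or network Writer avoids building the whole result as one String, which may matter for large documents.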

toString

public java.lang.String toString()
Returns the final content of this OutputDocument as a String.

Returns:
the final content of this OutputDocument as a String.
Throws:
OverlappingOutputSegmentsException - if any of the output segments overlap.