
org.htmlparser.http.package.html (htmllexer)
The http package is responsible for HTTP connections to servers.
The Lexer and Parser provide many ways to supply text to be parsed,
but this package only deals with cases where a URL is supplied as a
string, with the expectation that the Lexer or Parser will perform
the HTTP connection.
The {@link org.htmlparser.http.ConnectionManager} class adds the following
capabilities when accessing the Internet via HTTP:
- cookie handling
- proxy support
- access to password-protected URLs
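Each of these capabilities requires conditioning the HTTP connection before it
is opened, for example by adding Cookie or Authorization request headers.
An HTTP header utility class, {@link org.htmlparser.http.HttpHeader}, is also included.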
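To make "conditioning" concrete, here is a minimal sketch using only the JDK's HttpURLConnection, not htmlparser itself. The class ConditionSketch and its methods are hypothetical; ConnectionManager's actual internals may differ. It assumes cookies are sent as a single Cookie header and passwords via the standard HTTP Basic scheme. The key point is that request properties must be set after openConnection() but before the connection is actually opened.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration; not htmlparser code.
final class ConditionSketch {
    // HTTP Basic credentials: "Basic " + base64(user ":" password).
    static String basicAuthorization(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    // Joins cookies into one Cookie request header value, e.g. "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            header.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return header.toString();
    }

    // Conditions a connection that has not been opened yet.
    static void condition(HttpURLConnection connection, Map<String, String> cookies,
                          String user, String password) {
        if (!cookies.isEmpty()) {
            connection.setRequestProperty("Cookie", cookieHeader(cookies));
        }
        if (user != null) {
            connection.setRequestProperty("Authorization", basicAuthorization(user, password));
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("USER", "FreddyBaby");
        // openConnection(Proxy) only creates the connection object; nothing is sent yet.
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved("proxyhost.mycompany.com", 8888));
        HttpURLConnection connection = (HttpURLConnection)
                URI.create("http://www.freshmeat.net/").toURL().openConnection(proxy);
        condition(connection, cookies, "FredB", "holy$cow");
        // Proxy credentials travel in their own header.
        connection.setRequestProperty("Proxy-Authorization", basicAuthorization("FredBarnes", "secret"));
        // The headers would be sent by connect() or getInputStream(); here we only inspect one.
        System.out.println(connection.getRequestProperty("Cookie"));
    }
}
```

Note that the JDK deliberately hides Authorization and Proxy-Authorization from getRequestProperty(), which is why the sketch only prints the Cookie header.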
The {@link org.htmlparser.http.ConnectionMonitor} interface is a callback
mechanism that lets the ConnectionManager notify an interested application
when an HTTP connection is made. Possible uses include conditioning the
connection further, reading HTTP header information, and reporting or
statistical functions. Callbacks are not made for FileURLConnections (file: URLs),
which the connection manager also handles.
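The callback flow can be sketched with the JDK alone. Monitor and open() below are hypothetical stand-ins for ConnectionMonitor and the ConnectionManager's connection logic, not htmlparser code. The sketch shows where the two callbacks fall relative to connect(), and that non-HTTP connections such as file: URLs bypass them.

```java
import java.io.File;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical illustration; not htmlparser code.
final class MonitorSketch {
    // Callback interface modeled on the ConnectionMonitor description.
    interface Monitor {
        void preConnect(HttpURLConnection connection);   // request can still be changed
        void postConnect(HttpURLConnection connection);  // connection has been opened
    }

    // Opens a URL, reporting only HTTP(S) connections to the monitor.
    static URLConnection open(URL url, Monitor monitor) throws IOException {
        URLConnection connection = url.openConnection();
        if (monitor != null && connection instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) connection;
            monitor.preConnect(http);   // setRequestProperty is still allowed here
            http.connect();             // afterwards, setRequestProperty throws IllegalStateException
            monitor.postConnect(http);  // e.g. getResponseCode() or response headers
        } else {
            connection.connect();       // file: URLs and the like: no callbacks
        }
        return connection;
    }

    public static void main(String[] args) throws IOException {
        File page = File.createTempFile("page", ".html");
        page.deleteOnExit();
        Files.write(page.toPath(), "<html></html>".getBytes(StandardCharsets.UTF_8));
        Monitor printer = new Monitor() {
            public void preConnect(HttpURLConnection c) { System.out.println("pre " + c.getURL()); }
            public void postConnect(HttpURLConnection c) { System.out.println("post " + c.getURL()); }
        };
        // Prints nothing from the monitor: the file: connection is not an HttpURLConnection.
        open(page.toURI().toURL(), printer);
    }
}
```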
The {@link org.htmlparser.http.Cookie} class is a container for
cookie data sent in HTTP requests and received in HTTP responses. To access
protected HTML content, it may be necessary to prime the ConnectionManager with
cookies received during a login procedure.
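Priming usually starts from a Set-Cookie header returned by a login page. As a sketch, the JDK's java.net.HttpCookie can parse such a header and check whether its domain applies to a host; the values would then be copied into an htmlparser Cookie, as shown in a comment only (that step is not executed here). The class LoginCookieSketch and the sample header are hypothetical; the cookie values are the example ones used further down this page.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical illustration using java.net.HttpCookie, not htmlparser's Cookie.
final class LoginCookieSketch {
    // Parses one Set-Cookie header (e.g. from a login response) into cookies.
    static List<HttpCookie> fromSetCookie(String setCookieHeader) {
        return HttpCookie.parse(setCookieHeader);
    }

    // True if the cookie's Domain attribute allows sending it to host.
    static boolean appliesTo(HttpCookie cookie, String host) {
        String domain = cookie.getDomain();
        return domain != null && HttpCookie.domainMatches(domain, host);
    }

    public static void main(String[] args) {
        String header = "Set-Cookie: PHPSESSID=e5dbeb6152e70d99427f2458d8969f8b;"
                + " Domain=.freshmeat.net; Path=/";
        for (HttpCookie c : fromSetCookie(header)) {
            System.out.println(c.getName() + "=" + c.getValue() + " domain=" + c.getDomain()
                    + " sendToWww=" + appliesTo(c, "www.freshmeat.net"));
            // With htmlparser, the values could then prime the manager, e.g.:
            // Cookie cookie = new Cookie (c.getName (), c.getValue ());
            // cookie.setDomain (c.getDomain ());
            // manager.setCookie (cookie, null);
        }
    }
}
```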
A typical use of this package might look something like this:
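import java.net.HttpURLConnection;
import org.htmlparser.Parser;
import org.htmlparser.http.ConnectionManager;
import org.htmlparser.http.ConnectionMonitor;
import org.htmlparser.http.Cookie;
import org.htmlparser.http.HttpHeader;

ConnectionManager manager = Parser.getConnectionManager ();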
// set up proxying
manager.setProxyHost ("proxyhost.mycompany.com");
manager.setProxyPort (8888);
manager.setProxyUser ("FredBarnes");
manager.setProxyPassword ("secret");
// set up cookies
Cookie cookie = new Cookie ("USER", "FreddyBaby");
manager.setCookie (cookie, "www.freshmeat.net");
cookie = new Cookie ("PHPSESSID", "e5dbeb6152e70d99427f2458d8969f8b");
cookie.setDomain (".freshmeat.net");
manager.setCookie (cookie, null);
// set up security to access a password protected URL
manager.setUser ("FredB");
manager.setPassword ("holy$cow");
// set up an anonymous inner class to receive the callbacks
ConnectionMonitor monitor = new ConnectionMonitor ()
{
public void preConnect (HttpURLConnection connection)
{
System.out.println (HttpHeader.getRequestHeader (connection));
}
public void postConnect (HttpURLConnection connection)
{
System.out.println (HttpHeader.getResponseHeader (connection));
}
};
manager.setMonitor (monitor);
// perform the connection; the Parser (String) constructor opens the URL
// and can throw ParserException
Parser parser = new Parser ("http://freshmeat.net");
The ConnectionManager used by the Parser class is actually held by the
{@link org.htmlparser.lexer.Page#mConnectionManager Page} class.
It is accessible from the Parser (or the Page class) via
{@link org.htmlparser.Parser#getConnectionManager getConnectionManager()}.
It is a static (singleton) instance so that subsequent connections made by the
parser will use the contents of the cookie jar from previous connections.
By default, cookie processing is not enabled. It can be enabled by either
setting a cookie or using
{@link org.htmlparser.http.ConnectionManager#setCookieProcessingEnabled setCookieProcessingEnabled()}.
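The singleton and opt-in behavior described above can be modeled in a few lines. This is a hypothetical model, not htmlparser source: SharedJarSketch, getInstance(), isCookieProcessingEnabled() and cookiesFor() are invented for illustration, setCookieProcessingEnabled() merely mirrors the method named above, and domains are exact map keys with no domain matching. It assumes "disabled" can be represented by having no jar at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a shared, opt-in cookie jar; not htmlparser code.
final class SharedJarSketch {
    // The single static instance every caller shares.
    private static final SharedJarSketch INSTANCE = new SharedJarSketch();

    // Cookies keyed by exact domain; null models "cookie processing disabled" (the default).
    private Map<String, List<String>> jar;

    private SharedJarSketch() { }

    static SharedJarSketch getInstance() {
        return INSTANCE;
    }

    boolean isCookieProcessingEnabled() {
        return jar != null;
    }

    void setCookieProcessingEnabled(boolean enable) {
        if (!enable) {
            jar = null;                 // discard the jar entirely
        } else if (jar == null) {
            jar = new HashMap<>();      // create it once; keep existing cookies otherwise
        }
    }

    // Setting a cookie implicitly enables cookie processing.
    void setCookie(String domain, String nameAndValue) {
        setCookieProcessingEnabled(true);
        jar.computeIfAbsent(domain, d -> new ArrayList<>()).add(nameAndValue);
    }

    // Cookies a later connection to domain would send; empty while processing is off.
    List<String> cookiesFor(String domain) {
        if (jar == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(jar.getOrDefault(domain, new ArrayList<>()));
    }

    public static void main(String[] args) {
        SharedJarSketch first = SharedJarSketch.getInstance();
        System.out.println(first.isCookieProcessingEnabled());       // false
        first.setCookie("www.freshmeat.net", "USER=FreddyBaby");
        // A later "parser" gets the same instance and sees the earlier cookie.
        SharedJarSketch second = SharedJarSketch.getInstance();
        System.out.println(second.isCookieProcessingEnabled());      // true
        System.out.println(second.cookiesFor("www.freshmeat.net"));  // [USER=FreddyBaby]
    }
}
```

Keeping the jar in one static instance is what lets cookies from a login connection flow into every later connection without passing the manager around explicitly.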