public class HTMLPageParser extends Object implements PageParser
Builds an HTMLPage object from an HTML document. This behaves similarly to the FastPageParser, but it is a complete rewrite designed to make custom features, such as extraction and transformation of elements, simpler to add.
To customize the rules used, extend this class and override the addUserDefinedRules() method.
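The extension point described above can be sketched as follows. This is a minimal, self-contained illustration of the override pattern only: `Rule`, `State`, `PageBuilder`, and `BaseParser` are simplified stand-ins written for this example, not the library's real classes, and the order in which `parse` calls the hook, as well as the empty default hook, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CustomParserSketch {

    // Hypothetical stand-in for a parsing rule; not the real library type.
    interface Rule {
        String name();
    }

    // Hypothetical stand-in for State: it only collects the registered rules.
    static class State {
        private final List<Rule> rules = new ArrayList<>();

        void addRule(Rule rule) {
            rules.add(rule);
        }

        List<Rule> rules() {
            return rules;
        }
    }

    // Hypothetical stand-in for PageBuilder; left empty in this sketch.
    static class PageBuilder {
    }

    // Stand-in for HTMLPageParser: parse() registers a core rule, then
    // calls the protected hook so subclasses can add their own rules.
    static class BaseParser {
        public State parse(char[] data) {
            State html = new State();
            PageBuilder page = new PageBuilder();
            html.addRule(() -> "core");
            addUserDefinedRules(html, page);
            return html;
        }

        protected void addUserDefinedRules(State html, PageBuilder page) {
            // Empty in this sketch; the real default may register rules.
        }
    }

    // The customization: extend the parser and override the hook.
    static class CustomParser extends BaseParser {
        @Override
        protected void addUserDefinedRules(State html, PageBuilder page) {
            html.addRule(() -> "custom-extractor");
        }
    }

    // Helper that parses a tiny document and lists the rule names in order.
    static List<String> ruleNames(BaseParser parser) {
        List<String> names = new ArrayList<>();
        for (Rule rule : parser.parse("<html></html>".toCharArray()).rules()) {
            names.add(rule.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(ruleNames(new BaseParser()));   // prints [core]
        System.out.println(ruleNames(new CustomParser())); // prints [core, custom-extractor]
    }
}
```

With the real class, the same shape would apply: subclass HTMLPageParser, override addUserDefinedRules(State html, PageBuilder page), and register additional rules on the supplied State. Whether to also call the superclass implementation depends on which default rules should be kept.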
See Also: HTMLProcessor

| Constructor and Description |
|---|
| HTMLPageParser() |

| Modifier and Type | Method and Description |
|---|---|
| protected void | addUserDefinedRules(State html, PageBuilder page) |
| Page | parse(char[] data) This builds a Page. |

public Page parse(char[] data) throws IOException
Description copied from interface: PageParser
This builds a Page.
Specified by: parse in interface PageParser
Throws: IOException

protected void addUserDefinedRules(State html, PageBuilder page)