public class CmsHtmlParser extends org.htmlparser.visitors.NodeVisitor implements I_CmsHtmlNodeVisitor
Base utility class for NodeVisitor implementations, which provides some frequently used utility functions.
This base implementation is only a "pass through" class: the content is parsed, but the generated result is identical to the input.
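The "pass through" behaviour can be pictured with a small standalone sketch. It does not use OpenCms or the htmlparser library: the class name `EchoSketch`, the `replay` helper, and the use of plain strings in place of `Tag`, `Text`, and `Remark` nodes are all assumptions made for illustration. What the real class writes when echo mode is off is not stated on this page; in this sketch nothing is written in that case.

```java
/**
 * Standalone sketch of an "echo" visitor. Hypothetical code, not the OpenCms implementation.
 * Every callback receives the raw markup it was triggered by and, if echo mode is on,
 * appends it unchanged to the result buffer.
 */
public class EchoSketch {

    private final boolean echo;
    private final StringBuilder result = new StringBuilder();

    public EchoSketch(boolean echo) {
        this.echo = echo;
    }

    // The four callbacks mirror the visitor methods listed on this page,
    // but take plain strings instead of htmlparser node objects.
    public void visitTag(String rawTag) { write(rawTag); }
    public void visitEndTag(String rawTag) { write(rawTag); }
    public void visitStringNode(String text) { write(text); }
    public void visitRemarkNode(String rawRemark) { write(rawRemark); }

    /** Appends the raw input to the result only when echo mode is on. */
    private void write(String raw) {
        if (echo) {
            result.append(raw);
        }
    }

    public String getResult() {
        return result.toString();
    }

    /** Replays the callbacks a parser might issue for a tiny fragment (hypothetical helper). */
    public static String replay(boolean echo) {
        EchoSketch visitor = new EchoSketch(echo);
        visitor.visitTag("<p>");
        visitor.visitStringNode("Hi");
        visitor.visitRemarkNode("<!-- note -->");
        visitor.visitEndTag("</p>");
        return visitor.getResult();
    }

    public static void main(String[] args) {
        System.out.println(replay(true));            // prints <p>Hi<!-- note --></p>
        System.out.println(replay(false).isEmpty()); // prints true
    }
}
```

A subclass following this pattern would override only the callbacks it wants to change and leave the rest to echo their input.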
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | m_echo: Indicates if "echo" mode is on, that is, all content is written to the result by default. |
| protected java.util.List<java.lang.String> | m_noAutoCloseTags: List of upper case tag names that should not be auto-corrected if closing tags are missing. |
| protected java.lang.StringBuffer | m_result: The buffer to write the output to. |
| protected static java.lang.String[] | TAG_ARRAY: The array of supported tag names. |
| protected static java.util.List<java.lang.String> | TAG_LIST: The list of supported tag names. |
| Constructor and Description |
|---|
| CmsHtmlParser(): Creates a new instance of the HTML converter with echo mode set to false. |
| CmsHtmlParser(boolean echo): Creates a new instance of the HTML converter. |
| Modifier and Type | Method and Description |
|---|---|
| protected java.lang.String | collapse(java.lang.String string): Collapses HTML whitespace in the given String. |
| protected org.htmlparser.PrototypicalNodeFactory | configureNoAutoCorrectionTags(): Internally degrades Composite tags that do have children in the DOM tree to simple single tags. |
| java.lang.String | getConfiguration(): Returns the configuration String of this visitor, or the empty String if it was not provided before. |
| java.util.List<java.lang.String> | getNoAutoCloseTags(): Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags. |
| java.lang.String | getResult(): Returns the text extraction result. |
| java.lang.String | getTagHtml(org.htmlparser.Tag tag): Returns the HTML for the given tag itself (not the tag content). |
| java.lang.String | process(java.lang.String html, java.lang.String encoding): Extracts the text from the given HTML content, assuming the given HTML encoding. |
| void | setConfiguration(java.lang.String configuration): Sets a configuration String for this visitor. |
| void | setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList): Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags. |
| void | visitEndTag(org.htmlparser.Tag tag): Visitor method (callback) invoked when a closing Tag is encountered. |
| void | visitRemarkNode(org.htmlparser.Remark remark): Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered. |
| void | visitStringNode(org.htmlparser.Text text): Visitor method (callback) invoked when a Text node is encountered. |
| void | visitTag(org.htmlparser.Tag tag): Visitor method (callback) invoked when a starting Tag is encountered. |
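As an illustration of what collapsing HTML whitespace usually means, here is a minimal standalone sketch. It assumes that "whitespace" covers space, tab, line feed, carriage return, and form feed, and that each run is replaced by a single space without trimming. The actual `collapse(String)` implementation may treat more characters as whitespace or handle the ends of the string differently; the class name `CollapseSketch` is hypothetical.

```java
/**
 * Standalone sketch of HTML whitespace collapsing. Hypothetical code, written
 * under the assumptions stated above; not the OpenCms implementation.
 */
public class CollapseSketch {

    /** Replaces every run of space, tab, LF, CR or FF with exactly one space. */
    public static String collapse(String string) {
        StringBuilder out = new StringBuilder(string.length());
        boolean inWhitespaceRun = false;
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            boolean isWhitespace = c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if (isWhitespace) {
                // Emit one space for the first character of a run, skip the rest.
                if (!inWhitespaceRun) {
                    out.append(' ');
                }
                inWhitespaceRun = true;
            } else {
                out.append(c);
                inWhitespaceRun = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("a \n\t b  c") + "]"); // prints [a b c]
    }
}
```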
protected java.util.List<java.lang.String> m_noAutoCloseTags
protected static final java.lang.String[] TAG_ARRAY
protected static final java.util.List<java.lang.String> TAG_LIST
protected boolean m_echo
protected java.lang.StringBuffer m_result
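public CmsHtmlParser()
Creates a new instance of the HTML converter with echo mode set to false.

public CmsHtmlParser(boolean echo)
Creates a new instance of the HTML converter.
Parameters:
echo - indicates if "echo" mode is on, that is, all content is written to the result

protected org.htmlparser.PrototypicalNodeFactory configureNoAutoCorrectionTags()
Internally degrades Composite tags that do have children in the DOM tree to simple single tags.
See Also:
setNoAutoCloseTags(List)

public java.lang.String getConfiguration()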
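Description copied from interface: I_CmsHtmlNodeVisitor
Returns the configuration String of this visitor, or the empty String if it was not provided before.
Specified by:
getConfiguration in interface I_CmsHtmlNodeVisitor
See Also:
I_CmsHtmlNodeVisitor.getConfiguration()

public java.lang.String getResult()
Description copied from interface: I_CmsHtmlNodeVisitor
Returns the text extraction result.
Specified by:
getResult in interface I_CmsHtmlNodeVisitor
See Also:
I_CmsHtmlNodeVisitor.getResult()

public java.lang.String getTagHtml(org.htmlparser.Tag tag)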
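Returns the HTML for the given tag itself (not the tag content).
Parameters:
tag - the tag to create the HTML for

public java.lang.String process(java.lang.String html, java.lang.String encoding) throws org.htmlparser.util.ParserException
Description copied from interface: I_CmsHtmlNodeVisitor
Extracts the text from the given HTML content, assuming the given HTML encoding.
Specified by:
process in interface I_CmsHtmlNodeVisitor
Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
Throws:
org.htmlparser.util.ParserException - if something goes wrong
See Also:
I_CmsHtmlNodeVisitor.process(java.lang.String, java.lang.String)

public void setConfiguration(java.lang.String configuration)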
Description copied from interface: I_CmsHtmlNodeVisitor
Sets a configuration String for this visitor. This will most likely be done with data from an xsd, custom jsp tag, ...
Specified by:
setConfiguration in interface I_CmsHtmlNodeVisitor
Parameters:
configuration - the configuration of this visitor to set.
See Also:
I_CmsHtmlNodeVisitor.setConfiguration(java.lang.String)

public void visitEndTag(org.htmlparser.Tag tag)
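Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a closing Tag is encountered.
Specified by:
visitEndTag in interface I_CmsHtmlNodeVisitor
Overrides:
visitEndTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - the tag that is ended.
See Also:
I_CmsHtmlNodeVisitor.visitEndTag(org.htmlparser.Tag)

public void visitRemarkNode(org.htmlparser.Remark remark)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
Specified by:
visitRemarkNode in interface I_CmsHtmlNodeVisitor
Overrides:
visitRemarkNode in class org.htmlparser.visitors.NodeVisitor
Parameters:
remark - the remark Tag to visit.
See Also:
I_CmsHtmlNodeVisitor.visitRemarkNode(org.htmlparser.Remark)

public void visitStringNode(org.htmlparser.Text text)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a Text node is encountered.
Specified by:
visitStringNode in interface I_CmsHtmlNodeVisitor
Overrides:
visitStringNode in class org.htmlparser.visitors.NodeVisitor
Parameters:
text - the text that is visited.
See Also:
I_CmsHtmlNodeVisitor.visitStringNode(org.htmlparser.Text)

public void visitTag(org.htmlparser.Tag tag)
Description copied from interface: I_CmsHtmlNodeVisitor
Visitor method (callback) invoked when a starting Tag is encountered.
Specified by:
visitTag in interface I_CmsHtmlNodeVisitor
Overrides:
visitTag in class org.htmlparser.visitors.NodeVisitor
Parameters:
tag - the tag that is visited.
See Also:
I_CmsHtmlNodeVisitor.visitTag(org.htmlparser.Tag)

protected java.lang.String collapse(java.lang.String string)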
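Collapses HTML whitespace in the given String.
Parameters:
string - the string to collapse

public java.util.List<java.lang.String> getNoAutoCloseTags()
Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags.

public void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList)
Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
Specified by:
setNoAutoCloseTags in interface I_CmsHtmlNodeVisitor
Parameters:
noAutoCloseTagList - a list of upper case tag names for which parsing / visiting should not correct missing closing tags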
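Because the list is documented as holding upper case tag names, a caller may want to normalize names before handing them to `setNoAutoCloseTags(List)`. The sketch below shows one way to do that with the JDK only. The class `NoAutoCloseSketch` and both helper methods are hypothetical, and whether the real class normalizes its input itself is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Standalone sketch for preparing an upper case tag name list.
 * Hypothetical helpers; not part of the OpenCms API.
 */
public class NoAutoCloseSketch {

    /** Returns a new list with every tag name trimmed and converted to upper case. */
    public static List<String> toUpperCaseNames(List<String> tagNames) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            // Locale.ROOT avoids locale-specific case rules (for example the Turkish dotless i).
            result.add(name.trim().toUpperCase(Locale.ROOT));
        }
        return result;
    }

    /** Returns true if the tag name, compared in upper case, is on the given list. */
    public static boolean isExcluded(List<String> upperCaseNames, String tagName) {
        return upperCaseNames.contains(tagName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> names = toUpperCaseNames(Arrays.asList("li", "Td", " p "));
        System.out.println(names);                    // prints [LI, TD, P]
        System.out.println(isExcluded(names, "td"));  // prints true
        System.out.println(isExcluded(names, "div")); // prints false
    }
}
```

The normalized list could then be passed to `setNoAutoCloseTags(List)` on a parser instance.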