I do this fairly reliably with HTML Parser (assuming the HTML document does not change its structure). A web service with a stable API is much better, but sometimes we don’t have one.
General idea:
Find the elements (div, meta, span, ...) that hold the data you need, and note an attribute that identifies them. For example:
<span class="price"> $7.95</span>
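To get the price, look for the span whose class is "price".
HTML Parser has a filter for exactly this:
HasAttributeFilter filter = new HasAttributeFilter("class", "price");
Parsing with this filter returns a list of Nodes. Because the filter only checks the attribute, use instanceof to make sure each node is the element type you expect, in this case a span: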
if (node instanceof Span) // or any other supported element.
Then cast the node to that type and read the text or attribute you need.
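A minimal, self-contained sketch of those steps for the price example, assuming the htmlparser 1.6-style API (`Parser.createParser`, `HasAttributeFilter`, `org.htmlparser.tags.Span`). The class `PriceExtractor` and method `extractPrices` are hypothetical names, and the HTML is parsed from a string instead of a URL so the sketch runs without network access.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.tags.Span;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class, for illustration only.
public class PriceExtractor {

    // Returns the trimmed text of every <span class="price"> element in the given HTML.
    public static List<String> extractPrices(String html) {
        List<String> prices = new ArrayList<>();
        try {
            // Parse an in-memory string rather than a URL.
            Parser parser = Parser.createParser(html, "UTF-8");
            // Select every tag whose class attribute is exactly "price".
            NodeList list = parser.parse(new HasAttributeFilter("class", "price"));
            for (int i = 0; i < list.size(); i++) {
                Node node = list.elementAt(i);
                // The filter ignores the tag name, so keep only spans.
                if (node instanceof Span) {
                    prices.add(node.toPlainTextString().trim());
                }
            }
        } catch (ParserException e) {
            // Rethrow unchecked to keep the call site simple in this sketch.
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return prices;
    }

    public static void main(String[] args) {
        String html = "<div><span class=\"price\"> $7.95</span>"
                + "<div class=\"price\">not a span</div></div>";
        // Only the span matches both the attribute filter and the instanceof check.
        System.out.println(extractPrices(html));
    }
}
```

Note that `HasAttributeFilter` compares the whole attribute value, so an element with `class="price sale"` would not match this filter.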
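Here is a complete example with HTML Parser.
Task: get the page description, which is stored in a meta tag like this: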
<meta name="description" content="Amazon.com: frankenstein: Books"/>
Solution:
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.tags.MetaTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class HTMLParserTest {
    public static void main(String... args) {
        Parser parser = new Parser();
        // Match every tag that has the attribute name="description".
        HasAttributeFilter filter = new HasAttributeFilter("name", "description");
        try {
            // Download and parse the page.
            parser.setResource("http://www.youtube.com");
            NodeList list = parser.parse(filter);
            // Guard against pages that have no description meta tag.
            if (list.size() > 0) {
                Node node = list.elementAt(0);
                if (node instanceof MetaTag) {
                    MetaTag meta = (MetaTag) node;
                    // The description text is stored in the content attribute.
                    String description = meta.getAttribute("content");
                    System.out.println(description);
                }
            }
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
}