I have found hpple quite useful for parsing messy HTML. Hpple is an Objective-C wrapper around the XPathQuery library for parsing HTML. With it, you send an XPath query and get back the matching elements.
Requirements
- Add the libxml2 includes to your project
  - Project menu -> Edit Project Settings
  - Search for the "Header Search Paths" setting
  - Add a new search path "${SDKROOT}/usr/include/libxml2"
  - Enable the recursive option
- Add the libxml2 library to your project
  - Project menu -> Edit Project Settings
  - Search for the "Other Linker Flags" setting
  - Add a new flag "-lxml2"
- Get the following source files from hpple and add them to your project:
  - TFHpple.h
  - TFHpple.m
  - TFHppleElement.h
  - TFHppleElement.m
  - XPathQuery.h
  - XPathQuery.m
- Work through the W3Schools XPath tutorial to get comfortable with XPath.
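If you keep build settings in an .xcconfig file rather than editing them in the project settings window, the two libxml2 steps above could look roughly like this. This is a sketch, not something the hpple project ships: it assumes your target actually uses an .xcconfig file, and it uses the standard HEADER_SEARCH_PATHS and OTHER_LDFLAGS setting names.

```xcconfig
// Let the compiler find <libxml/...> headers inside the SDK.
// Appending /** to the path is how Xcode writes a recursive search path.
HEADER_SEARCH_PATHS = $(inherited) $(SDKROOT)/usr/include/libxml2

// Link against libxml2.
OTHER_LDFLAGS = $(inherited) -lxml2
```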
Code example
#import "TFHpple.h"

NSData *data = [[NSData alloc] initWithContentsOfFile:@"example.html"];
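Once the HTML is loaded into an NSData object, a query might look roughly like the sketch below. ElementsMatching is a hypothetical helper written for illustration, not part of hpple, and the "//td" query is only an example. The code assumes ARC; under manual reference counting you would also release the parser when you are done with it.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"
#import "TFHppleElement.h"

// Hypothetical helper (not part of hpple): parses the HTML and returns the
// TFHppleElement objects that match the XPath query.
static NSArray *ElementsMatching(NSData *htmlData, NSString *xpath) {
    // Assumes ARC; under manual reference counting, release the parser
    // after the search.
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
    return [parser search:xpath];
}

// Example use: log the tag name and content of every table cell.
static void LogCells(NSData *htmlData) {
    for (TFHppleElement *cell in ElementsMatching(htmlData, @"//td")) {
        NSLog(@"<%@> %@", [cell tagName], [cell content]);
    }
}
```

What `content` returns for an element has differed between hpple versions, so check it against the copy of the source files you added to your project.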
Known Issues
Since hpple is a wrapper on top of XPathQuery, which is itself a wrapper around libxml2, this approach is probably not the most efficient. If performance is a problem in your project, I recommend writing your own lightweight solution based on the hpple and XPathQuery source code.
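As a rough idea of what such a lightweight solution could look like, the sketch below calls libxml2 directly and returns only the text of the matching nodes, skipping the intermediate dictionaries and element objects. TextForXPath is a hypothetical name, and this is not how hpple or XPathQuery is implemented; it only assumes the libxml2 header path and linker flag from the Requirements section are in place.

```objective-c
#import <Foundation/Foundation.h>
#import <libxml/HTMLparser.h>
#import <libxml/xpath.h>

// Hypothetical helper: returns the text content of every node matching
// `xpath`, or nil if the document cannot be parsed or the query fails.
static NSArray *TextForXPath(NSData *htmlData, NSString *xpath) {
    // HTML_PARSE_RECOVER lets libxml2 keep going on malformed markup.
    htmlDocPtr doc = htmlReadMemory([htmlData bytes], (int)[htmlData length],
                                    NULL, NULL,
                                    HTML_PARSE_RECOVER | HTML_PARSE_NOERROR |
                                    HTML_PARSE_NOWARNING);
    if (doc == NULL) return nil;

    NSMutableArray *results = nil;
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    if (context != NULL) {
        xmlXPathObjectPtr object =
            xmlXPathEvalExpression((const xmlChar *)[xpath UTF8String], context);
        if (object != NULL) {
            results = [NSMutableArray array];
            // nodesetval can be NULL when nothing matched.
            xmlNodeSetPtr nodes = object->nodesetval;
            int count = nodes ? nodes->nodeNr : 0;
            for (int i = 0; i < count; i++) {
                // xmlNodeGetContent returns a UTF-8 copy we must free.
                xmlChar *text = xmlNodeGetContent(nodes->nodeTab[i]);
                if (text != NULL) {
                    NSString *string =
                        [NSString stringWithUTF8String:(const char *)text];
                    if (string != nil) [results addObject:string];
                    xmlFree(text);
                }
            }
            xmlXPathFreeObject(object);
        }
        xmlXPathFreeContext(context);
    }
    xmlFreeDoc(doc);
    return results;
}
```

Whether this is measurably faster than hpple depends on your documents and queries, so it is worth profiling before replacing the wrapper.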
Albaregar, Oct 24 '09 at 15:30