I'm just starting out with HtmlUnit, and I want to take a web page and extract its raw text, without any of the HTML markup.
Can HtmlUnit do this? If so, how? Or is there another library I should look at?
For example, if the page contains
<body><p>para1 test info</p><div><p>more stuff here</p></div>
I would like it to output
para1 test info more stuff here
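To make the goal concrete, here is a rough sketch of the behavior I'm after. It uses only the JDK's built-in Swing HTML parser, not HtmlUnit, so it is not how HtmlUnit would do it. The names TextExtractor and extractText are made up for illustration, and I'm assuming it is fine to join text nodes with single spaces.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class TextExtractor {

    // Hypothetical helper: collects the text nodes of an HTML string,
    // drops the tags, and joins the pieces with single spaces.
    static String extractText(String html) throws IOException {
        StringBuilder sb = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text found between tags.
                sb.append(data).append(' ');
            }
        };

        // The third argument tells the parser to ignore charset declarations,
        // which do not matter when reading from a String.
        new ParserDelegator().parse(new StringReader(html), callback, true);

        // Collapse runs of whitespace and trim the trailing separator.
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<body><p>para1 test info</p><div><p>more stuff here</p></div>";
        System.out.println(extractText(html));
    }
}
```

This only handles static markup in a string. It does not fetch pages or run JavaScript, which is why I'm asking whether HtmlUnit can give me the same result for a loaded page.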
Thanks.