Is there a decent, customizable HTML to Markdown Java API?

I want to keep the text that I extract from various sources without its HTML tags, while preserving as much of the structure as I reasonably can.

Markdown seems to be the solution to this (or perhaps MultiMarkdown).

There is already a question asking about converting HTML to Markdown, but I want to specify a few things:

  • ALL links (including images) are placed only at the END, as reference-style links (i.e. no inline URLs)
  • There is NO embedded HTML (I'm not even 100% sure how I would want to handle complex HTML ... but it must not be embedded!)
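
To make the first requirement concrete, here is a minimal sketch of the output shape I have in mind: links and images are written as numbered references in the text, and the URLs are emitted once at the end. The class name `ReferenceLinks` and its methods are hypothetical, not part of any existing library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers link targets so they can be written at the end.
final class ReferenceLinks {
    private final List<String> urls = new ArrayList<>();

    // Returns "[text][n]" and records the URL under reference number n.
    String link(String text, String url) {
        return "[" + text + "][" + ref(url) + "]";
    }

    // Same as link(), but with the leading "!" that Markdown uses for images.
    String image(String alt, String url) {
        return "![" + alt + "][" + ref(url) + "]";
    }

    // Reuses the existing number when the same URL is seen again.
    private int ref(String url) {
        int index = urls.indexOf(url);
        if (index < 0) {
            urls.add(url);
            index = urls.size() - 1;
        }
        return index + 1; // reference numbers start at 1
    }

    // Builds the reference definitions, one "[n]: url" per line.
    String definitions() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < urls.size(); i++) {
            sb.append('[').append(i + 1).append("]: ").append(urls.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

A converter would call `link(...)` or `image(...)` while writing the body and append `definitions()` after the last paragraph.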

So my question is as stated in the title: is there a decent, customizable HTML to Markdown Java API?

+5
1 answer

You could try adapting HtmlCleaner, which provides a workable interface to the DOM:

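import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

HtmlCleaner htmlCleaner = new HtmlCleaner();
TagNode root = htmlCleaner.clean( stream );
// Note the '@': id is an attribute, not a child element
Object[] found = root.evaluateXPath( "//div[@id='something']" );
if( found.length > 0 && found[0] instanceof TagNode ) {
    ((TagNode)found[0]).removeFromTree();
}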

This will allow you to structure the output stream in any format you want using a fairly simple API.
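
As a rough illustration of that kind of traversal, the sketch below walks a DOM tree and writes Markdown for a handful of tags. It is only a sketch under assumptions: it uses the JDK's built-in XML parser instead of HtmlCleaner so that it is self-contained, so it only accepts well-formed input, and the class name `DomToMarkdown` is made up. The tree produced by HtmlCleaner would have to be adapted to the same kind of walk.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical converter: handles only a few tags, everything else is flattened to text.
final class DomToMarkdown {

    // Parses a well-formed XHTML fragment and returns its Markdown rendering.
    static String convert(String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, out);
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed", e);
        }
    }

    private static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getTextContent()); // whitespace is kept as-is
            return;
        }
        switch (node.getNodeName().toLowerCase()) {
            case "h1": out.append("# "); children(node, out); out.append("\n\n"); break;
            case "h2": out.append("## "); children(node, out); out.append("\n\n"); break;
            case "p": children(node, out); out.append("\n\n"); break;
            case "em": out.append('*'); children(node, out); out.append('*'); break;
            case "strong": out.append("**"); children(node, out); out.append("**"); break;
            case "li": out.append("* "); children(node, out); out.append('\n'); break;
            default: children(node, out); // unknown tag: keep the text, drop the markup
        }
    }

    private static void children(Node node, StringBuilder out) {
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            walk(kids.item(i), out);
        }
    }
}
```

Because unknown tags fall through to the default branch, no HTML is ever embedded in the output, at the cost of losing structure such as tables.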

+2