What is the easiest way to convert a dump of SO data from HTML back to Markdown?

I just got access to the Stack Overflow data dump, and I'm disappointed that the message body field is in HTML, not Markdown. I suspect the original database stores Markdown, because that is what I see when I try to edit an answer.

I want to restore Markdown from a large set of answers. I will process hundreds of records in batch mode using either command-line tools or a Lua or C library, so an interactive tool like the wmd Markdown editor is not suitable. What tools are available to help me recover Markdown from a Stack Overflow data dump?


(A related question, not a duplicate: Convert HTML back to Markdown in wmd.)

+6
c lua markdown
2 answers

Markdownify converts HTML to Markdown.

See also: MetaSO / Can Markdown be recovered from a SO data dump?

+5

Take a look at pandoc: http://johnmacfarlane.net/pandoc/

There is an html2markdown tool included with pandoc that works very well, and the program runs from the command line, which makes batch conversion pretty enjoyable.

Here is the man page: http://johnmacfarlane.net/pandoc/html2markdown.1.html
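
As a rough illustration of the batch conversion this answer describes, here is a minimal Python sketch that pipes HTML fragments through pandoc one at a time. It assumes pandoc is installed and on PATH, and it calls `pandoc --from=html --to=markdown` directly instead of the `html2markdown` wrapper, in case that wrapper is not present in your installation. The function names `html_to_markdown` and `convert_bodies` are hypothetical, as is the idea that each record's body has already been extracted from the dump as a string.

```python
import shutil
import subprocess

# Assumption: pandoc is installed and on PATH. --from and --to are pandoc's
# standard options for choosing the input and output formats.
PANDOC_CMD = ["pandoc", "--from=html", "--to=markdown"]


def html_to_markdown(html: str) -> str:
    """Pipe one HTML fragment through pandoc and return the Markdown text."""
    result = subprocess.run(
        PANDOC_CMD,
        input=html,
        capture_output=True,
        encoding="utf-8",  # pandoc reads and writes UTF-8
        check=True,        # raise CalledProcessError if pandoc fails
    )
    return result.stdout


def convert_bodies(bodies):
    """Convert an iterable of HTML strings (hypothetically one per dump record)."""
    return [html_to_markdown(body) for body in bodies]


if __name__ == "__main__":
    if shutil.which("pandoc") is None:
        raise SystemExit("pandoc not found on PATH")
    sample = ["<p>Use <code>printf</code> for <em>formatted</em> output.</p>"]
    for markdown in convert_bodies(sample):
        print(markdown, end="")
```

Starting one pandoc process per record is simple but may be slow for very large sets. The result is also unlikely to match the author's original Markdown character for character, since several Markdown spellings render to the same HTML.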

+2