HTML to text: I take this to mean that all HTML formatting, except line breaks, should be stripped.
For this task I used a regexp to detect every tag. If the tag is br or br /, a line break is inserted; otherwise the tag is discarded.
It only works for simple HTML pages. Tables will, of course, be flattened into plain lines of text.
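A minimal sketch of that regexp approach, assuming two passes: first turn `<br>`, `<br/>` and `<br />` (any case) into a newline, then drop every remaining `<...>` tag. The class and method names (`HtmlToText`, `convert`) are hypothetical, and this is not necessarily the exact regexp I used. Like the original approach, it does not decode entities such as `&amp;` and will be fooled by a `>` inside attribute values or comments.

```java
public class HtmlToText {

    // Hypothetical helper: keeps line breaks, discards every other tag.
    static String convert(String html) {
        return html
                // (?i) = case-insensitive; matches <br>, <br/>, <br />, <BR>
                .replaceAll("(?i)<br\\s*/?>", "\n")
                // Any other tag (opening, closing, self-closing) is removed
                .replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(convert("<p>Hello<br/>World</p>"));
    }
}
```

Running `main` prints `Hello` and `World` on two lines. Tables go through the same path, so their cell text simply runs together, which is the linearization mentioned above.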
I also considered extracting the text inside the title tag so that the converter automatically puts the title at the top of the output. That would need a bit more logic. My time is better spent on ...
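Here is a rough sketch of the title idea, assuming a single `<title>` element and reusing the same two regexps for the body. All names here (`HtmlTitleToText`, `extractTitle`, `convertWithTitle`) are hypothetical. The title element is removed before conversion so its text is not repeated in the body.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTitleToText {

    // (?is): case-insensitive, and '.' also matches newlines inside the title
    private static final Pattern TITLE =
            Pattern.compile("(?is)<title[^>]*>(.*?)</title\\s*>");

    // Returns the trimmed title text, or null if there is no <title> element.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Puts the title first, then the tag-stripped text with line breaks kept.
    static String convertWithTitle(String html) {
        String title = extractTitle(html);
        String withoutTitle = TITLE.matcher(html).replaceFirst("");
        String text = withoutTitle
                .replaceAll("(?i)<br\\s*/?>", "\n")
                .replaceAll("<[^>]*>", "");
        return title == null ? text : title + "\n\n" + text;
    }

    public static void main(String[] args) {
        String page = "<html><head><title> My Page </title></head>"
                + "<body>Hi<br>there</body></html>";
        System.out.println(convertWithTitle(page));
    }
}
```

For the sample page in `main` the output is `My Page`, a blank line, then `Hi` and `there` on separate lines.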
I have also read about using the Google Data API to upload a document to Google Docs and then using the same API to download/export it as text. For that matter, why text, when I could export a PDF? The catch is that you need a Google account if you don't already have one.
Download / Export Google Docs Data
Google Docs Data API for Java
Blessed Geek, Apr 13 '10 at 6:20