Getting raw text from JTextPane

In my application, I use a JTextPane to display some log data. Since I want to highlight specific lines in this text (for example, error messages), I set the contentType to "text/html". That way I can format the text.

Now I am adding a JButton that copies the contents of this JTextPane to the clipboard. That part is simple, but my problem is that when I call myTextPane.getText(), I get HTML code, for example:

    <html>
      <head>
      </head>
      <body>
        blabla<br>
        <font color="#FFCC66"><b>foobar</b></font><br>
        blabla
      </body>
    </html>

instead of getting only raw content:

    blabla
    foobar
    blabla

Is there a way to get only the text of my JTextPane as plain text? Or do I need to convert the HTML to plain text myself?

+7
java swing jtextpane
4 answers

Based on the accepted answer to: Removing HTML from a Java string

    MyHtml2Text parser = new MyHtml2Text();
    try {
        parser.parse(new StringReader(myTextPane.getText()));
    } catch (IOException ee) {
        // handle exception
    }
    System.out.println(parser.getText());

This uses a slightly modified version of the Html2Text class found in the answer I linked to:

    import java.io.IOException;
    import java.io.Reader;
    import javax.swing.text.html.*;
    import javax.swing.text.html.parser.*;

    public class MyHtml2Text extends HTMLEditorKit.ParserCallback {
        StringBuffer s;

        public MyHtml2Text() {}

        // Parses the HTML from the reader and collects its text content.
        public void parse(Reader in) throws IOException {
            s = new StringBuffer();
            ParserDelegator delegator = new ParserDelegator();
            delegator.parse(in, this, Boolean.TRUE);
        }

        // Called by the parser for each run of text between tags.
        @Override
        public void handleText(char[] text, int pos) {
            s.append(text);
            s.append("\n");
        }

        public String getText() {
            return s.toString();
        }
    }

If you need finer-grained processing, consider overriding the other callback methods defined by HTMLEditorKit.ParserCallback.
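
As a rough illustration of what "finer processing" could look like, here is a minimal sketch that only inserts a line break where the HTML has a <br>, instead of after every text chunk. The class name LineAwareHtml2Text and the helper toText are made up for this example; the callback methods themselves come from HTMLEditorKit.ParserCallback.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of MyHtml2Text: line breaks follow <br> tags only.
public class LineAwareHtml2Text extends HTMLEditorKit.ParserCallback {
    private final StringBuilder out = new StringBuilder();

    // Called for each run of text between tags.
    @Override
    public void handleText(char[] text, int pos) {
        out.append(text);
    }

    // Called for empty elements such as <br> or <img>.
    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            out.append('\n');
        }
    }

    // Convenience helper (an assumption of this sketch, not part of the original answer).
    public static String toText(String html) {
        LineAwareHtml2Text callback = new LineAwareHtml2Text();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body>blabla<br>"
                + "<font color=\"#FFCC66\"><b>foobar</b></font><br>blabla</body></html>";
        System.out.println(toText(html));
    }
}
```

The same pattern should extend to other tags, for example appending alt text for images in handleSimpleTag, depending on what you want the plain-text version to contain.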

+5

No need to use ParserCallback. Just use:

    textPane.getDocument().getText(0, textPane.getDocument().getLength());
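
For context, a minimal self-contained sketch of this approach, tied back to the clipboard button from the question. The class name PaneTextExample and the helpers plainText and copyToClipboard are invented for illustration. Note that Document.getText declares BadLocationException, so it has to be caught or rethrown, and that the exact whitespace in the result depends on how the HTML document models the content.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.StringSelection;
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

public class PaneTextExample {

    // Returns the document's text content without any HTML markup.
    public static String plainText(JTextPane pane) {
        Document doc = pane.getDocument();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            // Should not happen for the range 0..getLength().
            throw new IllegalStateException(e);
        }
    }

    // Puts the given text on the system clipboard (needs a non-headless environment).
    public static void copyToClipboard(String text) {
        Toolkit.getDefaultToolkit().getSystemClipboard()
                .setContents(new StringSelection(text), null);
    }

    public static void main(String[] args) {
        // In a real application, create and modify Swing components on the EDT.
        JTextPane pane = new JTextPane();
        pane.setContentType("text/html");
        pane.setText("<html><body>blabla<br>"
                + "<font color=\"#FFCC66\"><b>foobar</b></font><br>blabla</body></html>");
        System.out.println(plainText(pane));
    }
}
```

In the button's ActionListener you would then call something like copyToClipboard(plainText(myTextPane)).
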
+16

You need to do it yourself, unfortunately. Imagine that some of the content is specific to HTML, such as images: the textual representation is unclear. Should the alt text be included or not, for example?

+2

(Is a regex allowed? This is not parsing, is it?)

Take the result of getText() and use String.replaceAll() to filter out all the tags. Then call trim() to remove leading and trailing whitespace. For the whitespace between your first and last "blabla" I do not see a general solution. Perhaps you can split the rest at CRLF and trim each line again.

(I am not a regular expression expert - maybe someone can provide a regular expression and earn some reputation;))

Edit

I just assumed that you are not using < and > in your text; otherwise, let's say it becomes a challenge.
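
Since a regular expression was asked for, here is one possible sketch. It is a variation on the steps above: rather than splitting at CRLF, it replaces every tag with a space and then collapses runs of whitespace. The class and method names are made up, and it carries the same caveat: literal < or > characters in the text will confuse it, and HTML entities such as &amp; are not decoded.

```java
public class TagStripper {

    // Removes anything that looks like a tag, then normalizes whitespace.
    public static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ") // each tag becomes a single space
                   .replaceAll("\\s+", " ")    // collapse spaces, tabs, and line breaks
                   .trim();                    // drop leading and trailing whitespace
    }

    public static void main(String[] args) {
        String html = "<html> <head> </head> <body> blabla<br> "
                + "<font color=\"#FFCC66\"><b>foobar</b></font><br> blabla </body> </html>";
        System.out.println(stripTags(html)); // prints "blabla foobar blabla"
    }
}
```

This puts everything on one line; to keep line breaks, you could replace <br> tags with "\n" first and only collapse spaces and tabs afterwards.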

+2