Convert line breaks and paragraph breaks to newlines in Java

I basically have an HTML snippet with <br> and <p></p> inside. I managed to remove all the HTML tags, but this leaves the text badly formatted.

I want something like nl2br() in PHP, except with the input and output reversed, and which also takes <p> tags into account. Is there a library for this in Java?

+5
3 answers

You basically need to replace each <br> with \n and each <p> with \n\n. So, at the points where you remove them, you need to insert \n and \n\n respectively.

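Here is a kickoff example using the Jsoup HTML parser (the sample HTML is deliberately written so that handling it with regex would be hard, if not nearly impossible).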

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public static void main(String[] args) throws Exception {
    String originalHtml = "<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>";
    String text = br2nl(originalHtml);
    String newHtml = nl2br(text);

    System.out.println(originalHtml);
    System.out.println("-------------");
    System.out.println(text);
    System.out.println("-------------");
    System.out.println(newHtml);
}

public static String br2nl(String html) {
    Document document = Jsoup.parse(html);
    // Insert a literal backslash-n placeholder, because text() normalizes real whitespace.
    document.select("br").append("\\n");
    document.select("p").prepend("\\n\\n");
    // Turn the placeholders into real newlines once the tags are stripped.
    return document.text().replace("\\n", "\n");
}

public static String nl2br(String text) {
    // Order matters: handle double newlines before single ones.
    return text.replace("\n\n", "<p>").replace("\n", "<br>");
}

(Note: replaceAll() is unnecessary here, since we only want a simple charsequence-by-charsequence replacement, not a regexpattern-by-charsequence one.)
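The difference between replace() and replaceAll() mentioned in the note above can be seen in a small, self-contained sketch. The class and method names (ReplaceVsReplaceAll, literalDot, and so on) are made up for illustration; only standard java.lang.String methods are used.

```java
// Hypothetical demo class, not part of the original answer.
final class ReplaceVsReplaceAll {

    // Literal replacement: only real dots are replaced.
    static String literalDot(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." matches any character, so every character is replaced.
    static String regexDot(String s) {
        return s.replaceAll(".", "-");
    }

    // Literal replacement: the two-character placeholder (backslash, n) becomes a real newline.
    static String literalPlaceholder(String s) {
        return s.replace("\\n", "\n");
    }

    // Regex replacement: the pattern \n matches a real newline, not the placeholder,
    // so a string that only contains the placeholder comes back unchanged.
    static String regexPlaceholder(String s) {
        return s.replaceAll("\\n", "\n");
    }

    public static void main(String[] args) {
        System.out.println(literalDot("a.b"));                                   // a-b
        System.out.println(regexDot("a.b"));                                     // ---
        System.out.println(literalPlaceholder("x\\ny").equals("x\ny"));          // true
        System.out.println(regexPlaceholder("x\\ny").equals("x\\ny"));           // true
    }
}
```

This is why br2nl() above uses replace(): with replaceAll(), the "\\n" placeholder would be read as a regex for a real newline and would never match.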

Output:

<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>
-------------


p1l1 
p1l2 



p2l1 
p2l2
-------------
<p>p1l1 <br>p1l2 <br> <br> <p>p2l1 <br>p2l2


+12

br2nl and p2nl are not difficult to write. Try this:

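// The replacement must be "\n" (a real newline); with replaceAll(), "\\n" would insert a literal 'n'.
String plain = htmlText.replaceAll("<br>", "\n").replaceAll("<p>", "\n\n").replaceAll("</p>", "");

+3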

You should be able to use replaceAll. See http://www.rgagnon.com/javadetails/java-0454.html for an example. Just do it twice, once for p and once for br, going from HTML to \n.
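A minimal sketch of the regex route, assuming the tags may carry attributes or appear in upper case. The class name RegexBr2Nl and the three patterns are illustrative assumptions, not taken from the linked page. It does not cope with tags hidden inside HTML comments, so for input like the sample in the first answer a real parser is the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class RegexBr2Nl {
    // Matches <br>, <br/>, <br class=b> and similar, case-insensitively.
    private static final Pattern BR = Pattern.compile("(?i)<br\\b[^>]*>");
    // Matches an opening <p> with optional attributes (but not <pre>).
    private static final Pattern P_OPEN = Pattern.compile("(?i)<p\\b[^>]*>");
    // Matches a closing </p>.
    private static final Pattern P_CLOSE = Pattern.compile("(?i)</p\\s*>");

    static String br2nl(String html) {
        // Each <br> becomes one newline.
        String s = BR.matcher(html).replaceAll("\n");
        // Each opening <p> becomes two newlines.
        s = P_OPEN.matcher(s).replaceAll("\n\n");
        // Closing </p> tags are simply dropped.
        return P_CLOSE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(br2nl("<p>a<br>b</p><p id=x>c<BR/>d</p>"));
    }
}
```

Any other tags are left untouched, so this step would run before whatever strips the remaining markup.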

+1
