I am trying to parse and sanitize Markdown on both the client side and the server side.
On the client side, I use PageDown as the Markdown editor. This is exactly what Stack Overflow uses, and it comes with an excellent live preview. The preview displays sanitized HTML, so it strips tags such as <div>.
On the server side, I use PegDown to convert the Markdown to HTML and JSoup to sanitize that HTML.
However, I have found cases where the two outputs differ. For instance:
Markdown input: how are <div>tags</div> treated?
PageDown output: <p>how are tags treated?</p>
PegDown/JSoup output:
<p>how are </p>tags treated?
<p></p>
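The PegDown/JSoup result is consistent with the HTML5 tree-construction rules that jsoup's parser follows: a <div> start tag implicitly closes an open <p>, and the now-unmatched </p> end tag then creates an empty <p></p>. Removing the <div> afterwards keeps its text but cannot put it back inside the first <p>. The sketch below parses the intermediate HTML directly to show this. It assumes PegDown passes the inline <div> through unchanged, i.e. emits `<p>how are <div>tags</div> treated?</p>`; that string is an assumption, not verified here, and the class name ParagraphSplitDemo is hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParagraphSplitDemo {
    // ASSUMPTION: what PegDown produces for the Markdown input above
    // (the inline <div> left inside the paragraph).
    public static final String PEGDOWN_HTML = "<p>how are <div>tags</div> treated?</p>";

    // Parse with jsoup's HTML5 parser; Jsoup.clean also parses its input
    // this way (as a body fragment) before applying the whitelist.
    public static Document parse(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String... args) {
        Document doc = parse(PEGDOWN_HTML);
        // List the top-level elements of <body> in document order
        for (Element child : doc.body().children()) {
            System.out.println(child.tagName() + ": \"" + child.text() + "\"");
        }
        // Full body markup, including the bare text node " treated?"
        System.out.println(doc.body().html());
    }
}
```

With that input, <body> ends up holding a <p>, a <div> and an empty <p> as siblings, with " treated?" as loose text between the last two; dropping the <div> during cleaning then yields the output shown above.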
I am not doing anything unusual with JSoup. Here is my code:
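import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.pegdown.PegDownProcessor;

public class Main {
    public static void main(String... args) {
        // Convert the Markdown to HTML with PegDown
        PegDownProcessor pdp = new PegDownProcessor();
        String markdown = "how are <div>tags</div> treated?";
        String html = pdp.markdownToHtml(markdown);

        // Sanitize with JSoup: the relaxed whitelist minus <div>
        // (disallowed tags are dropped, their text content is kept)
        Whitelist whitelist = Whitelist.relaxed().removeTags("div");
        html = Jsoup.clean(html, whitelist);

        System.out.println(html);
        System.out.println("Done.");
    }
}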
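Why does JSoup close the <p> when it reaches the <div>, instead of simply removing the <div> and leaving its text inside the <p>?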
So my question is: how can I get server-side HTML that matches PageDown's output?
Update: I also tried the OWASP sanitizer, but the <p> handling still differs, so the resulting HTML does not match PageDown's.
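One possible workaround, sketched here under assumptions and not taken from the question, is to remove the unwanted tags before Jsoup.clean re-parses the HTML, using jsoup's XML parser (Parser.xmlParser()), which keeps the nesting exactly as written instead of applying HTML5 rules. The class UnwrapBeforeClean and the helper unwrapTags are hypothetical names. The approach assumes the HTML is well-formed (every tag closed), which should hold for PegDown's own markup but not necessarily for raw HTML a user types, such as an unclosed <br>.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class UnwrapBeforeClean {
    /**
     * Hypothetical helper: removes the given tags but keeps their content,
     * without applying HTML5 nesting rules, so a <div> inside a <p> does
     * not close the <p>. Assumes well-formed input.
     */
    public static String unwrapTags(String html, String... tagNames) {
        // The XML parser builds the tree exactly as nested in the source
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        // Emit the markup as built, without re-indenting it
        doc.outputSettings().prettyPrint(false);
        for (String tag : tagNames) {
            for (Element el : doc.select(tag)) {
                el.unwrap(); // replace the element with its child nodes
            }
        }
        return doc.html();
    }

    public static void main(String... args) {
        // ASSUMPTION: PegDown's output for the question's input
        String pegdownHtml = "<p>how are <div>tags</div> treated?</p>";
        System.out.println(unwrapTags(pegdownHtml, "div"));
    }
}
```

The result could then go through the existing sanitizer, e.g. `html = Jsoup.clean(UnwrapBeforeClean.unwrapTags(html, "div"), whitelist);`. Because Jsoup.clean parses with HTML rules again, this only helps for the tags unwrapped beforehand; any other block element left inside a <p> would still split the paragraph.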