Match PegDown + JSoup output to PageDown output

I am trying to parse and sanitize Markdown on both the client and the server side.

  • On the client side, I use PageDown as a Markdown editor. This is exactly what StackOverflow uses, and it comes with an excellent preview box. The preview shows sanitized html, so it removes tags such as <div>.

  • On the server side, I use PegDown and JSoup to parse and sanitize the Markdown.

However, I am finding cases where the output of the two is not the same. For example:

Markdown input: how are <div>tags</div> treated?

PageDown output: <p>how are tags treated?</p>

PegDown / JSoup Output:

<p>how are </p>tags treated?
<p></p>

I am not doing anything special with JSoup. Here is my code:

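import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.pegdown.PegDownProcessor;

public class Main {

    public static void main(String... args){

        PegDownProcessor pdp = new PegDownProcessor();

        String markdown = "how are <div>tags</div> treated?";

        // Markdown -> html; the inline <div> ends up inside the <p>
        String html = pdp.markdownToHtml(markdown);

        // Relaxed whitelist with <div> taken out, so JSoup strips div tags
        Whitelist whitelist = Whitelist.relaxed().removeTags("div");

        html = Jsoup.clean(html, whitelist);
        System.out.println(html);

        System.out.println("Done.");
    }
}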

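I assume this happens because of the way JSoup processes the PegDown output. What I am wondering: can I make JSoup simply drop the <div> tags instead of breaking up the <p>?

I know I could do a search/replace, or write my own sanitizer/parser. But this is only one example. I am worried there are other cases like it, where some nested tag breaks out of the <p>, and I cannot anticipate them all.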

Question: how can I make the server-side html match the output of PageDown?

Edit: I also tried the OWASP sanitizer. Its handling of the <p> is also "off", so the resulting html still does not match PageDown's.

1 answer

"Can I make JSoup simply drop the <div> tags?"

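HTML 5 does not allow a div inside a p. Jsoup follows that rule, so when it parses the html it closes the p as soon as it reaches the div.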

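As far as I can tell, Jsoup#clean does the following:

  • Parse dirty html
  • Build a valid HTML 5 document from it
  • Remove everything that is not on the whitelist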

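In the second step the <p> is closed in front of the div, so the p is split. Jsoup behaves like a browser here: it repairs the markup instead of preserving it as written (e.g. by closing tags that are still open).

The first two steps produce an HTML document that is valid HTML 5. Only in the third step are the div tags removed, and by then the p has already been split.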
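The effect of those steps can be reproduced without Jsoup. The following is only a toy model, not Jsoup's implementation: it hard-codes the two HTML 5 parsing rules that matter here (a <div> start tag closes an open <p>, and a stray </p> becomes an empty <p></p>), and it assumes PegDown emits <p>how are <div>tags</div> treated?</p>. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphSplitDemo {

    // A token is either a tag such as <p> or </div>, or a run of text.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|[^<]+");

    /** Toy version of the parse step: applies only the <p>/<div> rules. */
    static String normalize(String html) {
        StringBuilder out = new StringBuilder();
        boolean pOpen = false;
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String token = m.group();
            switch (token) {
                case "<p>":
                    if (pOpen) out.append("</p>"); // a new <p> closes an open one
                    out.append("<p>");
                    pOpen = true;
                    break;
                case "<div>":
                    if (pOpen) {                   // a <div> closes an open <p>
                        out.append("</p>");
                        pOpen = false;
                    }
                    out.append("<div>");
                    break;
                case "</p>":
                    // With no <p> open, a stray </p> yields an empty paragraph.
                    out.append(pOpen ? "</p>" : "<p></p>");
                    pOpen = false;
                    break;
                default:
                    out.append(token);             // text and all other tags pass through
            }
        }
        return out.toString();
    }

    /** Toy version of the whitelist step: drops div tags, keeps their content. */
    static String removeDivTags(String html) {
        return html.replaceAll("</?div>", "");
    }

    public static void main(String[] args) {
        String pegdownOutput = "<p>how are <div>tags</div> treated?</p>"; // assumed PegDown output
        String normalized = normalize(pegdownOutput);
        System.out.println(normalized);
        // prints: <p>how are </p><div>tags</div> treated?<p></p>
        System.out.println(removeDivTags(normalized));
        // prints: <p>how are </p>tags treated?<p></p>
    }
}
```

The second printed line has the same structure as the PegDown / JSoup output in the question, which is why removing the div after parsing cannot restore the original paragraph.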

"I could do a search/replace, or write my own sanitizer/parser."

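In my opinion, the most reliable way to get identical output is to run the same code on both sides. Pagedown is written in Javascript, and Javascript can be executed from Java.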

Some options:

  • Nashorn (included in Java 8)
  • Rhino
  • V8

For example, with Nashorn:

Caller.java

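import java.io.FileReader;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Caller {
    public static void main(String[] args) throws Exception {
        // Nashorn is bundled with Java 8 through 14; on later JDKs this lookup returns null
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
        engine.eval(new FileReader("script.js"));

        // Call a top-level function defined in script.js
        Invocable invocable = (Invocable) engine;
        Object result = invocable.invokeFunction("myFunction", "fooValue");

        System.out.println(result);
        System.out.println(result.getClass());
    }
}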

script.js

function myFunction(foo) {
   // ...
}
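A slightly more defensive variant of the caller is sketched below. It does not assume that a Javascript engine is present (the lookup returns null when none is registered), and it evaluates an inline script instead of a file. The class JsBridge, its methods, and the one-line render function are placeholders invented for this sketch; the inline function only stands in for the real Pagedown scripts, which would have to be loaded in its place.

```java
import java.util.Optional;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class JsBridge {

    /** Returns a Javascript engine if the running JVM provides one. */
    static Optional<ScriptEngine> findJsEngine() {
        ScriptEngineManager manager = new ScriptEngineManager();
        for (String name : new String[] {"nashorn", "javascript"}) {
            ScriptEngine engine = manager.getEngineByName(name);
            if (engine instanceof Invocable) {   // also false when engine is null
                return Optional.of(engine);
            }
        }
        return Optional.empty();
    }

    /** Calls a Javascript function from Java; empty if no engine is available. */
    static Optional<String> render(String markdown) {
        Optional<ScriptEngine> found = findJsEngine();
        if (!found.isPresent()) {
            return Optional.empty();
        }
        ScriptEngine engine = found.get();
        try {
            // Placeholder script: the Pagedown sources would be evaluated here instead.
            engine.eval("function render(text) { return '<p>' + text + '</p>'; }");
            Object html = ((Invocable) engine).invokeFunction("render", markdown);
            return Optional.of(String.valueOf(html));
        } catch (ScriptException | NoSuchMethodException e) {
            throw new IllegalStateException("Javascript call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("how are tags treated?").orElse("no Javascript engine found"));
    }
}
```

Returning an Optional keeps the "no engine on this JVM" case explicit, so the server can fall back to another sanitizer instead of failing with a NullPointerException.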

