Prevent Jsoup from dropping extra spaces

I use Jsoup to sanitize user input from a form. The form in question contains a <textarea>, which expects plain text. When the form is submitted, I clean the input with Jsoup.clean(textareaContents); however, since HTML ignores extra whitespace, Jsoup.clean() removes whitespace characters from the input that I need to keep.

For example, if someone enters the following lines in the textarea:

hello

test

after Jsoup.clean() they come out as:

hello test
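The effect can be approximated in plain Java: in ordinary HTML text, any run of whitespace (spaces, tabs, newlines) is treated as a single space. The sketch below only illustrates that rule with String.replaceAll; it is not how Jsoup implements it, and collapseWhitespace is a hypothetical helper name.

```java
class WhitespaceDemo {
    // Collapse every run of whitespace into one space and trim the ends,
    // approximating how HTML treats whitespace in ordinary text.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // The textarea input from the example: two lines separated by a blank line.
        String input = "hello\n\ntest";
        System.out.println(collapseWhitespace(input));
    }
}
```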

How can I make Jsoup.clean() preserve the whitespace? I know it is designed to parse HTML and this input is not HTML, so is there a better alternative?

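The input is plain text, not HTML, so don't treat it as HTML. All you need is to escape the HTML special characters, for example < and > as &lt; and &gt;.

Jsoup is an HTML cleaner: it is meant for input that is HTML and output that is HTML, which is not your case.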
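A minimal sketch of that escaping in plain Java, assuming the text is displayed inside HTML element content or a quoted attribute value. escapeHtml is a hypothetical helper, not a Jsoup method. Whitespace passes through unchanged, so the line breaks from the textarea survive.

```java
class PlainTextEscaper {
    // Replace the characters that are special in HTML with entities so that
    // user-entered plain text is shown literally instead of parsed as markup.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // must be escaped too
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break; // for quoted attributes
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);               // whitespace is kept as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("hello\n\n<b>test</b>"));
    }
}
```

Browsers still collapse whitespace when rendering ordinary HTML, so display the escaped text inside a <pre> or <textarea>, or an element styled with white-space: pre-wrap, if the line breaks should stay visible.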

You can also keep the whitespace by reading the text node with TextNode.getWholeText().

For example:

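import java.util.List;

import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

/**
 * @param cell element that contains whitespace formatting
 * @return the whole text of the first child node (whitespace preserved) if it
 *         is a text node, otherwise the whitespace-normalised Element.text()
 */
public static String getText(Element cell) {
    String text = null;
    List<Node> childNodes = cell.childNodes();
    if (childNodes.size() > 0) {
        Node childNode = childNodes.get(0);
        if (childNode instanceof TextNode) {
            // getWholeText() returns the raw text without collapsing whitespace
            text = ((TextNode) childNode).getWholeText();
        }
    }
    if (text == null) {
        // fall back to jsoup's normalised text
        text = cell.text();
    }
    return text;
}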

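This returns the whole text of the element's first child node when that node is a text node; otherwise it falls back to Element.text().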
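A small demonstration of the difference between Element.text() and TextNode.getWholeText(), assuming jsoup is on the classpath. The class and method names are hypothetical. A <p> is used because jsoup may keep whitespace as-is inside elements such as <pre>, which would hide the difference.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

class WholeTextDemo {
    // Parse the HTML and return the first <p> element.
    static Element firstParagraph(String html) {
        return Jsoup.parse(html).select("p").first();
    }

    // Element.text(): runs of whitespace are collapsed to single spaces.
    static String normalisedText(String html) {
        return firstParagraph(html).text();
    }

    // TextNode.getWholeText(): the text as parsed, whitespace included.
    // Falls back to text() when the first child is missing or not a text node.
    static String wholeText(String html) {
        Element p = firstParagraph(html);
        if (!p.childNodes().isEmpty()) {
            Node first = p.childNode(0);
            if (first instanceof TextNode) {
                return ((TextNode) first).getWholeText();
            }
        }
        return p.text();
    }

    public static void main(String[] args) {
        String html = "<p>hello\n\ntest</p>";
        System.out.println("[" + normalisedText(html) + "]");
        System.out.println("[" + wholeText(html) + "]");
    }
}
```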

Neeme Praks's answer does not handle an element that contains HTML. Given

<span>This is<br />some text.  Cool story.</span>

it returns only "This is".

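And when the element does not start with a text node, text stays null and the method falls back to Element.text(), which collapses the whitespace again.

So I wrote a version that goes through all of the element's child nodes: text nodes contribute their whole text, whitespace included, and other nodes are kept as HTML.

With the version below, the same input gives: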

This is<br />some text.  Cool story.

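import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public static String getText(Element cell) {
    StringBuilder textBuilder = new StringBuilder();
    for (Node node : cell.childNodes()) {
        if (node instanceof TextNode) {
            // raw text, whitespace preserved
            textBuilder.append(((TextNode) node).getWholeText());
        } else if (node instanceof Element && node.childNodes().isEmpty()) {
            // empty or void element such as <br>: keep its markup
            textBuilder.append(node.outerHtml());
        } else if (node instanceof Element) {
            // element with content: write its tags and recurse into it, so
            // nested text keeps its whitespace too (the old version cast every
            // grandchild to Element and also appended the content twice)
            Element child = (Element) node;
            textBuilder.append('<').append(child.tagName())
                    .append(child.attributes().html()).append('>')
                    .append(getText(child))
                    .append("</").append(child.tagName()).append('>');
        } else {
            // comments and other node types: keep their markup as-is
            textBuilder.append(node.outerHtml());
        }
    }
    return textBuilder.toString();
}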