
Rendering plaintext as HTML that preserves whitespace, without <pre>

Given any arbitrary text file full of printable characters, how can it be converted to HTML that will be rendered exactly the same way, subject to the following requirements?

  • Doesn't rely on anything but the default HTML whitespace rules
    • No <pre> tag
    • No CSS white-space rules
  • <p> tags are fine but not required (<br /> and/or <div> are fine)
  • Whitespace is preserved exactly.

    Given the following lines of input (ignore the erroneous automatic syntax highlighting):

     Line one
         Line two, indented    four spaces

    A browser should render the output exactly the same way, keeping the indentation of the second line and the gap between "indented" and "four spaces". Of course, I'm not actually looking for monospaced output; the font is orthogonal to the algorithm and the markup.

    Given those two lines as the complete input file, an example of correct output would be:

     Line one<br />&nbsp;&nbsp;&nbsp;&nbsp;Line two, indented&nbsp;&nbsp;&nbsp; four spaces 
  • Soft wrapping in the browser is desirable. That is, the resulting HTML should not force the user to scroll, even when input lines are wider than the viewport (provided that individual words are still narrower than the viewport).
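A minimal JavaScript sketch of one mapping that reproduces the example output above exactly. It assumes that a run of n spaces inside a line becomes n - 1 non-breaking spaces plus one normal space (so the browser can still wrap there), that a run at the very start of a line becomes n non-breaking spaces (a leading normal space would be dropped by the browser), and that a tab counts as four spaces. The names escapeHtml and plainTextToHtml are hypothetical.

```javascript
// Hypothetical helper: escape the characters that matter in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical converter matching the question's example output (assumed rules).
function plainTextToHtml(text) {
  return String(text)
    .split(/\r\n|\r|\n/)
    .map(function (line) {
      line = escapeHtml(line.replace(/\t/g, "    ")); // assumption: tab = four spaces
      return line.replace(/ +/g, function (run, offset) {
        // At the start of a line every space must be non-breaking, or it is dropped.
        if (offset === 0) return "&nbsp;".repeat(run.length);
        // Elsewhere keep one normal space last, so the browser can wrap there.
        return "&nbsp;".repeat(run.length - 1) + " ";
      });
    })
    .join("<br />");
}

console.log(plainTextToHtml("Line one\n    Line two, indented    four spaces"));
// prints: Line one<br />&nbsp;&nbsp;&nbsp;&nbsp;Line two, indented&nbsp;&nbsp;&nbsp; four spaces
```

A run of spaces at the end of a line loses one space under these rules, but trailing whitespace is invisible anyway.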

I'm looking for a fully defined algorithm. Bonus points for an implementation in Python or JavaScript.

(Please don't just answer that I should use <pre> tags or the white-space CSS rule, as my requirements rule those options out. Please also don't post untested and/or naive suggestions such as "replace all spaces with &nbsp;". After all, I'm confident a solution is technically possible. It's an interesting problem, don't you think?)

4 answers

The solution to this, while still allowing the browser to wrap long lines, is to replace each pair of spaces with a non-breaking space followed by a normal space.

The browser will then render all the spaces (normal and non-breaking) correctly, while still wrapping long lines at the normal spaces.

JavaScript:

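    // Minimal HTML escaping; the original answer used html_escape() as a placeholder.
    function html_escape(s) {
        return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
    }

    // `text` holds the plain-text input.
    text = html_escape(text);
    text = text.replace(/\t/g, '    ')       // expand tabs to four spaces
               .replace(/  /g, '&nbsp; ')    // each pair of spaces -> "&nbsp; "
               .replace(/  /g, ' &nbsp;')    // second pass: handles an odd number of spaces,
                                             // where we end up with "&nbsp;" + " " + " "
               .replace(/\r\n|\n|\r/g, '<br />');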
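Wrapped up as a self-contained function, the pair-replacement approach might look like the sketch below. The names are hypothetical, and one case is added as an assumption: a line that begins with a single space gets &nbsp; there, because browsers drop a normal space at the start of a line and the two passes leave that lone space untouched.

```javascript
// Hypothetical helper: minimal HTML escaping.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Sketch of the pair-replacement approach: no two normal spaces are ever
// adjacent in the output, so nothing collapses, while every run inside a
// line keeps at least one normal space where the browser may wrap.
function pairSpacesToHtml(text) {
  return escapeHtml(String(text))
    .replace(/\t/g, "    ")              // tabs become four spaces
    .split(/\r\n|\r|\n/)
    .map(function (line) {
      return line
        .replace(/  /g, "&nbsp; ")       // each pair of spaces -> "&nbsp; "
        .replace(/  /g, " &nbsp;")       // odd runs leave "&nbsp;  "; fix the tail
        .replace(/^ /, "&nbsp;");        // assumption: protect a single leading space
    })
    .join("<br />");
}

console.log(pairSpacesToHtml("a   b"));
// prints: a&nbsp; &nbsp;b
```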

Use a zero-width space (&#8203;) to preserve whitespace while still allowing the text to wrap. The basic idea is to pair each space, or each run of spaces, with a zero-width space, and then replace each space with a non-breaking space. You'll also want to encode HTML special characters and add line breaks.

If you don't need to encode Unicode characters, this is easy. You can simply use string.replace():

    function textToHTML(text) {
        return ((text || "") + "")  // make sure it is a string
            .replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/\t/g, "    ")  // expand tabs to four spaces
            .replace(/ /g, "&#8203;&nbsp;&#8203;")  // nbsp with a break opportunity on each side
            .replace(/\r\n|\r|\n/g, "<br />");
    }

If it's fine for whitespace to wrap, pair each space with a zero-width space as shown above. Otherwise, to keep runs of whitespace together, bracket each run of spaces with zero-width spaces:

    .replace(/ /g, "&nbsp;")
    .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")
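Combined with the escaping from textToHTML, the keep-runs-together variant could look like this sketch; the function name textToHTMLKeepRuns is hypothetical. Each run of spaces becomes a run of &nbsp; entities bracketed by zero-width spaces, so the browser may break before or after a run but not inside it.

```javascript
// Hypothetical name; same pipeline as textToHTML, but whole runs of spaces
// are wrapped in zero-width spaces instead of each individual space.
function textToHTMLKeepRuns(text) {
  return (text == null ? "" : String(text))
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\t/g, "    ")                       // tabs become four spaces
    .replace(/ /g, "&nbsp;")                      // every space is non-breaking
    .replace(/((&nbsp;)+)/g, "&#8203;$1&#8203;")  // break opportunities around each run
    .replace(/\r\n|\r|\n/g, "<br />");
}

console.log(textToHTMLKeepRuns("a  b"));
// prints: a&#8203;&nbsp;&nbsp;&#8203;b
```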

Encoding Unicode characters is a little more complicated. You need to iterate over the string:

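    var charEncodings = {
        "\t": "&nbsp;&nbsp;&nbsp;&nbsp;",
        " ": "&nbsp;",
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\n": "<br />",
        "\r": "<br />"
    };
    var space = /[\t ]/;
    var noWidthSpace = "&#8203;";

    function textToHTML(text) {
        text = (text || "") + "";            // make sure it is a string
        text = text.replace(/\r\n/g, "\n");  // avoid adding two <br /> tags
        var chars = Array.from(text);        // split by code point, so astral characters stay whole
        var html = "";
        // Numeric index: with for...in, i is a string and i + 1 would concatenate.
        for (var i = 0; i < chars.length; i++) {
            var char = chars[i];
            var prev = i > 0 ? chars[i - 1] : "";
            var next = i + 1 < chars.length ? chars[i + 1] : "";
            // Open each run of whitespace with a zero-width space (a break opportunity)...
            if (space.test(char) && !space.test(prev)) {
                html += noWidthSpace;
            }
            var charCode = char.codePointAt(0);
            html += char in charEncodings ? charEncodings[char]
                  : charCode > 127 ? "&#" + charCode + ";"
                  : char;
            // ...and close it with another, so the run itself never breaks.
            if (space.test(char) && !space.test(next)) {
                html += noWidthSpace;
            }
        }
        return html;
    }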
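As an alternative to the manual loop, characters above U+007F can be encoded with a replacer callback. This sketch (the name encodeNonAscii is hypothetical) relies on the regular-expression u flag, which makes the character class match whole code points, so an emoji becomes a single numeric reference instead of two surrogate halves.

```javascript
// Hypothetical helper: replace every non-ASCII code point with &#NNNN;.
function encodeNonAscii(html) {
  return html.replace(/[^\u0000-\u007F]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

console.log(encodeNonAscii("caf\u00e9 \u{1F600}"));
// prints: caf&#233; &#128512;
```

It could be applied to the output of the simpler string.replace() version of textToHTML, since everything that version adds is ASCII and only characters copied from the input can be non-ASCII.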

Now, just a comment: without a monospace font you will lose some formatting. Consider how these lines of text form columns in a monospace font:

    ten       seven spaces
    eleven    four spaces

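Without a monospace font, you lose the columns: "ten" plus seven spaces and "eleven" plus four spaces no longer end at the same horizontal position, because letters and spaces no longer all have the same width.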

An algorithm that corrects for this would likely be very complicated.


Although this doesn't quite meet all your requirements (for one thing, it doesn't handle tabs), I've used the following gem, which adds a wordWrap() method to JavaScript Strings, on several occasions to do something similar to what you're describing, so it might be a good starting point for something that also does the extra things you want.

    //+ Jonas Raoni Soares Silva
    //@ http://jsfromhell.com/string/wordwrap [rev. #2]

    // String.wordWrap(maxLength: Integer,
    //                 [breakWith: String = "\n"],
    //                 [cutType: Integer = 0]): String
    //
    // Returns a string with the extra characters/words "broken".
    //
    // maxLength   maximum amount of characters per line
    // breakWith   string that will be added whenever one is needed to break the line
    // cutType     0 = words longer than "maxLength" will not be broken
    //             1 = words will be broken when needed
    //             2 = any word that trespasses the limit will be broken

    String.prototype.wordWrap = function(m, b, c){
        var i, j, l, s, r;
        if(m < 1)
            return this;
        for(i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
            for(s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s = s.slice(j)).length ? b : ""))
                j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m : j.input.length - j[0].length
                || c == 1 && m || j.input.length + (j = s.slice(m).match(/^\S*/)).input.length;
        return r.join("\n");
    };

I'd also like to point out that in general you'll want a monospace font if tabs are involved, because word widths vary with the proportional font in use, which makes the results of expanding tabs highly font-dependent.

Update: here's a slightly more readable version, produced by an online JavaScript beautifier:

    String.prototype.wordWrap = function(m, b, c) {
        var i, j, l, s, r;
        if (m < 1) return this;
        for (i = -1, l = (r = this.split("\n")).length; ++i < l; r[i] += s)
            for (s = r[i], r[i] = ""; s.length > m; r[i] += s.slice(0, j) + ((s = s.slice(j)).length ? b : ""))
                j = c == 2 || (j = s.slice(0, m + 1).match(/\S*(\s)?$/))[1] ? m : j.input.length - j[0].length || c == 1 && m || j.input.length + (j = s.slice(m).match(/^\S*/)).input.length;
        return r.join("\n");
    };
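For comparison, here is a readable sketch of greedy word wrapping, roughly in the spirit of wordWrap with cutType 0 (words longer than the limit are never split). It is not a port of the code above and may differ from it on edge cases; the name wrapText is hypothetical. Whitespace runs stay attached to the word before them, so the spacing inside each line is preserved.

```javascript
// Hypothetical greedy wrapper: never splits words, keeps whitespace runs intact.
function wrapText(text, maxLength, breakWith) {
  if (breakWith === undefined) breakWith = "\n";
  return String(text).split("\n").map(function (line) {
    // Tokens: an optional leading whitespace run, then each word plus its trailing whitespace.
    var tokens = line.match(/\s+|\S+\s*/g) || [];
    var lines = [];
    var current = "";
    tokens.forEach(function (token) {
      // Trailing whitespace does not count against the limit (it may hang past it).
      var candidate = current + token.replace(/\s+$/, "");
      if (/\S/.test(current) && candidate.length > maxLength) {
        lines.push(current);
        current = token;
      } else {
        current += token;
      }
    });
    lines.push(current);
    return lines.join(breakWith);
  }).join("\n");
}

console.log(JSON.stringify(wrapText("the quick brown fox", 10)));
// prints: "the quick \nbrown fox"
```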

It's very simple if you use the jQuery library in your project.

Just one line: add an asHtml extension to the String class, and then:

    var plain = '&lt;a&gt; i am text plain &lt;/a&gt;';
    plain.asHtml(); /* '<a> i am text plain </a>' */

DEMO: http://jsfiddle.net/abdennour/B6vGG/3/

Note: you don't need access to the DOM. Just use the jQuery constructor pattern $('<tagName />').
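The asHtml() definition lives in the linked fiddle and isn't reproduced here. As a rough, DOM-free illustration of what the example call does, here is a hypothetical decoder (decodeBasicEntities is not part of jQuery) that handles only &lt;, &gt;, &amp;, &quot; and numeric references; a browser-based decoder covers far more named entities.

```javascript
// Hypothetical, minimal entity decoder. A single regex pass means "&amp;lt;"
// correctly becomes "&lt;" rather than being decoded twice.
function decodeBasicEntities(s) {
  var named = { lt: "<", gt: ">", amp: "&", quot: '"' };
  return String(s).replace(/&(#\d+|#x[0-9a-fA-F]+|lt|gt|amp|quot);/g, function (match, name) {
    if (name.charAt(0) !== "#") return named[name];
    var code = name.charAt(1) === "x" ? parseInt(name.slice(2), 16) : parseInt(name.slice(1), 10);
    return String.fromCodePoint(code);
  });
}

console.log(decodeBasicEntities("&lt;a&gt; i am text plain &lt;/a&gt;"));
// prints: <a> i am text plain </a>
```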

