How to implement parsing your own HTML tags in text

Question

My task is to implement my own tags that make text bold, underlined, or struck through, with arbitrary nesting, like this:

*bold text* _underlined text_ -strikethrough-

I also need my own hyperlink syntax, like:

 [link | http://stackoverflow.com]

My first thought was to use regular expressions:

 View.prototype.parseText = function (text) {
     text = text.replace(/\*([^\*]+)\*/g, '<b>$1</b>');
     text = text.replace(/\_([^\_]+)\_/g, '<u>$1</u>');
     text = text.replace(/\-([^\-]+)\-/g, '<s>$1</s>');
     text = text.replace(/\[([^\|].+)\|(.+)\]/g, '<a href="$2">$1</a>');
     return text;
 };

It works, but I need it to be extensible, and regexes seem like a poor fit because the rules are hard-coded. How can I implement this with a state machine (or some jQuery plugin)? I would appreciate any help.

+4

javascript jquery finite-state-machine

milgoff Dec 22 '12 at 15:08

3 answers

No matter how you extend the tag system, you need to: 1. define the tag, and 2. replace it with equivalent HTML.

Even if you write your own parser in JS, at the end of the day you still have to complete those two steps, so it will not be any more extensible than what you have now.

Regex is the right tool here unless you have other requirements (for example, replacing only inside a certain element but doing something else inside another element, which would require real parsing).

You can move the regex calls into a function and simply add more regexes to it whenever you need to extend it. If you need it on multiple pages, put it in an external JS file.

 // Converts the custom markup tags in text to HTML.
 function formatUserContent(text) {
     text = text.replace(/\*([^\*]+)\*/g, '<b>$1</b>');
     text = text.replace(/\_([^\_]+)\_/g, '<u>$1</u>');
     text = text.replace(/\-([^\-]+)\-/g, '<s>$1</s>');
     text = text.replace(/\[([^\|].+)\|(.+)\]/g, '<a href="$2">$1</a>');
     return text;
 }

After that, extending the function is as simple as adding

 text = text.replace(/\+([^\+]+)\+/g, '<em>$1</em>');

to its body. I doubt that rolling your own finite state machine would be easier to extend; quite the opposite.

Spending hours on a state machine in the hope that it might save a few minutes at some unknown point in the future is just not a good investment... unless, of course, you want to write a state machine for its own sake, in which case go ahead.
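To make the "just add a line" idea more explicit, the rules can also live in a data table instead of the function body. A minimal sketch, assuming a hypothetical `formatRules` array; the link rule here is a slightly stricter variant that also trims the spaces around `|`:

```javascript
// Hypothetical rule table: each entry pairs a pattern with its HTML replacement.
// Rules run in array order, so later rules see the output of earlier ones.
var formatRules = [
  { pattern: /\*([^*]+)\*/g, replacement: '<b>$1</b>' },
  { pattern: /_([^_]+)_/g, replacement: '<u>$1</u>' },
  { pattern: /-([^-]+)-/g, replacement: '<s>$1</s>' },
  // Label and URL may not contain ']' or '|'; spaces around '|' are trimmed.
  { pattern: /\[\s*([^\]|]+?)\s*\|\s*([^\]]+?)\s*\]/g, replacement: '<a href="$2">$1</a>' }
];

function formatUserContent(text) {
  // Apply every rule in turn to the running result.
  return formatRules.reduce(function (result, rule) {
    return result.replace(rule.pattern, rule.replacement);
  }, text);
}

// Extending the markup is now a data change, e.g. +emphasis+:
formatRules.push({ pattern: /\+([^+]+)\+/g, replacement: '<em>$1</em>' });
```

The function body never changes; adding or reordering tags only touches the table.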

As a side note, I would recommend making your link regex a bit more foolproof:

 text = text.replace(/\[([^\|].+)\|\s*(http:\/\/.+)\]/g, '<a href="$2">$1</a>');

(This matters if you have no UI controls that generate the markup for the user.)
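The greedy `.+` in these link patterns is the main thing to watch: with two links on one line, a single match swallows everything between the first `[` and the last `]`. A minimal sketch comparing the answer's pattern (with the slashes escaped) against a hypothetical stricter `strictLink` that stops at the first `]` and only accepts `http://` or `https://` URLs:

```javascript
// The answer's pattern, with the slashes in http:// escaped so it parses.
var greedyLink = /\[([^\|].+)\|\s*(http:\/\/.+)\]/g;

// Hypothetical stricter pattern: the label and URL cannot contain ']' or '|',
// so each match ends at the first closing bracket, and the URL must start
// with http:// or https:// and contain no whitespace.
var strictLink = /\[\s*([^\]|]+?)\s*\|\s*(https?:\/\/[^\]\s]+)\s*\]/g;

function toLinks(text, pattern) {
  return text.replace(pattern, '<a href="$2">$1</a>');
}

var sample = '[one | http://a.com] and [two | http://b.com]';
var greedyResult = toLinks(sample, greedyLink); // one <a> spanning both links
var strictResult = toLinks(sample, strictLink); // two separate <a> elements
```

With `strictLink`, input such as `[x | javascript:alert(1)]` is left untouched instead of becoming a link.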

+3

Sylverdrag Dec 22 '12 at 16:07

You might want to use an existing library, such as the Showdown Markdown library at http://www.showdown.im/

If you would rather write your own, I would recommend reading its source code to see how it does the parsing (and possibly the source of Markdown processors in other languages). A few recommendations:

Use jQuery to manipulate the markup.
Do not use regular expressions to parse a language; you will run into problems when markup elements are mixed together.
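To illustrate the non-regex alternative, here is a minimal sketch of a hand-written scanner: it walks the text one character at a time and keeps a stack of open markers, so nesting is handled by construction. The `MARKERS` table and `parseMarkup` name are hypothetical, links are left out for brevity, and badly nested or unclosed markers are simply kept as literal characters:

```javascript
// Hypothetical marker table: one character per tag. Adding a tag = adding an entry.
var MARKERS = { '*': 'b', '_': 'u', '-': 's' };

function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function parseMarkup(text) {
  var out = [];    // output pieces, joined at the end
  var stack = [];  // open markers: { ch: marker char, index: its slot in out }
  var buffer = ''; // plain text collected since the last marker

  function flush() {
    if (buffer) {
      out.push(escapeHtml(buffer));
      buffer = '';
    }
  }

  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (!Object.prototype.hasOwnProperty.call(MARKERS, ch)) {
      buffer += ch;
      continue;
    }
    flush();

    // Look for an open marker of the same kind, innermost first.
    var depth = -1;
    for (var j = stack.length - 1; j >= 0; j--) {
      if (stack[j].ch === ch) { depth = j; break; }
    }

    if (depth === -1) {
      // Opening marker: emit the raw character as a placeholder for now.
      stack.push({ ch: ch, index: out.length });
      out.push(ch);
    } else {
      // Closing marker: turn its placeholder into an opening tag. Markers opened
      // after it are badly nested, so they are dropped and stay literal text.
      var tag = MARKERS[ch];
      out[stack[depth].index] = '<' + tag + '>';
      out.push('</' + tag + '>');
      stack.length = depth;
    }
  }
  flush();
  // Anything still on the stack was never closed and remains a literal character.
  return out.join('');
}
```

For example, `*bold _and underlined_*` becomes `<b>bold <u>and underlined</u></b>`, while a lone hyphen in `well-known` stays as plain text.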

+1

Rob Dec 22 '12 at 15:36

Minko Gechev · Accepted Answer · 2012-12-22T16:16:03+0000

I can suggest the following implementation: http://jsfiddle.net/NwRCm/5/

It uses the State pattern (slightly modified to suit JavaScript and the task). Under the hood, all the states are implemented with regular expressions, but in my opinion that is the most effective way.

 /* View definition */
 function View(container) {
     this.container = container;   // DOM element whose innerHTML is parsed
     this._parsers = [];           // parsers (states) in the order they run
     this._currentState = 0;       // index of the next parser for nextState()
 }

 // Runs every parser over the container's HTML.
 View.prototype.parse = function () {
     var self = this;
     this._parsers.forEach(function (e) {
         self._parse(e);
     });
     return this.container.innerHTML;
 };

 View.prototype._parse = function (parser) {
     var text = parser.parse(this.container.innerHTML);
     this.container.innerHTML = text;
     return text;
 };

 // Applies only the next parser; returns null once all have run.
 View.prototype.nextState = function () {
     if (this._currentState < this._parsers.length) {
         return this._parse(this._parsers[this._currentState++]);
     }
     return null;
 };

 View.prototype.addParser = function (parser) {
     if (parser instanceof Parser) {
         return this._parsers.push(parser);
     } else {
         throw 'The parser you\'re trying to add is not an instance of Parser';
     }
 };
 /* end of the View definition */

 /* Simulation of interface */
 function Parser() {}
 Parser.prototype.parse = function () {
     throw 'Not implemented!';
 };

 /* Implementation of bold parser */
 function BoldParser() {}
 BoldParser.prototype = new Parser();
 BoldParser.prototype.parse = function (text) {
     text = text.replace(/\*([^\*]+)\*/g, '<b>$1</b>');
     return text;
 };

 /* Implementation of underline parser */
 function UnderlineParser() {}
 UnderlineParser.prototype = new Parser();
 UnderlineParser.prototype.parse = function (text) {
     text = text.replace(/\_([^\_]+)\_/g, '<u>$1</u>');
     return text;
 };

 /* Link parser */
 function LinkParser() {}
 LinkParser.prototype = new Parser();
 LinkParser.prototype.parse = function (text) {
     text = text.replace(/\[([^\|].+)\|(.+)\]/g, '<a href="$2">$1</a>');
     return text;
 };

 var v = new View(document.getElementById('container'));
 v.addParser(new UnderlineParser());
 v.addParser(new BoldParser());
 v.addParser(new LinkParser());
 v.nextState();
 v.nextState();
 v.nextState();

Let me go a little deeper into the implementation. First we have a base "class" (constructor function), View. Each view has a container and a list of parsers; it also remembers which parser should be applied next.

Next we have an "abstract class" (a constructor function whose prototype method throws an exception) named Parser; it defines the parse method, which every parser must implement.

After that, we simply define the concrete parsers and add them to the view. We can step through the states one at a time (View.nextState) or apply all of them in one call (View.parse). New parsers can be added dynamically.
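To try the step-by-step behaviour outside a browser, the same design can work on a plain string instead of a DOM container. A minimal sketch; `TextView` is a hypothetical name, `Object.create` replaces `new Parser()` for setting up the prototype chain, and a `StrikethroughParser` (the tag the question also asked for) is added after parsing has already started:

```javascript
// Base "interface", as in the answer.
function Parser() {}
Parser.prototype.parse = function () {
  throw new Error('Not implemented!');
};

// Hypothetical string-backed view: same states as View, but no DOM.
function TextView(text) {
  this.text = text;
  this._parsers = [];
  this._currentState = 0;
}

TextView.prototype.addParser = function (parser) {
  if (!(parser instanceof Parser)) {
    throw new TypeError('The parser you are trying to add is not an instance of Parser');
  }
  return this._parsers.push(parser);
};

// Applies the next parser only; returns the new text, or null when none are left.
TextView.prototype.nextState = function () {
  if (this._currentState >= this._parsers.length) {
    return null;
  }
  this.text = this._parsers[this._currentState++].parse(this.text);
  return this.text;
};

// Applies every parser that has not run yet and returns the final text.
TextView.prototype.parse = function () {
  while (this.nextState() !== null) {
    // each call advances one state
  }
  return this.text;
};

function BoldParser() {}
BoldParser.prototype = Object.create(Parser.prototype);
BoldParser.prototype.constructor = BoldParser;
BoldParser.prototype.parse = function (text) {
  return text.replace(/\*([^*]+)\*/g, '<b>$1</b>');
};

function StrikethroughParser() {}
StrikethroughParser.prototype = Object.create(Parser.prototype);
StrikethroughParser.prototype.constructor = StrikethroughParser;
StrikethroughParser.prototype.parse = function (text) {
  return text.replace(/-([^-]+)-/g, '<s>$1</s>');
};

var view = new TextView('*bold* and -struck-');
view.addParser(new BoldParser());
var afterFirst = view.nextState();         // only the bold state has run
view.addParser(new StrikethroughParser()); // added dynamically
var finalText = view.parse();              // runs the remaining state
```

One design difference: this `parse` only runs the parsers that `nextState` has not applied yet, so mixing the two calls never applies a parser twice, whereas `View.parse` in the answer runs every parser each time it is called.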

One possible improvement is a flyweight factory to manage the parsers.
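A minimal sketch of what such a factory could look like. The parsers hold no per-call state, so one shared instance per kind is enough; `ParserFactory`, `register` and `get` are hypothetical names, and parsers are plain objects with a `parse(text)` method in this sketch:

```javascript
// Hypothetical flyweight factory: creates each kind of parser once, on first
// request, and hands back the cached shared instance afterwards.
var ParserFactory = (function () {
  var creators = {};  // kind -> function that builds a new parser
  var instances = {}; // kind -> shared parser instance

  return {
    register: function (kind, create) {
      creators[kind] = create;
    },
    get: function (kind) {
      if (!Object.prototype.hasOwnProperty.call(instances, kind)) {
        if (!Object.prototype.hasOwnProperty.call(creators, kind)) {
          throw new Error('Unknown parser kind: ' + kind);
        }
        instances[kind] = creators[kind]();
      }
      return instances[kind];
    }
  };
})();

ParserFactory.register('bold', function () {
  return { parse: function (text) { return text.replace(/\*([^*]+)\*/g, '<b>$1</b>'); } };
});
ParserFactory.register('underline', function () {
  return { parse: function (text) { return text.replace(/_([^_]+)_/g, '<u>$1</u>'); } };
});

// Usage: apply parsers by name; repeated lookups return the same object.
var html = ['underline', 'bold'].reduce(function (text, kind) {
  return ParserFactory.get(kind).parse(text);
}, '*a* _b_');
```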

The abstract constructor function approach is also very useful for implementing other patterns, such as Template Method.
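For example, a hypothetical `RegexParser` base could make `parse` the template method and leave only the regex and the replacement string to subclasses, which removes the repeated `text.replace(...)` boilerplate from every concrete parser. A minimal sketch under that assumption:

```javascript
// Abstract base. parse() is the template method: the algorithm is fixed here...
function RegexParser() {}
RegexParser.prototype.parse = function (text) {
  return text.replace(this.pattern(), this.replacement());
};
// ...and these hooks are the steps each concrete parser must supply.
RegexParser.prototype.pattern = function () { throw new Error('Not implemented!'); };
RegexParser.prototype.replacement = function () { throw new Error('Not implemented!'); };

function UnderlineParser() {}
UnderlineParser.prototype = Object.create(RegexParser.prototype);
UnderlineParser.prototype.constructor = UnderlineParser;
UnderlineParser.prototype.pattern = function () { return /_([^_]+)_/g; };
UnderlineParser.prototype.replacement = function () { return '<u>$1</u>'; };

function LinkParser() {}
LinkParser.prototype = Object.create(RegexParser.prototype);
LinkParser.prototype.constructor = LinkParser;
LinkParser.prototype.pattern = function () { return /\[\s*([^\]|]+?)\s*\|\s*([^\]]+?)\s*\]/g; };
LinkParser.prototype.replacement = function () { return '<a href="$2">$1</a>'; };
```

A new tag then needs only its two hook methods; the shared `parse` stays untouched.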

Edit: this may be a bit of overkill, given all the constructor functions and objects it defines. Everything could be done with callbacks instead, i.e. each state would be a separate function. I used this approach because I was looking for the one that is easiest to understand and independent of language-specific features. I hope I achieved that.
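A minimal sketch of the callback variant, assuming each state is a plain function from text to text (the function names are hypothetical). `createStepper` plays the role of `nextState` and `runAll` the role of `parse`:

```javascript
// Each state is just a function from text to text.
var states = [
  function underline(text) { return text.replace(/_([^_]+)_/g, '<u>$1</u>'); },
  function bold(text) { return text.replace(/\*([^*]+)\*/g, '<b>$1</b>'); },
  function strike(text) { return text.replace(/-([^-]+)-/g, '<s>$1</s>'); }
];

// Returns a stepper: each call applies the next state; null once all have run.
function createStepper(text, stateFns) {
  var current = 0;
  return function next() {
    if (current >= stateFns.length) {
      return null;
    }
    text = stateFns[current++](text);
    return text;
  };
}

// Running all states at once is a fold over the list.
function runAll(text, stateFns) {
  return stateFns.reduce(function (acc, fn) { return fn(acc); }, text);
}
```

Adding a state is just pushing another function onto `states`; no constructors or prototype setup are needed.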