Wrapping Hebrew and English text in a div with span tags

I am trying to wrap each Hebrew and English run of text in a paragraph in a span tag. For example, "so היי all whats up אתכם?" should become:

<span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span>

I tried a regular expression, but it just drops the Hebrew words and joins all the English words into a single span:

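    var str = 'so היי all whats up אתכם?';
    // Only ASCII letters are matched, so the Hebrew words are never captured
    var match = str.match(/(\b[a-z]+\b)/ig);
    // match.join() glues the English words together with commas: "so,all,whats,up"
    var replace = match.join().replace(match.join(), '<span>' + match.join() + '</span>');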
3 answers

The previous answers here do not take the whole-word requirement into account. That is hard to achieve, because in JavaScript \b only recognizes boundaries next to the ASCII word characters [A-Za-z0-9_]. It does not work next to Hebrew letters, which here we match with a character class written with \u escapes.

I suggest using a lookahead and capture groups to make sure we capture the whole Hebrew word: (^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF]) requires a non-Hebrew character or the start of the string before the Hebrew word, and no Hebrew character right after it (add \s to the Hebrew character classes if there are spaces between Hebrew words!). For English, \b[a-z\s]+\b matches a sequence of whole English words separated by spaces.
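To see the problem concretely, here is a small sketch (the names boundaryWorks, hebrewWord and hebrewWords are only for illustration): \b finds no boundary next to Hebrew letters, while the lookahead-based pattern extracts the whole Hebrew words. It uses String.prototype.matchAll, so it assumes an ES2020-capable engine.

```javascript
// \b only treats [A-Za-z0-9_] as word characters, so it never finds a
// boundary next to a Hebrew letter and this test fails.
const boundaryWorks = /\bהיי\b/.test('so היי all');

// Whole-word Hebrew pattern from the answer: start of string or a
// non-Hebrew character before the word, negative lookahead after it.
const hebrewWord = /(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])/g;

// Returns the whole Hebrew words found in text (capture group 2).
function hebrewWords(text) {
  return Array.from(text.matchAll(hebrewWord), (m) => m[2]);
}

console.log(boundaryWorks); // false
console.log(hebrewWords('so היי all whats up אתכם?')); // [ 'היי', 'אתכם' ]
```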

If you want to embed <span> tags around whole words in a sentence, here is a snippet that might help:

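    var str = 'so היי all whats up אתכם?';
    //var str = 'so, היי, all whats up אתכם?';
    // Wrap each run of whole English words; the spaces around it are dropped
    var result = str.replace(/\s*(\b[a-z\s]+\b)\s*/ig, '<span>$1</span>');
    // Wrap each whole Hebrew word; $1 puts back the preceding non-Hebrew character
    result = result.replace(/(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])/g, '$1<span>$2</span>');
    document.getElementById("r").innerHTML = result;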
 span { background:#FFCCCC; border:1px solid #0000FF; } 
    <div width="645" id="r"></div>

Result:

 <span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span>? 

If you don't need any punctuation or alphanumeric tokens in your output and just want to collect all the English and Hebrew words, you can use:

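    var str = 'היי, User234, so 222היי all whats up אתכם?';
    // Alternative 1: a whole Hebrew word (groups 1-2); alternative 2: a run of English words (group 3)
    var re = /(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])|(\b[a-z\s]+\b)/ig;
    var res = [];
    var m;
    while ((m = re.exec(str)) !== null) {
        // Guard against an infinite loop on zero-length matches
        if (m.index === re.lastIndex) {
            re.lastIndex++;
        }
        if (m[1] !== undefined) {
            // Hebrew alternative: keep the word, drop the preceding character
            res.push('<span>' + m[2].trim() + '</span>');
        } else {
            // English alternative
            res.push('<span>' + m[3].trim() + '</span>');
        }
    }
    document.getElementById("r").innerHTML = res.join("");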
 span { background:#FFCCCC; border:1px solid #0000FF; } 
    <div width="645" id="r"></div>

Result:

 <span>היי</span><span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span> 
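On an ES2020+ engine, the same loop can be written with matchAll, which manages lastIndex for you. A sketch under that assumption (wrapWords is an illustrative name, not part of the answer): it reuses the combined regex and checks group 2 to tell a Hebrew match from an English one.

```javascript
// Same combined pattern as the exec() loop: a whole Hebrew word (groups 1-2)
// or a run of whole English words separated by spaces (group 3).
const combined = /(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])|(\b[a-z\s]+\b)/gi;

function wrapWords(text) {
  const spans = [];
  for (const m of text.matchAll(combined)) {
    // Group 2 is only set when the Hebrew alternative matched.
    const word = m[2] !== undefined ? m[2] : m[3].trim();
    spans.push('<span>' + word + '</span>');
  }
  return spans.join('');
}

console.log(wrapWords('היי, User234, so 222היי all whats up אתכם?'));
```

Neither alternative can match an empty string, so the zero-length-match guard from the loop is not needed here.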

I think the regex you want is similar to [^a-z\u0591-\u05F4\s]. I'm not quite sure how you want to handle spaces.

My solution:

1. Copy str into a new variable, finalStr, removing any characters that are not A-Z, Hebrew, or whitespace.
2. Find all runs of English characters (a-z) in str and wrap each one in a span using finalStr.replace.
3. Do the same for the Hebrew characters.

It's not 100% perfect, but IMO it works quite well.

    var str = 'so היי all whats up אתכם?';
    // Step 1: keep only a-z, Hebrew and whitespace
    var finalStr = str.replace(/([^a-z\u0591-\u05F4\s])/gi, '');
    // Step 2: wrap each run of English characters
    var rgx = /([a-z ]+)/gi;
    var mat = str.match(rgx);
    for (var i = 0; i < mat.length; ++i) {
        var match = mat[i];
        finalStr = finalStr.replace(match.trim(), '<span>' + match.trim() + '</span>');
    }
    // Step 3: do the same for runs of Hebrew characters
    rgx = /([\u0591-\u05F4 ]+)/gi;
    mat = str.match(rgx);
    for (var i = 0; i < mat.length; ++i) {
        var match = mat[i];
        finalStr = finalStr.replace(match.trim(), '<span>' + match.trim() + '</span>');
    }
    // Assumes the page has an element with id="res"
    document.getElementById('res').innerHTML = finalStr;

http://jsfiddle.net/daveSalomon/0ns6nuxy/1/
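One pitfall with calling finalStr.replace(match.trim(), ...) repeatedly is that each call replaces the first occurrence anywhere in the string, including inside markup inserted earlier: an English word such as "span" would hit the new <span> tags. A single replace with a callback avoids that. This is a variant of the steps above, not the answer's own code, and wrapRuns is a hypothetical name:

```javascript
// Hypothetical one-pass variant of the steps above.
function wrapRuns(str) {
  // Step 1: drop everything except a-z, Hebrew and whitespace.
  const cleaned = str.replace(/[^a-z\u0591-\u05F4\s]/gi, '');
  // Steps 2 and 3 in one pass: a run of English words or a run of Hebrew
  // words is wrapped once, so no later replacement can touch inserted tags.
  return cleaned.replace(
    /[a-z]+(?:\s+[a-z]+)*|[\u0591-\u05F4]+(?:\s+[\u0591-\u05F4]+)*/gi,
    (run) => '<span>' + run + '</span>'
  );
}

console.log(wrapRuns('so היי all whats up אתכם?'));
console.log(wrapRuns('span היי span'));
```

Like the original, it leaves the spaces between runs in place.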


Based on this post, you can try something like this:

    ((?:\s*\w+)+|(?:\s*[\u0590-\u05FF]+)+?(?=\s?[A-Za-z0-9!?.]))

https://regex101.com/r/kA3yV5/4

You may need to adapt it to your specific cases (for example, if other non-word characters start to appear), but it does the trick. It first tries to match words and sentences made of English word characters (\w); if that fails, it matches Hebrew words or sentences until an English character, a digit, or one of !?. follows.

This is not perfect yet: you may want to add other punctuation characters, and some matches start with unwanted spaces. JavaScript regexes had no lookbehind support when I wrote this (it was added in ES2018), so I found no way to exclude those spaces in the regex itself, but since they end up in capture group 1 they can be trimmed from the matched string.
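A minimal sketch of applying that regex and trimming the leading spaces afterwards (runs is an illustrative name; matchAll assumes an ES2020+ engine):

```javascript
// The answer's regex; the whole match is capture group 1.
const re = /((?:\s*\w+)+|(?:\s*[\u0590-\u05FF]+)+?(?=\s?[A-Za-z0-9!?.]))/g;

// Collect each English or Hebrew run, trimming the leading spaces
// that the \s* parts of the pattern pull into group 1.
function runs(text) {
  return Array.from(text.matchAll(re), (m) => m[1].trim());
}

console.log(runs('so היי all whats up אתכם?'));
```

As written, the Hebrew alternative needs a letter, digit, or one of !?. after it, so a Hebrew run at the very end of the string is not matched: runs('so היי') returns only ['so'].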

