How can I combine a character followed by a "combining accent" into a single character?

I take a phrase that the user enters on a web page and send it to a French-English dictionary. Sometimes the dictionary lookup fails because most accented characters have two representations. For instance:

  • é can be written as a single character: \xE9 (LATIN SMALL LETTER E WITH ACUTE).
  • But it can also be written as two characters: e + \u0301 (COMBINING ACUTE ACCENT).

I always want to send the first form (the single character) to the dictionary.
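To see why the lookup can fail, here is a small sketch (the variable names are only illustrative) showing that the two forms look the same on screen but are different strings:

```javascript
// Two ways to write "é": precomposed and decomposed.
var composed = '\xE9';        // U+00E9, a single code unit
var decomposed = 'e\u0301';   // U+0065 followed by U+0301

console.log(composed.length);          // 1
console.log(decomposed.length);        // 2
console.log(composed === decomposed);  // false
```

A dictionary that stores only the precomposed form will therefore not match the decomposed input.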

Right now I handle this by replacing every two-character sequence I find with the equivalent single character. But is there a simpler (i.e., one-line) way to do this, either in JavaScript or in the browser, when the text is extracted from the input field?

    function translate(phrase) {
        // Combine accents into a single accented character, if necessary.
        var TRANSFORM = [
            // Acute accent.
            [/E\u0301/g, "\xC9"], // É
            [/e\u0301/g, "\xE9"], // é

            // Grave accent.
            [/a\u0300/g, "\xE0"], // à
            [/e\u0300/g, "\xE8"], // è
            [/u\u0300/g, "\xF9"], // ù

            // Cedilla (no combining accent).

            // Circumflex.
            [/a\u0302/g, "\xE2"], // â
            [/e\u0302/g, "\xEA"], // ê
            [/i\u0302/g, "\xEE"], // î
            [/o\u0302/g, "\xF4"], // ô
            [/u\u0302/g, "\xFB"], // û

            // Trema.
            [/e\u0308/g, "\xEB"], // ë
            [/i\u0308/g, "\xEF"], // ï
            [/u\u0308/g, "\xFC"]  // ü

            // oe ligature (no combining accent).
        ];

        for (var i = 0; i < TRANSFORM.length; i++)
            phrase = phrase.replace(TRANSFORM[i][0], TRANSFORM[i][1]);

        // Do translation.
        ...
    }
1 answer

This is called normalization, and it looks like you want NFC normalization:

Characters are decomposed and then recomposed by canonical equivalence.

In other words, it replaces any combining sequence with the equivalent single character, where one exists.

This is built into ECMAScript 6 as String.prototype.normalize, so if you only need to support newer browsers, you can simply do the following:

 phrase = phrase.normalize('NFC'); 
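As a quick check, here is a minimal sketch, assuming an environment where String.prototype.normalize is available (the variable names are only illustrative). It shows a decomposed phrase being turned into the precomposed form, while text that is already precomposed passes through unchanged:

```javascript
// "café" typed with a combining acute accent after the "e".
var typed = 'cafe\u0301';
var cleaned = typed.normalize('NFC');

console.log(typed.length);            // 5
console.log(cleaned.length);          // 4
console.log(cleaned === 'caf\xE9');   // true

// Already-precomposed input is left as it is.
console.log('caf\xE9'.normalize('NFC') === 'caf\xE9'); // true
```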

To support older browsers, it looks like this library does what you want:
https://github.com/walling/unorm

Usage would be phrase = UNorm.nfc(phrase).
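If you need both paths, one option is to feature-detect the native method and only fall back when it is missing. The sketch below is an assumption about how you might wire this up, not part of either API: normalizePhrase and its fallbackNfc parameter are hypothetical names, and fallbackNfc stands in for whatever NFC function your polyfill library exposes.

```javascript
// Hypothetical helper: prefer the native NFC normalization and use the
// supplied fallback function only when the native method is unavailable.
function normalizePhrase(phrase, fallbackNfc) {
  if (typeof String.prototype.normalize === 'function') {
    return phrase.normalize('NFC');
  }
  // fallbackNfc is an assumption: pass in the library's NFC function here.
  return fallbackNfc(phrase);
}
```

Passing the fallback in as an argument keeps the helper independent of how the library happens to be loaded.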
