Find Unicode characters that look like ASCII characters

Does anyone know an easy way to find Unicode characters that look like ASCII characters? An example is CYRILLIC SMALL LETTER DZE (ѕ). I would like to search for such lookalike characters and replace them. By "similar" I mean that they look the same to a human reader: you cannot see the difference just by looking at them.

+7
replace unicode ascii similarity fuzzy
2 answers

As other commenters note, Unicode normalization ("compatibility characters") will not help you here, since you are looking not for official equivalences but for similarities between glyphs (letter forms). (The related Unicode Technical Report is still worth reading, though, since it is very well written.)
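To see why normalization falls short, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `nfkc_fold` is made up for illustration. FULLWIDTH LATIN SMALL LETTER S has an official compatibility mapping to ASCII s and is folded, while CYRILLIC SMALL LETTER DZE has none and passes through unchanged.

```python
import unicodedata


def nfkc_fold(text):
    """Apply compatibility normalization (NFKC) to the text."""
    return unicodedata.normalize("NFKC", text)


fullwidth_s = "\uff53"  # FULLWIDTH LATIN SMALL LETTER S: has a compatibility mapping
cyrillic_dze = "\u0455"  # CYRILLIC SMALL LETTER DZE: only *looks* like "s"

for ch in (fullwidth_s, cyrillic_dze):
    folded = nfkc_fold(ch)
    status = "folded to ASCII" if folded.isascii() else "unchanged by NFKC"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")
```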

If I were you, to save myself the tedious job of collecting a list of characters, I would look for resources on homograph attacks: this is a way of maliciously misleading web users by displaying URLs with domain names in which some letters are replaced by visually similar letters. Another Unicode Technical Report, the one on security, contains a section about this problem. There is also, and this may be what you need most, a "confusables" table. Here is another article listing characters, mostly punctuation and some of them ASCII, that have visually similar counterparts in non-ASCII code charts.
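A search and replace driven by such a table could be sketched as follows. This assumes the table is a text file with one mapping per line in the form `source ; target ; type # comment`, with code points in hexadecimal. The `SAMPLE` lines are a hand-written excerpt in that assumed layout, not the real file, and the function names are hypothetical.

```python
def parse_confusables(lines):
    """Build a str.translate table {source code point: ASCII lookalike}."""
    table = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # blank line, comment-only line, or malformed entry
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        # Keep only single non-ASCII characters whose lookalike is pure ASCII.
        if len(source) == 1 and not source.isascii() and target.isascii():
            table[ord(source)] = target
    return table


def to_ascii_lookalikes(text, table):
    """Replace every character found in the table by its ASCII lookalike."""
    return text.translate(table)


# Hand-written sample in the assumed layout (illustration only).
SAMPLE = [
    "# source ; target ; type",
    "0455 ;\t0073 ;\tMA\t# CYRILLIC SMALL LETTER DZE -> LATIN SMALL LETTER S",
    "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
    "043E ;\t006F ;\tMA\t# CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O",
]

table = parse_confusables(SAMPLE)
print(to_ascii_lookalikes("pa\u0455\u0455w\u043erd", table))
```

With the real table you would pass the opened file instead of `SAMPLE`. Restricting the mapping to ASCII targets is a design choice for this question; the full table also relates non-ASCII characters to each other.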

I hope you are not asking because you want to mount such an attack.

+11

See the Unicode database: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt .

Each line describes a Unicode character, for example:

1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;; 

If a character has similar (compatible) characters, they appear in the decomposition field of its entry (the sixth semicolon-separated field), tagged <compat>. In this example, LATIN SMALL LETTER A WITH RIGHT HALF RING is compatible with the sequence 0061 (ASCII a) followed by 02BE (MODIFIER LETTER RIGHT HALF RING).

As for your character, the entry is

 0455;CYRILLIC SMALL LETTER DZE;Ll;0;L;;;;;N;;;0405;;0405 

which, as you can see, has an empty decomposition field and so does not list any compatible character.
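The lookup described above can be sketched in a few lines of Python. The function name `decomposition_field` is made up for illustration, and the sketch assumes the semicolon-separated layout of the two entries quoted above, with the decomposition in the sixth field.

```python
def decomposition_field(line):
    """Return (name, tag, characters) for one UnicodeData.txt entry."""
    fields = line.strip().split(";")
    name, decomposition = fields[1], fields[5]
    if not decomposition:
        return name, None, []  # no decomposition recorded for this character
    parts = decomposition.split()
    # A tag such as <compat> marks a compatibility decomposition;
    # without a tag the mapping is a canonical one.
    tag = parts[0] if parts[0].startswith("<") else None
    code_points = parts[1:] if tag else parts
    return name, tag, [chr(int(cp, 16)) for cp in code_points]


entries = [
    "1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;;",
    "0455;CYRILLIC SMALL LETTER DZE;Ll;0;L;;;;;N;;;0405;;0405",
]

for entry in entries:
    name, tag, chars = decomposition_field(entry)
    if chars:
        targets = " ".join(f"U+{ord(c):04X}" for c in chars)
        print(f"{name}: {tag or 'canonical'} {targets}")
    else:
        print(f"{name}: no decomposition")
```

If you only need the field for a few characters, `unicodedata.decomposition()` in the Python standard library returns the same string without downloading the file.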

-1
