How to get an "English name" for a character?

Question

I just made use of this very helpful link: How to check if a given string is a legal / valid file name under Windows?

Inside some validation code I have something like the following (ignore the fact that I don't use the StringBuilder class, and ignore the flaw in how the message is composed: there is no need to report "Colon" more than once if it appears in the string more than once):

    string InvalidFileNameChars = new string(Path.GetInvalidFileNameChars());
    Regex ContainsABadChar = new Regex("[" + Regex.Escape(InvalidFileNameChars) + "]");
    MatchCollection BadChars = ContainsABadChar.Matches(txtFileName.Text);

    if (BadChars.Count > 0)
    {
        string Msg = "The following invalid characters were detected:\r\n\r\n";

        foreach (Match Bad in BadChars)
        {
            Msg += Bad.Value + "\r\n";
        }

        MessageBox.Show(Msg, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
        return;
    }

The MessageBox will look something like this (using an example where a colon was found):

- begin -

The following invalid characters were detected:

:

- end -

I would like to say something like:

- begin -

The following invalid characters were detected:

Colon → :

- end -

I would like to have the English name. It's not a deal-breaker, but I was curious whether there is a function out there (it does not exist on the Char class, but may exist in some other class that I am not thinking of), something like:

    Char.GetEnglishName(':');

+7

c# .net char character

Justooking Jan 19 '12 at 18:29

4 answers

The problem you will face is that you need to cover the whole Unicode space, which is large. If you really want this, load the contents of this page into a dictionary, then use this extension method on char:

    public static string ToName(this char c)
    {
        // TryGetValue sets result to null on a miss, so choose the fallback
        // explicitly: "", "unknown", null, or whatever suits you.
        string result;
        return _charToName.TryGetValue(c, out result) ? result : "";
    }

    // ...
    string name = c.ToName();

+4

plinth Jan 19 '12 at 18:37

I have a dictionary of character names, compiled from various sources, for a personal tool I made to look up Unicode characters: http://jumpingfishes.com/unicodechars.htm

The dictionary is expressed as a JavaScript array and contains 20,761 definitions. Feel free to take my JavaScript and use it to build a C# dictionary:
http://jumpingfishes.com/unicodeDescriptions.js

Edit: Better yet, here is the text file I used to generate my JavaScript. It may be a little easier to parse when building a C# dictionary. Each line contains the character code in hexadecimal, followed by a tab, and then a description of the character.
http://jumpingfishes.com/unicodeDictionary.txt
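
As a rough sketch of how such a file could be loaded, assuming each line is exactly "hex code, tab, description" with no "0x" or "U+" prefix (the class and method names here are hypothetical, not part of any library):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class UnicodeNameTable
{
    // Parses lines of the form "<hex code point><TAB><description>".
    // Keys are ints rather than chars because code points can exceed 0xFFFF.
    public static Dictionary<int, string> Parse(TextReader reader)
    {
        var names = new Dictionary<int, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int tab = line.IndexOf('\t');
            if (tab <= 0) continue; // skip blank or malformed lines

            int codePoint;
            if (int.TryParse(line.Substring(0, tab), NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture, out codePoint))
            {
                names[codePoint] = line.Substring(tab + 1).Trim();
            }
        }
        return names;
    }
}
```

With a local copy of the file, something like `UnicodeNameTable.Parse(File.OpenText("unicodeDictionary.txt"))` should produce the table; a lookup for a char `c` is then `names[(int)c]`, or `TryGetValue` if the character may be missing.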

+1

gilly3 Jan 19 '12 at 19:23

As stated by @rik-hemsley in an answer to the question "Search for a Unicode character name in .Net":

It's now easier than ever, since there is a NuGet package called Unicode Information.

With this, you can simply call:

 UnicodeInfo.GetName(character)

+1

Jjs Feb 23 '15 at 1:32

Ryan emerle · Accepted Answer · 2012-01-19T19:10:24+0000

You can simply use the Unicode controls and Basic Latin block if you don't need to account for every character, ever.

You can define the table as a simple string array for fast lookup:

    string[] lookup = new string[128];
    lookup[0x00] = "Null character";
    lookup[0x01] = "Start of Heading";
    lookup[0x02] = "Start of Text";
    lookup[0x03] = "End-of-text character";
    lookup[0x04] = "End-of-transmission character";
    lookup[0x05] = "Enquiry character";
    lookup[0x06] = "Acknowledge character";
    lookup[0x07] = "Bell character";
    lookup[0x08] = "Backspace";
    lookup[0x09] = "Horizontal tab";
    lookup[0x0A] = "Line feed";
    lookup[0x0B] = "Vertical tab";
    lookup[0x0C] = "Form feed";
    lookup[0x0D] = "Carriage return";
    lookup[0x0E] = "Shift Out";
    lookup[0x0F] = "Shift In";
    lookup[0x10] = "Data Link Escape";
    lookup[0x11] = "Device Control 1";
    lookup[0x12] = "Device Control 2";
    lookup[0x13] = "Device Control 3";
    lookup[0x14] = "Device Control 4";
    lookup[0x15] = "Negative-acknowledge character";
    lookup[0x16] = "Synchronous Idle";
    lookup[0x17] = "End of Transmission Block";
    lookup[0x18] = "Cancel character";
    lookup[0x19] = "End of Medium";
    lookup[0x1A] = "Substitute character";
    lookup[0x1B] = "Escape character";
    lookup[0x1C] = "File Separator";
    lookup[0x1D] = "Group Separator";
    lookup[0x1E] = "Record Separator";
    lookup[0x1F] = "Unit Separator";
    lookup[0x20] = "Space";
    lookup[0x21] = "Exclamation mark";
    lookup[0x22] = "Quotation mark";
    lookup[0x23] = "Number sign";
    lookup[0x24] = "Dollar sign";
    lookup[0x25] = "Percent sign";
    lookup[0x26] = "Ampersand";
    lookup[0x27] = "Apostrophe";
    lookup[0x28] = "Left parenthesis";
    lookup[0x29] = "Right parenthesis";
    lookup[0x2A] = "Asterisk";
    lookup[0x2B] = "Plus sign";
    lookup[0x2C] = "Comma";
    lookup[0x2D] = "Hyphen-minus";
    lookup[0x2E] = "Full stop";
    lookup[0x2F] = "Slash";
    lookup[0x30] = "Digit Zero";
    lookup[0x31] = "Digit One";
    lookup[0x32] = "Digit Two";
    lookup[0x33] = "Digit Three";
    lookup[0x34] = "Digit Four";
    lookup[0x35] = "Digit Five";
    lookup[0x36] = "Digit Six";
    lookup[0x37] = "Digit Seven";
    lookup[0x38] = "Digit Eight";
    lookup[0x39] = "Digit Nine";
    lookup[0x3A] = "Colon";
    lookup[0x3B] = "Semicolon";
    lookup[0x3C] = "Less-than sign";
    lookup[0x3D] = "Equal sign";
    lookup[0x3E] = "Greater-than sign";
    lookup[0x3F] = "Question mark";
    lookup[0x40] = "At sign";
    lookup[0x41] = "Latin Capital letter A";
    lookup[0x42] = "Latin Capital letter B";
    lookup[0x43] = "Latin Capital letter C";
    lookup[0x44] = "Latin Capital letter D";
    lookup[0x45] = "Latin Capital letter E";
    lookup[0x46] = "Latin Capital letter F";
    lookup[0x47] = "Latin Capital letter G";
    lookup[0x48] = "Latin Capital letter H";
    lookup[0x49] = "Latin Capital letter I";
    lookup[0x4A] = "Latin Capital letter J";
    lookup[0x4B] = "Latin Capital letter K";
    lookup[0x4C] = "Latin Capital letter L";
    lookup[0x4D] = "Latin Capital letter M";
    lookup[0x4E] = "Latin Capital letter N";
    lookup[0x4F] = "Latin Capital letter O";
    lookup[0x50] = "Latin Capital letter P";
    lookup[0x51] = "Latin Capital letter Q";
    lookup[0x52] = "Latin Capital letter R";
    lookup[0x53] = "Latin Capital letter S";
    lookup[0x54] = "Latin Capital letter T";
    lookup[0x55] = "Latin Capital letter U";
    lookup[0x56] = "Latin Capital letter V";
    lookup[0x57] = "Latin Capital letter W";
    lookup[0x58] = "Latin Capital letter X";
    lookup[0x59] = "Latin Capital letter Y";
    lookup[0x5A] = "Latin Capital letter Z";
    lookup[0x5B] = "Left Square Bracket";
    lookup[0x5C] = "Backslash";
    lookup[0x5D] = "Right Square Bracket";
    lookup[0x5E] = "Circumflex accent";
    lookup[0x5F] = "Low line";
    lookup[0x60] = "Grave accent";
    lookup[0x61] = "Latin Small Letter A";
    lookup[0x62] = "Latin Small Letter B";
    lookup[0x63] = "Latin Small Letter C";
    lookup[0x64] = "Latin Small Letter D";
    lookup[0x65] = "Latin Small Letter E";
    lookup[0x66] = "Latin Small Letter F";
    lookup[0x67] = "Latin Small Letter G";
    lookup[0x68] = "Latin Small Letter H";
    lookup[0x69] = "Latin Small Letter I";
    lookup[0x6A] = "Latin Small Letter J";
    lookup[0x6B] = "Latin Small Letter K";
    lookup[0x6C] = "Latin Small Letter L";
    lookup[0x6D] = "Latin Small Letter M";
    lookup[0x6E] = "Latin Small Letter N";
    lookup[0x6F] = "Latin Small Letter O";
    lookup[0x70] = "Latin Small Letter P";
    lookup[0x71] = "Latin Small Letter Q";
    lookup[0x72] = "Latin Small Letter R";
    lookup[0x73] = "Latin Small Letter S";
    lookup[0x74] = "Latin Small Letter T";
    lookup[0x75] = "Latin Small Letter U";
    lookup[0x76] = "Latin Small Letter V";
    lookup[0x77] = "Latin Small Letter W";
    lookup[0x78] = "Latin Small Letter X";
    lookup[0x79] = "Latin Small Letter Y";
    lookup[0x7A] = "Latin Small Letter Z";
    lookup[0x7B] = "Left Curly Bracket";
    lookup[0x7C] = "Vertical bar";
    lookup[0x7D] = "Right Curly Bracket";
    lookup[0x7E] = "Tilde";
    lookup[0x7F] = "Delete";

Then all you have to do is:

 var englishName = lookup[(int)'~'];

Or:

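    // Extension methods must live in a static class, so lookup has to be
    // a static field that this method can reach.
    public static string ToEnglishName(this char c)
    {
        int i = (int)c;
        if (i < lookup.Length)
            return lookup[i];
        return "Unknown";
    }

    var name = ':'.ToEnglishName(); // Colon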
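
To tie this back to the question, here is one possible sketch of building the message so that each invalid character is reported once, together with its name. The class name, the abbreviated name table, and the "Unknown" fallback are all assumptions made for illustration; in practice the full lookup array above could replace the small dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class InvalidCharReport
{
    // Hypothetical, abbreviated name table; extend it or swap in the full lookup array.
    private static readonly Dictionary<char, string> Names = new Dictionary<char, string>
    {
        { '"', "Quotation mark" }, { '*', "Asterisk" }, { '/', "Slash" },
        { ':', "Colon" }, { '<', "Less-than sign" }, { '>', "Greater-than sign" },
        { '?', "Question mark" }, { '\\', "Backslash" }, { '|', "Vertical bar" }
    };

    // Returns null when fileName contains none of invalidChars, otherwise the message text.
    public static string Build(string fileName, char[] invalidChars)
    {
        // Distinct() ensures a character that occurs several times is reported only once.
        List<char> bad = fileName.Where(c => invalidChars.Contains(c)).Distinct().ToList();
        if (bad.Count == 0) return null;

        var msg = new StringBuilder("The following invalid characters were detected:\r\n\r\n");
        foreach (char c in bad)
        {
            string name;
            if (!Names.TryGetValue(c, out name)) name = "Unknown";
            msg.Append(name).Append(" → ").Append(c).Append("\r\n");
        }
        return msg.ToString();
    }
}
```

In the question's code this would be called as `InvalidCharReport.Build(txtFileName.Text, Path.GetInvalidFileNameChars())`, showing the MessageBox only when the result is not null. Passing the invalid set in as a parameter keeps the method independent of the platform, since `Path.GetInvalidFileNameChars()` returns different sets on different operating systems.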
