C#: problems using Dictionary with languages other than English

I'm trying to load the contents of a .txt file, containing one word per line, into a dictionary.

This worked fine while the words in the file were in English, but after switching to a file in a language with accents I ran into problems.

I had to change the encoding when creating the stream reader, as well as the culture passed to the ToLower method when adding a word to the dictionary.

Basically, I now have something similar to this:

if (!dict.ContainsKey(word.ToLower(culture))) dict.Add(word.ToLower(culture), true); 

The problem is that words like "esta" and "está" are considered the same. So, is there a way to make the ContainsKey method use a specific language, or do I need to implement some kind of custom string comparison? Either way, I'm new to C#, so I would appreciate an example.

A second problem shows up with the new file: after about a hundred words it stops adding the rest of the file, leaving a word incomplete. I don't see any special character in that word that would end the method. Any ideas about this problem?

Many thanks.

EDIT: The first problem is solved, using Jon Skeet's answer.

Regarding the second problem: I changed the file format to UTF-8 and removed the encoding argument from the stream reader, since it now recognizes the accents correctly. I'm testing a few things for the second problem now.

The second problem is also solved; it was a mistake on my part ... shame ...

Thanks to everyone for the quick replies, and especially to Jon Skeet.

+4
2 answers

I assume that you are trying to get case insensitivity for the dictionary. Instead of calling ToLower, use the Dictionary constructor that takes an equality comparer, and use StringComparer.Create(culture, true) to build a suitable comparer.

I do not know what your second problem is about. We would need more detail to diagnose it, ideally including the code that you use.
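To illustrate the suggestion above, here is a minimal sketch of a dictionary built with a culture-aware, case-insensitive comparer. The class name WordSetSketch, the method name Build and the "pt-PT" culture are assumptions made for this example; they do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WordSetSketch
{
    // Builds a dictionary whose keys are compared with the given culture's
    // rules, ignoring case. The default culture name is only an assumption.
    public static Dictionary<string, bool> Build(IEnumerable<string> words, string cultureName = "pt-PT")
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);

        // Second argument: true means "ignore case".
        StringComparer comparer = StringComparer.Create(culture, true);

        var dict = new Dictionary<string, bool>(comparer);
        foreach (string word in words)
        {
            // No ToLower call is needed: the comparer handles case itself.
            if (!dict.ContainsKey(word))
            {
                dict.Add(word, true);
            }
        }
        return dict;
    }
}
```

With this comparer, "Esta" and "esta" should map to the same key, while "esta" and "está" should stay distinct, because only case differences are ignored, not accents.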

EDIT: UTF-7 is almost certainly not the correct encoding. Do not just guess the encoding; find out what it really is. Where did this text file come from? What can you open it with successfully?

I suspect that at least some of your problems are related to using UTF-7.
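As a sketch of reading the word file with an explicitly chosen encoding rather than a guessed one, the following assumes the file really is UTF-8 (as the question's later edit says). WordFileSketch and ReadWords are hypothetical names for this example. The strict UTF8Encoding is a choice made here so that a wrong encoding fails loudly instead of silently producing damaged words.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class WordFileSketch
{
    // Reads one word per line, assuming the file is UTF-8 encoded.
    public static List<string> ReadWords(string path)
    {
        // throwOnInvalidBytes: true makes decoding throw a
        // DecoderFallbackException if the bytes are not valid UTF-8.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);

        var words = new List<string>();
        using (var reader = new StreamReader(path, strictUtf8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length > 0)
                {
                    words.Add(line); // skip blank lines
                }
            }
        }
        return words;
    }
}
```

If the file turns out to be in some other encoding, pass that encoding to the StreamReader instead of UTF-8.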

+7

The problem is the encoding you use when opening the file for reading. It looks like you might be using ASCIIEncoding.

.NET represents strings internally as UTF-16, so once the file has been decoded with the correct encoding this problem will not occur internally.

+1