Replacing specific Unicode characters in rows read from Excel

I am trying to replace some unwanted characters in a string read from an Excel spreadsheet. Our Oracle database uses the WE8ISO8859P1 character set, which does not define several characters that Excel “helpfully” inserts into text (curly quotes, em and en dashes, etc.). Because I control neither the database nor how the Excel spreadsheets are created, I need to replace these characters with something else.

I am extracting the contents of the cell into a string like this:

string s = xlRange.get_Range("A1", Missing.Value).Value2.ToString().Trim();

Viewing the string in the Visual Studio Text Visualiser shows that the text has been retrieved completely and correctly. Then I try to replace one of the unwanted characters (in this case, the right curly quote character):

s = Regex.Replace(s, "\u0094", "\u0022");

But it does nothing (Text Visualiser shows that it still exists). To try and verify that the character I want to replace is actually there, I tried:

bool a = s.Contains("\u0094");

but that returns false. However:

bool b = s.Contains("”");

returns true.

My (somewhat inadequate) understanding of strings in .NET is that they are encoded in UTF-16, while Excel is likely to use ANSI. Does this mean that I need to change the encoding of the text as it comes out of Excel? Or am I doing something else wrong? Any advice would be greatly appreciated. I have read and re-read all the articles I can find about Unicode and encoding, but I am still none the wiser.

2 answers

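Yes, strings in .NET are UTF-16.

However, you are searching for the wrong character. "\u0094" is not the right curly quote in Unicode (U+0094 is a control character; 0x94 is the quote's byte value in Windows-1252, not its Unicode code point). To see this:

((int)"”"[0]).ToString("X") returns "201D"

"”" == "\u201D" returns true

"\u0094" == "”" (or any other quote-like character) returns false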

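Because strings are UTF-16, you have to use the character's Unicode code point, whether you type the character literally or write it as an escape (i.e. "\UXXXXXXXX" or, for characters in the Basic Multilingual Plane, "\uXXXX").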
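To find out which code points a string from Excel really contains, it can help to list them in hex instead of guessing. The following is only a minimal sketch: the class and method names are hypothetical, and it relies on the fact that ISO 8859-1 (which WE8ISO8859P1 corresponds to) covers exactly U+0000 to U+00FF, so anything above that range cannot be stored.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Hypothetical helper: returns each distinct UTF-16 code unit above U+00FF
    // (i.e. outside ISO 8859-1) formatted as "U+XXXX", in order of first appearance.
    public static string[] NonLatin1(string s)
    {
        return s.Where(c => c > '\u00FF')
                .Select(c => "U+" + ((int)c).ToString("X4"))
                .Distinct()
                .ToArray();
    }
}
```

Calling CodePointDump.NonLatin1(s) on the cell text from the question should report U+201D for the right curly quote, which is the value to use in the escape.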

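So the replacement has to target "\u201D" rather than "\u0094", for example Regex.Replace(s, "\u201D", "\u0022").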
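Since the question mentions several such characters (curly quotes, em and en dashes), one possible approach is a small lookup table applied in a single pass. This is a sketch under assumptions, not the original answer's code: the class name, method name, and the choice of ASCII substitutes are all illustrative and should be adjusted to whatever the database is expected to hold.

```csharp
using System.Collections.Generic;
using System.Text;

static class TypographicCleaner
{
    // Assumed mapping from typographic characters to ASCII stand-ins; extend as needed.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { '\u2018', "'" },   // left single quotation mark
        { '\u2019', "'" },   // right single quotation mark
        { '\u201C', "\"" },  // left double quotation mark
        { '\u201D', "\"" },  // right double quotation mark
        { '\u2013', "-" },   // en dash
        { '\u2014', "-" },   // em dash
        { '\u2026', "..." }  // horizontal ellipsis
    };

    // Hypothetical helper: replaces every mapped character and copies the rest unchanged.
    public static string ToLatin1Friendly(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            string replacement;
            if (Map.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this in place, the line from the question would become s = TypographicCleaner.ToLatin1Friendly(s); characters that are not in the table pass through untouched, so any remaining unsupported ones still need to be handled separately.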


If it is an option, use NVARCHAR and NTEXT columns instead of VARCHAR and TEXT; those types store Unicode.
