Decoding quoted-printable correctly

I have the following string:

=?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt=... 

which is an encoding of

 [proconact-Verbesserung #279] (Neu) Stellvertretungen Benutzerrecht - andere können für andere Stellvertretungen erstellen ändern usw. dadurch ist der Schutz der Aktiviäten Mails nicht gewährt. 

I am looking for a way to decode a quoted-printable string.

I tried:

    private static string DecodeQuotedPrintables(string input, string charSet)
    {
        Encoding enc = new ASCIIEncoding();
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            enc = new UTF8Encoding();
        }

        var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
        var matches = occurences.Matches(input);
        foreach (Match match in matches)
        {
            try
            {
                byte[] b = new byte[match.Groups[0].Value.Length / 3];
                for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
                {
                    b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
                }
                char[] hexChar = enc.GetChars(b);
                input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
            }
            catch { }
        }
        input = input.Replace("?=", "").Replace("=\r\n", "");
        return input;
    }

When I call it (where s is my string)

 var x = DecodeQuotedPrintables(s, "utf-8"); 

it returns

 =?utf-8?Q?[proconact_-_Verbesserung_#_(Neu)_Stellvertretungen_Benutzerrecht_-_andere_können_für_andere_Stellvertretungen_erstellen_ändern_usw._dadurch_ist_der_Schutz_der_Aktiviäten_Mails_nicht_gewährt=... 

What can I do to remove the _ characters, the leading =?utf-8?Q? and the trailing =... ?

+4
5 answers

The text you are trying to decode is typically found in MIME headers and is encoded according to the following Internet standard: RFC 2047, MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text.

There is an example implementation of such a decoder on GitHub; maybe you can extract some ideas from it: RFC2047 decoder in C#.

You can also use this online tool to compare your results: MIME Internet Header Decoder.

Please note that your sample text is incorrect. The specification states:

 encoded-word = "=?" charset "?" encoding "?" encoded-text "?=" 

According to the specification, every encoded word must end with ?=. Thus, your sample should be changed from:

 =?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt= 

... to (scroll to the far right):

 =?utf-8?Q?=5Bproconact_=2D_Verbesserung_=23=32=37=39=5D_=28Neu=29_Stellvertretungen_Benutzerrecht_=2D_andere_k=C3=B6nnen_f=C3=BCr_andere_Stellvertretungen_erstellen_=C3=A4ndern_usw=2E_dadurch_ist_der_Schutz_der_Aktivi=C3=A4ten_Mails_nicht_gew=C3=A4hrt?= 

Strictly speaking, your sample is also invalid because it exceeds the 75-character limit imposed on any encoded word; however, most decoders tolerate this violation.
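To make the encoded-word grammar above concrete, here is a minimal sketch of a decoder, not a complete RFC 2047 implementation. The class name EncodedWordSketch and its methods are hypothetical names chosen for illustration. It assumes the charset name is one that Encoding.GetEncoding recognizes, it does not enforce the 75-character limit, and it leaves anything that does not match the grammar untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class EncodedWordSketch
{
    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    private static readonly Regex EncodedWord =
        new Regex(@"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=");

    // Decodes every well-formed encoded word found in a header value.
    public static string Decode(string header)
    {
        // Whitespace between two adjacent encoded words is not part of the text.
        header = Regex.Replace(header, @"(\?=)\s+(=\?)", "$1$2");

        return EncodedWord.Replace(header, m =>
        {
            try
            {
                Encoding enc = Encoding.GetEncoding(m.Groups[1].Value);
                string text = m.Groups[3].Value;
                byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                    ? Convert.FromBase64String(text)
                    : DecodeQ(text);
                return enc.GetString(bytes);
            }
            catch (ArgumentException) { return m.Value; } // unknown charset: keep as-is
            catch (FormatException) { return m.Value; }   // bad Base64: keep as-is
        });
    }

    // "Q" encoding: "_" is a space, "=XX" is one byte, everything else is literal.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length &&
                     byte.TryParse(text.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                   CultureInfo.InvariantCulture, out byte b))
            {
                bytes.Add(b);
                i += 2; // skip the two hex digits
            }
            else
            {
                bytes.Add((byte)c); // encoded-text is assumed to be ASCII here
            }
        }
        return bytes.ToArray();
    }
}
```

Because all bytes of an encoded word are collected before the charset is applied, multi-byte UTF-8 sequences such as =C3=A4 decode to a single character: EncodedWordSketch.Decode("=?utf-8?Q?gew=C3=A4hrt?=") gives "gewährt". The truncated sample from the question is returned unchanged, since it lacks the closing ?=.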

+5

I tested more than five code snippets, and this one works. I changed part of the regex.

Test string:

  im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.= 

Call example:

    string encoding = "windows-1254";
    string input = "im sistemlerimizde bak=FDm =E7al=FD=FEmas=FD yap=FDlaca=F0=FDndan; www.gib.=";
    DecodeQuotedPrintables(input, encoding);

Code snippet:

    private static string DecodeQuotedPrintables(string input, string charSet)
    {
        System.Text.Encoding enc = System.Text.Encoding.UTF7;
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            enc = new UTF8Encoding();
        }

        ////parse looking for =XX where XX is hexadecimal
        //var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
        var occurences = new Regex("(\\=([0-9A-F][0-9A-F]))", RegexOptions.Multiline);
        var matches = occurences.Matches(input);
        foreach (Match match in matches)
        {
            try
            {
                byte[] b = new byte[match.Groups[0].Value.Length / 3];
                for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
                {
                    b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
                }
                char[] hexChar = enc.GetChars(b);
                input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
            }
            catch { }
        }
        input = input.Replace("?=", "").Replace("=\r\n", "");
        return input;
    }
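The snippet above decodes one =XX at a time, which works for single-byte charsets such as windows-1254 but cannot reassemble multi-byte sequences such as UTF-8. A possible variant, sketched below under that observation, collects all bytes first and applies the charset once at the end. The class name QuotedPrintableSketch is hypothetical, and the handling of a trailing "=" as a soft line break is an assumption based on the test string. On .NET Core and later, code-page charsets like windows-1254 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class QuotedPrintableSketch
{
    // Decodes quoted-printable text by buffering raw bytes and
    // converting them to characters in a single step.
    public static string Decode(string input, Encoding enc)
    {
        using (var buffer = new MemoryStream())
        {
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                if (c != '=')
                {
                    // Literal characters should be ASCII in quoted-printable text.
                    byte[] literal = enc.GetBytes(c.ToString());
                    buffer.Write(literal, 0, literal.Length);
                }
                else if (i + 1 == input.Length)
                {
                    // Assumption: a trailing "=" is a soft line break; drop it.
                }
                else if (input[i + 1] == '\n')
                {
                    i += 1; // soft line break "=\n"
                }
                else if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
                {
                    i += 2; // soft line break "=\r\n"
                }
                else if (i + 2 < input.Length &&
                         byte.TryParse(input.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                       CultureInfo.InvariantCulture, out byte b))
                {
                    buffer.WriteByte(b); // "=XX" is one raw byte
                    i += 2;
                }
                else
                {
                    buffer.WriteByte((byte)'='); // malformed escape: keep the "="
                }
            }
            return enc.GetString(buffer.ToArray());
        }
    }
}
```

With this approach, QuotedPrintableSketch.Decode("gew=C3=A4hrt", Encoding.UTF8) gives "gewährt", because both bytes of the ä reach the decoder together.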
+2

Following my comment, I suggest

    private static string MessedUpUrlDecode(string input, string charSet)
    {
        Encoding enc = new ASCIIEncoding();
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            enc = new UTF8Encoding();
        }

        // Take the payload after "=?charset?Q?".
        string messedup = input.Split('?')[3];
        // Turn "_" into spaces and "=XX" into "%XX" so UrlDecode can handle it.
        string cleaned = messedup.Replace("_", " ").Replace("=...", ".").Replace("=", "%");
        return System.Web.HttpUtility.UrlDecode(cleaned, enc);
    }

assuming the mangling of the original strings is consistent.

0

The standard .NET class System.Net.Mail.Attachment can be used for this purpose.

    string unicodeString = "=?UTF-8?Q?YourText?=";
    System.Net.Mail.Attachment attachment = System.Net.Mail.Attachment.CreateAttachmentFromString("", unicodeString);
    Console.WriteLine(attachment.Name);
0

I'm not sure how to remove

 =?utf-8?Q? 

in general, but if it always appears, you can do this:

 input = input.Split('?')[3]; 

To get rid of the final '=', remove the last character:

 input = input.Remove(input.Length - 1); 

You can get rid of "_" by replacing it with a space:

 input = input.Replace("_", " "); 

You can use these code snippets in your DecodeQuotedPrintables function.

Hope this helps!

-1

Source: https://habr.com/ru/post/1410823/