What is the simplest algorithm to escape a single character?

I am trying to write two functions, escape(text, delimiter) and unescape(text, delimiter), with the following properties:

  • The result of escape does not contain delimiter.

  • unescape is the inverse of escape, i.e.

     unescape(escape(text, delimiter), delimiter) == text

    for all values of text and delimiter.

It is OK to restrict the allowed values of delimiter.


Background: I want to create a string of values separated by a delimiter. To be able to retrieve the same list from the string again, I have to make sure that the individual values do not contain the delimiter.


What I tried: I came up with a simple solution (pseudocode):

 escape(text, delimiter):
     return text.Replace("\", "\\").Replace(delimiter, "\d")

 unescape(text, delimiter):
     return text.Replace("\d", delimiter).Replace("\\", "\")

but found that property 2 fails for the test string "\d<delimiter>". I currently have the following working solution:

 escape(text, delimiter):
     return text.Replace("\", "\b").Replace(delimiter, "\d")

 unescape(text, delimiter):
     return text.Replace("\d", delimiter).Replace("\b", "\")

which seems to work as long as delimiter is not \, b or d (that is fine; I do not want to use them as delimiters). However, since I have not formally proven its correctness, I am afraid that I have missed some case where one of the properties is violated. Since this is such a common problem, I assume that there is already a "well-known, proven" algorithm for it, hence my question (see title).
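Short of a proof, one way to gain some confidence is an exhaustive check of both properties over all short strings drawn from a small alphabet that contains the "dangerous" characters. The following is only a Python sketch under stated assumptions: the delimiter is the single character ";", the alphabet and length bound are arbitrary choices, and the names escape_v1, escape_v2 and find_counterexample are made up for illustration. A clean run is evidence, not a proof.

```python
from itertools import product

# First attempt: "\" -> "\\", delimiter -> "\d", undone with two replace calls.
def escape_v1(text, delimiter):
    return text.replace("\\", "\\\\").replace(delimiter, "\\d")

def unescape_v1(text, delimiter):
    return text.replace("\\d", delimiter).replace("\\\\", "\\")

# Second attempt: "\" -> "\b", delimiter -> "\d".
def escape_v2(text, delimiter):
    return text.replace("\\", "\\b").replace(delimiter, "\\d")

def unescape_v2(text, delimiter):
    return text.replace("\\d", delimiter).replace("\\b", "\\")

def find_counterexample(escape, unescape, delimiter, alphabet, max_len):
    """Return the first short text that violates either property, or None."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            text = "".join(chars)
            escaped = escape(text, delimiter)
            # Property 1: no delimiter in the result. Property 2: round trip.
            if delimiter in escaped or unescape(escaped, delimiter) != text:
                return text
    return None

ALPHABET = "\\bd;x"  # backslash, b, d, the delimiter, and one neutral character
print(repr(find_counterexample(escape_v1, unescape_v1, ";", ALPHABET, 5)))
print(repr(find_counterexample(escape_v2, unescape_v2, ";", ALPHABET, 5)))
```

With these settings the first pair is rejected already on a two-character input, while the second pair passes for every string up to length 5.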

2 answers

Your first algorithm is correct.

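The error is in the unescape() implementation: you need to replace both \d with delimiter and \\ with \ in the same pass. You cannot chain multiple calls to Replace() like that, because the first call may match a \d whose backslash is really the second half of an escaped \\.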
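To make the "same pass" point concrete, here is a minimal Python sketch of the first scheme with a single-pass unescape. It assumes a single-character delimiter other than \ and d; the function names follow the question, while the error handling is an illustrative choice rather than part of the original pseudocode.

```python
def escape(text, delimiter):
    # Order matters: double the backslashes first, then encode the delimiter.
    return text.replace("\\", "\\\\").replace(delimiter, "\\d")

def unescape(text, delimiter):
    out = []
    chars = iter(text)
    for c in chars:
        if c != "\\":
            out.append(c)          # ordinary character, copy as-is
            continue
        nxt = next(chars, None)    # consume the character after the backslash
        if nxt == "\\":
            out.append("\\")       # "\\" -> "\"
        elif nxt == "d":
            out.append(delimiter)  # "\d" -> delimiter
        else:
            raise ValueError("input string is not correctly escaped")
    return "".join(out)

escaped = escape("\\d;", ";")
print(escaped)                           # \\d\d
print(unescape(escaped, ";") == "\\d;")  # True

# The chained-replace version misreads the same input:
print(escaped.replace("\\d", ";").replace("\\\\", "\\"))  # \;;
```

Because each backslash is consumed together with the character that follows it, an escaped backslash can never be mistaken for the start of a \d sequence.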

Here is sample C# code for safely quoting strings that are joined with a separator:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    public static class SeparatorQuoting
    {
        // "~" -> "~~"     ";" -> "~s"
        static string QuoteSeparator(string str, char separator, char quoteChar, char otherChar)
        {
            var sb = new StringBuilder(str.Length);
            foreach (char c in str)
            {
                if (c == quoteChar)
                {
                    sb.Append(quoteChar);
                    sb.Append(quoteChar);
                }
                else if (c == separator)
                {
                    sb.Append(quoteChar);
                    sb.Append(otherChar);
                }
                else
                {
                    sb.Append(c);
                }
            }
            return sb.ToString(); // no separator in the result -> Join/Split is safe
        }

        // "~~" -> "~"     "~s" -> ";"
        static string UnquoteSeparator(string str, char separator, char quoteChar, char otherChar)
        {
            var sb = new StringBuilder(str.Length);
            bool isQuoted = false; // true right after an unconsumed quoteChar
            foreach (char c in str)
            {
                if (isQuoted)
                {
                    if (c == otherChar)
                        sb.Append(separator);
                    else
                        sb.Append(c);
                    isQuoted = false;
                }
                else
                {
                    if (c == quoteChar)
                        isQuoted = true;
                    else
                        sb.Append(c);
                }
            }
            if (isQuoted)
                throw new ArgumentException("input string is not correctly quoted");
            return sb.ToString(); // ";" are restored
        }

        /// <summary>
        /// Encodes the given strings as a single string.
        /// </summary>
        /// <param name="input">The strings.</param>
        /// <param name="separator">The separator.</param>
        /// <param name="quoteChar">The quote char.</param>
        /// <param name="otherChar">The other char.</param>
        /// <returns>The quoted strings, joined with the separator.</returns>
        public static string QuoteAndJoin(this IEnumerable<string> input, char separator = ';', char quoteChar = '~', char otherChar = 's')
        {
            if (input == null)
                throw new ArgumentNullException(nameof(input));
            if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
                throw new ArgumentException("cannot quote: ambiguous format");
            return string.Join(separator.ToString(),
                input.Select(str => QuoteSeparator(str, separator, quoteChar, otherChar)));
        }

        /// <summary>
        /// Decodes the strings encoded in a single string.
        /// </summary>
        /// <param name="encoded">The encoded.</param>
        /// <param name="separator">The separator.</param>
        /// <param name="quoteChar">The quote char.</param>
        /// <param name="otherChar">The other char.</param>
        /// <returns>The original strings.</returns>
        public static IEnumerable<string> SplitAndUnquote(this string encoded, char separator = ';', char quoteChar = '~', char otherChar = 's')
        {
            if (encoded == null)
                throw new ArgumentNullException(nameof(encoded));
            if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
                throw new ArgumentException("cannot unquote: ambiguous format");
            return encoded.Split(separator)
                .Select(s => UnquoteSeparator(s, separator, quoteChar, otherChar));
        }
    }
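For readers who want to try the join/split round trip quickly, the same idea can be sketched in Python. This is an illustrative approximation rather than a port: the function names and defaults merely mirror the C# above, all three characters are assumed to be distinct single characters, and unlike the C# version a dangling quote character at the end of a part is left as-is instead of being rejected.

```python
import re

def quote_and_join(items, separator=";", quote_char="~", other_char="s"):
    if len({separator, quote_char, other_char}) != 3:
        raise ValueError("cannot quote: ambiguous format")
    # str.translate maps every character independently, i.e. in one pass.
    table = str.maketrans({
        quote_char: quote_char * 2,          # "~" -> "~~"
        separator: quote_char + other_char,  # ";" -> "~s"
    })
    return separator.join(item.translate(table) for item in items)

def split_and_unquote(encoded, separator=";", quote_char="~", other_char="s"):
    if len({separator, quote_char, other_char}) != 3:
        raise ValueError("cannot unquote: ambiguous format")
    # One regex pass: each quote_char is consumed together with its successor.
    pattern = re.compile(re.escape(quote_char) + "(.)", re.DOTALL)

    def restore(match):
        c = match.group(1)
        return separator if c == other_char else c

    return [pattern.sub(restore, part) for part in encoded.split(separator)]

items = ["a;b", "~s", "", "x~;y"]
encoded = quote_and_join(items)
print(encoded)                              # a~sb;~~s;;x~~~sy
print(split_and_unquote(encoded) == items)  # True
```

Since no quoted item contains the separator, splitting on it always recovers exactly the original item boundaries.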

Perhaps you could use an alternative set of replacement sequences for the case where the delimiter begins with \, b or d. Just make sure that the unescape algorithm uses the same alternative replacements.
