Some changes to the Soundex Algorithm

This algorithm runs over the input only until it has filled in four encoded characters, which in practice usually means the first word. For example, the result for the input "Horrible Great" is: H612. It ignores the second word or, in other words, takes only the first letter of the second word when it needs it to fill in the encoded string.

I would like to change it so that it takes the first word and finds its encoded string, and then takes the second word and finds its encoded string. The output should be "H614 G600". Is there a way to do this by making some changes to this code?
Thank you very much.

    private string Soundex(string data)
    {
        StringBuilder result = new StringBuilder();

        if (data != null && data.Length > 0)
        {
            string previousCode = "", currentCode = "", currentLetter = "";

            result.Append(data.Substring(0, 1));

            for (int i = 1; i < data.Length; i++)
            {
                currentLetter = data.Substring(i, 1).ToLower();
                currentCode = "";

                if ("bfpv".IndexOf(currentLetter) > -1)
                    currentCode = "1";
                else if ("cgjkqsxz".IndexOf(currentLetter) > -1)
                    currentCode = "2";
                else if ("dt".IndexOf(currentLetter) > -1)
                    currentCode = "3";
                else if (currentLetter == "l")
                    currentCode = "4";
                else if ("mn".IndexOf(currentLetter) > -1)
                    currentCode = "5";
                else if (currentLetter == "r")
                    currentCode = "6";

                if (currentCode != previousCode)
                    result.Append(currentCode);

                if (result.Length == 4) break;

                if (currentCode != "")
                    previousCode = currentCode;
            }
        }

        if (result.Length < 4)
            result.Append(new String('0', 4 - result.Length));

        return result.ToString().ToUpper();
    }
3 answers

Sure, here is the solution I came up with. I wrapped the existing algorithm in another method that splits the input into words and calls the original method on each one. To use it, call SoundexByWord("Horrible Great") instead of Soundex("Horrible Great"); the output is "H614 G630".

    private string SoundexByWord(string data)
    {
        var soundexes = new List<string>();
        foreach (var str in data.Split(' '))
        {
            soundexes.Add(Soundex(str));
        }
    #if Net35OrLower
        // string.Join in .NET 3.5 and earlier requires the second parameter to be an array.
        return string.Join(" ", soundexes.ToArray());
    #else
        // string.Join in .NET 4 has an overload that takes IEnumerable<string>.
        return string.Join(" ", soundexes);
    #endif
    }

Yes. First split the string into an array of words (after choosing a separator),

then run the algorithm on each word,

then combine the results in some suitable way and return them.
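The three steps above can be sketched roughly as follows. This is only an illustration: the class and method names (SoundexSketch, EncodePhrase, EncodeWord, CodeOf) are made up for the example, the separators are assumed to be spaces and tabs, and the per-word rules are meant to mirror the Soundex method in the question rather than any official variant of the algorithm.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class; none of these names come from the question.
public static class SoundexSketch
{
    // Steps 1 and 3: split on the chosen separators, encode each word,
    // then join the codes with single spaces.
    public static string EncodePhrase(string phrase)
    {
        if (string.IsNullOrEmpty(phrase)) return "";

        // Assumption: words are separated by spaces or tabs.
        // RemoveEmptyEntries avoids empty "words" when separators repeat.
        var separators = new[] { ' ', '\t' };
        var words = phrase.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" ", words.Select(w => EncodeWord(w)).ToArray());
    }

    // Step 2: encode one non-empty word with the same rules as the question's code.
    public static string EncodeWord(string word)
    {
        var result = new StringBuilder();
        result.Append(char.ToUpperInvariant(word[0]));

        char previous = '\0';
        for (int i = 1; i < word.Length && result.Length < 4; i++)
        {
            char code = CodeOf(char.ToLowerInvariant(word[i]));
            // Skip uncoded letters and immediate repeats of the last code.
            if (code != '\0' && code != previous) result.Append(code);
            if (code != '\0') previous = code;
        }

        // Pad short codes with zeros to four characters.
        return result.ToString().PadRight(4, '0');
    }

    // Returns the digit for a lowercase letter, or '\0' if the letter is not coded.
    private static char CodeOf(char letter)
    {
        if ("bfpv".IndexOf(letter) >= 0) return '1';
        if ("cgjkqsxz".IndexOf(letter) >= 0) return '2';
        if ("dt".IndexOf(letter) >= 0) return '3';
        if (letter == 'l') return '4';
        if ("mn".IndexOf(letter) >= 0) return '5';
        if (letter == 'r') return '6';
        return '\0';
    }
}
```

With this sketch, SoundexSketch.EncodePhrase("Horrible Great") returns "H614 G630". Dropping empty entries is a design choice: without it, two spaces in a row would produce an empty word between them.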


The implementation in the question is correct, but it creates excess garbage through its string operations. Here is a Char-array implementation, which is faster and creates very little garbage. It is written as an extension method and handles phrases (words separated by spaces):

    public static String Soundex( this String input )
    {
        var words = input.Split( ' ' );
        var result = new String[ words.Length ];

        for( var i = 0; i < words.Length; i++ )
            result[ i ] = words[ i ].SoundexWord();

        return String.Join( ",", result );
    }

    private static String SoundexWord( this String input )
    {
        var result = new Char[ 4 ] { '0', '0', '0', '0' };
        var inputArray = input.ToUpper().ToCharArray();

        if( inputArray.Length > 0 )
        {
            var previousCode = ' ';
            var resultIndex = 0;

            result[ resultIndex ] = inputArray[ 0 ];

            for( var i = 1; i < inputArray.Length; i++ )
            {
                var currentLetter = inputArray[ i ];
                var currentCode = ' ';

                if( "BFPV".IndexOf( currentLetter ) > -1 )
                    currentCode = '1';
                else if( "CGJKQSXZ".IndexOf( currentLetter ) > -1 )
                    currentCode = '2';
                else if( "DT".IndexOf( currentLetter ) > -1 )
                    currentCode = '3';
                else if( currentLetter == 'L' )
                    currentCode = '4';
                else if( "MN".IndexOf( currentLetter ) > -1 )
                    currentCode = '5';
                else if( currentLetter == 'R' )
                    currentCode = '6';

                if( currentCode != ' ' && currentCode != previousCode )
                    result[ ++resultIndex ] = currentCode;

                if( resultIndex == 3 ) break;

                if( currentCode != ' ' )
                    previousCode = currentCode;
            }
        }

        return new String( result );
    }