The US Census Bureau uses a special encoding called "soundex" to search for information about a person. Soundex is an encoding of surnames (last names) based on how the surname sounds, not how it is written. Surnames that sound the same but are spelled differently, for example SMITH and SMYTH, have the same code and are filed together. The soundex coding system was designed so that you can find a last name even if it has been recorded under various spellings.
In this lab, you will develop, code, and document a program that creates a soundex code from a last name. The user will be asked to enter the last name, and the program should display the appropriate code.
Basic Soundex Encoding Rules
Each Soundex code for a surname consists of a letter and three numbers. The letter is always the first letter of the last name. Numbers are assigned to the remaining letters of the surname according to the Soundex coding guide below. Zeros are added at the end, if necessary, so that the code always has four characters. Any letters beyond those needed for the four characters are disregarded.
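The four-character shape described above could be handled in a small helper once the digits are known. This is only a sketch: the name `pad_code` and its two parameters are made up for illustration, and it assumes the digits have already been worked out elsewhere.

```python
def pad_code(first_letter, digits):
    """Combine the first letter with a string of digits into a 4-character code."""
    code = first_letter.upper() + digits
    # Keep at most four characters, then right-pad with zeros if it is shorter.
    return code[:4].ljust(4, "0")
```

For example, `pad_code("J", "25")` gives `J250`, and extra digits past the third are simply cut off.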
Soundex Coding Guide
Soundex assigns a number for different consonants. Consonants that sound the same are assigned the same number:
Number  Consonants
1       B, F, P, V
2       C, G, J, K, Q, S, X, Z
3       D, T
4       L
5       M, N
6       R
Soundex ignores the letters A, E, I, O, U, H, W, and Y.
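One way to hold the coding guide in Python is a dictionary instead of a chain of if/elif tests. This is a sketch, not the required design: the names `SOUNDEX_GROUPS`, `LETTER_TO_DIGIT`, and `digit_for` are my own, and returning an empty string for ignored letters is an assumption made for convenience.

```python
# The coding guide, written as digit -> letters that share that digit.
SOUNDEX_GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the guide into a per-letter lookup, e.g. {"B": "1", "F": "1", ...}.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}


def digit_for(letter):
    """Return the Soundex digit for a letter, or "" for ignored letters."""
    return LETTER_TO_DIGIT.get(letter.upper(), "")
```

With this, `digit_for("s")` returns `"2"`, and the ignored letters (A, E, I, O, U, H, W, Y) return `""`.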
There are 3 additional rules for encoding sounds. A good program design will implement them as one or more separate functions.
Rule 1. Names with double letters
If the surname has double letters, they should be considered as one letter. For example:
- Gutierrez is encoded by G362 (G, 3 for T, 6 for the first R, second R is ignored, 2 for Z).
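Rule 1 on its own could be written as a function that drops a letter when it repeats the one just before it. This is a minimal sketch with a hypothetical name, `collapse_doubles`. It only removes the doubled letters and does not assign any digits.

```python
def collapse_doubles(name):
    """Return the name in upper case with runs of the same letter reduced to one."""
    result = []
    for ch in name.upper():
        # Skip the letter if it is identical to the one we just kept.
        if not result or result[-1] != ch:
            result.append(ch)
    return "".join(result)
```

For Gutierrez this yields `GUTIEREZ`, so the second R never reaches the digit lookup.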
Rule 2. Names with letters side by side that have the same Soundex code
If the last name has different letters side by side that have the same number in the Soundex coding guide, they should be treated as one letter. Examples:
- Pfister is encoded as P236 (P, F is ignored because it has the same code as P, 2 for S, 3 for T, 6 for R).
- Jackson is encoded as J250 (J, 2 for C, K is ignored because it has the same code as C, S is ignored for the same reason, 5 for N, 0 added as padding).
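Rule 2 can be sketched by remembering the digit of the previous letter and skipping a letter whose digit matches it. The names `CODES` and `digits_without_neighbours` are hypothetical. The sketch also makes a simplifying assumption: every ignored letter, including H and W, breaks a run, so it does not yet cover the H/W case.

```python
# Letter -> digit table taken from the coding guide.
CODES = {}
for digit, letters in [("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")]:
    for letter in letters:
        CODES[letter] = digit


def digits_without_neighbours(name):
    """Return the digits for name[1:], skipping letters whose digit repeats
    the digit of the letter immediately before them."""
    name = name.upper()
    if not name:
        return ""
    digits = []
    previous = CODES.get(name[0], "")  # the first letter counts as a neighbour too
    for ch in name[1:]:
        current = CODES.get(ch, "")    # "" for letters Soundex ignores
        if current and current != previous:
            digits.append(current)
        previous = current
    return "".join(digits)
```

Pfister gives `236` (the F is dropped because P has the same digit) and Jackson gives `25`, which still needs the zero padding.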
Rule 3. Consonant Separators
3.a. If a vowel (A, E, I, O, U) separates two consonants that have the same Soundex code, the consonant to the right of the vowel is encoded. Example:
- Tymczak is encoded as T522 (T, 5 for M, 2 for C, Z is ignored (see Rule 2 above), 2 for K). Since the vowel "A" separates Z and K, K is encoded.
3.b. If "H" or "W" separates two consonants that have the same Soundex code, the consonant on the right is not encoded. Example:
- Ashcraft is encoded as A261 (A, 2 for S, C is ignored because it has the same code as S and only H lies between them, 6 for R, 1 for F). It is not encoded as A226.
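Putting the three rules together, one possible approach is to track the digit of the previous letter, let vowels reset it, and let H and W leave it untouched. This is a sketch under assumptions rather than the expected solution: the names `CODES` and `soundex` are my own, Y is treated like a vowel (the rules above do not say which group it belongs to), and non-letters are dropped.

```python
# Letter -> digit table taken from the coding guide.
CODES = {}
for digit, letters in [("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")]:
    for letter in letters:
        CODES[letter] = digit


def soundex(surname):
    """Return the four-character Soundex code for a surname ("" if no letters)."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    previous = CODES.get(first, "")
    digits = []
    for ch in letters[1:]:
        if ch in "HW":
            continue                 # Rule 3.b: H and W do not break a run
        current = CODES.get(ch, "")  # "" for vowels (and Y, by assumption)
        if current and current != previous:
            digits.append(current)   # Rules 1 and 2: skip repeated digits
        previous = current           # Rule 3.a: a vowel resets the run
    # One letter plus three digits, padded with zeros if needed.
    return (first + "".join(digits))[:4].ljust(4, "0")
```

Checked by hand against the examples in the rules: Gutierrez gives G362, Pfister P236, Jackson J250, Tymczak T522, and Ashcraft A261.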
So far this is my code:
    # Upper-case the input so it matches the upper-case letters tested below.
    surname = raw_input("Please enter surname: ").upper()
    # The code always starts with the first letter of the surname.
    outstring = surname[0]
    for i in range(1, len(surname)):
        nextletter = surname[i]
        if nextletter in ['B', 'F', 'P', 'V']:
            outstring = outstring + '1'
        elif nextletter in ['C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z']:
            outstring = outstring + '2'
        elif nextletter in ['D', 'T']:
            outstring = outstring + '3'
        elif nextletter in ['L']:
            outstring = outstring + '4'
        elif nextletter in ['M', 'N']:
            outstring = outstring + '5'
        elif nextletter in ['R']:
            outstring = outstring + '6'
    print outstring
The code does most of what the assignment asks for, but I'm not sure how to implement the three rules. This is where I need help, so any help is appreciated.