Rhyme Logic Using the CMU Pronouncing Dictionary
OK, suppose you want to use the CMU Pronouncing Dictionary data (for example, the file cmudict-0.7b) to build a list of all words that rhyme with "LOVE".
Here's how you can do it:
First, you need to look up the pronunciation of LOVE. You will find this line in the dictionary, where "LOVE" and "L AH1 V" are separated by two spaces:
LOVE  L AH1 V
This means that the word LOVE is pronounced L AH1 V.
Next, find the vowel phoneme with primary stress; in other words, look for the digit "1" in the pronunciation. The text immediately to the left of the 1 is the vowel that carries primary stress (AH). That vowel, its stress digit, and everything to the right of it are your "rhyme phonemes" (for lack of a better term). So the rhyme phonemes for LOVE are AH1 V.
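The stress-finding step above can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name rhyme_phonemes is hypothetical, it assumes you already have the pronunciation string (e.g. "L AH1 V"), and it simply takes the first "1" it sees (the Gotchas section discusses other choices).

```python
def rhyme_phonemes(pronunciation: str) -> str:
    """Return the phonemes from the primary-stressed vowel to the end.

    Assumes a CMU-style string such as "L AH1 V", where each vowel ends in a
    stress digit (1 = primary). Uses the first "1" if there are several.
    """
    phonemes = pronunciation.split()
    for i, phoneme in enumerate(phonemes):
        if phoneme.endswith("1"):
            # Keep the stressed vowel (with its digit) and everything after it.
            return " ".join(phonemes[i:])
    return ""  # no primary stress at all; see the Gotchas section


print(rhyme_phonemes("L AH1 V"))                 # AH1 V
print(rhyme_phonemes("AH0 B AH1 V"))             # AH1 V
print(rhyme_phonemes("G R AE1 JH AH0 W EY2 T"))  # AE1 JH AH0 W EY2 T
```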
We're halfway there! Now we just need to find other words whose pronunciations end in AH1 V. If you're using Notepad++, try "Find All in Current Document" with the pattern AH1 V$ and the Regular Expression search mode. This will match lines like:
Line 392: ABOVE  AH0 B AH1 V
Line 10266: BELOVE  B IH0 L AH1 V
Line 30204: DENEUVE  D IH0 N AH1 V
Line 30205: DENEUVE(1)  D IY0 N AH1 V
Line 34064: DOVE  D AH1 V
Line 48177: GLOVE  G L AH1 V
Line 49053: GOV  G AH1 V
... etc.
Rhyme woooooords!
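If you'd rather script the search than use Notepad++, here is a minimal Python sketch of the same end-of-line regex match. It assumes the cmudict-0.7b line format described above (word, two spaces, pronunciation, with comment lines starting with ";;;"); the function find_rhymes and the SAMPLE data are hypothetical stand-ins for the real file.

```python
import re

# A few lines in cmudict-0.7b format: word, two spaces, pronunciation.
# Hand-written sample data; in practice you would iterate over the file.
SAMPLE = """\
;;; comment lines start with three semicolons
ABOVE  AH0 B AH1 V
DOVE  D AH1 V
GLOVE  G L AH1 V
LOVE  L AH1 V
OUTLIVE  AW2 T L IH1 V
"""


def find_rhymes(lines, rhyme):
    """Yield the word from every line whose pronunciation ends in `rhyme`.

    Same idea as the Notepad++ search: a regex anchored to end of line.
    The leading space keeps the match on a phoneme boundary.
    """
    pattern = re.compile(" " + re.escape(rhyme) + "$")
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        if pattern.search(line):
            yield line.split("  ", 1)[0]  # text before the two spaces


print(list(find_rhymes(SAMPLE.splitlines(), "AH1 V")))
# ['ABOVE', 'DOVE', 'GLOVE', 'LOVE']
```

To run it against the real file, you could pass an open file handle as `lines`, e.g. open("cmudict-0.7b", encoding="latin-1"); the encoding there is an assumption, so adjust it if your copy differs.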
There are many ways to implement this, and many edge cases, but this is roughly the approach that many electronic rhyming dictionaries seem to use when searching for perfect rhymes.
A hypothetical SQL approach for storing rhyme data
Obviously, performance will be a problem if you scan the whole dictionary every time someone wants a rhyme. If that's a concern, you can precompute, store, or index the data in various ways.
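Before reaching for a database, the simplest version of "index the data" is an in-memory lookup built once at startup. The sketch below assumes you have already parsed the dictionary into (word, pronunciation) pairs with the "(1)"-style suffixes removed; the class RhymeIndex and the sample entries are hypothetical. Keeping a word -> rhyme keys map alongside the rhyme key -> words map means a word with several pronunciations (LIVE here) gets rhymes for all of them.

```python
from collections import defaultdict


def rhyme_phonemes(pronunciation):
    """Phonemes from the first primary-stressed vowel (digit 1) to the end."""
    phonemes = pronunciation.split()
    for i, phoneme in enumerate(phonemes):
        if phoneme.endswith("1"):
            return " ".join(phonemes[i:])
    return ""


class RhymeIndex:
    """Precomputed lookups so each query avoids a full dictionary scan."""

    def __init__(self, entries):
        self.keys_by_word = defaultdict(set)  # word -> its rhyme keys
        self.words_by_key = defaultdict(set)  # rhyme key -> words
        for word, pron in entries:
            key = rhyme_phonemes(pron)
            if key:
                self.keys_by_word[word].add(key)
                self.words_by_key[key].add(word)

    def rhymes(self, word):
        found = set()
        for key in self.keys_by_word.get(word, ()):
            found |= self.words_by_key[key]
        found.discard(word)  # a word is not its own rhyme
        return sorted(found)


# Hand-written sample entries (word, pronunciation).
ENTRIES = [
    ("LOVE", "L AH1 V"),
    ("ABOVE", "AH0 B AH1 V"),
    ("DOVE", "D AH1 V"),
    ("GLOVE", "G L AH1 V"),
    ("LIVE", "L IH1 V"),
    ("LIVE", "L AY1 V"),  # a second pronunciation of LIVE (rhymes with FIVE)
    ("OUTLIVE", "AW2 T L IH1 V"),
    ("FIVE", "F AY1 V"),
]

index = RhymeIndex(ENTRIES)
print(index.rhymes("LOVE"))  # ['ABOVE', 'DOVE', 'GLOVE']
print(index.rhymes("LIVE"))  # ['FIVE', 'OUTLIVE']
```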
Although it's not the most efficient option in terms of disk space, I've had good experience storing this stuff in an SQL table with indexed columns.
For a simple conceptual example, you can compute the "rhyme phonemes" of every word in the dictionary and insert them into a "Rhymes" table whose columns are {WordText, RhymePhonemes}. You would then see entries such as:
{"ABOVE", "AH1 V"}
{"DOVE", "AH1 V"}
{"OUTLIVE", "IH1 V"}
{"GRADUATE", "AE1 JH AH0 W AH0 T"}
{"GRADUATE", "AE1 JH AH0 W EY2 T"}
... etc.
Then, to find rhymes, you can issue a query such as:
SELECT DISTINCT OTHER.WordText
FROM Rhymes INPUT
INNER JOIN Rhymes OTHER
    ON OTHER.RhymePhonemes = INPUT.RhymePhonemes
WHERE INPUT.WordText = 'LOVE'
    AND OTHER.WordText <> INPUT.WordText
ORDER BY OTHER.WordText
This table is also useful if you plan to print a dictionary where all rhyming words are grouped together.
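Here is a runnable sketch of the Rhymes table and the query above, using Python's built-in sqlite3 module with an in-memory database as a stand-in for whatever SQL engine you actually use. The index names and the rhymes_for helper are hypothetical. SQLite compares TEXT case-sensitively by default, so the helper upper-cases the input to match the dictionary's all-caps words; the second query shows the kind of grouped listing you might use to print a rhyming dictionary.

```python
import sqlite3

# In-memory SQLite stands in for whatever database you use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rhymes (WordText TEXT NOT NULL, RhymePhonemes TEXT NOT NULL)")
# Index both columns: one for looking up the input word, one for the join.
conn.execute("CREATE INDEX IX_Rhymes_WordText ON Rhymes (WordText)")
conn.execute("CREATE INDEX IX_Rhymes_RhymePhonemes ON Rhymes (RhymePhonemes)")
conn.executemany(
    "INSERT INTO Rhymes (WordText, RhymePhonemes) VALUES (?, ?)",
    [
        ("LOVE", "AH1 V"),
        ("ABOVE", "AH1 V"),
        ("DOVE", "AH1 V"),
        ("OUTLIVE", "IH1 V"),
        ("GRADUATE", "AE1 JH AH0 W AH0 T"),
        ("GRADUATE", "AE1 JH AH0 W EY2 T"),
    ],
)

RHYME_QUERY = """
SELECT DISTINCT OTHER.WordText
FROM Rhymes INPUT
INNER JOIN Rhymes OTHER ON OTHER.RhymePhonemes = INPUT.RhymePhonemes
WHERE INPUT.WordText = ? AND OTHER.WordText <> INPUT.WordText
ORDER BY OTHER.WordText
"""


def rhymes_for(word):
    """Run the self-join for one word; dictionary entries are upper case."""
    return [row[0] for row in conn.execute(RHYME_QUERY, (word.upper(),))]


# Every word listed next to its rhyme phonemes, so rhymes end up adjacent.
GROUPED_QUERY = "SELECT RhymePhonemes, WordText FROM Rhymes ORDER BY RhymePhonemes, WordText"

print(rhymes_for("love"))  # ['ABOVE', 'DOVE']
for rhyme, word in conn.execute(GROUPED_QUERY):
    print(rhyme, "|", word)
```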
There are, of course, many other ways to store and retrieve the data, with various trade-offs, but hopefully this helps you get started.
I've also had luck storing the raw pronunciations in the database in various "full" formats (forward and reversed pronunciation strings, with and without stress markers, etc.), i.e., as whole strings rather than "sliced" into specific parts the way the rhyme-phoneme column is.
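One reason a reversed pronunciation string can be handy (an assumption about why it helps, not something spelled out above): reversing the phoneme order turns "ends with AH1 V" into "starts with V AH1", and prefix searches are something sorted indexes handle well. The sketch below shows the idea with Python's bisect over a sorted list standing in for a database index; reverse_pron, words_ending_with, and the sample entries are hypothetical.

```python
import bisect


def reverse_pron(pronunciation):
    """Reverse the phoneme order: "AH0 B AH1 V" -> "V AH1 B AH0"."""
    return " ".join(reversed(pronunciation.split()))


# Hand-written sample entries (word, forward pronunciation).
ENTRIES = [
    ("ABOVE", "AH0 B AH1 V"),
    ("DOVE", "D AH1 V"),
    ("GLOVE", "G L AH1 V"),
    ("LOVE", "L AH1 V"),
    ("OUTLIVE", "AW2 T L IH1 V"),
]

# A sorted list of (reversed pronunciation, word) stands in for a DB index.
INDEX = sorted((reverse_pron(pron), word) for word, pron in ENTRIES)
KEYS = [key for key, _ in INDEX]


def words_ending_with(rhyme):
    """Suffix search on the forward string == prefix search on the reversed one."""
    prefix = reverse_pron(rhyme)
    start = bisect.bisect_left(KEYS, prefix)
    found = []
    for key, word in INDEX[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match either
        # Only accept matches that end on a phoneme boundary.
        if len(key) == len(prefix) or key[len(prefix)] == " ":
            found.append(word)
    return sorted(found)


print(words_ending_with("AH1 V"))  # ['ABOVE', 'DOVE', 'GLOVE', 'LOVE']
```

In SQL, the rough equivalent would be a prefix LIKE on a reversed-pronunciation column (something like LIKE 'V AH1%' on a hypothetical ReversedPronunciation column); whether the database can serve that from an index depends on the engine, collation, and configuration.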
Gotchas
Again, the basic "LOVE" walkthrough above will definitely get you rhyming. However, along the way you are likely to run into other issues worth considering. Here's a heads-up:
- Some words have several pronunciations. In the CMU dictionary, alternate pronunciations are marked with a suffix such as (1), (2), etc. after the word, as in GRADUATE(2). If someone wants rhymes for such a word, you need to decide between showing rhymes for ALL of its pronunciations or letting the user choose which pronunciation they actually mean.
- What do you do when a pronunciation has two or more "1"s? Choose the first one? Choose the last one? If you choose the last, you will find more rhymes, but it may not be the most natural choice of stress.
- What do you do when a pronunciation has no "1" at all? It doesn't happen often, but it does happen, for example ACCREDIT (AH0 K R EH2 D AH0 T) and AIKIN (EY0 K IH0 N). In that case, I would fall back to the next-best stress (the CMU dictionary marks secondary stress with 2, so use a 2 if there is no 1). If they are all 0, I don't have good advice.
- Some words are missing. The dictionary is a great start, but it doesn't have every word or spelling you might need, and it favors American spellings over British ones.
- Some pronunciations are not what you would expect, and you may want to prune them. For example, there is a pronunciation of "OR" that sounds like "ER".
- You may want to compare "rhyme phonemes" with the stress digits stripped. This matters only for words whose primary stress is not on the last vowel (which is why the problem doesn't show up in the LOVE example).
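Several of these gotchas can be handled with a few switches on the rhyme-key function. The sketch below is one possible set of policies, all of them assumptions: base_word strips a "(1)"-style variant suffix so alternate pronunciations can be grouped under one word, rhyme_key falls back from primary (1) to secondary (2) stress and gives up when every vowel is 0, use_last picks the last stressed vowel instead of the first, and strip_stress drops the stress digits before comparing. The function names and flags are hypothetical.

```python
import re

# Alternate pronunciations carry a numeric suffix, e.g. "GRADUATE(2)".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def base_word(entry_word):
    """Strip an alternate-pronunciation suffix: "GRADUATE(2)" -> "GRADUATE"."""
    return VARIANT_SUFFIX.sub("", entry_word)


def rhyme_key(pronunciation, use_last=False, strip_stress=False):
    """Rhyme phonemes under the (assumed) policies described above."""
    phonemes = pronunciation.split()
    start = None
    for level in ("1", "2"):  # primary stress first, then secondary
        hits = [i for i, p in enumerate(phonemes) if p.endswith(level)]
        if hits:
            start = hits[-1] if use_last else hits[0]
            break
    if start is None:
        return ""  # every vowel is unstressed (0): no good answer
    tail = phonemes[start:]
    if strip_stress:
        # Drop stress digits so e.g. AH0 and AH2 compare as equal.
        tail = [p.rstrip("012") for p in tail]
    return " ".join(tail)


print(base_word("GRADUATE(2)"))                 # GRADUATE
print(rhyme_key("AH0 K R EH2 D AH0 T"))         # EH2 D AH0 T
print(rhyme_key("EY0 K IH0 N") == "")           # True
print(rhyme_key("G R AE1 JH AH0 W EY2 T", strip_stress=True))  # AE JH AH W EY T
```

Whether the fallback and stripping are right depends on how loose you want your rhymes to be; treat them as knobs to tune rather than fixed rules.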