Your method of “looking to see whether the colors between the white plate and the black text match” essentially searches for regions where the pixel intensity changes from black to white and back many times. Edge detection does pretty much the same thing. Still, implementing your own method is worthwhile, because you will learn a lot in the process. Heck, why not go all the way and compare your results with those of an off-the-shelf edge detection algorithm?
At some point you will need a binary image, for example with black pixels carrying the “not a character” label and white pixels carrying the “is a character” label. Perhaps the easiest way to get one is the threshold function, but it only works well if the characters have already been emphasized in some way.
As mentioned in your other thread, you can do this with the black hat operator, which makes the dark features (the text) stand out against the bright background (the plate):
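A minimal sketch of that step, assuming a grayscale input file `plate.png` and a 15×15 kernel (both the file name and the kernel size are assumptions you will want to tune):

```python
import cv2

# Load the plate image as grayscale (file name is an assumption).
img = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)

# Black hat = closing(img) - img: it highlights dark features (the text)
# on a bright background (the plate). The kernel should be a bit larger
# than the stroke width of the characters.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)

cv2.imwrite("blackhat.png", blackhat)
```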

If you then threshold the image above, say with Otsu's method (which automatically determines a global threshold level), you get the following:
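A sketch of that thresholding step, continuing from the black hat result (the file names are assumptions):

```python
import cv2

# Continuing from the black hat result above.
blackhat = cv2.imread("blackhat.png", cv2.IMREAD_GRAYSCALE)

# With THRESH_OTSU the threshold argument (0 here) is ignored;
# Otsu's method picks the global level from the image histogram.
level, binary = cv2.threshold(blackhat, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu chose threshold level:", level)

cv2.imwrite("binary.png", binary)
```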

There are several ways to clean up this image. For example, you can find the connected components and throw away those that are too small, too large, too wide, or too tall to be a character:
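For instance, something along these lines; the size limits below are assumptions and need to be tuned to your character dimensions:

```python
import cv2
import numpy as np

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)

# Label the connected components and get per-component bounding-box stats.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

# These limits are assumptions -- tune them to your character sizes.
MIN_AREA, MAX_AREA = 100, 5000
MIN_H, MAX_H = 20, 100
MAX_ASPECT = 2.0  # width / height; characters are usually taller than wide

cleaned = np.zeros_like(binary)
for i in range(1, n):  # label 0 is the background
    x, y, w, h, area = stats[i]
    if not (MIN_AREA <= area <= MAX_AREA):
        continue
    if not (MIN_H <= h <= MAX_H):
        continue
    if w / h > MAX_ASPECT:
        continue
    cleaned[labels == i] = 255  # keep this component

cv2.imwrite("cleaned.png", cleaned)
```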

Since the characters in your image are relatively large and nicely connected, this approach works well.
You can then filter the remaining components based on properties of their neighbors until you have exactly the required number of components (= the number of characters). If you also want to recognize the characters, you can compute features for each segmented character and feed them into a classifier, which is usually built with supervised learning.
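A minimal sketch of that last step, using the raw pixels of a resized crop as the feature vector and OpenCV's k-nearest-neighbors classifier. Here `train_images` and `train_labels` are placeholders for a labeled set of character crops you would prepare yourself, and the crop size is an assumption:

```python
import cv2
import numpy as np

SIZE = (16, 16)  # fixed feature size; an assumption, any fixed size works

def features(char_img):
    # Resize a character crop to a fixed size and flatten it
    # into a float32 row vector.
    return cv2.resize(char_img, SIZE).astype(np.float32).reshape(1, -1)

# train_images / train_labels are hypothetical: a list of grayscale
# character crops and their integer class labels.
train_samples = np.vstack([features(img) for img in train_images])
train_responses = np.array(train_labels, dtype=np.float32)

knn = cv2.ml.KNearest_create()
knn.train(train_samples, cv2.ml.ROW_SAMPLE, train_responses)

# Classify one unknown character crop using its 3 nearest neighbors.
_, result, _, _ = knn.findNearest(features(unknown_char), 3)
print("predicted label:", result[0][0])
```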
All of the above steps are just one way to do this, of course.
By the way, I generated the images above using OpenCV + Python, which is a great combination for computer vision.