There are many ways to do this, but the one I like best is to create a set of features for each object and then match them against the image.
You can use SIFT to create a vector of keypoints for each object. Applying SIFT to each picture, you will get a set of descriptors for each image (say, one image per object).
When you get the image you want to process, use FAST to detect keypoints and cvMatchTemplate() against each stored set of descriptors. The set with the highest match probability tells you which object you have. If all the probabilities are too low, then you probably don't have any of the objects in the image.
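The "score every object, keep the best, reject if all scores are low" step can be sketched roughly as below. This is only an illustration under stated assumptions: it uses plain Python with toy 2-D vectors instead of real SIFT descriptors, and it swaps in nearest-neighbour matching with a ratio test instead of cvMatchTemplate(). The names `match_score`, `identify`, `ratio` and `min_score` are hypothetical, and the threshold values are arbitrary.

```python
import math

def match_score(query, reference, ratio=0.75):
    """Fraction of query descriptors whose nearest reference descriptor
    is clearly closer than the second nearest (a ratio test)."""
    if not query or len(reference) < 2:
        return 0.0
    good = 0
    for q in query:
        # Distances from this query descriptor to every reference descriptor.
        dists = sorted(math.dist(q, r) for r in reference)
        if dists[0] < ratio * dists[1]:
            good += 1
    return good / len(query)

def identify(query, library, min_score=0.5):
    """Return (best_label_or_None, scores) for a dict of label -> descriptors.
    min_score is an assumed cut-off below which no object is reported."""
    scores = {name: match_score(query, desc) for name, desc in library.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] < min_score:
        return None, scores
    return best, scores

# Toy descriptor sets, one per known object (real SIFT descriptors are 128-D).
library = {
    "mug": [[0, 0], [10, 10], [20, 0]],
    "book": [[100, 100], [110, 120], [130, 100]],
}

label, scores = identify([[0.1, 0], [10, 10.2], [19.9, 0.1]], library)
nothing, _ = identify([[500, 500], [600, -300]], library)
```

Here `label` ends up as the object whose descriptors matched best, and `nothing` is `None` because no set scores above the cut-off. With real images you would fill `library` and the query with the descriptors computed for each picture.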
This is just one approach that I like, but it is fairly modern, accurate, and fast.