Here are some of the image datasets we worked with during my PhD, though you can find many more on the Internet. From what you describe, you are looking for an object recognition or segmentation dataset with ground truth.
You might be interested in the ALOI dataset: “ALOI is a color image collection of one thousand small objects, recorded for scientific purposes. In order to capture the sensory variation in object recordings, we systematically varied viewing angle, illumination angle, and illumination color for each object, and additionally captured wide-baseline stereo images. We recorded over a hundred images of each object, yielding a total of 110,250 images for the collection.”
A traffic sign recognition dataset may also be of interest to you. IIRC it also comes with ground-truth masks for the road signs.
In both cases, you should be able to replace the background with what you want (if you want to make the task more complicated).
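If it helps, here is a minimal sketch of the background swap with NumPy. It assumes each image comes with a binary ground-truth mask of the same height and width; the function name `replace_background` and the boolean mask format are my own assumptions for illustration, not something either dataset prescribes, so you will likely need to adapt the mask loading.

```python
import numpy as np


def replace_background(image, mask, background):
    """Composite the masked object from `image` onto `background`.

    image, background: H x W x 3 arrays with identical shape and dtype.
    mask: H x W boolean array, True where the object is (assumed format;
    real datasets may store masks differently).
    """
    if image.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Add a trailing axis so the 2-D mask broadcasts over the color channels.
    return np.where(mask[..., None], image, background)


# Tiny synthetic example: a white 2x2 "object" on a black image,
# pasted onto a uniform grey background.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1:3, 1:3] = 255
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
background = np.full((4, 4, 3), 128, dtype=np.uint8)

composite = replace_background(image, mask, background)
```

With real data you would load the image, mask and new background with whatever image library you already use and resize the background to the image size first.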
Good luck with the recognition problem (if it is still relevant).