To get a good answer, you need to narrow the scope of your problem. Here is something that may help. I assume your "shitty picture" input is always similar to the one you posted, in the sense that it has strong edges and its colors don't matter. To solve your problem (or at least get closer to a solution) in a simple way, you need to describe both images in terms of scale-invariant descriptors.
Here is my take: binarize both images, count the connected components (CCs) in each, and discard the CCs of irrelevant size (too far from the median, the mean, some multiple of the standard deviation, etc.; you decide). You may want to augment that second step to better distinguish your image from other inputs; the more robust you want the approach to be, the more discriminative descriptors you will need. At some point you may also consider an SVM or other machine learning methods.
So, the binarization step: compute a morphological gradient and discard the weak gradients. This is very simple if the inputs are similar to the ones posted. Here is what I get with a threshold at intensity 60 (I am also assuming your input is in the range [0, 255]):
*[thresholded morphological gradients of the "perfect" and "bad" images]*
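For reference, here is a minimal sketch of that binarization step. The answer doesn't prescribe a library, so this uses OpenCV; the 3×3 structuring element is my assumption, and the threshold of 60 is the value used above:

```python
import cv2
import numpy as np

def binarize(img_gray, thresh=60):
    """Morphological gradient followed by thresholding.

    thresh=60 matches the value used above; anything up to ~90
    worked for these inputs.
    """
    # 3x3 rectangular structuring element (my assumption; not
    # specified in the original answer).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    # Morphological gradient = dilation - erosion: strong on edges.
    grad = cv2.morphologyEx(img_gray, cv2.MORPH_GRADIENT, kernel)
    # Discard the weak gradient responses, keep the strong edges.
    _, binary = cv2.threshold(grad, thresh, 255, cv2.THRESH_BINARY)
    return binary
```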
I quickly experimented with thresholds up to 90, and they all worked for these images. These binarizations are easy to crop, and you can then fill both the background and the object:
*[cropped binarizations with the background and object filled]*
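And a sketch of the cropping and filling step, again with OpenCV. The flood-fill-from-a-corner trick is one common way to fill holes; it assumes the top-left pixel of the cropped image belongs to the background:

```python
import cv2
import numpy as np

def crop_and_fill(binary):
    """Crop to the nonzero region, then fill the object's holes.

    Flood-filling from a corner marks the background; every pixel
    the fill never reaches lies inside the object, so OR-ing the
    inverse back in fills the object solid.
    """
    ys, xs = np.nonzero(binary)
    binary = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = binary.shape
    # floodFill requires a mask 2 px larger than the image.
    mask = np.zeros((h + 2, w + 2), np.uint8)
    filled = binary.copy()
    # Assumes (0, 0) is background after cropping.
    cv2.floodFill(filled, mask, (0, 0), 255)
    return binary | cv2.bitwise_not(filled)
```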
Now you can extract the white connected components and analyze them. In this case, the simplest thing is to count them: for these inputs we get 12 in the "perfect" image and 14 in the "bad" one. In the "bad" one, however, two of the components have size 1 (a single pixel each), so they are trivially eliminated. There are many other ways to compare connected components, but I hope this gets you started. If you need code for any of these tasks, I can include it.
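Finally, a sketch of the component counting with the trivial size filter. `min_area=2` is just enough to drop the single-pixel components mentioned above; the median/stddev-based cutoffs from earlier would slot in here instead for a more general filter:

```python
import cv2
import numpy as np

def count_components(filled, min_area=2):
    """Count white connected components, dropping trivially small ones."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(
        filled, connectivity=8)
    # Label 0 is the background; skip it.
    areas = stats[1:, cv2.CC_STAT_AREA]
    return int(np.sum(areas >= min_area))
```

With these inputs, both images should end up at 12 components after filtering, so comparing the two counts is already the crudest possible similarity test.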