This may be a bit of a retread, since others have already linked to the Wikipedia article on determining the number of clusters in a data set, but I found that article too dense, so here is a short, intuitive answer:
In principle, there is no universally "correct" answer for the number of clusters in a data set. Increasing the number of clusters always reduces the within-cluster variance (at the cost of a longer description length), and in any non-trivial data set the variance will never vanish completely unless you assign a separate Gaussian to every single point — which makes the clustering useless. This is an instance of a more general phenomenon known as the "futility of bias-free learning": a learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
Thus, you basically need to pick some function of your data set to optimize in order to choose the number of clusters (see the Wikipedia article on inductive bias for some examples).
In other bad news: in all such cases, finding the optimal number of clusters is known to be NP-hard, so the best you can hope for is a good heuristic.
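To make the variance point concrete, here is a minimal sketch in plain NumPy (the toy k-means, the synthetic blob data, and all names are my own illustration, not a standard library API): it runs Lloyd's algorithm for k = 1..6 on three well-separated blobs and prints the within-cluster sum of squares. The fit measure keeps shrinking as k grows, so by itself it cannot tell you where to stop; in practice you would look for an "elbow" where the improvement levels off, or penalize model complexity (BIC, MDL, gap statistic, etc.).

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Tiny Lloyd's-algorithm k-means; fixed seed so the run is reproducible."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = ((X[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
        # move each center to the mean of its points (keep it if its cluster is empty)
        centers = np.stack([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

def within_ss(X, labels, centers):
    """Total squared distance of points to their assigned cluster centers."""
    return float(((X - centers[labels]) ** 2).sum())

rng = np.random.default_rng(42)
# synthetic data: three well-separated 2-D blobs of 50 points each
X = np.concatenate([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

# within-cluster sum of squares for k = 1..6: it only goes down as k grows
wss = {k: within_ss(X, *kmeans(X, k)) for k in range(1, 7)}
for k, v in wss.items():
    print(f"k={k}: within-cluster SS = {v:.1f}")
```

Note that nothing in the printed numbers alone says "stop at 3"; it is the prior assumption that a sharp drop followed by a plateau marks the "real" number of clusters — an inductive bias — that turns this into a usable heuristic.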