Since you are not using any part of the labeled data, you are using an unsupervised method by definition.
"How can I then label clusters (given that I have a comparison sample)?"
You can try various permutations of the label set and keep the one that minimizes the average error (or maximizes accuracy) on the comparison sample. With clustering, you are free to label your clusters however you like, so you can, for example, try different label assignments until you optimize the chosen performance metric.
"Am I trying to turn this into a supervised learning problem when I do this?"
It depends. If you explicitly use (known) labeled data points during the clustering process, then this is semi-supervised. If not, you are simply using the label information to evaluate and "compare" with supervised approaches. This is a form of supervision, but it is based not on a training set but on the best expected performance (i.e., an "oracle" indicates the correct labels for the clusters).
"How do I create a confusion matrix on a (different) test set for comparison with a supervised algorithm?"
You need a way to turn clusters into labeled classes. For a small number of clusters (e.g. C <= 5) you could build all C! confusion matrices, one per assignment of labels to clusters, and keep the one that minimizes your average classification error. In your case, however, with C = 10, that means 10! = 3,628,800 candidate assignments, which is impractical and far too much overhead.
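For a small number of clusters, the exhaustive search over label assignments can be sketched as follows. This is only a minimal illustration in plain Python: the function names `best_label_permutation` and `confusion_matrix` and the toy data are hypothetical, and it assumes that clusters and classes are both numbered 0..C-1 and that there are as many clusters as classes.

```python
from itertools import permutations

def best_label_permutation(cluster_ids, true_labels, n_classes):
    """Try every cluster -> class assignment; keep the one with the lowest error rate."""
    best_mapping, best_error = None, float("inf")
    for perm in permutations(range(n_classes)):
        # perm[c] is the class label assigned to cluster c (hypothetical convention)
        errors = sum(perm[c] != y for c, y in zip(cluster_ids, true_labels))
        error_rate = errors / len(true_labels)
        if error_rate < best_error:
            best_mapping, best_error = perm, error_rate
    return best_mapping, best_error

def confusion_matrix(cluster_ids, true_labels, mapping, n_classes):
    """Rows are true classes, columns are the classes assigned via the mapping."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for c, y in zip(cluster_ids, true_labels):
        matrix[y][mapping[c]] += 1
    return matrix

# Toy data (assumed for illustration): cluster index and true class per point.
cluster_ids = [0, 0, 1, 1, 2, 2, 2]
true_labels = [2, 2, 0, 0, 1, 1, 0]

mapping, error = best_label_permutation(cluster_ids, true_labels, 3)
matrix = confusion_matrix(cluster_ids, true_labels, mapping, 3)
```

The loop visits C! permutations, which is why this only makes sense for small C. The mapping should be chosen on the comparison sample and then applied unchanged to the separate test set when building the confusion matrix.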
Alternatively, you can label clusters (and thus obtain confusion matrices) using:
- Semi-supervised approaches, where clusters are labeled a priori or guided by seeding the process with data points known to belong to a given cluster / class.
- Ranking or searching by distance between the estimated cluster centroids and the ground truth. This assigns each cluster the closest or most similar label.
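The distance-based option in the list above might look like the sketch below. It is one possible reading, not a definitive method: it assumes Euclidean distance, represents the ground truth by the centroid of each class in a labeled sample, and uses hypothetical names (`centroid`, `label_clusters_by_nearest_class`) and toy data.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def label_clusters_by_nearest_class(cluster_centroids, labeled_points, labels):
    """Give each cluster the label of the closest class centroid."""
    # Group the labeled sample by class and compute one centroid per class.
    by_class = {}
    for point, label in zip(labeled_points, labels):
        by_class.setdefault(label, []).append(point)
    class_centroids = {label: centroid(pts) for label, pts in by_class.items()}

    mapping = {}
    for cluster_index, centre in enumerate(cluster_centroids):
        # Euclidean distance is an assumption; any similarity measure could be used.
        mapping[cluster_index] = min(
            class_centroids,
            key=lambda label: math.dist(centre, class_centroids[label]),
        )
    return mapping

# Toy data (assumed for illustration).
cluster_centroids = [[0.1, 0.0], [5.2, 4.9]]
labeled_points = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]]
labels = ["a", "a", "b", "b"]

cluster_labels = label_clusters_by_nearest_class(cluster_centroids, labeled_points, labels)
```

Unlike the permutation search, this costs only C times the number of classes in distance computations, but it does not force a one-to-one assignment: two clusters can end up with the same label.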