I ran a clustering test on crawled pages (more than 25,000 documents, my own data set). Then I ran clusterdump:
$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-1/ --output clusteranalyze.txt
After running the cluster dumper, the output contains 25 "VL-xxxxx{}" elements:
VL-24130{n=1312 c=[0:0.017, 10:0.007, 11:0.005, 14:0.017, 31:0.016, 35:0.006, 41:0.010, 43:0.008, 52:0.005, 59:0.010, 68:0.037, 72:0.056, 87:0.028, ... ] r=[0:0.442, 10:0.271, 11:0.198, 14:0.369, 31:0.421, ... ]} ... VL-24868{n=311 c=[0:0.042, 11:0.016, 17:0.046, 72:0.014, 96:0.044, 118:0.015, 135:0.016, 195:0.017, 318:0.040, 319:0.037, 320:0.036, 330:0.030, ...] ] r=[0:0.740, 11:0.287, 17:0.576, 72:0.239, 96:0.549, 118:0.273, ...]}
How do I interpret this output?
In short: I am looking for document identifiers belonging to a particular cluster.
What do the following mean:
- VL-x
- n=y
- c=[z:z', ...]
- r=[z'':z''', ...]
Does 0:0.017 mean that "0" is the identifier of a document that belongs to this cluster?
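To make the structure I am asking about concrete, here is a minimal Python sketch that splits one such entry into its raw parts. It only assumes the layout visible in the dump above (an identifier, then n=, c=[index:value, ...] and r=[index:value, ...], with integer indices). The names parse_cluster_line and parse_pairs are hypothetical helpers, not part of Mahout, the sample string is a shortened copy of the first entry, and the code makes no claim about what the fields mean.

```python
import re

# Hypothetical helper pattern (not part of Mahout): matches one entry of the
# form  ID{n=<int> c=[...] r=[...]}  as it appears in the dump above.
ENTRY = re.compile(
    r"(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]"
)


def parse_pairs(text):
    """Turn 'i:v, j:w, ...' into {i: v, j: w}; items without ':' are skipped."""
    pairs = {}
    for item in text.split(","):
        item = item.strip()
        if ":" not in item:  # e.g. the '...' elision or an empty item
            continue
        key, value = item.split(":", 1)
        pairs[int(key)] = float(value)  # assumption: indices are integers
    return pairs


def parse_cluster_line(line):
    """Split one entry into id, n, c and r without interpreting them."""
    match = ENTRY.search(line)
    if match is None:
        raise ValueError("no clusterdump-style entry found")
    return {
        "id": match.group("id"),
        "n": int(match.group("n")),
        "c": parse_pairs(match.group("c")),
        "r": parse_pairs(match.group("r")),
    }


# Shortened copy of the first entry from the output above.
sample = "VL-24130{n=1312 c=[0:0.017, 10:0.007, 11:0.005] r=[0:0.442, 10:0.271, 11:0.198]}"
entry = parse_cluster_line(sample)
print(entry["id"], entry["n"], entry["c"][0], entry["r"][0])
# prints: VL-24130 1312 0.017 0.442
```

So in this reading "0" is just the key of the first index:value pair in c and r; whether that key is a document identifier or something else is exactly what I am unsure about.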
I have already read on the Mahout wiki pages what CL, n, c and r mean. But could someone please explain them in more detail, or point me to a resource where they are explained more thoroughly?
Sorry if these are stupid questions, but I am new to Apache Mahout and am using it for clustering as part of my course.