I am trying to run the k-means clustering algorithm from Apache Spark's MLlib library. I have everything set up, but I'm not sure how to format the input. I am relatively new to machine learning, so any help would be greatly appreciated. The sample data.txt looks like this:
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
The data I want to run the algorithm on is in this format (a JSON array):
[{"customer":"ddf6022","order_id":"20031-19958","asset_id":"dd1~33","price":300,"time":1411134115000,"location":"bt2"},{"customer":"ddf6023","order_id":"23899-23825","asset_id":"dd1~33","price":300,"time":1411954672000,"location":"bt2"}]
How can I convert it into something that can be used with the k-means clustering algorithm? I am using Java, and I assume the data needs to end up in a JavaRDD, but I don't know how to do that.
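To make the question concrete, here is a minimal sketch of the kind of conversion I have in mind, in plain Java with no Spark dependency. It assumes the JSON array has already been parsed into objects (for example with a JSON library such as Jackson or Gson), and that price, time and location are the fields worth clustering on. Those choices, and the names `OrderFeatureEncoder`, `Order`, `encode` and `toLine`, are hypothetical and only for illustration. Numeric fields are copied as-is, and the string field is one-hot encoded, since k-means works on numeric vectors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns parsed order records into numeric rows. */
class OrderFeatureEncoder {

    /** One parsed JSON object, reduced to the fields assumed to be features. */
    static final class Order {
        final double price;
        final long timeMillis;
        final String location;

        Order(double price, long timeMillis, String location) {
            this.price = price;
            this.timeMillis = timeMillis;
            this.location = location;
        }
    }

    /** Returns one row per order: [price, timeMillis, one-hot location columns...]. */
    static List<double[]> encode(List<Order> orders) {
        // First pass: give each distinct location its own column, in first-seen order.
        Map<String, Integer> locationColumn = new LinkedHashMap<>();
        for (Order o : orders) {
            locationColumn.putIfAbsent(o.location, locationColumn.size());
        }

        // Second pass: build the numeric rows.
        final int numericColumns = 2;
        List<double[]> rows = new ArrayList<>();
        for (Order o : orders) {
            double[] row = new double[numericColumns + locationColumn.size()];
            row[0] = o.price;
            row[1] = (double) o.timeMillis;
            // Set the single column that matches this order's location.
            row[numericColumns + locationColumn.get(o.location)] = 1.0;
            rows.add(row);
        }
        return rows;
    }

    /** Formats a row like the lines of data.txt: values separated by single spaces. */
    static String toLine(double[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(row[i]);
        }
        return sb.toString();
    }
}
```

If I understand the MLlib API correctly, each `double[]` row could then be wrapped with `Vectors.dense(row)` and the resulting list passed to `JavaSparkContext.parallelize(...)` to obtain a `JavaRDD<Vector>`; alternatively, the rows could be written out with `toLine` to get a file in the same format as data.txt. The raw millisecond timestamps are far larger than the other values, so some rescaling of the columns would probably be needed before clustering.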