K-means

I am trying to program a k-means algorithm in Java. I have computed several arrays, each of which contains a number of coefficients. I need to use the k-means algorithm to group all this data. Do you know of any implementation of this algorithm?

Thanks

+4
8 answers

I haven't studied the code myself, but there is a multi-threaded implementation of K-means in this JavaWorld article, which looks pretty instructive.

+4

Classification, clustering and grouping are well-developed areas of IR. There is a very good open-source Java library called WEKA. It includes several clustering algorithms. Although there is a learning curve, it can be useful when you are faced with more complex problems.

+3

Indeed, KMeans is a very simple algorithm. Is there any good reason not to code it by hand? I did this in Qt and then ported the code to plain old STL without any problems.

I am starting to be a fan of Joel's idea: no external dependencies. So please feel free to tell me what is good about a large piece of software that you do not control, and that others here have already said is not a good piece of software.

Talk is cheap; a real man shows his code to the world: http://github.com/elcuco/data_mining_demo

I need to clean up the code a bit to make it more general, and the current version is not ported to STL, but it's a start!

+3

There's a very good Python implementation of K-means clustering in Programming Collective Intelligence. I highly recommend it.

I understand that you will need to translate it to Java, but that does not look too complicated.

+2

OpenCV is one of the most horribly written libraries I've ever had to use. Matlab, on the other hand, handles this very neatly.

If you need to code it yourself, the algorithm is incredibly simple for how efficient it is.

  1. Choose the number of clusters (k).
  2. Make k points (these will be the centroids).
  3. Randomize the locations of all these points.
  4. Calculate the Euclidean distance from each point to all centroids.
  5. Assign "membership" of each point to the nearest centroid.
  6. Establish the new centroids by averaging the locations of all points belonging to each cluster.
  7. Go to 4 until convergence is achieved or the changes made are negligible.
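
The steps above can be sketched in Java roughly as follows. This is only a minimal illustration, not code from any library mentioned in this thread: the class name `KMeansSketch`, the method `cluster`, and the `maxIterations` and `seed` parameters are all hypothetical. It also assumes that "randomizing" the centroids means starting them at distinct, randomly chosen data points, and that an empty cluster simply keeps its previous centroid.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; not part of WEKA, Java-ML, or OpenCV.
class KMeansSketch {

    // Squared Euclidean distance; the square root is not needed
    // to decide which centroid is nearest.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns, for every point, the index (0..k-1) of its cluster.
    // Assumes 1 <= k <= points.length and equal-length rows.
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        int dim = points[0].length;
        Random random = new Random(seed);

        // Initialization (assumption): place each centroid on a distinct,
        // randomly chosen data point via a partial Fisher-Yates shuffle.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(n - c);
            int tmp = order[c];
            order[c] = order[j];
            order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point joins its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = squaredDistance(points[i], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double dist = squaredDistance(points[i], centroids[c]);
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            // Convergence: no point changed cluster in this pass.
            if (!changed) break;

            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }
}
```

With each coefficient array stored as one row of a `double[][]`, a call such as `KMeansSketch.cluster(data, 3, 100, 42L)` would return one cluster index per row. The result depends on the random starting points, so it is common to run the algorithm several times with different seeds.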
+2

A very old question, but I noticed that there is no mention of the Java Machine Learning Library, which has an implementation of K-Means and includes some documentation on how to use it.

The project is not very active, but the latest version is relatively recent (July 2012).

+2

It seems that everyone who posted forgot to mention the de facto image processing library: OpenCV http://sourceforge.net/projects/opencvlibrary/ . You will need to write a JNI wrapper around the OpenCV C code to make KMeans work, but the added advantages would be:

  1. You know the KMeans algorithm is highly optimized.
  2. OpenCV makes extensive use of your GPU, so it runs very fast.

The main disadvantage is that you will need to write a JNI wrapper. I once needed a template matching routine and came across many alternatives, but I found OpenCV to be by far the best, even though I had to write a JNI wrapper for it.

+1
    // Aim: To implement the K-means clustering algorithm (one-dimensional integer data).
    import java.util.*;

    class k_means {
        static int[] count;    // number of elements currently in each cluster
        static int[] d;        // input elements
        static int[][] k;      // k[i] holds the elements of cluster i, padded with -1
        static int[][] tempk;  // cluster contents from the previous step
        static double[] m;     // current mean of each cluster
        static double[] diff;  // scratch: distance from an element to each mean
        static int n, p;       // number of elements, number of clusters

        // Determines the cluster an element goes into at a particular step.
        static int cal_diff(int a) {
            for (int i = 0; i < p; ++i)
                diff[i] = Math.abs(a - m[i]);
            int val = 0;
            double temp = diff[0];
            for (int i = 0; i < p; ++i) {
                if (diff[i] < temp) {
                    temp = diff[i];
                    val = i;
                }
            }
            return val;
        }

        // Determines the intermediate mean values.
        static void cal_mean() {
            for (int i = 0; i < p; ++i) {
                double sum = 0;
                for (int j = 0; j < count[i]; ++j)
                    sum += k[i][j];
                if (count[i] > 0)  // an empty cluster keeps its previous mean
                    m[i] = sum / count[i];
            }
        }

        // Checks whether the previous clusters (tempk) and the current ones (k) are the same.
        // Used as the terminating condition: returns 1 if they match, 0 otherwise.
        static int check1() {
            for (int i = 0; i < p; ++i)
                for (int j = 0; j < n; ++j)
                    if (tempk[i][j] != k[i][j])
                        return 0;
            return 1;
        }

        // Prints the contents of every cluster.
        static void print_clusters() {
            for (int i = 0; i < p; ++i) {
                System.out.print("K" + (i + 1) + "{ ");
                for (int j = 0; j < count[i]; ++j)
                    System.out.print(k[i][j] + " ");
                System.out.println("}");
            }
        }

        public static void main(String[] args) {
            Scanner scr = new Scanner(System.in);

            System.out.println("Enter the number of elements ");
            n = scr.nextInt();
            d = new int[n];

            System.out.println("Enter " + n + " elements: ");
            for (int i = 0; i < n; ++i)
                d[i] = scr.nextInt();

            System.out.println("Enter the number of clusters: ");
            p = scr.nextInt();  // assumed to satisfy 1 <= p <= n

            k = new int[p][n];
            tempk = new int[p][n];
            m = new double[p];
            diff = new double[p];
            count = new int[p];

            // Initialize the means with the first p elements.
            for (int i = 0; i < p; ++i)
                m[i] = d[i];

            int flag;
            do {
                // Reset the clusters.
                for (int i = 0; i < p; ++i) {
                    count[i] = 0;
                    Arrays.fill(k[i], -1);
                }

                // Assign every element to the cluster with the nearest mean.
                for (int i = 0; i < n; ++i) {
                    int c = cal_diff(d[i]);
                    k[c][count[c]++] = d[i];
                }

                cal_mean();       // calculate the means at this step
                flag = check1();  // check whether the terminating condition is satisfied
                if (flag != 1) {
                    // Back up k in tempk to check for equivalence in the next step.
                    for (int i = 0; i < p; ++i)
                        tempk[i] = k[i].clone();
                }

                System.out.println("\n\nAt this step");
                System.out.println("\nValue of clusters");
                print_clusters();
                System.out.println("\nValue of m ");
                for (int i = 0; i < p; ++i)
                    System.out.print("m" + (i + 1) + "=" + m[i] + " ");
            } while (flag == 0);

            System.out.println("\n\n\nThe Final Clusters By Kmeans are as follows: ");
            print_clusters();
        }
    }

    /*
    Enter the number of elements
    8
    Enter 8 elements:
    2 3 6 8 12 15 18 22
    Enter the number of clusters:
    3

    At this step
    Value of clusters
    K1{ 2 }
    K2{ 3 }
    K3{ 6 8 12 15 18 22 }
    Value of m
    m1=2.0 m2=3.0 m3=13.5

    At this step
    Value of clusters
    K1{ 2 }
    K2{ 3 6 8 }
    K3{ 12 15 18 22 }
    Value of m
    m1=2.0 m2=5.666666666666667 m3=16.75

    At this step
    Value of clusters
    K1{ 2 3 }
    K2{ 6 8 }
    K3{ 12 15 18 22 }
    Value of m
    m1=2.5 m2=7.0 m3=16.75

    At this step
    Value of clusters
    K1{ 2 3 }
    K2{ 6 8 }
    K3{ 12 15 18 22 }
    Value of m
    m1=2.5 m2=7.0 m3=16.75

    The Final Clusters By Kmeans are as follows:
    K1{ 2 3 }
    K2{ 6 8 }
    K3{ 12 15 18 22 }
    */
0