To give an introductory answer to my own question, I suggest using the list of distances between each point in the form as a metric for clustering.
Let me create some shapes:
shapes = np.array([[[1,4],[4,2],[11,2],[14,0]], [[4,5],[4,2],[11,2],[13,0]], [[1,3],[4,2],[11,2],[14,1.5]], [[3,5],[4,2],[10,7],[7,9]], [[5,5],[4,2],[10,7],[6,6]]]) def random_rotation(): theta = 3 * np.pi * np.random.random() rotMatrix = numpy.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]) return rotMatrix new_shapes = [] for s in shapes: rr = random_rotation() new_shapes += [[list(rr.dot(p)) + [0] for p in s]] new_shapes = np.array(new_shapes) for i, s in enumerate(new_shapes): plot(s[:,0], s[:,1], color='black') text(np.mean(s[:,0]), np.mean(s[:,1]), str(i), fontsize=14)

Then we create some helper functions and create an array containing all the inter-vertex distances for each shape ( darray ).
import itertools as it def vec_distance(v1, v2): ''' The distance between two vectors. ''' diff = v2 - v1 return math.sqrt(sum(diff * diff)) def distances(s): ''' Compute the distance array for a shape s. ''' ds = [vec_distance(p1, p2) for p1,p2 in it.combinations(s, r=2)] return np.array(ds)
Group them in two clusters using Pycluster .
import Pycluster as pc clust = pc.kcluster(darray,2) print clust
And let's see that we get three records in the first cluster, and two in the other.
(array([0, 0, 0, 1, 1], dtype=int32), 4.576996142441375, 1)
But what forms do they correspond to?
import brewer2mpl dark2 = brewer2mpl.get_map('Dark2', 'qualitative', 4).mpl_colors for i,(s,c) in enumerate(zip(new_shapes, clust[0])): plot(s[:,0], s[:,1], color=dark2[c]) text(np.mean(s[:,0]), np.mean(s[:,1]), str(i), fontsize=14)

Looks nice! The problem is that as the size increases, the dimensional array grows in quadratic time with respect to the number of vertices. I found a presentation that describes this problem and offers some solutions (e.g. SVD for what I believe is a form of dimensional reduction) to speed it up.
I am not going to accept my answer yet, because I am interested in any other ideas or thoughts on how to approach this simple problem.