This problem is a one-dimensional instance of the k-medians problem, which can be formulated as follows. Given the set of points x_1 ... x_n, we divide these points into k sets S_1 ... S_k and choose k locations y_1 ... y_k in such a way as to minimize the sum over all x_i | x_i - y_f (i) |, where y_f (i) is the corresponding location of the set to which x_i is assigned.
Due to the fact that the median is the population minimizer for the absolute distance (i.e., the norm L_1) , it follows that each location y_j will be the median of the elements x in the corresponding set S_j (hence the name of the k-median). Since you are looking at integer values, it is technically possible that if S_j contains an even number of elements, the median may not be an integer, but in such cases, choosing either the next integer above or below the median will give the same amount of absolute distances.
The standard heuristic for solving k-medians (and the related and more general k-means problem) is iterative, but it is not guaranteed to get an optimal or even good solution. The solution of the k-median problem for general metric spaces is NP-complicated, and finding effective approximations for k-medians is an open research problem. For example, the approximation of "k-medians" in Google, for example, will lead to a handful of articles giving approximation schemes. http://www.cis.upenn.edu/~sudipto/mypapers/kmedian_jcss.pdf http://graphics.stanford.edu/courses/cs468-06-winter/Papers/arr-clustering.pdf
In one dimension, everything becomes simpler, and you can use a dynamic programming approach. The DP solution for the related one-dimensional k-means problem is described in this article , and the source code in R is available here . See the document for details, but the idea is essentially the same as that proposed by @SajalJain, and it can be easily adapted to solve the problem of k-medians, not k-means. For j <= k and m <= n, let D (j, m) denote the cost of the optimal solution of j-medians for x_1 ... x_m, where x_i is assumed in sorted order. We have recurrence
D(j,m) = min (D(j-1,q) + Cost(x_{q+1},...,x_m)
where q varies from j-1 to m-1 and Cost is equal to the sum of the absolute distances from the median. With a naive implementation of O (n) Cost this would provide an O (n ^ 3k) DP solution for the entire problem. However, this can be improved to O (n ^ 2k) due to the fact that the cost can be updated in constant time, and not be calculated from scratch every time, using the fact that for an ordered sequence:
Cost(x_1,...,x_h) = Cost(x_2,...,x_h) + median(x_1...x_h)-x_1 if h is odd Cost(x_1,...,x_h) = Cost(x_2,...,x_h) + median(x_2...x_h)-x_1 if h is even
See the entry for more details. Except for the fact that updating the Cost function is different, the implementation will be the same for k-medians, as for k-means. http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf