Here is one way:
DeleteDuplicates[list, First@#1 == First@#2 &]
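For example, on a small illustrative list of pairs (my own example, not from the benchmarks below), this keeps the first pair for each distinct first element:

list = {{1, "a"}, {2, "b"}, {1, "c"}, {3, "d"}, {2, "e"}};
DeleteDuplicates[list, First@#1 == First@#2 &]
(* {{1, "a"}, {2, "b"}, {3, "d"}} *)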
EDIT
Please note that the timings and discussion below are based on Mathematica 7 (M7).
After thinking a bit more, I found a solution that is (at least) an order of magnitude faster for large lists, and sometimes two orders of magnitude faster for this particular case (it is probably more accurate to say that the solution below has a different computational complexity):
Clear[delDupBy];
delDupBy[nested_List, n_Integer] :=
  Module[{parts = nested[[All, n]], ord, unpos},
    ord = Ordering[parts];
    unpos = Most@Accumulate@Prepend[Map[Length, Split@parts[[ord]]], 1];
    nested[[Sort@ord[[unpos]]]]]
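To see what each step computes, here is a trace on a small illustrative list (the list and the commented intermediate values are mine):

nested = {{3, "a"}, {1, "b"}, {3, "c"}, {2, "d"}, {1, "e"}};
parts = nested[[All, 1]];                  (* {3, 1, 3, 2, 1} *)
ord = Ordering[parts];                     (* {2, 5, 4, 1, 3}: positions that sort parts *)
Split[parts[[ord]]]                        (* {{1, 1}, {2}, {3, 3}}: runs of equal keys *)
Most@Accumulate@Prepend[Length /@ %, 1]    (* {1, 3, 4}: where each run starts *)
nested[[Sort@ord[[%]]]]                    (* {{3, "a"}, {1, "b"}, {2, "d"}} *)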
Benchmarks:
In[406]:= largeList = RandomInteger[{1, 15}, {50000, 2}];

In[407]:= delDupBy[largeList, 1] // Timing
Out[407]= {0.016, {{13,4},{12,1},{1,6},{6,13},{10,12},{7,15},{8,14},{14,4},{4,1},{11,9},{5,11},{15,4},{2,7},{3,2},{9,12}}}

In[408]:= DeleteDuplicates[largeList, First@#1 == First@#2 &] // Timing
This is especially noteworthy because DeleteDuplicates is a built-in function. My educated guess is that DeleteDuplicates with a user-supplied test uses pairwise comparisons (a quadratic-time algorithm), while delDupBy is n*log n in the size of the list.
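If you want to check this guess on your own machine, here is a minimal sketch (the list sizes are arbitrary; absolute timings will vary, only the growth rates matter):

(* Compare how each timing grows as the list size doubles *)
Table[
 With[{list = RandomInteger[{1, 15}, {size, 2}]},
  {size,
   First@AbsoluteTiming[delDupBy[list, 1];],
   First@AbsoluteTiming[DeleteDuplicates[list, First@#1 == First@#2 &];]}],
 {size, {5000, 10000, 20000}}]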
I think this is an important lesson: you should watch out for performance when supplying custom tests to built-in functions such as Union, Sort, DeleteDuplicates, etc. I discussed this in more detail in this MathGroup thread, where there are other insightful answers as well.
Finally, let me mention that essentially this question was asked before (with a focus on efficiency), here. I will reproduce the solution I gave for the case when the first (or, generally, n-th) elements are positive integers (the generalization to arbitrary integers is straightforward):
Clear[sparseArrayElements];
sparseArrayElements[HoldPattern[SparseArray[u___]]] := {u}[[4, 3]]

Clear[deleteDuplicatesBy];
Options[deleteDuplicatesBy] = {Ordered -> True, Threshold -> 1000000};
deleteDuplicatesBy[data_List, n_Integer, opts___?OptionQ] :=
  Module[{fdata = data[[All, n]], parr, rlen = Range[Length[data], 1, -1],
     preserveOrder = Ordered /. Flatten[{opts}] /. Options[deleteDuplicatesBy],
     threshold = Threshold /. Flatten[{opts}] /. Options[deleteDuplicatesBy], dim},
   dim = Max[fdata];
   parr = If[dim < threshold, Table[0, {dim}], SparseArray[{}, dim, 0]];
   parr[[fdata[[rlen]]]] = rlen;
   parr = sparseArrayElements@If[dim < threshold, SparseArray@parr, parr];
   data[[If[preserveOrder, Sort@parr, parr]]]]
The way it works is to use the first (or, generally, n-th) elements as positions in a huge array that we preallocate, exploiting the fact that they are positive integers. In some cases this can lead to dramatic speed-ups. Observe:
In[423]:= hugeList = RandomInteger[{1, 1000}, {500000, 2}];

In[424]:= delDupBy[hugeList, 1] // Short // Timing
Out[424]= {0.219, {{153,549},{887,328},{731,825},<<994>>,{986,150},{92,581},{988,147}}}

In[430]:= deleteDuplicatesBy[hugeList, 1] // Short // Timing
Out[430]= {0.032, {{153,549},{887,328},{731,825},<<994>>,{986,150},{92,581},{988,147}}}
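To make the position trick concrete, here is a small hand-checkable example (my own illustrative list, assuming the definitions above):

small = {{3, "a"}, {1, "b"}, {3, "c"}, {2, "d"}, {1, "e"}};
(* fdata = {3, 1, 3, 2, 1}; assigning rlen = {5, 4, 3, 2, 1} at positions
   fdata[[rlen]] makes later assignments (earlier list positions) win, so
   each key's slot ends up holding its first-occurrence index: parr = {2, 4, 1} *)
deleteDuplicatesBy[small, 1]
(* {{3, "a"}, {1, "b"}, {2, "d"}} *)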