What the hell am I joining the party. Here is my version:
Flatten/@Flatten[Thread/@ Transpose@ {
It should be fast enough, I think.
EDIT
In response to the criticism @ Mr.Wizard (my first decision was to reorder the list) and to study a little the high-performance angle of the problem, here are two alternative solutions:
getMeans[e_] := Module[{temp = ConstantArray[0, Max[#[[All, 1, 1]]]]}, temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]]; List /@ temp[[e[[All, 1]]]]] &[GatherBy[e, First]]; getMeansSparse[e_] := Module[{temp = SparseArray[{Max[#[[All, 1, 1]]] -> 0}]}, temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]]; List /@ Normal@temp [[e[[All, 1]]]]] &[GatherBy[e, First]];
The first is the fastest trading memory for speed and can be used when the keys are integers, and the maximum value of the "key" (2 in your example) is not too large. The second solution does not contain the last limitation, but slower. Here is a great list of pairs:
In[303]:= tst = RandomSample[
The keys here can be from 1 to 1000, of which 500, and for each key - 300 random numbers. Now a few tests:
In[314]:= (res0 = getMeans[tst]); // Timing Out[314]= {0.109, Null} In[317]:= (res1 = getMeansSparse[tst]); // Timing Out[317]= {0.219, Null} In[318]:= (res2 = tst[[All, {1}]] /. Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@ #2 &][[2]]); // Timing Out[318]= {5.687, Null} In[319]:= (res3 = tst[[All, {1}]] /. Dispatch[ Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@ #2 &][[2]]]); // Timing Out[319]= {0.391, Null} In[320]:= res0 === res1 === res2 === res3 Out[320]= True
We see that getMeans is the fastest here, getMeansSparse second fastest, and @ Mr.Wizard's solution is slightly slower, but only when we use Dispatch , otherwise it is much slower. Mine and @ Mr.Wizard's solutions (with Dispatch) are similar in spirit, the difference in speed is due to the fact that array indexing (more rare) is more efficient than hash search. Of course, all this matters only when your list is really big.
EDIT 2
Here is a version of getMeans that uses Compile with a C object and returns numeric values โโ(rather than rational ones). This is about twice as fast as getMeans , and the fastest of my solutions.
getMeansComp = Compile[{{e, _Integer, 2}}, Module[{keys = e[[All, 1]], values = e[[All, 2]], sums = {0.} , lengths = {0}, , i = 1, means = {0.} , max = 0, key = -1 , len = Length[e]}, max = Max[keys]; sums = Table[0., {max}]; lengths = Table[0, {max}]; means = sums; Do[key = keys[[i]]; sums[[key]] += values[[i]]; lengths[[key]]++, {i, len}]; means = sums/(lengths + (1 - Unitize[lengths])); means[[keys]]], CompilationTarget -> "C", RuntimeOptions -> "Speed"] getMeansC[e_] := List /@ getMeansComp[e];
Code 1 - Unitize[lengths] protects against division by zero for unused keys. We need each number in a separate sublist, so we should call getMeansC , not getMeansComp directly. Here are a few measurements:
In[180]:= (res1 = getMeans[tst]); // Timing Out[180]= {0.11, Null} In[181]:= (res2 = getMeansC[tst]); // Timing Out[181]= {0.062, Null} In[182]:= N@res1 == res2 Out[182]= True
This can probably be considered as a highly optimized numerical solution. The fact that @ Mr.Wizardโs completely general, concise and beautiful solution is about 6-8 times slower speaks very well for the last general short solution, so if you donโt want to squeeze every microsecond out of it, I would stick with @Mr. Wizard alone (with Dispatch ). But itโs important to know how to optimize the code, and also to what extent it can be optimized (what you can expect).