Mathematica: Conditional List Operations

I would like to average the values in one column over groups of rows, where a group is the set of rows that share the same value in another column.

For instance:

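e = Transpose[{{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2}, {69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6, 24, 65, 76, 100}}]

(The two inner lists are the two columns; Transpose turns them into the {key, value} rows that the answers below work on.)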


I would like to average all the values in the second column whose rows have the same value in the first column.

Here, that means one average for the rows with Col 1 = 1 and another for the rows with Col 1 = 2.

Then I would like to create a third column holding the result of this operation, so the value in that column should be the same for the first 10 rows, and the same again for the next 10.
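As a quick check of the expected result, here is a minimal sketch that computes one mean per key for the sample data. It assumes e is stored as {key, value} rows, which is the form the answers below work on; the name groupMeans is hypothetical.

```mathematica
(* Sample data as {key, value} rows (the two columns of the question, transposed) *)
e = Transpose[{
    {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2},
    {69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6, 24, 65, 76, 100}}];

(* GatherBy collects rows with equal keys; Mean of integers stays exact *)
groupMeans = {#[[1, 1]], Mean[#[[All, 2]]]} & /@ GatherBy[e, First]
(* {{1, 101/2}, {2, 119/2}} *)
```

So the third column should hold 101/2 for the first 10 rows and 119/2 for the next 10.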

Thanks so much for any help you could provide!

LA

Desired output: the original two columns plus a third column containing the mean of each row's group.

5 answers

An interesting problem. This is the first thing that occurred to me:

 e[[All, {1}]] /. Reap[Sow[#2, #] & @@@ e, _, # -> Mean@#2 &][[2]];
 ArrayFlatten[{{e, %}}] // TableForm

To round the means, simply prefix Mean with Round@ in the code above, i.e. use Round@Mean@#2.

Here is a slightly faster method, though I prefer the Sow/Reap version above:

 #[[1, 1]] -> Round@Mean@#[[All, 2]] & /@ GatherBy[e, First];
 ArrayFlatten[{{e, e[[All, {1}]] /. %}}] // TableForm

If the first column has many distinct elements, any of the above solutions can be made faster by applying Dispatch to the list of rules before replacing (/.). This tells Mathematica to create and use an optimized internal representation of the rule list.
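A minimal sketch of the Dispatch variant of the Sow/Reap solution, on a small hypothetical data set (the names rules and means are illustrative only):

```mathematica
(* Hypothetical {key, value} rows with keys in no particular order *)
e = {{1, 10}, {2, 4}, {1, 20}, {2, 6}, {3, 7}};

(* Sow each value tagged by its key; Reap turns each tag into key -> mean *)
rules = Reap[Sow[#2, #] & @@@ e, _, # -> Mean@#2 &][[2]];
(* {1 -> 15, 2 -> 5, 3 -> 7} *)

(* Dispatch builds an optimized dispatch table from the rules before /. *)
means = e[[All, {1}]] /. Dispatch[rules];

ArrayFlatten[{{e, means}}]
(* {{1, 10, 15}, {2, 4, 5}, {1, 20, 15}, {2, 6, 5}, {3, 7, 7}} *)
```

With only a handful of keys the gain is negligible; Dispatch pays off when the rule list is long.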

Here is a slower option that I still want to share:

 Module[{q},
  Reap[{#, Sow[#2, #], q@#} & @@@ e, _, (q@# = Mean@#2) &][[1]]
 ]

Also, some general tips: you can replace

Table[RandomInteger[{1, 100}], {20}] with RandomInteger[{1, 100}, 20]

and Join[{c}, {d}] // Transpose with Transpose[{c, d}].
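A small sketch of both tips, with hypothetical vectors c and d standing in for the question's original code:

```mathematica
(* Hypothetical column vectors *)
c = {1, 2, 3};
d = {4, 5, 6};

(* Both forms pair up c and d row by row *)
Join[{c}, {d}] // Transpose   (* {{1, 4}, {2, 5}, {3, 6}} *)
Transpose[{c, d}]             (* {{1, 4}, {2, 5}, {3, 6}} *)

(* RandomInteger takes the length directly, no Table loop needed *)
Length[RandomInteger[{1, 100}, 20]]   (* 20 *)
```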


What the heck, I'll join the party. Here is my version:

 Flatten/@Flatten[Thread/@ Transpose@ {#,Mean/@#[[All,All,2]]}&@GatherBy[e,First],1] 

It should be fast enough, I think.

EDIT

In response to @Mr.Wizard's criticism (my first solution reorders the list), and to explore the high-performance angle of the problem a little, here are two alternative solutions:

 getMeans[e_] :=
  Module[{temp = ConstantArray[0, Max[#[[All, 1, 1]]]]},
     temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]];
     List /@ temp[[e[[All, 1]]]]] &[GatherBy[e, First]];
 getMeansSparse[e_] :=
  Module[{temp = SparseArray[{Max[#[[All, 1, 1]]] -> 0}]},
     temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]];
     List /@ Normal@temp[[e[[All, 1]]]]] &[GatherBy[e, First]];

The first is the fastest; it trades memory for speed and can be used when the keys are positive integers and the maximum key (2 in your example) is not too large. The second does not have the last limitation, but is slower. Here is a large list of pairs for testing:

 In[303]:= tst = RandomSample[#, Length[#]] &@
    Flatten[Map[Thread[{#, RandomInteger[{1, 100}, 300]}] &,
      RandomSample[Range[1000], 500]], 1];
 In[310]:= Length[tst]
 Out[310]= 150000
 In[311]:= tst[[;; 10]]
 Out[311]= {{947, 52}, {597, 81}, {508, 20}, {891, 81}, {414, 47}, {849, 45}, {659, 69}, {841, 29}, {700, 98}, {858, 35}}

The keys here are 500 distinct values drawn from 1 to 1000, with 300 random numbers for each key. Now a few timings:

 In[314]:= (res0 = getMeans[tst]); // Timing
 Out[314]= {0.109, Null}
 In[317]:= (res1 = getMeansSparse[tst]); // Timing
 Out[317]= {0.219, Null}
 In[318]:= (res2 = tst[[All, {1}]] /. Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@#2 &][[2]]); // Timing
 Out[318]= {5.687, Null}
 In[319]:= (res3 = tst[[All, {1}]] /. Dispatch[Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@#2 &][[2]]]); // Timing
 Out[319]= {0.391, Null}
 In[320]:= res0 === res1 === res2 === res3
 Out[320]= True

We see that getMeans is the fastest here, getMeansSparse is second, and @Mr.Wizard's solution is somewhat slower (0.39 s versus 0.11 s), but only when we use Dispatch; without it, it is much slower (5.7 s). My solution and @Mr.Wizard's (with Dispatch) are similar in spirit; the difference in speed comes from array indexing (even into a sparse array) being more efficient than hash lookup. Of course, all this matters only when your list is really big.
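To make the array-indexing idea behind getMeans concrete, here is a sketch on a tiny hypothetical data set with positive integer keys (groups, keys and table are illustrative names). Each key is used directly as a position in a lookup table, so fetching the mean for every row is a single Part call instead of a rule lookup:

```mathematica
(* Hypothetical data; keys must be positive integers to serve as positions *)
e = {{2, 4}, {1, 10}, {2, 6}, {1, 20}};

groups = GatherBy[e, First];   (* {{{2, 4}, {2, 6}}, {{1, 10}, {1, 20}}} *)
keys = groups[[All, 1, 1]];    (* {2, 1} *)

(* Lookup table indexed by key; slots for unused keys stay 0 *)
table = ConstantArray[0, Max[keys]];
table[[keys]] = Mean /@ groups[[All, All, 2]];
table                          (* {15, 5} *)

(* One Part call returns the mean for every row, in the original order *)
table[[e[[All, 1]]]]           (* {5, 15, 5, 15} *)
```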

EDIT 2

Here is a version of getMeans that uses Compile with C as the compilation target and returns machine reals (rather than rationals). It is about twice as fast as getMeans, and the fastest of my solutions.

 getMeansComp = Compile[{{e, _Integer, 2}},
   Module[{keys = e[[All, 1]], values = e[[All, 2]],
     sums = {0.}, lengths = {0}, i = 1, means = {0.},
     max = 0, key = -1, len = Length[e]},
    max = Max[keys];
    sums = Table[0., {max}];
    lengths = Table[0, {max}];
    means = sums;
    (* accumulate the sum and the count for each key *)
    Do[
     key = keys[[i]];
     sums[[key]] += values[[i]];
     lengths[[key]]++,
     {i, len}];
    means = sums/(lengths + (1 - Unitize[lengths]));
    means[[keys]]],
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
 getMeansC[e_] := List /@ getMeansComp[e];

The term 1 - Unitize[lengths] protects against division by zero for unused keys. We need each number in a separate sublist (to match the output of the other solutions), so call getMeansC rather than getMeansComp directly. Here are a few measurements:

 In[180]:= (res1 = getMeans[tst]); // Timing
 Out[180]= {0.11, Null}
 In[181]:= (res2 = getMeansC[tst]); // Timing
 Out[181]= {0.062, Null}
 In[182]:= N@res1 == res2
 Out[182]= True

This can probably be considered a highly optimized numerical solution. The fact that @Mr.Wizard's completely general, concise and beautiful solution is only about 6-8 times slower speaks very well for it, so unless you need to squeeze every microsecond out of this, I would stick with @Mr.Wizard's solution (with Dispatch). Still, it is important to know how to optimize such code, and to what extent it can be optimized (what speedup you can expect).
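The 1 - Unitize[lengths] guard used in getMeansComp can be seen on a tiny hypothetical vector of counts and sums, where key 2 never occurs:

```mathematica
(* Hypothetical per-key counts and sums; key 2 never occurs *)
lengths = {3, 0, 2};
sums = {12., 0., 9.};

(* 1 - Unitize[...] is 1 exactly where a count is 0, and 0 elsewhere *)
1 - Unitize[lengths]
(* {0, 1, 0} *)

(* Zero denominators become 1, so unused keys get a mean of 0. instead of an error *)
means = sums/(lengths + (1 - Unitize[lengths]))
(* {4., 0., 4.5} *)
```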


A naive approach could be:

 Table[
   Join[i, {Select[Mean /@ SplitBy[e, First], First@# == First@i &][[1, 2]]}],
   {i, e}] // TableForm
 (* output for a randomly generated e:
    1  59  297/5
    1  72  297/5
    1  90  297/5
    1  63  297/5
    1  77  297/5
    1  98  297/5
    1   3  297/5
    1  99  297/5
    1  28  297/5
    1   5  297/5
    2  87  127/2
    2  80  127/2
    2  29  127/2
    2  70  127/2
    2  83  127/2
    2  75  127/2
    2  68  127/2
    2  65  127/2
    2   1  127/2
    2  77  127/2
 *)

You can also create your original list using, for example:

 e = Array[{Ceiling[#/10], RandomInteger[{1, 100}]} &, {20}] 

Edit

Reply to @Mr. comments

If the list is not sorted by its first element, you can do:

 Table[Join[ i, {Select[ Mean /@ SplitBy[SortBy[e, First], First], First@ # == First@i &][[1,2]]}], {i, e}] //TableForm 

But this is not necessary in your example.
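SplitBy only splits a list into runs of consecutive elements with equal keys, which is why the list has to be sorted first; GatherBy collects equal keys regardless of position. A sketch on a tiny hypothetical unsorted list:

```mathematica
(* Hypothetical unsorted data: key 1 appears in two separate runs *)
e = {{1, 10}, {2, 4}, {1, 20}};

SplitBy[e, First]                   (* {{{1, 10}}, {{2, 4}}, {{1, 20}}}: runs only *)
SplitBy[SortBy[e, First], First]    (* {{{1, 10}, {1, 20}}, {{2, 4}}} *)
GatherBy[e, First]                  (* {{{1, 10}, {1, 20}}, {{2, 4}}} *)
```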


Why not join in?

I think this is the simplest and easiest-to-read answer, although not the fastest. It is really amazing how many ways there are to solve a problem like this in Mathematica.

Mr. Wizard is obviously very cool, as others have noted.

@Nasser, your solution does not generalize to n classes, although it could easily be changed to do so.

 meanbygroup[table_] := Join @@ Table[
    Module[{sublistmean},
     sublistmean = Mean[sublist[[All, 2]]];
     Table[Append[item, sublistmean], {item, sublist}]],
    {sublist, GatherBy[table, #[[1]] &]}]
 (* On this dataset: *)
 meanbygroup[e]

Wow, the answers here are so advanced and cool that I will need more time to learn from them.

Here is my answer. I'm still a recovering matrix/vector/Matlab-ish guy in transition, so my solution is not as functional as the experts' solutions here. I look at the data as matrices and vectors (easier for me than thinking of them as lists of lists, etc.), so here it is:


 sizeOfList = 10; (* given from the problem, along with the e list *)
 m1 = Mean[e[[1 ;; sizeOfList, 2]]];
 m2 = Mean[e[[sizeOfList + 1 ;; 2 sizeOfList, 2]]];
 (* columns: key, value, group mean *)
 r = {e[[All, 1]], e[[All, 2]],
     Flatten[{Table[m1, {sizeOfList}], Table[m2, {sizeOfList}]}]} // Transpose;
 MatrixForm[r]

Clearly, this is not as good a solution as the functional ones.
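The two-mean version above hard-codes two groups. Assuming the rows still come in consecutive blocks of equal size sizeOfList (an assumption the GatherBy-based answers do not need), a sketch that handles any number of blocks might look like this; blockMeans and third are hypothetical names:

```mathematica
(* Hypothetical data: 3 consecutive blocks of 2 rows each *)
e = {{1, 2}, {1, 4}, {2, 10}, {2, 20}, {3, 5}, {3, 7}};
sizeOfList = 2;

(* One mean per block of sizeOfList consecutive values *)
blockMeans = Mean /@ Partition[e[[All, 2]], sizeOfList];   (* {3, 15, 6} *)

(* Repeat each mean sizeOfList times to build the third column *)
third = Flatten[ConstantArray[#, sizeOfList] & /@ blockMeans];

r = Transpose[{e[[All, 1]], e[[All, 2]], third}];
MatrixForm[r]
```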

Ok, now I will go and get rid of functional programmers :)

- Nasser

