Best way to count unique items?

Question


I have a very long DataArray of rows, and I would like to create a DataFrame in which one column holds each unique row and the second holds its number of occurrences. Right now I'm doing something like this:

    using DataFrames
    df = DataFrame()
    df[:B] = ["a", "c", "c", "D", "E"]
    uniqueB = unique(df[:B])
    println(uniqueB)
    howMany = zeros(size(uniqueB))
    for i = 1:size(uniqueB, 1)
        howMany[i] = count(j -> (j == uniqueB[i]), df[:B])
    end
    answer = DataFrame()
    answer[:Letters] = uniqueB
    answer[:howMany] = howMany
    answer

But it seems there should be a much simpler way to do this, possibly a one-liner. (I know I could make it somewhat faster with a bit more code, by updating the tally for each element as I go rather than rescanning the source for every unique value.) A possibly related question is here, but the histogram function doesn't seem to be overloaded for non-numeric bins. Any thoughts?
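The single-pass idea from the parenthetical can be sketched in plain Julia (1.x assumed) with a `Dict`: each element bumps its own tally once, so the data is scanned only one time instead of once per unique value. `count_unique` is a hypothetical helper name, not a library function.

```julia
# Hypothetical helper (not from the question): count each distinct value
# in a single pass over the data, instead of rescanning the whole vector
# once per unique value as the loop in the question does.
function count_unique(v::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()
    for x in v
        counts[x] = get(counts, x, 0) + 1   # bump the tally for x (0 if unseen)
    end
    return counts
end

letters = ["a", "c", "c", "D", "E"]
counts = count_unique(letters)
println(counts["c"])   # prints 2
```

The result is a `Dict`, so key order is not guaranteed; sort the keys (or use `unique(letters)` as the key order) if the output must be ordered.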


julia-lang

ARM Apr 2 '15 at 0:20


1 answer

DSM · Accepted Answer · 2015-04-02T00:32:18+0000

If you want a full DataFrame, you can group by B and call nrow on each group:

    julia> by(df, :B, nrow)
    4x2 DataFrames.DataFrame
    | Row | B   | x1 |
    |-----|-----|----|
    | 1   | "D" | 1  |
    | 2   | "E" | 1  |
    | 3   | "a" | 1  |
    | 4   | "c" | 2  |
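`by` and `df[:B]`-style column indexing were deprecated and later removed from DataFrames.jl. A rough equivalent for DataFrames.jl 1.x is sketched here, assuming that version is installed. `sort = true` is passed so the groups come out in the same order as the output above, and the count column is named `howMany` to match the question (the name is arbitrary).

```julia
# Assumes DataFrames.jl 1.x, where `by(df, :B, nrow)` is no longer available.
using DataFrames

df = DataFrame(B = ["a", "c", "c", "D", "E"])

# Split on column :B (groups sorted by value), then count the rows in each
# group; `nrow => :howMany` names the resulting count column.
answer = combine(groupby(df, :B; sort = true), nrow => :howMany)
println(answer)
```

Without `sort = true`, DataFrames.jl does not promise any particular group order.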

Even outside the context of a DataFrame, you can use DataStructures.counter rather than rolling your own:

    julia> using DataStructures

    julia> counter(df[:B])
    DataStructures.Accumulator{ASCIIString,Int32}(Dict("D"=>1,"a"=>1,"c"=>2,"E"=>1))
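To turn the counter into the two-column DataFrame the question asked for, something like the following should work with current package versions. This is a sketch assuming DataFrames.jl 1.x and DataStructures.jl are installed; `df.B` replaces the old `df[:B]` syntax.

```julia
# Assumes DataFrames.jl 1.x and DataStructures.jl are installed.
using DataFrames, DataStructures

df = DataFrame(B = ["a", "c", "c", "D", "E"])
c = counter(df.B)                 # Accumulator mapping value => count

# `unique` keeps the order in which values first appear, and indexing an
# Accumulator with a key returns that key's count.
letters = unique(df.B)
answer = DataFrame(Letters = letters, howMany = [c[k] for k in letters])
println(answer)
```

Using `unique(df.B)` for the row order keeps the result deterministic, since iterating the counter itself follows the order of its underlying `Dict`.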

