Best way to count unique items?

I have a very long DataArray of rows, and I would like to create a DataFrame in which one column holds all the unique rows and the second holds the number of occurrences of each. Right now I'm doing something like this:

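    # DataFrames.jl syntax of the time; DataFrames.jl 1.x uses df.B instead of df[:B]
    using DataFrames

    df = DataFrame()
    df[:B] = ["a", "c", "c", "D", "E"]

    uniqueB = unique(df[:B])
    println(uniqueB)

    # one Int count per unique value (zeros(size(...)) would store Float64)
    howMany = zeros(Int, length(uniqueB))
    for i = 1:length(uniqueB)
        # rescans the whole column once for every unique value
        howMany[i] = count(j -> j == uniqueB[i], df[:B])
    end

    answer = DataFrame()
    answer[:Letters] = uniqueB
    answer[:howMany] = howMany
    answer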

But it seems there should be a much simpler way to do this, possibly a one-liner. (I know I could make it somewhat faster with a bit more code, by updating the counts at each iteration instead of rescanning the source for each unique value.) A related question may be here, but the histogram function doesn't seem to be overloaded for non-numeric bins. Any thoughts?
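The faster approach mentioned in the parenthetical, updating a tally as each element is seen instead of rescanning the column once per unique value, can be sketched in plain Julia with a Dict. This is a minimal sketch assuming Julia 1.x and no packages; count_occurrences is a hypothetical helper name, not a library function.

```julia
# Hypothetical helper: single-pass counting with a plain Dict (Base Julia only).
# Each element is visited once; the Dict maps each distinct value to its count.
function count_occurrences(xs::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()
    for x in xs
        # get returns 0 the first time a value is seen
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

letters = ["a", "c", "c", "D", "E"]
counts = count_occurrences(letters)
println(counts)  # Dict iteration order is not guaranteed
```

Because every element is looked at once, the work grows with the length of the input rather than with input length times the number of unique values.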

julia-lang
Answer

If you need the result as a full DataFrame, you can group by B and call nrow on each group:

    julia> by(df, :B, nrow)
    4x2 DataFrames.DataFrame
    | Row | B   | x1 |
    |-----|-----|----|
    | 1   | "D" | 1  |
    | 2   | "E" | 1  |
    | 3   | "a" | 1  |
    | 4   | "c" | 2  |
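The by function belongs to the DataFrames.jl API of that era; in DataFrames.jl 1.x it no longer exists, and the same grouping is usually written with groupby plus combine. A sketch assuming DataFrames.jl 1.x, with howMany chosen only to match the column name in the question:

```julia
using DataFrames

# DataFrames.jl 1.x: columns can be passed as keyword arguments
df = DataFrame(B = ["a", "c", "c", "D", "E"])

# groupby splits the rows by the values of column :B; combine applies nrow
# to each group, and `nrow => :howMany` names the resulting count column
answer = combine(groupby(df, :B), nrow => :howMany)

# sort by the letters so the row order does not depend on grouping internals
sort!(answer, :B)
println(answer)
```

Passing plain nrow (without `=> :howMany`) also works; the count column is then named nrow rather than x1.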

Even outside the context of a DataFrame, you can always use DataStructures.counter rather than rolling your own:

    julia> using DataStructures

    julia> counter(df[:B])
    DataStructures.Accumulator{ASCIIString,Int32}(Dict("D"=>1,"a"=>1,"c"=>2,"E"=>1))
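To get from the counter back to the two-column DataFrame the question asks for, one option is to look up each unique value in the Accumulator. A sketch assuming current DataStructures.jl and DataFrames.jl 1.x (where the column is df.B rather than df[:B]); the column names Letters and howMany simply mirror the question:

```julia
using DataStructures, DataFrames

df = DataFrame(B = ["a", "c", "c", "D", "E"])

# counter tallies how many times each distinct value occurs (one pass)
c = counter(df.B)

# unique keeps first-appearance order, like uniqueB in the question;
# indexing the Accumulator returns the count for that value
letters = unique(df.B)
answer = DataFrame(Letters = letters, howMany = [c[l] for l in letters])
println(answer)
```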