Vectorized "in" function in Julia?

I often want to loop over a long array, or a column in a DataFrame, and for each element check whether it is a member of another array. Instead of doing

    giant_list = ["a", "c", "j"]
    good_letters = ["a", "b"]
    isin = falses(size(giant_list,1))
    for i=1:size(giant_list,1)
        isin[i] = giant_list[i] in good_letters
    end

Is there a vectorized (doubly vectorized?) way to do this in Julia? By analogy with the basic operators, I want to do something like

 isin = giant_list .in good_letters 

I understand that this may not be possible, but I just wanted to make sure I was not missing something. I know I could probably use DefaultDict from DataStructures to do something similar, but I don't know of anything in Base.

3 answers

The indexin function does something similar to what you want:

indexin(a, b)

Returns a vector containing the highest index in b for each value in a that is a member of b. The output vector contains 0 wherever a is not a member of b.

Since you want a boolean for each element in giant_list (instead of an index in good_letters), you can simply do:

    julia> indexin(giant_list, good_letters) .> 0
    3-element BitArray{1}:
      true
     false
     false
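On more recent Julia releases (assumed here: 1.1 or later, which is newer than what this answer targets), indexin returns nothing rather than 0 for elements that are not found, so the comparison with 0 no longer applies. A minimal sketch of the same idea under that assumption:

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# Assumption: Julia 1.1+, where indexin yields `nothing` for every value of
# giant_list that does not occur in good_letters.
idx = indexin(giant_list, good_letters)

# Turn the indices into a Boolean mask: true wherever an index was found.
isin = map(!isnothing, idx)
```

The mask can then be used directly for indexing, e.g. giant_list[isin].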

The implementation of indexin is very simple, and it points the way to how you can optimize this if you don't care about the indices in b:

    function vectorin(a, b)
        bset = Set(b)
        [i in bset for i in a]
    end

Only a limited set of names can be used as infix operators, so vectorin cannot be used as an infix operator.
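As a rough usage sketch of the Set-based approach, applied to the arrays from the question (the names isin and kept are only illustrative):

```julia
# Set-based membership test: build the lookup set once, then do one
# hashed lookup per element of a instead of scanning b each time.
function vectorin(a, b)
    bset = Set(b)
    return [x in bset for x in a]
end

giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

isin = vectorin(giant_list, good_letters)  # Boolean mask over giant_list
kept = giant_list[isin]                    # elements that are in good_letters
```

Building the Set costs one pass over b, which is likely to pay off when a is long and b is not tiny.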


You can easily vectorize in in Julia v0.6 using the broadcasting (dot) syntax.

    julia> in.(giant_list, (good_letters,))
    3-element Array{Bool,1}:
      true
     false
     false

Note how good_letters is turned into a scalar by wrapping it in a one-element tuple, so that broadcasting does not iterate over it. Alternatively, you can use a Scalar type, such as the one introduced in StaticArrays.jl.

Julia v0.5 supports the same syntax, but requires a specialized function for scalarization (or the Scalar type mentioned earlier):

 scalar(x) = setindex!(Array{typeof(x)}(), x) 

after which

    julia> in.(giant_list, scalar(good_letters))
    3-element Array{Bool,1}:
      true
     false
     false
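For readers on Julia 1.x (an assumption; this answer was written for v0.5 and v0.6), the usual way to shield an argument from broadcasting is Ref. A short sketch of what that might look like, including a dotted infix form close to what the question asked for:

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# Ref(...) makes broadcasting treat good_letters as a single value
# instead of iterating over its elements.
isin = in.(giant_list, Ref(good_letters))

# Equivalent infix spelling using the dotted ∈ operator.
isin_infix = giant_list .∈ Ref(good_letters)
```

Note that each element is still checked by a linear scan of good_letters; wrapping a Set in Ref instead may be preferable when good_letters is large.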

findin() does not give you a logical mask, but you can easily use it to subset an array / DataFrame for the values contained in another array:

    julia> giant_list[findin(giant_list, good_letters)]
    1-element Array{String,1}:
     "a"
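findin is no longer available as of Julia 1.0. Assuming a 1.x release, a comparable subset can be sketched with findall and the one-argument form of in (the variable names here are illustrative):

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# in(good_letters) returns a predicate x -> x in good_letters.
inds = findall(in(good_letters), giant_list)  # indices of matching elements
subset = giant_list[inds]

# When only the values are needed, filter skips the index step.
filtered = filter(in(good_letters), giant_list)
```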