What is a “symbol” in Julia?

Specifically: I am trying to use Julia's DataFrames package, in particular the readtable() function with the names option, but that requires a vector of symbols.

  • What is a symbol?
  • Why would they choose that over a vector of strings?

So far, I have found only a handful of references to the word "symbol" in the Julia language. It seems that symbols are written as ":var", but it is far from clear to me what they are.

Aside: I can run

df = readtable( "table.txt", names = [symbol("var1"), symbol("var2")] ) 

My two bulleted questions still stand.

May 05 '14 at 19:58
1 answer

Symbols in Julia are the same as in Lisp, Scheme, or Ruby. However, the answers to those related questions are not really satisfactory, in my opinion. If you read those answers, it seems that the reason a symbol differs from a string is that strings are mutable while symbols are immutable, and that symbols are also "interned", whatever that means. Strings do happen to be mutable in Ruby and Lisp, but they aren't in Julia, so that difference is actually a red herring. The fact that symbols are interned, i.e. hashed by the language implementation for fast equality comparisons, is also an irrelevant implementation detail. You could have an implementation that doesn't intern symbols, and the language would be exactly the same.

So what is a symbol, really? The answer lies in something that Julia and Lisp have in common: the ability to represent the language's code as a data structure in the language itself. Some people call this "homoiconicity" (see Wikipedia), but others don't seem to think that this alone is sufficient for a language to be homoiconic. The terminology doesn't really matter, though. The point is that when a language can represent its own code, it needs a way to represent things like assignments, function calls, things that can be written as literal values, and so on. It also needs a way to represent its own variables. That is, you need a way to represent, as data, the foo on the left-hand side of this:

 foo == "foo" 

Now we get to the heart of the matter: the difference between a symbol and a string is the difference between foo on the left-hand side of that comparison and "foo" on the right-hand side. On the left, foo is an identifier, and it evaluates to the value bound to the variable foo in the current scope. On the right, "foo" is a string literal, and it evaluates to the string value "foo". A symbol, in both Lisp and Julia, is how you represent a variable as data. A string just represents itself. You can see the difference by applying eval to them:

 julia> eval(:foo)
 ERROR: foo not defined

 julia> foo = "hello"
 "hello"

 julia> eval(:foo)
 "hello"

 julia> eval("foo")
 "foo"

What the symbol :foo evaluates to depends on what, if anything, the variable foo is bound to, whereas "foo" always just evaluates to "foo". If you want to construct expressions in Julia that use variables, then you are using symbols (whether you know it or not). For example:

 julia> ex = :(foo = "bar")
 :(foo = "bar")

 julia> dump(ex)
 Expr
   head: Symbol =
   args: Array{Any}((2,))
     1: Symbol foo
     2: String "bar"
   typ: Any

What that dumped output shows, among other things, is that there is a :foo symbol object inside the expression object you get by quoting the code foo = "bar". Here is another example, constructing an expression with the symbol :foo stored in the variable sym:

 julia> sym = :foo
 :foo

 julia> eval(sym)
 "hello"

 julia> ex = :($sym = "bar"; 1 + 2)
 :(begin
         foo = "bar"
         1 + 2
     end)

 julia> eval(ex)
 3

 julia> foo
 "bar"

If you try to do this when sym is bound to the string "foo", it won't work:

 julia> sym = "foo"
 "foo"

 julia> ex = :($sym = "bar"; 1 + 2)
 :(begin
         "foo" = "bar"
         1 + 2
     end)

 julia> eval(ex)
 ERROR: syntax: invalid assignment location ""foo""

It's pretty clear why this doesn't work: if you tried to assign "foo" = "bar" by hand, that wouldn't work either.
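As a complement to the quoting examples above, here is a minimal sketch that builds the same assignment by hand with the Expr constructor rather than with quote syntax. It assumes a recent Julia version, where Symbol("foo") is the spelling of what the question writes as symbol("foo"); the variable names are only illustrative.

```julia
# Build the symbol from a string at run time; this yields the same symbol as :foo.
sym = Symbol("foo")

# Construct the assignment `foo = "bar"` directly as an expression object.
# The head :(=) marks an assignment; the arguments are the target and the value.
ex = Expr(:(=), sym, "bar")

# Evaluating the expression assigns the global variable foo.
eval(ex)
println(foo)
```

The assignment target has to be the symbol; passing the string "foo" in its place would reproduce the invalid assignment shown above.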

This is the essence of a symbol: a symbol is used to represent a variable in metaprogramming. Once you have symbols as a data type, of course, it becomes tempting to use them for other things, such as hash keys. But that is an incidental, opportunistic use of a data type that has a different primary purpose.
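For illustration, the incidental hash-key use looks like this in Julia. The dictionary contents are made up for the example.

```julia
# Symbols used as dictionary keys; nothing here involves metaprogramming.
ages = Dict(:alice => 31, :bob => 27)   # hypothetical data

# Insert and look up entries by symbol.
ages[:carol] = 45
println(ages[:alice])
println(haskey(ages, :dave))
```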

Note that I stopped talking about Ruby a while ago. That is because Ruby is not homoiconic: Ruby does not represent its expressions as Ruby objects. So Ruby's symbol type is a kind of vestigial organ, a leftover adaptation inherited from Lisp but no longer used for its original purpose. Ruby symbols have been co-opted for other purposes, such as hash keys and pulling methods out of method tables, but symbols in Ruby are not used to represent variables.

As for why symbols are used in DataFrames rather than strings: a common pattern in DataFrames is to bind column values to variables inside user-provided expressions. So it is natural for column names to be symbols, since symbols are exactly what you use to represent variables as data. Currently, you have to write df[:foo] to access the foo column, but in the future you may be able to access it as df.foo instead. When that becomes possible, only columns whose names are valid identifiers will be accessible with this convenient syntax.
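The pattern of binding columns to variables can be sketched without DataFrames at all. The example below is an assumption-laden stand-in, not how DataFrames is implemented: a plain Dict plays the role of the table, and with_columns is a hypothetical helper that wraps a user-provided expression in a let block with one binding per column.

```julia
# Hypothetical stand-in for a data frame: column names (symbols) mapped to vectors.
columns = Dict(:x => [1, 2, 3], :y => [10, 20, 30])

# Hypothetical helper: evaluate `ex` with each column bound to a local variable
# named after the column. Because the names are symbols, they can be used
# directly as assignment targets in the generated code.
function with_columns(cols::Dict{Symbol,<:AbstractVector}, ex)
    bindings = [Expr(:(=), name, data) for (name, data) in cols]
    # Equivalent to: let x = ..., y = ...; <ex> end
    return eval(Expr(:let, Expr(:block, bindings...), ex))
end

result = with_columns(columns, :(x .+ y))
println(result)
```

Here result is the elementwise sum [11, 22, 33]. With string column names, the helper would first have to convert each name to a symbol before it could generate the bindings.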

May 05 '14 at 21:30