Is the parse tree the same as the bytecode?

Question

What is the difference between bytecode and the parse tree, in particular the one used by Perl? Do they really refer to the same concept or are there differences?

I am familiar with the concept of bytecode from Python and Java, but as I read about Perl, I learned that it supposedly runs a parse tree (instead of bytecode) in its interpreter.

If there really is a difference, what are the reasons Perl doesn't use bytecode (or Python doesn't use parse trees)? Is it mostly historical, or are there differences between the languages that require a different compilation / execution model? Could Perl be implemented with a bytecode interpreter, with reasonable effort and execution performance?

+8

perl

lxgr May 02, '12 at 15:21

2 answers

A parse tree is the program's tokens stored in a structure that shows their nesting (which arguments belong to which function calls, which statements are inside which loops, and so on), while bytecode is the program's code converted to a binary notation for faster execution on a virtual machine. For example, suppose you had the following code in an imaginary language:

 loop i from 1 to 10 { print i }

The parse tree might look like this:

 loop
     variable i
     integer 1
     integer 10
     block
         print
             variable i

Meanwhile, the bytecode, shown in raw and symbolic form and compiled for a stack-oriented virtual machine, might look like this:

 0x01 0x01          PUSH 1
            START:
 0x02               DUP
 0x03               PRINT
 0x05               INCREMENT
 0x02               DUP
 0x01 0x0a          PUSH 10
 0x04               LESSTHAN
 0x06 0xf9          JUMPCOND START
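To make the listing concrete, here is a minimal sketch of a stack machine that runs those exact bytes. The opcode numbers come from the listing; their semantics are assumptions, since the answer does not define them: `PRINT` pops the value it prints, `LESSTHAN` pops two values and pushes the comparison result, and `JUMPCOND` pops a flag and takes a signed one-byte offset relative to its own address (which is what makes `0xf9`, i.e. -7, land on `START`).

```python
# Opcode numbers copied from the listing; the semantics below are assumed.
PUSH, DUP, PRINT, LESSTHAN, INCREMENT, JUMPCOND = 0x01, 0x02, 0x03, 0x04, 0x05, 0x06

PROGRAM = bytes([
    0x01, 0x01,  # PUSH 1
    0x02,        # START: DUP
    0x03,        # PRINT
    0x05,        # INCREMENT
    0x02,        # DUP
    0x01, 0x0A,  # PUSH 10
    0x04,        # LESSTHAN
    0x06, 0xF9,  # JUMPCOND START (0xf9 is -7 as a signed byte)
])


def run(code):
    """Execute the byte string and return the list of printed values."""
    stack, printed, pc = [], [], 0
    while pc < len(code):
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == DUP:
            stack.append(stack[-1])
            pc += 1
        elif op == PRINT:
            printed.append(stack.pop())
            pc += 1
        elif op == INCREMENT:
            stack[-1] += 1
            pc += 1
        elif op == LESSTHAN:
            right, left = stack.pop(), stack.pop()
            stack.append(left < right)
            pc += 1
        elif op == JUMPCOND:
            raw = code[pc + 1]
            offset = raw - 256 if raw > 127 else raw  # decode signed byte
            # Assumed: the offset is relative to the JUMPCOND instruction itself.
            pc = pc + offset if stack.pop() else pc + 2
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc}")
    return printed


print(run(PROGRAM))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Under these assumed semantics the listing prints 1 through 9, because the comparison against 10 happens after the increment. Whether the upper bound of the imaginary `loop ... to 10` is meant to be inclusive is not stated in the answer.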

When compiling a program, you first have to parse the source code (usually producing a parse tree) and then convert it to bytecode. It is easier to skip the second step and execute directly from the parse tree. In addition, if the language's syntax is very hairy (for example, if it allows code to modify itself), generating bytecode becomes more complicated. If an eval-style function exists that executes arbitrary code, the entire compiler has to be distributed with the application in order to run such code on the virtual machine. Shipping only the parser is simpler.
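Executing directly from the tree can be sketched as a small recursive evaluator. The tuple encoding `(node_type, *children)` and the function names are hypothetical, chosen only to mirror the parse tree shown earlier; the inclusive upper bound is also an assumption about the imaginary language.

```python
# Hypothetical encoding of the parse tree: (node_type, *children).
TREE = ("loop",
        ("variable", "i"),
        ("integer", 1),
        ("integer", 10),
        ("block",
         ("print", ("variable", "i"))))


def evaluate(node, env, out):
    """Walk one node; 'out' collects printed values instead of writing to stdout."""
    kind = node[0]
    if kind == "integer":
        return node[1]
    if kind == "variable":
        return env[node[1]]
    if kind == "print":
        out.append(evaluate(node[1], env, out))
        return None
    if kind == "block":
        for statement in node[1:]:
            evaluate(statement, env, out)
        return None
    if kind == "loop":
        _, var, start, stop, body = node
        first = evaluate(start, env, out)
        last = evaluate(stop, env, out)
        # Assumed: the upper bound is inclusive.
        for value in range(first, last + 1):
            env[var[1]] = value
            evaluate(body, env, out)
        return None
    raise ValueError(f"unknown node type {kind!r}")


def run_tree(tree):
    out = []
    evaluate(tree, {}, out)
    return out


print(run_tree(TREE))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

No separate code-generation pass is needed: the interpreter is just one function that dispatches on the node type, which is the simplicity the paragraph above describes.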

Perl 6, the next version of Perl, is planned to compile code to bytecode and run it on the Parrot virtual machine. This is expected to improve performance. Bytecode is simple enough to be compiled further into native processor instructions (this is what a JIT compiler does), approaching the speed of compiled languages such as C.

+7

jjrv May 02, '12 at 15:25

ikegami · Accepted Answer · 2012-05-02T15:35:42+0000

What Perl uses is not a parse tree, at least not as Wikipedia defines it. It is a tree of opcodes.

 >perl -MO=Concise -E"for (1..10) { say $i }"
 g  <@> leave[1 ref] vKP/REFC ->(end)
 1     <0> enter ->2
 2     <;> nextstate(main 49 -e:1) v:%,{,2048 ->3
 f     <2> leaveloop vK/2 ->g
 7        <{> enteriter(next->c last->f redo->8) lKS/8 ->d
 -           <0> ex-pushmark s ->3
 -           <1> ex-list lK ->6
 3              <0> pushmark s ->4
 4              <$> const[IV 1] s ->5
 5              <$> const[IV 10] s ->6
 6           <#> gv[*_] s ->7
 -        <1> null vK/1 ->f
 e           <|> and(other->8) vK/1 ->f
 d              <0> iter s ->e
 -              <@> lineseq vK ->-
 8                 <;> nextstate(main 47 -e:1) v:%,2048 ->9
 b                 <@> say vK ->c
 9                    <0> pushmark s ->a
 -                    <1> ex-rv2sv sK/1 ->b
 a                       <#> gvsv[*i] s ->b
 c                 <0> unstack v ->d
 -e syntax OK

Furthermore, although it is called a tree, it is not really a tree. Notice the arrows? That is because it is actually a list-like graph of opcodes (like any other executable).

 >perl -MO=Concise,-exec -E"for (1..10) { say $i }"
 1  <0> enter
 2  <;> nextstate(main 49 -e:1) v:%,{,2048
 3  <0> pushmark s
 4  <$> const[IV 1] s
 5  <$> const[IV 10] s
 6  <#> gv[*_] s
 7  <{> enteriter(next->c last->f redo->8) lKS/8
 d  <0> iter s
 e  <|> and(other->8) vK/1
 8      <;> nextstate(main 47 -e:1) v:%,2048
 9      <0> pushmark s
 a      <#> gvsv[*i] s
 b      <@> say vK
 c      <0> unstack v
            goto d
 f  <2> leaveloop vK/2
 g  <@> leave[1 ref] vKP/REFC
 -e syntax OK
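The idea of a graph of opcodes linked in execution order can be sketched as follows. This is a heavily simplified model, not Perl's actual implementation: the `Op` class, the `pp_*` functions, and the `state` dictionary are hypothetical, most ops from the listing are omitted, and the sketch prints the loop value rather than reproducing what `say $i` does. What it keeps is the shape: each op returns the op to run next, a branching op such as `and` chooses between its `next` and `other` links, and the back edge from the loop body plays the role of `goto d`.

```python
class Op:
    """One node in the execution graph (simplified, hypothetical model)."""

    def __init__(self, name, run):
        self.name = name
        self.run = run      # function(op, state) -> next Op, or None to stop
        self.next = None    # normal successor
        self.other = None   # alternate successor for branching ops


def pp_enteriter(op, state):
    state["iter"] = iter(range(1, 11))  # stands in for const 1, const 10
    return op.next


def pp_iter(op, state):
    try:
        state["value"] = next(state["iter"])
        state["flag"] = True
    except StopIteration:
        state["flag"] = False
    return op.next


def pp_and(op, state):
    # Branch: enter the loop body through 'other', or fall through to 'next'.
    return op.other if state["flag"] else op.next


def pp_say(op, state):
    state["out"].append(state["value"])
    return op.next


def pp_leaveloop(op, state):
    return op.next  # None here, which ends the run loop


def build():
    enteriter = Op("enteriter", pp_enteriter)
    iter_op = Op("iter", pp_iter)
    and_op = Op("and", pp_and)
    say = Op("say", pp_say)
    leaveloop = Op("leaveloop", pp_leaveloop)
    enteriter.next = iter_op
    iter_op.next = and_op
    and_op.other = say        # loop body
    and_op.next = leaveloop   # loop exit
    say.next = iter_op        # back edge, like "goto d" in the listing
    return enteriter


def runloop(start):
    state = {"out": []}
    op = start
    while op is not None:
        op = op.run(op, state)
    return state["out"]


print(runloop(build()))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The run loop never consults a tree structure; it only follows links from one op to the next, which is why the `-exec` listing reads like a flat instruction sequence.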

The difference between Perl's opcodes and Java's bytecode is that Java's bytecode is designed to be serializable (stored in a file), whereas Perl's opcodes are in-memory structures linked by pointers.
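Python, which the question mentions, takes the same approach as Java on this point: a compiled code object can be turned into bytes and loaded back later. A minimal sketch using the standard `marshal` module (the source string and the `<demo>` filename are placeholders chosen for illustration):

```python
import marshal

source = "total = sum(range(1, 11))"

# Compile once, then serialize the resulting code object to plain bytes.
code = compile(source, "<demo>", "exec")
blob = marshal.dumps(code)      # could be written to a file at this point

# Later (or in another process running the same Python version): load and run.
restored = marshal.loads(blob)
namespace = {}
exec(restored, namespace)

print(namespace["total"])  # 55
```

The marshal format is specific to a Python version, so this shows the principle of storable bytecode rather than a portable file format.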
