Does Pharo provide tail call optimization?

The implementation of Integer>>#factorial in Pharo is:

 factorial "Answer the factorial of the receiver." self = 0 ifTrue: [^ 1]. self > 0 ifTrue: [^ self * (self - 1) factorial]. self error: 'Not valid for negative integers' 

This is a tail recursive definition. However, I can evaluate 10000 factorial without errors in the workspace.

Does Pharo perform tail call optimization under some circumstances, does it apply some other optimization, or does it just use a very deep stack?

+7
smalltalk tail-recursion pharo
2 answers

It is a really deep stack. Or rather, there is no stack at all.

Pharo is a descendant of Squeak, which inherits its execution semantics directly from Smalltalk-80. There is no linear, fixed-size stack. Instead, every method call creates a new MethodContext object, which provides the space for the arguments and temporary variables of each recursive call. It also points to the sending context (for the later return), forming a linked list of contexts (which is displayed just like a stack in the debugger). Context objects are allocated on the heap like any other object, so call chains can become very deep: all available memory can be used. You can evaluate thisContext to see the currently active method context.
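For example, you can walk the chain of sender contexts yourself in a Playground. This is only a rough sketch; the depth you get depends on where you evaluate it:

 | context depth |
 context := thisContext.
 depth := 0.
 [ context notNil ] whileTrue: [
     depth := depth + 1.
     context := context sender ].
 depth

Each element of that chain is an ordinary object you can inspect.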

Allocating all of these context objects is expensive, though. For speed, modern virtual machines (such as the Cog VM used by Pharo) do use a stack internally, organized as linked pages so that it can still grow arbitrarily large. Context objects are created only on demand (for example, while debugging) and are mapped to and from the hidden stack frames. This behind-the-scenes machinery is quite intricate but, fortunately, hidden from the Smalltalk programmer.

+5

There is no secret in Pharo's execution model. The recursive fragment

 ^ self * (self - 1) factorial 

which appears inside the second ifTrue: block, is compiled into the following sequence of bytecodes:

 39 <70> self             ; receiver of outer message *
 40 <70> self             ; receiver of inner message -
 41 <76> pushConstant: 1  ; argument of self - 1
 42 <B1> send: -          ; subtract
 43 <D0> send: factorial  ; send factorial (nothing special here!)
 44 <B8> send: *          ; multiply
 45 <7C> returnTop        ; return

Note that nothing special happens at bytecode 43. The code simply sends factorial just as it would any other selector; in particular, there is no special manipulation of the stack.
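You can reproduce such a listing yourself by asking the compiled method for its symbolic bytecodes. The exact selector depends on the version: symbolic has long worked in Squeak and older Pharo images, while recent Pharo also offers symbolicBytecodes.

 (Integer >> #factorial) symbolic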

This does not mean that there can be no optimizations in the underlying native code, but that is a different discussion. What matters to the programmer is the execution model, because any optimization below the bytecode level is meant to preserve that model at the conceptual level.

UPDATE

Interestingly, the non-recursive version

 factorial2
     | f |
     f := 1.
     2 to: self do: [:i | f := f * i].
     ^ f

is slightly slower than the recursive one (in Pharo). The reason is presumably that the overhead of incrementing i is slightly larger than that of the recursive send mechanism.

Here are the expressions I tried:

 [25000 factorial] timeToRun.
 [25000 factorial2] timeToRun.
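If you want something a bit more stable than a single wall-clock measurement, recent Pharo images also provide a bench method on blocks, which runs the block repeatedly for a while and reports a runs-per-second figure (assuming your image includes it):

 [25000 factorial] bench.
 [25000 factorial2] bench.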
+7
