Implementing "Generator" support in a custom language

I have a little fetish for language design, and I'm playing with my own hobby language right now. ( http://rogeralsing.com/2010/04/14/playing-with-plastic/ )

One thing that really makes my brain bleed is the “generators” and the keyword “yield”. I know that C # uses AST conversion to convert enumerator methods to statemachines.

But how does it work in other languages? Is there a way to get generator support in a language without AST transformation? For example, are languages ​​such as Python or Ruby used for AST transformations to solve this problem?

(The question is how generators are implemented under the hood in different languages, and not how to write a generator in one of them)

+3
generator yield dsl
source share
1 answer

Generators are mostly semi-shorts with some annoying limitations. So, you can implement them using half-shorts (and, of course, full coroutines).

If you do not have coroutines, you can use any of the other universal control flows. There are many control flow constructs that are “universal” in the sense that each control flow construct (including all other universal control flow constructs), including coroutines and, therefore, generators, can be (more or less) trivially converted only to this universal build.

The most famous of these is probably GOTO . With just GOTO you can build any other control flow construct: IF-THEN-ELSE , WHILE , FOR , REPEAT-UNTIL , FOREACH , exceptions, threads, subprogram calls, method calls, function calls, etc., and of course coroutines and generators.

Almost all processors support GOTO (although they usually call it jmp on the CPU). In fact, in many processors, GOTO is the only control flow construct, although at least support for calls to routines ( call ) and, perhaps, some primitive form of exception handling and / or primitive concurrency (cf. And -swap) are supported, usually also built in.

Another well-known control flow primitive is continuations. Continuations are basically a more structured, better managed, and less evil version of GOTO , especially popular in functional languages. But there are some low-level languages ​​that base their control flow on continuations, for example, Parrot Virtual Machine uses continuations for a control flow, and I think that in some research laboratory there are some processors based on continuations.

C has a kind of “crap” continuation form ( setjmp and longjmp ) that are much less efficient and less easy to use than “real” continuations, but they are powerful enough to implement generators (and, in fact, can be used to implement full continuations) .

On the Unix platform, setcontext can be used as a more efficient and higher alternative to the setjmp / longjmp level.

Another well-known control flow construct, but probably it does not matter spring, since the low-level substrate creates other control flow constructs from above are exceptions. There is paper that shows that exceptions can be more powerful than continuations, which makes exceptions essentially equivalent to GOTO and, therefore, universally powerful. In fact, exceptions are sometimes used as universal control flow constructs: the Microsoft Volta project, which compiled .NET bytecode for JavaScript, used JavaScript exceptions to implement .NET flows and generators.

Not universal, but probably powerful enough to implement generators, it's just tail call optimization. (Maybe I'm wrong. Unfortunately, I have no evidence.) I think you can turn the generator into a set of mutually tail recursive functions. I know that state machines can be implemented using tail calls, so I am sure that generators can also, since C # implements generators as state machines. (I think this works especially well with lazy pricing.)

Last but not least, in a language with a raid call stack (for example, most Smalltalks, for example), you can build almost any control flow constructs you want. (In fact, a raid call stop is basically a procedural low-level equivalent to a high-level functional continuation.)

So what do other generator implementations look like?

Lua does not have generators as such, but it has full asymmetric coroutines. The main C implementation uses setjmp / longjmp to implement them.

Ruby also does not have generators as such, but it has Enumerator s that can be used as generators. Enumerator are not part of the language, they are a library. MRI implements Enumerator using extensions, which in turn are implemented using setjmp / longjmp . YARV implements Enumerator using Fiber (since Ruby pronounces "coroutines"), and they are implemented using setjmp / longjmp . I believe that JRuby currently implements Enumerator with threads, but they want to switch to something better as soon as the JVM gets some improved control flow constructs.

Python has generators that are actually more or less full-blown coroutines. CPython implements them using setjmp / longjmp .

+5
source share

All Articles