Is it possible / useful to translate Scala to golang?

Scala Native was recently released, but the garbage collector it uses (at the moment) is extremely rudimentary, which makes it unsuitable for serious use.

So I wonder: why not just translate Scala to Go (a la Scala.js)? It would be a fast, portable runtime. And their GC is getting better and better. Not to mention inheriting a great concurrency model: channels and goroutines.

  • So why did scala-native decide to go so low-level with LLVM?
  • What would be the catch with a golang translator?
1 answer

There are two kinds of languages that are good targets for compilers:

  • Languages whose semantics closely match the semantics of the source language.
  • Languages that have very low-level and therefore very general semantics (or, one might argue, no semantics at all).

Examples for #1 include: compiling ECMAScript 2015 to ECMAScript 5 (most of the language additions were specifically designed as syntactic sugar for existing features, you just have to desugar them), compiling CoffeeScript to ECMAScript, compiling TypeScript to ECMAScript (basically, after type checking, just erase the types and you are done), compiling Java to JVM bytecode, compiling C♯ to CLI CIL bytecode, compiling Python to CPython bytecode, compiling Python to PyPy bytecode, compiling Ruby to YARV bytecode, compiling Ruby to Rubinius bytecode, compiling ECMAScript to SpiderMonkey bytecode.

Examples for #2 include: machine code for general purpose CPUs (RISC even more so), C--, LLVM.

Compiling Scala to Go fits neither of the two. Their semantics are very different.

You either need a language with powerful low-level semantics as the target language, so that you can build your own semantics on top of it, or you need a language with closely matching semantics, so that you can map your own semantics onto the target language.

In fact, even JVM bytecode is already too high-level! It has constructs such as classes that do not match Scala constructs such as traits, so traits have to be encoded into classes and interfaces in a fairly complex way. Likewise, before invokedynamic, it was pretty much impossible to represent dynamic dispatch on structural types in JVM bytecode. The Scala compiler had to resort to reflection, or in other words, deliberately step outside the semantics of JVM bytecode (which resulted in a terrible performance overhead for method dispatch on structural types compared to method dispatch on other class types, even though both are the exact same thing).

Proper tail calls are another example: we would like to have them in Scala, but because JVM bytecode is not powerful enough to express them without a very complex mapping (basically, you have to forgo the JVM call stack altogether and manage your own stack, which destroys both performance and Java interoperability), it was decided not to have them in the language.
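To make "manage your own stack" concrete, here is a minimal sketch of a trampoline, written in Go since that is the proposed target. It rests on the assumption that the Go compiler, like the JVM, does not guarantee tail-call elimination. The names `step`, `isEven`, `isOdd`, and `run` are purely illustrative and not something a real Scala-to-Go compiler is known to emit.

```go
package main

import "fmt"

// step is one bounce of the trampoline: either a finished result
// or a thunk that produces the next step.
type step struct {
	done  bool
	value bool
	next  func() step
}

// isEven and isOdd are mutually recursive, but instead of calling each
// other directly they return a thunk describing the next call.
func isEven(n int) step {
	if n == 0 {
		return step{done: true, value: true}
	}
	return step{next: func() step { return isOdd(n - 1) }}
}

func isOdd(n int) step {
	if n == 0 {
		return step{done: true, value: false}
	}
	return step{next: func() step { return isEven(n - 1) }}
}

// run drives the computation in a loop, so the native call stack stays
// at constant depth no matter how many "calls" are made.
func run(s step) bool {
	for !s.done {
		s = s.next()
	}
	return s.value
}

func main() {
	fmt.Println(run(isEven(1000000))) // prints: true
}
```

Every "call" now allocates a closure and goes through an indirect jump, and none of these functions can be called like ordinary functions from hand-written code, which is the performance and interoperability cost described above.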

Go has some of the same problems: in order to implement Scala's expressive non-local control-flow constructs, such as exceptions or threads, we need an equally expressive non-local control-flow construct to map them to. For typical target languages, this "expressive non-local control-flow construct" is either continuations or the venerable GOTO. Go has GOTO, but it is deliberately limited in its "non-localness". For code written by humans, limiting the expressive power of GOTO is a good thing, but for a compiler target language, not so much.
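As a small illustration of how local Go's GOTO is, the sketch below uses it for the one thing it handles comfortably: a jump within a single function. The function name `countdown` is made up for this example, and the restrictions listed in the comments are my reading of the Go specification.

```go
package main

import "fmt"

// countdown uses goto as a plain local loop. The label `loop` is only
// visible inside this function.
//
// Jumps that Go rejects at compile time:
//   - a goto whose label lives in a different function
//   - a goto that jumps over a variable declaration into its scope
//   - a goto from outside a block to a label inside that block
func countdown(n int) []int {
	var out []int
loop:
	if n > 0 {
		out = append(out, n)
		n--
		goto loop
	}
	return out
}

func main() {
	fmt.Println(countdown(3)) // prints: [3 2 1]
}
```

A compiler back end that wants to lower exception handlers or resumable computations to jumps needs exactly the kinds of jumps listed in the comment, so it would have to find some other encoding.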

It is very likely possible to rig up powerful control flow using goroutines and channels, but now we are already leaving the comfortable confines of just mapping Scala semantics to Go semantics, and start building Scala's high-level semantics on top of Go's high-level semantics, which were not designed for such usage. Goroutines were not designed as a general control-flow construct for building other kinds of control flow on top of. That is not what they are good at!
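The following is a rough sketch of what "rigging up" control flow on goroutines and channels could look like, using a generator as the example. The helper names `generate` and `collect` are hypothetical, and this is only one of several possible encodings.

```go
package main

import "fmt"

// generate runs body in its own goroutine and hands every yielded value
// to the consumer through an unbuffered channel, so producer and
// consumer take turns like a coroutine.
func generate(body func(yield func(int))) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch) // signal "no more values" when body returns
		body(func(v int) { ch <- v })
	}()
	return ch
}

// collect drains a generator into a slice.
func collect(ch <-chan int) []int {
	var out []int
	for v := range ch {
		out = append(out, v)
	}
	return out
}

func main() {
	squares := generate(func(yield func(int)) {
		for i := 1; i <= 3; i++ {
			yield(i * i)
		}
	})
	fmt.Println(collect(squares)) // prints: [1 4 9]
}
```

It works, but each generator costs a goroutine plus a channel handoff per value, and if the consumer stops reading early the producer goroutine stays blocked on its send forever unless extra cancellation plumbing is added. That overhead is the price of layering one set of high-level semantics on top of another.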

So why did scala-native decide to go so low-level with LLVM?

Because that is precisely what LLVM was designed for and is good at.

What would be the catch with a golang translator?

The semantics of the two languages are too different for a direct mapping, and Go's semantics are not designed for building another language's semantics on top of them.

their GC is getting better and better

So can Scala Native's. As far as I understand, the current choice of Boehm-Demers-Weiser is basically one of laziness: it is there, it works, you can drop it into your code, and it will just do its thing.

Note that changing the GC is under discussion. There are other GCs that are designed as drop-ins rather than being tightly coupled to the host VM's object layout. For example, IBM is currently in the process of restructuring J9, its high-performance JVM, into a set of loosely coupled, independently reusable runtime components, and is releasing them under an open source license.

The project is called "Eclipse OMR" (source on GitHub), and it is already production-ready: the Java 8 implementation of IBM J9 was built completely out of OMR components. There is a Ruby + OMR project that demonstrates how the components can easily be integrated into an existing language runtime, because the components themselves assume no language semantics and no specific memory or object layout. The commit that swaps out the GC and adds a JIT and a profiler clocks in at just over 10,000 lines. It is not production-ready, but it boots and runs Rails. They also have a similar project for CPython (not yet public).

Why not just translate Scala to Go (a la Scala.js)?

Note that Scala.js has a lot of the same problems I mentioned above. But they do it anyway, because the payoff is huge: you get access to every web browser on the planet. There is no comparable gain for a hypothetical Scala.go.

There is a reason for initiatives such as asm.js and WebAssembly that bring low-level semantics into the browser: compiling a high-level language to another high-level language always has this "semantic gap" that you need to overcome.

In fact, note that even with fairly low-level languages that were specifically designed as compilation targets for a particular language, you can still run into trouble. For example, Java has generics, JVM bytecode does not. Java has inner classes, JVM bytecode does not. Java has anonymous classes, JVM bytecode does not. All of these have to be encoded somehow, and the encoding (or rather, non-encoding) of generics in particular has caused all sorts of pain.
