How does the Roslyn release version implement immutable trees?

I understand that the preview version of Roslyn implemented immutable trees, as described in this excellent blog post by Eric Lippert. However, this post ends with:

"The cost is that this system is complex and can consume a lot of memory if the" red "facades become large. We are currently conducting experiments to see if we can cut some of the costs without losing the benefits."

I would like to ask what was the result of the release version. I started exploring Roslyn sources but the code is pretty complicated.

I'm interested in low-level design results relative to the costs mentioned above.

  • What is the cost of a single edit in terms of red / green nodes?
  • What optimizations have been made for other operations (e.g. deletion, cancellation)?
+7
immutability c # tree roslyn
source share
2 answers

There is a fantastic look at Roslyn's performance and the introduction of VSadov immutable trees in the discussion forums here: https://roslyn.codeplex.com/discussions/541953

At a high level, he discusses concurrency, deduplication of green nodes, deduplication of terminals, and deduplication of strings in abstract trees.

He also speaks of laziness of red trees. Instead of calculating a mahogany after each text change, a mahogany is only calculated when someone asks for it.

Finally, he discusses mahogany and the fact that he is weak. I have never used or never seen the WeakReference class, and VSadov gives a good overview of how it is used in mahogany. In principle, the garbage collector is allowed to clean parts of mahogany and can be recreated later if necessary. I am not familiar with the implementation, but Eric Lippert notes that the red facade of a tree can lead to a large amount of memory. I believe that these WeakReferences help to some extent mitigate this problem.

+6
source share

Currently, we are still doing the same thing that Eric described. We tried some experiments, but found that the cost of API clarity is too high to pay. Instead, the main change we made was to reduce heap allocations and GC costs, turning SyntaxToken into a structure in the red model. This is a significant savings, since in average source files about half of the nodes in the syntax tree are terminals ( SyntaxToken s). Green nodes are not structures, but on the other, and since they do not contain parental pointers or positions, they are interned - we use the same instance of a green node for all the same nodes (the same little things and text).

I'm not sure what you mean by the "cost" of editing. Time / space / complexity / etc .. In general, we have an incremental lexer that rescans the area affected by editing. Then we have an incremental parser that at best revises the statement that intersects with the recently lexed tokens, and then re-builds the spine of the green tree back to the root, while reusing the rest of the green nodes in the tree. By definition, none of the red nodes can be reused. There is a new root node for the new tree, and it is accessible from any other node.

There were no other optimizations in the compiler. We reuse the cached tree in the IDE after cancellation, but I'm not sure.

+2
source share

All Articles