Recursive data structures in rust

Question

Recursive data structures in rust

I am trying to implement a script-like data structure in Rust.

I would like to get the equivalent of this C ++ code:

struct Node { Node* parent; // should be mutable, and nullable (no parent) std::vector<Node*> child; virtual ~Node() { for(auto it = child.begin(); it != child.end(); ++it) { delete *it; } } void addNode(Node* newNode) { if (newNode->parent) { newNode->parent.child.erase(newNode->parent.child.find(newNode)); } newNode->parent = this; child.push_back(newNode); } }

Expressed in reliable rust:

with a parent who takes responsibility for their children
with externally accessible nodes through a link of some kind

EDIT

Thanks to everyone for the useful templates that I did not know. I think I can use them to solve this problem, I need some time to digest them.

I was unable to specify another restriction:

I need events that relate to one Node in order to potentially mutate the entire tree.

So, I think the example from DK is closest to what I had in mind.

+6

rust

nulleight Mar 22 '16 at 23:32

source share

3 answers

Francis gagné · Answer 1 · 2016-03-23T02:47:05+0000

Rust tries to keep your memory safe by preventing you from doing things that could be potentially unsafe. Since this analysis is performed at compile time, the compiler can only speculate on a subset of manipulations that are known to be safe.

In Rust, you can easily save either a reference to the parent (borrowing the parent, thereby preventing mutations), or a list of child nodes (by owning them, which gives you more freedom), but not both (without using unsafe ). This is especially problematic for your addNode implementation, which requires modified access to the given node parent. However, if you store the parent pointer as a mutable reference, then since only one mutable reference to a specific object can be used at a time, the only way to access the parent is through the child of the node, and you can only have one child of the node. otherwise, you have two mutable links to the same parent node.

If you want to avoid unsafe code, there are many alternatives, but all of them will require some sacrifice.

The simplest solution is to simply remove the parent field. We can define a separate data structure in order to remember the parent element of the node while we are crossing the tree, rather than saving it in the node itself.

First, define Node :

 #[derive(Debug)] struct Node<T> { data: T, children: Vec<Node<T>>, } impl<T> Node<T> { fn new(data: T) -> Node<T> { Node { data: data, children: vec![] } } fn add_child(&mut self, child: Node<T>) { self.children.push(child); } }

(I added the data field because the tree is not super useful without data in nodes!)

Now let's define another structure for tracking the parent during navigation:

 #[derive(Debug)] struct NavigableNode<'a, T: 'a> { node: &'a Node<T>, parent: Option<&'a NavigableNode<'a, T>>, } impl<'a, T> NavigableNode<'a, T> { fn child(&self, index: usize) -> NavigableNode<T> { NavigableNode { node: &self.node.children[index], parent: Some(self) } } } impl<T> Node<T> { fn navigate<'a>(&'a self) -> NavigableNode<T> { NavigableNode { node: self, parent: None } } }

This solution works fine if you don’t need to mutate the tree while moving it, and you can save NavigableNode objects around it (which works great for a recursive algorithm, but doesn't work too well if you want to save NavigableNode in some other data structure and save it). The second limitation can be alleviated by using something other than a borrowed pointer to hold the parent; if you want maximum versatility, you can use Borrow trait to allow direct values, borrowed pointers, Box es, Rc , etc.

Now let's talk about lightning . In functional programming, lightnings are used to "focus" on a particular element of the data structure (list, tree, map, etc.), so access to this element takes a constant time, while retaining all the data in this data structure. If you need to move around the tree and mutate while navigating while still being able to navigate the tree, you can turn the tree into a zipper and make changes through the zipper. p>

Here we can implement the zipper for Node described above:

 #[derive(Debug)] struct NodeZipper<T> { node: Node<T>, parent: Option<Box<NodeZipper<T>>>, index_in_parent: usize, } impl<T> NodeZipper<T> { fn child(mut self, index: usize) -> NodeZipper<T> { // Remove the specified child from the node children. // A NodeZipper shouldn't let its users inspect its parent, // since we mutate the parents // to move the focused nodes out of their list of children. // We use swap_remove() for efficiency. let child = self.node.children.swap_remove(index); // Return a new NodeZipper focused on the specified child. NodeZipper { node: child, parent: Some(Box::new(self)), index_in_parent: index, } } fn parent(self) -> NodeZipper<T> { // Destructure this NodeZipper let NodeZipper { node, parent, index_in_parent } = self; // Destructure the parent NodeZipper let NodeZipper { node: mut parent_node, parent: parent_parent, index_in_parent: parent_index_in_parent, } = *parent.unwrap(); // Insert the node of this NodeZipper back in its parent. // Since we used swap_remove() to remove the child, // we need to do the opposite of that. parent_node.children.push(node); let len = parent_node.children.len(); parent_node.children.swap(index_in_parent, len - 1); // Return a new NodeZipper focused on the parent. NodeZipper { node: parent_node, parent: parent_parent, index_in_parent: parent_index_in_parent, } } fn finish(mut self) -> Node<T> { while let Some(_) = self.parent { self = self.parent(); } self.node } } impl<T> Node<T> { fn zipper(self) -> NodeZipper<T> { NodeZipper { node: self, parent: None, index_in_parent: 0 } } }

To use this lightning, you need to own the root of the tree node. By observing nodes, lightning can move things around to avoid copying or cloning nodes. When we move the zipper, we actually discard the old zipper and create a new one (although we could also do this by changing self , but I thought it was more understandable, plus this allows you to chain method calls).

If the above parameters are not satisfactory, and you should completely save the parent element of the node in node, then the next best option is to use Rc<RefCell<Node<T>>> to refer to the parent and children. Rc provides sharing, but adds overhead to perform reference counting at runtime. RefCell provides internal volatility, but adds overhead to track active roles at runtime. See DK. answer for implementation using Rc and RefCell .

DK. · Answer 2 · 2016-03-23T02:31:03+0000

The problem is that this data structure is inherently unsafe; it has no direct equivalent in Rust, which does not use unsafe . This is by design.

If you want to translate this into a secure Rust code, you need to know more specifically what exactly you want from it. I know that you specified some properties above, but often people coming to Rust will say: "I want everything I have in this C / C ++ code," to which the direct answer is "well, you can’t "

You also inevitably have to change how you approach this. The given example has pointers without semantics of ownership, mutable smoothing and loops; all of which Rust won't let you just ignore how C ++ does.

The simplest solution is to simply get rid of the parent pointer and maintain it externally (for example, the path to the file system). This also goes well with borrowing, as there are no loops anywhere:

 pub struct Node1 { children: Vec<Node1>, }

If you need parent pointers, you can go halfway and use identifiers instead:

 use std::collections::BTreeMap; type Id = usize; pub struct Tree { descendants: BTreeMap<Id, Node2>, root: Option<Id>, } pub struct Node2 { parent: Id, children: Vec<Id>, }

BTreeMap effectively is your "address space", bypassing the problems of borrowing and smoothing, without directly using memory addresses.

Of course, this introduces the problem that the given Id not tied to a specific tree, which means that the node to which it belongs can be destroyed, and now you have what is actually a dangling pointer. But, what is the price you pay for the overlay and mutation. It is also less direct.

Or, you can go whole burs and use link checking and dynamic borrow checking:

 use std::cell::RefCell; use std::rc::{Rc, Weak}; // Note: do not derive Clone to make this move-only. pub struct Node3(Rc<RefCell<Node3_>>); pub type WeakNode3 = Weak<RefCell<Node3>>; pub struct Node3_ { parent: Option<WeakNode3>, children: Vec<Node3>, } impl Node3 { pub fn add(&self, node: Node3) { // No need to remove from old parent; move semantics mean that must have // already been done. (node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0)); self.children.push(node); } }

Here you can use Node3 to transfer ownership of the node between parts of the tree and WeakNode3 for external links. Or you can make Node3 cloneable and add the logic back to add to make sure that the given node did not accidentally leave the child of the wrong parent.

This is not entirely better than the second option, because this design is absolutely unable to take advantage of static borrowing verification. The second one can at least prevent you from mutating the graph from two places at the same time during compilation; here, if that happens, you just crash.

The fact is that you cannot just have everything. You must decide which operations you really need to support. At this point, this is usually just a case of selecting types that give you the necessary properties.

Shepmaster · Answer 3 · 2017-07-14T18:38:08+0000

In some cases, you can also use the arena. The arena guarantees that the values stored in it will have the same lifespan as the arena itself. This means adding more values will not invalidate any existing lives, but moving the arena will. Thus, such a solution is not viable if you need to return the tree.

This solves the problem by removing ownership from the nodes themselves.

Here is an example that also uses internal mutability to allow w ^ 30> to be mutated after its creation. In other cases, you can remove this variability if the tree is built once and then just moves.

 extern crate typed_arena; use std::cell::{Cell, RefCell}; use std::fmt; use typed_arena::Arena; struct Tree<'a, T: 'a> { nodes: Arena<Node<'a, T>>, } impl<'a, T> Tree<'a, T> { fn new() -> Tree<'a, T> { Self { nodes: Arena::new() } } fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> { self.nodes.alloc(Node { data, tree: self, parent: Cell::new(None), children: RefCell::new(Vec::new()), }) } } struct Node<'a, T: 'a> { data: T, tree: &'a Tree<'a, T>, parent: Cell<Option<&'a Node<'a, T>>>, children: RefCell<Vec<&'a Node<'a, T>>>, } impl<'a, T> Node<'a, T> { fn add_node(&'a self, data: T) -> &'a Node<'a, T> { let child = self.tree.new_node(data); child.parent.set(Some(self)); self.children.borrow_mut().push(child); child } } impl<'a, T> fmt::Debug for Node<'a, T> where T: fmt::Debug, { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { write!(f, "{:?}", self.data)?; write!(f, " (")?; for c in self.children.borrow().iter() { write!(f, "{:?}, ", c)?; } write!(f, ")") } } fn main() { let tree = Tree::new(); let head = tree.new_node(1); let left = head.add_node(2); let right = head.add_node(3); println!("{:?}", head); // 1 (2 (), 3 (), ) }

Recursive data structures in rust

More articles: