Convert a string to a tree view using rules

I have to do a simple text parsing in RTF format, I need to fix iss. Given the following line:

{aaaaaaa\}aaaa\{aaaaa{bbbbbbbb{ccccc\{cccc}bbb{eeeee}{{gggg}ffff}bbbbbb}aaaaa} 

Where:

 \ means ignore next character { means expand } means collapse up to parent 

At any point in the line, any previous character can affect the state, with the exception of characters in private tags. for example {gggg} will not affect ffff, but aaaaaaa} aaa .. will affect bbbb, ccc, eee, ggg, fff , etc.

From this we can divide it into only meaningful blocks

 A1 = aaaaaaa\}aaaa\{aaaaa B1 = bbbbbbbb C = ccccc\{cccc B2 = bbb E = eeeee G = gggg F = ffff B3 = bbbbbb A2 = aaaaa 

Yielding:

 {A1{B1{C}B2{E}{{G}F}B3}A2} 

To describe the dependency, I used X> Y means that Y depends on X (as in X it can change the value of Y)

 A1 A1 > B1 A1 > B1 > C A1 > B1 > B2 A1 > B1 > B2 > E A1 > B1 > B2 > G A1 > B1 > B2 > F A1 > B1 > B2 > B3 A1 > B1 > B2 > A2 A1 > A2 

So, if we have a node that can make a difference and an ordered list of helper values. To make the value tree look like this:

 A1 - B1 - - C - - B2 - - - E - - - G - - - F - - - B3 - A2 

Then, to get characters that affect any node, I can simply step over each parent recursively.

What I keep getting stuck is trying to parse a string in my node class:

 public class myNode { public myNode Parent; public string Value; public List<myNode> subNodes; } 

I read a string character by character when I encounter \ I am incremented by two. When I encounter { , I save the previous text section as a node value and am in a child, and when I encounter } I, we exit down.

But I keep spoiling the logic, especially for G and A2 . This is easy to do on paper, but when I try to make the actual logic to go down, I continue to mess it up.

Is there a more direct way to make this structure? (or is there a better structure I should use). I would think that there should be some library that allows you to convert strings to trees, but I can not find them.

+7
source share
1 answer

Use the state machine approach, where the state is the current node and exit flag:

 string rtf = @"{aaaaaaa\}aaaa\{aaaaa{bbbbbbbb{ccccc\{cccc}bbb{eeeee}{{gggg}ffff}bbbbbb}aaaaa}"; Node root = new Node { Parent = null, Value = "root", SubNodes = new List<Node>() }; Node node = root; bool escape = false; foreach (char c in rtf) { if (escape) { node.Value += c; escape = false; } else { switch (c) { case '{': node = new Node { Parent = node, Value = String.Empty, SubNodes = new List<Node>() }; node.Parent.SubNodes.Add(node); break; case '}': node = new Node { Parent = node.Parent.Parent, Value = String.Empty, SubNodes = new List<Node>() }; if (node.Parent != null) node.Parent.SubNodes.Add(node); break; case '\\': escape = true; break; default: node.Value += c; break; } } } PrintNode(root, String.Empty); 

Node class (just renamed):

 public class Node { public Node Parent; public string Value; public List<Node> SubNodes; } 

To display:

 private static void PrintNode(Node node, string level) { if (node.Value.Length > 0) Console.WriteLine(level + node.Value); foreach (Node n in node.SubNodes) { PrintNode(n, level + " "); } } 

Output:

 root aaaaaaa}aaaa{aaaaa bbbbbbbb ccccc{cccc bbb eeeee gggg ffff bbbbbb aaaaa 

Note that the G Node is not a child of an E node, but a child of a Node with an empty value.

Then, of course, you also need to add error handling.

+5
source

All Articles