Convert string to tree structure in python

I have a line in python like this:

line a line b line ba line bb line bba line bc line c line ca line caa line d 

You can get this idea. In fact, it takes a very similar form to the python code itself, since there is a line, and below this line, attachments indicate the part of the block headed by the very last line of the lesser indentation.

What I need to do is parse this code in a tree structure, so that each root level line is a dictionary key, and its value is a dictionary representing all sublins. therefore above would be:

 { 'line a' => {}, 'line b' => { 'line ba' => {}, 'line bb' => { 'line bba' => {} }, 'line bc' => {} }, 'line c' => { 'line ca' => { 'line caa' => {} }, }, 'line d' => {} } 

here is what i have:

 def parse_message_to_tree(message): buf = StringIO(message) return parse_message_to_tree_helper(buf, 0) def parse_message_to_tree_helper(buf, prev): ret = {} for line in buf: line = line.rstrip() index = len(line) - len(line.lstrip()) print (line + " => " + str(index)) if index > prev: ret[line.strip()] = parse_message_to_tree_helper(buf, index) else: ret[line.strip()] = {} return ret 

The listing shows the lines that remained separated and the indices 0. I did not think that lstrip() was a mutator, but in any case, the index must be accurate.

Any suggestions are helpful.

EDIT: Not sure what happened before, but I tried again, and he is closer to work, but still not quite right. Here is what I have now:

 {'line a': {}, 'line b': {}, 'line ba': {'line bb': {}, 'line bba': {'line bc': {}, 'line c': {}, 'line ca': {}, 'line caa': {}, 'line d': {}}}} 
+5
source share
3 answers

As previously noted, str.lstrip() not a mutator, the index also becomes accurate on my system.

But the problem is that by the time you realize that the index for the row has increased, line actually points to the enlarged index row, for example, in the first case, we note that the index for the row is incremented by line ba , so line indicates on line ba and then in your if state you do -

 ret[line.strip()] = parse_message_to_tree_helper(buf, index) 

This is not true because you must set everything that returns from parse_message_to_tree_helper() to line ba , and not its actual parent.

In addition, after you recurs inside the function, you do not exit if the file has not been completely read, but the level at which a particular line is stored in the dictionary depends on what it leaves the recursion when the indent has decreased.

I'm not sure if there are built-in libraries that will help you do this, but the code I could come up with (based on your code) -

 def parse_message_to_tree(message): buf = StringIO(message) return parse_message_to_tree_helper(buf, 0, None)[0] def parse_message_to_tree_helper(buf, prev, prevline): ret = {} index = -1 for line in buf: line = line.rstrip() index = len(line) - len(line.lstrip()) print (line + " => " + str(index)) if index > prev: ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line) if index < prev: return ret,prevline,index continue elif not prevline: ret[line.strip()] = {} else: ret[prevline.strip()] = {} if index < prev: return ret,line,index prevline = line if index == -1: ret[prevline.strip()] = {} return ret,None,index if prev == index: ret[prevline.strip()] = {} return ret,None,0 

Example / Demo -

 >>> print(s) line a line b line ba line bb line bba line bc line c line ca line caa >>> def parse_message_to_tree(message): ... buf = StringIO(message) ... return parse_message_to_tree_helper(buf, 0, None)[0] ... >>> def parse_message_to_tree_helper(buf, prev, prevline): ... ret = {} ... index = -1 ... for line in buf: ... line = line.rstrip() ... index = len(line) - len(line.lstrip()) ... print (line + " => " + str(index)) ... if index > prev: ... ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line) ... if index < prev: ... return ret,prevline,index ... continue ... elif not prevline: ... ret[line.strip()] = {} ... else: ... ret[prevline.strip()] = {} ... if index < prev: ... return ret,line,index ... prevline = line ... if index == -1: ... ret[prevline.strip()] = {} ... return ret,None,index ... if prev == index: ... ret[prevline.strip()] = {} ... return ret,None,0 ... >>> pprint.pprint(parse_message_to_tree(s)) line a => 0 line b => 0 line ba => 2 line bb => 2 line bba => 4 line bc => 2 line c => 0 line ca => 2 line caa => 4 {'line a': {}, 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}}, 'line c': {'line ca': {'line caa': {}}}} >>> s = """line a ... line b ... line ba ... line bb ... line bba ... line bc ... line c ... line ca ... line caa ... line d""" >>> pprint.pprint(parse_message_to_tree(s)) line a => 0 line b => 0 line ba => 2 line bb => 2 line bba => 4 line bc => 2 line c => 0 line ca => 2 line caa => 4 line d => 0 {'line a': {}, 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}}, 'line c': {'line ca': {'line caa': {}}}, 'line d': {}} 

You will need to check the code for any errors or some missing cases.

+3
source

lstrip() not a mutator, see documentation :

string.lstrip (s [, chars])

Returns a copy of a string with leading characters removed. If characters are omitted or None, whitespace is removed. If given and not None, the characters must be a string; characters in the line will be deleted from the beginning of the line, this method.

And your code seems to be working with this sample text on my machine.

+1
source

Another answer using stack instead of recursion. It took several iterations for this version, and it seems to handle several possible input scenarios, but cannot guarantee a complete absence of errors! This is a really difficult problem. I hope my comments illustrate the correct line of thought. Thanks for sharing the issue.

 text = '''line a line b line ba line bb line bba line bc line c line ca line caa line d''' root_tree = {} stack = [] prev_indent, prev_tree = -1, root_tree for line in text.splitlines(): # compute current line indent and strip the line origlen = len(line) line = line.lstrip() indent = origlen - len(line) print indent, line # no matter what, every line has its own tree, so let create it. tree = {} # where to attach this new tree is dependent on indent, prev_indent # assume: stack[-1] was the right attach point for the previous line # then: let adjust the stack to make that true for the current line if indent < prev_indent: while stack[-1][0] >= indent: stack.pop() elif indent > prev_indent: stack.append((prev_indent, prev_tree)) # at this point: stack[-1] is the right attach point for the current line parent_indent, parent_tree = stack[-1] assert parent_indent < indent # attach the current tree parent_tree[line] = tree # update state prev_indent, prev_tree = indent, tree print len(stack) print stack print root_tree 
+1
source

All Articles