As previously noted, str.lstrip() does not modify the string in place; it returns a new string. Once that is fixed, the index also comes out correct on my system.
But the real problem is that by the time you notice that the index (the indent) of a line has increased, the variable line already points to the line with the larger indent. In your first example, the indent increases at line ba, so line refers to line ba, and then in your if statement you do -
ret[line.strip()] = parse_message_to_tree_helper(buf, index)
This is wrong, because it stores whatever parse_message_to_tree_helper() returns under line ba rather than under its actual parent, line b.
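To see which variable holds the parent at the moment the indent increases, here is a minimal sketch on a made-up two-line input. The names prev_line, prev_indent and observed are only illustrative and are not taken from your code:

```python
# Hypothetical two-line input, just to show which variable holds what.
lines = ["line b", "  line ba"]

prev_line, prev_indent = None, 0
observed = []  # (parent, child) pairs seen when the indent increases
for line in lines:
    indent = len(line) - len(line.lstrip())
    if indent > prev_indent:
        # The increase is only visible once the child has been read:
        # `line` is the child, `prev_line` is the parent that should own it.
        observed.append((prev_line.strip(), line.strip()))
    prev_line, prev_indent = line, indent

print(observed)  # [('line b', 'line ba')]
```

So the key for the nested dictionary has to come from the previous line, not from line.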
In addition, once you recurse into the function, you do not come back out until the whole file has been read. But the level at which a line is stored in the dictionary depends on returning from the recursion as soon as the indent decreases.
I am not sure whether any built-in library can do this for you, but here is the code I could come up with (based on your code) -
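from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"

def parse_message_to_tree(message):
    buf = StringIO(message)
    return parse_message_to_tree_helper(buf, 0, None)[0]

def parse_message_to_tree_helper(buf, prev, prevline):
    # prev is the indent of the level being built; prevline is the most
    # recent line at that level (None at the very start).
    # Returns (subtree, next unprocessed line or None, indent of that line).
    ret = {}
    index = -1
    for line in buf:
        line = line.rstrip()
        index = len(line) - len(line.lstrip())
        print (line + " => " + str(index))
        if index > prev:
            # Deeper indent: the nested lines belong to prevline, the parent.
            ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
            if index < prev:
                # The nested call stopped on a line shallower than this level.
                return ret,prevline,index
            continue
        elif not prevline:
            ret[line.strip()] = {}
        else:
            ret[prevline.strip()] = {}
        if index < prev:
            # Indent decreased: hand the current line back to the caller.
            return ret,line,index
        prevline = line
    if index == -1:
        # The buffer was already exhausted, so prevline is a leaf.
        ret[prevline.strip()] = {}
        return ret,None,index
    if prev == index:
        ret[prevline.strip()] = {}
    return ret,None,0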
Example / Demo -
>>> print(s)
line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
>>> def parse_message_to_tree(message):
...     buf = StringIO(message)
...     return parse_message_to_tree_helper(buf, 0, None)[0]
...
>>> def parse_message_to_tree_helper(buf, prev, prevline):
...     ret = {}
...     index = -1
...     for line in buf:
...         line = line.rstrip()
...         index = len(line) - len(line.lstrip())
...         print (line + " => " + str(index))
...         if index > prev:
...             ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
...             if index < prev:
...                 return ret,prevline,index
...             continue
...         elif not prevline:
...             ret[line.strip()] = {}
...         else:
...             ret[prevline.strip()] = {}
...         if index < prev:
...             return ret,line,index
...         prevline = line
...     if index == -1:
...         ret[prevline.strip()] = {}
...         return ret,None,index
...     if prev == index:
...         ret[prevline.strip()] = {}
...     return ret,None,0
...
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}}}
>>> s = """line a
... line b
...   line ba
...   line bb
...     line bba
...   line bc
... line c
...   line ca
...     line caa
... line d"""
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
line d => 0
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}},
 'line d': {}}
You should still check the code for bugs and missing cases; for example, as written, an empty message raises an AttributeError.
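For comparison, the two issues described above (storing the subtree under the parent, and unwinding as soon as the indent decreases) can also be handled without recursion by keeping an explicit stack of open levels. This is only a sketch of that alternative technique, not a drop-in replacement for your code: the name parse_indented_tree is made up, and it assumes that indentation is measured in leading whitespace characters and that blank lines can be skipped.

```python
def parse_indented_tree(message):
    """Parse indented lines into nested dicts (hypothetical helper name)."""
    root = {}
    # Each entry is (indent, children dict). The -1 sentinel keeps the root
    # on the stack, since no real line has an indent below 0.
    stack = [(-1, root)]
    for raw in message.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no structure
        indent = len(raw) - len(raw.lstrip())
        # Unwind until the top of the stack is strictly shallower than this
        # line; that entry is the parent.
        while stack[-1][0] >= indent:
            stack.pop()
        children = {}
        stack[-1][1][raw.strip()] = children
        stack.append((indent, children))
    return root


sample = "line a\nline b\n  line ba\n  line bb\n    line bba\n  line bc\nline c"
tree = parse_indented_tree(sample)
print(tree["line b"])  # {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}}
```

Because the parent is always the entry on top of the stack after unwinding, a line is never stored under its own child, and a drop in indent of several levels is handled by the same loop.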