Extract string inside nested brackets

I need to extract strings from nested brackets, like so:

[ this is [ hello [ who ] [what ] from the other side ] slim shady ] 

Result (order does not matter):

    This is slim shady
    Hello from the other side
    Who
    What

Note that the string can have N brackets, and they will always be valid, but they may or may not be nested. Also, the string does not have to begin with a bracket.

The solutions I found online for similar problems suggest a regex, but I'm not sure it will work in this case.

I was thinking of implementing this, similar to how we check if a string has all valid parentheses:

Walk over the string. If we see a [ we push its index onto the stack; if we see a ] we pop the most recent index and take the substring from there to the current position.
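A minimal sketch of this index-stack idea, assuming Python; the function name `bracket_spans` is made up for illustration. It also shows the catch: the slice for an outer pair still contains the inner brackets and their contents.

```python
def bracket_spans(text):
    """Return the raw substring between each matched pair of brackets."""
    stack = []   # indices of '[' characters that are still open
    spans = []
    for i, ch in enumerate(text):
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            start = stack.pop()
            # The slice drops the enclosing brackets but keeps nested ones.
            spans.append(text[start + 1:i])
    return spans

print(bracket_spans('[a [b] c]'))  # ['b', 'a [b] c']
```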

However, we need to remove this substring from the original string so that it does not show up as part of any other output. So, instead of just pushing the index onto the stack, I was thinking of building a LinkedList as we go, and when we find a [ we insert that Node into the LinkedList. This would let us easily delete the substring from the LinkedList.

Would this be a good approach or is there a cleaner, more well-known solution?

EDIT:

 '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]' 

Should return (order does not matter):

    this is slim shady
    hello from the other
    who
    what
    side
    oh my
    g
    a
    w
    d

Whitespace does not matter; it is trivial to strip afterwards. What matters is being able to tell the contents of different brackets apart, either by putting them on separate lines or by returning a list of strings.
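One way to pin down "order and whitespace do not matter" is a small comparison helper. This is only a sketch: the name `normalize` and the sample `candidate` list are made up for illustration, and it assumes duplicates should be counted.

```python
from collections import Counter

def normalize(pieces):
    """Collapse whitespace in each piece, drop empty pieces, ignore order."""
    cleaned = (' '.join(p.split()) for p in pieces)
    return Counter(c for c in cleaned if c)

expected = ['this is slim shady', 'hello from the other', 'who', 'what',
            'side', 'oh my', 'g', 'a', 'w', 'd']
# A hypothetical result in a different order, with stray whitespace.
candidate = [' who ', 'what ', 'side', ' hello   from the other  ',
             ' this is  slim shady ', 'd', 'w', 'a', 'g', 'oh my ']

print(normalize(candidate) == normalize(expected))  # True
```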

4 answers

This code scans the text character by character, pushes an empty list onto the stack for every opening [ and pops the last pushed list off the stack for every closing ].

    text = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'

    def parse(text):
        stack = []
        for char in text:
            if char == '[':
                # stack push
                stack.append([])
            elif char == ']':
                yield ''.join(stack.pop())
            else:
                # stack peek
                stack[-1].append(char)

    print(tuple(parse(text)))

Output for the first input, then for the input from the EDIT:

    (' who ', 'what ', ' hello   from the other side ', ' this is  slim shady ')
    (' who ', 'what ', 'side', ' hello   from the other  ', ' this is  slim shady ', 'd', 'w', 'a', 'g', 'oh my ')
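If the input may contain text outside any bracket, `stack[-1]` in the code above raises an IndexError because the stack is empty at that point. A possible variation, sketched here under the assumption that such text should simply be ignored (the name `parse_guarded` is made up for illustration):

```python
def parse_guarded(text):
    """Yield the text directly inside each bracket pair."""
    stack = []
    for char in text:
        if char == '[':
            stack.append([])
        elif char == ']':
            yield ''.join(stack.pop())
        elif stack:
            stack[-1].append(char)
        # Otherwise the character lies outside every bracket and is skipped.

# Collapse runs of whitespace in each piece after parsing.
result = [' '.join(s.split()) for s in parse_guarded('intro [ a [ b ] c ] outro')]
print(result)  # ['b', 'a c']
```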

This can be comfortably solved using a regular expression:

    import re

    s = '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'

    result = []
    pattern = r'\[([^[\]]*)\]'  # regex pattern to find non-nested square brackets
    while '[' in s:  # while brackets remain
        result.extend(re.findall(pattern, s))  # find them all and add them to the list
        s = re.sub(pattern, '', s)  # then remove them

    # strip whitespace and drop empty strings (list() is needed on Python 3, where filter is lazy)
    result = list(filter(None, (t.strip() for t in result)))

    # result: ['who', 'what', 'side', 'd', 'hello   from the other', 'w', 'this is  slim shady', 'a', 'g', 'oh my']

You can represent your matches using a tree structure.

    class BracketMatch:
        def __init__(self, refstr, parent=None, start=-1, end=-1):
            self.parent = parent
            self.start = start
            self.end = end
            self.refstr = refstr
            self.nested_matches = []

        def __str__(self):
            cur_index = self.start + 1
            result = ""
            if self.start == -1 or self.end == -1:
                return ""
            for child_match in self.nested_matches:
                if child_match.start != -1 and child_match.end != -1:
                    result += self.refstr[cur_index:child_match.start]
                    cur_index = child_match.end + 1
                else:
                    continue
            result += self.refstr[cur_index:self.end]
            return result

    # Main script
    haystack = '''[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'''
    root = BracketMatch(haystack)
    cur_match = root
    for i in range(len(haystack)):
        if '[' == haystack[i]:
            new_match = BracketMatch(haystack, cur_match, i)
            cur_match.nested_matches.append(new_match)
            cur_match = new_match
        elif ']' == haystack[i]:
            cur_match.end = i
            cur_match = cur_match.parent
        else:
            continue

    # Here we built the set of matches, now we must print them
    nodes_list = root.nested_matches
    # So we conduct a BFS to visit and print each match...
    while nodes_list != []:
        node = nodes_list.pop(0)
        nodes_list.extend(node.nested_matches)
        print("Match: " + str(node).strip())

The output of this program will be:

    Match: this is  slim shady
    Match: hello   from the other side
    Match: who
    Match: what

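    a = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'

    lvl = -1
    words = []
    for i in a:
        if i == '[':
            lvl += 1
            # one string per nesting depth; only add a slot the first time a depth is reached
            if lvl == len(words):
                words.append('')
        elif i == ']':
            lvl -= 1
        else:
            words[lvl] += i

    # note: sibling brackets at the same depth end up merged into one string
    for word in words:
        print(' '.join(word.split()))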

This gives the output:

    this is slim shady
    hello from the other side
    who what
