I have three questions that I just can't figure out; I hope someone can help me.
import pandas as pd
data = {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']}
index = pd.Index(['d1','d2','d3'])
data = pd.DataFrame(data,index=index)
pattern = 'ONE|TWO'  # <---- QUESTION1
data['Col1'].str.findall(pattern)  # <---- QUESTION2
Question1: How can I change this regular expression so that 'ONE' is found only once in d1? As of now, every instance of ONE is returned, as shown below.
d1    [ONE, ONE]
d2    [ONE, TWO]
d3    [TWO]
I want this instead:
d1    [ONE]
d2    [ONE, TWO]
d3    [TWO]
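One possible way to get there, as a sketch rather than a definitive answer: `str.findall` returns every non-overlapping match, so the simplest route may be to drop the duplicates per row after matching. The names `matches`, `deduped`, `pattern_last` and `deduped_regex` are illustrative and not part of the original code. The second variant stays within the regex by using a backreference with a negative lookahead; note that it keeps the last occurrence of each word, so the order follows last occurrences rather than first ones.

```python
import pandas as pd

data = pd.DataFrame(
    {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']},
    index=pd.Index(['d1', 'd2', 'd3']),
)
pattern = 'ONE|TWO'

# Variant 1: keep the original pattern, then drop duplicates in each row.
# dict.fromkeys preserves first-seen order, unlike set().
matches = data['Col1'].str.findall(pattern)
deduped = matches.apply(lambda found: list(dict.fromkeys(found)))

# Variant 2 (assumed alternative): match a word only if the same word
# does not appear again later in the string.
pattern_last = r'(ONE|TWO)(?!.*\1)'
deduped_regex = data['Col1'].str.findall(pattern_last)

print(deduped)
print(deduped_regex)
```

For this sample data both variants give the same lists; they can differ in ordering when a repeated word is followed by a different matching word.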
Question2:
I want to take the lists for d1, d2 and d3 and merge them into a single list containing only the unique values. Something like this:
set(d1 + d2 + d3) ---> ['ONE', 'TWO']
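A minimal sketch of that idea, assuming the per-row lists come straight from `str.findall`: the lists can be chained together and passed to `set`. The name `unique_values` is illustrative, and `sorted` is used only to get a stable order, since a set is unordered.

```python
from itertools import chain

import pandas as pd

data = pd.DataFrame(
    {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']},
    index=pd.Index(['d1', 'd2', 'd3']),
)
pattern = 'ONE|TWO'

# Each element of `matches` is a list of the strings found in that row.
matches = data['Col1'].str.findall(pattern)

# Flatten all row lists into one stream, deduplicate, then sort.
unique_values = sorted(set(chain.from_iterable(matches)))
print(unique_values)
```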
Question3:
If I did something like this:
data['Col2'] = data['Col1'].str.findall(pattern)
How can I iterate over each row in Col2 to get the same results as in Question2?
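As a hedged sketch of one way to do this: iterating over a Series yields its values, so each item of `data['Col2']` is one row's list and can be fed into a running set. The names `unique_values` and `exploded_unique` are illustrative. The second approach assumes a pandas version that provides `Series.explode` (0.25 or later) and avoids the explicit loop.

```python
import pandas as pd

data = pd.DataFrame(
    {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']},
    index=pd.Index(['d1', 'd2', 'd3']),
)
pattern = 'ONE|TWO'
data['Col2'] = data['Col1'].str.findall(pattern)

# Explicit loop: each `found` is the list stored in one row of Col2.
unique_values = set()
for found in data['Col2']:
    unique_values.update(found)

# Loop-free alternative: one row per list element, then unique values.
# dropna() guards against rows whose list is empty.
exploded_unique = data['Col2'].explode().dropna().unique().tolist()

print(sorted(unique_values))
print(exploded_unique)
```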