Split pandas data column based on number of digits

Question


I have a pandas DataFrame with two columns, key and value. The value is always an 8-digit number, and the data looks like this:

 >df1
 key  value
 10   10000100
 20   10000000
 30   10100000
 40   11110000

Now I need to take the value column and split it into its individual digits, so that the result is a new DataFrame:

 >df_res
 key  0  1  2  3  4  5  6  7
 10   1  0  0  0  0  1  0  0
 20   1  0  0  0  0  0  0  0
 30   1  0  1  0  0  0  0  0
 40   1  1  1  1  0  0  0  0

I can't change the input format. The most obvious approach I could think of is to convert each value to a string, loop over its characters, and put them in a list, but I am looking for something more elegant and faster. Any help is appreciated.

EDIT: the input is not a string; it is an integer.

+5

python pandas dataframe data-manipulation

john smith Jul 13 '16 at 16:30


4 answers

This should work:

 df.value.astype(str).apply(list).apply(pd.Series).astype(int)
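As a self-contained variation on the same idea, the sketch below rebuilds the question's sample frame and splits each value into digits with a plain list comprehension rather than apply(pd.Series). This is a swapped-in technique, not the answer's exact code, and it assumes every value has the same number of digits (8 in the question). The names df and digits are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "key": [10, 20, 30, 40],
    "value": [10000100, 10000000, 10100000, 11110000],
})

# Turn each integer into its decimal string, then each character into an int.
# Assumption: all values have the same number of digits, so rows align.
rows = [[int(ch) for ch in str(v)] for v in df["value"]]

# One column per digit position, numbered 0..7 from the left.
digits = pd.DataFrame(rows, index=df.index)
print(digits)
```

Building the frame once from a list of rows avoids constructing one Series per row, which is the likely reason the apply(pd.Series) form is slow on larger inputs.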

+9

piRSquared Jul 13 '16 at 16:46


Assuming your input is stored as strings, and all have the same length (8, as indicated), then the following works:

 # add eight empty columns named 0..7, then fill them with the digits of 'value'
 df1 = pd.concat([df1, pd.DataFrame(columns=range(8))])
 df1[list(range(8))] = df1['value'].apply(lambda x: pd.Series(list(str(x)), index=range(8)))

+3

Drtrd Jul 13 '16 at 16:45


An updated version:

 df['value'].astype(str).str.join(' ').str.split(' ', expand=True)

This first inserts a space between the characters and then splits on that space. It is just a workaround so that str.split can be used (it may not be necessary; I'm not sure), but it is fairly fast:

 df = pd.DataFrame({'value': np.random.randint(10**7, 10**8, 10**4)})

 %timeit df['value'].astype(str).str.join(' ').str.split(' ', expand=True)
 10 loops, best of 3: 25.5 ms per loop

 %timeit df.value.astype(str).apply(list).apply(pd.Series).astype(int)
 1 loop, best of 3: 1.27 s per loop

 %timeit df['value'].apply(lambda x: pd.Series(list(str(x)),index=range(8)))
 1 loop, best of 3: 1.33 s per loop

 %%timeit
 arr = df.value.values.astype('S8')
 pd.DataFrame(np.fromstring(arr, dtype=np.uint8).reshape(-1,8)-48)
 1000 loops, best of 3: 1.14 ms per loop
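To make the join/split workaround concrete, here is a small sketch on the question's sample data that shows the intermediate string and then converts the result to integers. The split itself yields single-character strings, so the astype(int) step and the re-attached key column are additions for illustration and not part of the one-liner above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "key": [10, 20, 30, 40],
    "value": [10000100, 10000000, 10100000, 11110000],
})

# Step 1: put a space between the characters, e.g. '10000100' -> '1 0 0 0 0 1 0 0'.
spaced = df["value"].astype(str).str.join(" ")

# Step 2: split on the space; expand=True spreads the pieces over columns 0..7.
parts = spaced.str.split(" ", expand=True)

# Step 3 (added here): the pieces are strings, so cast them to integers
# and put the key back as the first column to match the requested layout.
digits = parts.astype(int)
digits.insert(0, "key", df["key"])
print(digits)
```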

Update: Divakar's solution seems to be the fastest.

+2

ayhan Jul 13 '16 at 16:53


Divakar · Accepted Answer · 2016-07-13T16:53:21+0000

One approach could be -

 arr = df.value.values.astype('S8')
 df = pd.DataFrame(np.fromstring(arr, dtype=np.uint8).reshape(-1,8)-48)

Run Example -

 In [58]: df
 Out[58]:
    key     value
 0   10  10000100
 1   20  10000000
 2   30  10100000
 3   40  11110000

 In [59]: arr = df.value.values.astype('S8')

 In [60]: pd.DataFrame(np.fromstring(arr, dtype=np.uint8).reshape(-1,8)-48)
 Out[60]:
    0  1  2  3  4  5  6  7
 0  1  0  0  0  0  1  0  0
 1  1  0  0  0  0  0  0  0
 2  1  0  1  0  0  0  0  0
 3  1  1  1  1  0  0  0  0
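Newer NumPy releases deprecate the binary mode of np.fromstring in favor of np.frombuffer, so the run above may warn or fail depending on the installed version. The sketch below is one possible adaptation of the same byte-level trick, not the answer's original code. It assumes every value has exactly 8 digits, and the key column is re-attached only to match the layout requested in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "key": [10, 20, 30, 40],
    "value": [10000100, 10000000, 10100000, 11110000],
})

# Render each integer as an 8-byte ASCII string (assumes exactly 8 digits).
arr = df["value"].to_numpy().astype("S8")

# Reinterpret the raw bytes as uint8 ASCII codes, one row per original value.
codes = np.frombuffer(arr.tobytes(), dtype=np.uint8).reshape(-1, 8)

# ASCII '0' is 48, so subtracting 48 maps each code to its digit.
digits = pd.DataFrame(codes - 48)

# Added for illustration: put the key back as the first column.
digits.insert(0, "key", df["key"].to_numpy())
print(digits)
```

Because the conversion works on the whole buffer at once instead of row by row, it keeps the property that made this answer the fastest in the timings above.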
