Delete rows of non json object from python dataframe column

Question


I have a DataFrame with a column that contains both JSON objects and strings. I want to get rid of the rows that contain strings instead of JSON objects.

Below is what my DataFrame looks like:

import pandas as pd

df = pd.DataFrame({'A': ["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"a":9,"b":10,"c":11}]})

print(df)

How can I delete the rows that contain only strings, so that afterwards I can apply the code below to this column and convert the JSON objects into separate DataFrame columns?

# pandas >= 1.0; older versions: from pandas.io.json import json_normalize
from pandas import json_normalize
df = json_normalize(df['A'])
print(df)


json python object pandas dataframe

Nikita Gupta Oct 20 '17 at 19:44


3 answers

import numpy as np
df[df.applymap(np.isreal).sum(1).gt(0)]
Out[794]: 
                            A
2    {'a': 5, 'b': 6, 'c': 8}
5  {'d': 9, 'e': 10, 'f': 11}


Wen Oct 20 '17 at 19:59
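The np.isreal one-liner above selects the dict rows only indirectly, through how np.isreal treats non-numeric objects. A more explicit sketch of the same idea for a DataFrame with any number of columns, assuming the goal is to keep rows in which at least one cell is a dict (this is not from the original answer; the names is_dict and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ["hello", "world", {"a": 5, "b": 6, "c": 8}, "usa", "india",
                         {"a": 9, "b": 10, "c": 11}]})

# Boolean frame: True where the cell holds a dict.
# Series.map is applied per column because DataFrame.applymap
# was renamed to DataFrame.map in pandas 2.1.
is_dict = df.apply(lambda col: col.map(lambda v: isinstance(v, dict)))

# Keep rows where at least one column contains a dict
result = df[is_dict.any(axis=1)]
print(result)
```

Because the test is isinstance(v, dict), numbers and other non-string values are dropped as well, which is what json_normalize needs later.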

If you need an ugly solution that also works: the function I created finds the rows containing only strings and returns df minus those rows. (Since your df has only one column, you will just get a DataFrame with one column holding all the dicts.) From there, use df = json_normalize(df['A'].values) instead of df = json_normalize(df['A']).

For a single DataFrame column:

import pandas as pd
import numpy as np
# pandas >= 1.0; older versions: from pandas.io.json import json_normalize
from pandas import json_normalize

def delete_strings(df):
    nrows = df.shape[0]
    rows_to_keep = []
    for row in np.arange(nrows):
        if isinstance(df.iloc[row, 0], dict):
            rows_to_keep.append(row) #add the row number to list of rows
                                     #to keep if the row contains a dict
    return df.iloc[rows_to_keep] #return a DataFrame with only the rows holding dicts

df = pd.DataFrame({'A': ["hello","world",{"a":5,"b":6,"c":8},"usa","india",
                         {"a":9,"b":10,"c":11}]})
df = delete_strings(df)
df = json_normalize(df['A'].values)
print(df)
#   a   b   c
#0  5   6   8
#1  9  10  11

For a multi-column df (this also works with a single-column df):

def delete_rows_of_strings(df):
    n_rows, n_cols = df.shape  # number of rows and columns in df
    rows_to_keep = []  # list to track row positions to keep
    for row in range(n_rows):  # for every row in the DataFrame
        # num_string counts the number of strings in the row
        num_string = 0
        for col in range(n_cols):  # for each column in the row...
            # if the value is a string, add one to num_string
            if isinstance(df.iloc[row, col], str):
                num_string += 1
        # if num_string, the number of strings in the row,
        # isn't equal to the number of columns...
        if num_string != n_cols:  # ...add that row position to the rows to keep
            rows_to_keep.append(row)
    # return the df with rows containing at least one non-string
    return df.iloc[rows_to_keep, :]


df = pd.DataFrame({'A': ["hello","world",{"a":5,"b":6,"c":8},"usa","india"],
                   'B': ['hi',{"a":5,"b":6,"c":8},'sup','america','china']})
#                          A                         B
#0                     hello                        hi
#1                     world  {'a': 5, 'b': 6, 'c': 8}
#2  {'a': 5, 'b': 6, 'c': 8}                       sup
#3                       usa                   america
#4                     india                     china
print(delete_rows_of_strings(df))
#                          A                         B
#1                     world  {'a': 5, 'b': 6, 'c': 8}
#2  {'a': 5, 'b': 6, 'c': 8}                       sup
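The nested loops in delete_rows_of_strings can also be expressed as a boolean mask. The following is a sketch, not the answerer's code, and the function name is hypothetical; it keeps the same rule of dropping a row only when every cell in it is a string:

```python
import pandas as pd

def delete_rows_of_strings_vectorized(df):
    """Hypothetical mask-based equivalent of delete_rows_of_strings."""
    # True where a cell holds a str, computed column by column
    is_str = df.apply(lambda col: col.map(lambda v: isinstance(v, str)))
    # Keep rows where at least one cell is not a string
    return df[~is_str.all(axis=1)]

df = pd.DataFrame({'A': ["hello", "world", {"a": 5, "b": 6, "c": 8}, "usa", "india"],
                   'B': ['hi', {"a": 5, "b": 6, "c": 8}, 'sup', 'america', 'china']})
print(delete_rows_of_strings_vectorized(df))
```

Unlike the loop version, the mask keeps the original row labels directly, so no list of positions has to be built.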


David Rosenman Oct 20 '17 at 20:19


Andy Hayden · Accepted Answer · 2017-10-20T20:13:34+0000

I would prefer to use an isinstance check:

In [11]: df.loc[df.A.apply(lambda d: isinstance(d, dict))]
Out[11]:
                            A
2    {'a': 5, 'b': 6, 'c': 8}
5  {'d': 9, 'e': 10, 'f': 11}

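If you want to include numbers too, you can do the following (numbers.Number also matches plain Python int and float values, which np.number does not):

In [12]: import numbers

In [13]: df.loc[df.A.apply(lambda d: isinstance(d, (dict, numbers.Number)))]
Out[13]: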
                            A
2    {'a': 5, 'b': 6, 'c': 8}
5  {'d': 9, 'e': 10, 'f': 11}
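Which numeric check to use matters in an object column: np.number matches only NumPy scalar types, while numbers.Number from the standard library also matches built-in int and float. A quick comparison on a hypothetical mixed Series (the data and the names np_mask and num_mask are illustrative):

```python
import numbers
import numpy as np
import pandas as pd

# Hypothetical object column mixing strings, dicts, Python numbers and a NumPy scalar
s = pd.Series(["hello", {"a": 5}, 7, 2.5, np.int64(3)], dtype=object)

# np.number only matches NumPy scalar types such as np.int64
np_mask = s.apply(lambda v: isinstance(v, (dict, np.number)))
# numbers.Number also matches built-in int and float
num_mask = s.apply(lambda v: isinstance(v, (dict, numbers.Number)))

print(s[np_mask].tolist())
print(s[num_mask].tolist())
```

The plain Python 7 and 2.5 survive only the numbers.Number filter.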

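To turn the dicts into separate columns with json_normalize, pass it a list rather than the Series itself (passing the Series can raise a KeyError):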

In [21]: df1 = df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]

In [22]: json_normalize(list(df1["A"]))
Out[22]:
     a    b    c    d     e     f
0  5.0  6.0  8.0  NaN   NaN   NaN
1  NaN  NaN  NaN  9.0  10.0  11.0
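A sketch combining both steps on the question's data, assuming pandas >= 1.0 (where json_normalize is available as pd.json_normalize); the names dicts and flat are illustrative. Since json_normalize numbers its output rows from 0, the sketch also copies the original row labels back onto the result:

```python
import pandas as pd

df = pd.DataFrame({'A': ["hello", "world", {"a": 5, "b": 6, "c": 8}, "usa", "india",
                         {"a": 9, "b": 10, "c": 11}]})

# Step 1: keep only the cells that hold dicts
dicts = df.loc[df['A'].apply(lambda d: isinstance(d, dict)), 'A']

# Step 2: flatten the dicts into columns, passing a plain list
flat = pd.json_normalize(dicts.tolist())

# Reattach the original row labels (2 and 5 here)
flat.index = dicts.index
print(flat)
```

Keeping the original labels makes it easy to join the new columns back onto df later, for example with df.join(flat).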
