Convert string to dict, then access key:values? How to access data in a <class 'dict'> in Python?

I'm having problems accessing data inside a dictionary.

Sys: Macbook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.

I am working with a dask.dataframe created from a CSV file.

Edited question

How I got to this point

Suppose I start with the Pandas series:

    df.Coordinates
    130     {u'type': u'Point', u'coordinates': [-43.30175...
    278     {u'type': u'Point', u'coordinates': [-51.17913...
    425     {u'type': u'Point', u'coordinates': [-43.17986...
    440     {u'type': u'Point', u'coordinates': [-51.16376...
    877     {u'type': u'Point', u'coordinates': [-43.17986...
    1313    {u'type': u'Point', u'coordinates': [-49.72688...
    1734    {u'type': u'Point', u'coordinates': [-43.57405...
    1817    {u'type': u'Point', u'coordinates': [-43.77649...
    1835    {u'type': u'Point', u'coordinates': [-43.17132...
    2739    {u'type': u'Point', u'coordinates': [-43.19583...
    2915    {u'type': u'Point', u'coordinates': [-43.17986...
    3035    {u'type': u'Point', u'coordinates': [-51.01583...
    3097    {u'type': u'Point', u'coordinates': [-43.17891...
    3974    {u'type': u'Point', u'coordinates': [-8.633880...
    3983    {u'type': u'Point', u'coordinates': [-46.64960...
    4424    {u'type': u'Point', u'coordinates': [-43.17986...

The problem is that this is not a true Series of dictionaries. Instead, it is a column full of strings that look like dictionaries. Running this shows it:

    df.Coordinates.apply(type)
    130     <class 'str'>
    278     <class 'str'>
    425     <class 'str'>
    440     <class 'str'>
    877     <class 'str'>
    1313    <class 'str'>
    1734    <class 'str'>
    1817    <class 'str'>
    1835    <class 'str'>
    2739    <class 'str'>
    2915    <class 'str'>
    3035    <class 'str'>
    3097    <class 'str'>
    3974    <class 'str'>
    3983    <class 'str'>
    4424    <class 'str'>

My goal: access the coordinates key and value in the dictionary. That's it. But each value is a str.

I converted the strings to dictionaries using eval.

    new = df.Coordinates.apply(eval)
    130     {'coordinates': [-43.301755, -22.990065], 'typ...
    278     {'coordinates': [-51.17913026, -30.01201896], ...
    425     {'coordinates': [-43.17986794, -22.91000096], ...
    440     {'coordinates': [-51.16376782, -29.95488677], ...
    877     {'coordinates': [-43.17986794, -22.91000096], ...
    1313    {'coordinates': [-49.72688407, -29.33757253], ...
    1734    {'coordinates': [-43.574057, -22.928059], 'typ...
    1817    {'coordinates': [-43.77649254, -22.86940539], ...
    1835    {'coordinates': [-43.17132318, -22.90895217], ...
    2739    {'coordinates': [-43.1958313, -22.98755333], '...
    2915    {'coordinates': [-43.17986794, -22.91000096], ...
    3035    {'coordinates': [-51.01583481, -29.63593292], ...
    3097    {'coordinates': [-43.17891379, -22.96476163], ...
    3974    {'coordinates': [-8.63388008, 41.14594453], 't...
    3983    {'coordinates': [-46.64960938, -23.55902666], ...
    4424    {'coordinates': [-43.17986794, -22.91000096], ...

Then I test the type of each object and get:

    130     <class 'dict'>
    278     <class 'dict'>
    425     <class 'dict'>
    440     <class 'dict'>
    877     <class 'dict'>
    1313    <class 'dict'>
    1734    <class 'dict'>
    1817    <class 'dict'>
    1835    <class 'dict'>
    2739    <class 'dict'>
    2915    <class 'dict'>
    3035    <class 'dict'>
    3097    <class 'dict'>
    3974    <class 'dict'>
    3983    <class 'dict'>
    4424    <class 'dict'>

If I try to access my dictionaries with new.apply(lambda x: x['coordinates']), I get:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-71-c0ad459ed1cc> in <module>()
    ----> 1 dfCombined.Coordinates.apply(coord_getter)

    /Users/linwood/anaconda/envs/dataAnalysisWithPython/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
       2218         else:
       2219             values = self.asobject
    -> 2220             mapped = lib.map_infer(values, f, convert=convert_dtype)
       2221
       2222         if len(mapped) and isinstance(mapped[0], Series):

    pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62658)()

    <ipython-input-68-748ce2d8529e> in coord_getter(row)
          1 import ast
          2 def coord_getter(row):
    ----> 3     return (ast.literal_eval(row))['coordinates']

    TypeError: 'bool' object is not subscriptable

This is some kind of class, because when I run dir, I get this for a single object:

    new.apply(lambda x: dir(x))[130]
    130    __class__
    130    __contains__
    130    __delattr__
    130    __delitem__
    130    __dir__
    130    __doc__
    130    __eq__
    130    __format__
    130    __ge__
    130    __getattribute__
    130    __getitem__
    130    __gt__
    130    __hash__
    130    __init__
    130    __iter__
    130    __le__
    130    __len__
    130    __lt__
    130    __ne__
    130    __new__
    130    __reduce__
    130    __reduce_ex__
    130    __repr__
    130    __setattr__
    130    __setitem__
    130    __sizeof__
    130    __str__
    130    __subclasshook__
    130    clear
    130    copy
    130    fromkeys
    130    get
    130    items
    130    keys
    130    pop
    130    popitem
    130    setdefault
    130    update
    130    values
    Name: Coordinates, dtype: object

My problem: I just want to access the dictionary. But the object is <class 'dict'>. How can I convert this to a regular dict, or just access the key:value pairs?

Any ideas?

python dictionary pandas data-manipulation dask
3 answers

My first instinct is to use json.loads to convert the strings into dicts. But the example you posted doesn't follow the JSON standard, as it uses single quotes instead of double quotes (and u prefixes on the strings). So you need to convert the strings first.

The second option is to simply use a regex to parse the strings. If the dict strings in your actual DataFrame do not exactly match my examples, I expect the regex method to be more robust, since lat/long coordinates are pretty standard.

    import json
    import re
    import pandas as pd

    df = pd.DataFrame(data={'Coordinates': ["{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
                                            "{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}"],
                            'idx': [130, 278]})

    ##
    # Solution 1 - use json.loads
    ##
    def string_to_dict(dict_string):
        # Convert to proper JSON format: double quotes, no u prefix
        dict_string = dict_string.replace("'", '"').replace('u"', '"')
        return json.loads(dict_string)

    # Assign with brackets so a real column is created, not just an attribute
    df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
    df['CoordDicts'][0]['coordinates']
    #>>> [-43.30175, 123.45]

    ##
    # Solution 2 - use regex
    ##
    def get_lat_lon(dict_string):
        # Get the coordinates string with a regex (raw string avoids escape warnings)
        rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
        # Cast to floats
        coords = [float(x) for x in rs.split(',')]
        return coords

    df['Coords'] = df.Coordinates.apply(get_lat_lon)
    df['Coords'][0]
    #>>> [-43.30175, 123.45]
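One caveat with the quote-replacement step in Solution 1: it can break if a value itself contains an apostrophe. The sketch below is only an illustration under assumptions: it assumes Python 3 (where the u prefix is still valid string syntax), and the 'name' field is invented for the demonstration and is not in the original data. It compares the json.loads route with ast.literal_eval on such a string:

```python
import ast
import json

# Hypothetical row: the 'name' field is made up to show an embedded apostrophe.
row = """{u'type': u'Point', u'name': u"O'Hare", u'coordinates': [-43.30175, 123.45]}"""

def via_json(dict_string):
    # Same quote replacement as in Solution 1
    return json.loads(dict_string.replace("'", '"').replace('u"', '"'))

def via_literal_eval(dict_string):
    # Parses Python literals only; unlike eval, it does not run arbitrary code
    return ast.literal_eval(dict_string)

try:
    via_json(row)
    json_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    json_ok = False

parsed = via_literal_eval(row)
print(json_ok, parsed['coordinates'])
```

If the real data never contains apostrophes inside values, the json.loads route should behave the same as literal_eval for this purpose.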

Looks like you ended up with something like this:

    s = pd.Series([
        dict(type='Point', coordinates=[1, 1]),
        dict(type='Point', coordinates=[1, 2]),
        dict(type='Point', coordinates=[1, 3]),
        dict(type='Point', coordinates=[1, 4]),
        dict(type='Point', coordinates=[1, 5]),
        dict(type='Point', coordinates=[2, 1]),
        dict(type='Point', coordinates=[2, 2]),
        dict(type='Point', coordinates=[2, 3]),
    ])
    s

    0    {u'type': u'Point', u'coordinates': [1, 1]}
    1    {u'type': u'Point', u'coordinates': [1, 2]}
    2    {u'type': u'Point', u'coordinates': [1, 3]}
    3    {u'type': u'Point', u'coordinates': [1, 4]}
    4    {u'type': u'Point', u'coordinates': [1, 5]}
    5    {u'type': u'Point', u'coordinates': [2, 1]}
    6    {u'type': u'Point', u'coordinates': [2, 2]}
    7    {u'type': u'Point', u'coordinates': [2, 3]}
    dtype: object

Solution

    df = s.apply(pd.Series)
    df


Then get the coordinates:

    df.coordinates

    0    [1, 1]
    1    [1, 2]
    2    [1, 3]
    3    [1, 4]
    4    [1, 5]
    5    [2, 1]
    6    [2, 2]
    7    [2, 3]
    Name: coordinates, dtype: object

Or even

 df.coordinates.apply(pd.Series) 
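If the goal is one numeric column per coordinate, a sketch along the same lines builds a frame from the list of pairs and names the columns. The column names x and y are assumptions, since the post does not say which value is longitude and which is latitude:

```python
import pandas as pd

s = pd.Series([
    dict(type='Point', coordinates=[1, 1]),
    dict(type='Point', coordinates=[1, 2]),
    dict(type='Point', coordinates=[2, 3]),
])

# Pull out each coordinate pair, then expand the pairs into two named columns
pairs = s.apply(lambda d: d['coordinates'])
xy = pd.DataFrame(pairs.tolist(), columns=['x', 'y'], index=s.index)
print(xy)
```

Passing index=s.index keeps the original row labels, so the result can be joined back to the source frame.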



Just ran into this problem. My solution:

    import ast
    import pandas as pd

    df = pd.DataFrame(["{u'type': u'Point', u'coordinates': [-43,144]}",
                       "{u'type': u'Point', u'coordinates': [-34,34]}",
                       "{u'type': u'Point', u'coordinates': [-102,344]}"],
                      columns=["Coordinates"])
    df = df["Coordinates"].astype('str')
    df = df.apply(lambda x: ast.literal_eval(x))
    df = df.apply(pd.Series)
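The 'bool' object is not subscriptable error in the question suggests that ast.literal_eval returned True or False for at least one row, i.e. not every cell holds a dict-like string. A defensive sketch follows, assuming that rows which don't parse to a dict should simply become None; safe_coords is a hypothetical helper name, not part of pandas or ast:

```python
import ast

def safe_coords(cell):
    """Return the 'coordinates' list from a dict-like string, or None."""
    if isinstance(cell, dict):
        parsed = cell  # already a dict, nothing to parse
    elif isinstance(cell, str):
        try:
            parsed = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            return None  # not a valid Python literal
    else:
        return None  # e.g. NaN (a float) or a bool
    if not isinstance(parsed, dict):
        return None  # a literal such as 'True' or '3' that is not a dict
    return parsed.get('coordinates')

# Made-up sample cells covering the cases handled above
rows = ["{u'type': u'Point', u'coordinates': [-43, 144]}", "True", "not a dict", float("nan")]
results = [safe_coords(r) for r in rows]
print(results)
```

With the question's data this would be applied as df.Coordinates.apply(safe_coords), and the None rows can then be inspected to see what the offending cells contain.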
