An easy way to convert a string to a dictionary

What is the easiest way to convert a string of key=value pairs to a dictionary, for example the following line:

name="John Smith", age=34, height=173.2, location="US", avatar=":,=)" 

into the following Python dictionary:

 {'name':'John Smith', 'age':34, 'height':173.2, 'location':'US', 'avatar':':,=)'} 

The key 'avatar' is there only to show that strings can contain = and commas, so a simple "split" will not do. Any ideas? Thanks!

+6
python string dictionary
10 answers

This works for me:

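    import re

    s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'

    # get all the items: quoted values first, then bare numbers
    matches = re.findall(r'\w+=".+?"', s) + re.findall(r'\w+=[\d.]+', s)
    # partition each match at the first '=' (findall returns plain strings)
    matches = [m.split('=', 1) for m in matches]
    # use results to make a dict; values stay strings, quotes included
    d = dict(matches)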
+9

Edit: since the csv module does not handle quotes that start inside a field, a little more work is required to implement this.

    import re

    quoted = re.compile(r'"[^"]*"')

    class QuoteSaver(object):

        def __init__(self):
            self.saver = dict()
            self.reverser = dict()

        def preserve(self, mo):
            s = mo.group()
            if s not in self.saver:
                self.saver[s] = '"%d"' % len(self.saver)
                self.reverser[self.saver[s]] = s
            return self.saver[s]

        def expand(self, mo):
            return self.reverser[mo.group()]

    x = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'

    qs = QuoteSaver()
    y = quoted.sub(qs.preserve, x)
    kvs_strings = y.split(',')
    kvs_pairs = [kv.split('=') for kv in kvs_strings]
    kvs_restored = [(k, quoted.sub(qs.expand, v)) for k, v in kvs_pairs]

    def converter(v):
        if v.startswith('"'):
            return v.strip('"')
        try:
            return int(v)
        except ValueError:
            return float(v)

    thedict = dict((k.strip(), converter(v)) for k, v in kvs_restored)
    for k in thedict:
        print("%-8s %s" % (k, thedict[k]))
    print(thedict)

I print thedict twice to show exactly how and why it differs from the required result; the output is:

    age      34
    location US
    name     John Smith
    avatar   :,=)
    height   173.2
    {'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)', 'height': 173.19999999999999}

As you can see, the output for the floating-point value is as requested when it is emitted directly with print, but it is not and cannot be (since there is no floating-point value that would display as 173.2 in that case!-) when print is applied to the whole dict. Printing the dict inevitably uses repr on the keys and values, and the repr of 173.2 has that form, given the usual issues with floating-point values being stored in binary rather than decimal. You could define a dict subclass that overrides __str__ to special-case floating-point values, I guess, if that is really a requirement.
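A minimal sketch of the dict subclass idea mentioned above. The class name PrettyFloatDict and the choice of 12 significant digits are assumptions made for illustration, not part of the original answer. Note that newer Python versions (2.7 and 3.1 onward) already show the shortest repr, so a plain dict prints 173.2 there.

```python
class PrettyFloatDict(dict):
    """Hypothetical dict subclass that shortens floats when printed."""

    def __str__(self):
        def show(value):
            # Floats are rounded to 12 significant digits; everything
            # else keeps its normal repr.
            if isinstance(value, float):
                return '%.12g' % value
            return repr(value)

        items = ('%r: %s' % (key, show(value)) for key, value in self.items())
        return '{' + ', '.join(items) + '}'


d = PrettyFloatDict(name='John Smith', height=173.2)
print(d)        # uses __str__, so the float is shortened
print(repr(d))  # repr is untouched and behaves like a plain dict
```

One side effect of this formatting: a whole-number float such as 173.0 is shown as 173, so the sketch trades type fidelity for readability.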

But I hope this distraction does not interfere with the core idea: as long as the double quotes are properly balanced (and there are no double quotes inside double quotes), this code performs the required task of keeping "special characters" (commas and equal signs, in this case) from being taken in their normal sense when they are inside double quotes, even if the double quotes start inside a field rather than at the beginning of it (csv handles only the latter case). Insert a few intermediate prints if the way the code works is not obvious. First it changes every double-quoted field into a specially simple form ("0", "1", and so on), while separately recording the actual contents corresponding to those simple forms; at the end, the simple forms are changed back into the original contents. Stripping the double quotes (for strings) and converting unquoted strings to integers or floats is finally handled by the simple converter function.
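To make the intermediate states visible, here is a stripped-down sketch of the same placeholder idea. It uses a plain list instead of the QuoteSaver class, and the names saved, preserve, and expand are chosen for illustration only.

```python
import re

quoted = re.compile(r'"[^"]*"')
x = 'name="John Smith", age=34, avatar=":,=)"'

saved = []  # original quoted fields, indexed by placeholder number


def preserve(mo):
    # Swap each quoted field for a numbered placeholder such as "0".
    saved.append(mo.group())
    return '"%d"' % (len(saved) - 1)


def expand(mo):
    # Look the placeholder number up again to restore the original field.
    return saved[int(mo.group().strip('"'))]


y = quoted.sub(preserve, x)
print(y)      # name="0", age=34, avatar="1"
print(saved)  # ['"John Smith"', '":,=)"']

restored = quoted.sub(expand, y)
print(restored == x)  # True
```

Once the placeholders are in, splitting y on commas and equal signs is safe, because none of them can appear inside a placeholder.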

+4

Here is a more verbose approach to the problem using pyparsing. Note the parse actions, which perform automatic type conversion from strings to int or float. Also, the QuotedString class implicitly strips the quotation marks from the quoted value. Finally, the Dict class takes each "key = val" group in the comma-delimited list and assigns results names using the key and value tokens.

    from pyparsing import *

    key = Word(alphas)
    EQ = Suppress('=')
    real = Regex(r'[+-]?\d+\.\d+').setParseAction(lambda t: float(t[0]))
    integer = Regex(r'[+-]?\d+').setParseAction(lambda t: int(t[0]))
    qs = QuotedString('"')
    value = real | integer | qs

    dictstring = Dict(delimitedList(Group(key + EQ + value)))

Now parse the original text string, storing the results in dd. Pyparsing returns an object of type ParseResults, but this class has many dict-like features (support for keys(), items(), in, and so on), or it can emit a true Python dict by calling asDict(). Calling dump() shows all the tokens in the original parsed list, plus all the named items. The last two examples show how to access named items in a ParseResults as if they were attributes of a Python object.

    text = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    dd = dictstring.parseString(text)
    print(dd.keys())
    print(dd.items())
    print(dd.dump())
    print(dd.asDict())
    print(dd.name)
    print(dd.avatar)

This prints:

    ['age', 'location', 'name', 'avatar', 'height']
    [('age', 34), ('location', 'US'), ('name', 'John Smith'), ('avatar', ':,=)'), ('height', 173.19999999999999)]
    [['name', 'John Smith'], ['age', 34], ['height', 173.19999999999999], ['location', 'US'], ['avatar', ':,=)']]
    - age: 34
    - avatar: :,=)
    - height: 173.2
    - location: US
    - name: John Smith
    {'age': 34, 'height': 173.19999999999999, 'location': 'US', 'avatar': ':,=)', 'name': 'John Smith'}
    John Smith
    :,=)
+2

The following code produces the correct behavior, but it is a bit long! I added a space in the avatar to show that it copes with commas, spaces, and equal signs inside the string. Any suggestions for shortening it?

    import hashlib

    string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'

    strings = {}

    def simplify(value):
        # unquoted values are numbers: try int first, then float
        try:
            return int(value)
        except ValueError:
            return float(value)

    # replace each quoted substring with its md5 hex digest
    while True:
        try:
            p1 = string.index('"')
            p2 = string.index('"', p1 + 1)
        except ValueError:
            break  # no quotes left
        substring = string[p1+1:p2]
        key = hashlib.md5(substring.encode('utf-8')).hexdigest()
        strings[key] = substring
        string = string[:p1] + key + string[p2+1:]

    d = {}
    for pair in string.split(', '):
        key, value = pair.split('=')
        if value in strings:
            d[key] = strings[value]
        else:
            d[key] = simplify(value)

    print(d)
+1

Here is an approach with eval. I think it is unreliable, but it works for your example.

    >>> import re
    >>>
    >>> s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    >>>
    >>> eval("{" + re.sub(r'(\w+)=("[^"]+"|[\d.]+)', r'"\1":\2', s) + "}")
    {'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)', 'height': 173.19999999999999}
    >>>

Update:

It is better to use the version Chris Lutz pointed out in a comment. I think it is more reliable, because it can work even when there are (single or double) quotes in the dict values.

+1

Here is a somewhat more robust version of the regexp solution:

    import re

    keyval_re = re.compile(r'''
       \s*                                  # Leading whitespace is ok.
       (?P<key>\w+)\s*=\s*(                 # Search for a key followed by..
           (?P<str>"[^"]*"|\'[^\']*\')|     #   a quoted string; or
           (?P<float>\d+\.\d+)|             #   a float; or
           (?P<int>\d+)                     #   an int.
       )\s*,?\s*                            # Handle comma & trailing whitespace.
       |(?P<garbage>.+)                     # Complain if we get anything else!
       ''', re.VERBOSE)

    def handle_keyval(match):
        if match.group('garbage'):
            raise ValueError("Parse error: unable to parse: %r" %
                             match.group('garbage'))
        key = match.group('key')
        if match.group('str') is not None:
            return (key, match.group('str')[1:-1])  # strip quotes
        elif match.group('float') is not None:
            return (key, float(match.group('float')))
        elif match.group('int') is not None:
            return (key, int(match.group('int')))

It automatically converts floats and ints to the right type; handles single and double quotes; tolerates extraneous whitespace in various places; and complains if a badly formatted string is supplied.

    >>> s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    >>> print(dict(handle_keyval(m) for m in keyval_re.finditer(s)))
    {'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)', 'height': 173.19999999999999}
+1

I would suggest a lazy way to do this.

    test_string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    eval("dict({})".format(test_string))

    {'age': 34, 'location': 'US', 'avatar': ':,=)', 'name': 'John Smith', 'height': 173.2}

Hope this helps someone!
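Since eval executes whatever the string contains, a more cautious variant of the same trick may be worth considering when the input is not trusted. The sketch below keeps the idea of treating the string as the argument list of dict(...), but only parses it with the standard ast module and accepts literal values. The helper name parse_kwargs is hypothetical.

```python
import ast


def parse_kwargs(text):
    """Hypothetical helper: read 'key=literal, ...' without running code."""
    # Parse the text as the arguments of a call; nothing is executed here.
    call = ast.parse("dict({})".format(text), mode="eval").body
    if not isinstance(call, ast.Call) or call.args:
        raise ValueError("expected only key=value pairs")
    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # reject **unpacking
            raise ValueError("expected only key=value pairs")
        # literal_eval accepts only literals (strings, numbers, and so on)
        # and raises ValueError for anything else, such as function calls.
        result[kw.arg] = ast.literal_eval(kw.value)
    return result


test_string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
print(parse_kwargs(test_string))
```

Like the eval version, this only works when the keys are valid Python identifiers and the values are written as Python literals; malformed input raises SyntaxError or ValueError.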

+1

It seems to me that you just need to set maxsplit=1; for example, the following should work.

    string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'

    # split each pair at the first '=' only (maxsplit=1)
    newDict = dict(map(lambda z: z.split("=", 1), string.split(", ")))

Edit (see comment):

I did not notice that ", " appeared inside the avatar value. The best option would be to escape ", " wherever you generate the data. Even better would be something like JSON ;). However, as an alternative to regex, you could try shlex, which I think produces cleaner code.

    import shlex

    string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'

    lex = shlex.shlex(string)
    lex.whitespace += ","  # Default whitespace doesn't include commas
    lex.wordchars += "."   # Word chars should include . to catch decimals
    words = [x for x in iter(lex.get_token, '')]
    newDict = dict(zip(words[0::3], words[2::3]))
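As a small illustration of the JSON suggestion above, and assuming you control the side that generates the data, the standard json module handles quoting and type conversion for you. The variable names here are chosen for the example only.

```python
import json

record = {
    'name': 'John Smith',
    'age': 34,
    'height': 173.2,
    'location': 'US',
    'avatar': ':, =)',
}

# Producer side: serialize the record to a single line of text.
line = json.dumps(record)

# Consumer side: parse it back; strings, ints and floats keep their types.
parsed = json.loads(line)
print(parsed == record)
```

With this format, commas, spaces, and equal signs inside values need no special handling.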
0

Take it step by step:

    d = {}
    mystring = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    s = mystring.split(", ")
    for item in s:
        i = item.split("=", 1)
        d[i[0]] = i[-1]
    print(d)
0

Always separated by commas? Use the csv module to split the line into parts (untested):

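    import csv
    import io

    text = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
    # read the string as a single CSV row
    # (csv only honors quotes that start a field)
    parts = next(csv.reader(io.StringIO(text)))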
-2