Python 3 reading / writing compressed json objects from / to gzip file

For Python 3, I tried @Martijn Pieters' code like this:

    import gzip
    import json

    # writing
    with gzip.GzipFile(jsonfilename, 'w') as fout:
        for i in range(N):
            uid = "whatever%i" % i
            dv = [1, 2, 3]
            data = json.dumps({'what': uid, 'where': dv})
            fout.write(data + '\n')

but this leads to an error:

    Traceback (most recent call last):
      ...
      File "C:\Users\Think\my_json.py", line 118, in write_json
        fout.write(data + '\n')
      File "C:\Users\Think\Anaconda3\lib\gzip.py", line 258, in write
        data = memoryview(data)
    TypeError: memoryview: a bytes-like object is required, not 'str'

Any thoughts on what's going on?

2 answers

There are four stages of transformation.

  • Python data structure (nested dicts, lists, strings, numbers, booleans)
  • Python string containing a serialized representation of that data structure ("JSON")
  • bytes object containing an encoded representation of that string ("UTF-8")
  • bytes object containing a compressed representation of the previous bytes ("gzip")

So do these steps one by one.

    import gzip
    import json

    data = []
    for i in range(N):
        uid = "whatever%i" % i
        dv = [1, 2, 3]
        data.append({'what': uid, 'where': dv})   # 1. data

    json_str = json.dumps(data) + "\n"            # 2. string (i.e. JSON)
    json_bytes = json_str.encode('utf-8')         # 3. bytes (i.e. UTF-8)

    with gzip.GzipFile(jsonfilename, 'w') as fout:   # 4. gzip
        fout.write(json_bytes)

Note that adding the "\n" is completely unnecessary here. It doesn't break anything, but it serves no purpose.
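As an aside, steps 3 and 4 can also be merged by opening the gzip file in text mode, so that the encoding happens transparently. A minimal sketch (the file name example.json.gz and the sample data are placeholders, not from the question):

```python
import gzip
import json

data = [{'what': 'whatever0', 'where': [1, 2, 3]}]

# 'wt' opens the gzip stream in text mode: gzip handles the
# UTF-8 encoding itself, so we can write str objects directly.
with gzip.open('example.json.gz', 'wt', encoding='utf-8') as fout:
    fout.write(json.dumps(data))
```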

Reading works the exact same way:

    with gzip.GzipFile(jsonfilename, 'r') as fin:   # 4. gzip
        json_bytes = fin.read()                     # 3. bytes (i.e. UTF-8)

    json_str = json_bytes.decode('utf-8')           # 2. string (i.e. JSON)
    data = json.loads(json_str)                     # 1. data

    print(data)

Of course, the steps can be combined:

    with gzip.GzipFile(jsonfilename, 'w') as fout:
        fout.write(json.dumps(data).encode('utf-8'))

and

    with gzip.GzipFile(jsonfilename, 'r') as fin:
        data = json.loads(fin.read().decode('utf-8'))
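The combined forms can be checked with a quick round trip; this sketch uses a placeholder file name (my_data.json.gz) and sample data in the shape of the question:

```python
import gzip
import json

data = [{'what': 'whatever%i' % i, 'where': [1, 2, 3]} for i in range(3)]

filename = 'my_data.json.gz'

# write: serialize, encode, compress
with gzip.GzipFile(filename, 'w') as fout:
    fout.write(json.dumps(data).encode('utf-8'))

# read: decompress, decode, deserialize
with gzip.GzipFile(filename, 'r') as fin:
    restored = json.loads(fin.read().decode('utf-8'))

assert restored == data
```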

The solution mentioned in fooobar.com/questions/15408367 / ... (thanks, @Rafe) has a big advantage: since the encoding is done on the fly, you avoid creating two complete intermediate string objects for the serialized JSON. With large objects, this saves memory.
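For completeness, the writing side of that on-the-fly approach looks something like this (a sketch; the file name and sample object are placeholders). json.dump streams the serialization chunk by chunk into the text-mode gzip handle:

```python
import gzip
import json

big_object = {'what': 'whatever', 'where': list(range(1000))}

# Text-mode handle: json.dump writes str chunks, and gzip encodes
# and compresses them as they arrive, so no complete intermediate
# string or bytes object for the whole document is built in memory.
with gzip.open('big_object.json.gz', 'wt', encoding='ascii') as zipfile:
    json.dump(big_object, zipfile)
```

The 'ascii' encoding works here because json.dump escapes non-ASCII characters by default (ensure_ascii=True).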

Building on the post mentioned, decoding it again is just as simple:

    with gzip.open(filename, 'rt', encoding='ascii') as zipfile:
        my_object = json.load(zipfile)
