Google Datastore - Blob or Text

The two possible ways to store large strings in Google Datastore are the Text and Blob data types.

In terms of memory consumption, which of the two is recommended? Same question in terms of protobuf serialization and deserialization.

google-app-engine google-cloud-datastore
1 answer

There is no significant performance difference between the two; use whichever fits your data best. BlobProperty should be used to store binary data (e.g. str objects), while TextProperty should be used to store any textual data (e.g. unicode or str objects). Note that if you store a str in a TextProperty, it must contain only ASCII bytes (values less than hex 80, i.e. decimal 128), unlike a BlobProperty.
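The ASCII rule above can be sketched without the App Engine SDK. This is a minimal illustration, not part of the db API: it assumes Python 3, where bytes plays the role of the Python 2 str and str plays the role of unicode, and suggest_property is a hypothetical helper name.

```python
def suggest_property(value):
    """Hypothetical helper: name the property type(s) that could hold `value`.

    Assumes Python 3, where `str` is text (Python 2 `unicode`) and
    `bytes` is a byte string (Python 2 `str`).
    """
    if isinstance(value, str):
        # Text belongs in a TextProperty; a BlobProperty rejects it.
        return "TextProperty"
    if isinstance(value, bytes):
        try:
            value.decode("ascii")  # succeeds only if every byte is < 0x80
        except UnicodeDecodeError:
            # Bytes >= 0x80 are only acceptable as binary data.
            return "BlobProperty"
        return "TextProperty or BlobProperty"
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(suggest_property(bytes(range(128))))  # every byte is below 0x80
print(suggest_property(b"\xc2\x80"))        # contains bytes >= 0x80
print(suggest_property("\u0080"))           # text rather than bytes
```

The three calls print "TextProperty or BlobProperty", "BlobProperty" and "TextProperty" respectively.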

Both of these properties derive from UnindexedProperty, as you can see in the source.

Here is an example application that shows that for these ASCII or UTF-8 strings there is no difference in storage costs:

    import struct

    from google.appengine.ext import db, webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class TestB(db.Model):
        v = db.BlobProperty(required=False)

    class TestT(db.Model):
        v = db.TextProperty(required=False)

    class MainPage(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'

            # try simple ASCII data and a bytestring with non-ASCII bytes
            ascii_str = ''.join([struct.pack('>B', i) for i in xrange(128)])
            arbitrary_str = ''.join([struct.pack('>2B', 0xC2, 0x80+i) for i in xrange(64)])
            u = unicode(arbitrary_str, 'utf-8')

            t = [TestT(v=ascii_str), TestT(v=ascii_str*1000), TestT(v=u*1000)]
            b = [TestB(v=ascii_str), TestB(v=ascii_str*1000), TestB(v=arbitrary_str*1000)]

            # demonstrate error cases
            try:
                err = TestT(v=arbitrary_str)
                assert False, "should have caused an error: can't store non-ascii bytes in a Text"
            except UnicodeDecodeError:
                pass
            try:
                err = TestB(v=u)
                assert False, "should have caused an error: can't store unicode in a Blob"
            except db.BadValueError:
                pass

            # determine the serialized size of each model (note: no keys assigned)
            fEncodedSz = lambda o : len(db.model_to_protobuf(o).Encode())
            sz_t = tuple([fEncodedSz(x) for x in t])
            sz_b = tuple([fEncodedSz(x) for x in b])

            # output the results
            self.response.out.write("text: 1=>%dB 2=>%dB 3=>%dB\n" % sz_t)
            self.response.out.write("blob: 1=>%dB 2=>%dB 3=>%dB\n" % sz_b)

    application = webapp.WSGIApplication([('/', MainPage)])

    def main():
        run_wsgi_app(application)

    if __name__ == '__main__':
        main()

And here is the output:

    text: 1=>172B 2=>128047B 3=>128047B
    blob: 1=>172B 2=>128047B 3=>128047B
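The reported sizes can be checked against the raw payload lengths. The sketch below is a stand-alone Python 3 illustration that does not use the App Engine SDK; it rebuilds the same test data and subtracts the payload size from each reported figure. That the remainder is entity and protobuf encoding overhead, and that the text value is serialized as UTF-8, are assumptions consistent with the numbers rather than something the output proves.

```python
# Rebuild the sample app's test payloads using Python 3 bytes/str.
ascii_bytes = bytes(range(128))                    # 128 single-byte characters
arbitrary_bytes = b"".join(bytes([0xC2, 0x80 + i]) for i in range(64))
text = arbitrary_bytes.decode("utf-8")             # 64 two-byte characters

payload_sizes = {
    "case 1 (ascii)": len(ascii_bytes),
    "case 2 (ascii * 1000)": len(ascii_bytes * 1000),
    # Assumption: the text value is stored UTF-8 encoded.
    "case 3 (utf-8 * 1000)": len((text * 1000).encode("utf-8")),
}

# Serialized sizes copied from the output shown above (same for text and blob).
reported = {
    "case 1 (ascii)": 172,
    "case 2 (ascii * 1000)": 128047,
    "case 3 (utf-8 * 1000)": 128047,
}

for name, size in payload_sizes.items():
    remainder = reported[name] - size
    print("%s: payload=%dB, remainder=%dB" % (name, size, remainder))
```

The payloads come to 128, 128000 and 128000 bytes, so only a few dozen bytes of each serialized entity are not payload, and that remainder is identical for both property types.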