Wrapping a Python class around JSON data: which approach is better?

Preamble: I am writing a Python API for a service that delivers JSON. Files are stored on disk in JSON format to cache values. The API should offer convenient access to the JSON data, so that IDEs and users can tell which (read-only) attributes an object has before runtime, and it should also provide some convenience functions.

Question: I have two possible implementations and would like to know which one is better or more "pythonic". Although I like both, I am open to suggestions if you come up with a better solution.

First solution: defining and inheriting from JsonDataWrapper. While pleasant, it is rather verbose and repetitive.

```python
class JsonDataWrapper:
    def __init__(self, json_data):
        self._data = json_data

    def get(self, name):
        return self._data[name]


class Course(JsonDataWrapper):
    def __init__(self, data):
        super().__init__(data)
        self._users = {}  # class omitted
        self._groups = {}  # class omitted
        self._assignments = {}

    @property
    def id(self):
        return self.get('id')

    @property
    def name(self):
        return self.get('full_name')

    @property
    def short_name(self):
        return self.get('short_name')

    @property
    def users(self):
        return self._users

    @users.setter
    def users(self, data):
        users = [User(u) for u in data]
        for user in users:
            self.users[user.id] = user
            # self.groups = user
            # this does not make much sense without the rest of the code
            # (It works, but that decision will be revised :D)
```

Second solution: using a lambda for shorter syntax. While it works and is short, it doesn't quite look right (see Edit 1 below).

```python
def json(name):
    return property(lambda self: self.get(name))


class Group(JsonDataWrapper):
    def __init__(self, data):
        super().__init__(data)
        self.group_members = []  # elements are of type(User). edit1, was self.members = []

    id = json('id')
    description = json('description')
    name = json('name')
    description_format = json('description_format')
```

(Naming this function "json" is not a problem since I am not importing json there.)

I have a possible third solution in mind that I cannot completely wrap my head around: overriding the built-in property, so that I can define a decorator that wraps a function returning the name of the field to look up:

```python
@json  # just like a property fget
def short_name(self):
    return 'short_name'
```

That might be a little shorter, but it doesn't make the code any better.
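A minimal sketch of the third idea, assuming the `get(name)` method of `JsonDataWrapper` from the first solution. `json_field` is a hypothetical name (chosen so it does not clash with the `json` module): the decorated method only returns the key, and `functools.wraps` carries its docstring over to the resulting read-only property.

```python
import functools


class JsonDataWrapper:
    def __init__(self, json_data):
        self._data = json_data

    def get(self, name):
        return self._data[name]


def json_field(key_func):
    """Hypothetical decorator: turn a method that returns a JSON key into a read-only property."""
    @functools.wraps(key_func)  # copies __doc__ and __name__ onto the getter
    def fget(self):
        return self.get(key_func(self))
    return property(fget)  # no setter, so the attribute is read-only


class Course(JsonDataWrapper):
    @json_field
    def short_name(self):
        """Short name of the course."""
        return 'short_name'


course = Course({'short_name': 'ALG1'})
print(course.short_name)          # ALG1
print(Course.short_name.__doc__)  # Short name of the course.
```

The docstring survives because `property` falls back to the getter's `__doc__` when no `doc` argument is given.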

Disqualified Solutions (IMHO):

  • JSON {De,En}coder: kills all flexibility and doesn't provide read-only attributes
  • __{get,set}attr__: makes it impossible to determine attributes before runtime. Although it shortens self.get('id') to self['id'], it further complicates matters when an attribute is not present in the underlying JSON data.

Thanks for reading!

Edit 1: 2016-07-20T08:26Z

To clarify (@SuperSaiyan) why I don't quite like the second solution: I feel that the lambda function is completely disconnected from the rest of the class semantics (which is also why it is shorter :D). I think I could make it more to my liking by documenting the solution properly in the code. The first solution is easy to understand for anyone who knows what @property means, without any further explanation.

Regarding @SuperSaiyan's second comment, which asks why I put the members attribute in Group: the list stores objects of type User, perhaps not what you thought, so I modified the example.

@jwodder: I will use Code Review next time; I didn't know it existed.

(Also: I think Group.members threw some of you off, so I edited the code to make it more obvious: the members of the group are the users that will be added to the list.

The full code is on GitHub; undocumented as it is, it might still be interesting to someone. Keep in mind: this is all WIP :D)

4 answers

(Note: this answer has been updated; I now use dataclasses with runtime type enforcement. See below :3)

So, a year has passed, and I'm going to answer my own question. I don't really like doing that, but it will mark the thread as resolved, which in itself may help others.

On the other hand, I want to document and explain why I chose my solution over the suggested answers. Not to prove that I am right, but to highlight the various trade-offs.

I just realized that it turned out pretty long, so:

TL;DR

collections.abc contains powerful abstractions, and you should use them if they are available to you (CPython >= 3.3). @property is pleasant to use: it makes it easy to add documentation and provides read-only access. Nested classes look weird, but they mirror the structure of deeply nested JSON perfectly.

Suggested Solutions

Python metaclasses

So, first of all: I love the concept. I looked at many use cases where they prove useful, especially when:

  1. writing a plug-in API, where metaclasses enforce correct use of the derived classes and their implementation details
  2. having a fully automated registry of the classes derived from the metaclass.
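To make the second use case concrete, here is a small sketch of a metaclass that keeps an automatic registry of derived classes. `RegistryMeta`, `Plugin` and the exporter classes are hypothetical names for illustration only.

```python
class RegistryMeta(type):
    """Metaclass that records every class derived from the root class."""
    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:  # the root class has no explicit bases; register only derived classes
            RegistryMeta.registry[name] = cls


class Plugin(metaclass=RegistryMeta):
    """Root of the hypothetical plug-in hierarchy."""


class CsvExporter(Plugin):
    pass


class JsonExporter(Plugin):
    pass


print(sorted(RegistryMeta.registry))  # ['CsvExporter', 'JsonExporter']
```

Merely defining a subclass is enough to register it; no decorator or explicit call is needed.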

On the other hand, Python's metaclass logic was hard to wrap my head around (it took me at least three days to figure it out). Although it is simple at its core, the devil is in the details. So I decided against it, simply because I may abandon the project in the near future, and others should be able to pick up where I left off.

namedtuple

collections.namedtuple is very efficient and concise enough to reduce my solution to a few lines instead of the current 800+ lines. My IDE is also able to introspect the possible members of the generated class.

Cons: the brevity of namedtuple leaves much less room for the much-needed documentation of the API's return values, although with less insane APIs you might get away with just that. It also seems weird to nest class objects inside a named tuple, but that is just personal preference.

What I went with

So, in the end, I decided to stick with my original first solution, with a few small details added. If you find the details interesting, you can look at the source on GitHub.

collections.abc

When I started the project, my Python knowledge was still quite limited, so I used what I knew about Python ("everything is a dict") and wrote code like that. For example: classes that work like a dict but have a file structure underneath (this was before pathlib).

Looking through Python's own source code, I noticed how container "traits" are implemented and enforced through abstract base classes, which sounds far more complicated than it actually is in Python.

the basics

The following is really very simple, but we will build from there.

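```python
# The ABCs live in collections.abc; the old aliases in collections were removed in Python 3.10.
from collections.abc import Mapping, Sequence, Sized


class JsonWrapper(Sized):
    def __len__(self):
        return len(self._data)

    def __init__(self, json):
        self._data = json

    @property
    def raw(self):
        return self._data
```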

This is the most basic class I could come up with: it just lets you call len() on the container. The raw property also gives you read-only access to the underlying JSON data if you really need it.

So why inherit from Sized instead of starting from scratch and simply defining __len__?

  1. Not overriding __len__ will not be accepted by the Python interpreter: instantiating a subclass that leaves an abstract method unimplemented raises a TypeError, so the mistake shows up as soon as an object is created instead of somewhere deep in later processing.
  2. Although Sized does not provide any mixin methods, the next two abstractions do; I will explain them there.
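The enforcement from point 1 can be checked with a small sketch (the class names are hypothetical). Defining a `Sized` subclass without `__len__` succeeds, but instantiating it raises `TypeError`:

```python
from collections.abc import Sized


class Broken(Sized):
    """Forgets to implement the abstract __len__."""
    def __init__(self, data):
        self._data = data


class Working(Sized):
    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)


try:
    Broken([1, 2, 3])  # the class statement above was fine; creating an instance is not
except TypeError as err:
    print(f"refused: {err}")

print(len(Working([1, 2, 3])))  # 3
```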

After that, there are only two main cases in JSON: lists and dicts.

Lists

With the API I had to deal with, we are not always sure what we get back, so I wanted to check that I actually received a list when initializing the wrapper class, mainly to fail early instead of hitting an "object has no member" error during more complex processing.

Deriving from Sequence requires overriding __getitem__ and __len__ (the latter is already implemented in JsonWrapper).

```python
class JsonListWrapper(JsonWrapper, Sequence):
    def __init__(self, json_list):
        if type(json_list) is not list:
            raise TypeError('received type {}, expected list'.format(type(json_list)))
        super().__init__(json_list)

    def __getitem__(self, index):
        return self._data[index]

    def __iter__(self):
        raise NotImplementedError('__iter__')

    def get(self, index):
        try:
            return self._data[index]
        except Exception as e:
            print(index)
            raise e
```

You might notice that I decided not to implement __iter__ here: I wanted the iterator to yield typed objects so that my IDE can autocomplete. To illustrate:

```python
class CourseListResponse(JsonListWrapper):
    def __iter__(self):
        for course in self._data:
            yield self.Course(course)

    class Course(JsonDictWrapper):
        pass  # for now
```

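By implementing the abstract Sequence methods, you get the mixin methods __contains__, __reversed__, index and count for free, so you don't have to worry about implementing them yourself. (Note that __contains__ and count iterate over the container, so in JsonListWrapper they only work once a subclass provides __iter__.)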
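A quick sketch of what the mixins buy you, using a hypothetical `Numbers` class that implements only `__getitem__` and `__len__`; `Sequence` supplies `__iter__`, `__contains__`, `__reversed__`, `index` and `count` on top of them.

```python
from collections.abc import Sequence


class Numbers(Sequence):
    """Only __getitem__ and __len__ are written by hand."""
    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, index):
        return self._data[index]

    def __len__(self):
        return len(self._data)


nums = Numbers([3, 1, 4, 1, 5])
print(3 in nums)             # True
print(nums.count(1))         # 2
print(nums.index(4))         # 2
print(list(reversed(nums)))  # [5, 1, 4, 1, 3]
```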

Dictionaries

To complete the basic JSON types, here is a class derived from Mapping:

```python
class JsonDictWrapper(JsonWrapper, Mapping):
    def __init__(self, json_dict):
        super().__init__(json_dict)
        if type(self._data) is not dict:
            raise TypeError('received type {}, expected dict'.format(type(json_dict)))

    def __iter__(self):
        return iter(self._data)

    def __getitem__(self, key):
        return self._data[key]

    __marker = object()

    def get(self, key, default=__marker):
        try:
            return self._data[key]
        except KeyError:
            if default is self.__marker:
                raise
            else:
                return default
```

Mapping only requires __iter__, __getitem__ and __len__. To avoid confusion: there is also MutableMapping, which adds the writing methods, but they are neither needed nor wanted here.

Based on those abstract methods, Python provides __contains__, keys, items, values, get, __eq__ and __ne__.
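The same idea for `Mapping`, as a sketch with a hypothetical `ReadOnlyDict`: three hand-written methods, and the rest comes from the mixins. Since no `__setitem__` exists, item assignment fails.

```python
from collections.abc import Mapping


class ReadOnlyDict(Mapping):
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = ReadOnlyDict({"id": 7, "name": "Algebra"})
print("id" in d)                          # True
print(d.get("missing", "n/a"))            # n/a
print(sorted(d.keys()))                   # ['id', 'name']
print(d == {"id": 7, "name": "Algebra"})  # True
```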

I'm not sure why I decided to override the get mixin; I may update the post when I remember. __marker serves as a sentinel to detect whether a default was passed at all. If someone calls get(key, default=None), you could not otherwise tell that apart from a call without a default.
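The sentinel trick in isolation, as a minimal sketch with a hypothetical `lookup` function: a private `object()` instance is a default no caller can pass by accident, so `None` remains a legitimate explicit default.

```python
_MISSING = object()  # unique sentinel; callers cannot pass it by accident


def lookup(data, key, default=_MISSING):
    try:
        return data[key]
    except KeyError:
        if default is _MISSING:
            raise  # no default given: behave exactly like data[key]
        return default  # an explicit default, even if it is None


print(lookup({"a": 1}, "a"))  # 1
print(lookup({}, "a", None))  # None
```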

So, to take the previous example:

```python
class CourseListResponse(JsonListWrapper):
    # [...]

    class Course(JsonDictWrapper):
        # Jn is just a class that contains the keys for JSON, so I only mistype once.
        @property
        def id(self):
            return self[Jn.id]

        @property
        def short_name(self):
            return self[Jn.short_name]

        @property
        def full_name(self):
            return self[Jn.full_name]

        @property
        def enrolled_user_count(self):
            return self[Jn.enrolled_user_count]

        # [...] you get the idea
```

Properties provide read-only access to members and can be documented like a function definition. Despite the verbosity, you can easily define a template in your editor for the basic accessors, so writing them is not that tedious.
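A minimal sketch of a documented read-only property, assuming `Jn` is a plain class of key strings as the comment in the previous snippet describes (the key values and the simplified `Course` here are made up for illustration):

```python
class Jn:
    # Hypothetical key holder: each JSON key is spelled in exactly one place.
    id = "id"
    short_name = "shortname"


class Course:
    def __init__(self, data):
        self._data = data

    @property
    def short_name(self):
        """Short course name as shown in the course list."""
        return self._data[Jn.short_name]


course = Course({"id": 7, "shortname": "ALG1"})
print(course.short_name)          # ALG1
print(Course.short_name.__doc__)  # Short course name as shown in the course list.
try:
    course.short_name = "X"       # no setter defined
except AttributeError:
    print("read-only")
```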

Properties also let you abstract away magic numbers and optional JSON return values by providing default values, instead of guarding against KeyError everywhere:

```python
    @property
    def isdir(self):
        return 1 == self[Jn.is_dir]

    @property
    def time_created(self):
        return self.get(Jn.time_created, 0)

    @property
    def file_size(self):
        return self.get(Jn.file_size, -1)

    @property
    def author(self):
        return self.get(Jn.author, "")

    @property
    def license(self):
        return self.get(Jn.license, "")
```

class nesting

It seems a little strange to nest classes inside others. I decided to do it because the API uses the same name for different objects with different attributes, depending on which remote function you called.

Another advantage: new people can easily understand the structure of the returned JSON.

The end of the file contains various aliases for nested classes for easier access from outside the module.
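A stripped-down sketch of the nesting-plus-aliases layout. The response classes, their fields and the alias names are hypothetical, and plain attributes stand in for the properties used in the real code:

```python
class EnrolledCoursesResponse:
    """Hypothetical response: courses returned for one user."""
    class Course:
        def __init__(self, data):
            self.id = data["id"]
            self.progress = data.get("progress")  # only this call returns progress


class SearchCoursesResponse:
    """Hypothetical response: courses returned by a search call."""
    class Course:
        def __init__(self, data):
            self.id = data["id"]
            self.summary = data.get("summary", "")


# Aliases at the end of the module, for easier access from outside.
EnrolledCourse = EnrolledCoursesResponse.Course
SearchCourse = SearchCoursesResponse.Course
```

Both classes are called `Course`, but the enclosing class tells the reader (and the IDE) which API call produced them.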

adding logic

Now that most of the return values were encapsulated, I wanted more logic on top of the data for convenience. It also seemed necessary to merge some of the data into a more complete tree containing everything collected over several API calls:

  1. get all the "assignments". Each assignment contains many submissions, therefore:
  2. for each assignment in assignments, get all the "submissions"
  3. merge the submissions into the matching assignment.
  4. now get the grades for the submissions, and so on...
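The merge in step 3 boils down to grouping submissions by their assignment. A minimal sketch on plain dicts, assuming each submission carries its assignment's id under a hypothetical `assignmentid` key (the function name is hypothetical too):

```python
def merge_submissions(assignments, submissions):
    """Return copies of the assignments, each with a 'submissions' list attached."""
    by_id = {a["id"]: {**a, "submissions": []} for a in assignments}
    for sub in submissions:
        parent = by_id.get(sub["assignmentid"])
        if parent is not None:  # ignore submissions for unknown assignments
            parent["submissions"].append(sub)
    return list(by_id.values())


assignments = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
subs = [{"id": 10, "assignmentid": 1}, {"id": 11, "assignmentid": 1}, {"id": 12, "assignmentid": 3}]
merged = merge_submissions(assignments, subs)
print([len(a["submissions"]) for a in merged])  # [2, 0]
```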

I decided to implement this separately, so I simply inherited from the "dumb" accessor classes (full source):

So in this class

```python
class Assignment(MoodleAssignment):
    def __init__(self, data, course=None):
        super().__init__(data)
        self.course = course
        self._submissions = {}  # accessed via submission.id
        self._grades = {}  # are accessed via user_id
```

these properties do the merging

```python
    @property
    def submissions(self):
        return self._submissions

    @submissions.setter
    def submissions(self, data):
        if data is None:
            self._submissions = {}  # reset the backing dict (going through the setter would be a no-op)
            return
        for submission in data:
            sub = Submission(submission, assignment=self)
            if sub.has_content:
                self.submissions[sub.id] = sub

    @property
    def grades(self):
        return self._grades

    @grades.setter
    def grades(self, data):
        if data is None:
            self._grades = {}  # reset the backing dict
            return
        grades = [Grade(g) for g in data]
        for g in grades:
            self.grades[g.user_id] = g
```

and these implement some logic that can be derived from the data.

```python
    @property
    def is_due(self):
        now = datetime.now()
        return now > self.due_date

    @property
    def due_date(self):
        return datetime.fromtimestamp(super().due_date)
```

Although the setters hide what is going on, they are nice to write and use, so it's a trade-off.

Caution: the implementation of the logic is not quite what I want; there is a lot of interdependence where there shouldn't be. It grew that way because I didn't know enough Python to get the abstractions right, and I needed something that worked so I could get the real, tedious work done. Now that I know what could have been done, I look at some of this spaghetti and, well... you know the feeling.

Conclusion

Encapsulating the JSON in classes has proven very useful for me and for the project structure, and I'm quite happy with it. The rest of the project is fine and works, although some parts are just terrible :D Thanks to everyone for the feedback; I'll be around for questions and comments.

update: 2019-05-02

As @RickTeachey notes in the comments, you can also use Python dataclasses (DC) here. I forgot to post an update here, although I made the switch some time ago and extended it using Python's typing features :D

The reason: I was tired of manually checking whether the API documentation I was abstracting from was correct, or whether my implementation was wrong. With dataclasses.fields I can check whether a response matches my schema, and I now notice changes in the external API much faster, since the assumptions are checked at runtime when an instance is created.

Dataclasses provide __post_init__(self) for post-processing after __init__ has completed successfully. Python's type hints are only hints for static checkers; they are not enforced at runtime. So I built a small system that enforces the types of dataclass fields in the post-init phase.

Here is BaseDC, from which all other DCs inherit (abbreviated):

```python
import dataclasses as dc
from dataclasses import dataclass

# abbreviated: log, DEBUG, is_generic() and _typecheck_generic() are defined elsewhere


@dataclass
class BaseDC:
    def _typecheck(self):
        for field in dc.fields(self):
            expected = field.type
            f = getattr(self, field.name)
            actual = type(f)

            if expected is list or expected is dict:
                log.warning(f'untyped list or dict in {self.__class__.__qualname__}: {field.name}')

            if expected is actual:
                continue

            # Subscripted generics cannot be used with class and instance checks
            if is_generic(expected):
                return self._typecheck_generic(expected, actual)

            if issubclass(actual, expected):
                continue

            print(f'mismatch {field.name}: should be: {expected}, but is {actual}')
            print(f'offending value: {f}')

    def __post_init__(self):
        for field in dc.fields(self):
            castfunc = field.metadata.get('castfunc', False)
            if castfunc:
                attr = getattr(self, field.name)
                new = castfunc(attr)
                setattr(self, field.name, new)
        if DEBUG:
            self._typecheck()
```
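A much simplified, self-contained sketch of the same idea: `__post_init__` compares each value against its annotated type. Unlike the abbreviated `BaseDC`, it raises `TypeError` instead of printing, and it only checks plain classes (generics such as `Optional[int]` or `List[...]` are skipped). `CheckedDC` and `Course` are hypothetical names.

```python
import dataclasses as dc


@dc.dataclass
class CheckedDC:
    """Simplified sketch: enforce plain (non-generic) field types after __init__."""
    def __post_init__(self):
        for field in dc.fields(self):
            value = getattr(self, field.name)
            # Only plain classes are checked; Optional[...], List[...] etc. are skipped.
            if isinstance(field.type, type) and not isinstance(value, field.type):
                raise TypeError(
                    f"{type(self).__qualname__}.{field.name}: expected "
                    f"{field.type.__name__}, got {type(value).__name__}")


@dc.dataclass
class Course(CheckedDC):
    id: int
    shortname: str


print(Course(1, "ALG1"))  # Course(id=1, shortname='ALG1')
```

`Course("1", "ALG1")` would raise `TypeError`, because the generated `__init__` calls the inherited `__post_init__`.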

Fields have a metadata attribute that lets you store arbitrary information; I use it to store functions that convert the response value, but more on that later.

A basic response wrapper looks like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DCcore_enrol_get_users_courses(BaseDC):
    id: int  # id of course
    shortname: str  # short name of course
    fullname: str  # long name of course
    enrolledusercount: int  # Number of enrolled users in this course
    idnumber: str  # id number of course
    visible: int  # 1 means visible, 0 means hidden course
    summary: Optional[str] = None  # summary
    summaryformat: Optional[int] = None  # summary format (1 = HTML, 0 = MOODLE, 2 = PLAIN or 4 = MARKDOWN)
    format: Optional[str] = None  # course format: weeks, topics, social, site
    showgrades: Optional[int] = None  # true if grades are shown, otherwise false
    lang: Optional[str] = None  # forced course language
    enablecompletion: Optional[int] = None  # true if completion is enabled, otherwise false
    category: Optional[int] = None  # course category id
    progress: Optional[float] = None  # Progress percentage
    startdate: Optional[int] = None  # Timestamp when the course start
    enddate: Optional[int] = None  # Timestamp when the course end

    def __str__(self):
        return f'{self.fullname[0:39]:40} id:{self.id:5d} short: {self.shortname}'


core_enrol_get_users_courses = destructuring_list_cast(DCcore_enrol_get_users_courses)
```

Responses that were plain lists gave me trouble at first, because I couldn't enforce their type with a simple List[DCcore_enrol_get_users_courses]. This is where destructuring_list_cast solves the problem for me; it is a bit more involved, since we are entering higher-order function territory:

```python
import typing
from typing import List

T = typing.TypeVar('T')


def destructuring_list_cast(cls: typing.Callable[[dict], T]) -> typing.Callable[[list], List[T]]:
    def cast(data: list) -> List[T]:
        if data is None:
            return []

        if not isinstance(data, list):
            raise SystemExit(f'listcast expects a list, you sent: {type(data)}')
        try:
            return [cls(**entry) for entry in data]
        except TypeError as err:
            # here is more code that explains errors
            raise SystemExit(f'listcast for class {cls} failed:\n{err}')

    return cast
```

It expects a Callable that takes a dict and returns an instance of the type T, as you would expect from a constructor or factory. It returns a Callable that accepts the list, here cast. return [cls(**entry) for entry in data] does all the work, building a list of dataclass instances when you call core_enrol_get_users_courses(response.json()). (Raising SystemExit is not pretty, but it is handled in the upper layers, so it works for me; I want it to fail hard and fast.)

The other use is defining nested fields when responses are deeply nested: remember field.metadata.get('castfunc', False) in BaseDC? This is where these two shortcuts come in:

```python
# destructured_cast_field
def dcf(cls):
    return dc.field(metadata={'castfunc': destructuring_list_cast(cls)})


def optional_dcf(cls):
    return dc.field(metadata={'castfunc': destructuring_list_cast(cls)}, default_factory=list)
```

They are used like this:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class core_files_get_files(BaseDC):
    @dataclass
    class parent(BaseDC):
        contextid: int
        # abbrev ...

    @dataclass
    class file(BaseDC):
        contextid: int
        component: str
        timecreated: Optional[int] = None  # Time created
        # abbrev ...

    parents: List[parent] = dcf(parent)
    files: Optional[List[file]] = optional_dcf(file)
```
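A runnable miniature of the castfunc mechanism: a base class applies whatever callable is stored under `metadata['castfunc']`, so a list of dicts from a response becomes a list of nested dataclass instances. `list_cast`, `Base` and `FilesResponse` are hypothetical, simplified stand-ins for `destructuring_list_cast`, `BaseDC` and `core_files_get_files` (no type checking, no error reporting).

```python
import dataclasses as dc
from typing import List, Optional


def list_cast(cls):
    """Return a function that turns a list of dicts into a list of cls instances."""
    def cast(data):
        return [cls(**entry) for entry in (data or [])]
    return cast


@dc.dataclass
class Base:
    def __post_init__(self):
        # Apply the per-field 'castfunc' stored in the field metadata, if any.
        for field in dc.fields(self):
            castfunc = field.metadata.get('castfunc')
            if castfunc:
                setattr(self, field.name, castfunc(getattr(self, field.name)))


@dc.dataclass
class FilesResponse(Base):
    @dc.dataclass
    class File(Base):
        contextid: int
        component: str
        timecreated: Optional[int] = None

    files: List[File] = dc.field(default_factory=list,
                                 metadata={'castfunc': list_cast(File)})


resp = FilesResponse(files=[{"contextid": 1, "component": "mod_resource"}])
print(resp.files[0].component)  # mod_resource
```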

Have you considered using a metaclass?

```python
class JsonDataWrapper(object):
    def __init__(self, json_data):
        self._data = json_data

    def get(self, name):
        return self._data[name]


class JsonDataWrapperMeta(type):
    def __init__(self, name, base, dict):
        super().__init__(name, base, dict)
        for mbr in self.members:
            # mbr=mbr binds the current name; a plain closure would make every property read the last member
            prop = property(lambda self, mbr=mbr: self.get(mbr))
            setattr(self, mbr, prop)


# You can use the metaclass inside a class block
# (Python 3 syntax; Python 2 used a __metaclass__ class attribute instead)
class Group(JsonDataWrapper, metaclass=JsonDataWrapperMeta):
    members = ['id', 'description', 'name', 'description_format']


# Or more programmatically
def jsonDataFactory(name, members):
    d = {"members": members}
    return JsonDataWrapperMeta(name, (JsonDataWrapper,), d)


Course = jsonDataFactory("Course", ["id", "name", "short_name"])
```
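One pitfall when generating properties from lambdas in a loop, as this metaclass does: a lambda looks up the loop variable when it is *called*, not when it is created, so a plain `lambda self: self.get(mbr)` makes every property return the last member. A small sketch with hypothetical classes; binding the loop variable as a default argument fixes it.

```python
class Record:
    def __init__(self, data):
        self._data = data

    def get(self, name):
        return self._data[name]


class Broken(Record):
    pass


class Fixed(Record):
    pass


for name in ['id', 'title']:
    # Looks up 'name' only when the getter runs, i.e. after the loop has finished.
    setattr(Broken, name, property(lambda self: self.get(name)))
    # The default argument captures the current value of 'name' immediately.
    setattr(Fixed, name, property(lambda self, name=name: self.get(name)))

data = {'id': 7, 'title': 'Algebra'}
print(Broken(data).id)  # Algebra  (wrong: every property reads 'title')
print(Fixed(data).id)   # 7
```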

When designing an API like this, where all members are read-only (meaning you don't want them to be reassigned, although they can still be mutable data structures), I often consider using collections.namedtuple unless I have a reason not to. It is fast and requires minimal code.

```python
from collections import namedtuple as nt

Group = nt('Group', 'id name shortname users')
g = Group(**json)
```

Simple

If your JSON contains more data than the object will use, just filter it:

```python
g = Group(**{k: v for k, v in json.items() if k in Group._fields})
```

If you want default values for missing data, you can also do this:

```python
Group.__new__.__defaults__ = (0, 'DefaultName', 'DefN', None)
# now this works:
g = Group()
# and now this will still work even if some keys are missing;
g = Group(**{k: v for k, v in json.items() if k in Group._fields})
```

One caveat with the above way of setting default values: do not use a mutable object, for example a list, as the default for any member, because the same object will be shared (and mutated) across all instances:

```python
# don't do this:
Group.__new__.__defaults__ = (0, 'DefaultName', 'DefN', [])
g1 = Group()
g2 = Group()
g1.users.append(user1)
g2.users  # output: [user1] <-- whoops!
```
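The pitfall is easy to reproduce. On Python 3.7+, `namedtuple` also accepts a `defaults=` argument directly, and the shared-list problem is exactly the same; a minimal sketch:

```python
from collections import namedtuple

Group = namedtuple('Group', 'id name shortname users',
                   defaults=(0, 'DefaultName', 'DefN', []))  # one list object, created once

g1 = Group()
g2 = Group()
g1.users.append('alice')
print(g2.users)              # ['alice']
print(g1.users is g2.users)  # True
```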

Instead, wrap everything in a small factory that creates a new list (or dict, or any user-defined data structure you need) for the members that need one:

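```python
# jsonfactory.py
from collections import namedtuple as nt

new_list = object()  # sentinel: "this member defaults to a fresh empty list"


def JsonClassFactory(name, *args, defaults=None):
    '''Produces a new namedtuple class. Any members intended to default to a
    blank list should be set to the new_list object.
    '''
    base = nt(name, *args)
    if defaults is not None:
        base.__new__.__defaults__ = tuple(defaults)

    def __new__(cls, *a, **kw):
        self = base.__new__(cls, *a, **kw)
        # swap each new_list sentinel for a brand-new list, so no instance shares one
        return self._replace(**{f: [] for f, v in zip(self._fields, self) if v is new_list})

    return type(name, (base,), {'__slots__': (), '__new__': __new__})
```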

Now, given some JSON definition (json_definition) of the fields you want to represent:

```python
from jsonfactory import JsonClassFactory, new_list

MyJsonClass = JsonClassFactory('MyJsonClass', json_definition,
                               defaults=(0, 'DefaultName', 'DefN', new_list))
```

And as before:

```python
obj = MyJsonClass(**json)
```

Or, if there is additional data:

```python
obj = MyJsonClass(**{k: v for k, v in json.items() if k in MyJsonClass._fields})
```

If you want the default container to be something other than a list, that's simple enough: just replace the new_list sentinel with whatever you need. If necessary, you can have several sentinels at the same time.

And if you still need additional functionality, you can always extend your MyJsonClass:

```python
class ExtJsonClass(MyJsonClass):
    __slots__ = ()  # optional; needed if you want the low memory benefits of namedtuple

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls, *args, **{k: v for k, v in kwargs.items() if k in cls._fields})
        return self

    def add_user(self, user):
        self.users.append(user)
```

The __new__ method above takes care of the extra-data problem for good: keys that are not fields are silently dropped. So now you can always do this:

```python
obj = ExtJsonClass(**json)
```

Simple


I am new to Python myself, so excuse me if this seems naive. One solution might be to use __dict__, as described in the following article:

https://www.safaribooksonline.com/library/view/python-cookbook-3rd/9781449357337/ch06s02.html

Of course, this solution will cause problems if the class contains objects of other classes that also need to be serialized or deserialized. I would like to hear the experts' opinions on this approach and its limitations.

Any feedback on jsonpickle?

Update:

I just saw your objection to serialization and why you don't like it: everything is only resolved at runtime. I get it. Many thanks.

Below is the code I wrote to get around this. It's a bit of a stretch, but it works well, and I don't need to add a getter/setter for every attribute!

```python
import json


class JSONObject:
    exp_props = {"id": "", "title": "Default"}

    def __init__(self, d):
        self.__dict__ = d
        for key in [x for x in JSONObject.exp_props if x not in self.__dict__]:
            setattr(self, key, JSONObject.exp_props[key])

    @staticmethod
    def fromJSON(s):
        return json.loads(s, object_hook=JSONObject)

    def toJSON(self):
        return json.dumps(self.__dict__, indent=4)


s = '{"name": "ACME", "shares": 50, "price": 490.1}'
anObj = JSONObject.fromJSON(s)

print("Name - {}".format(anObj.name))
print("Shares - {}".format(anObj.shares))
print("Price - {}".format(anObj.price))
print("Title - {}".format(anObj.title))

sAfter = anObj.toJSON()
print("Type of dumps is {}".format(type(sAfter)))
print(sAfter)
```

Results below

```
Name - ACME
Shares - 50
Price - 490.1
Title - Default
Type of dumps is <type 'str'>
{
    "price": 490.1,
    "title": "Default",
    "name": "ACME",
    "shares": 50,
    "id": ""
}
```
