(Note: this answer got an update; I now use data classes with type enforcement at runtime, see the 2019-05-02 update below.)
So, a year has passed and I am going to answer my own question. I don't really like answering it myself, but: it will mark the thread as resolved, which in itself might help others.
On the other hand, I want to document and explain why I chose my solution over the suggested answers. Not to prove that I am right, but to highlight the various trade-offs.
I just realized that it turned out pretty long, so:
TL;DR
collections.abc contains powerful abstractions and you should use them if you have access to them (CPython >= 3.3). @property is nice to use; it makes it easy to add documentation and provides read-only access. Nested classes look weird, but they faithfully reproduce the structure of deeply nested JSON.
Suggested Solutions
Python metaclasses
So first of all: I love the concept. I can think of many applications where they prove useful, especially when:
- writing a plug-in API where the metaclass enforces correct usage of the derived classes and their implementation specifics
- having a fully automated registry of all classes that derive from the metaclass
On the other hand, Python's metaclass logic was hard to wrap my head around (it took me at least three days to figure it out). While it is simple in principle, the devil is in the details. So I decided against it, simply because I might abandon the project in the foreseeable future and others should be able to pick up where I left off.
namedtuple
collections.namedtuple is very efficient and concise enough to boil my solution down to a couple of lines instead of the current 800+. My IDE would also be able to introspect the possible members of the generated class.
Cons: the brevity of namedtuple leaves much less room for the much-needed documentation of the values the API returns. So with less insane APIs you might get away with just that. It also seems odd to nest class objects inside a namedtuple, but that is just personal preference.
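For illustration, a minimal sketch of how a namedtuple based wrapper could look; the names and fields are made up, not taken from my project:

from collections import namedtuple

# hypothetical example: one namedtuple per response type
Course = namedtuple('Course', ['id', 'shortname', 'fullname'])

def parse_courses(json_list):
    # raises TypeError if keys are missing or unexpected keys appear
    return [Course(**entry) for entry in json_list]

courses = parse_courses([{'id': 1, 'shortname': 'py', 'fullname': 'Python 101'}])
print(courses[0].fullname)  # attribute access, so the IDE can autocomplete

As you can see, there is no natural place to document what each field means, which is the main drawback mentioned above.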
What I went with
So in the end I stuck with my original solution, with a few small but interesting details added. If you are curious, you can look at the source on GitHub.
collections.abc
When I started the project my Python knowledge was next to nothing, so I went with what I knew about Python ("everything is a dict") and wrote code like that. For example: classes that work like a dict but have a file structure underneath (this was before pathlib).
While looking through Python's source code I noticed how container "features" are implemented and enforced through abstract base classes, which sounds far more complicated than it actually is in Python.
the basics
The following is really very simple, but we will build from there.
from collections.abc import Mapping, Sequence, Sized

class JsonWrapper(Sized):
    def __init__(self, json):
        self._data = json

    def __len__(self):
        return len(self._data)

    @property
    def raw(self):
        return self._data
This is the most basic class I could come up with; it just lets you call len on the container. You also get read-only access through raw, if you really insist on working with the underlying dict.
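A quick usage sketch of that class (the sample data is made up):

wrapper = JsonWrapper({'id': 1, 'name': 'example'})
print(len(wrapper))    # 2, via our __len__
print(wrapper.raw)     # the underlying dict through the read-only property
wrapper.raw['id'] = 2  # note: the dict itself stays mutable, only the
                       # attribute cannot be reassigned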
So why am I inheriting from Sized instead of just starting from scratch and defining __len__ myself?
- Not overriding __len__ will simply not be accepted by the Python interpreter: a TypeError is raised as soon as you try to instantiate the class, so you are not bitten deep at runtime (see the small check after this list).
- Although Sized does not provide any mixin methods, the next two abstractions do provide them. I will explain them there.
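A small check of that behaviour, assuming nothing beyond the standard library:

from collections.abc import Sized

class Broken(Sized):
    pass  # forgot to implement __len__

Broken()  # TypeError: Can't instantiate abstract class Broken ...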
After that, only the two main JSON cases are left: lists and dicts.
Lists
With the API I had to deal with, we cannot always be sure what we actually got, so I wanted a way to check whether I received a list when the wrapper class is initialized, mainly to abort early instead of hitting "object has no member" errors during more complex processing.
Deriving from Sequence requires overriding __getitem__ and __len__ (the latter is already implemented in JsonWrapper).
class JsonListWrapper(JsonWrapper, Sequence):
    def __init__(self, json_list):
        if type(json_list) is not list:
            raise TypeError('received type {}, expected list'.format(type(json_list)))
        super().__init__(json_list)

    def __getitem__(self, index):
        return self._data[index]

    def __iter__(self):
        raise NotImplementedError('__iter__')

    def get(self, index):
        try:
            return self._data[index]
        except Exception as e:
            print(index)
            raise e
You might notice that I chose not to implement __iter__ here. I wanted the iterator to yield typed objects, so that my IDE can autocomplete. To illustrate:
class CourseListResponse(JsonListWrapper):
    def __iter__(self):
        for course in self._data:
            yield self.Course(course)

    class Course(JsonDictWrapper):
        pass
By implementing the abstract methods of Sequence, you get the mixin methods __contains__, __reversed__, index and count for free, so you don't have to worry about possible side effects.
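To show what those mixins buy you, here is a toy subclass made up for this demo; it implements __iter__, since the base class above deliberately leaves it unimplemented:

class NumberListWrapper(JsonListWrapper):
    def __iter__(self):
        return iter(self._data)

numbers = NumberListWrapper([1, 2, 2, 3])
print(2 in numbers)             # True, __contains__ mixin
print(numbers.count(2))         # 2, count mixin
print(numbers.index(3))         # 3, index mixin
print(list(reversed(numbers)))  # [3, 2, 2, 1], __reversed__ mixin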
Dictionaries
To complete the basic types for wrapping JSON, here is the class derived from Mapping:
class JsonDictWrapper(JsonWrapper, Mapping):
    def __init__(self, json_dict):
        super().__init__(json_dict)
        if type(self._data) is not dict:
            raise TypeError('received type {}, expected dict'.format(type(json_dict)))

    def __iter__(self):
        return iter(self._data)

    def __getitem__(self, key):
        return self._data[key]

    __marker = object()

    def get(self, key, default=__marker):
        try:
            return self._data[key]
        except KeyError:
            if default is self.__marker:
                raise
            else:
                return default
Mapping only requires __iter__, __getitem__ and __len__ to be overridden. To avoid confusion: there is also MutableMapping, which would provide the writing methods, but that is neither needed nor wanted here.
By implementing the abstract methods, Python provides the mixins __contains__, keys, items, values, get, __eq__ and __ne__ on top of them.
I am not entirely sure why I decided to override the get mixin; I may update the post when it comes back to me. __marker serves as a sentinel to detect whether the default keyword was actually set. If someone decides to call get(*args, default=None), you could not detect that otherwise.
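A short demo of why the sentinel matters (toy data):

wrapper = JsonDictWrapper({'name': 'example'})
wrapper.get('missing', default=None)  # returns None, the caller asked for it
wrapper.get('missing')                # raises KeyError, no default was given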
So, to pick up the previous example:

class CourseListResponse(JsonListWrapper):
    def __iter__(self):
        for course in self._data:
            yield self.Course(course)

    class Course(JsonDictWrapper):
        # the nested class exposes the JSON members as documented,
        # read-only properties (abbreviated here, see the next snippet)
        pass
Properties provide read-only access to the members and can be documented like a function definition. Despite being verbose for the primary means of access, you can easily define a template in your editor, so writing them is not that tedious.
Properties also let you abstract away magic numbers and optional JSON return values by providing defaults, instead of guarding against KeyError everywhere:
@property
def isdir(self):
    return 1 == self[Jn.is_dir]

@property
def time_created(self):
    return self.get(Jn.time_created, 0)

@property
def file_size(self):
    return self.get(Jn.file_size, -1)

@property
def author(self):
    return self.get(Jn.author, "")

@property
def license(self):
    return self.get(Jn.license, "")
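Here is a self-contained toy illustration of that defaulting; FileInfo and the key names 'filesize' and 'author' are assumptions for the demo, not the project's actual names:

class FileInfo(JsonDictWrapper):
    @property
    def file_size(self):
        return self.get('filesize', -1)  # assumed JSON key

    @property
    def author(self):
        return self.get('author', "")

f = FileInfo({'filesize': 4096})
print(f.file_size)  # 4096
print(f.author)     # "" instead of a KeyError for the missing key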
class nesting
Nesting classes inside one another may seem a little odd. I decided to do it because the API uses the same name for different objects with different attributes, depending on which remote function you called.
Another advantage: new people can easily understand the structure of the returned JSON.
The end of the file contains various aliases for nested classes for easier access from outside the module.
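For illustration, the aliasing could look roughly like this (names are illustrative, the real module differs):

# at the end of the module: shorter aliases so callers can write
# module.Course instead of module.CourseListResponse.Course
Course = CourseListResponse.Course
# further aliases for the other nested classes follow the same pattern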
adding logic
Now that most of the return values are encapsulated, I wanted to add some more logic to the data for convenience. It also seemed necessary to merge some of the data into a more complete tree containing all the data gathered through several API calls:
- get all "assignments"; each assignment contains many submissions, so:
- for (assignment in assignments) get all "submissions"
- merge the submissions into the corresponding assignment
- now get the grades for the submissions, and so on...
I decided to implement that logic separately, so I just inherit from the "dumb" accessors (full source):
So in this class
class Assignment(MoodleAssignment):
    def __init__(self, data, course=None):
        super().__init__(data)
        self.course = course
        self._submissions = {}  # filled via the submissions setter below
        self._grades = {}       # filled via the grades setter below
these properties do the merging
@property
def submissions(self):
    return self._submissions

@submissions.setter
def submissions(self, data):
    if data is None:
        self._submissions = {}  # nothing to merge, reset the store
        return
    for submission in data:
        sub = Submission(submission, assignment=self)
        if sub.has_content:
            self.submissions[sub.id] = sub

@property
def grades(self):
    return self._grades

@grades.setter
def grades(self, data):
    if data is None:
        self._grades = {}  # nothing to merge, reset the store
        return
    grades = [Grade(g) for g in data]
    for g in grades:
        self.grades[g.user_id] = g
and these implement some logic that can be abstracted from the data:
@property
def is_due(self):
    now = datetime.now()
    return now > self.due_date

@property
def due_date(self):
    return datetime.fromtimestamp(super().due_date)
Even though the setters obscure the data wrangling, they are nice to write and to use: so it is just a trade-off.
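A minimal sketch of how the merge is driven from the outside; fetch_submissions and fetch_grades are placeholders for the real API calls, and assignment.id is assumed to come from the wrapped JSON:

# hypothetical driver code, not taken from the project
for assignment in assignments:  # assignments: a list of Assignment objects
    assignment.submissions = fetch_submissions(assignment.id)  # merges Submission objects
    assignment.grades = fetch_grades(assignment.id)            # merges Grade objects
    if assignment.is_due:
        print(f'{assignment.id} is past its due date')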
Caveat: the logic implementation is not quite what I want it to be; there is a lot of interdependency where there should not be. It grew that way because I did not know enough Python to get the abstractions right and I needed to get things done, so I could do the actual work instead of this tedious task. Now that I know what could have been done, I look at some of that spaghetti and, well... you know the feeling.
Conclusion
Encapsulating the JSON responses in classes proved quite useful to me and to the project's structure, and I am quite happy with it. The rest of the project is fine and working, although some parts are just awful :D Thanks everyone for the feedback, I will be around for questions and remarks.
update: 2019-05-02
As @RickTeachey points out in the comments, Python data classes (DC) can be used here as well. And I forgot to put an update here, since I already did that a while ago and extended it with Python's typing functionality :D
The reason: I got tired of manually checking whether the API documentation I abstracted from was correct, or whether my implementation was. With dataclasses.fields I can check whether the response conforms to my schema, and now I can spot changes in the external API much faster, since the assumptions are checked at runtime during instantiation.
Data classes provide __post_init__(self) as a hook for post-processing after __init__ has completed successfully. Python's type hints are only there to provide hints for static checkers; Python itself does not enforce them. So I built a small system that enforces the types of the data class fields in the post-init phase.
Here is the BaseDC, from which all the other DCs inherit (abbreviated):
import dataclasses as dc
import typing
from dataclasses import dataclass
from typing import List, Optional  # used by the DC definitions below

@dataclass
class BaseDC:
    # log, is_generic and _typecheck_generic are defined elsewhere in the
    # project; the __post_init__ hook that triggers this check is also part
    # of the abbreviated code
    def _typecheck(self):
        for field in dc.fields(self):
            expected = field.type
            f = getattr(self, field.name)
            actual = type(f)
            if expected is list or expected is dict:
                log.warning(f'untyped list or dict in {self.__class__.__qualname__}: {field.name}')
            if expected is actual:
                continue
            if is_generic(expected):
                return self._typecheck_generic(expected, actual)
Fields have an additional metadata attribute that can hold arbitrary information; I use it to store the functions that convert response values, but more on that later.
A basic response wrapper looks like this:
@dataclass
class DCcore_enrol_get_users_courses(BaseDC):
    id: int  # id of course
    shortname: str  # short name of course
    fullname: str  # long name of course
    enrolledusercount: int  # Number of enrolled users in this course
    idnumber: str  # id number of course
    visible: int  # 1 means visible, 0 means hidden course
    summary: Optional[str] = None  # summary
    summaryformat: Optional[int] = None  # summary format (1 = HTML, 0 = MOODLE, 2 = PLAIN or 4 = MARKDOWN)
    format: Optional[str] = None  # course format: weeks, topics, social, site
    showgrades: Optional[int] = None  # true if grades are shown, otherwise false
    lang: Optional[str] = None  # forced course language
    enablecompletion: Optional[int] = None  # true if completion is enabled, otherwise false
    category: Optional[int] = None  # course category id
    progress: Optional[float] = None  # Progress percentage
    startdate: Optional[int] = None  # Timestamp when the course start
    enddate: Optional[int] = None  # Timestamp when the course end

    def __str__(self):
        return f'{self.fullname[0:39]:40} id:{self.id:5d} short: {self.shortname}'


core_enrol_get_users_courses = destructuring_list_cast(DCcore_enrol_get_users_courses)
Responses that are just plain lists gave me trouble at first, because I could not enforce typing on them with a simple List[DCcore_enrol_get_users_courses]. This is where destructuring_list_cast solves the problem for me, which is a bit more involved. We are entering higher-order-function territory:
T = typing.TypeVar('T')

def destructuring_list_cast(cls: typing.Callable[[dict], T]) -> typing.Callable[[list], List[T]]:
    def cast(data: list) -> List[T]:
        if data is None:
            return []
        if not isinstance(data, list):
            raise SystemExit(f'listcast expects a list, you sent: {type(data)}')
        try:
            return [cls(**entry) for entry in data]
        except TypeError as err:
            # the original error handling is abbreviated here; it reports the
            # offending entry and bails out with SystemExit as well
            raise SystemExit(f'listcast for {cls} failed: {err}')

    return cast
It expects a Callable that takes a dict and returns an instance of the class type T, which is what you expect from a constructor or a factory. It returns a Callable that accepts a list; that is cast above. return [cls(**entry) for entry in data] does all the work, building a list of data classes when you call core_enrol_get_users_courses(response.json()). (Throwing SystemExit is not elegant, but it is handled in the upper layers, so it works for me: I want it to fail fast and loudly.)
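Here is a self-contained demo of the cast with a plain dataclass; Point is made up just for this example, the real DCs inherit from BaseDC:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

cast_points = destructuring_list_cast(Point)
print(cast_points([{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]))  # [Point(x=1, y=2), Point(x=3, y=4)]
print(cast_points(None))                                  # []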
Apart from that, it is also used to define nested fields, since the responses can be deeply nested: remember field.metadata.get('castfunc', False) in BaseDC? That is where these two shortcuts come in:
# destructured_cast_field
def dcf(cls):
    return dc.field(metadata={'castfunc': destructuring_list_cast(cls)})


def optional_dcf(cls):
    return dc.field(metadata={'castfunc': destructuring_list_cast(cls)}, default_factory=list)
They are used in cases like this (abbreviated):
@dataclass
class core_files_get_files(BaseDC):
    @dataclass
    class parent(BaseDC):
        contextid: int
        # abbreviated ...

    @dataclass
    class file(BaseDC):
        contextid: int
        component: str
        timecreated: Optional[int] = None  # Time created
        # abbreviated ...

    parents: List[parent] = dcf(parent)
    files: Optional[List[file]] = optional_dcf(file)
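To close the loop, here is my rough reconstruction of how the castfunc metadata gets applied; the actual BaseDC implementation is abbreviated above and may differ in detail:

# hedged sketch, not verbatim project code: inside BaseDC's __post_init__
# the stored castfunc is looked up and applied before the type check runs,
# so nested raw dicts become typed data classes
def __post_init__(self):
    for field in dc.fields(self):
        castfunc = field.metadata.get('castfunc', False)
        if castfunc:
            setattr(self, field.name, castfunc(getattr(self, field.name)))
    self._typecheck()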