Serialization (also known as marshalling) and its inverse is a common task for many programmers. For the unfamiliar, serialization is, for the purposes of this post, the process of taking some object in memory, and creating a serial representation of it in some standard format for the purposes of transferring the object to some other process, where the opposite (serial => memory) will take place. Some popular serialization formats include JSON, Google’s Protocol Buffers, YAML, XML, Python’s Pickle, and many others.
In the above list, there can be found two broad categories of serialization formats: human-readable (JSON, YAML, XML), and machine-readable (Protocol Buffers, Pickle). Machine-readable formats are great if you want really fast de/serialization. They’re pretty terrible for humans to read, though. For example, take a look at how Pickle serializes a simple dictionary:
While it’s possible to kind of make out what’s going on, it’s clear to see that once the objects start to become more complicated, and have multiple nested properties, a human reader would quickly be overwhelmed.
Human-readable serialization formats, on the other hand, are expectedly quite nice for humans to read. In JSON, favorite_color would look like this:
YAML is even easier for humans to read:
JSON and YAML are essentially textual representations of arbitrarily deeply-nested trees. Python represents such trees as
json module is included in Python’s standard library, and
py-yaml is easily installed with pip. These modules expose simple APIs that suck in some valid JSON/YAML and spit out a sweet sweet
dict. So for this post, we will work with in-memory
With that, let’s explore a technique to easily define complex Python objects which may be deserialized from nested dictionaries.
To set the stage, here is the problem we would like to solve. Let’s say we have the following complex nested dictionary (as expressed in YAML for ease-of-reading):
We would like to define a set of Python objects to reprsent this nested structure. We would like to be able to access the various attributes naturally as properties, as such:
We would also like for it to be easy to intialize the object, given a dictionary, and we would like that initialization to be typed. That is to say, if we are given a
str instead of an
tyrion’s age, we would like an Exception to be raised.
We should also be able to define methods on each of these nested objects. Perhaps one such example might be
We can start with a naive way to just represent the classes we want, along with some basic (and quite ugly IMO) type checking. You really have to work in order to get make things type-checked in Python.
Now that we’ve got a class structure, how do we get from a dictionary to a
GoTPerson? We would like to write as little additional code as possible, because as you can see, the type checks already added a lot of overhead!
It’d be nice to use something like
Why doesn’t this work? We are assigning a the
dict house to
tyrion’s house attribute! We need to recursively create a
House from that dict (and create any necessary objects from dicts if House has nested objects)…
So we are forced to make something like this for each of our classes…
This works… But it’s pretty ugly, and really verbose! There must be a better way.
Awesome Pythonic Implementation of Greatness
There is a better way! The scaffolding of this implementation is heavily inspired from The Python Cookbook, a fabulous resource for getting a large variety stuff done in a really nice way.
First, we define a base class.
Now, let’s make
Awesome! We can now initialize a GoTPerson as such
Structure class takes care of type checking, as well as recursively initializing any nested objects from dictionaries. This technique has massively simplified the process of deserializing JSON into Python objects in my own code, and I hope it does the same for yours!
There is one final challenge you might want to undertake. Try to cleanly establish type checks on the items of lists, which are not present in our current implementation.