Upgrading to Django REST Framework 3

A client recently asked me to help them with a ticket

At this point I'd been working with Django and Django REST Framework for years, so I thought to myself: Upgrading a 3rd party library from 2 to 3, easy pickings! Change a few field names and maybe some Serializer method names. I like Serializers. Serializers are cool.

A first

$ pip install -U djangorestframework
$ python manage.py runserver

revealed a large number of assertion errors coming from the DRF 3 source code.

Code like this:

class Field(object):
    def __init__(...):
        # Some combinations of keyword arguments do not make sense.
        assert not (read_only and write_only), NOT_READ_ONLY_WRITE_ONLY
        assert not (read_only and required), NOT_READ_ONLY_REQUIRED
        assert not (required and default is not empty), NOT_REQUIRED_DEFAULT
        assert not (read_only and self.__class__ == Field), USE_READONLYFIELD

was causing errors like

    AssertionError: Instantiate ReadOnlyField instead of Field(read_only=True)

and

    Field.to_representation() must be implemented.
    If you are upgrading from REST framework version 2
    you might want `ReadOnlyField`

Cool, s/Field/ReadOnlyField/ should take care of that.
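
Taken literally, s/Field/ReadOnlyField/ would also rewrite CharField, IntegerField and friends, so the substitution has to be anchored. A minimal sketch, assuming the serializers refer to the class as serializers.Field (the helper name and the sample source are made up for illustration):

```python
import re

# Matches only the bare `serializers.Field(` call, not `serializers.CharField(`.
# Assumes fields are referenced via the `serializers.` module prefix.
BARE_FIELD = re.compile(r"\bserializers\.Field\(")


def use_read_only_field(source: str) -> str:
    """Rewrite bare Field(...) instantiations to ReadOnlyField(...)."""
    return BARE_FIELD.sub("serializers.ReadOnlyField(", source)


# Hypothetical DRF2-style serializer source, used only as sample input.
old_source = """
class ProjectSerializer(serializers.Serializer):
    owner = serializers.Field(source='owner.username')
    title = serializers.CharField(max_length=100)
"""

new_source = use_read_only_field(old_source)
print(new_source)
```

Code that imports Field directly (from rest_framework.serializers import Field) would need a different pattern, so treat this as a starting point rather than a complete migration script.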

Time to set some aggressive estimates:

It wasn't until the next day that I ran the entire test suite:

=========================== short test summary info ============================
 341 failed, 4068 passed, 511 skipped, 2 warnings, 533 error in 993.89 seconds =

Hmm, 341 failed + 533 error.

Wait, how many custom Serializers do we have? 104??!

...

A week later I was still sitting on 409 failed + 29 error.

This was another beast altogether.

Every code base is unique, but I've compiled a list of ideas that helped me through the process:

  • Read the announcement. You'd think this would be obvious and the announcement actually tells you to read it too, but one thing that would've helped me tremendously was if it had done so in bold red. Read the page! -- Yes. That is better.
  • Familiarise yourself with the DRF3 source before starting. What exactly do BaseSerializers, Serializers, ModelSerializers do and when do you want to sub-class each one? What's the difference between .data and .validated_data? (Hint: it's not the validation.) How does validation work now anyway? Why does DRF3 not call Model.clean() anymore when de-serializing? Should you instantiate a throw-away in order to run Model(**data).clean(), or maybe move the validation code into a more generalised spot? The announcement and DRF3 source code (which is well documented) will help with such questions.
  • Set all the compatibility flags in the beginning and work backwards. DRF3 coerces beautiful native Date, Time, DateTime and Decimal objects into strings in Serializer output ... setting DATETIME_FORMAT, DATE_FORMAT, TIME_FORMAT to None and COERCE_DECIMAL_TO_STRING to False respectively will revert such behaviour.
  • Get to green as quickly as possible. I started working through the test failures on a submodule-by-submodule basis, but midway through I went rambo: I left small optimisations, refactors & cleanups by the wayside and went into hack-mode, spraying code and TODOs wherever I went in an effort to "just get to green" on my test build and have a solid basis to work from. After this was done, I also had a better overview of which problems kept recurring and required careful refactoring, and which ones were one-offs that could be left as special cases.
  • Leave bugs that confuse you for later. When all of a sudden completely unrelated code starts breaking, it might be a good idea to leave it and focus on the more concrete stuff first. This ties in with Get to green as quickly as possible. Some things are obviously wrong behaviour, while others can simply be a special case that was intended but undocumented, or a new DRF3 behaviour, or an old DRF2 behaviour that's not possible anymore, or a different output format causing something else to behave differently in subtle ways, and so on and so forth.
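
The compatibility flags from the list above live in the REST_FRAMEWORK dict in the Django settings module. A sketch of that fragment, using only the four settings named above; which settings file it belongs in depends on your project:

```python
# settings.py (fragment)
REST_FRAMEWORK = {
    # None means: hand back native datetime/date/time objects instead of
    # formatted strings in serializer output.
    "DATETIME_FORMAT": None,
    "DATE_FORMAT": None,
    "TIME_FORMAT": None,
    # Keep Decimal values as Decimal rather than coercing them to strings.
    "COERCE_DECIMAL_TO_STRING": False,
}
```

Working backwards then means removing one flag at a time and fixing whatever tests start failing.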

Here's the example of a create flow for one of the Serializers

You would probably be right to assume that this can and will break in many places if you completely re-architect the APIs and underlying philosophies of the code the white and gold blue and purple rows were built upon.

Imagine the CustomAddressField throws a ValidationError. After tracing through the code you find that, while the value originates in gen_contact_details(), it sometimes gets modified in fill_project_data(), a utility function used widely throughout the 300kLOC code base. What is this mysterious value supposed to be? Searching through the code base, you find that some other function tries to cast this value to an int after calling fill_project_data(), but leaves it be in case it can't. This means it must be a str or int in most cases, maybe a float. Since we're using dynamically typed Python, there's a lot of guess-work involved. Leaving hard-to-debug errors for later gives you more time to get to know your own code base and those of DRF2 and DRF3.
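
The "cast to int, otherwise leave it alone" behaviour described above looks roughly like the following. This is a reconstruction for illustration, not the client's code, and the function name is made up:

```python
def int_if_possible(value):
    """Return int(value) when the cast works, otherwise the value unchanged."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Not int-like (e.g. "n/a" or None): pass it through untouched.
        return value


# The same call site can end up with an int, a str or None afterwards.
for raw in ("42", 42, 3.7, "n/a", None):
    print(repr(raw), "->", repr(int_if_possible(raw)))
```

Note that a float silently loses its fractional part here, which is part of why the value's intended type is so hard to infer from the call sites alone.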

A lot of the time opaque high-level failures will resolve themselves when you get to the more fine-grained tests and are able to quickly pin-point & solve issues at a lower level.

  • Ration your willpower. Carefully stepping through nested upon nested levels of testing + view + serializer + DRF3 source code is mentally taxing work and I decided to reserve my most productive phases of the day for the heavier mental lifting while picking out easier tasks otherwise. There's no shame in switching to a more pedestrian JavaScript bug late in the day, when the same time tomorrow could also be spent working on something tougher.
  • Don't be afraid to go against what you see in the DRF3 source. A lot of the implicit behaviours in DRF2 have been replaced by stricter and more explicit ones in DRF3. I enjoy code-philosophical discourse as much as the next guy, but sometimes re-implementing a bit of woo-woo and magic can be an easier way to keep API consistency than to try and shove a DRF2-grown cactus into a DRF3-shaped pot.
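
One piece of DRF2 magic that can be put back is the Model.clean() call mentioned earlier. A minimal sketch, assuming a ModelSerializer whose validated fields map one-to-one onto model constructor arguments (no many-to-many or nested fields). The mixin name is hypothetical, and the stand-in classes at the bottom exist only so the snippet runs without Django installed:

```python
class ModelCleanMixin:
    """Call Model.clean() on a throw-away instance during validation.

    Intended to be listed before serializers.ModelSerializer in the bases
    of a serializer class. The name is made up for this sketch.
    """

    def validate(self, attrs):
        attrs = super().validate(attrs)
        # Build an unsaved instance purely to run model-level validation.
        self.Meta.model(**attrs).clean()
        return attrs


# --- Stand-ins so the sketch runs on its own (not real Django/DRF) ---
class FakeSerializer:
    def validate(self, attrs):
        return attrs


class FakeProject:
    def __init__(self, **fields):
        self.__dict__.update(fields)

    def clean(self):
        if self.end < self.start:
            raise ValueError("end must not be before start")


class ProjectSerializer(ModelCleanMixin, FakeSerializer):
    class Meta:
        model = FakeProject


print(ProjectSerializer().validate({"start": 1, "end": 2}))
```

In a real project clean() would raise Django's ValidationError instead of ValueError; how that surfaces in the API response is worth covering with a test of its own.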

Was it worth it?

For me personally: Yes. While altogether a rather bumpy & long ride, I believe it's often the things we don't enjoy at first that help us grow. Serializers were one of the areas I was less familiar with and I now - quite literally - know every nook and cranny of the small little beasts.

As with other big projects, the reasons for upgrading from version 2 to 3 are not quite obvious. Under the hood it's a lot nicer. And there are quite a few goodies. At the same time all this goes against the maxim of "If it ain't broke, don't fix it".

I wonder if there's something about working on large open-source code-bases, where a v3 makes developers & maintainers snap and spiral into a we-do-things-like-this-now frenzy because working with the old code day-in-day-out simply becomes too intellectually & aesthetically insulting.