React.js tips and tricks

After working at Gorgias, I've now been lead developer for 3 React.js projects at 3 Series A startups. It's been a lot of fun, and I'm incredibly lucky to have gotten the chance to use React.js to implement RhodeCode's ControlCenter when Facebook had just released it. React.js is blowing up, growing 300% in 2015 alone.

Below are a few of the things I've learned along the way. Hopefully they'll provide you with some Denkanstoß (food for thought).

 

Avoid jQuery components

The ecosystem of jQuery is incredibly rich. Especially if you are looking for a component library to base your brand-spankin'-new web app on, you can choose between Semantic UI, Foundation, and Bootstrap. All 3 are rich & mature and have been used in production by hundreds of companies.

For Semantic UI alone the list of ready-to-use JS modules is: Accordion, Checkbox, Dimmer, Dropdown, Embed, Modal, Nag, Popup, Progress, Rating, Search, Shape, Sidebar, Sticky, Tab, Transition

Who wants to implement all that?

In most of these jQuery-based libraries, browser edge cases are already taken care of. Mark-up and styling are of consistent quality & philosophy. The temptation is great to simply grab one

<div id="magicDropdown" ></div>

and a

$('#magicDropdown').dropdown()

and be on your way, right?

*pregnant pause*

No. No, no. NooO!

After too many times trying to leverage existing jQuery libraries I can confidently say: Don't do it.

In addition, since React wants to have total control over the DOM, your app will generally stop working properly any time jQuery modifies the DOM.

Strangely, Facebook seems to advertise its compatibility by including a jQuery example in their official React repository.

And on some level I agree that you could do this for modals, since they generally render into a completely separate container that is a direct descendant of <body>. There's a clear set of events that your 

But that's also about where it stops. Anything more complex, such as form elements inside a React DOM tree that is constantly changing, will generally lead to memory leaks (which don't sound too bad until you have to deal with them) and strange, unexpected behavior.

Any time savings you gain at the start will be offset by hair pulling later.

 

Roll your own components

I am usually someone who tries very hard to reuse other people's code. I don't like to reinvent the wheel. I'm no fan of not-invented-here.

Let's say you want to implement an infinite list in React.js (let's also assume that infinite lists are a solid UX choice).

You now have the choice between

Which one will you choose?

I can't tell you. After implementing the first 3 in our project I said "screw it" and decided to roll my own. I kept running into problems such as pre-defined markup & style structures that made it hard to style the component in line with the existing set of components and, with seatgeek's implementation in particular, strange performance problems.

The interesting distinction here seems to be that while jQuery defines things as changesets, which generally "work well enough", with React.js you have to define the exact state at any point in time. You are tightly bound by React's render cycle and can't simply "bind a few event handlers to make it do that thing when that other thing happens".

"Cartman, put away the event handler..!"

Since the app you're working on will generally develop its very own look & feel, it can be hard to integrate 3rd party components. Especially as you customize the behavior down the line, it makes a lot of sense to bite the bullet and roll your own.

It's been my ambition for some time to push forward a unified UI framework under which React developers can rally to concentrate their efforts. I'm happy to see CloudFlare push forward in this space with cf-ui, and hope that more people will push competitors as well as extensions to it.


React is not a religion

React & Redux generally front-load a lot of pain.

Wanna have a thing that changes on the page? OK, please define exactly how I should render the entire page under all circumstances.

Want to have a new button? OK, please define a new action identifier, a new action, put the button state into the reducer, as well as the initial state, and now pass the new state through the entire application to where you want it.

Want to have a drag-and-drop interaction? OK, please set aside a day to familiarize yourself with react-dnd and its DragSource, DragLayer, DropTarget & DragDropContext components. Then you can define a collect function that connects the drag source and sets isDragging on the monitor as well as a new ItemType for whatever you want to drag as well as an itemSource which is a plain object with a beginDrag function on it. Now you can use those to instantiate a new DragSource and directly call the class of React component you want to drag around with that.

There's a reason why in raw numbers, jQuery & jQuery UI are still more popular:

var counter = 0
$('.box').click(function() {
  counter += 1
  $(this).text(counter)
})

$('.box').draggable()

This snippet will not win any awards, it isn't pretty, it isn't very maintainable, it wouldn't make it past any half-decent code review. But boy does it feel good while you're hacking on an initial prototype.

React's core philosophy evolved out of the work Facebook's engineers did on XHP, a PHP extension that helped them manage the massive complexity they were dealing with.

As such it only really starts to shine once your project eclipses a certain size & complexity.

And if you obfuscate your JavaScript, your friends probably won't know you are still using jQuery in 2016.

 


Redux is not a religion

Sometimes you want to `setState`

 

Use data-fetching containers

Every component that does rendering has a corresponding data-fetching container (which keeps the rendering component reusable across different apps, the style guide, and tests).

A container does data fetching and then renders its corresponding sub-component. That’s it.


StockWidgetContainer => StockWidget
TagCloudContainer => TagCloud
PartyPooperListContainer => PartyPooperList

Implement presentational containers first


Use immutable data

While there is an initial hurdle, this will make your code easier to reason about, safer to manipulate, and will save hours of debugging. Immutable data structures truly are a godsend in JavaScript's fast-and-loose data handling philosophy.

 

Upgrading to Django REST Framework 3

A client recently asked me to help them with a ticket.

At this point I'd been working with Django and Django REST Framework for years, so I thought to myself: Upgrading a 3rd party library from 2 to 3, easy pickings! Change a few field names and maybe some Serializer method names. I like Serializers. Serializers are cool.

A first

$ pip install -U djangorestframework
$ python manage.py runserver

revealed a large number of assertion errors coming from the DRF 3 source code.

Code like this:

class Field(object):
    def __init__(...):
        # Some combinations of keyword arguments do not make sense.
        assert not (read_only and write_only), NOT_READ_ONLY_WRITE_ONLY
        assert not (read_only and required), NOT_READ_ONLY_REQUIRED
        assert not (required and default is not empty), NOT_REQUIRED_DEFAULT
        assert not (read_only and self.__class__ == Field), USE_READONLYFIELD

was causing errors like

    AssertionError: Instantiate ReadOnlyField instead of Field(read_only=True)

and

    Field.to_representation() must be implemented.
    If you are upgrading from REST framework version 2
    you might want `ReadOnlyField`

Cool, s/Field/ReadOnlyField/ should take care of that.
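
To make the mechanics concrete, here is a toy stand-in for that check. It is not DRF's code: the class bodies are simplified assumptions, the first assertion message is made up for illustration, and only the last assertion and its message mirror the excerpt above. It merely sketches why swapping in ReadOnlyField satisfies the assertion.

```python
# Toy stand-in, NOT the real rest_framework classes: a simplified
# assumption that only mimics the keyword-argument checks quoted above.
USE_READONLYFIELD = 'Instantiate ReadOnlyField instead of Field(read_only=True)'


class Field(object):
    def __init__(self, read_only=False, write_only=False):
        # Hypothetical message; the real constant's text is not shown above.
        assert not (read_only and write_only), 'read_only and write_only clash'
        # The bare base class refuses read_only=True; subclasses are fine.
        assert not (read_only and self.__class__ == Field), USE_READONLYFIELD
        self.read_only = read_only
        self.write_only = write_only


class ReadOnlyField(Field):
    def __init__(self, **kwargs):
        kwargs['read_only'] = True
        super(ReadOnlyField, self).__init__(**kwargs)


def try_build(factory):
    """Return the built field, or the assertion message if building fails."""
    try:
        return factory()
    except AssertionError as exc:
        return str(exc)


old_style = try_build(lambda: Field(read_only=True))  # the DRF 2 habit
new_style = try_build(lambda: ReadOnlyField())        # what DRF 3 asks for
```

Here old_style ends up holding the assertion message, while new_style is a regular field instance with read_only set.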

Time to set some aggressive estimates.

It wasn't until the next day that I ran the entire test suite:

=========================== short test summary info ============================
 341 failed, 4068 passed, 511 skipped, 2 warnings, 533 error in 993.89 seconds =

Hmm, 341 failed + 533 error.

Wait, how many custom Serializers do we have? 104??!

...

A week later I was still sitting on 409 failed + 29 error.

This was another beast altogether.

Every code base is unique, but I've compiled a list of ideas that helped me through the process:

  •  Read the announcement. You'd think this would be obvious, and the announcement actually tells you to read it too, but one thing that would've helped me tremendously was if it had done so in bold red. Read the page! -- Yes. That is better.
  •  Familiarise yourself with the DRF3 source before starting. What exactly do BaseSerializers, Serializers, ModelSerializers do and when do you want to sub-class each one? What's the difference between .data and .validated_data (hint: it's not the validation)? How does validation work now anyway? Why does DRF3 not call Model.clean() anymore when de-serializing? Should you instantiate a throw-away instance in order to run Model(**data).clean() or maybe move the validation code into a more generalised spot? The announcement and DRF3 source code (which is well documented) will help with such questions.
  •  Set all the compatibility flags in the beginning and work backwards. DRF3 coerces beautiful native Date, Time, DateTime and Decimal objects into strings in Serializer output ... setting DATETIME_FORMAT, DATE_FORMAT, TIME_FORMAT to None and COERCE_DECIMAL_TO_STRING to False respectively will revert such behaviour.
  •  Get to green as quickly as possible. I started working through the test failures on a submodule-by-submodule basis, but midway through I went rambo: I left small optimisations, refactors & cleanups by the wayside and went into hack-mode, spraying code and TODOs wherever I went in an effort to "just get to green" on my test build and have a solid basis to work from. After this was done, I also had a better overview of which problems kept recurring and required careful refactoring and which ones were one-offs that could be left as special cases.
  •  Leave bugs that confuse you for later. When all of a sudden completely unrelated code starts breaking, it might be a good idea to leave it and focus on the more concrete stuff first. This ties in with "Get to green as quickly as possible". Some things are obviously wrong behaviour, while others can simply be a special case that was intended but undocumented, or a new DRF3 behaviour, or an old DRF2 behaviour that's not possible anymore, or a different output format causing something else to behave differently in subtle ways, and so on and so forth.

Here's the example of a create flow for one of the Serializers

You would probably be right to assume that this can and will break in many places if you completely re-architect the APIs and underlying philosophies of the code that the white and gold (no wait, blue and purple) rows were built upon.

Imagine the CustomAddressField throws a ValidationError. After tracing through the code you find that, while the value originates in gen_contact_details(), it sometimes gets modified in fill_project_data(), a utility function used widely throughout the 300kLOC code base. What is this mysterious value supposed to be? Searching through the code base, you find some other function that tries to cast this value into an int() after calling fill_project_data() but leaves it be in case it can't. This means it must be str or int in most cases, maybe a float. Since we're using dynamically typed Python, there's a lot of guess-work involved. Leaving hard-to-debug errors for later gives you more time to get to know your own code base and those of DRF2 and DRF3.

A lot of the time opaque high-level failures will resolve themselves when you get to the more fine-grained tests and are able to quickly pin-point & solve issues at a lower level.

  •  Ration your willpower. Carefully stepping through nested upon nested levels of testing + view + serializer + DRF3 source code is mentally taxing work and I decided to reserve my most productive phases of the day for the heavier mental lifting while picking out easier tasks otherwise. There's no shame in switching to a more pedestrian JavaScript bug late in the day, when the same time tomorrow could also be spent working on something tougher.
  •  Don't be afraid to go against what you see in the DRF3 source. A lot of the implicit behaviours in DRF 2 have been replaced by stricter and more explicit ones in DRF3. I enjoy code-philosophical discourse as much as the next guy, but sometimes re-implementing a bit of woo-woo and magic can be an easier way to keep API consistency than to try and shove a DRF2-grown cactus into a DRF3-shaped pot.
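
For the compatibility flags mentioned in the list above, the settings fragment might look roughly like this. It is a sketch, assuming the flags live in the usual REST_FRAMEWORK dict in your Django settings module; your project may already define that dict with other keys.

```python
# settings.py (fragment). The four keys are the ones named in the list
# above; merging them into an existing REST_FRAMEWORK dict is up to you.
REST_FRAMEWORK = {
    # None => serializers hand back native date/time objects, not strings
    'DATETIME_FORMAT': None,
    'DATE_FORMAT': None,
    'TIME_FORMAT': None,
    # False => Decimal values stay Decimal instead of being coerced to str
    'COERCE_DECIMAL_TO_STRING': False,
}
```

Once the test suite is green you can drop the flags one at a time and deal with each batch of failures separately.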

Was it worth it?

For me personally: Yes. While altogether a rather bumpy & long ride, I believe it's often the things we don't enjoy at first that help us grow. Serializers were one of the areas I was less familiar with and I now - quite literally - know every nook and cranny of the small little beasts.

Like other big projects the reasons for upgrading from version 2 to 3 are not quite obvious. Under the hood it's a lot nicer. And there's quite a few goodies. At the same time all this goes against the maxim of "If it ain't broke, don't fix it".

I wonder if there's something about working on large open-source code bases, where a v3 makes developers & maintainers snap and spiral into a we-do-things-like-this-now frenzy because working with the old code day-in-day-out simply becomes too intellectually & aesthetically insulting.

 

 

 

7 Python Libraries you should know about

In my years of programming in Python and roaming around GitHub's Explore section, I've come across a few libraries that stood out to me as being particularly enjoyable to use. This blog post is an attempt to further spread that knowledge.

I decided to exclude awesome libraries like requests, SQLAlchemy, Flask, fabric etc. because I think they're already pretty "main-stream". If you know what you're trying to do, it's almost guaranteed that you'll stumble over the aforementioned. This is a list of libraries that in my opinion should be better known, but aren't.

1. pyquery (with lxml)

pip install pyquery

For parsing HTML in Python, Beautiful Soup is oft recommended and it does a great job. It sports a good pythonic API and it's easy to find introductory guides on the web. All is good in parsing-land .. until you want to parse more than a dozen documents at a time and immediately run head-first into performance problems. It's - simply put - very, very slow.

Just how slow? Check out this chart from the excellent Python HTML Parser comparison Ian Bicking compiled in 2008

What immediately stands out is how fast lxml is. Compared to Beautiful Soup, the lxml docs are pretty sparse and that's what originally kept me from adopting this mustang of a parsing library. lxml is pretty clunky to use. Yeah, you can learn and use XPath or cssselect to select specific elements out of the tree and it becomes kind of tolerable. But once you've selected the elements that you actually want to get, you have to navigate the labyrinth of attributes lxml exposes, some containing the bits you want to get at, but the vast majority just returning None. This becomes easier after a couple dozen uses but it remains unintuitive.

So either slow and easy to use or fast and hard to use, right?

Wrong!

Enter PyQuery

Oh PyQuery you beautiful seductress:

from pyquery import PyQuery
page = PyQuery(some_html)

last_red_anchor = page('#container > a.red:last')

Easy as pie. It's ever-beloved jQuery but in Python!

There are some gotchas, like for example that PyQuery, like jQuery, exposes its internals upon iteration, forcing you to re-wrap:

for paragraph in page('#container > p'):
    paragraph = PyQuery(paragraph)
    text = paragraph.text()

That's a wart the PyQuery creators ported over from jQuery (where they'd fix it if it didn't break compatibility). Understandable but still unfortunate for such a great library.

2. dateutil

pip install python-dateutil

Handling dates is a pain. Thank god dateutil exists. I won't even go near parsing dates without trying dateutil.parser first:

from dateutil.parser import parse

>>> parse('Mon, 11 Jul 2011 10:01:56 +0200 (CEST)')
datetime.datetime(2011, 7, 11, 10, 1, 56, tzinfo=tzlocal())

# fuzzy ignores unknown tokens

>>> s = """Today is 25 of September of 2003, exactly
...        at 10:49:41 with timezone -03:00."""
>>> parse(s, fuzzy=True)
datetime.datetime(2003, 9, 25, 10, 49, 41,
                  tzinfo=tzoffset(None, -10800))

Another thing that dateutil does for you, that would be a total pain to do manually, is recurrence:

>>> from datetime import datetime
>>> from dateutil.rrule import rrule, DAILY, TU, TH
>>> list(rrule(DAILY, count=3, byweekday=(TU,TH),
...            dtstart=datetime(2007,1,1)))
[datetime.datetime(2007, 1, 2, 0, 0),
 datetime.datetime(2007, 1, 4, 0, 0),
 datetime.datetime(2007, 1, 9, 0, 0)]
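
To get a feel for what rrule saves you, here is roughly the same recurrence done by hand with only the standard library. This is a sketch: daily_on_weekdays is a hypothetical helper written for this post, and it covers just this one case, whereas rrule handles intervals, end dates, month rules and much more.

```python
from datetime import datetime, timedelta

TUESDAY, THURSDAY = 1, 3  # datetime.weekday() counts Monday as 0


def daily_on_weekdays(start, weekdays, count):
    """Yield the first `count` days on or after `start` that fall on `weekdays`."""
    day = start
    found = 0
    while found < count:
        if day.weekday() in weekdays:
            yield day
            found += 1
        day += timedelta(days=1)


# Same inputs as the rrule call above: start on 2007-01-01, want 3 hits.
occurrences = list(daily_on_weekdays(datetime(2007, 1, 1), {TUESDAY, THURSDAY}, 3))
```

It works, but every additional rule (every other week, until a given date, only in certain months) means more hand-written loop logic.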

3. fuzzywuzzy

pip install fuzzywuzzy

fuzzywuzzy allows you to do fuzzy comparison on wuzzes, er, strings. This has a whole host of use cases and is especially nice when you have to deal with human-generated data.

Consider the following code that uses the Levenshtein distance comparing some user input to an array of possible choices.

from Levenshtein import distance

countries = ['Canada', 'Antarctica', 'Togo', ...]

def choose_least_distant(element, choices):
    'Return the one element of choices that is most similar to element'
    return min(choices, key=lambda s: distance(element, s))

user_input = 'canaderp'
choose_least_distant(user_input, countries)
>>> 'Canada'

This is all nice and dandy but we can do better. The ocean of 3rd party libs in Python is so vast, that in most cases we can just import something and be on our way:

from fuzzywuzzy import process

process.extractOne("canaderp", countries)
>>> ("Canada", 97)

More has been written about fuzzywuzzy here.
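
If you are curious what the Levenshtein distance used above actually computes, here is a dependency-free sketch. The levenshtein function is my own illustrative implementation, not the one from the Levenshtein package, and it does not reproduce fuzzywuzzy's scoring; the country list is shortened to the three names shown earlier.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes or substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def choose_least_distant(element, choices):
    """Return the one element of choices that is most similar to element."""
    return min(choices, key=lambda s: levenshtein(element, s))


countries = ['Canada', 'Antarctica', 'Togo']  # shortened sample list
best = choose_least_distant('canaderp', countries)
```

Note that this comparison is case-sensitive, so the lowercase "c" in the input already counts as one edit against "Canada".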

4. watchdog

pip install watchdog

watchdog is a Python API plus a set of shell utilities for monitoring file system events. This means you can watch some directory and define a "push-based" system. Watchdog supports all kinds of problems. It's a solid piece of engineering that does this much better than the 5 or so libraries I tried before finding out about it.

5. sh

pip install sh

sh allows you to call any program as if it were a function:

from sh import git, ls, wc

# check out the master branch
git.checkout("master")

# print the contents of this directory
print(ls("-l"))

# get the length of the longest line of this file (wc -L, as in GNU wc)
longest_line_length = wc(__file__, "-L")

6. pattern

pip install pattern

This behemoth of a library advertises itself quite modestly:

Pattern is a web mining module for the Python programming language.

... that does Data Mining, Natural Language Processing, Machine Learning and Network Analysis all in one. I have yet to play with it myself, but a friend's verdict was very positive.

7. path.py

pip install path.py

When I first learned Python, os.path was my least favorite part of the stdlib.

Even something as simple as creating a list of files in a directory turned out to be grating:

import os

some_dir = '/some_dir'
files = []

for f in os.listdir(some_dir):
    files.append(os.path.join(some_dir, f))

That listdir is in os and not os.path is unfortunate and unexpected and one would really hope for more from such a prominent module. And then all this manual fiddling for what really should be as simple as possible.

But with the power of path, handling file paths becomes fun again:

from path import path

some_dir = path('/some_dir')

files = some_dir.files()

Done!

Other goodies include:

>>> path('/').owner
'root'

>>> path('a/b/c').splitall()
[path(''), 'a', 'b', 'c']

# overriding __div__
>>> path('a') / 'b' / 'c'
path('a/b/c')

>>> path('ab/c').relpathto('ab/d/f')
path('../d/f')

Best part of it all? path subclasses Python's str so you can use it completely guilt-free without constantly being forced to cast it to str and worrying about libraries that check isinstance(s, basestring) (or even worse isinstance(s, str)).

That's it! I hope I was able to introduce you to some libraries you didn't know before.