Take Care of your Choropleth Maps

Over the last week I had some fun playing with choropleth maps. Thereby I analyzed the following US poverty map, which was recently published at the Guardian data blog:

To be honest, the first time I saw this map I didn’t thought much about it. Ok, poverty is highest in south central of the United States, especially near Mexican border. But recently I used the same data to demonstrate a choropleth map that I created from-scratch and I was really surprised to see a somewhat different picture:

Naturally, I wanted to know where the differences come from and spent some time to investigate. Actually, I think there are two big fails in the Guardian map (which was made using Google Fusion tables).

Don’t mess around with your class limits

The values in the poverty data range from 6.6% to 22.7% and the map shows them divided into five classes. If one would compute the exact equidistant class limits between the minimum and maximum value one would come up with the following classes (the gray bar is used to indicate the data range):

I’m not sure if this is the default behaviour of Google Fusion Tables or the editors choice, but the Guardian map used the class limits 6-9%, 9-12%, 12-15%, 15-18% and 18-23%. Due to the round numbers one might think that they are easier to understand than the fractioned numbers above, but this comes at the high price of distorted class distribution:

Note that the fifth class (which shows the poorest states) is blown up while the first class is a bit under-represented. Given the highly political topic, I’d argue that while we’re trying to map inequality, we should at least use equally distributed classes.

Don’t mess around with your class colors

The second big failure of the map is the choice of colors. This colors were used for the Guardian map:

Obviously there’s a large jump between the first and second class and an enormous jump between the fourth and fifth color. The fourth color looks like taken from a completely different gradient and is hardly distinguishable from the third color. Again, I’m not sure if this is some kind of default in Google Fusion tables, but maybe they were just hand-picked.

Instead, in my map I simply used equidistant colors from a HSV gradient:

But, as mentioned in the comments below, even equidistant HSV colors are not the best option. The problem is that humans perception of brightness differs from the arithmetical lightness of HSV colors.

To demonstrate this difference, let’s compare the equidistant HSV colors to a hand-picked color scale from colorbrewer2.org:

Quite a different picture, isn’t it?

And better think twice about your class count

Another question is why we should use five classes at all. It’s kind of interesting to see how “dramatically” the picture changes if one changes the number of classes. Given the fact that we’re living in the age of interactive maps that allow us to read data values from tooltips, there’s no more reason to be stingy with colors. At least, I think a number of seven classes should be a better trade-of between correctness and color distinguishability.

Be careful when visualizing non-area-related data using choropleth maps

At the end I just want to mention another well-known problem of choropleth maps. The visual significance of a particular geographic region depends on the color value multiplied with the area of that region. Thus, a larger but equally colored region appears more important than a smaller one. Especially when you’re dealing with non-area related data, like the poverty of human beings, this might cause additional mis-interpretations. One way to get around this is to use cartograms, which aim to resize geographic regions according to a measurement that has more relevance to the context of the data. For instance, in the next image you can see a Dorling cartogram (where circles represent regions) that sizes the states according to 2010 population. This way, we visually relate the poverty rates to the affected population instead of the affected area.

So, what to do next? For me, the clear answer is that we need better educated map makers and, perhaps more importantly, we need better open source tools for thematic mapping. That’s what I’m kind of working on right now..

Update: Jorge Camoes wrote a kind of follow-up post to this one, called The same data, the same map, different stories. Make sure to check it out as well.