# Take Care of your Choropleth Maps

Over the last week I had some fun playing with choropleth maps. Along the way I analyzed the following US poverty map, which was recently published on the Guardian data blog:

To be honest, the first time I saw this map I didn’t think much about it. OK, poverty is highest in the south-central United States, especially near the Mexican border. But recently I used the same data to demonstrate a choropleth map that I created from scratch, and I was really surprised to see a somewhat different picture:

Naturally, I wanted to know where the differences come from and spent some time investigating. I think there are two big failures in the Guardian map (which was made using Google Fusion Tables).

## Don’t mess around with your class limits

The values in the poverty data range from 6.6% to 22.7%, and the map divides them into five classes. If you compute exact equidistant class limits between the minimum and maximum values, you get the following classes (the gray bar indicates the data range):
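Equal-interval limits are straightforward to compute: the class width is (22.7 - 6.6) / 5 = 3.22 percentage points, and each limit is the minimum plus a multiple of that width. A minimal Python sketch (the function name is only for illustration):

```python
def equal_interval_breaks(vmin, vmax, k):
    """Return the k + 1 limits that split [vmin, vmax] into k equal-width classes."""
    width = (vmax - vmin) / k
    return [vmin + i * width for i in range(k + 1)]

# Poverty rates from the post: 6.6% to 22.7%, five classes.
breaks = equal_interval_breaks(6.6, 22.7, 5)
print([round(b, 2) for b in breaks])  # [6.6, 9.82, 13.04, 16.26, 19.48, 22.7]
```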

I’m not sure whether this is the default behaviour of Google Fusion Tables or the editor’s choice, but the Guardian map used the class limits 6-9%, 9-12%, 12-15%, 15-18% and 18-23%. One might think these round numbers are easier to understand than the fractional limits above, but they come at the high price of a distorted class distribution:

Note that the fifth class (which shows the poorest states) is blown up while the first class is a bit under-represented. Given the highly political topic, I’d argue that while we’re trying to map inequality, we should at least use equally distributed classes.
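To see what the rounded limits do, it helps to classify the same value under both sets of limits. This is an illustration, not the logic of Google Fusion Tables: it assumes a value lying exactly on an inner limit belongs to the upper class, and the two rates are made up. Any state between 18% and 19.48% lands in the darkest class under the Guardian limits but in the fourth class under equal intervals, and any state between 9% and 9.82% is pushed out of the lightest class:

```python
from bisect import bisect_right

def classify(value, limits):
    """Return the 1-based class of value for ascending class limits.

    limits holds k + 1 numbers for k classes. A value exactly on an inner
    limit goes to the upper class; values at or beyond the ends are clamped
    into the first or last class.
    """
    k = len(limits) - 1
    idx = bisect_right(limits, value)  # how many limits are <= value
    return min(max(idx, 1), k)

GUARDIAN = [6, 9, 12, 15, 18, 23]               # limits used in the Guardian map
EQUAL = [6.6, 9.82, 13.04, 16.26, 19.48, 22.7]  # equal-interval limits, rounded

for rate in (9.5, 18.5):  # made-up poverty rates, not actual state values
    print(rate, classify(rate, GUARDIAN), classify(rate, EQUAL))
# 9.5 2 1
# 18.5 5 4
```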

## Don’t mess around with your class colors

The second big failure of the map is the choice of colors. These colors were used for the Guardian map:

Obviously there’s a large jump between the first and second colors and an enormous jump between the fourth and fifth colors. The fourth color looks like it was taken from a completely different gradient and is hardly distinguishable from the third. Again, I’m not sure whether this is some kind of default in Google Fusion Tables or whether the colors were just hand-picked.

Instead, in my map I simply used equidistant colors from an HSV gradient:
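Generating such a gradient takes only a few lines with Python’s standard `colorsys` module. This is a sketch under assumptions: the post doesn’t state which hue or saturation/value endpoints were used, so the fixed hue and the `s_range`/`v_range` defaults below are placeholders, with saturation and value interpolated in equal steps:

```python
import colorsys

def hsv_ramp(n, hue=0.0, s_range=(0.1, 1.0), v_range=(1.0, 0.6)):
    """Return n hex colors of one hue, stepping saturation and value linearly.

    hue, s_range and v_range are placeholder values; colorsys expects all
    three HSV components as floats in [0, 1].
    """
    colors = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0  # position along the ramp, 0..1
        s = s_range[0] + t * (s_range[1] - s_range[0])
        v = v_range[0] + t * (v_range[1] - v_range[0])
        r, g, b = colorsys.hsv_to_rgb(hue, s, v)
        colors.append("#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255)))
    return colors

print(hsv_ramp(5))
```

Equal steps in HSV parameters are easy to produce, which is exactly why they are tempting; whether they also look equal is a separate question.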

But, as mentioned in the comments below, even equidistant HSV colors are not the best option. The problem is that human perception of brightness differs from the arithmetic brightness (the V, or value, channel) of HSV colors.

To demonstrate this difference, let’s compare the equidistant HSV colors to a hand-picked color scale from colorbrewer2.org:

Quite a different picture, isn’t it?
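One way to make this perception problem measurable is CIE L*, the lightness coordinate of the CIELAB color space, which is designed to be roughly perceptually uniform. The sketch below converts sRGB colors to L* (D65 white point) and prints the lightness drop between neighboring colors; a ramp that looks even should have roughly equal steps. The HSV ramp parameters are hypothetical, and the ColorBrewer 5-class sequential "Reds" scheme is only an example, not necessarily the scheme used in the map above:

```python
import colorsys

def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in [0, 1]."""
    return tuple(int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def lightness(r, g, b):
    """CIE L* (0 = black, 100 = white) of an sRGB color with channels in [0, 1]."""
    # Relative luminance Y from linear RGB (sRGB primaries, D65 white).
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    delta = 6 / 29
    # CIELAB companding function, with the reference white at Y = 1.
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16

def lightness_steps(colors):
    """L* drop between each pair of neighboring colors, light to dark."""
    ls = [lightness(*c) for c in colors]
    return [round(a - b, 1) for a, b in zip(ls, ls[1:])]

# Hypothetical HSV ramp: hue 0 (red), saturation 0.1 -> 1.0, value 1.0 -> 0.6.
hsv_colors = [colorsys.hsv_to_rgb(0.0, 0.1 + 0.225 * i, 1.0 - 0.1 * i) for i in range(5)]
# ColorBrewer 5-class sequential "Reds", used here only as an example scheme.
reds = [hex_to_rgb(h) for h in ("#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15")]

print("HSV steps:", lightness_steps(hsv_colors))
print("Reds steps:", lightness_steps(reds))
```

Working in L* (or full L*a*b*) rather than HSV is what lets you check a scale against what the eye actually sees instead of against the parameters that generated it.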

Another question is why we should use five classes at all. It’s interesting to see how “dramatically” the picture changes when you change the number of classes. Given that we’re living in the age of interactive maps that let us read data values from tooltips, there’s no reason to be stingy with colors anymore. I think seven classes would be a better trade-off between correctness and color distinguishability.
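With equal-interval classes, the class of a value is floor((value - min) / width), where width = (max - min) / k. A small sketch with two made-up rates shows how the picture can flip: 16% and 17% fall into different classes with five classes but share a class with seven. The function name is hypothetical:

```python
def class_index(value, vmin, vmax, k):
    """0-based equal-interval class of value; vmax itself goes into the top class."""
    width = (vmax - vmin) / k
    return min(int((value - vmin) // width), k - 1)

# Range of the poverty data; 16.0 and 17.0 are made-up rates.
for k in (5, 7):
    print(k, class_index(16.0, 6.6, 22.7, k), class_index(17.0, 6.6, 22.7, k))
# 5 2 3
# 7 4 4
```

More classes do not simply refine the picture: limits move, so some pairs of states that were told apart become indistinguishable and vice versa.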

## Be careful when visualizing non-area-related data using choropleth maps

At the end, I just want to mention another well-known problem of choropleth maps. The visual significance of a particular geographic region depends on its color value multiplied by its area. Thus, a larger region appears more important than an equally colored smaller one. Especially when you’re dealing with non-area-related data, like the poverty of human beings, this can cause additional misinterpretations. One way around this is to use cartograms, which resize geographic regions according to a measurement that is more relevant to the context of the data. For instance, the next image shows a Dorling cartogram (where circles represent regions) that sizes the states according to their 2010 population. This way, we visually relate the poverty rates to the affected population instead of the affected area.
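A Dorling cartogram has two parts: sizing each circle by its value, and then placing the circles so they don’t overlap while staying close to their geographic position. Only the sizing step is sketched here. Because the eye reads a circle’s area, the radius has to grow with the square root of the population; the `max_radius` parameter and the populations are hypothetical:

```python
import math

def dorling_radii(populations, max_radius=50.0):
    """Scale circle radii so that circle *area* is proportional to population.

    Area = pi * r**2, so r must grow with the square root of the value;
    scaling r linearly would exaggerate large states quadratically.
    """
    largest = max(populations.values())
    return {name: max_radius * math.sqrt(pop / largest) for name, pop in populations.items()}

# Hypothetical populations, not census figures.
radii = dorling_radii({"A": 40_000_000, "B": 10_000_000, "C": 2_500_000})
print(radii)  # {'A': 50.0, 'B': 25.0, 'C': 12.5}
```

A state with a quarter of the population gets half the radius, and therefore a quarter of the ink, which is the whole point of the cartogram.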

So, what to do next? For me, the clear answer is that we need better-educated map makers and, perhaps more importantly, better open source tools for thematic mapping. That’s what I’m working on right now.

Update: Jorge Camoes wrote a kind of follow-up post to this one, called “The same data, the same map, different stories”. Make sure to check it out as well.

1. Pingback: Readings: Wk 6 « Omar Bilal Akhtar

2. Michal Zimmermann

Is there an easy way (open source or free) to create a Dorling cartogram? Is it possible to use rectangles instead of circles?
You are probably right that 5 classes are just not enough (I can’t judge, though, as I haven’t seen the distribution function), but Google Fusion Tables doesn’t let you choose more than five intervals.
I don’t think equidistant intervals are always the best choice – I would say they are not. But it depends on the structure of data you have.

3. Pingback: Week 6 Readings « keldyortiz

4. Pingback: Kartor och färger | Richard Öhrvall

5. Seth

Why have classes at all? Why not just color each state according to its data value?

As for which colors, here’s a nice demonstration of how L*a*b* space is a better choice than HSV for choosing colors for data visualization: http://davidad.net/colorviz/

6. Jeff Weir

I think the Dorling cartogram is interesting, but it suffers from the problem that it’s much harder for a human to compare different-sized circles than it is to compare, say, the bars on a bar chart.

By the way, I wrote a guest post on choropleths a while back that might be interesting/relevant: http://chandoo.org/wp/2009/07/24/medicare-chart-critique/

7. Ben

Informative article, thanks!

We have recently been using leaflet.js with underlying CloudMade maps and GeoJSON overlays to produce slippy choropleth maps. We’ve taken the shapefiles into QGIS, simplified them there, then exported them to GeoJSON, which then has the geometry and the property attributes tied in. You can then render them with 2 lines of JavaScript in leaflet.js and colour them using ColorBrewer, for instance.

Well worth having a look at those tools. Also agree that a better thematic map tool chain would be good.

Ben

8. Jorge Camoes

Nice. These are two relevant topics, and not only to map making. I agree with you regarding the use of color. Let me add my two cents regarding class limits.

I’m tempted to say that using equal interval classes in a map is like using alphabetical sort in a bar chart. I would prefer classes that minimize intra-class variation instead of the round-number principle. If you have this sequence: 103, 108, 147, 153 it doesn’t make sense to set a limit at 150.

Dividing a data range into several equal-sized classes is very dangerous. Try to do it with population density in the US and you’ll see what I mean.

This is a relevant topic. Glad you brought it up.

9. Rob Shell

I like the idea of using equidistant HSV values. I’ll definitely be using this as a resource in my next choropleth mapping project.

What’s up with Michigan on your maps? You’ve increased the “visual significance of a particular geographic region” by coloring Lake Michigan.

10. Axel

Interesting to read. I’m working on a choropleth map using Raphael and blank SVG maps from Wikipedia right now for a project. Do you have a hint for a convenient conversion workflow from SVG data to JSON notation?

Nevertheless, I will keep your advice in mind. I was using Google Fusion Tables at first, too, but I want to avoid Flash. Now I have to do the calculations myself.