Take Care of your Choropleth Maps

Over the last week I had some fun playing with choropleth maps. In the process I analyzed the following US poverty map, which was recently published on the Guardian data blog:

To be honest, the first time I saw this map I didn’t think much about it. OK, poverty is highest in the south-central United States, especially near the Mexican border. But recently I used the same data to demonstrate a choropleth map that I created from scratch, and I was really surprised to see a somewhat different picture:

Naturally, I wanted to know where the differences come from and spent some time investigating. I think there are two big failures in the Guardian map (which was made using Google Fusion Tables).

Don’t mess around with your class limits

The values in the poverty data range from 6.6% to 22.7%, and the map shows them divided into five classes. If one computed the exact equidistant class limits between the minimum and maximum values, one would get the following classes (the gray bar indicates the data range):
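For readers who want to reproduce these limits, here is a minimal Python sketch of the equal-interval calculation. The function name `equal_interval_breaks` is made up for illustration; only the minimum, maximum and class count come from the text above.

```python
def equal_interval_breaks(minimum, maximum, n_classes):
    """Return n_classes + 1 equidistant class limits from minimum to maximum."""
    width = (maximum - minimum) / n_classes
    return [minimum + i * width for i in range(n_classes + 1)]

# Poverty rates range from 6.6% to 22.7% and are split into five classes.
breaks = equal_interval_breaks(6.6, 22.7, 5)
print([round(b, 2) for b in breaks])
# prints [6.6, 9.82, 13.04, 16.26, 19.48, 22.7]
```

Each class is 3.22 percentage points wide, which is where the "fractional" limits come from.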

I’m not sure if this is the default behaviour of Google Fusion Tables or the editors’ choice, but the Guardian map used the class limits 6-9%, 9-12%, 12-15%, 15-18% and 18-23%. Because of the round numbers, one might think that they are easier to understand than the fractional numbers above, but this comes at the high price of a distorted class distribution:

Note that the fifth class (which shows the poorest states) is blown up to five percentage points, while the first class, which starts below the data minimum of 6.6%, is a bit under-represented. Given the highly political topic, I’d argue that while we’re trying to map inequality, we should at least use equally sized classes.
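One way to put numbers on this distortion is to ask what share of the actual data range (6.6% to 22.7%) each Guardian class covers. The following is only a sketch of that reading; the helper `covered_share` is a hypothetical name, and the class limits are the ones quoted above.

```python
def covered_share(limits, data_min, data_max):
    """Share of the data range [data_min, data_max] covered by each class."""
    data_range = data_max - data_min
    shares = []
    for lower, upper in zip(limits[:-1], limits[1:]):
        # Only the part of the class that overlaps the data range counts.
        overlap = max(0.0, min(upper, data_max) - max(lower, data_min))
        shares.append(overlap / data_range)
    return shares

guardian_limits = [6, 9, 12, 15, 18, 23]
shares = covered_share(guardian_limits, 6.6, 22.7)
print([f"{s:.0%}" for s in shares])
```

With equidistant limits every class would cover 20% of the data range. With the rounded limits the first class covers roughly 15% and the fifth roughly 29%.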

Don’t mess around with your class colors

The second big failure of the map is the choice of colors. These colors were used for the Guardian map:

Obviously there’s a large jump between the first and second colors and an enormous jump between the fourth and fifth. The fourth color looks as if it were taken from a completely different gradient and is hardly distinguishable from the third. Again, I’m not sure if this is some kind of default in Google Fusion Tables, or if the colors were just hand-picked.

Instead, in my map I simply used equidistant colors from an HSV gradient:
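As a rough illustration of what "equidistant colors from an HSV gradient" can mean, the sketch below interpolates linearly between two HSV colors using Python's standard `colorsys` module. The endpoint colors and the function name `hsv_gradient` are assumptions chosen for the example, not the values used for the map.

```python
import colorsys

def hsv_gradient(start_hsv, end_hsv, n_classes):
    """Return n_classes hex colors evenly spaced between two HSV colors.

    HSV components are floats in 0..1; n_classes must be at least 2.
    """
    colors = []
    for i in range(n_classes):
        t = i / (n_classes - 1)  # 0.0 for the first class, 1.0 for the last
        h, s, v = (lo + t * (hi - lo) for lo, hi in zip(start_hsv, end_hsv))
        r, g, b = colorsys.hsv_to_rgb(h, s, v)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

# Hypothetical endpoints: a pale blue and a dark blue of the same hue.
light = (0.6, 0.1, 1.0)
dark = (0.6, 1.0, 0.4)
colors = hsv_gradient(light, dark, 5)
print(colors)
```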

But, as mentioned in the comments below, even equidistant HSV colors are not the best option. The problem is that human perception of brightness differs from the arithmetic lightness of HSV colors.
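The mismatch can be checked numerically by converting colors to CIE L*, a lightness scale designed to follow perception more closely. The sketch below assumes sRGB colors and the standard D65 luminance weights; the function names and the blue ramp are illustrative only.

```python
import colorsys

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one component in 0..1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def cie_lightness(r, g, b):
    """Approximate CIE L* (0 = black, 100 = white) of an sRGB color."""
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    return 116 * y ** (1 / 3) - 16 if y > 0.008856 else 903.3 * y

# Same HSV saturation and value, very different perceived lightness.
yellow = colorsys.hsv_to_rgb(1 / 6, 1.0, 1.0)
blue = colorsys.hsv_to_rgb(2 / 3, 1.0, 1.0)
print(round(cie_lightness(*yellow)), round(cie_lightness(*blue)))

# L* steps along a hypothetical ramp with equal steps in S and V.
ramp = [colorsys.hsv_to_rgb(0.6, 0.1 + 0.225 * i, 1.0 - 0.15 * i)
        for i in range(5)]
lightness = [cie_lightness(*rgb) for rgb in ramp]
print([round(b - a, 1) for a, b in zip(lightness, lightness[1:])])
```

Fully saturated yellow and blue share the HSV value 1.0, yet their L* values are about 97 and 32. The second print shows how far the lightness steps of the ramp are from being equal, even though its HSV steps are.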

To demonstrate this difference, let’s compare the equidistant HSV colors to a hand-picked color scale from colorbrewer2.org:

Quite a different picture, isn’t it?

And better think twice about your class count

Another question is why we should use five classes at all. It’s interesting to see how “dramatically” the picture changes if one changes the number of classes. Given that we’re living in the age of interactive maps that let us read data values from tooltips, there’s no longer any reason to be stingy with colors. I think seven classes would be a better trade-off between correctness and color distinguishability.
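To see how the class count changes where a state ends up, the following sketch assigns a value to an equal-interval class for any number of classes. The function `class_index` and the example rate of 15.0% are hypothetical; the data range is the one from this post.

```python
from bisect import bisect_right

def class_index(value, minimum, maximum, n_classes):
    """Zero-based index of the equal-interval class that contains value."""
    width = (maximum - minimum) / n_classes
    # Interior limits only, so the minimum maps to class 0
    # and the maximum maps to class n_classes - 1.
    interior = [minimum + i * width for i in range(1, n_classes)]
    return bisect_right(interior, value)

rate = 15.0  # hypothetical poverty rate in percent
for n in (5, 7):
    print(n, "classes -> class", class_index(rate, 6.6, 22.7, n) + 1)
```

With five classes this rate falls into the third class (13.04% to 16.26%), with seven classes into the fourth (13.5% to 15.8%), so the same value sits in a narrower band.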

Be careful when visualizing non-area-related data using choropleth maps

Finally, I just want to mention another well-known problem of choropleth maps. The visual significance of a particular geographic region depends on the color value multiplied by the area of that region. Thus, a larger but equally colored region appears more important than a smaller one. Especially when you’re dealing with non-area-related data, like the poverty of human beings, this might cause additional misinterpretations. One way to get around this is to use cartograms, which aim to resize geographic regions according to a measurement that has more relevance to the context of the data. For instance, in the next image you can see a Dorling cartogram (where circles represent regions) that sizes the states according to 2010 population. This way, we visually relate the poverty rates to the affected population instead of the affected area.
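The sizing step of such a cartogram can be sketched in a few lines: if the circle area should be proportional to population, the radius has to grow with the square root of the population. This covers only the scaling, not the layout step that keeps circles from overlapping, and the region names and population figures below are made up for illustration.

```python
import math

def circle_radius(population, max_population, max_radius=40.0):
    """Radius such that circle area is proportional to population."""
    return max_radius * math.sqrt(population / max_population)

# Hypothetical regions and populations, not real census figures.
populations = {"Region A": 16_000_000, "Region B": 4_000_000}
largest = max(populations.values())
radii = {name: circle_radius(p, largest) for name, p in populations.items()}
print(radii)
```

A region with a quarter of the population gets half the radius, so its circle covers a quarter of the area.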

So, what to do next? For me, the clear answer is that we need better-educated map makers and, perhaps more importantly, better open source tools for thematic mapping. That’s what I’m working on right now.

Update: Jorge Camoes wrote a kind of follow-up post to this one, called The same data, the same map, different stories. Make sure to check it out as well.

22 Comments

  4. Michal Zimmermann

    Is there an easy way (open source or free) to create a Dorling cartogram? Is it possible to use rectangles instead of circles?
    You are probably right that 5 classes are just not enough (I can’t judge it, though, as I haven’t seen the distribution function), but Google Fusion Tables doesn’t let you choose more than five intervals.
    I don’t think equidistant intervals are always the best choice – I would say they are not. But it depends on the structure of the data you have.

  11. Seth

    Why have classes at all? Why not just color each state according to its data value?

    As for which colors, here’s a nice demonstration of how L*a*b* space is a better choice than HSV for choosing colors for data visualization: http://davidad.net/colorviz/

  12. Jeff Weir

    I think the Dorling cartogram is interesting, but it suffers from the problem that it’s much harder for a human to compare different-sized circles than it is to compare, say, the bars on a bar chart.

    By the way, I wrote a guest post on choropleths a while back that might be interesting/relevant: http://chandoo.org/wp/2009/07/24/medicare-chart-critique/

  14. Ben

    Informative article, thanks!

    We have recently been using leaflet.js with underlying CloudMade maps and GeoJSON overlays to produce slippy choropleth maps. We’ve taken the shapefiles into QGIS, simplified them there, then exported them to GeoJSON, which then has the geometry and the property attributes tied in. You can then render them with two lines of JavaScript in leaflet.js and colour them using ColorBrewer, for instance.

    Well worth having a look at those tools. Also agree that a better thematic map tool chain would be good.

    Ben

  15. Jorge Camoes

    Nice. These are two relevant topics, and not only to map making. I agree with you regarding the use of color. Let me add my two cents regarding class limits.

    I’m tempted to say that using equal interval classes in a map is like using alphabetical sort in a bar chart. I would prefer classes that minimize intra-class variation instead of the round-number principle. If you have this sequence: 103, 108, 147, 153 it doesn’t make sense to set a limit at 150.

    Dividing a data range into several equal-sized classes is very dangerous. Try to do it with population density in the US and you’ll see what I mean.

    This is a relevant topic. Glad you brought it up.

  16. Rob Shell

    I like the idea of using equidistant HSV values. I’ll definitely be using this as a resource in my next choropleth mapping project.

    What’s up with Michigan on your maps? You’ve increased the “visual significance of a particular geographic region” by coloring Lake Michigan.

  17. Axel

    Interesting to read. I’m working on a choropleth map using Raphael and blank SVG maps from Wikipedia right now for a project. Do you have a hint for a convenient conversion workflow from SVG data to JSON notation?

    Nevertheless, I will keep your advice in mind. I was using Google Fusion Tables at first, too, but I want to avoid Flash. Now I have to do the calculation myself.

    Best regards, appreciate your work.
    XL

  18. Gregor Aisch

    Hi Rick,

    You’re totally right with your remark on the non-equidistant perception of equidistant HSV values. That’s what I was thinking about while I wrote this post. Actually, when we talk about perceived brightness on screens, we’re touching the field of gamma correction, which is a pretty easy task as long as you know the output device. Colors that work on my laptop screen look completely different on my external monitor, and so on. If you know of any useful study on the problem of gamma correction for unknown devices, I’d be glad to hear about it.

    Regarding ColorBrewer, I have to repeat what I said in one of the comments on my last post: I like the colors but I don’t like the terms of service. Maybe that’s why I keep experimenting with different color scales. At least my color advice is free for everyone to use.
    Finally, I changed my mind on the ColorBrewer license. It’s a good resource for color scales.
