A tale of two piggies – area matters even for the Chief Scientific Adviser

I’ve been doing some more reading of the annual report of the Chief Scientific Adviser, Sir Mark Walport. Or rather, of the companion volume, whose chapters, in its own words, “form the evidence for the Government Chief Scientific Advisor’s Annual Report 2014”.

(Question: is it ‘adviser’ with an ‘e’ or ‘advisor’ with an ‘o’? Both spellings appear.)

The document has some infographics, including these two piggies on page 85.

[Image: the two pigs graphic from the report, with the numbers cut out]

I’ve obscured the numbers they are representing.

What do you think the ratio of the numbers is?

Now, a golden rule of infographic design is that if a single image (in this case, a pig) is used to represent data, it is the area that must be scaled in proportion to the data, not the width or the height alone. For example, say we want to visualise 1 with a square which is one unit wide and one unit high, like this:

[Image: a 1 x 1 square on its own]

Now let’s say we want to visualise 2 with a square. It would be wrong to do this as a 2 x 2 square, like this:

[Image: the 1 x 1 square alongside a 2 x 2 square]

The ratio we want to visualise is 2 to 1, but because the eye judges area rather than any single dimension (width or height), we see a ratio of 4 to 1.

Bearing this in mind, each dimension needs to be scaled up by the square root of 2, which is about 1.414. In other words, to get the ratio of areas to be 2 to 1 we need a bigger square 1.414 units wide by 1.414 units high, like this one in black:

[Image: the 1 x 1 and 2 x 2 squares with a 1.414 x 1.414 square in black]

You can see how ‘wrong’ things looked when we scaled both dimensions by 2, rather than area by 2.
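
The rule for the squares can be written as a small calculation. Below is a minimal sketch in Python (the function name linear_scale is made up for illustration): to show a ratio of values as a ratio of areas, both width and height are multiplied by the square root of that ratio.

```python
import math

def linear_scale(value_ratio: float) -> float:
    """Factor to apply to BOTH width and height so that
    the area of the shape grows by value_ratio."""
    if value_ratio <= 0:
        raise ValueError("value_ratio must be positive")
    return math.sqrt(value_ratio)

# Visualising 2 against 1 with squares
side = linear_scale(2)       # about 1.414
area = side * side           # about 2, the ratio we want to show
wrong_area = 2 * 2           # doubling both sides shows 4 instead

print(round(side, 3), round(area, 3), wrong_area)
```

The same factor works for any shape that keeps its proportions, not just squares.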

So, back to our piggies…

What estimate did you make of the ratio of the two numbers they visualise?

I’ll give you some help.

The pig on the left is 74 units wide and 55 units high. The pig on the right is 100 units wide and 75 units high.

I measured in millimetres to within 1mm either way on a printout. However, you might measure in different units or straight from the screen, which is why I’m referring to ‘units’.

It looks like the pigs are in proportion. A quick check … for the left piggie we have a ratio of 74 / 55 or about 1.35 of width to height; for the right piggie we have a ratio of 100 / 75 or about 1.33. Allowing for error in my measurements these are essentially the same ratios – so the piggies are in proportion. (Had right piggie been tall and skinny or short and fat then things would be different.)

The ratio of the areas of the pigs is (100 x 75) / (74 x 55) or about 1.84 to 1.

I’ll let you know that left piggie represents 61%.

So right piggie should represent 1.84 x 61 per cent … that’s 112%.
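
For anyone who wants to repeat the sums with their own measurements, here is a rough sketch in Python. The variable names are illustrative, and the numbers are simply the measurements quoted above, good to about 1 unit either way.

```python
# Measured size of each pig, in arbitrary units
left_w, left_h = 74, 55
right_w, right_h = 100, 75

# Width-to-height ratios: if these match, the pigs are in proportion
left_aspect = left_w / left_h        # about 1.35
right_aspect = right_w / right_h     # about 1.33

# Ratio of the areas, using width x height as a stand-in for each pig's area
area_ratio = (right_w * right_h) / (left_w * left_h)   # about 1.84

# Value the right pig appears to show if the left pig is 61 per cent
left_value = 61
implied_right_value = area_ratio * left_value          # about 112

print(f"aspect ratios: {left_aspect:.2f} and {right_aspect:.2f}")
print(f"area ratio: {area_ratio:.2f}")
print(f"implied value for right pig: {implied_right_value:.0f}%")
```

Width times height is not the true area of a pig shape, but because both pigs have the same proportions the ratio of those products equals the ratio of the true areas.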

But here’s the graphic with the numbers from the document:

[Image: the two pigs graphic with its original numbers]

Right piggie is not 112% – it’s 83%.

So, this infographic is not a good visualisation.
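
As a rough check on what the graphic should have looked like, the sketch below (Python, illustrative names only) works out the size the right pig would need to be for its area to stand for 83% against the left pig's 61%. It also compares the ratio of the drawn widths with the ratio of the values. The two are close, which hints (no more than that) that the width and height were scaled rather than the area.

```python
import math

left_w, left_h = 74, 55             # measured size of the left pig, in units
left_value, right_value = 61, 83    # percentages printed in the graphic

value_ratio = right_value / left_value   # about 1.36
scale = math.sqrt(value_ratio)           # about 1.17 per dimension

correct_w = left_w * scale               # about 86 units, not 100
correct_h = left_h * scale               # about 64 units, not 75

drawn_width_ratio = 100 / left_w         # about 1.35, close to value_ratio

print(f"right pig should be about {correct_w:.0f} x {correct_h:.0f} units")
print(f"value ratio {value_ratio:.2f}, drawn width ratio {drawn_width_ratio:.2f}")
```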

Does this matter?

Well, from a purist view, you’d hope that the office of the UK Chief Scientific Adviser would get this right.

But, more seriously, I have experience in political campaigning. I know that opponents of a view may pick up on any error (trivial or not) to argue their view – “if they can’t get the simple things right can you trust them on the issues that matter?!”

Imagine if this graphic were about climate change, fracking, GM foods …

Beautiful data, van Gogh and Vivienne Westwood

When I go to the National Gallery I never miss the chance to spend time looking at the paintings of van Gogh, Monet and Turner. To me they are beautiful.

Beauty is a word readily and unquestioningly used of art – even though we may not agree as to which works deserve to be called beautiful. The same goes for architecture, furniture, clothes, home furnishings, cars, computers, phones …

But numbers? Especially numbers in their use as data and statistics??

Yes – in the hands of people like the pioneering data journalist David McCandless. His book, Information is Beautiful, is a best-seller.

Speaking last night (Wednesday, 19 November 2014) to a packed audience at the Royal Statistical Society, McCandless set out his view that “interesting, strange and even quite magical things can happen” when visualising data.

He is not wrong. Whether it’s Earth-skimming asteroids, government spending, popularity of dog breeds, horoscope text or efficacy of food supplements, McCandless has found ways – through his choices in placement, shape, colour palette and font – of presenting data that is pleasing and intriguing.

They are not the usual graphs and charts and tables of the academic text. They aren’t meant to be. But they can be an inspiration to those working with numbers. They can do for data visualisation what, say, Vivienne Westwood, Alexander McQueen or Giles Deacon do for fashion: challenge established thinking and explore new ground.

Some might quibble about the statistical purity of McCandless’s visualisations. But I’d say that would be like me – as an astrophysicist by education – quibbling about van Gogh’s “The Starry Night”.

Certainly McCandless has given me some thoughts on how to develop my rather down-to-earth jelly bean visualisations …!

Why providing sources is vital – even for the Government Chief Scientific Adviser

The Government’s Chief Scientific Adviser, Sir Mark Walport, has published his first annual report, “Innovation: managing risk, not avoiding it”.

It’s not a light read but it is an important one. It comes with a companion volume, “Innovation: Managing Risk, Not Avoiding It. Evidence and Case Studies”.

This is more accessible, not least for having some infographics. One at the opening of Section 1 caught my eye. Below is a screengrab of the pdf document open on my laptop.

[Image: screengrab of the internet-use infographic from the companion volume]

It asserts that twenty years ago (presumably toward the end of 1994, that is) there were fewer than 3 million people with internet access but that the figure today is nearly 2.5 billion.

Presumably the rows of people visualise this great increase. But how?

There are 3 shaded figures so they could be representing 3 million people with internet access – each figure being 1 million people. But what are the other figures (337 of them)?

This kind of visualisation is good for proportions. So if the three shaded figures are 3 million people, with world population being about 5.5 billion in 1994, we should expect to see about 5,500 figures in all (because 5.5 billion is 5,500 million). Clearly there aren’t that many figures, so I’m not sure that this graphic is particularly helpful.
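
The mismatch can be put into numbers. The sketch below (Python, illustrative names) assumes, as above, that one figure stands for 1 million people, and compares the share of shaded figures in the graphic with the share of the world the report's own number implies.

```python
people_per_figure = 1_000_000            # assumption: one figure = 1 million people
internet_users_1994 = 3_000_000          # the report's "fewer than 3 million"
world_population_1994 = 5_500_000_000    # rough figure used above

shaded_figures = internet_users_1994 // people_per_figure      # 3
figures_needed = world_population_1994 // people_per_figure    # 5,500
figures_in_graphic = 3 + 337                                   # counted in the graphic

share_in_graphic = shaded_figures / figures_in_graphic
share_in_world = internet_users_1994 / world_population_1994

print(f"figures needed: {figures_needed}, figures drawn: {figures_in_graphic}")
print(f"shaded share in graphic: {share_in_graphic:.2%}")
print(f"share implied by the numbers: {share_in_world:.3%}")
```

On those assumptions the graphic shades a far larger share of its figures than the numbers support.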

But there’s something else…

Were there really fewer than 3 million people with internet access in 1994?

I know I was one and I don’t think I was among the earliest adopters. The figure is repeated in the document’s text but no source is given (or at least none is given in an end-note).

Of course the internet and the web have made it easy to look for sources with which to check facts.

There is a website, internetstatslive.com, that says it is tracking internet user numbers in real time. It also includes figures for years past.

Their figure for 1994 is 25,454,590. Now that does seem remarkably precise. But at about 25 million it’s a lot, lot more than 3 million. Even for the year before they have a figure of about 14 million.

Their source is the International Telecommunication Union, which describes itself as “the United Nations specialized agency for information and communication technologies”.

Rather helpfully, the World Bank has data on its website on internet user numbers expressed per 100 people. For 1994 it says 4.9 out of 100 people in the United States were internet users. The US had a population of around 260 million in 1994. This suggests the number of internet users in that country alone was around 13 million. I haven’t checked but presumably the sum of the other countries’ internet users would bring that close to the 25 million put forward by internetstatslive.
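
The US estimate is a one-line sum, sketched here in Python with illustrative variable names and the two rounded figures quoted above. The product comes to a little under 13 million, still several times the report's 3 million for the whole world.

```python
users_per_100 = 4.9              # World Bank figure for the US in 1994, as quoted
us_population = 260_000_000      # rough 1994 population, as quoted

us_internet_users = us_population * users_per_100 / 100
report_figure = 3_000_000        # the report's worldwide "fewer than 3 million"

print(f"estimated US internet users in 1994: {us_internet_users / 1e6:.1f} million")
print(f"ratio to the report's worldwide figure: {us_internet_users / report_figure:.1f}")
```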

All this suggests that there were a lot more than 3 million internet users 20 years ago. So where has that 3 million figure come from?

It could be a matter of definitions, uncertainties in estimates for years gone by, a misunderstanding on my part, or a mistake on theirs.

I’m making contact with the Government Chief Scientific Adviser to find out – using the internet to both email and tweet him, of course!

In the meantime, as I’ve blogged elsewhere, the lesson here is to give a reference for any statistics you quote.

UPDATE (21 November 2014): the Chief Scientific Adviser’s office has kindly replied. The source is apparently work done in 2012 by the author of the relevant chapter, Professor Ian Goldin. The CSA’s office say they recognise the reference to 20 years ago should really be 22 years ago.

They point out that in 1994 there was uncertainty about numbers of people with internet access, citing an article in the New York Times. Personally, I feel that the uncertainty should have been recognised in the document and, even more so, that a source should have been given.

I have asked if they intend to issue an amended version of the document. The great thing about the internet age is that amending documents is not the expensive business it used to be when everything was printed.

And, again, I am grateful to the CSA’s office for looking into this. It’s a great example of engagement between government institution and public.