Five tips when working with statistics

I recently gave a talk about communicating with statistics to the Fifth Estate Group of the Chartered Institute of Public Relations.

Work with numbers, data and statistics well and you may enhance your reputation. Use them badly or get them wrong and you may lose the confidence and trust of others.

Afterwards the organiser asked me if I could put together a list of my top five tips. It’s hard to rank any tips but I’ve given it a go.

1) Sources and references – check, check and check again

You’ve found a stat that trumps everything. It comes from a reputable source. But what’s their source?

Even ‘facts’ from reputable sources can be wrong. The thing is, we all make mistakes. It doesn’t have to be misuse or abuse.

Take how much of our electricity is ‘wasted’ by leaving things on standby.

The Times journalist Matthew Parris looked into this in 2006. His article went on to be the runner-up in the Royal Statistical Society’s inaugural awards for statistical excellence in journalism in 2007. It’s a good example of how not to take a figure at face value.

And it works the other way. If you’re making an argument based on data, give a reference. Not only does this help your audience – possibly saving them a lot of time coaxing search engines to yield the information – but it might also make you think again about what you’re saying.

I found this with one of the organisations I belong to. A claim was made to back up a policy suggestion, but no reference was given. I looked into it and found official figures which strongly suggested the claim was wrong. I brought this to their attention, and a correction has since been made.

2) What is meant? Make your definitions clear.

Will your audience misinterpret what you mean? Have you misinterpreted someone else?

Here’s one I use in my workshops. I grew up on an island 700 miles from the coast of Spain. Where did I grow up?

It’s the Isle of Wight.

But I find that guesses are usually somewhere in the mid-Atlantic.

Why? Maybe it’s because we seem to have a natural tendency to think horizontally rather than vertically. After all, the sun rises in the east and sets in the west, and predators are more likely to be on the ground than high in the sky.

And the image of Spain implies something more exotic than just off the coast of England.

I make misinterpretations quite a bit, though hopefully I’m getting better at taking a moment to have a second think or, if possible, to ask a question.

A good example is when the fact-checking organisation Full Fact looked into a claim made during Prime Minister’s Questions on 29 October 2014.

Sir Bob Russell, the MP for Colchester, stated: “More people live in Essex than voted yes in the Scottish referendum.”

Full Fact tweeted that the claim was wrong.

MPs don’t like to get things wrong, especially about the area that they represent. So I wondered how Sir Bob could have made a mistake.

Here’s a check. The number of people who voted ‘yes’ in the Scottish independence referendum is 1,617,989 according to the official return.

Full Fact say the population of Essex is around 1,390,000, and their source backs this up with a figure of 1,393,587. That’s less than the 1.6 million who voted ‘yes’.

But that’s just one figure for Essex. The same source has a ‘Total Essex’ figure of 1,724,950 just a little further down. And that is more than the number of ‘yes’ voters.

What’s going on? Both can’t be right, surely? It may be that ‘the only way is Essex’ but what’s the difference between Essex and Total Essex?

Well, the first figure is for the administrative county of Essex, which excludes the unitary authorities of Thurrock and Southend-on-Sea. The second is for the ceremonial county of Essex, which includes them.
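The two comparisons are easy to check for yourself. Here is a minimal sketch using only the figures quoted above (the variable names are my own, for illustration). Reading the gap between the two Essex figures as the combined population of Thurrock and Southend-on-Sea assumes both figures come from the same source and date.

```python
# Figures quoted above: the official 'yes' return and the two Essex figures
# from Full Fact's source (variable names are illustrative).
yes_votes = 1_617_989
administrative_essex = 1_393_587  # excludes Thurrock and Southend-on-Sea
total_essex = 1_724_950           # ceremonial county, includes them

print(administrative_essex > yes_votes)  # False: Full Fact's reading
print(total_essex > yes_votes)           # True: Sir Bob's reading

# Assuming both figures share a source and date, this gap is the combined
# population of the two unitary authorities.
print(total_essex - administrative_essex)  # 331363
print(total_essex - yes_votes)             # 106961
```

Same source, two definitions, opposite answers.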

Bob Russell is known to be a redoubtable champion of Essex. It seems very likely he was referring to the whole of Essex in its traditional, ceremonial sense.

I pointed this out to Full Fact and they’ve updated their site with an explanatory article.

So Bob is right and Full Fact are sort-of not-wrong.

3) Describe percentages in words (and always give the denominator)

In all the talks and workshops I’ve done, confusion around using and understanding percentages comes up again and again.

How do you feel about 67%? What about two-thirds?

Allowing for rounding, they are the same thing. In my experience, most people find words relating to simple fractions – half, third, quarter, fifth – easier to understand than numbers and “%” signs.

I’ve blogged about this and started to put together a suggested list of how to describe all the whole-number percentages from 0% to 100%. Let me know what you think.
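To show the idea mechanically, here is a rough sketch that describes a percentage by the nearest simple fraction. The table of fractions and the function name `percent_in_words` are hypothetical illustrations, not the suggested list mentioned above.

```python
from fractions import Fraction

# A hypothetical short table of simple fractions and the words for them.
# This is an illustration, not the suggested list mentioned in the text.
SIMPLE_FRACTIONS = {
    Fraction(1, 10): "a tenth",
    Fraction(1, 5): "a fifth",
    Fraction(1, 4): "a quarter",
    Fraction(1, 3): "a third",
    Fraction(2, 5): "two-fifths",
    Fraction(1, 2): "half",
    Fraction(3, 5): "three-fifths",
    Fraction(2, 3): "two-thirds",
    Fraction(3, 4): "three-quarters",
    Fraction(4, 5): "four-fifths",
    Fraction(9, 10): "nine-tenths",
}


def percent_in_words(percent: float) -> str:
    """Describe a percentage using the nearest simple fraction in the table."""
    share = Fraction(percent) / 100
    nearest = min(SIMPLE_FRACTIONS, key=lambda f: abs(f - share))
    words = SIMPLE_FRACTIONS[nearest]
    if share == nearest:
        return words
    return f"just over {words}" if share > nearest else f"just under {words}"


print(percent_in_words(67))  # just over two-thirds
print(percent_in_words(50))  # half
print(percent_in_words(24))  # just under a quarter
```

Near the ends of the scale (say 2% or 98%) a table this coarse gives poor descriptions. That is one reason a hand-built list covering every whole-number percentage is more useful.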

But, however you report them, always give the size of the group you’re talking about. In maths terms, give the denominator.

If I told you that two-thirds – 67% – of the people I interviewed about a product said they liked it, how informative is that?

I could have spoken to three people or three thousand. Which are you more likely to be convinced by? So, always give the group size, ie the denominator.
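One simple habit is never to write a percentage without its group size. The helper below is a hypothetical sketch of that habit (the name `describe_share` is my own):

```python
def describe_share(count: int, total: int) -> str:
    """Report a share with both the percentage and the group size (denominator)."""
    if total <= 0:
        raise ValueError("total must be a positive group size")
    percent = round(100 * count / total)
    return f"{count:,} of {total:,} ({percent}%)"


print(describe_share(2, 3))        # 2 of 3 (67%)
print(describe_share(2000, 3000))  # 2,000 of 3,000 (67%)
```

Both lines say 67%, but only the denominator tells the reader how much weight to give it.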

4) Know your averages – and avoid using the word ‘average’ unless absolutely necessary; and don’t chase them!

There is more than one way to calculate an average. Most of the time it is the (arithmetic) mean that is being used – add up a list of numbers and divide by the number in the list.

But there is also the median. Line the numbers up in order and find the one in the middle. (If there is an even number of them, take the mean of the middle two.)

And the mode. Group identical numbers together. The number with the single biggest group – if there is one – is the mode.

As well as the arithmetic mean there are the geometric mean and the harmonic mean … the list goes on.
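To see how these differ on the same numbers, here is a small sketch using Python’s standard `statistics` module (Python 3.8 or later for `geometric_mean`). The list of values is made up for illustration.

```python
import statistics

# A small, made-up list of numbers (all positive, so every mean below is defined).
values = [1, 2, 2, 3, 7]

summary = {
    "arithmetic mean": statistics.mean(values),           # (1+2+2+3+7)/5 = 3
    "median": statistics.median(values),                  # middle of the sorted list: 2
    "mode": statistics.mode(values),                      # most common value: 2
    "geometric mean": statistics.geometric_mean(values),  # about 2.43
    "harmonic mean": statistics.harmonic_mean(values),    # about 2.02
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Five reasonable ‘averages’, four different answers, from just five numbers.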

What we’re trying to find is a number that summarises the list; in a sense, what is typical or expected or most likely.

Say you’re in a pub with colleagues. You all earn roughly the same amount. Bill Gates walks in. If you took the mean income then you’re all – on average – very rich. But if you took the median, not much has changed. The median income will still be one of you on the less-than-stratospheric incomes. (Do get Bill to buy the next round!)
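The pub example in numbers: a sketch with made-up incomes for five colleagues and a purely illustrative figure for the billionaire (not his real income).

```python
import statistics

# Made-up incomes for five colleagues who earn roughly the same.
colleagues = [30_000, 31_000, 32_000, 33_000, 34_000]
# A purely illustrative income for the billionaire who walks in (not a real figure).
billionaire = 1_000_000_000

in_the_pub = colleagues + [billionaire]

for label, group in (("Before", colleagues), ("After", in_the_pub)):
    avg = statistics.mean(group)
    middle = statistics.median(group)
    print(f"{label}: mean {avg:,.0f}, median {middle:,.0f}")
# Before: mean 32,000, median 32,000
# After: mean 166,693,333, median 32,500
```

One person moves the mean by a factor of about five thousand; the median barely shifts.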

So, words like ‘typical’ or ‘likely’ may be better substitutes for ‘average’.

Being below average is seen as not so good – especially in education or healthcare.

It’s tempting to decide policy by targeting those below average. But if the ‘average’ you’re talking about is a mathematical one then, unless everything is equal, at least one has to be below average. Chasing averages can be a never-ending race.
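Here is a small simulation of that race, using made-up scores. Each round, every score below the mean is lifted up to the mean. That raises the mean, so someone is below average again. `Fraction` keeps the arithmetic exact, so no ties appear through rounding.

```python
from fractions import Fraction
from statistics import mean

# Made-up scores; Fraction keeps the arithmetic exact so ties are genuine.
scores = [Fraction(s) for s in (40, 50, 60, 70, 80)]
below_counts = []

for round_number in range(1, 6):
    current_mean = mean(scores)
    below = [s for s in scores if s < current_mean]
    below_counts.append(len(below))
    print(f"Round {round_number}: mean {float(current_mean):.2f}, "
          f"{len(below)} below average")
    # 'Fix' the problem by lifting everyone below average up to the mean...
    scores = [max(s, current_mean) for s in scores]
    # ...which raises the mean, so some are below average again next round.
```

The number below average never reaches zero; it only would if every score became equal.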

So, what might you do? I’d say decide (beforehand if possible) what is acceptable or adequate. Unlike an average, this is a standard that everything can meet. You can then target whatever falls below the threshold.
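By contrast, a threshold fixed in advance stays put. In this sketch the pass mark of 50 and the scores are hypothetical. Once everyone below it is brought up to standard, nobody remains below.

```python
# A hypothetical pass mark, agreed in advance rather than derived from the data.
PASS_MARK = 50

scores = [40, 45, 60, 70, 80]  # made-up scores

below_threshold = [s for s in scores if s < PASS_MARK]
print(below_threshold)  # [40, 45]

# Bring everyone who falls short up to the standard.
scores = [max(s, PASS_MARK) for s in scores]
print([s for s in scores if s < PASS_MARK])  # []: the target does not move
```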

A simple example is licensing drivers. There is quite a range of driving standards, so by some measure some drivers will be ‘above average’ and others below. But all of them were, at least once, above a threshold of acceptability: they passed the driving test.

(Aside: it is still possible to use the median to define what is acceptable. Consider one definition of poverty, in which the threshold is three-fifths (60%) of median household income. The Poverty and Social Exclusion research project has a good discussion of this and its relation to the median and mean.)
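As a worked example of a median-based threshold, here is a sketch with made-up household incomes. It assumes households strictly below three-fifths of the median count as below the line; real poverty measures involve further choices (such as adjusting for household size) that are not modelled here.

```python
import statistics

# Made-up household incomes, for illustration only.
household_incomes = [12_000, 18_000, 25_000, 30_000, 41_000, 52_000, 90_000]

median_income = statistics.median(household_incomes)  # 30,000
poverty_line = median_income * 3 / 5                   # three-fifths (60%) of the median
below_line = [x for x in household_incomes if x < poverty_line]

print(f"median {median_income:,}, line {poverty_line:,.0f}, below: {below_line}")
# median 30,000, line 18,000, below: [12000]
```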

5) Be certain about uncertainty …

… And particularly if your numbers come from sample surveys. Yes, half (50%) of 1,000 people responding might hold a particular view. But if you want to say how strongly that view is held in the wider group your sample represents, you will need to refer to the margin of error.

We see this a lot in political opinion polling, where the margin of error is usually given as ‘plus or minus three percentage points’. (There is more to this but I’ll spare you the details!)
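For the curious, here is roughly where the ‘three percentage points’ comes from. This sketch uses the textbook approximation for a proportion. It assumes a simple random sample, the normal approximation and the conventional 95% confidence level (z = 1.96); real polls adjust for weighting and design, which is part of the ‘more to this’.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p from n responses.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to the conventional 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(0.5, 1000)
print(f"plus or minus {100 * moe:.1f} percentage points")
# plus or minus 3.1 percentage points
```

A 50% split gives the widest margin, which is why pollsters often quote that worst case.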

During the Scottish referendum campaign, if a poll said 55% were voting ‘no’ and 45% were voting ‘yes’, then it is likely that the ‘true’ values are: no, between 52% and 58%; yes, between 42% and 48%. On this basis ‘no’ is still likely to be ahead.

But what if adding 3 to the lower figure gives a number bigger than subtracting 3 from the higher one, so that the two ranges overlap? American pollsters are particularly good at reporting these ‘statistical dead heats’, when the numbers are too close to say with confidence who is ahead.
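The rule of thumb above can be written down in a few lines. This is a sketch of the simple overlap check described in the text, assuming a margin of 3 percentage points; it is not a formal significance test. The function names are my own.

```python
def poll_range(share: float, margin: float = 3) -> tuple:
    """Range of plausible 'true' values: the share plus or minus the margin."""
    return (share - margin, share + margin)


def statistical_dead_heat(a: float, b: float, margin: float = 3) -> bool:
    """Rule of thumb from the text: do the two ranges overlap?"""
    lower, higher = sorted((a, b))
    return lower + margin > higher - margin


print(poll_range(55), poll_range(45))  # (52, 58) (42, 48)
print(statistical_dead_heat(55, 45))   # False: 'no' likely ahead
print(statistical_dead_heat(51, 49))   # True: too close to call
```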

If you are interested in putting these tips into practice, do get in touch.

A nod to Autumn

Looking out of my study window: autumn mist shrouds the fields – 23 September 2014

Autumn is here.

But how do we know?

Google has a new doodle so it must be so.

My Oxford English Dictionary (OED) says autumn is “the season after summer and before winter”.

Hmm… okay. And summer is the season after spring and before autumn, and winter the season after autumn and before spring?

Well, yes! Thankfully, the OED has a little more to each entry, keeping me from falling into a definitional whirlpool from which I might never escape.

Summer, for the OED, is “the season after spring and before autumn, when the weather is warmest” and winter is “the coldest season of the year, after autumn and before spring”. (For completeness, they define spring as “the season after winter and before summer”.)

So, for the OED it’s all about temperature. To be fair, this is from the compact OED. The maxi version may go further.

If it’s about temperatures, who could know better than the weather people (aka meteorologists, not to be confused with The Weather Girls, whose climate was altogether more complicated than most)? There is a useful blog entry by the UK Met Office (they drop the ‘eorological’ and I don’t blame them; it’s quite a mouthful) which explains that:

In meteorological terms, it’s fairly simple – each season is a three month period. So, Summer is June, July and August; Autumn is September, October and November, and so on.

Of course, this is fairly arbitrary, but provides a consistent basis for the Met Office, as the holder of the UK’s national weather and climate records, to calculate long term averages and provide seasonal climate summaries from year to year.

Their definition is for statistical record-keeping and comparison. So, for the Met Office, it’s been autumn for over three weeks.
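The Met Office rule is simple enough to write as a lookup table. This sketch assumes the Northern Hemisphere, with winter as December, January and February, and takes 23 September 2014 (the date on the photo above) as the date of writing.

```python
from datetime import date

# Meteorological seasons: three whole calendar months each (Northern Hemisphere).
MET_SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def meteorological_season(day: date) -> str:
    """Return the meteorological season for a given date."""
    return MET_SEASONS[day.month]


writing_date = date(2014, 9, 23)  # assumed date of writing (the photo caption)
print(meteorological_season(writing_date))      # autumn
print((writing_date - date(2014, 9, 1)).days)   # 22, i.e. just over three weeks
```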

So why are we, and Google, only just getting round to it?

The Met Office blog helps us out again. It’s all down to defining the seasons in terms of astronomical events, specifically equinoxes and solstices. Last night (by British Summer Time) was the equinox at the start of autumn, unsurprisingly called the autumn equinox.

This astronomically inspired definition makes sense. According to Wikipedia: “an equinox occurs when the plane of Earth’s equator passes through the center [US spelling!] of the Sun.” This only happens twice a year because the Earth’s axis is tilted relative to the plane of its orbit around the Sun. This leads to different lengths of day as the year progresses and, in turn, to differences in how much warming we get from the Sun. And this underpins weather and climate, and thus the seasons.

But for most of us the seasons are marked by what we experience. Two weeks ago, Monty Don, presenter of the BBC’s Gardeners’ World said autumn seemed to have been “two to three weeks ahead of time”. For him, and countless other gardeners, the seasons are marked by what is happening in nature.

Back to the Met Office blog and another new word for me: “phenology – the process of noting the signs of change in plant and animal behaviour”. Looking out of my study window, there is a lush, green landscape shrouded in mist. In the hedgerow over the way I can see blackberries, rosehips and haws – plenty of ‘wild’ fruit for all.

Mist. Fruit. This feels very phenological to me.

And so to a final and poetic definition. (Poetry often brings out meaning more powerfully than mere prosaic definition. I cover this in my workshop on what poetry can teach us about writing better, more engaging, and more influential prose for reporting, promotional and marketing purposes.)

So, the last words on autumn – and, for me, possibly the most satisfying definition – come from John Keats’ ode “To Autumn”:

Season of mists and mellow fruitfulness
Close bosom-friend of the maturing sun