Five tips when working with statistics

I recently gave a talk about communicating with statistics to the Fifth Estate Group of the Chartered Institute of Public Relations.

Work well with numbers, data and statistics and you may enhance your reputation. Use them badly or get them wrong and you might lose the confidence and trust of others.

Afterwards the organiser asked me if I could put together a list of my top five tips. It’s hard to rank any tips but I’ve given it a go.

1) Sources and references – check, check and check again

You’ve found a stat that trumps everything. It comes from a reputable source. But what’s their source?

‘Facts’ can be wrong. We all make mistakes; it doesn’t have to be misuse or abuse.

Take how much of our electricity is ‘wasted’ by leaving things on standby.

The Times journalist Matthew Parris looked into this in 2006. His article went on to be the runner-up in the Royal Statistical Society’s inaugural awards for statistical excellence in journalism in 2007. It’s a good example of how not to take a figure at face value.

And it works the other way. If you’re making an argument based on data give a reference. Not only does this help your audience – possibly saving a lot of time coaxing search engines to yield the information – but it might make you think again about what you’re saying.

I found this with one of the organisations I belong to. A claim was made to back up a policy suggestion. No reference was given. I looked into it and found official figures which strongly suggested the claim was wrong, brought it to their attention and a correction has been made.

2) What is meant? Make your definitions clear.

Will your audience misinterpret what you mean? Have you misinterpreted someone else?

Here’s one I use in my workshops. I grew up on an island 700 miles from the coast of Spain. Where did I grow up?

It’s the Isle of Wight.

But I find that guesses are usually somewhere in the mid-Atlantic.

Why? Maybe it’s because we seem to have a natural tendency to think horizontally rather than vertically. After all, the sun rises in the east and sets in the west. Predators are likely to be on the ground rather than high in the sky.

And the image of Spain implies something more exotic than just off the coast of England.

I make misinterpretations quite a bit, though hopefully I’m getting better at taking a moment to have a second think or, if possible, to ask a question.

A good example is when the fact-checking organisation Full Fact looked into a claim made during Prime Minister’s Questions on 29 October 2014.

Sir Bob Russell, the MP for Colchester, stated: “More people live in Essex than voted yes in the Scottish referendum.”

Full Fact tweeted:

[Image: screengrab of Full Fact’s tweet about the Essex claim at PMQs, 29 October 2014]

MPs don’t like to get things wrong, especially about the area that they represent. So I wondered how Sir Bob could have made a mistake.

Here’s a check. The number of people who voted ‘yes’ in the Scottish independence referendum is 1,617,989 according to the official return.

Full Fact say the population of Essex is around 1,390,000. Their source backs this up: it gives a figure of 1,393,587 for the population of Essex. That’s less than the 1.6 million who voted ‘yes’.

But that’s just one figure for Essex. The same source has a ‘Total Essex’ figure of 1,724,950 just a little further down. And that is more than the number of ‘yes’ voters.

What’s going on? Both can’t be right, surely? It may be that ‘the only way is Essex’ but what’s the difference between Essex and Total Essex?

Well, the first figure is for the administrative Essex, which excludes the unitary authorities of Thurrock and Southend-on-Sea. The second is for the ceremonial county of Essex, which does include them.
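For anyone who wants to check the arithmetic, here is a minimal sketch in Python using only the three figures quoted above. The function name `compare` is my own invention for illustration.

```python
# Figures quoted in this post
YES_VOTES = 1_617_989    # 'yes' votes in the official referendum return
ESSEX_ADMIN = 1_393_587  # administrative Essex (excludes Thurrock and Southend-on-Sea)
ESSEX_TOTAL = 1_724_950  # 'Total Essex' (the ceremonial county)


def compare(label, population, votes=YES_VOTES):
    """Say how a population figure compares with the number of 'yes' votes."""
    diff = population - votes
    relation = "more" if diff > 0 else "fewer"
    return f"{label}: {abs(diff):,} {relation} people than 'yes' voters"


print(compare("Administrative Essex", ESSEX_ADMIN))
print(compare("Total Essex", ESSEX_TOTAL))
```

Same county name, two definitions, and the claim flips from false to true depending on which one you pick.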

Bob Russell is known to be a redoubtable champion of Essex. It seems very likely he was referring to the whole of Essex in its traditional, ceremonial sense.

I pointed this out to Full Fact and they’ve updated their site with an explanatory article.

So Bob is right and Full Fact are sort-of not-wrong.

3) Describe percentages in words (and always give the denominator)

In all the talks and workshops I’ve done, confusion around using and understanding percentages comes up again and again.

How do you feel about 67%? What about two-thirds?

Allowing for rounding, they are the same thing. I find that most people find words relating to simple fractions – half, third, quarter, fifth – easier to understand than numbers and “%” signs.

I’ve blogged about this and started to put together a suggested list of how to describe all the whole-number percentages from 0% to 100%. Let me know what you think.

But, however you report them, always give the size of the group you’re talking about. In maths terms, give the denominator.

If I told you that two-thirds – 67% – of the people I interviewed about a product said they liked it, how informative is that?

I could have spoken to two people or two thousand. But which are you more likely to be convinced by? So, always give the group size, ie the denominator.
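As a rough illustration, here is a short Python sketch that always reports the group size alongside the percentage. The helper `describe` and the survey numbers are made up for the example.

```python
def describe(liked, asked):
    """Report a proportion together with its denominator."""
    pct = round(100 * liked / asked)
    return f"{liked:,} of {asked:,} people ({pct}%)"


# Two hypothetical surveys that both round to 67%
print(describe(2, 3))
print(describe(2_000, 3_000))
```

Both lines show 67%, but only the wording with the denominator lets the reader see that one result rests on three people and the other on three thousand.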

4) Know your averages – and avoid using the word ‘average’ unless absolutely necessary; and don’t chase them!

There is more than one way to calculate an average. Most of the time it is the (arithmetic) mean that is being used – add up a list of numbers and divide by the number in the list.

But there is also the median. Line the numbers up and find the one in the middle.

And the mode. Group the numbers. The single biggest group – if there is one – is the mode.

As well as the arithmetic mean there are the geometric mean and the harmonic mean … the list goes on.

What we’re trying to find is a number that summarises the list; in a sense, what is typical or expected or most likely.

Say you’re in a pub with colleagues. You all earn roughly the same amount. Bill Gates walks in. If you took the mean income then you’re all – on average – very rich. But if you took the median, not much has changed. The median income will still be one of you on the less-than-stratospheric incomes. (Do get Bill to buy the next round!)
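Here is the pub example as a small Python sketch, using the standard library’s `statistics` module. All the incomes are invented for illustration, including the deliberately enormous one for the newcomer.

```python
from statistics import mean, median

# Hypothetical annual incomes for five colleagues (made-up numbers)
colleagues = [28_000, 30_000, 31_000, 32_000, 34_000]

# Made-up, deliberately enormous income for the person who walks in
newcomer = 1_000_000_000

before = colleagues
after = colleagues + [newcomer]

print("Before: mean", mean(before), "median", median(before))
print("After:  mean", mean(after), "median", median(after))
```

With these numbers the mean jumps from 31,000 to more than 166 million, while the median only moves from 31,000 to 31,500.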

So, words like ‘typical’ or ‘likely’ may be better substitutes for ‘average’.

Being below average is seen as not so good – especially in education or healthcare.

It’s tempting to decide policy by targeting those below average. But if the ‘average’ you’re talking about is a mathematical one then, unless everything is equal, at least one has to be below average. Chasing averages can be a never-ending race.

So, what might you do? I’d say decide (beforehand if possible) what is acceptable or adequate. It’s possible for everything to be at least equal to this standard. You can then target based on what is below the threshold.

A simple example is licensing drivers. There is quite a range of driving standards. So, some will be ‘above average’ and others below, by some measure. But all were at least once above a threshold of acceptability – they passed the driving test.

(Aside: it is still possible to use the median to define something to be acceptable. Consider one of the definitions of poverty: the approach where three-fifths (60%) of median household income is the threshold. The Poverty and Social Exclusion research project has a good discussion of this and its relation to the median and mean.)
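To make the threshold idea concrete, here is a minimal Python sketch of the 60%-of-median approach. The function `below_threshold` and both income lists are hypothetical, and real poverty measures involve adjustments that are left out here.

```python
from statistics import median


def below_threshold(incomes, fraction=0.6):
    """Return the threshold (a fraction of the median) and the incomes below it."""
    threshold = fraction * median(incomes)
    return threshold, [x for x in incomes if x < threshold]


# Made-up household incomes: a wide spread
spread_out = [9_000, 14_000, 22_000, 25_000, 30_000, 41_000, 250_000]
# Made-up household incomes: same median, narrower spread
bunched = [20_000, 22_000, 25_000, 30_000, 31_000]

print(below_threshold(spread_out))
print(below_threshold(bunched))
```

Both lists have a median of 25,000, so the threshold is 15,000 in each case. In the first list two households fall below it; in the second none do, which is something that can never happen with ‘below the mean’ unless everyone is equal.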

5) Be certain about uncertainty …

… And particularly if your numbers come from sample surveys. Yes, half (50%) of 1,000 people responding might have a particular view. But for the wider group that you sampled from, any statement about the strength of their view will need to refer to the margin of error.

We see this a lot in political opinion polling, where the margin of error is usually given as ‘plus or minus three percentage points’. (There is more to this but I’ll spare you the details!)
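For the curious, here is one common textbook approximation, sketched in Python. It assumes a simple random sample and a 95% confidence level (the 1.96 multiplier); real polls use weighting and other adjustments, so treat this as a rough guide rather than how any particular pollster does it. The function name `margin_of_error` is my own.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of n.

    z=1.96 corresponds to a 95% confidence level under a normal approximation.
    """
    return z * sqrt(p * (1 - p) / n)


moe = margin_of_error(0.5, 1000)
print(f"{100 * moe:.1f} percentage points")
```

With half of 1,000 respondents holding a view, this comes out at about 3.1 percentage points, which is where the familiar ‘plus or minus three’ comes from.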

During the Scottish referendum, if a poll said 55% were voting ‘no’ and 45% were voting ‘yes’ then it is likely that the ‘true’ values are: no, between 52% and 58%; yes, between 42% and 48%. On this basis ‘no’ are still likely to be ahead.

But what if adding 3 to the lower figure gives you a number bigger than subtracting 3 from the higher figure? American pollsters are particularly good at reporting these ‘statistical dead heats’ when the numbers are too close to be really certain who is ahead.
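That check is simple enough to write down. Here is a sketch in Python, with a hypothetical helper `is_dead_heat` and made-up poll figures; it applies the plus-or-minus-three rule of thumb exactly as described above and nothing more sophisticated.

```python
def is_dead_heat(higher, lower, margin=3):
    """True if the lower figure plus the margin exceeds the higher figure minus the margin."""
    return lower + margin > higher - margin


# Made-up poll results, in percentage points
print(is_dead_heat(55, 45))  # ranges 52-58 and 42-48 do not overlap
print(is_dead_heat(51, 49))  # ranges 48-54 and 46-52 overlap
```

The first poll gives a clear leader; the second is too close to call.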

If you are interested in putting these tips into practice do get in touch.
