London’s 17 passengers per bus – why averages don’t always tell you what you most need to know

The ‘Undercover Economist’, Tim Harford, has this in the Financial Times today (29 November 2014):

“… the average London bus has only 17 people riding on it.”

(You may have to log in but the FT does give limited access at no cost.)

A report to the London Assembly’s Transport Committee in October 2013 confirms the figure on page 13 – though the reference is to a committee transcript from December 2012.

Plenty of room for all, you might think. That is if you’re not a commuter waiting in the rain as crowded bus after crowded bus goes by.

As Harford points out – and you’ve probably guessed – it’s an issue of averages. This comes about from having a few very full buses and lots with much lower or no occupancy.

Usually this sort of ‘odd’ result comes from taking the mean – add all the numbers up and divide by how many you’ve got. The many small numbers outweigh the few big ones.

But it can happen with the median – the number in the middle when you line them up in order.

And even the mode might not be useful – the single most frequent number.

How might this be? Say a bus records the following occupancy over 21 trips:

45, 39, 17, 10, 7, 4, 8, 7, 2, 0, 7, 7, 9, 1, 11, 8, 7, 15, 35, 37, 4

The mean occupancy is a little over 13 passengers (13.3). The median is 8. The mode is 7.
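These figures are easy to check with Python’s standard-library statistics module (the occupancy list is the one quoted above):

```python
# Check the three averages for the 21 bus trips quoted above.
import statistics

occupancy = [45, 39, 17, 10, 7, 4, 8, 7, 2, 0, 7,
             7, 9, 1, 11, 8, 7, 15, 35, 37, 4]

mean = statistics.mean(occupancy)      # sum divided by count
median = statistics.median(occupancy)  # middle value when sorted
mode = statistics.mode(occupancy)      # most frequent value

print(f"mean   = {mean:.1f}")  # 13.3
print(f"median = {median}")    # 8
print(f"mode   = {mode}")      # 7

# And the claim below: how many trips carried 17 passengers or fewer?
print(sum(1 for x in occupancy if x <= 17))  # 17 of the 21 trips
```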

That all sounds comfortable. Of the 21 trips, 17 have occupancy of 17 passengers or less. But if you’re on one of the four trips when the bus is full or nearly so, that is what matters. You might decide to give up on bus travel, fed up with day after day of uncomfortable rides. If others do the same it’s easy to see how a bus company has to cut services because of a drop in earnings.

So, averages can be useful. But they might not tell the story that you’re most interested in.

Five tips when working with statistics

I recently gave a talk about communicating with statistics to the Fifth Estate Group of the Chartered Institute of Public Relations.

Work with numbers, data and statistics well and you might well enhance your reputation. Use them badly or get them wrong and you might lose the confidence and trust of others.

Afterwards the organiser asked me if I could put together a list of my top five tips. It’s hard to rank any tips but I’ve given it a go.

1) Sources and references – check, check and check again

You’ve found a stat that trumps everything. It comes from a reputable source. But what’s their source?

Even ‘facts’ can be wrong. The thing is, we all make mistakes. It doesn’t have to be misuse or abuse.

Take how much of our electricity is ‘wasted’ by leaving things on standby.

Times journalist Matthew Parris looked into this in 2006. His article went on to be the runner-up in the Royal Statistical Society’s inaugural awards for statistical excellence in journalism in 2007. It’s a good example of not taking a figure at face value.

And it works the other way. If you’re making an argument based on data give a reference. Not only does this help your audience – possibly saving a lot of time coaxing search engines to yield the information – but it might make you think again about what you’re saying.

I found this with one of the organisations I belong to. A claim was made to back up a policy suggestion. No reference was given. I looked into it and found official figures which strongly suggested the claim was wrong, brought it to their attention and a correction has been made.

2) What is meant? Make your definitions clear.

Will your audience misinterpret what you mean? Have you misinterpreted someone else?

Here’s one I use in my workshops. I grew up on an island 700 miles from the coast of Spain. Where did I grow up?

It’s the Isle of Wight.

But I find that guesses are usually somewhere in the mid-Atlantic.

Why? Maybe it’s because we seem to have a natural tendency to think horizontally rather than vertically. After all, the sun rises in the east and sets in the west. Predators are likely to be on the ground rather than high in the sky.

And the image of Spain implies something more exotic than just off the coast of England.

I misinterpret things quite a bit, though hopefully I’m getting better at taking a moment to have a second think or, if possible, to ask a question.

A good example is when the fact-checking organisation Full Fact looked into a claim made during Prime Minister’s Questions on 29 October 2014.

Sir Bob Russell, the MP for Colchester, stated: “More people live in Essex than voted yes in the Scottish referendum.”

Full Fact tweeted:


MPs don’t like to get things wrong, especially about the area that they represent. So I wondered how Sir Bob could have made a mistake.

Here’s a check. The number of people who voted ‘yes’ in the Scottish independence referendum is 1,617,989 according to the official return.

Full Fact say the population of Essex is around 1,390,000, and their source backs this up with a figure of 1,393,587. That’s less than the 1.6 million who voted ‘yes’.

But that’s just one figure for Essex. The same source has a ‘Total Essex’ figure of 1,724,950 just a little further down. And that is more than the number of ‘yes’ voters.

What’s going on? Both can’t be right, surely? It may be that ‘the only way is Essex’ but what’s the difference between Essex and Total Essex?

Well the first figure is for the administrative Essex that excludes the unitary authorities of Thurrock and Southend-on-Sea. The second is for the ceremonial county of Essex which does include them.

Bob Russell is known to be a redoubtable champion of Essex. It seems very likely he was referring to the whole of Essex in its traditional, ceremonial sense.

I pointed this out to Full Fact and they’ve updated their site with an explanatory article.

So Bob is right and Full Fact are sort-of not-wrong.

3) Describe percentages in words (and always give the denominator)

In all the talks and workshops I’ve done, confusion around using and understanding percentages comes up again and again.

How do you feel about 67%? What about two-thirds?

Allowing for rounding, they are the same thing. I find that most people find words relating to simple fractions – half, third, quarter, fifth – easier to understand than numbers and “%” signs.

I’ve blogged about this and started to put together a suggested list of how to describe all the whole-number percentages from 0% to 100%. Let me know what you think.
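As a sketch of the idea, Python’s standard-library fractions module can find the nearest simple fraction to a whole-number percentage. The cap on the denominator is my own illustrative choice, not part of any published list:

```python
# Turn a whole-number percentage into the closest simple fraction.
from fractions import Fraction

def nearest_simple_fraction(percent, max_denominator=10):
    """Closest fraction to percent/100 with a small denominator."""
    return Fraction(percent, 100).limit_denominator(max_denominator)

print(nearest_simple_fraction(67))  # 2/3
print(nearest_simple_fraction(25))  # 1/4
print(nearest_simple_fraction(40))  # 2/5
```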

But, however you report them, always give the size of the group you’re talking about. In maths terms, give the denominator.

If I told you that two-thirds – 67% – of the people I interviewed about a product said they liked it, how informative is that?

I could have spoken to two people or two thousand. Which are you more likely to be convinced by? So, always give the group size, i.e. the denominator.
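To see why the group size matters so much, here is a rough sketch using the usual normal-approximation margin of error for a proportion. The formula and the sample sizes are illustrative assumptions on my part, not figures from any real survey:

```python
# Why the denominator matters: the uncertainty on "two-thirds liked it"
# depends heavily on how many people were asked.
# Rough 95% margin of error for a proportion: 1.96 * sqrt(p*(1-p)/n).
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p = 2 / 3  # "two-thirds said they liked it"
for n in (3, 30, 2000):
    print(f"n = {n:4d}: two-thirds, plus or minus {margin_of_error(p, n):.0%}")
```

With three respondents the margin swamps the result; with two thousand it shrinks to a couple of percentage points.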

4) Know your averages – avoid using the word ‘average’ unless absolutely necessary, and don’t chase them!

There is more than one way to calculate an average. Most of the time it is the (arithmetic) mean that is being used – add up a list of numbers and divide by the number in the list.

But there is also the median. Line the numbers up and find the one in the middle.

And the mode. Group the numbers. The single biggest group – if there is one – is the mode.

As well as the arithmetic mean there are the geometric mean and the harmonic mean … the list goes on.

What we’re trying to find is a number that summarises the list; in a sense, what is typical or expected or most likely.

Say you’re in a pub with colleagues. You all earn roughly the same amount. Bill Gates walks in. If you took the mean income then you’re all – on average – very rich. But if you took the median, not much has changed. The median income will still be one of you on the less-than-stratospheric incomes. (Do get Bill to buy the next round!)
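The pub example can be sketched in a few lines of Python. All the income figures are made up for illustration:

```python
# One outlier drags the mean a long way; the median barely moves.
import statistics

incomes = [28_000, 30_000, 31_000, 29_000, 32_000]  # five colleagues (made-up)

mean_before = statistics.mean(incomes)      # 30,000
median_before = statistics.median(incomes)  # 30,000

incomes.append(1_000_000_000)  # Bill Gates walks in (illustrative figure)

print(f"mean:   {mean_before:,.0f} -> {statistics.mean(incomes):,.0f}")
print(f"median: {median_before:,.0f} -> {statistics.median(incomes):,.0f}")
```

The mean leaps to over 166 million, while the median only shifts from 30,000 to 30,500.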

So, words like ‘typical’ or ‘likely’ may be better substitutes for ‘average’.

Being below average is seen as not so good – especially in education or healthcare.

It’s tempting to decide policy by targeting those below average. But if the ‘average’ you’re talking about is a mathematical one then, unless everything is equal, at least one has to be below average. Chasing averages can be a never-ending race.

So, what might you do? I’d say decide (beforehand if possible) what is acceptable or adequate. It’s possible for everything to be at least equal to this standard. You can then target based on what is below the threshold.

A simple example is licensing drivers. There is quite a range of driving standard. So, some will be ‘above average’ and others below, by some measure. But all were at least once above a threshold of acceptability – they passed the driving test.
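A quick sketch of the difference between the two targeting rules, with made-up scores and a made-up pass mark:

```python
# Target those below a fixed threshold, not those below the mean:
# everyone can clear a threshold, but (unless all are equal)
# someone is always below the mean.
import statistics

scores = [62, 71, 55, 80, 90, 58, 67]  # made-up scores
threshold = 50                         # pass mark decided in advance

mean = statistics.mean(scores)
below_mean = [s for s in scores if s < mean]
below_threshold = [s for s in scores if s < threshold]

print("below the mean:", below_mean)            # never empty unless all equal
print("below the threshold:", below_threshold)  # can be empty: []
```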

(Aside: it is still possible to use the median to define something to be acceptable. Consider, one of the definitions of poverty, the approach where three-fifths (60%) of median household income is the threshold. The Poverty and Social Exclusion research project has a good discussion of this and its relation to the median and mean.)

5) Be certain about uncertainty …

… And particularly if your numbers come from sample surveys. Yes, half (50%) of 1,000 people responding might have a particular view. But any statement about how widely that view is held beyond the people you sampled will need to refer to the margin of error.

We see this a lot in political opinion polling, where the margin of error is usually given as ‘plus or minus three percentage points’. (There is more to this but I’ll spare you the details!)

During the Scottish referendum, if a poll said 55% were voting ‘no’ and 45% were voting ‘yes’ then it is likely that the ‘true’ values are: no, between 52% and 58%; yes, between 42% and 48%. On this basis ‘no’ is still likely to be ahead.

But what if adding 3 to the lower figure gives a number bigger than subtracting 3 from the higher figure? American pollsters are particularly good at reporting these ‘statistical dead heats’ when the numbers are too close to be really certain who is ahead.
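One rough way to sketch that check in code, treating the margin as a simple plus-or-minus interval (which glosses over the finer points mentioned above):

```python
# A crude 'statistical dead heat' check: with plus-or-minus `margin`
# points on each share, do the two intervals overlap?
def is_dead_heat(share_a, share_b, margin=3):
    """True if [a-m, a+m] and [b-m, b+m] overlap (shares in points)."""
    return abs(share_a - share_b) <= 2 * margin

print(is_dead_heat(55, 45))  # False: a 10-point gap, 'no' clearly ahead
print(is_dead_heat(52, 48))  # True: too close to call
```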

If you are interested in putting these tips into practice do get in touch.

The inadequacy of averages in setting targets

ONS tweet - Londoners well-being has improved but still remains below the UK average  #wellbeing
Tweet from the Office for National Statistics on the well-being of Londoners, 24 September 2014

Yesterday, 24 September 2014, the Office for National Statistics (ONS) released its latest findings on the UK population’s personal well-being, popularly known as the ‘Happiness Index’.

They tweeted quite a bit about the results. The tweet that London’s well-being was lower than the UK average caught my eye. It reminded me of the problems that policy makers face if they interpret being below average as being inadequate.

But first, what has ONS measured?

Well, the figures come from data collected between April 2013 and March 2014 for the Annual Population Survey. Responses are from 165,000 people, which is an impressively large number.

ONS asked four questions:

  • Overall, how satisfied are you with your life nowadays?
  • Overall, to what extent do you feel the things you do in your life are worthwhile?
  • Overall, how happy did you feel yesterday?
  • Overall, how anxious did you feel yesterday?

All are answered using a 0 to 10 scale where 0 is ‘not at all’ and 10 is ‘completely’.

The breakdown by English region shows that, compared to the UK average, only the London region has figures that are ‘worse’ (lower for the first three questions, higher for the fourth). This is probably why ONS picked out London to tweet about.

The figures were:

                    London (mean)   UK (mean)   Difference
Life satisfaction        7.37          7.51        -0.14
Worthwhile               7.64          7.74        -0.10
Happiness                7.32          7.38        -0.06
Anxiety                  3.18          2.32        +0.86

Although small, the differences are statistically significant according to ONS. Surely something must be done?
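A rough sketch of why such small differences can still be statistically significant with 165,000 responses: the standard error of a mean shrinks with the square root of the sample size. The standard deviation of 2 points here is an assumed, illustrative figure, not taken from the ONS release:

```python
# Standard error of a mean ~ sd / sqrt(n): with very large samples,
# even tiny differences stand out against the sampling noise.
import math

sd = 2.0  # assumed spread of 0-10 well-being scores (illustrative)
for n in (100, 10_000, 165_000):
    se = sd / math.sqrt(n)
    print(f"n = {n:7,d}: standard error of the mean ~ {se:.3f}")
```

At n = 165,000 the standard error is around 0.005, so differences of 0.06 to 0.14 points are many standard errors wide.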

Quite possibly. But quite possibly not.

The thing about averages is that it’s impossible for everything to be above average.

What we need to know is whether London’s scores are falling below some acceptable threshold value. If they are, then action is needed. What is an acceptable score? Well, that is for the policy makers to decide. But if they base it on an average then they pretty much set themselves up for failure somewhere.