Swallows, seasons and statistics

This morning I woke up thinking of summer. Or more specifically, of swallows and summer, and statistics.

OK, maybe I should get out more. But it got me thinking about just how much we are innate statisticians. We make observations and, over time, develop guidance for our lives based on those observations.

In this case, it is the cautionary note that one swallow doesn’t make a summer. Apparently, the saying has its origin in Aristotle’s Nicomachean Ethics. Aristotle actually refers to spring rather than summer. Presumably this is because swallows found the ancient Grecian spring balmy enough for their return, but Britannia was too frosty for them till summer. (Pytheas of Massilia found it frosty when he explored north-western Europe around 325 BCE, not long before Aristotle’s death. Whether or not Pytheas saw any swallows, I don’t know.)

Anyway… back to statistics and our modern saying: one swallow doesn’t make a summer. One way of putting this is to say that the probability that it is summer given that we have seen a swallow is not 100%.

But swallows are migratory birds. Their return to Britain is associated with the approach of summer. So, while we may be cautious, we can also be hopeful. That is, our degree of belief that summer is on its way is favourably affected by our seeing a swallow for the first time in the year. Updating degrees of prior belief based on data subsequently observed is at the heart of Bayesian inference. I’ll steer clear of that area for this posting though!

Let’s imagine an ornithologist has been imprisoned in a cave. A boulder blocks the entrance so he cannot see outside. It has been so long that the ornithologist has lost track of the seasons. It’s a glam cave though, with heating, lighting, food supplies etc.

He is told that he is going to be set free. He hopes it will be summer when this happens, as that is his favourite season: he gets to see his favourite bird – the swallow – every day. He remembers that in spring and autumn he would see swallows one day in every three, and that in winter he would never see them.

To while away the time till his release the ornithologist decides to muse on the possibilities of what will greet him. Now, in the ornithologist’s world, all seasons have 90 days (okay, it’s cos it makes the maths simpler). He summarises what he recalls in the following table:

                             Winter  Spring  Summer  Autumn
Days that swallows seen           0      30      90      30
Days that swallows not seen      90      60       0      60

Will it be summer on his release? Will he see a swallow?

Till he is released all he knows is that he has a one in four chance (25 per cent probability) of it being summer. He works this out from 90 days divided by 360 days – remember in his world all four seasons have 90 days.

As there are 30 + 90 + 30 = 150 days in his 360-day year on which he would see a swallow, the ornithologist works out that he has a five in twelve chance of seeing a swallow on his release, which is about 42 per cent (from 150 divided by 360).
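For anyone who likes to check the arithmetic, here is a minimal Python sketch of the ornithologist's two sums. It assumes nothing beyond the table above; the variable names are my own invention.

```python
from fractions import Fraction

# Days per season on which swallows are seen or not seen (from the table).
seen = {"winter": 0, "spring": 30, "summer": 90, "autumn": 30}
not_seen = {"winter": 90, "spring": 60, "summer": 0, "autumn": 60}

# Total days in the ornithologist's year: four seasons of 90 days each.
days_in_year = sum(seen.values()) + sum(not_seen.values())

# Probability that a randomly chosen day falls in summer.
p_summer = Fraction(seen["summer"] + not_seen["summer"], days_in_year)

# Probability that a swallow is seen on a randomly chosen day.
p_swallow = Fraction(sum(seen.values()), days_in_year)

print("P(summer)  =", p_summer, "=", round(float(p_summer), 2))
print("P(swallow) =", p_swallow, "=", round(float(p_swallow), 2))
```

Using Fraction rather than plain division keeps the answers as "one in four" and "five in twelve", which is how the ornithologist thinks of them.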

He feels a bit gloomy. The probabilities seem against him for his two favourite things. However, he notes that a lot of this rests on whether or not it is winter, when he would never see a swallow.

He realises that if the first thing he observes is that it is summer, then he knows that he has a 100% probability of seeing a swallow.

However, if the first thing he sees is a swallow, then he can work out the probability of it turning out to be summer. He knows that swallows are seen on 150 days a year. For 90 of these days it is summer. So the probability it is summer given he first sees a swallow is 90 divided by 150 or 60 per cent.

Clearly, these two probabilities are different: 60% that it is summer given that the first thing he observes is a swallow, and 100% that he’ll see a swallow given that the first thing he observes is that it’s summer.
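The two conditional probabilities can be sketched in a few lines of Python. The only difference between them is which set of days we restrict ourselves to before counting. The names are hypothetical, and the numbers are just those from the ornithologist's table.

```python
from fractions import Fraction

# Days per season on which swallows are seen (from the ornithologist's table).
seen = {"winter": 0, "spring": 30, "summer": 90, "autumn": 30}
days_per_season = 90  # every season has 90 days in his world

# Restrict to the days on which a swallow is seen, then ask how many are summer.
p_summer_given_swallow = Fraction(seen["summer"], sum(seen.values()))

# Restrict to summer days, then ask on how many a swallow is seen.
p_swallow_given_summer = Fraction(seen["summer"], days_per_season)

print("P(summer | swallow) =", p_summer_given_swallow)
print("P(swallow | summer) =", p_swallow_given_summer)
```

The numerator is the same 90 days in both cases; it is the denominator (150 swallow days versus 90 summer days) that makes the answers differ.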

Each of these is a conditional probability – the probability of a thing you don’t know about being the case given you have observed another thing. They are usually different but are mathematically related through Bayes’ theorem.

For now we’ll let our imaginary ornithologist free. I’ll come back to Bayes’ theorem and conditional probabilities in another posting. I will, though, finish with a question.

Rather than ornithologists, swallows and seasons, let’s think about criminal justice.

If you were the defendant and knew that you were innocent, how would you feel if the prosecution persuaded the jury that the probability of your innocence given the evidence equals the probability of finding the evidence given that you are innocent?


Rabbits, luck and stats

What was the first word you said today?

Mine was “rabbit”.

Ever since I was a kid, I’ve tried to make sure “rabbit” was the first thing I said on the first day of each month. I’m not alone in this; many others say the same, or something similar such as “white rabbits” or “rabbit, rabbit, rabbit”. It’s supposed to bring good luck for the rest of the month.

It’s a superstition, of course, but an old one. According to Wikipedia, its origin is unknown, though there is a reference to it being said by children in 1909 in the journal Notes and Queries.

So, why has this superstition stood the test of time?

Nowadays, like me, most people probably say it out of tradition or habit. But, presuming things don’t get established for no reason at all, how did evidence arise for the power of “rabbit” uttering?

From a stats point of view, there are a few things to consider.

Confirmation bias. We tend to give more weight to information that backs up our expectations. We’ve said “rabbit” so we expect luck. So, when the ‘lucky’ event happens it ‘proves’ the power of what we said! And, if it doesn’t? Well, confirmation bias would tell us it is to ‘make up’ for any extra luck we thought we had in the past or might get in the future!

The number of opportunities for a ‘lucky’ event to happen. Few people have lives in which nothing good seems to happen for weeks on end. So, with at least 28 days in each month, there are plenty of chances for a lucky event to turn up on one of them and make saying “rabbit” appear to have been worth it.
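To put a rough number on those opportunities, here is a small Python sketch. It rests on two assumptions of mine that are not in the post: that each day has the same chance of bringing a lucky event, and that days are independent of one another. The daily chances tried are made up purely for illustration.

```python
def p_at_least_one_lucky_day(p_daily: float, days: int = 28) -> float:
    """Chance of one or more lucky days in a month.

    Assumes (hypothetically) the same chance every day and independent days.
    """
    # Work out the chance of no luck at all, then take the complement.
    return 1 - (1 - p_daily) ** days


# Illustrative, invented daily chances of something lucky happening.
for p_daily in (0.01, 0.05, 0.10):
    chance = p_at_least_one_lucky_day(p_daily)
    print(f"daily chance {p_daily:.2f} -> monthly chance {chance:.2f}")
```

Even a small daily chance grows quickly over 28 days, so the shortest month still gives “rabbit” plenty of room to look as if it has worked.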

And, of course…

Correlation does not necessarily imply causation. Just because two things are associated does not mean one has caused the other – though it may have done. We need to think about the plausibility of there being a causal link. Is it really possible that I have made October 2013 a lucky month for me because I said “rabbit” a little after midnight?

Of course, if my ticket wins the EuroMillions lottery jackpot this month … !