August 18, 2017

Surprise and belief update

In a previous post, I started discussing a paper [1] on the (un)surprising nature of a long streak of heads in a coin toss. My conclusion was that the surprise is not intrinsic to the particular sequence of throws, but rather resides in its relation to our prior information. I will detail this reasoning here, before returning to the paper itself.

Let us accept as prior information the null hypothesis \(H_0\) "the coin is unbiased". The conditional probabilities of throwing heads or tails are then equal: \(P(H|H_0) = P(T|H_0)=1/2\). With the same prior, the probability of any sequence \(S_k\) of 92 throws is the same: \(P(S_k|H_0) = 2^{-92}\), where \(k\) ranges from \(1\) to \(2^{92}\).

Assume now that the sequence we actually get consists of all heads: \(S_1 = \lbrace HH \ldots H\rbrace\). What is the (posterior) probability of getting heads on the 93rd throw? Let us consider two options:
  1. We can hold steadfast to our initial estimate of lack of bias: \(P(H|H_0) = 1/2\).
  2. We can update our "belief value" and say something like: "although my initial assessment was that the coin is unbiased [and the process of throwing is really random and I'm not hallucinating etc.], having thrown 92 heads in a row is good evidence to the contrary and on the next throw I'll probably also get heads". Thus, \(P(H|H_0 S_1) > 1/2\), and in fact much closer to 1. How close exactly depends on the strength of our initial confidence in \(H_0\), but I will not do the calculation here (I sketched it in the previous post).

I would say that most rational persons would choose option 2 and abandon \(H_0\); holding on to it (option 1) would require an extremely strong confidence in our initial assessment.
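
To make the dependence on our initial confidence concrete, here is a minimal sketch under a deliberately crude assumption that is not in Smith's paper: the coin is either fair (\(H_0\), \(p = 1/2\)) or double-headed (\(p = 1\)), and `prior_alt` is a hypothetical name for the prior probability we grant to the double-headed alternative. The function `prob_next_heads` is likewise only illustrative. Exact fractions are used because \(2^{-92}\) is too small for the differences to be visible in floating point.

```python
from fractions import Fraction

def prob_next_heads(n_heads, prior_alt):
    """P(heads on the next throw) after n_heads heads in a row.

    Toy model (an assumption, not taken from the paper): the coin is
    either fair (H_0, p = 1/2) or double-headed (p = 1); prior_alt is
    the prior probability of the double-headed alternative.
    """
    prior_alt = Fraction(prior_alt)
    like_fair = Fraction(1, 2) ** n_heads   # P(S_1 | H_0) = 2^-n
    like_alt = Fraction(1)                  # P(S_1 | double-headed) = 1
    evidence = (1 - prior_alt) * like_fair + prior_alt * like_alt
    post_fair = (1 - prior_alt) * like_fair / evidence  # Bayes' rule
    post_alt = prior_alt * like_alt / evidence
    # Predictive probability: average over the two hypotheses.
    return post_fair * Fraction(1, 2) + post_alt

# Three illustrative levels of prior doubt about the fairness of the coin.
for prior_alt in (Fraction(0), Fraction(1, 10**30), Fraction(1, 10**9)):
    p = prob_next_heads(92, prior_alt)
    print(f"prior doubt {float(prior_alt):.0e}: P(H | 92 heads) = {float(p):.6f}")
```

With `prior_alt` exactly zero the answer stays at 1/2, which is option 1. A prior doubt of one in a billion is already enough to push the predictive probability to within rounding error of 1, which is option 2. Only a doubt far smaller than \(2^{-92}\) leaves the estimate essentially unchanged.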

Note that for a sequence \(S_2\) consisting of 46 heads and 46 tails (in any order) the distinction above is moot, since \(P(H|H_0 S_2) = P(H|H_0) = 1/2\). The difference between \(S_1\) and \(S_2\) lies not in their prior probability [2] but in the way they challenge (and update) our belief.

Back to Martin Smith's paper now: what makes him adopt the first choice? I think the most revealing phrase is the following:

When faced with this result, of course it is sensible to check [...] whether the coins are double-headed or weighted or anything of that kind. Having observed a run of 92 heads in a row, one should regard it as very likely that the coins are double-headed or weighted. But, once these realistic possibilities have been ruled out, and we know they don’t obtain, any remaining urge to find some explanation (no matter how farfetched) becomes self-defeating.[italics in the text]

As I understand it, he implicitly distinguishes between two kinds of propositions: observations (such as \(S_1\)) and checks (which are "of the nature of" \(H_0\), although they can occur after the fact), and bestows upon the second category a protected status: these types of conclusions, e.g. "the coin is unbiased", survive even in the face of overwhelming evidence to the contrary (at least when it results from observation).

There is, however, no basis for this distinction, since checks are also empirical findings: by visual inspection, I conclude that the coin does indeed exhibit two different faces; by more elaborate experiments I deduce that the center of mass is indeed in the geometrical center of the coin, within experimental precision; by some unspecified method I conclude that the "throwing process" is indeed random; by pinching myself I decide that I am not dreaming, etc. At this point, however, the common-sense remark is: "if you want to check the coin against bias, the easiest way would be to throw it about 92 times and count the heads".

If we estimate the probability of the observations (given our prior belief) we should also update our belief in light of the observations. Recognizing this symmetry gives quantitative meaning to the "surprise" element, which is higher for some sequences than for others.



1. Martin Smith, Why throwing 92 heads in a row is not surprising, Philosophers' Imprint (forthcoming) 2017.
2. We only considered here the probabilities before and after the 92 throws. One might also update one's belief after each individual throw, so that \(P(H)\) would increase gradually.

August 17, 2017

How surprising is it to throw 92 heads in a row?

Martin Smith (from the University of Edinburgh) discusses the relation between surprise and belief [1].

As a striking introduction, he claims that, in a coin toss, throwing a large number of heads in a row is not surprising. He deploys a version of the sorites argument insofar as "surprise" is concerned: if the individual events \(e_k\) of getting heads on the \(k\)-th throw are unsurprising, then so is their conjunction.

This particular example is easily dealt with by noting the importance of prior information: the sequence of heads is surprising because we know that "The coins don’t appear to be double-headed or weighted or anything like that – just ordinary coins", as Smith insists in his first paragraph [2]. I know nothing about the theories of Shackle and Spohn, but I doubt his analysis would survive adding event \(e_0\): "We checked that all coins were unbiased". On the other hand, I believe a Bayesian treatment similar to that given by Jaynes in §5.2 of [3] would be quite satisfactory (see also Chapter 4 for a general presentation and §9.4 for an example dealing specifically with coin tossing and bias). Once again, the information brought by \(e_1\) through \(e_{92}\) contradicts \(e_0\); this is why it is surprising (or informative), not because the events would have an intrinsic "surprising" character.

The author's insistence on the equivalence of the various results ("[E]ach one of these sequences is just as unlikely as 92 heads in a row.") glosses over the fact that each sequence is more or less compatible with the fairness assumption \(e_0\). Let us introduce the probability \(p\) of throwing heads. Then, \(e_0\) amounts to saying that the (prior) probability distribution \(f(p)\) of the parameter \(p\) is peaked at \(0.5\) and has a certain width \(w\). The higher our confidence in coin fairness, the lower \(w\).

It is only in the case of absolute certainty \(w \to 0\) (\(f(p)\) is a Dirac delta) that the results are equivalent. As soon as \(w\) exceeds a ridiculously small value, the evidence brought by the 92 heads dramatically shifts the peak of the (posterior) probability distribution \(f'(p)\) close to 1. A sequence with 46 heads, although exactly as improbable, has no such effect (at most, it may lead to a modest decrease in \(w\)).
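
As a rough numerical illustration, one can pick a specific shape for \(f(p)\). The sketch below assumes a symmetric Beta prior, which is my choice for convenience (it updates in closed form) and not something taken from the paper; the helper names and the prior strength `a0` are hypothetical. The width \(w\) is taken to be the standard deviation of the distribution.

```python
from math import sqrt

def beta_mode(a, b):
    """Peak of a Beta(a, b) density (valid for a, b > 1)."""
    return (a - 1) / (a + b - 2)

def beta_std(a, b):
    """Standard deviation of a Beta(a, b) density, used here as the width w."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def update(a, b, heads, tails):
    """Conjugate update: a Beta prior plus coin-toss counts gives a Beta posterior."""
    return a + heads, b + tails

a0 = 5.0  # hypothetical prior strength; Beta(a0, a0) is peaked at p = 0.5
print(f"prior:          peak {beta_mode(a0, a0):.3f}, width {beta_std(a0, a0):.3f}")

# Same prior, two equally improbable sequences of 92 throws.
for label, heads, tails in (("92 heads", 92, 0), ("46 H + 46 T", 46, 46)):
    a, b = update(a0, a0, heads, tails)
    print(f"{label:<15} peak {beta_mode(a, b):.3f}, width {beta_std(a, b):.3f}")
```

With this prior, 92 heads move the peak from 0.5 to 0.96, while 46 heads and 46 tails leave it at 0.5 and only narrow the distribution. How small \(w\) must be for the prior to resist the data depends on the shape of \(f(p)\), in particular on how much weight it leaves near \(p = 1\); within the Beta family a much larger `a0` resists far longer, so the numbers here should be read as an example rather than a general threshold.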

The surprise is not related to the probability of a particular sequence, but to the extent to which it challenges our belief. I believe this statement to be rather trivial (or at least uncontroversial), and indeed Smith reaches pretty much the same conclusion in the last (and most interesting) section of the paper (to be discussed in a future post), although he cannot see the element of surprise in the coin toss experiment.

1. Martin Smith, Why throwing 92 heads in a row is not surprising, Philosophers' Imprint (forthcoming) 2017.
2. Had we known that all coins were double-headed, throwing only heads would not only be unsurprising, it would be certain.
3. E. T. Jaynes and G. L. Bretthorst, Probability Theory: The Logic of Science, Cambridge University Press 2003.