I will now descend into math ("maths" if this were an English-speaking, rather than an American-speaking group. This is why the Americans won the war -- having only one math, instead of lots of them, makes it easier for our engineers to perform calculations).
Consider two Gaussian random variables, x and y. Because they are Gaussian, each one has a mean and a standard deviation. If you take them as flight scores, then their means are what the scores "should" be, and their standard deviations are a measure of how much each score can differ from its mean.
Contestant x wins over contestant y if the expression x > y is true. (Let's not think about ties -- just please no). For x > y to be true, the expression x - y > 0 must be true.
Now define the random variable w = x - y. The mean of w can be found from m_w = m_x - m_y -- in other words, the mean of w is the difference of the means of x and y. As long as x and y are independent, the variance of w can be found as the sum of the variances of x and y: v_w = v_x + v_y.
Because x and y are Gaussian, w is Gaussian. This means that w is completely defined by its mean and variance.
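As a minimal sketch of that bookkeeping, assuming x and y are independent, here is the mean, variance and standard deviation of w worked out in Python. The function name and the example scores (540 and 535 points, each with a 5-point standard deviation) are made up for illustration; they are not real contest data.

```python
import math

def difference_stats(m_x, s_x, m_y, s_y):
    """Return (m_w, v_w, s_w) for w = x - y, assuming x and y are independent."""
    m_w = m_x - m_y          # the means subtract
    v_w = s_x**2 + s_y**2    # the variances add, even though we are subtracting
    s_w = math.sqrt(v_w)     # standard deviation is the square root of the variance
    return m_w, v_w, s_w

# Hypothetical flight scores, for illustration only.
m_w, v_w, s_w = difference_stats(540.0, 5.0, 535.0, 5.0)
print(m_w, v_w, round(s_w, 3))
```

Note that the spread of the difference is bigger than the spread of either score on its own, which is the whole reason close contests are hard to call.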
The probability that w > 0 is true is equal to (1 + erf(m_w / (s_w * sqrt(2))))/2, where s_w is the standard deviation of w, defined by s_w^2 = v_w, and erf(x) is the error function as defined in the Scilab help files (everyone defines erf differently -- it's special that way).
If the mean of x is greater than the mean of y, then pilot 'x' "should" win. So the probability of an error in placement in this case is simply* the probability that the actual, judged and calculated x is less than the actual, judged and calculated y -- in other words, the probability that w < 0, which is (1 - erf(m_w / (s_w * sqrt(2))))/2.
Some example probabilities of errors are:
- nearly 0.5 when m_w is nearly zero -- in other words, when it's a dead heat, who wins is a coin toss
- about 0.40 when m_w/s_w = 0.25
- about 0.31 when m_w/s_w = 0.5
- about 0.16 when m_w/s_w = 1
- about 0.023 when m_w/s_w = 2
This means that when the standard deviation of the difference between two scores is equal to the difference in what those scores "should" be, there's about a 1 in 6 chance that the ranking between those two people will be wrong.
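For anyone who wants to try other ratios, the chance of a wrong ranking can be computed in a few lines of Python. This is a sketch under stated assumptions, not anything official: the function name is my own, w is taken to be Gaussian, and Python's math.erf is taken to be the standard error function (2/sqrt(pi) times the integral of exp(-t^2) from 0 to z), which is why the ratio is divided by sqrt(2) before it goes in.

```python
import math

def misranking_probability(ratio):
    """Chance that the pilot who 'should' win scores lower.

    ratio is m_w / s_w and is assumed to be >= 0.
    Assumes w is Gaussian and math.erf is the standard error function.
    """
    return (1.0 - math.erf(ratio / math.sqrt(2.0))) / 2.0

# Print the probability of a wrong ranking for a handful of ratios.
for ratio in (0.0, 0.25, 0.5, 1.0, 2.0):
    p = misranking_probability(ratio)
    print(f"m_w/s_w = {ratio:4.2f}  ->  P(wrong order) = {p:.3f}")
```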
Where this whole analysis really crashes into the Rocks of Reality is the fact that you're making a noisy measurement (via judging) of a quantity (how "good" the flight is) which is itself random: how well I (or Paul Walker, or Howard) fly on any given day is subject to variation not only from the air conditions and all sorts of other uncontrollable external factors, but also from how well the pilot does on that particular flight. So you can toss numbers around all day, and get a better understanding of what might be -- but even if we were all trained statisticians who graduated at the tops of our respective classes, we'd still disagree on what actually is.
* People like to sprinkle mathematical calculations with the word "simple" or "simply". I don't know if it's because we're just reveling in actually having figured the stuff out, or if we like it when friends read what we've written and say "What?!? SIMPLE?".