The Null Hypothesis

Humility Is Essential

Suppose you have a favorite hypothesis about the world; call it H1. You have used H1 in a number of different contexts to explain interesting observations. You would like to show that H1 is scientifically correct.

Sadly, this is not a meaningful goal. A widely accepted position in the philosophy of science is that observations cannot *prove* or *verify* or *confirm* a hypothesis. What they can do reliably is *refute* a hypothesis (by being inconsistent with a prediction of that hypothesis).

This sets up an important condition for applying science to evaluate a hypothesis, especially one like H1 that you are very attached to. Can you imagine an observation that would convince you that H1 is not correct? Do you have the humility to accept, given enough of the right kind of evidence, that your sincere and enthusiastic belief in H1 was wrong?

If not, you are not doing science. What you are doing may be useful, but it is not science. [Note 1]

Science Compares Competing Hypotheses

Observational evidence can show that a universal hypothesis is false, but not that it is true. [Note 2]

Therefore, scientific experiments do not test a single hypothesis. They compare two competing hypotheses. A well-designed experiment comparing two hypotheses, call them H1 and H0, identifies a situation S, in which the two hypotheses make different predictions. Say:

S & H1 → x
S & H0 → y
where x and y are different and incompatible with each other. At most one of H1 and H0 will be consistent with the outcome of the experiment.

If the experiment results in observation x, then H0 is refuted, and H1 survives. If the observation is y, then H1 is refuted, and H0 survives. If the result is z, which is incompatible with both x and y, then both H1 and H0 are refuted, and you need to look for a new hypothesis, say H2.
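The three cases above can be sketched in a few lines of Python. This is only an illustration of the logic: the function name `surviving_hypotheses` and the string labels for predictions and outcomes are made up for this sketch, and real predictions are rarely simple enough to compare by equality.

```python
def surviving_hypotheses(predictions, outcome):
    """Return the names of the hypotheses whose prediction matches the outcome.

    `predictions` maps a hypothesis name to what it predicts in situation S.
    A hypothesis whose prediction does not match the outcome is refuted.
    """
    return [name for name, predicted in predictions.items() if predicted == outcome]


# In situation S, H1 predicts x and H0 predicts y (illustrative labels only).
predictions = {"H1": "x", "H0": "y"}

for outcome in ("x", "y", "z"):
    survivors = surviving_hypotheses(predictions, outcome)
    if survivors:
        print(f"observed {outcome}: {survivors[0]} survives")
    else:
        print(f"observed {outcome}: both refuted, look for a new hypothesis")
```

Note that the function can only report survivors. Nothing in it ever marks a hypothesis as proved.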

As a classic example, Aristotle (4th century BC) said that objects fall at rates proportional to their weights. Galileo (late 16th century) claimed that similar objects of different weights fall at the same rate. Galileo may have conducted an experiment by dropping balls of different weights from the Tower of Pisa to test these competing hypotheses. (Others did so earlier, apparently, but got less attention for their efforts.) Dropped at the same time, the balls land at the same time, disproving Aristotle’s theory.

As a more recent example, in the early 20th century, Einstein's new theory of general relativity made a startling prediction about how much light would bend as it passes close to a massive body like the Sun: roughly twice the deflection derived from Newton's theory. Physicists figured out that this could be tested by observing the apparent positions of stars near the Sun during a total solar eclipse, since the two theories made different predictions. Expeditions were sent to the island of Príncipe, off the west coast of Africa, and to Sobral, Brazil, in 1919 to make the observation.

The observation was consistent with the prediction of Einstein's theory. Newton's theory was refuted. Note that this experiment did not show that Einstein's theory was correct, only that it was consistent with the observation. Maybe future observations will refute Einstein's theory, as this one refuted Newton's, but that hasn't happened yet.

Theories that we have a lot of confidence in — like relativity, evolution, and quantum mechanics — are survivors of many challenging comparisons. [Note 3]

Where Do Competing Hypotheses Come From?

It's not often that there are two well-developed competing theories to be compared. Scientists often focus on a single theory, which is their favorite. But if a scientific experiment necessarily compares two theories, where do they get the second one? Simply making up a foolish theory and then refuting it doesn't provide much confidence.

The Null Hypothesis Trick

Many disciplines, including biology, medicine, psychology, sociology, and economics, study complex systems that behave in reasonably predictable ways. One way to study such a system is to make an intervention (A) and look for an observable difference (B).

Suppose your favorite hypothesis (H1) is “A ⇒ B”, meaning that intervention (A) causes a certain observable difference (B). The null hypothesis (H0) says, “No, it doesn’t. Any difference you observe in B is just random error.”

Observations are never perfect. They are always slightly distorted by many different sources of error, sometimes from the physical situation, or the experimental setup, or measurement error, or human perceptual error, or even from human clerical error. So it’s always technically possible that an observation seems to clearly show the expected observable difference (B), but it is just the result of random error. How can we possibly deal with this?

We use the mathematical theories of probability and statistics to reason about random errors. We collect a lot of observations, and use them to determine how likely various types and magnitudes of random errors are. Then we look at the observations that seem to support the difference (B), and ask, “If the null hypothesis (H0) is true, what is the probability p of getting observations at least this extreme through random error alone?”

Many disciplines have set a probability threshold, such as p < 0.05, and sometimes far lower, even one in a million (p < 0.000001). If the probability of the observations is less than that threshold, we essentially say, “Yes, it’s possible that the observations are just random error, but if the null hypothesis were true, observations like these would turn up less than 5% of the time (or whatever the threshold is). That is too unlikely to accept as chance, so we reject the null hypothesis.” Note that p is the probability of the observations given that H0 is true, not the probability that H0 is true. So p < 0.05 does not mean that the difference is real with probability above 0.95.
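One simple way to estimate p, offered here as a minimal sketch and not as the method any particular discipline prescribes, is a permutation test. If H0 is true, the labels “treated” and “control” carry no information, so shuffling them should produce differences as large as the observed one fairly often. The function name, the measurements, and the number of shuffles below are all made up for illustration.

```python
import random


def permutation_p_value(treated, control, trials=10_000, seed=0):
    """Estimate p: how often, under H0, random relabeling of the data gives
    a difference in group means at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(treated)
    observed = abs(sum(treated) / n - sum(control) / len(control))

    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # under H0, group labels are interchangeable
        fake_treated, fake_control = pooled[:n], pooled[n:]
        diff = abs(sum(fake_treated) / n - sum(fake_control) / len(fake_control))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1

    # Add-one correction keeps the estimate from ever being exactly zero.
    return (hits + 1) / (trials + 1)


# Made-up measurements of B with and without intervention A.
treated = [12.1, 11.8, 12.5, 12.9, 11.6, 12.3, 12.7, 12.0]
control = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]

p = permutation_p_value(treated, control)
alpha = 0.05  # the discipline's chosen threshold
print(f"p = {p:.4f}:", "reject H0" if p < alpha else "cannot reject H0")
```

A small p here says only that the data would be surprising if H0 were true. It does not give the probability that H1 is true.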

Summary

The key take-away here is that a scientific experiment compares the predictions of two competing hypotheses. No matter how much a scientist wants their favorite hypothesis (say, H1) to be true, they must be prepared to accept, given enough evidence, that it is false.

Ideally, you start with two competing hypotheses whose predictions you want to compare. But in certain cases, you can generate a plausible second hypothesis, the null hypothesis (H0), which predicts that all observed variation is simply random. The null hypothesis is rejected by showing that the observations are sufficiently unlikely, if the null hypothesis is true.

Notes

[Note 1] Astrology is generally agreed not to be a science, in this sense, because there is no agreement on what sort of observations would falsify an astrological prediction. Some time ago, I wrote a relevant essay, titled “Why do we believe in electrons, and not in fairies?”

[Note 2] Really, this applies to universal statements like “All swans are white”. Observing any number of white swans doesn’t prove that this is true. But observing a black swan proves that this statement is false. An existential statement like, “There is a black swan”, can be proved by a single observation of a black swan.

[Note 3] Sometimes someone gets all excited because they believe they have observed a clear contradiction to a long-accepted theory like relativity or evolution. However, the “clear contradictory observation” may have other explanations, including errors of many different kinds. A useful rule is “Extraordinary claims require extraordinary evidence.” If you have observed something that you believe refutes a long-accepted theory, you can expect years of effort to convince the scientific community that your observation is reliable, reproducible, and requires a change in the prevailing theory. (Einstein’s original prediction (1911) about light deflection was wrong, but he corrected it in 1915, and that matched the observations in 1919.)


Benjamin Kuipers, 13 July 2025.