"Total Information Awareness" as a diagnosis problem

by Benjamin Kuipers

Consider the aims of the "Total Information Awareness" (TIA) program as a problem in diagnosis.

Out of a large population, you want to diagnose the very few cases of a rare disease called "terrorism". Your diagnostic tests are automated data-mining methods, supervised and checked by humans. (The analogy is sending blood or tissue samples to a laboratory.)

This type of diagnostic problem, looking for a rare disease, has some very counter-intuitive properties.

Suppose the tests are highly accurate and specific (in epidemiological terms, 99.9% sensitivity and 99.9% specificity):

99.9% of the time, examining a terrorist, the test says "terrorist".
99.9% of the time, examining an innocent civilian, the test says "innocent civilian".

Terrorists are rare: let's say, 250 out of 250 million people in the USA, or 1 in 1,000,000.

When the tests are applied to the 250 terrorists, each one will be detected 99.9% of the time, so the expected number missed is 0.25. That means there is a bit under a 25% chance (about 22%, if the errors are independent) of missing at least one of them, and the rest will almost certainly be detected. Great!
However, out of the remaining 249,999,750 innocent civilians, 99.9% accuracy means 0.1% error, which means that about 250,000 of them will be incorrectly labeled "terrorist". Uh-oh!
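
For readers who want to check the arithmetic, here is a minimal sketch of the two calculations above. The population figures and the 99.9% rates come from the text; the function name is illustrative, and the chance of missing at least one terrorist assumes that test errors are independent from person to person, which the text does not state.

```python
# Figures from the text.
POPULATION = 250_000_000
TERRORISTS = 250
SENSITIVITY = 0.999  # P(test says "terrorist" | terrorist)
SPECIFICITY = 0.999  # P(test says "innocent civilian" | innocent civilian)


def screening_counts(population, terrorists, sensitivity, specificity):
    """Expected outcomes of screening everyone once (illustrative helper)."""
    innocents = population - terrorists
    return {
        "expected_detected": terrorists * sensitivity,
        "expected_missed": terrorists * (1 - sensitivity),
        "expected_false_alarms": innocents * (1 - specificity),
        # Assumption: each test errs independently of the others.
        "p_miss_at_least_one": 1 - sensitivity ** terrorists,
    }


counts = screening_counts(POPULATION, TERRORISTS, SENSITIVITY, SPECIFICITY)
for name, value in counts.items():
    print(f"{name}: {value:,.2f}")
```

The false-alarm count is driven almost entirely by the size of the innocent population, not by the number of terrorists.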

The law enforcement problem is now that we have 250,250 people who have been labeled as "terrorist" by our diagnostic tests. Only about 1 in 1,000 of them is actually a terrorist.
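
The "1 in 1,000" figure is what epidemiologists call the positive predictive value, and it follows from Bayes' rule. The sketch below uses the rates from the text; the function and variable names are illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(actually a terrorist | labeled "terrorist"), by Bayes' rule."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


prevalence = 250 / 250_000_000  # 1 in 1,000,000, from the text
ppv = positive_predictive_value(prevalence, sensitivity=0.999, specificity=0.999)
print(f"P(terrorist | labeled) = {ppv:.6f}, roughly 1 in {1 / ppv:,.0f}")
```

Because the prevalence is so low, even a tiny false-positive rate swamps the true positives.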

If we were mining for gold, we would say that the ore has been considerably enriched, since 1 in 1,000 is better than 1 in 1,000,000 by quite a lot. There's still a long way to go, though, before finding a nugget.

But we are talking about people's lives, freedom, and livelihoods here. The consequences to an innocent civilian of being incorrectly labeled a "terrorist" (or even "suspected terrorist") can be very large.

Suppose that, out of the innocent people incorrectly labeled "terrorist", 1 in 1,000 is traumatized enough by the experience that they, or a relative, actually become a terrorist. (This is analogous to catching polio from the polio vaccine: extremely rare, and impossible with killed-virus vaccine, but a real phenomenon.)

In this case, even after catching all 250 original terrorists, 250 new ones have been created by the screening process!
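
As a rough check on this break-even claim, the sketch below counts the terrorists still at large after one round of screening: those the tests missed plus those the screening created. The 1-in-1,000 radicalization rate is the text's own guess, and the function name is illustrative.

```python
def free_terrorists_after_screening(
    population, terrorists, sensitivity, specificity, radicalization_rate
):
    """Expected terrorists at large after screening (illustrative model)."""
    innocents = population - terrorists
    missed = terrorists * (1 - sensitivity)
    false_alarms = innocents * (1 - specificity)
    # The text's guess: a fixed fraction of false alarms produces a new terrorist.
    created = false_alarms * radicalization_rate
    return missed + created


before = 250
after = free_terrorists_after_screening(
    population=250_000_000,
    terrorists=before,
    sensitivity=0.999,
    specificity=0.999,
    radicalization_rate=1 / 1000,
)
print(f"before screening: {before}, after screening: about {after:.0f}")
```

Under these numbers the count after screening is essentially the same as the count before it.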

The numbers I've used give a break-even scenario. But 99.9% accuracy and specificity is unrealistically high. More realistic numbers make the problem worse. Nobody knows what fraction of people traumatized as innocent victims of a government process are seriously radicalized. 1 in 1,000 is an uninformed guess, but the number could be significantly higher.
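
One way to see why less accurate tests make things worse is to ask how small the radicalization rate must be for screening to break even. The sketch below computes that threshold for a few accuracy levels. Only the 99.9% case comes from the text; the 99% and 95% cases are hypothetical values chosen for illustration, not estimates of any real system, and the function name is illustrative.

```python
def break_even_radicalization_rate(population, terrorists, sensitivity, specificity):
    """Radicalization rate at which terrorists created equal terrorists caught."""
    innocents = population - terrorists
    caught = terrorists * sensitivity
    false_alarms = innocents * (1 - specificity)
    return caught / false_alarms


# 0.999 is the text's figure; 0.99 and 0.95 are hypothetical, for illustration.
rates = {}
for accuracy in (0.999, 0.99, 0.95):
    rates[accuracy] = break_even_radicalization_rate(
        250_000_000, 250, sensitivity=accuracy, specificity=accuracy
    )
    print(f"accuracy {accuracy:.1%}: break-even at about 1 in {1 / rates[accuracy]:,.0f}")
```

As accuracy drops, the break-even threshold shrinks quickly, so a much smaller fraction of radicalized false positives is enough to cancel the benefit.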

A mass screening process like this is very likely to have costs that are much higher than the benefits, even restricting the costs to "number of free terrorists" as I have done here. Adding costs in dollars and the suffering of innocents just makes it harder to reach the break-even level.

Ask your neighborhood epidemiologist to confirm this analysis. It is applied routinely to public health policy, and applies no less to seeking out terrorists.

There are alternative ways to detect and defend against terrorists. Mass screening approaches like TIA are very questionable in terms of costs and benefits.

Written 12/14/2002.

BJK