Jonathan Cutrell

Base rates and regression lines

A base rate is how often something happens, or what is typical, within some sample. For example, the base rate for a night's sleep in a given population may be 8 hours. You can have base rates for your own behaviors, too.

When evaluating frequencies of events and other statistical representations of information, it's important to do so in light of the base rate. For example, if I say that "someone will die of a lightning strike in the US in the next 30 minutes", am I predicting a likely future? The only way we can predict (given no specific weather information) is by using base rates.

In the lightning strike example, we can look at the base rate of yearly lightning fatalities. In the US, 51 fatalities occur per year, which is a little less than 1 per week, so a prediction about any given 30-minute window has about a 1/340 chance of being correct. So how about the likelihood of dying if you are hit by a lightning strike? How can we figure out this base rate? Globally, around 240k people are injured by lightning strikes each year, and around 6k of those injuries are fatal.
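The 30-minute figure can be reproduced with a few lines of Python. This is a minimal sketch, not a weather model: it assumes the 51 yearly fatalities are spread evenly across the year (in reality they are seasonal), and the function names are made up for illustration.

```python
import math

FATALITIES_PER_YEAR = 51            # US figure quoted in the text
WINDOWS_PER_YEAR = 365 * 24 * 2     # number of 30-minute windows in a year

def expected_events_per_window(events_per_year: float, windows_per_year: int) -> float:
    """Average number of events per window; a rough stand-in for probability."""
    return events_per_year / windows_per_year

def chance_of_at_least_one(events_per_year: float, windows_per_year: int) -> float:
    """Chance of one or more events in a window, assuming events arrive
    independently at a constant rate (a Poisson assumption, not from the text)."""
    rate = expected_events_per_window(events_per_year, windows_per_year)
    return 1 - math.exp(-rate)

p_simple = expected_events_per_window(FATALITIES_PER_YEAR, WINDOWS_PER_YEAR)
p_poisson = chance_of_at_least_one(FATALITIES_PER_YEAR, WINDOWS_PER_YEAR)

print(f"simple estimate:  about 1 in {1 / p_simple:.0f}")
print(f"Poisson estimate: about 1 in {1 / p_poisson:.0f}")
```

Both estimates land close to 1 in 340, because the event is rare enough that the simple division and the Poisson version barely differ.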

6k / 240k is about 2.5%. In other words, by this measure, being struck by lightning is safer than some surgeries.
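The division is easy to check. A minimal sketch, using only the rounded global figures quoted above (the variable names are illustrative):

```python
INJURIES_PER_YEAR = 240_000    # rough global figure from the text
DEATHS_PER_YEAR = 6_000        # rough global figure from the text

# Base rate of dying, given that you were struck and injured.
fatality_rate = DEATHS_PER_YEAR / INJURIES_PER_YEAR
survival_rate = 1 - fatality_rate

print(f"fatality rate: {fatality_rate:.1%}")   # 2.5%
print(f"survival rate: {survival_rate:.1%}")   # 97.5%
```

Because both inputs are rounded estimates, the result is best read as "about 2.5%" rather than as a precise figure.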

If we didn't have base rates, we wouldn't know that roughly 97 or 98 of every 100 people struck by lightning survive.