Lies, Damn Lies and Statistics

A message came through the Security Metrics mailing list yesterday that got me thinking about our perception of statistics.  The post is regarding a paper on the security of an electronic voting system.

I’ll quote the two paragraphs I find most interesting:

To create a completely unhackable system, Smartmatic combined the following ideas: security fragmentation, security layering, encryption, device identity assurance, multi-key combinations and opposing-party auditing. Explaining all of them is beyond the scope of this article.

The important thing is that, when all of these methods are combined, it becomes possible to calculate with mathematical precision the probability of the system being hacked in the available time, because an election usually happens in a few hours or at the most over a few days. (For example, for one of our average customers, the probability was 1 × 10−19. That is a point followed by 19 zeros and then 1). The probability is lower than that of a meteor hitting the earth and wiping us all out in the next few years—approximately 1 × 10−7 (Chemical Industry Education Centre, Risk-Ed n.d.)—hence it seems reasonable to use the term ‘unhackable’, to the chagrin of the purists and to my pleasure.

The claim here appears to be that the number of robust security controls included in the system, all of which have a small chance of being bypassed taken together, along with the limited time that an election runs yields a probability of 1×10^-19 of being hacked, which is effectively a probability of zero.

A brief bit of statistical theory: the process for calculating the probability of two or more events happening at the same time depends on whether the events are independent from each other.  Take, for example, winning the lottery.  Winning the lottery a second time is in no way related to winning the lottery a first time…  You don’t “get better” at winning the lottery.  Winning the lottery is an independent event.  If the odds of winning a particular lottery are one in a million, or 1/1000000, the probability of winning the lottery twice is 1/1000000 x 1/1000000, which is 1/1000000000000 or 1×10^-12.  However, many events are not actually independent from each other.  For example,  I manage a server and the probability of the server being compromised through a weak password might be 1/1000000.  Since I am clever, getting shell on my server does not get you access to my data.  To get at my data, you must also compromise the application running on the server through a software vulnerability and the probability of that might also be 1/1000000.   Does this mean that the probability of someone stealing my data is 1×10^-12?  These events are very likely not independent.  The mechanism of dependence may not be readily apparent to us, and so we may be apt to treat them as independent and decide against the cyber insurance policy, given the remarkably low odds.  Upon close inspection, there is a nearly endless list of ways in which the two events (getting a shell, then compromising the application) might not be independent, such as:

  • Password reuse to enter the system and application
  • Trivial passwords
  • Stealing data out of memory without actually needing to break the application
  • A trivial application bug that renders the probability of compromise closer to 1/10 than 1/1000000
  • An attacker phishing the credentials from the administrator
  • An attacker using a RAT to hijack an existing authenticated connection from a legitimate user
  • and many, many more

When we see the probability of something happening stated as being exceedingly low as with 1×10^-19, but then see the event actually happen, we are right to question the fundamental assumptions that went into the calculation.

A practical example of this comes from the book “The Black Swan” in which Taleb points out the Nobel Prize winning Modern Portfolio Theory  calculated the odds of the 1987 stock market crash to be 5.51×10^-89.

My experience is that these kinds of calculations happen often in security, even if only mentally.  However, we make these calculations without a comprehensive understanding of the relationships between systems, events and risks.

Outside of gambling, be skeptical of such extraordinary statements of low probabilities, particularly for very important decisions.