Differentiating IT Risk From Other Business Risk

It’s often said that IT risk is just another type of business risk, no different from the risk of hiring a new employee, launching a new product, or making an acquisition.

I recently listened to the audiobook “The Undoing Project”, the story of Amos Tversky and Daniel Kahneman and their development of the foundational concepts of behavioral economics.  The book is a great listen, though it does not bring much new insight if you’ve already read “Thinking Fast and Slow” and works by Dan Ariely.  On this listen, though, something clicked for me when the author recapped discussions about how people value bets.  Consider this scenario:

You are given two options:

  1. A $500 payment
  2. A 50% chance of a $1000 payment, and a 50% chance of nothing.

Most people given this choice will take the sure $500 in option 1.  We require the payout in option 2 to be significantly higher before we will take that bet, even though we stand to make double the money.

Now consider this scenario:

You are given two options:

  1. A $500 fine that you must pay
  2. A 50% chance of a $1000 fine that you must pay, and a 50% chance that you pay nothing

In this scenario, most people become risk seeking and choose option 2, even though it may cost them twice as much.  We require the potential loss in option 2 to be significantly higher before we will select option 1.
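To make the arithmetic explicit, here is a minimal Python sketch of the two scenarios.  The payoff structures come from above; the code itself is purely illustrative:

```python
def expected_value(outcomes):
    """Expected value of a gamble, given (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)

# Scenario 1: a sure $500 vs. a 50/50 shot at $1000
sure_gain  = [(1.0, 500)]
risky_gain = [(0.5, 1000), (0.5, 0)]

# Scenario 2: a sure $500 fine vs. a 50/50 shot at a $1000 fine
sure_loss  = [(1.0, -500)]
risky_loss = [(0.5, -1000), (0.5, 0)]

for name, gamble in [("sure gain", sure_gain), ("risky gain", risky_gain),
                     ("sure loss", sure_loss), ("risky loss", risky_loss)]:
    print(f"{name:10s} expected value: {expected_value(gamble):7.1f}")

# The expected values are identical within each scenario (+500 vs. +500,
# -500 vs. -500), yet most of us take the sure thing on gains and gamble
# on losses. The choice is driven by how the outcomes feel, not the math.
```

The point is that nothing in the arithmetic explains the flip from risk aversion to risk seeking; it comes entirely from how we weigh gains against losses.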

How does this relate to business risk?  First, businesses are led by people, and those people carry the same biases described above.  I contend that there are (at least) two distinct types of business risk that we need to keep separate in our minds:

First, investment risk.  This is the risk arising from investing in some new “thing” that has the promise of generating more money.  That “thing” could be a new employee, a new product line, an acquisition, and so on.  There is a chance that the venture fails, and the money is lost, but a (hopefully) much larger promise of increasing revenue and/or profits.  This very likely explains why many companies won’t take major bets, often opting for something closer to a “sure thing” payoff.

Second, risk of loss.  This is the risk arising from things like theft, fire, flood, data breaches, and so on.  It’s all downside risk; this is the second scenario above.  To what extent do business leaders avoid a sure-thing loss (the $500 fine) in the form of increased spending on IT security because they do not fully comprehend the actual potential loss?


Prioritizing Infosec Programs

What follows is a barely intelligible, Christmas cookie-induced attention deficit rant on the state of the industry.

The most excellent Jake Williams wrote an interesting post on his company’s blog about a Twitter survey he ran, asking whether network or endpoint visibility is more important for detecting APT intrusions.  Jake points out there really isn’t a strong consensus among the over 1,100 people who voted in the survey, nor in the responses to it, and that there may be a cyclical nature to the way infosec people would rank order these controls over time.

I continue to grow increasingly interested in the psychological aspects of security and perceptions of risk among IT and infosec people, and Jake’s post is a good example of why.  There is no objectively “right” answer to Jake’s question, but that doesn’t stop us from forming a strong narrative in our minds that leads us to an answer we feel is correct.  I suspect that each of us applies particular context to such questions when forming a position.  For some people who work in organizations with highly diffuse, cloud-y IT, the concept of monitoring networks might not make any sense at all: which network would you monitor?  Monitoring the endpoint in that case is the only approach that makes sense.  Other people point out that IoT devices are becoming more attractive APT targets and endpoint security tools do not (and likely never will) work on these devices, hence the network is the only place that makes sense to monitor.  Still others point out that the “right” answer is to get 100% coverage using whichever approach can accommodate that level of coverage.

I know that Jake intentionally framed this question in an unnatural way that yielded these results.  We can intuitively look at this situation and see that everyone is right, and that no organization would actually go all in on endpoint security or network security alone.  While that may be true, this example does highlight the varied thought processes each of us uses when approaching such questions, and that almost certainly influences how we approach security investment prioritization: you know, the exercises many of us perform where we rank order risks, possibly using addition, multiplication, weighting, and really nice looking heat maps, tweaking the numbers until they match our expected view of reality and hence our view of what controls we should be implementing where.

An intuitively “right” way to approach this is to consider whether each asset has the proper level of visibility: in some cases that may be through endpoint controls because the devices are not on a central network, and in others it may be through network controls because we have IoT devices not supported by endpoint security solutions.  I don’t believe this is the right way to think about the problem.  In my 20 years of working in and around infosec, the complaint has always been that we try to bolt on security rather than bake it in, yet I see us continue to perpetuate that pattern, possibly even embracing the notion of “bolt on security” for a variety of reasons.  In my estimation, the objectively “right” solution is to take a more systems-oriented approach to designing our IT systems in the first place.  We can’t use network controls to monitor diffuse IT environments because there is no logical network location to monitor.  What happens when IoT devices are added to that environment?  Where does the network control go?

Clearly this is far outside the bounds of the two answers Jake’s survey permitted.  Though I will hammer on one more point.  Jake’s specific question was “…which one matters more for detecting APT intrusions?”  A number of comments pointed out that “it’s not a breach until the data gets out”, and therefore network detection is critical for the final determination.  Schrödinger’s Breach, I suppose.  What concerns me with this line of thought is the implication that the only harm a threat actor can exact on a victim is data theft.  The question posed wasn’t specific to a “data breach”, but rather an “APT intrusion”.  We have seen cases like Saudi Aramco, Sony, and the Dark Seoul attacks where the end game was destruction.  WannaCry and NotPetya likewise were not intended to exfiltrate data.  Under HIPAA and other data protection laws, data doesn’t have to be exfiltrated in order to be reportable (and potentially fine-able) as a data breach.  Plenty of other harms can befall an organization, such as impacting the availability of an application, physically damaging equipment, and so on.

To sum up, I think we have a lot of growing ahead of us as an industry, in terms of how we think about controls, risks, and terminology.

 

Behavioral Economics Sightings in Information Security

Below is a list of the resources I am aware of that explore the intersection of behavioral economics and information security.  If you are aware of others, please leave a comment.

Website: Applying Behavioral Economics to Harden Cyberspace

Paper: Information Security: Lessons from Behavioural Economics

Paper: Using Behavioural Insights To Improve the Public’s Use of Cyber Security Best Practices

Links: Psychology and Security Resource Page

Book: The Psychology of Information Security

Conference Talks:


An Inconvenient Problem With Phishing Awareness Training

Snapchat recently disclosed that it was the victim of an increasingly common attack in which someone in the HR department is tricked into providing personal details of employees to someone purporting to be the company’s CEO.

In response, the usual calls for “security awareness training!” and “phishing simulations!” are making the rounds.  As I have said before, I am in favor of security awareness training and phishing simulation exercises, but I am wary of people or organizations that believe this is a security “control”.

When organizations, information security people, and management begin viewing awareness training and phishing simulations as a control, incidents like the one at Snapchat are viewed as a control failure.  Management may ask “did this employee not take the training, or was he just incompetent?”  I understand that your gut reaction may be to think such a reaction would not happen, but let me assure you that it does.  And people get fired for falling for a good phish.  Maybe not everywhere.  Investment in training is often viewed the same as investment in other controls, and when controls fail, management wants to know who is responsible.

If you ask any phishing education company or read any of their reports, you will notice that there are times of day and days of the week when phishing simulations get more clicks than others, with everything else held constant.  The reason is that people are human.  Despite the best training in the world, factors like stress, impending deadlines, lack of sleep, time awake, hunger, and impending vacations will increase or decrease the likelihood of someone falling for a phishing email.  Phishing awareness training needs to be considered for what it is: a method to reduce the frequency, in aggregate, of employees falling for phishing attacks.
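To put some rough numbers behind “reduce the frequency, in aggregate”, here is a small sketch; the employee counts, email volumes, and click rates are entirely made up for illustration:

```python
# Back-of-the-envelope look at phishing clicks as an aggregate frequency,
# not a binary pass/fail control. All rates and counts are invented.
employees         = 500
phish_per_person  = 12      # assumed phishing emails per employee per year
click_rate_before = 0.15    # assumed per-email click probability before training
click_rate_after  = 0.04    # assumed per-email click probability after training

def yearly_stats(click_rate):
    emails = employees * phish_per_person
    expected_clicks = emails * click_rate
    # Probability that at least one email gets clicked, treating clicks
    # as independent events for simplicity.
    p_at_least_one = 1 - (1 - click_rate) ** emails
    return expected_clicks, p_at_least_one

for label, rate in [("before training", click_rate_before),
                    ("after training ", click_rate_after)]:
    clicks, p_any = yearly_stats(rate)
    print(f"{label}: ~{clicks:.0f} expected clicks/year, "
          f"P(at least one click) = {p_any:.6f}")
```

Training cuts the expected number of successful phishes substantially, but with thousands of emails arriving every year, the probability that nobody ever clicks stays effectively zero.  That is exactly why it should not be treated as a preventive control.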

So, I do think that heads of HR departments everywhere should be having a discussion with their employees about this particular attack.  But when a story like Snapchat’s makes news, we should be thinking about prevention strategies beyond awareness training.  And that is hard, because it involves some difficult trade-offs that many organizations don’t want to think about.  Not thinking about them, however, is keeping our heads in the sand.

Probability of Getting Pwnt

I recently finished listening to episode 398 of the Risky Business podcast, where Patrick interviews Professor Lawrence Gordon.  The discussion is great, as all of Patrick’s shows are, but something caught my attention.  Professor Gordon describes a model he developed many years ago for determining the right level of IT security investment, something I am acutely interested in.  He points out that a key input to determining the proper level of investment is the probability of an attack, and that the probability needs to be estimated by the people who know the company in question best: the company’s leadership.
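I won’t attempt to reproduce Gordon’s model here, but the basic shape of the decision is easy to sketch: expected loss is the probability of an attack times its impact, and an investment only makes sense if it reduces expected loss by more than it costs.  All of the numbers below are invented for illustration.

```python
def expected_loss(p_attack, impact):
    """Annualized expected loss: probability of a successful attack times its cost."""
    return p_attack * impact

# Invented numbers, purely to show the structure of the decision.
p_before     = 0.10        # leadership's estimate of annual attack probability today
p_after      = 0.04        # estimated probability with a proposed security program
impact       = 5_000_000   # estimated cost of a successful attack
program_cost = 200_000     # annual cost of the proposed program

benefit = expected_loss(p_before, impact) - expected_loss(p_after, impact)
print(f"Reduction in expected loss: ${benefit:,.0f}")
print(f"Program cost:               ${program_cost:,.0f}")
print("Invest" if benefit > program_cost else "Don't invest")
```

Everything in that calculation hinges on the probability estimates, which is exactly what the rest of this post is about.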

That got me thinking: how do company leaders estimate that probability?  I am sure there are as many ways to do it as there are people doing it, but the discussion reminded me of a key topic in Daniel Kahneman’s book “Thinking Fast and Slow”: base rates.  A base rate is more or less an average quantity measured across a population for a given concept.  For example, the probability of dying in a car crash is about 1 in 470.  That’s the base rate.  If I wanted to estimate my own likelihood of dying in a car crash, I should start with the base rate and make whatever adjustments I believe are necessary given factors unique to me, such as that I don’t drive to work every day, I don’t drink while driving, and so on.  So maybe I end up with my estimate being 1 in 600.

If I didn’t use a base rate, how would I estimate my likelihood of dying in a car crash?  Maybe I would do something like this:

P(Jerry dies in a car crash) < 1 / (28 years of driving × 365 days × 2 trips per day) ≈ 1 / 20,000

This tells me I have driven about 20,000 times without dying. So, I pin my likelihood of dying in a car crash at less than 1 in 20,000. 

But that’s not how it works.  The previous 20,000 times I drove don’t have much to do with the likelihood of me dying in a car crash tomorrow, except that I have experience that makes it somewhat less likely.  This is why considering base rates is key.  If something hasn’t happened to me, or happens really rarely, I’ll assign it a low likelihood.  But if you ask me how likely it is for my house to get robbed right after it has been robbed, I am going to overstate the likelihood.
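Here is the same reasoning as a small sketch.  The 1-in-470 base rate and the trip count come from above; the personal adjustment factor is a made-up stand-in for “I drive less than average, never drink and drive”, and so on:

```python
# Base-rate anchoring vs. a naive "it hasn't happened to me yet" estimate.
base_rate = 1 / 470            # population base rate for dying in a car crash

# Naive personal estimate: trips survived so far.
trips = 28 * 365 * 2           # ~28 years of driving, two trips per day
naive_estimate = 1 / trips     # "less than 1 in ~20,000"

# Base-rate approach: start at the base rate, then adjust for personal factors.
adjustment = 0.7               # invented factor for safer-than-average habits
adjusted_estimate = base_rate * adjustment

print(f"naive estimate:     1 in {1 / naive_estimate:,.0f}")
print(f"base rate:          1 in {1 / base_rate:,.0f}")
print(f"adjusted base rate: 1 in {1 / adjusted_estimate:,.0f}")
```

The two approaches differ by well over an order of magnitude, and the naive one is anchored on my own uneventful history rather than on what actually happens across the population.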

This tells me that things like the Verizon DBIR or the VERIS database are very valuable in helping us define our IT security risk by providing a base rate we can tweak. 

I would love to know if anyone is doing this. I have to believe this is already a common practice. 

The Value of Saving Data (from theft)

I am currently reading Richard Thaler’s new book “Misbehaving: The Making of Behavioral Economics”.  I trust I don’t need to explain what the book is about.  Early in the book, Thaler describes the work leading up to his thesis, “The Value of Saving a Life”, and points out something most of us can relate to: we value a specific person more than we value the nebulous thought of many unnamed people.  Let me give an example: a girl is very sick and needs an expensive treatment that costs $5 million, which her family cannot afford and which is not covered by insurance.  We have seen similar cases, where the family receives a flood of donations to pay for the treatment.  Now consider a different situation: the hospital in the same city as the girl needs $5 million to make improvements that will save an average of two lives per year by reducing the risk of certain infections that are common in hospitals.  There is no outpouring of support to provide $5 million to the hospital.  The person in the first case is specific (an identified life), while we have no idea who the two people per year saved in the second case would be (statistical lives).  Identified lives vs. statistical lives.  If we were “rational” in the economic sense of the word, we would be far more willing to contribute money to the hospital’s improvement program, since it will save many more people than just the lone sick girl.  But we are not rational.

There seems to be a powerful implication for information security in this thought: we have trouble with valuing things that are abstract, like the theft of some unknown amount of our data belonging to people who may not even be customers of ours yet. After a breach, we care very deeply about the data and the victims, and not just because we are in the news, may face lawsuits and other penalties, but because the victims are now “real”. We only move from “statistical” data-subjects to “identified” data-subjects after a breach. Post breach, we generally care more about and invest more in security to avoid a repeat because the impacts are much more real to us. 

One of the fundamental tenets of behavioral economics is that we humans often do not act in an economically rational way; this gave rise to the term “econs” for the hypothetical species of people who act according to standard economic theory.  It occurs to me that, in the realm of IT security, we would do well to try to behave more like econs.  Of course, it helps to understand the ways in which econs and humans think differently.

Intuition and Experience – from Thinking Fast and Slow

I’ve been reading “Thinking Fast and Slow” for the 3rd time now… Technically, I am listening to the audio book, and keep picking up new insights.

The most recent insight is related to intuition.  To net out the topic: intuition, Kahneman believes, is really only familiarity.  A host of heuristics can influence our perception of how familiar something seems, and people’s intuitions are often not very good, almost always faring less well than a basic algorithm.  Therefore, we should be wary when we use intuition to guide an important decision.  Having said that, some people do develop reliable intuition for some things, but two criteria must be met:

  1. The subject of the intuition must be “learnable”.  Some things, such as the stock market, politics, or the outcome of a lottery cannot be learned.  We do not get better at picking lottery numbers or deciding which stock to buy, only lucky or unlucky.  Others, such as fighting fires, playing poker or chess, can be learned, at least to some extent.  They contain repeatable, recognizable patterns.
  2. The person exhibiting the intuition needs to have had an opportunity to learn.  The 10,000 hours rule for chess playing is an example.  The other key element is that the person needs feedback related to the decision.  In the context of playing chess or poker or fighting fires, the person receives feedback quickly.  These two factors combine to build familiarity with specific situations, decisions and their outcomes.

Kahneman recommends asking whether an intuitive judgement relates to a learnable domain and whether the person exhibiting the judgement has had the requisite experience to have developed the intuition.

This is an interesting thought in the context of information security.

By the way, if you have not yet read “Thinking Fast and Slow”, I highly recommend it.  The audio version is excellent, too, even though it is nearly 20 hours long.

Lies, Damn Lies and Statistics

A message came through the Security Metrics mailing list yesterday that got me thinking about our perception of statistics.  The post concerns a paper on the security of an electronic voting system.

I’ll quote the two paragraphs I find most interesting:

To create a completely unhackable system, Smartmatic combined the following ideas: security fragmentation, security layering, encryption, device identity assurance, multi-key combinations and opposing-party auditing. Explaining all of them is beyond the scope of this article.

The important thing is that, when all of these methods are combined, it becomes possible to calculate with mathematical precision the probability of the system being hacked in the available time, because an election usually happens in a few hours or at the most over a few days. (For example, for one of our average customers, the probability was 1 × 10^-19. That is a point followed by 19 zeros and then 1). The probability is lower than that of a meteor hitting the earth and wiping us all out in the next few years—approximately 1 × 10^-7 (Chemical Industry Education Centre, Risk-Ed n.d.)—hence it seems reasonable to use the term ‘unhackable’, to the chagrin of the purists and to my pleasure.

The claim here appears to be that the number of robust security controls included in the system, each of which has only a small chance of being bypassed, taken together with the limited time that an election runs, yields a probability of 1×10^-19 of the system being hacked, which is effectively a probability of zero.

A brief bit of statistical theory: the process for calculating the probability of two or more events happening at the same time depends on whether the events are independent of each other.  Take, for example, winning the lottery.  Winning the lottery a second time is in no way related to winning it the first time; you don’t “get better” at winning the lottery.  Winning the lottery is an independent event.  If the odds of winning a particular lottery are one in a million, or 1/1,000,000, the probability of winning the lottery twice is 1/1,000,000 x 1/1,000,000, which is 1/1,000,000,000,000 or 1×10^-12.  However, many events are not actually independent of each other.  For example, I manage a server, and the probability of the server being compromised through a weak password might be 1/1,000,000.  Since I am clever, getting shell on my server does not get you access to my data.  To get at my data, you must also compromise the application running on the server through a software vulnerability, and the probability of that might also be 1/1,000,000.  Does this mean that the probability of someone stealing my data is 1×10^-12?  These events are very likely not independent (a short sketch after the list below shows how much that matters).  The mechanism of dependence may not be readily apparent to us, and so we may be apt to treat the events as independent and, say, decide against the cyber insurance policy, given the remarkably low odds.  Upon close inspection, there is a nearly endless list of ways in which the two events (getting a shell, then compromising the application) might not be independent, such as:

  • Password reuse to enter the system and application
  • Trivial passwords
  • Stealing data out of memory without actually needing to break the application
  • A trivial application bug that renders the probability of compromise closer to 1/10 than 1/1000000
  • An attacker phishing the credentials from the administrator
  • An attacker using a RAT to hijack an existing authenticated connection from a legitimate user
  • and many, many more
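To make the independence point concrete, here is a minimal sketch.  The one-in-a-million figures come from the example above; the conditional probability for the password-reuse case is invented:

```python
# Joint probability of two security failures under different assumptions.
p_password = 1 / 1_000_000   # weak-password compromise of the server
p_app      = 1 / 1_000_000   # compromise of the application on that server

# If the events were truly independent, the joint probability is the product.
p_joint_independent = p_password * p_app
print(f"assuming independence: {p_joint_independent:.1e}")   # 1.0e-12

# But if the admin reuses the same weak password for the application, then
# getting a shell almost guarantees getting the data:
# P(A and B) = P(A) * P(B | A).
p_app_given_shell = 0.9      # invented conditional probability
p_joint_dependent = p_password * p_app_given_shell
print(f"with password reuse:   {p_joint_dependent:.1e}")     # 9.0e-07

# Same two "one in a million" events, but the joint probability is nearly a
# million times higher once the dependence is accounted for.
```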

When we see the probability of something happening stated as exceedingly low, as with 1×10^-19, but then see the event actually happen, we are right to question the fundamental assumptions that went into the calculation.

A practical example of this comes from the book “The Black Swan”, in which Taleb points out that the Nobel Prize-winning Modern Portfolio Theory put the odds of the 1987 stock market crash at 5.51×10^-89.

My experience is that these kinds of calculations happen often in security, even if only mentally.  However, we make these calculations without a comprehensive understanding of the relationships between systems, events and risks.

Outside of gambling, be skeptical of such extraordinarily low probability claims, particularly when very important decisions depend on them.


Wisdom of Crowds and Risk Assessments

If your organization is like most, tough problems are addressed by assembling a group of SMEs in a meeting and hashing out a solution.  Risk assessments are often performed in the same way: bring “experts” into a room, brainstorm the threats, and hash out an agreed-upon set of vulnerability and impact ratings for each.  I will leave the fundamental problems with scoring risks based on vulnerability and impact ratings for another post[1].

“None of us is as smart as all of us” is a common mantra.  Certainly, we should arrive at better conclusions through the collective work of a number of smart people.  We don’t.  Many people have heard the phrase “the wisdom of crowds” and implicitly understood it as reinforcing the value of the collaborative effort of SMEs.  It doesn’t.

The “wisdom of crowds” concept describes the phenomenon whereby each person in a group is biased in a random direction when estimating some quantity.  When we average out the estimates of the “crowd”, the resulting average is often very close to the actual quantity.  This works when the estimates are given independently of one another.  If the “crowd” collaborates or compares ideas while estimating the quantity, the effect disappears.  People are heavily influenced by each other, and the previously present array of biases is tamped down, resulting in estimates that reflect the group consensus rather than the actual quantity being estimated.

The oft-cited example is the county fair contest where each fairgoer writes down his or her guess of the weight of a cow or giant pumpkin on a piece of paper, drops the paper in a box, and hopes to have the closest guess to win the Starbucks gift card.  Some enterprising people have taken the box of guesses and averaged them, and determined that the average of all guesses is usually very close to the actual weight.  If, instead, the fairgoers were somehow incentivized to work together so that they had only one guess, and that guess had to be within, say, 2 pounds of the actual weight for the entire crowd to win a prize, it’s nearly a sure thing the crowd would lose every time, absent some form of cheating.
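The effect is easy to simulate.  In the sketch below (all values invented), each member of an independent crowd is individually wrong, but in random directions, so the average lands near the true weight; a crowd that anchors on one confident early guess does not:

```python
import random
from statistics import mean

random.seed(1)
true_weight = 430      # "true" pumpkin weight in pounds (invented)
crowd_size  = 200

# Independent crowd: everyone guesses alone; individual errors point in
# random directions and tend to cancel out in the average.
independent_guesses = [true_weight + random.gauss(0, 60) for _ in range(crowd_size)]

# Collaborating crowd: everyone anchors on the first confident guess (which
# happens to be high) and only nudges it slightly, so errors no longer cancel.
anchor = true_weight + 80
anchored_guesses = [anchor + random.gauss(0, 10) for _ in range(crowd_size)]

print(f"true weight:           {true_weight}")
print(f"independent crowd avg: {mean(independent_guesses):.1f}")
print(f"anchored crowd avg:    {mean(anchored_guesses):.1f}")
```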

With this in mind, we should consider the wisdom of our risk assessment strategies.

[1] In the meantime, read Douglas Hubbard’s book: “The Failure of Risk Management”.

Information Security and the Availability Heuristic

Researchers studying human behavior describe a trait, referred to as the availability heuristic, that significantly skews our estimation of the likelihood of certain events based on how easy or hard it is for us to recall an event, rather than how likely the event really is.

It isn’t hard to identify the availability heuristic at work out in the world: shark attacks, terror attacks, plane crashes, kidnappings, and mass shootings.  All of them are vivid.  All of them occupy, to a greater or lesser extent, the news media.  The recollection of these events, usually seen through the media, will often cause people to irrationally overestimate certain risks.  For instance, the overwhelming majority of child kidnappings, approximately 88%, are perpetrated by a relative or caregiver.  However, the raw statistics regarding kidnappings, constant Amber alerts, and media stories about horrible kidnapping cases are a source of much consternation for parents, to the point that police in some jurisdictions are accusing parents who allow kids to play outside unsupervised of child neglect.  The gun debate rages on in the U.S., with mass shooting tragedies leading news reports, even though the number of people who kill themselves with a gun significantly outnumbers those murdered with a gun.

The availability heuristic causes us to worry about shark attacks, plane crashes, stranger kidnappings, and mass shootings, while we are far more likely to die in a car crash, or from diabetes, heart disease, cancer, or even suicide; yet those risks are generally not prominent in our minds when we think about the most important risks we, our friends, and our families face.  Maybe if, at the end of the TV news, the commentators recapped the number of car crash fatalities and heart disease fatalities, we would put better context around these risks, but probably not.  As the quote attributed to Stalin goes: “a single death is a tragedy; a million deaths is a statistic.”

How does this relate to information security?

Information security programs are, at their core, intended to mitigate risks to an organization’s systems and data.  Most organizations need to be thoughtful in the allocation of their information security budgets and staff: addressing risks in some sort of prioritized order.  What, specifically, is different between the ability to assess the likelihood of information security risks as opposed to the “every day” risks described above?

Increasingly, we are bombarded by news of mega breaches and “highly sophisticated” attacks in the media.  The availability of these attacks in recollection is certainly going up as a result.  However, just like fretting about a shark attack while we cautiously lounge in a beach chair safely away from the water, eating a bag of Doritos, are we focusing on the unlikely Sony-style attack while our data continues to bleed out through lost or stolen unencrypted drives on a daily basis?  In many cases, we do not actually know the specific mechanisms that led to the major breaches.  Regardless, security vendors step in and tailor their “solutions” to help organizations mitigate these attacks.

Given that the use of quantitative risk analysis is still pretty uncommon, the assessment of the likelihood of information security risks is, in most cases, subjective by definition.  Subjective assessments of risk are almost certainly vulnerable to the same kinds of biases described by the availability heuristic.

The availability heuristic works in both directions, too.  Available risks are over-assessed, while other risks that may actually be far more likely, but are not prominently recalled, are never even considered.  Often, the designers of complex IT environments appear to be unaware of many common attacks and do not account for them in the system design or implementation.  They confidently address the risks they can easily recall, as budget permits.

Similarly, larger scale organizational risk assessments that do not enumerate the more likely threats will most certainly lead to suboptimal prioritization of investment.

At this point, the above linkage of the availability heuristic to information security is hypothetical: it hasn’t been demonstrated objectively, though I would argue that we see the impacts of it with each new breach announcement.

I can envision some interesting experiments to test this hypothesis: tracking how well an organization’s risk assessments forecast the actual occurrence of incidents; identifying discrepancies between the likelihood of certain threats relative to the occurrence of those threats out in the world and assessing the sources of the discontinuities; determining if risk assessment outcomes are different if participants are primed with different information regarding threats, or if the framing of assessment questions result in different risk assessment outcomes.

A possible mitigation against the availability heuristic in risk assessments, if one is really needed, might be to review sources of objective threat information as part of the risk assessment process.  This information may come from threat intelligence feeds, internal incident data and reports such as the Verizon DBIR.  We have to be cognizant, though, that many sources of such data are going to be skewed according to the specific objectives of the organization that produced the information.  Reading an industry report on security breaches written by the producer of identity management applications will very likely skew toward analyzing incidents that resulted from identity management failures, or at least play up the significance of identity management failures in incidents where multiple failures were in play.