Thoughts on Cloud Computing In The Wake of Meltdown

One of the promises of cloud computing, particularly IaaS, is that the providers operate at a scale that affords certain benefits that hard to justify and implement for all but the largest private datacenter operations.  For example, most cloud datacenters have physical security and power redundancy that is likely cost prohibitive for most companies whose primary business is not running a datacenter.  Meltdown and Spectre highlighted some additional benefits of operating in the cloud, and also some potential downsides.

First, since manage servers is the very business of cloud providers and they tend to have very large numbers of physical servers, it seems that most cloud companies were able to gain early insight into the issues and perform early testing of patches.

Second, because cloud providers do have so many systems to manage and the name of the game is efficiency, cloud providers tend to be highly automated, and so most of the major cloud providers were able patch their estates either well before the disclosure, or shortly after the disclosure.  That’s a good thing, as many companies continue to struggle with obtaining firmware fixes from their hardware vendors nearly two weeks later.  Of course, to fully address the vulnerability, cloud customers also have to apply operating system patches to their virtual server instances.

There are some downsides, however.

First, Meltdown provided an apparent possibility for a guest in one virtual machine to read the memory of a different virtual machine running on the same physical server.  This is a threat that doesn’t exist on private servers, or is much less concerning for private cloud.  This vulnerability existed for many years and we may never know it is was actually used for this purpose, however once it was disclosed, cloud providers (generally) applied fixes before any known exploitation was seen in the wild.

Second, another benefit of cloud from the consumer perspective is “buying only what you need”.  In the case of dedicated servers, we traditionally size the server to accommodate the maximum amount of load it would need to handle while providing the required performance.  Cloud, though, gave us the ability to add capacity on demand, and because on a clock cycle-by-clock cycle basis, cloud is more expensive than a physical server in the long run, we tend to only buy the capacity we need at the time.  After cloud providers applied the Meltdown patches, at least some kinds of workloads saw a dramatic increase in required compute capacity to maintain the same level of performance.  One of the big downsides to cloud therefore, seems to the risk of a sudden change in the operating environment that results in higher cloud service costs.  As problematic as that might be, firing an API to increase the execution cap or add CPUs to a cloud server is logistically much simpler than private physical servers experiencing the same performance hit and needing to be replaced, which requires the arduous process of obtaining approval for a new server, placing the order, waiting, racking, cabling, set up, and so on.


Prioritizing Infosec Programs

What follows is a barely intelligible, Christmas cookie-induced attention deficit rant on the state of the industry.

The most excellent Jake Williams wrote on his company’s blog an interesting post from a Twitter Survey he ran, asking whether network or endpoint visibility is more important for detecting APT intrusions.  Jake points out there really isn’t a strong consensus among the over 1100 people that voted in the survey, nor in the responses to the survey, and that there may be a cyclical nature to the way infosec people would rank order these controls over time.

I continue to grow increasingly interested in the psychological aspects of security and perceptions of risk among IT and infosec people, and Jake’s post is a good example of why.  There is not an objectively “right” answer Jake’s question, but that doesn’t really stop us from forming a strong narrative in our minds that leads us to an answer we feel is correct.  I suspect that each of us apply particular context to such questions when forming our position.  For some people who work in organizations with highly diffuse / cloud-y IT, the concept of monitoring networks might not make any sense at all…  Which network would you monitor?  Monitoring the endpoint in this case is the only approach that makes any sense.  Other people point out that IOT devices are becoming more attractive APT targets and endpoint security tools do not (and likely never will) work on these devices, hence the network is the only place that makes sense to monitor.  Still others point out that the “right” answer is to get 100% coverage using which ever approach can accommodate that level of coverage.

I know that Jake intentionally framed this question in an unnatural way that yielded these results.  We can intuitively look at this situation and see that everyone is right, and that no organization would take such a position of going all in on endpoint security or network security.  While that may be true, this example does highlight the varied thought processes each of us as individuals use as we approach such questions, and that almost certainly influences how we approach questions of security investment prioritization – you know the exercises many of us perform where we rank order risks, possibly using addition, multiplication, weighting, and really nice looking heat maps, tweaking the numbers until they match our expected view of reality and hence our view of what we controls we should be implementing where?

An  intuitively “right” way to approach this is to consider whether each asset has the proper level of visibility – in some cases that may be through endpoint controls because the devices are not on an central network, and it might be network controls because we have IOT devices not supported by endpoint security solutions.  I don’t believe this is the right way to think about the problem: in my 20 years of working in and around infosec, the complaint has always been that we try to bolt on security, rather than to bake it in, but I see us continue to perpetuate it – possibly even embracing the notion of “bolt on security” for a variety of reasons.  In my estimation, the objectively “right” solution is to take a more systems-oriented approach to designing our IT systems in the first place.  We can’t use network controls to monitor diffuse IT environments because there is no logical network location to monitor.  What happens when IOT devices are added to that environment?  Where does the network control go?

Clearly this is far outside the bounds of the two answers Jake’s survey permitted.  Though I will hammer on one more point.  Jake’s specific question was “…which one matters more for detecting APT intrusions?”  A number of comments pointed out that “it’s not a breach until the data gets out”, and therefore network detection is critical for the final determination.  Schrödinger’s Breach, I suppose.  What concerns me with this line of thought is that the only harm a threat actor can exact on a victim is data theft.  The question posed wasn’t specific to a “data breach”, but rather an “APT intrusion”.  We have seen cases like Saudi Aramco, Sony, and the Dark Seoul attacks where the end game was destruction.  WannaCry and NotPetya likewise were not intending to exflitrate data.  Under HIPAA and other data protection laws, data doesn’t have to be exfiltrated in order to be reportable (and potentially fine-able) as a data breach.  Plenty of other harms can befall an organization, such as impacting the availability of an application, or physically damaging equipment and so on.

To sum up, I think we have a lot of growing ahead of us as an industry, in terms of how we think about controls, risks, and terminology.


Probability of Getting Pwnt

I recently finished listening to episode 398 of the Risky Business podcast where Patrick interviews Professor Lawrence Gordon. The discussion is great, as all of Patrick’s shows are, but something caught my attention.  Prof Gordon describes a model he developed many years ago for determining the right level of IT security investment; something that I am acutely interested in.  Professor points out that a key aspect of determining the proper level of investment is the probability of an attack, and he points out that the probability needs to be estimated by the people who know the company in question best: the company’s leadership.

That got me thinking: how do company leaders estimate that probability?  I am sure there are as many ways to do it as there are people doing it, however the discussion reminded me of a key topic in Daniel Kahneman’s book “Thinking Fast and Slow” regarding base rates. Base rates are more or less an average quantity measured against a population for a given concept. For example, the probability of dying in a car crash is about 1 in 470.  That’s the base rate. If I wanted to estimate my likelihood of dying in a car crash, I should start with the base rate and make adjustments I believe are necessary given unique factors to me, such as that I don’t drive to work every day, I don’t drink while driving and so on. So, maybe I end up with my estimate being 1 in 60o. 

If i didn’t use a base rate, how would I estimate my likelihood of dying in a car crash?  Maybe I would do something like this:

Probability of Jerry dying in a car crash <

1/(28 years driving x 365 x 2 driving trips per day) 

This tells me I have driven about 20,000 times without dying. So, I pin my likelihood of dying in a car crash at less than 1 in 20,000. 

But that’s not how it works. The previous 20,000 times I drove don’t have a lot to do with the likelihood of me dying in a car tomorrow, except that I have experience that makes it somewhat less likely I’ll die.  This is why considering base rates are key. If something hasn’t happened to me, or happens really rarely, I’ll assign it a low likelihood. But, if you ask me how likely it is for my house to get robbed right after it got robbed, I am going to overstate the likelihood.

This tells me that things like the Verizon DBIR or the VERIS database are very valuable in helping us define our IT security risk by providing a base rate we can tweak. 

I would love to know if anyone is doing this. I have to believe this is already a common practice. 

Lies, Damn Lies and Statistics

A message came through the Security Metrics mailing list yesterday that got me thinking about our perception of statistics.  The post is regarding a paper on the security of an electronic voting system.

I’ll quote the two paragraphs I find most interesting:

To create a completely unhackable system, Smartmatic combined the following ideas: security fragmentation, security layering, encryption, device identity assurance, multi-key combinations and opposing-party auditing. Explaining all of them is beyond the scope of this article.

The important thing is that, when all of these methods are combined, it becomes possible to calculate with mathematical precision the probability of the system being hacked in the available time, because an election usually happens in a few hours or at the most over a few days. (For example, for one of our average customers, the probability was 1 × 10−19. That is a point followed by 19 zeros and then 1). The probability is lower than that of a meteor hitting the earth and wiping us all out in the next few years—approximately 1 × 10−7 (Chemical Industry Education Centre, Risk-Ed n.d.)—hence it seems reasonable to use the term ‘unhackable’, to the chagrin of the purists and to my pleasure.

The claim here appears to be that the number of robust security controls included in the system, all of which have a small chance of being bypassed taken together, along with the limited time that an election runs yields a probability of 1×10^-19 of being hacked, which is effectively a probability of zero.

A brief bit of statistical theory: the process for calculating the probability of two or more events happening at the same time depends on whether the events are independent from each other.  Take, for example, winning the lottery.  Winning the lottery a second time is in no way related to winning the lottery a first time…  You don’t “get better” at winning the lottery.  Winning the lottery is an independent event.  If the odds of winning a particular lottery are one in a million, or 1/1000000, the probability of winning the lottery twice is 1/1000000 x 1/1000000, which is 1/1000000000000 or 1×10^-12.  However, many events are not actually independent from each other.  For example,  I manage a server and the probability of the server being compromised through a weak password might be 1/1000000.  Since I am clever, getting shell on my server does not get you access to my data.  To get at my data, you must also compromise the application running on the server through a software vulnerability and the probability of that might also be 1/1000000.   Does this mean that the probability of someone stealing my data is 1×10^-12?  These events are very likely not independent.  The mechanism of dependence may not be readily apparent to us, and so we may be apt to treat them as independent and decide against the cyber insurance policy, given the remarkably low odds.  Upon close inspection, there is a nearly endless list of ways in which the two events (getting a shell, then compromising the application) might not be independent, such as:

  • Password reuse to enter the system and application
  • Trivial passwords
  • Stealing data out of memory without actually needing to break the application
  • A trivial application bug that renders the probability of compromise closer to 1/10 than 1/1000000
  • An attacker phishing the credentials from the administrator
  • An attacker using a RAT to hijack an existing authenticated connection from a legitimate user
  • and many, many more

When we see the probability of something happening stated as being exceedingly low as with 1×10^-19, but then see the event actually happen, we are right to question the fundamental assumptions that went into the calculation.

A practical example of this comes from the book “The Black Swan” in which Taleb points out the Nobel Prize winning Modern Portfolio Theory  calculated the odds of the 1987 stock market crash to be 5.51×10^-89.

My experience is that these kinds of calculations happen often in security, even if only mentally.  However, we make these calculations without a comprehensive understanding of the relationships between systems, events and risks.

Outside of gambling, be skeptical of such extraordinary statements of low probabilities, particularly for very important decisions.


Wisdom of Crowds and Risk Assessments

If your organization is like most, tough problems are addressed by assembling a group of SMEs into a meeting and hashing out a solution.  Risk assessments are often performed in the same way: bring “experts” into a room, brain storm on the threats and hash out an agreed-upon set of vulnerability and impacts for each.   I will leave the fundamental problems with scoring risks based on vulnerability and impact ratings for another post[1].

“None of us is as smart as all of us” is a common mantra.  Certainly, we should arrive at better conclusions through the collective work of a number of smart people.  We aren’t.  Many people have heard the phrase “the wisdom of crowds” and implicitly understood that this reinforces the value of the collaborative effort of SMEs.  It doesn’t.

The “wisdom of crowds” concept describes the phenomenon where a group of people are each biased in random directions when estimating some quantity.  When we average out the estimates of the “crowd”, the resulting average is often very close to the actual quantity.  This works with the estimates are given independently of one another.  If the “crowd” collaborates or compares ideas when estimating the quantity, this effect isn’t present.  People are heavily influenced by each other and the previously present array biases are tamped down, resulting in a estimates that reflect the group consensus and not the actual quantity being analyzed.

The oft cited example is the county fair contest where the crowd writes down his or her guess for the weight of a cow or giant pumpkin on a piece of paper, drops the paper in a box and hopes to have the closest guess to win the Starbucks gift card.  Some enterprising people have taken the box of guesses and averaged them out and determined that the average of all guesses is usually very close to the actual weight.  If, instead, the fair goers were somehow incentivized to work together so that they only had one guess, and if that guess were within, say 2 pounds of the actual weight, the entire crowd won a prize, it’s nearly a sure thing the crowd would lose every time, absent some form of cheating.

With this in mind, we should consider the wisdom of our risk assessment strategies.

[1] In the mean time, read Douglas Hubbard’s book: “The Failure of Risk Management”.

Information Security and the Availability Heuristic

Researchers studying human behavior describe a trait, referred to as the availability heuristic, that significantly skews our estimation of the likelihood of certain events based on how easy or hard it is for us to recall an event, rather than how likely the event really is.

It isn’t hard to identify the availability heuristic at work out in the world: shark attacks, terror attacks, plane crashes, kidnappings and mass shootings.  All of them are vivid.  All of them occupy, to a greater or lesser extend, the news media.  The recollection of these events, usually seen through the media, will often cause people to irrationally overestimated certain risks.  For instance, the overwhelming majority, approximately 88%, of child kidnappings is perpetrated by a relative or caregiver.  However, the raw statistics regarding kidnappings, constant Amber alerts and media stories about horrible kidnapping cases is the source of much consternation for parents.  Consternation to the point that police in some jurisdictions are accusing parents who allow kids to play outside unsupervised of child neglect.  The gun debate rages on in the U.S., with mass shooting tragedies leading news reports, even though the number of people who kill themselves with a gun significantly outnumbers those murdered with a gun.

The availability heuristic causes us to worry about shark attacks, plane crashes, stranger kidnappings and mass shootings, while we are far more likely to die in car crashes, or from diabetes, or heart disease, or cancer or even of suicide, however the risks from those are generally not prominent in our minds when we think about the most important risks we, and our friends and families, face.  Maybe if, at the end of the TV news, the commentators recapped the number of car crash fatalities and heart disease fatalities, we would put better context around these risks, but probably not.  As Stalin said: “a single death is a tragedy; a million deaths is a statistic.”

How does this related to information security?

Information security programs are, at their core, intended to mitigate risks to an organization’s systems and data.  Most organizations need to be thoughtful in the allocation of their information security budgets and staff: addressing risks in some sort of prioritized order.  What, specifically, is different between the ability to assess the likelihood of information security risks as opposed to the “every day” risks described above?

Increasingly, we are bombarded by news of mega breaches and “highly sophisticated” attacks in the media.  The availability of these attacks in recollection is certainly going up as a result.  However, just like fretting about a shark attack as we cautiously lounge in a beach chair safely away from the water while eating a bag of Doritos, are we focusing on the unlikely Sony-style attack, while our data continues to bleed out through lost or stolen unencrypted drives on a daily basis?  In many cases, we do not actually know the specific mechanisms that lead to the major beaches.  Regardless, security vendors step in and tailor their “solutions” to help organizations mitigate these attacks.

Given that the use of quantitative risk analyses are still pretty uncommon, the assessment of likelihood of information risks is, tautologically, subjective in most cases.  Subjective assessment of risks are almost certainly vulnerable to the same kinds of biases described by the availability heuristic.

The availability heuristic works in both directions, too.  Available risks are over-assessed, while other risks that may actually be far more likely but not prominently recalled, are never even considered.  Often, the designers of complex IT environments appear to be ignorant of many common attacks and do not account for them in the system design or implementation. They confidently address the risks, as budget permits, that they can easily recall.

Similarly, larger scale organizational risk assessments that do not enumerate the more likely threats will most certainly lead to suboptimal prioritization of investment.

At this point, the above linkage of the availability heuristic to information security is hypothetical- it hasn’t been demonstrated objectively, though I would argue that we see the impacts of it with each new breach announcement.

I can envision some interesting experiments to test this hypothesis: tracking how well an organization’s risk assessments forecast the actual occurrence of incidents; identifying discrepancies between the likelihood of certain threats relative to the occurrence of those threats out in the world and assessing the sources of the discontinuities; determining if risk assessment outcomes are different if participants are primed with different information regarding threats, or if the framing of assessment questions result in different risk assessment outcomes.

A possible mitigation against the availability heuristic in risk assessments, if one is really needed, might be to review sources of objective threat information as part of the risk assessment process.  This information may come from threat intelligence feeds, internal incident data and reports such as the Verizon DBIR.  We have to be cognizant, though, that many sources of such data are going to be skewed according to the specific objectives of the organization that produced the information.  Reading an industry report on security breaches written by the producer of identity management applications will very likely skew toward analyzing incidents that resulted from identity management failures, or at least play up the significance of identity management failures in incidents where multiple failures were in play.

Certainty, Cybersecurity and an Attribution Bonus

In “Thinking Fast And Slow”, Daniel Kahneman describes a spectrum of human irrationalities, most of which appear to have significant implications for the world of information security.  Of particular note, and the focus of this post, is the discussion on uncertainty.

Kahenman describes that people will generally seek out others who claim certainty, even when there is no real basis for expecting someone to be certain.  Take the example of a person who is chronically ill.  A doctor who says she does not know the cause of the ailment will generally be replaced by a doctor who exhibits certainty about the cause.  Other studies have shown that the doctor who is uncertain is often correct, and the certain doctor is often incorrect, leading to unnecessary treatments, worry, and so on.  Another example Kahneman cites is the CFO of companies.  CFO’s are revered for their financial insight, however they are, on average, far too certain about things like the near term performance of the stock market.  Kahneman also points out that, just as with doctors, CFOs are expected to be certain and decisive, and not being certain will likely cause both doctors and CFOs to be replaced.  All the while the topic each is certain about is really a random process, or such a complicated process containing so many unknown and unseen influencing variables as to be indistinguishable from randomness.

Most of us would be rightly skeptical about someone who claims to have insight into the winning numbers of an upcoming lottery drawing, and would have little sympathy when that person turns out to be wrong.  However, doctors and CFOs have myriad opportunity to highlight important influencing variables that weren’t known when their prediction was made.  These variables are what make the outcome of the process random in the first place.

The same dichotomy regarding irrational uncertainty of random processes appears to be at work in information security as well.  Two examples are the CIO who claims that an organization is secure, or at least would be secure if she had an additional $2M to spend, and the forensic company that attributes an attack on a particular actor – often a country.

The CIO, or CISO, is in a particularly tough spot.  Organizations spend a lot of money on security and want to know whether or not the company remains at risk.  A prudent CIO/CISO will, of course, claim that such assurances are hard to give, and yet that is the mission assigned to them by most boards or management teams.  They will eventually be expected to provide that assurance, or else a new CIO/CISO will do it instead.

The topic of attribution, though, seems particularly interesting.  Game theory seems to have a strong influence here.  The management of the breached entity wants to know who is responsible, and indeed the more sophisticated the adversary appears to be, the better the story is.  No hacked company would prefer to report that their systems were compromised by a bored 17 year old teaching himself to use Metasploit over the adversary being a sophisticated, state-sponsored hacking team the likes of which are hard, neigh impossible for an ordinary company to defend against.

The actors themselves are an intelligent adversary, generally wanting to shroud their activities with some level of uncertainty.  We shouldn’t expect that an adversary will not  mimic other adversaries, reuse code, fake timezones, change character sets, incorporate cultural references, and so on, of other adversaries in an attempt to deceive.  These kinds of things add only marginal additional time investment to a competent adversary.  As well, other attributes of an attack, like common IP address ranges, common domain registrars and so on, may be common between adversaries for reasons other than the same actor is responsible, such as that of convenience or, again, an intentional attempt to deceive.  Game theory is at play here too.

But, we are certain that the attack was perpetrated by China.  Or Russia. Or Iran. Or North Korea. Or Israel.  We discount the possibility that the adversary intended for the attack to appear as it did.  And we will seek out organizations that can give us that certainty.  A forensic company that claims the indicators found in an attack are untrustworthy and can’t be relied upon for attribution will most likely not have many return customers or referrals.

Many of us in the security industry mock the attribution issue with dice, an magic 8-ball and so on, but the reality is that it’s pervasive for a reason: it’s expected, even if it’s wrong.


What Happens When Most Attackers Operate As An APT?

I’ve been concerned for some time about the rate at which offensive tactics are developing, spurred by the dual incentives of financial gain by criminals and information gathering by government military, intelligence and law enforcement agencies and their contractors.

I find it hard to imagine, in this day of threat intelligence, information sharing, detailed security vendor reports on APT campaigns and other criminal activities, that criminals are not rapidly learning best practices for intrusion and exfiltration.

And indeed Mandiant’s recently released 2015 M-Trends report identifies trend 4 as: “BLURRED LINES—CRIMINAL AND APT ACTORS TAKE A PAGE FROM EACH OTHERS’ PLAYBOOK”, which describes the ways Mandiant observed criminal and governmental attackers leveraging each other’s tools, tactics and procedures (TTPs) in incidents they investigated.

I see this as bad news for the defense.  Adversaries are evolving their TTPs much more rapidly than our defensive capabilities are maturing.

Something has to change.

“Cyber security” is still largely viewed as an add-on to an IT environment: adding in firewalls, anti-virus, intrusion prevention advanced malware protection, log monitoring, and so on.  All of which has dubious effectiveness, particularly in the face of more sophisticated attacks.  We need a new approach.  An approach that recognizes the limitations of information technology components, and designs IT environments, from the ground up, to be more intrinsically secure and defensible.

A way to get there I believe, is for IT architects, not just security architects, to maintain awareness of offensive tactics and trends over time.  This way, those architects have a healthy understanding of the limitations of the technology they are deploying, rather than making implicit assumptions about the “robustness” of a piece of technology.

As defenders, we often have our hands full with “commodity” attacks using very basic TTPs.  We need to dramatically improve our game to face what is coming.


Human Nature And Selling Passwords

A new report by Sailpoint indicating that one in seven employees would sell company passwords for $150 is garnering a lot of news coverage in the past few days.  The report also finds that 20% of employees share passwords with coworkers.  The report is based on a survey of 1,000 employees from organizations with over 3,000 employees.  It isn’t clear whether the survey was conducted using statistically valid methods, so we must keep in mind the possibility for significant error when evaluating the results.

While one in seven seems like an alarming number, what isn’t stated in the report is how many would sell a password for $500 or $1,000.  Not to mention $10,000,000.  The issue here is one of human nature.  Effectively, the report finds that one in seven employees are willing to trade $150 for a spin of a roulette wheel where some spaces result in termination of employment or end his or her career.

Way back in 2004, an unscientific survey found that 70% of those surveyed would trade passwords for a chocolate bar, so this is by no means a new development.

As security practitioners, this is the control environment we work in.  The problem here is not one of improper training, but rather the limitations of human judgement.

Incentives matter greatly.  Unfortunately for us, the potential negative consequences associated with violating security policy, risking company information and even being fired are offset by more immediate gratification: $150 or helping a coworker by sharing a password.  We shouldn’t be surprised by this: humans sacrifice long term well being for short term gain all the time, whether smoking, drinking, eating poorly, not exercising and so on.  Humans know the long term consequences of these actions, but generally act against their own long term best interest for short term gain.

We, in the information security world, need to be aware of the limitations of human judgement.  Our goal should not be to give employees “enough rope to hang themselves”, but rather to develop control schemes that accommodate limitations of human judgement.  For this reason, I encourage those in the information security field to become familiar with the emerging studies under the banner of cognitive psychology/behavioral economics.  Better understanding the “irrationalities” in human judgement, we can design better incentive systems and security control schemes.