Non-traditional Sources of Vendor Risk

NotPetya seemed to be a pretty rude awakening for some organizations: vendors and business partners previously thought to be “benign” can be the source of significant risk.  This should not have been surprising after the Target and Home Depot breaches.  The initial distribution mechanism of NotPetya was an auto-update to business software, and we know that NotPetya propagated within networks using a few different tactics.  A number of organizations were infected with no apparent connection to the original distribution mechanism, meaning the infection very likely propagated through network connections between organizations.

One of the fundamental challenges we seem to have in cyber security is a lack of imagination: imagination for how attacks can happen.  As is the case in so many things, we seem to be stuck fighting yesterday’s problem.  After Target and Home Depot, we started interrogating our HVAC vendors pretty hard, presumably DOUBLING or TRIPLING the number of security-related questions on our vendor management questionnaires.  Possibly each organization needs to learn the lesson for itself and the situation really is improving in aggregate, but I am growing cynical in my old age.  It seems that we are not hitting the problem head on, instead choosing to “accept risks” that we choose not to understand.

Certainly a big headwind is the extreme complexity of IT environments, though I am not sure that means we should just default to “well, we followed ISO27k” (that is, sticking our heads in the sand).  A better solution would be to break the problem up into “trust-able components” with reliable, predictable demarcation points, and to limit the trust between systems and networks.

Is there any reason, any at all, that a malicious update to a piece of Ukrainian tax software should end up infecting unrelated subsidiaries and parent companies in other countries, or hospitals in the US?

One of the issues I see with such a strategy is that it necessarily causes IT to cost more, almost regardless of how it’s implemented.  But without it, we end up with interconnections that criminals, nation-states, and others can leverage for mass destruction.  Particularly interesting to me is that the risk decisions of one organization can impact many, many organizations downstream, both from a “cyber contagion” perspective and from a simple economic one, if we consider the effects NotPetya had on global shipping and WannaCry had on the delivery of health care.

Reflecting on the Need For Cyber Resilience

The recent NotPetya attacks disrupted the IT operations of many firms worldwide, and as I write this, the Internet is littered with media reports of companies still struggling to return to operations nearly two weeks after the outbreak.  This style of attack is not new: we saw it in the Shamoon attack on Saudi Aramco, in the Dark Seoul attack in South Korea, in the attack on Sony, and most recently in WannaCry and now NotPetya.  I will readily admit that if you become the target of a government operation, the outcome is nearly assured no matter what you do to prepare.  NotPetya, though, should highlight that we don’t necessarily have to become collateral damage in a “cyber war” between countries if we design and operate our systems appropriately.

IT security, at its core, is a set of trade-offs.  Some are good, some are bad; the implications of some are understood, but often they are not.  I guess the best way to think about it is “we don’t go to cyber war with the network we want; we go to cyber war with the network we have”.  Given that, the rapidly evolving techniques used by the more advanced adversaries, and the terrible track record those adversaries have of keeping their tools and techniques secret, we need to recognize the need for cyber resilience.  I recognize the term “cyber resilience” is not going over well with many of you, but I don’t currently have a better term for the concept.  I believe it is important to distinguish cyber resilience from traditional disaster recovery programs.  I work a lot with US-based banks and US banking regulators that are part of the FFIEC.  A lot of my thinking has been shaped by the programs and guidelines the FFIEC has established in recent years regarding cyber resilience, and reflecting on that, I see many linkages between their guidance and these recent destructive attacks.

Many organizations view disasters as purely geographical events, and their programs are designed to address that: off-site backups; hot, warm, or cold systems in remote data centers; and so on.  These make good sense when the threat being mitigated is a fire, flood, tornado, or volcano.  But cyber attacks are orthogonal to location: they happen along technology boundaries rather than geographic boundaries.  A great example was Code Spaces, who operated their environment in AWS.  Code Spaces promised a highly fault tolerant environment, replicating data across multiple AWS data centers on multiple continents.  Sadly, when their AWS keys were stolen, attackers deleted all traces of Code Spaces’ data from all of those redundant data centers.  In the recent NotPetya attacks, a number of victims had their entire server environments wiped out, including replicated systems, geographically distributed failover systems, and so on.  Consider what would happen to an organization’s geographically distributed Active Directory infrastructure during a NotPetya outbreak.  There is no restoration; only starting over.

Maybe starting over is good, but I’m guessing most victims impacted in that way would rather plan the project out a bit better.

That takes me back to cyber resilience.  I recognize that most of us don’t see our organizations as being in the line of fire in a cyber war between Russia and the Ukraine.  I am sure the three US hospitals hit by NotPetya didn’t consider themselves in that firing line either.  It is hard to predict the future, but it seems like a safe bet that we are going to see more attacks of the Dark Seoul and NotPetya variety, not fewer.  And as time goes on, we are all becoming interconnected in ways that we may not really understand.  If your organization is sensitive to IT outages in its operations, I would recommend putting some focus on the concept of cyber resilience in your strategic plans.  At the risk of offending my friends in the banking industry, I’d recommend taking a look at the FFIEC’s business continuity planning material.  It has some good ideas that may be helpful.

NotPetya, Complex Attacks, and the Fog of War

I cannot recall a previous widespread incident that created confusion and misdirection the way NotPetya did.  I want to use this post to examine a bit of what happened and what we can learn.

On the morning of June 27, Twitter was abuzz with discussions about a new variant of the Petya ransomware spreading everywhere.  Early reports indicated that Petya was being introduced into networks via infected email attachments.

I strongly suspect that at least some of the organizations affected by the outbreak were making a connection that likely turned out to be coincidental rather than causal.  If I see evidence that someone received a suspicious email attachment (something that happens all day, every day in a large company) and then that computer suddenly reboots and begins locking itself up, I suspect most of us would draw the same conclusion, and because it fits so neatly into our daily experience of defending the network, convincing us otherwise can be difficult.  I do not know what, if any, net effect this misdirection may have had on the overall NotPetya story, but it seems likely that there were at least some security teams spending time locking down email to prevent becoming a victim.

As it turns out, NotPetya was introduced to victim networks via the update process of the ME Doc tax software in widespread use in the Ukraine, leveraging the compromised infrastructure of Intellect Service, the maker of ME Doc.  There are, however, some outliers, such as the three hospitals in the US that were infected despite having no apparent tie to the ME Doc software.  My best guess is that the malware propagated via connections to other entities that did use ME Doc.  Merck, for example, was one of the companies infected.  I can envision a number of possible scenarios where an infection at a vendor propagates to a hospital in the US.  For example, a Merck sales person may have been visiting a hospital and VPN’d back to the mothership when her computer was infected and began spreading locally within the hospital network.  Or maybe it was a VPN or other remote access connection that Merck uses to monitor equipment or inventory, or something else.  I want to emphasize, by the way, that I use Merck here only for the sake of argument: I have no idea whether they were in any way involved in spreading the malware to these hospitals, and even if they were, they were also a victim.

Discussions throughout the day on June 27 focused on the new Petya variant’s use of the ETERNALBLUE vulnerability/exploit to propagate within an organization.  That turned out to be true, but the focus on this aspect of the malware likely detracted from the bigger picture.  Many organizations, no doubt including those that were, or would soon be, affected, were likely scrambling to track down systems missing the MS17-010 patch and grilling sysadmins on why they neglected to patch.  Reports by that afternoon, however, indicated that fully patched systems were being infected.  We now know that ETERNALBLUE was just one of the mechanisms used to propagate, and that NotPetya also included code from mimikatz to pull credentials from memory on infected systems and a copy of psexec to run commands on other systems on the local network using the gathered credentials.  At the time, however, the security advice being thrown around was essentially the advice that helped prevent WannaCry.  We were fighting the last war, not the current one.  Rather than address the crux of the problem, which included password reuse across systems, excessive privileges, and so on, we saw, and continue to see, advice that includes blocking ports 139 and 445 at the firewall, among other unhelpful nuggets.  Those recommendations are not wrong generally, but they were not helpful in this case.  I tried to round up the things that do help here.
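
Advice like “block 139 and 445 at the firewall” is only as useful as your knowledge of where SMB is actually reachable in your environment.  As a minimal sketch (nothing NotPetya-specific here; the subnet, port, and timeout values are assumptions), here is one way to inventory which hosts on a segment accept connections on TCP/445:

    # Sketch: report which hosts on a subnet accept TCP/445 connections.
    # Sequential and slow for large ranges; intended only as an illustration.
    import ipaddress
    import socket

    SUBNET = "10.0.10.0/24"   # assumption: replace with a range you own
    PORT = 445                # SMB over TCP
    TIMEOUT = 0.5             # seconds per connection attempt

    def port_open(host: str, port: int, timeout: float) -> bool:
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if __name__ == "__main__":
        exposed = [str(ip) for ip in ipaddress.ip_network(SUBNET).hosts()
                   if port_open(str(ip), PORT, TIMEOUT)]
        print(f"{len(exposed)} hosts accept TCP/{PORT}")
        for host in exposed:
            print(" ", host)

Knowing that list, and how it changes over time, is far more actionable than a generic port-blocking recommendation.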

Days later, security companies started proclaiming that the Petya outbreak was definitely not really Petya, only loosely based on Petya, and not intended as a ransomware attack at all, but rather a nation-state attack against the Ukraine.

We focused heavily on the ransomware/system wiping aspect of this outbreak.  Many organizations rebuilt or restored systems wiped by NotPetya.  Some victims, including one of the hospitals mentioned, decided to start over and buy all new systems.  Finally, and possibly most significantly, the latest news is that the adversary behind the NotPetya outbreak had compromised the update server of Intellect Service and likely had the ability to remotely control and collect information from the systems of many thousands of ME Doc users.

This episode highlights, to me at least, the need to keep a clear head during an incident and to be open to revising our understanding of what is happening and what our response should be.

Designing a Defensible Network

2017 has been an eventful year so far in the information security world, with the return of the network worm used in apparent nation-state attacks.  With the most recent attack, known alternatively as Petya and NotPetya, among other names, the focus among many in the industry, particularly early on, was on the fact that it spread via EternalBlue, and on whether an infection in a company indicated bad patch hygiene.  Much debate continues to rage over the initial method of infection, with some reliable sources indicating that the malware was seeded into Ukrainian companies through an update to the ME Doc tax application, and others indicating that it was delivered via a malicious email attachment.

These debates seem to miss the larger point.  While it’s interesting to know how a particular threat like WannaCry or [Not]Petya initially entered a network and the means by which it propagated between systems, there are many possible ways for the next threat to enter, and many ways for it to propagate.  Implementing tactical changes to defend against yesterday’s threat may not be the best plan.

On the Defensive Security Podcast, we poke fun at the blinky box marketplace, and I particularly rail on Active Directory.  I believe the WannaCry and [Not]Petya outbreaks exemplify the core concerns we are trying to convey: as an industry, we seem to have gotten away from core principles, like least privilege, and are looking to stack supplementary security technology on top of ill-designed IT systems.  Our security technology mainstays, like antivirus and IPS, are constantly chasing the threats, and those technologies are wholly ineffective against broadly and rapidly propagating worms.  Hopefully, these recent events prompt a rethink of fundamental security strategies, rather than a search for the next technology that promises to deliver us from the perils of worms.

Having said that, here are a few fundamental, completely unsexy things we can do to mitigate these types of attacks in the future (a small illustrative check for item 7 follows the list):

  1. Use unique local administrator passwords on every endpoint.
  2. Disable network logins for local administrators on every endpoint.
  3. Implement properly designed, limited, and segmented Active Directory permissions.
  4. Implement the secure administrator workstation concept.
  5. Implement network port isolation.
  6. Block connections to Windows services for all systems to and from the Internet.
  7. Disable unused protocols, including SMBv1 (and continue to monitor for newly deprecated protocols).
  8. Apply patches quickly.
  9. Remove local administrator permissions.
  10. Implement application whitelisting.
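
As a small illustration of item 7, here is a sketch that checks whether the SMBv1 server component has been explicitly disabled on a Windows host by reading the documented LanmanServer registry value (this applies to older Windows versions; newer ones manage SMBv1 as an optional feature):

    # Sketch: report whether the SMBv1 server is disabled via the registry.
    # Run locally on the Windows host being checked.
    import winreg

    KEY_PATH = r"SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"

    def smb1_server_disabled() -> bool:
        """Return True only if the SMB1 value exists and is set to 0."""
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
                value, _ = winreg.QueryValueEx(key, "SMB1")
                return value == 0
        except FileNotFoundError:
            # Value absent: older Windows versions default to SMBv1 enabled.
            return False

    if __name__ == "__main__":
        print("SMBv1 server disabled:", smb1_server_disabled())

Most of the items above can be turned into simple, repeatable checks like this rather than one-time projects.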

All recommendations like these have a shelf life.  That is why we need smart people who monitor the threat landscape.  If we do a good job of preventing the basic tactics, the adversary will inevitably move to more complex methods.

Improving The Effectiveness of Vulnerability Remediation Targeting

Many organizations seem to apply a sensible heuristic to patching: patch the systems that are most exposed and valuable first, in descending order of exposure and importance.  The heuristic usually looks something like this:

  1. Internet facing systems – patch first
  2. Critical internal production systems – patch second
  3. Other production systems – patch third
  4. Development, test, and other lab systems – patch last

Workstation patching usually ends up in there somewhere, but it is usually performed by a different team using different processes, and so it is a bit orthogonal to this scenario.

The reason for this prioritization of patching is that most organizations don’t apply patches automatically to servers and other infrastructure.  Generally, even when automation is used, there is some amount of testing and sequencing applied to the process.  It makes sense to apply patches in a manner that reduces risk from greatest to least.

I’ve noticed a potential problem with this strategy in the aftermath of the MS17-010 patch in March 2017, and in the recent Microsoft Security Advisory 4025685.

First, organizations should be assessing whether a new vulnerability is wormable.  Generally, the condition for that is remote, unauthenticated code execution over the network on some kind of system or service that is common enough for a worm to be a threat.

Second, some consideration for the attack vector should be factored in.  If, as was the case with MS17-010, the vulnerability is in SMB (TCP/445), but none of your Internet facing systems have TCP/445 exposed, prioritizing patches for Internet facing systems over other systems likely doesn’t make sense.  Patching the most critical systems that are the most exposed to that vulnerability should be the heuristic used.  That can be complicated, though.  In the case of something like an SMB vulnerability, the most exposed servers are likely going to be those servers that are accessed by the organization’s workstations via SMB.
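
To make that concrete, here is a toy sketch of the refined heuristic.  The system names, ports, and weights are invented for illustration; the point is only that exposure to the specific vulnerable service should influence the ordering:

    # Toy sketch: rank systems for patching by combining business criticality
    # with actual exposure to the vulnerable service, rather than using a
    # fixed internet-facing-first ordering.
    from dataclasses import dataclass

    @dataclass
    class System:
        name: str
        criticality: int           # 1 (low) to 3 (high), set by the business
        open_ports: set            # ports reachable from likely attack sources
        internet_facing: bool

    VULNERABLE_PORT = 445          # e.g., MS17-010 is reachable over TCP/445

    def patch_priority(system: System) -> int:
        """Higher score means patch sooner."""
        score = system.criticality
        if VULNERABLE_PORT in system.open_ports:
            score += 3             # actually exposed to this vulnerability
            if system.internet_facing:
                score += 2         # exposed and internet facing
        return score

    inventory = [
        System("web-dmz-01", 2, {80, 443}, True),     # no SMB exposed
        System("file-srv-01", 3, {139, 445}, False),  # heavily used over SMB
        System("dev-box-17", 1, {445}, False),
    ]

    for s in sorted(inventory, key=patch_priority, reverse=True):
        print(f"{patch_priority(s):>2}  {s.name}")

In this toy inventory, the internal file server that workstations reach over SMB outranks the Internet facing web server with no TCP/445 exposure, which is exactly the adjustment described above.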

And certainly we should be proactively limiting our exposure by following more fundamental best practices, such as not permitting inbound TCP/445 access to systems and disabling SMBv1 in the case of MS17-010.

To sum up, a single heuristic model for prioritizing patches is sub-optimal when trying to reduce risk.  Some additional thought may be required.

P.S., MS Advisory 4025685 appears, to me anyhow, to have the potential to lead to some significant attacks in the near future.  Hopefully you already limit TCP/445 where possible and are in the process of applying the patches.

Limiting Lateral Movement Options With Port Isolation

I had a meeting with some network team members from a government entity recently.  They described a configuration where all of the network ports that workstations connect to are configured with port isolation, which prevents workstations, even on the same VLAN, from communicating with each other over the network.  This feature is available on most network switches.

There are not many use cases I am aware of where workstations need to directly connect to each other.  At least not many that we want to encourage.  Isolating systems in this way seems like a good way to limit lateral movement.  Lateral movement is limited to systems that are “upstream”, enabling a convenient opportunity to monitor for and detect such attacks.
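
One nice property of port isolation is that it is easy to verify.  A minimal sketch, run from one workstation against a couple of peers on the same VLAN (the addresses and ports are example values), might look like this:

    # Sketch: confirm that peer workstations on the same VLAN are unreachable
    # on common Windows lateral-movement ports.
    import socket

    PEERS = ["10.0.20.15", "10.0.20.16"]   # other workstations on this VLAN
    PORTS = [135, 139, 445, 3389]          # RPC, NetBIOS, SMB, RDP
    TIMEOUT = 1.0

    def reachable(host: str, port: int) -> bool:
        try:
            with socket.create_connection((host, port), timeout=TIMEOUT):
                return True
        except OSError:
            return False

    for host in PEERS:
        hits = [port for port in PORTS if reachable(host, port)]
        print(host, "ISOLATED" if not hits else f"REACHABLE on {hits}")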

I was initially thinking about this in the context of mitigating the impact of network worms in the wake of WannaCry.  However, the utility of this approach seems to extend far beyond just worms.

Thinking Graphically To Protect Systems

I recently read this post on TechNet regarding the difference in approaches between attackers and defenders.  Specifically that defenders tend to think of their environment in terms of lists:

  • Lists of data
  • Lists of important systems
  • Lists of accounts
  • etc

But attackers “think” in graphs, meaning that they see the environment in terms of the interconnections between systems.

I’ve been pondering this thought since I read the TechNet post.  The concept seems to partly explain what I’ve written about in the past regarding bad risk decisions.

My one critique of the TechNet post is that it didn’t (at least in my view) clearly articulate a really important attribute of thinking about your network as a graph: considering the inter-connectivity between endpoints from the perspective of each endpoint.

In our list-based thinking mode, we have, for instance, a list of important systems to protect and a list of systems that are authorized to access each protected system.  What is often lost in this thinking is the inter-connectivity between endpoints downstream.  As the TechNet article describes it:

“For the High Value Asset to be protected, all the dependent elements must be as protected as thoroughly as the HVA—forming an equivalence class.”

The pragmatic problem I’ve seen is that the farther away we get on the graph from the important asset being protected, the more willing we are to make security trade-offs.  However, because of the nature of the technology we are using and the techniques being successfully employed by attackers, it’s almost MORE important to ensure the integrity of downstream nodes on the graph in order to protect our key assets and data.
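
A small example may make the point clearer.  Given a directed graph of which systems can access which, the set of nodes that can eventually reach a high value asset is exactly the “equivalence class” the TechNet post describes, and it is straightforward to compute.  The node names below are hypothetical:

    # Sketch: find every node with a path to the high value asset.
    # An edge A -> B means "A can access B".
    from collections import deque

    access = {
        "print-server": ["workstation-42"],
        "workstation-42": ["jump-host"],
        "jump-host": ["db-admin-console"],
        "db-admin-console": ["hva-database"],
        "hva-database": [],
    }

    def can_reach(graph: dict, target: str) -> set:
        """Return all nodes that have a path to target (excluding target)."""
        # Build the reversed graph, then walk outward from the target.
        reverse = {node: [] for node in graph}
        for src, dests in graph.items():
            for dst in dests:
                reverse.setdefault(dst, []).append(src)
        seen, queue = set(), deque([target])
        while queue:
            node = queue.popleft()
            for upstream in reverse.get(node, []):
                if upstream not in seen:
                    seen.add(upstream)
                    queue.append(upstream)
        return seen

    print(can_reach(access, "hva-database"))
    # prints all four upstream nodes, in arbitrary order

Every node that comes back from a query like this needs protection comparable to the asset itself, no matter how far from the asset it sits on the graph.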

This creates a tough problem for large networks, and I found the comments on the TechNet post slightly telling: “Can you tell me the name of the tool to generate these graphs?”  The recommendations in the TechNet post are certainly good, but often too vague.  “Rethink forest trust relationships” sounds like sage advice, but what does it mean?  The problem is that there doesn’t appear to be a simple or clean answer.  To me, it seems that we need some type of methodology to help perform those re-evaluations.  Or, as I’ve talked about a lot on my podcast, we need a set of “design patterns” for infrastructure that embody sound security relationships between infrastructure components.

Another thought I had regarding graphs: graphs exist at multiple layers:

  • Network layer
  • Application layer
  • User ID/permission layer (Active Directory’s pwn once, pwn everywhere risk)
  • Intra-system (relationship between process/applications on a device)

Final Thoughts (for now)

The complexity of thinking about our environments as graphs shouldn’t dissuade us from using graphs (potentially) as a tool to model them.  Rather, that complexity indicates, to me, that we should be thinking more about building trusted and reliable domains (in the abstract sense of the word) that relate to each other based on what is needed to protect “the environment”, and less about trying to find some new piece of security technology to protect against the latest threats.

Want To Get Ahead? Create Something!

I have been getting a lot of requests lately for career advice.  I’m not sure why, as I don’t feel like I’m a paragon of success or wisdom, but I have tried to help with things like a guide for getting into information security.

Yesterday, while spending quality time as my dog’s favorite chew toy, I was reflecting on this more and something really obvious hit me.  Painfully obvious.

Think about it: who are the people you respect in this industry?  Even outside this industry?  Why do you respect them?  How do you even know about them?

I will bet it’s because they are creators.

So, my advice for those wanting to get into, or advance in, information security is to create something:

  • Conference presentations
  • Open source software
  • Informative blog posts
  • Podcasts

The point is to participate.  It will focus you on getting better and learning.  It will help you meet people.  It will establish you as someone who can add value.

Security Chaos Monkey

Netflix implemented a wonderfully draconian tool it calls the chaos monkey, meant to drive systems, software and architectures to be robust and resilient, gracefully handling many different kinds of faults. The systems HAVE to be designed to be fault tolerant, because the chaos monkey cometh and everyone knows it.

I believe there is something here for information security. Such a concept translated to security would change the incentive structure for system architects, engineers and developers. Currently, much design and development is based around “best case” operations, knowingly or unknowingly incorporating sub-optimal security constructs.

For lack of a better name, I will call this thing the Security Chaos Monkey (SCM). The workings of an SCM are harder to conceive of than Netflix’s version: it’s somewhat straightforward to randomly kill processes, corrupt files, or shut off entire systems or networks, but it’s another thing to automate stealing data, attempting to plant malware, and so on. In concept, the SCM is similar to a vulnerability scanning system, except that the SCM’s function is exploitation, destruction, exfiltration, infection, drive wiping, credential stealing, defacement, and so on.

One of the challenges with the SCM is the extreme diversity in potential avenues for initial exploitation and subsequent post exploitation activities, many of which will be highly specific to a given system.

Here are some possible attributes of a security chaos monkey (a toy skeleton of such an agent follows this list):

• Agent attempts to copy random files using random user IDs to some location off the server

• Agent randomly disables local firewall

• Agent randomly sets up reverse shells

• Agent randomly starts listening services for command and control

• Agent randomly attempts to alter files

• Agent randomly connects a login session under an administrative ID to a penetration tester
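
To make the idea a bit more concrete, here is a toy skeleton of such an agent. Everything about it is an assumption: the action names are invented, and each action only logs what a real attacker would have attempted rather than doing anything harmful. A real SCM would need careful safety controls and explicit authorization.

    # Toy skeleton of a Security Chaos Monkey agent: at random intervals, pick
    # a random simulated drill and log it. Defenders should detect every one.
    import logging
    import random
    import time

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

    def simulate_exfil_attempt():
        logging.info("DRILL: would copy a random file off this host")

    def simulate_firewall_tamper():
        logging.info("DRILL: would disable the local firewall")

    def simulate_reverse_shell():
        logging.info("DRILL: would open a reverse shell to a test host")

    ACTIONS = [simulate_exfil_attempt, simulate_firewall_tamper, simulate_reverse_shell]

    def run(rounds: int = 3, max_sleep_seconds: int = 5) -> None:
        """Run a few drills at random intervals, logging each one."""
        for _ in range(rounds):
            time.sleep(random.randint(1, max_sleep_seconds))
            random.choice(ACTIONS)()

    if __name__ == "__main__":
        run()

Even a logging-only version like this changes incentives, provided detection of each drill is expected.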

An obvious limitation is that an SCM would likely have a limited set of activities relative to the set of all possible malicious activities, and so system designers may simply tailor their security resilience to address the set of activities performed by the SCM. Even so, this may still be a significant improvement over the current state of affairs.

The net idea of SCM is to impress upon architects, developers and administrators that the systems they build will be actively attacked on a continual basis, stripping away the hope that the system is protected by forces external to it. SCM would also have the effect of forcing our IT staff to develop a deeper understanding of, and appreciation for, attack methods and methods to defend against them.