NCSAM Day 4: Understanding Lateral Movement Opportunities

As discussed previously, lateral movement is an important technique for many adversaries.  I previously described using port isolation, but there are many more avenues for lateral movement, particularly between servers, where port isolation may not be possible, and between systems that need to talk to each other over the network.

In the aftermath of one particularly bad breach, the IT team for the organization I was helping did not understand the problem that can arise from placing an Active Directory domain controller on an external DMZ network.  The placement of this device brought all of the benefits of AD, like single sign-on, ID deactivation, privilege assignment, and so on.  But it also required certain network ports to be opened to other domain controllers on the organization’s internal networks.  Once a server in the external DMZ network was compromised, the adversary obtained administrative access that allowed connection to the domain controller on the same network.  The credentials obtained from that domain controller, combined with the network access to other domain controllers, allowed complete compromise of the internal network.

There are many such examples where a control implemented to provide some security benefit instead creates a means for lateral movement.  Another example is using Citrix servers as a gateway between trusted and untrusted networks.  While a compromised Citrix server may seem benign from the perspective of a workstation connecting to it, adversaries can propagate to connecting workstations through the drive mappings of those workstations.

The net point is this: look at all the places that serve as a demarcation point between different zones of trust, like the firewall separating the DMZ from the internal network, or the Citrix server separating an untrusted network from a trusted one; work to identify the means by which an adversary could move through the boundary; and then implement an appropriate fix to address that lateral movement opportunity, if one exists.
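
To make that boundary review concrete, here is a minimal sketch of the kind of check I have in mind, assuming a hypothetical DMZ host and hypothetical internal domain controller names; the port list is a rough illustration of common AD services, not a complete inventory.

```python
# Hedged sketch: from a host in the DMZ, check which Active Directory
# related ports on internal domain controllers are reachable. Any hit is
# a potential lateral movement path worth reviewing at the boundary.
# Host names and the port list are illustrative assumptions.
import socket

INTERNAL_DCS = ["dc1.corp.example", "dc2.corp.example"]   # hypothetical
AD_PORTS = {88: "Kerberos", 135: "RPC", 389: "LDAP", 445: "SMB", 636: "LDAPS"}

def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host in INTERNAL_DCS:
        for port, service in AD_PORTS.items():
            if reachable(host, port):
                print(f"Potential lateral movement path: {host}:{port} ({service})")
```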

NCSAM Day 3: Watch What You Write

Many of us in the cyber security field came up through IT administration roles.  We are often troubleshooters at heart.  We look at a problem and try to figure out what the cause is to get things back up and running.  In the security world, these are useful traits…  When responding to a security incident, we are generally inhibited by the “fog of war” – we have incomplete information about what is happening and are forced to hypothesize about potential causes, sources, actors, and so on.  As we learn more about the situation, our hypotheses are refined until we know who did what, where, and how.

These skills are vital, but they can also cause problems for your organization if you are not careful.  Sometimes a security incident turns out to be a breach where data is stolen or destroyed, and that can lead to legal action – either by government agencies or by customers, employees, or others that may be impacted by the incident.  The things we write, particularly in emails, text messages, and so on, may be discoverable in court, and our words may be used against our organizations.  Investigative activities performed over email, for example, may include speculation about the cause or extent of the incident, and that speculation may turn out to be wrong, or at least an incomplete picture of the situation.  If I’m working with a team to investigate a compromised server we just learned about, I might be inclined to email an update saying “You know, I’ll bet that server team forgot to patch Apache Struts again.  That team is very understaffed and keeps missing patches.”  Hopefully, it’s not hard to see how that statement could be used as evidence that we not only knew about the problem, we actually caused it through our actions.

At the same time, we need to communicate during incidents, and we necessarily need to hypothesize about the causes.  But we can do so without running ourselves and our organizations up the flagpole.  Here are some recommendations:

  1. Speak to your organization’s legal counsel about this topic, and if possible, set some ground rules for how to manage communications during an incident. Rules vary by jurisdiction, and I am not a lawyer in any of them, so you should seek the advice of an expert in your area.
  2. Do not make legal conclusions in written communications. For example, do not write “we just violated HIPAA by sending that file to the wrong person.”  There is a lot of nuance in law that we in IT land may not understand.  Instead, communicate what is known without a conclusion.  In this example, a better statement would be “We appear to have sent a file containing PHI to the wrong person.”  I am sensitive to the fact that the harsh statement can be more motivational than the factual one, but keep in mind that it may be your words that end up printed in a media article or presented in court about the breach.
  3. Keep communication clear, concise, and free of emotion and speculation. This can be difficult to do in an incident situation, where time is short, tension is high, and we may be tempted to air some inter-office drama.  But this is not the time or place for such things.  For example, do not write “I don’t know who started it, but Jerry has already managed to open a malicious attachment and cause an outbreak four times this month, so I’ll bet it’s him.  His management team just doesn’t care, though, because they love his hair.”  Instead, say “The infection appears to have originated from a workstation.  We will prioritize investigating sources we’ve seen in the recent past.”
  4. If and when you do need to hypothesize and speculate about causes, do so on a phone call where the issue can be discussed and resolved without leaving a potentially ugly paper trail of incorrect speculation.
  5. Above all else, we must act ethically. The intent of this post is not to provide guidance on how to hide incidents, but rather to ensure that any reviews of the incident are not contaminated with office politics, incorrect speculation, hyperbole, and IT people declaring the organization “guilty” of some bad act.

NCSAM Day 2: Network Isolation

There is a nearly endless list of ways that an adversary can compromise an organization’s workstations, from USB drives in the parking lot to malware-laden email attachments.  We should design our environments to account for the eventuality that one or more workstations will get compromised by an aggressive adversary.

Enabling port isolation on your wired networks and client isolation on your wireless networks limits opportunities for lateral movement between workstations.  Isolation, of course, will not prevent all lateral movement opportunities, but if implemented properly, it can significantly limit the ability of an adversary to hop from workstation to workstation across a local subnet collecting credentials, and will force the use of noisier, easier-to-detect techniques.  The name of the game is making adversaries’ lives more difficult, forcing them to take longer to accomplish their objectives, and causing them to make more noise in doing so.
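
As a rough illustration of validating isolation, here is a minimal sketch, assuming a hypothetical /24 workstation subnet and SMB (TCP 445) as the probe port; a clean run suggests, but certainly does not prove, that isolation is working.

```python
# Hedged sketch: from one workstation, attempt TCP connections to peers on
# the local subnet on a port workstations commonly expose (SMB/445).
# The subnet and port are illustrative assumptions.
import ipaddress
import socket

SUBNET = ipaddress.ip_network("10.0.50.0/24")   # hypothetical workstation subnet
PROBE_PORT = 445

def peer_reachable(ip: str, port: int, timeout: float = 0.5) -> bool:
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    reachable_peers = [str(ip) for ip in SUBNET.hosts() if peer_reachable(str(ip), PROBE_PORT)]
    if reachable_peers:
        print("Possible isolation gap; reachable peers:", ", ".join(reachable_peers))
    else:
        print(f"No peers reachable on port {PROBE_PORT}")
```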

I once had a discussion with an unnamed person from an unnamed agency who told me that part of the agency’s penetration testing regimen includes connecting a drop box of the pen tester’s choosing to the agency’s wireless or wired networks (including an LTE modem for out-of-band access), to simulate a workstation being compromised and the need to rely on other aspects of the infrastructure to protect systems and data from further compromise.  Port isolation was part of the strategy for that agency.

The downside of implementing isolation is that it requires much more deliberate design of common services, like the placement of printers and scanners.  Coincidentally, one of the other upsides to implementing isolation is that it also requires much more deliberate design of common services, like the placement of printers and scanners.

Assessing Risk Assessments

I recently finished listening to the book titled “Suggestible You”.  The book is fascinating overall, but one comment the author made repeatedly is that the human brain is a “prediction machine”.  Our brains are hardwired to make constant snap predictions about the future as a means of surviving in the world.

That statement got me thinking about IT security, as most things do.  We make predictions based on our understanding of the world…  can I cross the street before that car gets here?  I need to watch out for that curb.  If I walk by the puddle, a car will probably splash me…  and so on.  IT risk assessments are basically predictions, and we are generally quite confident in our ability to perform such predictions.  We need to recognize, however, that our ability to predict is limited by our previous experiences and how often those experiences have entered our awareness.  I suspect this is closely related to the concept of availability bias in behavioral economics, where we give more weight to things that are easier to bring to mind.

In the context of an IT risk assessment, limited knowledge of different threat scenarios is detrimental to a quality result.  Our challenge, then, is that the threat landscape has become incredibly complex, meaning that it’s difficult, and possibly just not practical, to know about and consider all threats to a given system.  And consider that we generally are not aware of our blind spots: we *think* we have enumerated and considered all of the threats in a proper order, but we have not.

This thought drives me back to the concept of standard “IT building blocks” that have well-documented best practices, risk enumerations, and interfaces with other blocks.  It’s a highly amorphous idea right now, but I don’t see a better way to manage the complexity we are currently faced with.

Thoughts appreciated.  More to come as time permits.

Prioritizing Vulnerability Remediation

As we’ve seen in past events such as WannaCry and the Equifax breach, timely vulnerability remediation is a challenge for many organizations.  Ideally, all vulnerabilities would be fixed as soon as they are discovered, and patches applied immediately upon release; however, that’s often not an option.  For example, patches often need to be tested to ensure nothing breaks, and patching often requires reboots or service restarts, which must be done during a change window.  All of this takes coordination and limits the throughput of applying patches, and so organizations end up adopting prioritization schemes.  Most organizations prioritize remediation based on a combination of the severity of the vulnerability (CVSS score) and the exposure of assets (such as being Internet-facing); however, the vast majority of vulnerabilities are never exploited in the wild.  The team at Kenna Security published a paper that indicates less than two percent of vulnerabilities end up being exploited in the wild and proposes some alternative attributes to help more effectively prioritize remediation.  It is an excellent paper, but the challenge remains: it’s difficult to predict which vulnerabilities will actually end up being exploited.
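
To make the prioritization idea concrete, here is a toy sketch of a scoring scheme of the sort described above; the weights, fields, and CVE identifiers are purely illustrative assumptions on my part, not Kenna’s model or anyone’s recommendation.

```python
# Hedged sketch: a toy remediation-priority score that combines CVSS
# severity, asset exposure, and known exploitation. Weights are arbitrary
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Vulnerability:
    cve_id: str
    cvss: float             # base score, 0.0 - 10.0
    internet_facing: bool   # asset exposure
    known_exploited: bool   # exploitation observed in the wild

def priority_score(v: Vulnerability) -> float:
    score = v.cvss
    if v.internet_facing:
        score *= 1.5
    if v.known_exploited:
        score *= 2.0
    return score

vulns = [
    Vulnerability("CVE-0000-0001", 9.8, internet_facing=True, known_exploited=False),
    Vulnerability("CVE-0000-0002", 7.5, internet_facing=True, known_exploited=True),
    Vulnerability("CVE-0000-0003", 9.1, internet_facing=False, known_exploited=False),
]
for v in sorted(vulns, key=priority_score, reverse=True):
    print(f"{v.cve_id}: {priority_score(v):.1f}")
```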

Last week, a researcher posted a link to a paper, written for USENIX, on prioritizing vulnerabilities to the Security Metrics mailing list.  The paper describes a method of rapidly detecting vulnerability exploitation, within 10 days of vulnerability disclosure, by comparing known vulnerable hosts to reputation blacklists (RBLs), on the theory that most vulnerability exploitation that happens in the wild ends with the compromised host sending spam.  The authors claim to achieve 90% accuracy in predicting whether there is active exploitation of a vulnerability under analysis.
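
A drastically simplified sketch of that core idea follows; the RBL zone, host inventory, and CVE identifier are illustrative assumptions, and the paper’s actual methodology is considerably more involved.

```python
# Hedged sketch: flag a vulnerability as possibly exploited in the wild if
# hosts known to be vulnerable to it start appearing on a DNS-based
# reputation blacklist (the spam-sending signal the paper relies on).
import socket

RBL_ZONE = "zen.spamhaus.org"   # example DNSBL; check its terms before querying

def listed_on_rbl(ip: str, zone: str = RBL_ZONE) -> bool:
    """Standard DNSBL lookup: reverse the octets and resolve under the zone."""
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)
        return True            # any answer means the IP is listed
    except socket.gaierror:
        return False           # no record means not listed

# Hypothetical inventory mapping a CVE to hosts known to be vulnerable to it.
vulnerable_hosts = {"CVE-0000-0100": ["192.0.2.10", "192.0.2.20"]}

for cve, hosts in vulnerable_hosts.items():
    listed = [ip for ip in hosts if listed_on_rbl(ip)]
    if listed:
        print(f"{cve}: possible active exploitation ({len(listed)}/{len(hosts)} hosts on RBL)")
```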

While I see a few potential issues with the approach, caveated by the fact that I am nowhere near as smart as the authors of this paper, this is the sort of approach that we need to be developing and refining, rather than the haruspicy that we currently use to prioritize vulnerability remediation.

System Restore Points

This is one for the home users…

A while back, I was unfortunate enough to be affected by the Windows 10 build 1803 issue that caused systems with certain SSDs to crash during the update and become unbootable.  When I tried to recover, I frustratingly realized that system restore points are not enabled by default on Windows 10.  After a few maddening hours in the recovery console, I decided to reinstall Windows, which is probably not a bad thing to do periodically anyway.

Once I was back up and running, though, I did enable system restore points – instructions are here, if you aren’t sure how.  I later realized that, most unhelpfully, Windows wasn’t actually creating system restore points automatically.  Fortunately, some helpful people wrote this guide on setting up a scheduled task to create system restore points every day.
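
The linked guide uses Task Scheduler and PowerShell directly; purely as an illustration of the same idea, here is a small script you could schedule yourself.  It assumes Windows, administrative rights, and PowerShell’s Checkpoint-Computer cmdlet.

```python
# Hedged sketch: create a system restore point by shelling out to
# PowerShell's Checkpoint-Computer cmdlet. Must run elevated on Windows.
import subprocess

def create_restore_point(description: str = "Daily checkpoint") -> None:
    subprocess.run(
        [
            "powershell.exe",
            "-NoProfile",
            "-Command",
            f"Checkpoint-Computer -Description '{description}' "
            "-RestorePointType 'MODIFY_SETTINGS'",
        ],
        check=True,
    )

if __name__ == "__main__":
    # Note: Windows may throttle restore point creation to one per 24 hours
    # by default, so a daily schedule is about as frequent as is useful.
    create_restore_point()
```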

Be aware that system restore points ARE NOT BACKUPS.  Restore points will help in some instances, such as when MS forgets to QA a Windows release (or whatever happened with 1803) and you’re left with an unbootable system or some other instability.  Be aware that most destructive malware will disable or delete restore points, and they won’t save you from drive failures, stolen devices, or defenestration.  I personally use iDrive for backups since CrashPlan pulled out of the consumer market.  It was really cheap – I think I got a year for about $6 – but that was during “National Back Up Your Crap Month” (or whatever that holiday is called) – and it lets you set an encryption key that (allegedly) prevents iDrive, the company, from being able to recover your data without the key that only you have.


*** Edit ***

Well, I got too excited…  Apparently, Windows feature upgrades (such as the one that laid waste to my computer) intentionally disable and delete system restore points as part of the upgrade process.  Details here.  Dammit, Microsoft… (Thanks to @galaxis@mastodon.infra.de for pointing this out.)

Data Breaches and Randomness

The field of information security is a prime example of making decisions under uncertainty.  Generally, there is far more to do than can be done, and therefore we must make priority decisions about what to protect, where to invest, how and who to train, and so on.  We know that we cannot create a perfectly secure system that retains any useful business value beyond that of a doorstop or paperweight.

I recently started listening to Annie Duke’s book “Thinking in Bets: Making smarter decisions when you don’t have all the facts”.  Like others in the field of behavioral economics, Mrs. Duke cautions us against the phenomenon of outcome bias and hindsight bias.  Basically, we should reward good process, not good outcomes that might be/likely are the result of pure luck.  This quote particularly resonated with me:

“An unwanted result doesn’t make our decision wrong if we thought about the alternatives and probabilities in advance and allocated our resources accordingly.”

This, by the way, is why we should be wary of management teams that purport to be “outcome based”.  It means that management will almost certainly value luck over sound decision making.  As with any random process, a “lucky” executive who was just promoted will very likely come to understand the concept of “regression to the mean”.

In the IT security world, we shame people and organizations that have a breach.  We look at what happened with the benefit of hindsight and conclude that any reasonable person could have foreseen the breach, therefore condemning the person or organization as incompetent or negligent.  Oddly, though, we (often) don’t know whether any of us could have made better decisions than the person or organization involved in a breach.  Possibly the breached organization made reasonable priority and investment decisions but got unlucky.  Or maybe the organization made crappy decisions and the breach really was inevitable.  We never get to consider the contra scenario, where an organization makes crappy security decisions but gets lucky.

Luck shouldn’t be part of our strategy to defend the assets we are charged with protecting, but good process should.  By that logic, we should direct our criticisms at organizations that have bad processes, whether they were breached or not.

Hopefully this is unsurprising and intuitive, at least upon hearing it.  There are two problems with applying this concept to infosec:

  1. There is no objective way to tell whether a breached organization fell victim to crappy planning or had good planning and got unlucky. We can’t rely on the organization itself to help us out there.  In some rare cases, we do get to make informed decisions based on the civil or criminal court proceedings.
  2. Unlike a poker player, as in Mrs. Duke’s book, who gets unlucky during a card game, when modern organizations are unlucky enough to suffer a breach, it is not just the organization itself that is harmed. Quite often, those harmed are customers, business partners, employees, and others.

For those of us who are harmed in data breaches through no fault of our own, we can’t simply accept that the breached organization was just “unlucky”.  We take the fact that the breach happened as evidence that the organization was not doing enough to protect its systems.  This gets to the heart of a fundamental philosophical issue facing organizations in the age of pervasive data: unlike almost any other business risk that an organization faces, the harm from many breaches is not borne by the organization itself.  An organization is playing poker both with its own chips and with the chips of the people it stores data on.

Regardless, organizations do not have perfect visibility into threats, nor do they have unlimited budgets, and so long as they handle such data, they will be making decisions about how to protect it.  Some organizations will find that items that fell “below the line” on the priority list created gaps that led to a breach.  Others will get lucky.

Laws like the GDPR will help, because the GDPR raises the possibility of significant fines and civil liability for not properly protecting personal data.  Still, I am skeptical that we will see any noticeable decline in data breaches after the law takes effect, because at the end of the day:

  1. We do not have perfect security
  2. Breaches are the result of effectively random processes

Human Error as a Cause of Data Breaches

Australia recently enacted a new law that requires organizations to disclose breaches of personal data.  The Australian Information Commissioner released its Quarterly Statistics Report for Q1 2018 and one of the findings is that about half of the breaches were caused by “human error”.

It occurs to me that human error is at the root of all of these breaches.  Those attributed to human error are instances where (apparently) the last action in the chain of events that led to the breach was a human making a mistake.

But beyond that, it gets more nuanced.  If we start following the chain of events backward, it seems like we will always end up with human error as a cause.  For those classified as “malicious or criminal attacks”, doesn’t the person clicking on a malicious link count as an error?  Doesn’t the person who designed the environment in a manner that allowed clicking on a malicious link to lead to a data breach count as human error?  What about the IT person who ignorantly exposed RDP to the Internet with a weak password?  Or the manager who decided to save some money when implementing a system, causing patches to be delayed because the system can’t be taken down during the week?  Aren’t all of those human errors, too?  Why does the fact that some opportunistic criminal took advantage of these “upstream” human errors cause us to think about their causes differently?

Oh, and for those that ARE in the human error category in the report, I suspect that, similarly, the cause of the breach was not necessarily the person who “made the error”, but rather the person who designed the process in a manner that allowed such errors to lead to a breach.

It seems clear to me that we really only consider the “end nodes” in the chain of events that lead to a data breach, and I suspect we will not make material improvements until we accept that we need to begin dealing with the actual causes of breaches, which happen much earlier in the chain.

Cyber Security Immaturity

It is difficult to make improvements in a system without an understanding of the problem that needs to be addressed.  In the IT security world, we face many problems.  It’s clear to me, though, that one of our main problems is, well, agreeing on what the problems are.  A great example is a recent post I read by Rob Graham on the Errata Security blog regarding the advice to use “strong passwords”.  While the post itself is a good read, I found the comments to be much more interesting.  Granted, the comments only involve a small number of people, but in my experience, they exemplify a broader issue in the IT security world: the objectively “right” approach to address some security challenge is determined by the perspective, experiences, and subjective judgement of each practitioner.  In some ways, that’s a good thing.  I recently listened to the book “Deviate: The Science of Seeing Differently”, in which the author made the point that breakthroughs often only come when people don’t know that they shouldn’t ask a particular question.  In other respects, it seems clear to me that we should not be in a place where there is such disagreement about whether the advice to use strong passwords is good or not.

In this particular instance, I suspect much of the debate stems from two things:

  1. lack of a consistent understanding of modern threats that face password-based authentication systems
  2. lack of a consistent view of what kind of systems we are trying to protect

I suspect, for example, that most people that come from large organizations will view weak passwords as much more of a problem than password reuse, whereas incident responders and those who manage consumer-oriented Internet services will see password reuse as much more problematic.

I hope and expect that this difference of perspective leading to different views is intuitive.  What may not be so intuitive, though, is that our own views on the “objectively right” approach to address some security concern are colored by our knowledge and experience, and may not be a universal truth.  We should always be questioning ourselves, our beliefs, and our approaches to problems.

/LunchTimeThoughts

Treating The Disease of Bad IT Design, Rather Than The Symptoms

I have a lot of opportunities to see and think about how IT security disasters play out.  I talk a lot about how to help avoid them on the Defensive Security Podcast, and I’ve written a good bit here on infosec.engineering.  There are many causes of the weaknesses that lead to intrusions and data breaches, such as underinvestment, bad processes, malicious or careless insiders, and so on.  I am growing more convinced, though, that a significant factor, particularly in the extent of intrusions, is poor design of IT systems.

IT environments in average organizations are exceedingly complex, in terms of the number of entry points and paths that an adversary can use for lateral movement (see Thinking Graphically to Protect Systems).  There is little formalized guidance on how to design an IT environment, and much of the time, there are nearly unlimited ways of connecting and arranging things in a way that “works”, in terms of meeting the core intent of the system.  We are at the mercy of the imagination of the architects who design these systems to foresee the potential for abuse and properly design in concepts like least privilege.  Most of the time, the people working in those design roles aren’t familiar with many of the techniques that adversaries use.
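
To make the “paths” framing concrete, here is a minimal sketch (my illustration, not from the referenced post) that models an environment as a directed graph and enumerates adversary paths from an entry point to a sensitive asset; the node names and edges are hypothetical, and it assumes the third-party networkx library.

```python
# Hedged sketch: hosts are nodes, "can reach / can authenticate to"
# relationships are directed edges, and each simple path from the entry
# point to the asset is a candidate lateral movement route to review.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("internet", "dmz-web"),
    ("dmz-web", "dmz-dc"),         # e.g., the DMZ domain controller example above
    ("dmz-dc", "internal-dc"),     # replication ports opened through the firewall
    ("internal-dc", "file-server"),
    ("workstation", "file-server"),
])

for path in nx.all_simple_paths(g, source="internet", target="file-server"):
    print(" -> ".join(path))
```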

Much of what we do in IT security is treat the symptoms of the underlying disease, the disease being badly designed IT.  We try to apply increasingly sophisticated antivirus software, next-gen firewalls, and so on, to mitigate risks to the environment.  To make our environments more resilient, we need to spend some time tackling the disease itself.  It’s extremely difficult to make fundamental or large-scale changes to existing IT.  At the same time, IT in most organizations is a constantly changing organism, meaning there are likely opportunities to inject more robust design patterns incrementally.  By that, I mean that we are generally always upgrading some component, optimizing or redesigning some aspect of the network, and so on.  There are fortuitous changes happening in the IT landscape, such as the move to cloud, which may present opportunities to make fundamental improvements.  Some organizations end up in the unfortunate position of having an opportunity to start over – as was the case with Sony Pictures and many of the victims of the NotPetya worm.

As I previously mentioned, my experience, and the experience of many colleagues I’ve discussed this with, is that failure modes (of the malicious-intruder sort) are often not considered, or are only given passing thought, because of a variety of factors, including schedule and cost limitations, ignorance of threat actor techniques, and the fact that IT is an art form, and sometimes IT people just want to be artists.

It seems to me that the industry would do well to establish modular IT infrastructure design patterns that are very specific in terms of configuration, scalable, and laid out in such a way that the various building blocks can be linked together to form the foundational IT infrastructure.  There may be building blocks that are effectively “frameworks” (though not in the manner of the NIST Cyber Security Framework) within which oddball or highly specific systems and applications can operate in a standard way.  This would become a set of design patterns that are cost efficient, resilient, and modeled after best practices, and that are tuned based on changes in technology and new understanding of deficiencies in the original designs.

The idea here is to develop an approach that removes much of the design weakness in organizational IT environments by providing an objective set of “tried and true” design patterns that IT people can use, rather than having them design a half-assed, difficult-to-secure environment because they are ignorant of how their custom designs can be exploited.  I see a lot of parallels here to encryption (though I admit it is a tenuous comparison): it’s mostly accepted in the IT world that designing your own custom encryption scheme is a bad idea, and that the most effective approach to encryption is using accepted standards, like AES, that people a lot smarter than the average IT person have designed and demonstrated to be robust.  Also like encryption algorithms, IT environments tend to be vastly complex, and their weaknesses are difficult for an IT layperson to spot.  We will get the occasional DES or Dual EC DRBG, but that risk seems far preferable to creating something custom that is easy to break.

The move to cloud, virtualization, and infrastructure as code provides an excellent opportunity for such a concept to be used by IT teams with minimal effort, if these design patterns exist as Vagrant/Ansible and SDN-style configuration files that can be tailored to meet specific needs, particularly in the areas of scale, dispersion across locations, and so on.
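
As a very rough sketch of what a “building block” with documented interfaces might look like in code (my own illustration of the amorphous idea above, with hypothetical names), each block declares what it exposes and what it requires, and a composition is flagged if blocks are wired together in ways they never declared.

```python
# Hedged sketch: building blocks declare exposed and required services, and
# a simple validator rejects undeclared wiring between blocks.
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    exposes: set[str] = field(default_factory=set)    # services offered to other blocks
    requires: set[str] = field(default_factory=set)   # services consumed from other blocks

def validate(blocks: list[Block], links: list[tuple[str, str, str]]) -> list[str]:
    """Each link is (consumer, provider, service); return any undeclared wiring."""
    by_name = {b.name: b for b in blocks}
    errors = []
    for consumer, provider, service in links:
        if service not in by_name[provider].exposes:
            errors.append(f"{provider} does not expose {service}")
        if service not in by_name[consumer].requires:
            errors.append(f"{consumer} does not declare a need for {service}")
    return errors

blocks = [
    Block("web-tier", exposes={"https"}, requires={"sql"}),
    Block("db-tier", exposes={"sql"}),
    Block("identity", exposes={"ldap"}),
]
print(validate(blocks, [("web-tier", "db-tier", "sql"),
                        ("web-tier", "identity", "smb")]))   # second link gets flagged
```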

Is anyone working on such a thing?  If not, is there any interest in such a thing?