Treating The Disease of Bad IT Design, Rather Than The Symptoms

I have a lot of opportunities to see and think about how IT security disasters play out.  I talk a lot about how to help avoid these on the Defensive Security Podcast, and I’ve written a good bit here on infosec.engineering.  There are many causes of the weaknesses that lead to intrusions and data breaches, such as underinvestment, bad processes, malicious or careless insiders, and so on.  I am growing more convinced, though, that a significant factor, particularly in the extent of intrusions, arises from poor design of IT systems.

IT environments in average organizations are exceedingly complex, in terms of the number of entry points and paths that an adversary can use for lateral movement (see Thinking Graphically to Protect Systems).  There is little formalized guidance on how to design an IT environment, and much of the time, there are nearly unlimited ways of connecting and arranging things in a way that “works”, in terms of meeting the core intent of the system.  We are at the mercy of the imagination of the architects who design these systems to foresee the potential for abuse and properly design in concepts like least privilege.  Most of the time, the people working in those design roles aren’t familiar with many of the techniques that adversaries use.
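The graph framing above can be made concrete with a short sketch. The hosts and edges below are hypothetical; the point is that each trust relationship multiplies the routes available for lateral movement, and designing in least privilege amounts to deleting edges:

```python
# Hypothetical environment: an edge A -> B means an attacker who controls A
# can reach B (shared credentials, open firewall rules, trust relationships).
reachable = {
    "workstation": ["file-server", "print-server"],
    "file-server": ["db-server", "domain-controller"],
    "print-server": ["domain-controller"],
    "domain-controller": ["db-server"],
    "db-server": [],
}

def attack_paths(graph, src, dst, path=None):
    """Enumerate every simple path an intruder could take from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in graph.get(src, []):
        if nxt not in path:  # simple paths only; don't revisit hosts
            found.extend(attack_paths(graph, nxt, dst, path))
    return found

paths = attack_paths(reachable, "workstation", "db-server")
print(len(paths))  # 3 distinct routes from one workstation to the database

# Least privilege as edge removal: the file server has no business talking
# to the domain controller, so cut that edge and recount.
reachable["file-server"].remove("domain-controller")
print(len(attack_paths(reachable, "workstation", "db-server")))  # 2 routes
```

Even in this five-node toy, spotting which single edge to cut takes deliberate analysis; a real environment has thousands of nodes, which is why unimagined paths survive into production.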

Much of what we do in IT security is treat the symptoms of the underlying disease; the disease being badly designed IT.  We try to apply increasingly sophisticated antivirus software, next-gen firewalls, and so on, to mitigate risks to the environment.  To make our environments more resilient, we need to spend some time tackling the disease.  It’s extremely difficult to make fundamental or large-scale changes to existing IT.  At the same time, IT in most organizations is a constantly changing organism, meaning there are likely opportunities to inject more robust design patterns incrementally.  By that, I mean that we are generally always upgrading some component, optimizing or redesigning some aspect of the network, and so on.  There are fortuitous changes happening in the IT landscape, such as the move to cloud, which may present opportunities to make fundamental improvements.  Some organizations end up in the unfortunate position of having an opportunity to start over – as was the case with Sony Pictures and many of the victims of the NotPetya worm.

As I previously mentioned, my experience, and that of many colleagues I’ve discussed this with, is that failure modes (of the malicious-intruder sort) are often not considered, or are given only passing thought, for a variety of reasons: schedule and cost limitations, ignorance of threat actor techniques, and the fact that IT is an art form, and sometimes IT people just want to be artists.

It seems to me that the industry would do well to establish modular IT infrastructure design patterns that are specific in terms of configuration, scalable, and laid out in such a way that various building blocks can be linked together to form the foundational IT infrastructure.  There may be building blocks that are effectively “frameworks” (though not in the manner of the NIST Cyber Security Framework) within which oddball or highly specific systems and applications can operate in a standardized way.  This would become a set of design patterns that are cost efficient, resilient, and modeled after best practices, tuned over time based on changes in technology and new understanding of deficiencies in the original design.

The idea here is to develop an approach that removes much of the design weakness in organizational IT environments by providing an objective set of “tried and true” design patterns that IT people can use, rather than designing a half-assed, difficult-to-secure environment because those IT people are ignorant of how their custom designs can be exploited.  I see a lot of parallels here to encryption (though I admit it is a tenuous comparison): it’s mostly accepted in the IT world that designing your own custom encryption scheme is a bad idea, and that the most effective approach to encryption is using accepted standards, like AES, that people a lot smarter than the average IT person have designed and demonstrated to be robust.  Also like encryption algorithms, IT environments tend to be vastly complex, and their weaknesses difficult for an IT layperson to spot.  We will get the occasional DES or Dual EC DRBG, but that risk seems far preferable to creating something custom that is easy to break.

The move to cloud, virtualization, and infrastructure as code provides an excellent opportunity for such a concept to be used by IT teams with minimal effort, if these design patterns exist as Vagrant/Ansible and SDN-style configuration files that can be tailored to meet specific needs, particularly in the areas of scale, dispersion across locations, and so on.
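As a sketch of what a shareable building block might look like (the schema, names, and addresses here are entirely hypothetical; a real version would render down to Ansible playbooks, Terraform plans, or SDN rules), a vetted pattern could be parameterized code that emits a default-deny, least-privilege layout:

```python
# "Design patterns as code": each function emits a declarative description
# of a vetted building block. The schema is hypothetical, standing in for
# whatever an Ansible/Terraform/SDN backend would consume.

def segmented_tier(name, subnet, allowed_inbound):
    """A network segment that is default-deny except for listed flows."""
    return {
        "name": name,
        "subnet": subnet,
        "firewall": {"default": "deny", "allow_inbound": allowed_inbound},
    }

def web_app_pattern(app):
    """A standard three-tier pattern: web -> app -> db, least privilege."""
    return [
        segmented_tier(f"{app}-web", "10.0.1.0/24",
                       [{"from": "internet", "port": 443}]),
        segmented_tier(f"{app}-app", "10.0.2.0/24",
                       [{"from": f"{app}-web", "port": 8443}]),
        segmented_tier(f"{app}-db", "10.0.3.0/24",
                       [{"from": f"{app}-app", "port": 5432}]),
    ]

env = web_app_pattern("billing")
# The database tier accepts traffic only from the app tier:
print(env[2]["firewall"]["allow_inbound"])
```

The design choice that matters is that the secure posture is the output of the pattern, not something each architect re-derives: an IT team instantiates `web_app_pattern` and gets segmentation and default-deny for free.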

Is anyone working on such a thing?  If not, is there any interest in such a thing?

Reflecting on the Need For Cyber Resilience

The recent NotPetya attacks disrupted the IT operations of many firms worldwide, and as I write this, the Internet is littered with media reports of companies still struggling to return to operations nearly two weeks after the outbreak.  This style of attack is not new: we saw it in the Shamoon attack on Saudi Aramco, in the Dark Seoul attack in South Korea, in the attack on Sony, and most recently in WannaCry and now NotPetya.  I will readily admit that if you become the target of a government operation, the outcome is nearly assured no matter what you do to prepare.  NotPetya, though, should highlight to us that we don’t necessarily have to become collateral damage in “cyber war” between countries if we design and operate our systems appropriately.

IT security, at its core, is a set of trade-offs.  Some are good, some bad; sometimes the implications are understood, but often they are not.  I guess the best way to think about it is “we don’t go to cyber war with the network we want; we go to cyber war with the network we have”.  Recognizing that, along with the rapidly evolving techniques used by the more advanced adversaries and their terrible track record of keeping those tools and techniques secret, we need to embrace cyber resilience.  I realize the term “cyber resilience” is not going over well with many of you, but I don’t currently have a better term for the concept.  I believe it is important to distinguish cyber resilience from traditional disaster recovery programs.  I work a lot with US-based banks and the US banking regulators that make up the FFIEC.  A lot of my thinking has been shaped by the programs and guidelines the FFIEC has established in recent years regarding cyber resilience, and reflecting on that, I see many linkages between their guidance and these recent destructive attacks.

Many organizations view disasters as purely geographical events, and their programs are designed to address that: off-site backups; hot, warm, or cold systems in remote data centers; and so on.  These make good sense when the threat being mitigated is a fire, flood, tornado, or volcano.  But cyber attacks happen orthogonally to location – along technology boundaries rather than geographic ones.  A great example is Code Spaces, which operated its environment in AWS.  Code Spaces promised a highly fault-tolerant environment, replicating data across multiple AWS data centers on multiple continents.  Sadly, when its AWS keys were stolen, attackers deleted all traces of Code Spaces’ data from all of those redundant data centers.  In the recent NotPetya attacks, a number of victims had their entire server environments wiped out, including replicated systems, geographically distributed failover systems, and so on.  Consider what would happen to an organization’s geographically distributed Active Directory infrastructure during a NotPetya outbreak.  There is no restoration; only starting over.
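The Code Spaces failure mode is easy to demonstrate in miniature. The sketch below (toy classes, not any real AWS API) shows why replication faithfully mirrors a destructive command, while an out-of-band, append-only snapshot survives it:

```python
# Replication is not backup: a live replica mirrors a wipe just as reliably
# as it mirrors a legitimate write. Toy model, not a real storage API.

class ReplicatedStore:
    def __init__(self):
        self.primary, self.replica = {}, {}

    def write(self, key, value):
        self.primary[key] = value
        self.replica[key] = value   # replication mirrors every operation...

    def delete_all(self):           # ...including an attacker's wipe
        self.primary.clear()
        self.replica.clear()

class SnapshotVault:
    """Append-only snapshots taken out-of-band, unreachable with the
    credentials that control the live environment."""
    def __init__(self):
        self.snapshots = []

    def snapshot(self, store):
        self.snapshots.append(dict(store.primary))  # immutable copy

store, vault = ReplicatedStore(), SnapshotVault()
store.write("customers", ["alice", "bob"])
vault.snapshot(store)
store.delete_all()              # the Code Spaces scenario: stolen keys

print(store.replica)            # {} -- the replica offers no recovery
print(vault.snapshots[-1])      # the out-of-band snapshot is intact
```

The design property doing the work is the trust boundary: the vault is only useful because the credentials that can wipe the live environment cannot touch it.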

Maybe starting over is good, but I’m guessing most victims impacted in that way would rather plan the project out a bit better.

That takes me back to cyber resilience.  I recognize that most of us don’t see our organizations as being in the line of fire in a cyber war between Russia and Ukraine.  I am sure the three US hospitals hit by NotPetya certainly didn’t consider themselves in that firing line.  It is hard to predict the future, but it seems like a safe bet that we are going to see more attacks of the Dark Seoul and NotPetya variety, not fewer.  And as time goes on, we are all becoming interconnected in ways that we may not fully understand.  If your organization is sensitive to IT outages in its operations, I would recommend putting some focus on the concept of cyber resilience in your strategic plans.  At the risk of offending my friends in the banking industry, I’d recommend taking a look at the FFIEC’s business continuity planning material.  It has some good ideas that may be helpful.