NCSAM Day 12: Down With The Sickness

While I previously wrote that the cloud is not a magical place, I think it’s important to point out that there is a sickness in the IT world.  It’s insidious and seems to hang around Kanban boards like West Nile-laden mosquitoes hang around a pond.  Of course, I’m talking about exposed S3 buckets and NoSQL/MongoDB databases.

The fundamental issue appears to be that those who configure these environments do not know what they don’t know.  We need to take down this sickness.  Unfortunately, there is no blinky box that can fix this problem*.  Rather, employee awareness and support are needed.  For example, include a segment in your organization’s mandatory security training that directs employees to engage the IT or IT security team for guidance on the proper use of such services.  Yes, this may plant the idea in some people who might not otherwise have thought to copy the contact database into an S3 bucket, and it may increase the IT team’s workload, but it’s better than the alternative.  If you offer help rather than harsh criticism, you may just get people to ask for that help.

I suppose it should go without saying that your organization’s IT and security teams should themselves know how to properly use these services as a start.
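
To make that concrete, here is a minimal sketch of the kind of check an IT security team might run to spot publicly readable buckets, assuming boto3 and already-configured AWS credentials; a real audit would also need to cover bucket policies and account-level public access blocks.

```python
# Minimal sketch: flag S3 buckets whose ACLs grant access to everyone.
# Assumes boto3 is installed and AWS credentials are already configured.
import boto3

PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def find_public_buckets():
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        acl = s3.get_bucket_acl(Bucket=bucket["Name"])
        for grant in acl["Grants"]:
            if grant["Grantee"].get("URI") in PUBLIC_GRANTEES:
                print(f'{bucket["Name"]}: {grant["Permission"]} granted to everyone')

if __name__ == "__main__":
    find_public_buckets()
```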

*depends on your willingness to believe CASB vendor marketing pitches.  YMMV.

NCSAM Day 11: Test Cases for Security Infrastructure

Recently disclosed details about the Equifax data breach indicate that, in addition to the Apache Struts vulnerability that initially led to the breach, some security tools had stopped working for an extended period of time, and only after those tools were brought back online was the breach detected.

There are many potential reasons for security technology to fail, but quite often we don’t recognize the failure, because we are only monitoring for alerts, or simply assuming that the security “thing” is quietly doing its job.  When a technology fails and we’re not aware of it, a key element of our security program is not working, and unless we actively monitor for those failures, we don’t know that we’re blind or unprotected.

For this reason, I recommend developing a set of ongoing test cases, implemented alongside new security technology, to help ensure that the technology is operating as expected and to raise an alert when it fails in some way.  For example, a SIEM should be configured to trigger an alert if a log source does not provide a log within a certain timeframe, which may indicate that the logging service died on the host or that a network issue is preventing logs from reaching the SIEM.  Another example might be a periodic injection of a particular type of network “attack” (delivered in a relatively safe manner, of course) designed to trigger an IPS block and alert, testing both the blocking (did the “attack” make it to the destination?) and the alerting (did the “attack” generate an alert?).
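
To illustrate the first example, here is a minimal sketch of a “log source went quiet” test, assuming the sources write to known local files; in practice you would query the SIEM’s own index instead, and the paths, thresholds, and alert hook below are purely illustrative.

```python
# Minimal sketch of a "log source went quiet" test case.
import time
from pathlib import Path

EXPECTED_SOURCES = {
    "/var/log/fw/firewall.log": 300,    # max seconds of silence allowed
    "/var/log/proxy/access.log": 600,
}

def alert(message):
    print(f"ALERT: {message}")  # stand-in for a paging/ticketing integration

def check_log_freshness():
    now = time.time()
    for path, max_silence in EXPECTED_SOURCES.items():
        p = Path(path)
        # A missing or stale file means the source has gone quiet.
        if not p.exists() or now - p.stat().st_mtime > max_silence:
            alert(f"log source {path} has gone quiet")

if __name__ == "__main__":
    check_log_freshness()  # run this on a schedule, e.g., via cron
```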

These test cases should be developed to measure the ongoing effectiveness of all the key functionality that the security technology provides.

NCSAM Day 10: Email Security

Here we are, after decades of security enhancements, blinky boxes, and hundreds of hours of security awareness training, and companies still get compromised through email.  My movement to drive everyone back to using pine, mutt, and elm for email has failed miserably, so here are my next best recommendations:

  • Strongly consider not doing email, or at least email filtering, on your own. I don’t advocate for particular technology vendors, but most of the big names, like Proofpoint and others, have pretty good mail filtering capabilities that you’re just not going to match.  Save your efforts for security programs that are unique to your organization.  Email is a commodity service these days.
  • Prepend the tag “[external]” to the subject line of incoming email from the Internet to serve as a visual cue for employees (a minimal sketch of this idea follows this list). It’s not foolproof, particularly in the context of business email compromise, where malicious emails can originate locally, but it can help and gives some fodder for awareness training.
  • If you do use a service, such as Proofpoint, that rewrites URLs in emails and/or adds the “[external]” tag, be wary of the way in which you run phishing simulation exercises. If the simulation emails appear to come from outside the organization but do not have the “[external]” tag, or do not have URLs rewritten the way all other external emails do, employees will quickly learn to identify the simulation emails by those characteristics, rather than the characteristics you want them to observe.
  • Tailor awareness training by role. If someone has a job that requires them to open attachments from strangers, such as is the case with recruiters, don’t give them training that tells them not to open such attachments.  At best, it’s confusing.  Rather, provide guidance on the proper means for various roles in the organization to do their jobs in a safe manner.
  • Be aware that every hacker and her dog are trying to get into your organization’s email, and act accordingly.  Require two-factor authentication for mail access, particularly for any cloud-based mail that is accessible straight from the Internet.
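
Commercial gateways apply the “[external]” tag natively, but here is a minimal sketch of the idea using Python’s standard email library, assuming you can hook message processing (for example, via a milter or gateway script); the internal domain list is a placeholder, and the sender parsing is deliberately naive.

```python
# Minimal sketch: prepend "[external]" to subjects of mail from outside domains.
from email import message_from_bytes
from email.message import Message

INTERNAL_DOMAINS = {"example.com"}  # placeholder: your own domains here

def tag_external(raw_message: bytes) -> Message:
    msg = message_from_bytes(raw_message)
    sender = msg.get("From", "")
    # Naive domain extraction; real gateways use the envelope sender.
    domain = sender.rsplit("@", 1)[-1].strip(">").lower()
    subject = msg.get("Subject", "")
    if domain not in INTERNAL_DOMAINS and not subject.startswith("[external]"):
        del msg["Subject"]  # replace, don't duplicate, the header
        msg["Subject"] = f"[external] {subject}"
    return msg
```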

NCSAM Day 9: The Cloud Isn’t A Magical Place

Traditional IT environments generally required the coordination of different people and different teams to turn on a new service.  There might have been a datacenter person, a network person, a server person, a firewall person, and an application person involved, each playing a part to install a new server, connect it to the network, install and configure the operating system, install and configure the application, and finally, expose the application through the firewall.  Some of those functions were consolidated into the same person or team, but in most cases, each function felt ownership of its role and generally had a set of guidelines and some level of competence, including knowing what questions to ask and when to push back if something seemed too risky with a planned deployment.

All of this necessarily added up to delays and inefficiencies.  Reducing or eliminating these delays is one of the many benefits that cloud computing offers: we no longer need to rack servers; installing operating systems is automated through orchestration tools; the provider offers an easy-to-configure software-defined network; and so on.  The move to cloud reduces or eliminates many of the IT specializations, like sysadmin, network engineer, or firewall engineer.  In the cloud, those functions no longer exist as specialties, and depending on the way in which cloud is used (for example, cloud native versus rehoming server images to the cloud), they simply may not be required at all.

The cloud isn’t magical, though: it still requires good security practices, and those must very likely happen without the watchful eye of the delay-inducing specialists.  The way many organizations successfully adopt the cloud, and related practices such as DevOps, is by using scripted processes designed to ensure environments are created, configured, and managed in a secure(ish) manner.
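
As a small illustration of the scripted approach, here is a minimal sketch using boto3 that provisions a security group permitting only HTTPS from a management network; the VPC ID and CIDR are placeholders, and a real pipeline would more likely express this in an infrastructure-as-code tool with review and drift detection.

```python
# Minimal sketch of "secure by script": a security group allowing only
# HTTPS from a management network. Assumes boto3 and an existing VPC.
import boto3

def create_locked_down_group(vpc_id: str) -> str:
    ec2 = boto3.client("ec2")
    group = ec2.create_security_group(
        GroupName="app-https-only",
        Description="HTTPS from management network only",
        VpcId=vpc_id,
    )
    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "10.0.0.0/24"}],  # placeholder mgmt CIDR
        }],
    )
    return group["GroupId"]
```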

All this despite most cloud providers’ claims that their cloud is “secure”.  Hopefully it’s apparent what the providers mean, and what they don’t mean:  generally, their description as “secure” refers to the components of the cloud infrastructure that the provider is responsible for managing, and it is understood that the cloud consumer is responsible for managing and securing everything else, which is quite a lot.

Embracing cloud isn’t just about saving capital expenses and laying off administrators.  The agility and speed require even tighter processes than traditional IT, but those processes can hopefully be scripted, automated, and orchestrated.  An organization moving to the cloud needs to invest in the right skills and tools to keep the environment secure.  Unfortunately, those skills are in high demand right now, but that is the tradeoff.

NCSAM Day 8: Work on your policies

In many organizations, security policies and standards are unapproachably long and complex, or are so high-level that the reader must be a security expert to fill in missing details.  Security policies, standards, processes, and procedures must be written for the people who need to follow, implement, and interpret them, not for the people who write them.  These documents need to clearly define expectations and outcomes in a way that can be understood and implemented.

For example, a policy might state “You may not copy files containing company confidential information to USB drives.”

But, what about copying those files to other types of devices, like a home NAS drive that is exposed to the Internet?  Or someone’s clever home-brew cloud backup system using an unsecured S3 bucket? Or a cell phone via Bluetooth?  And how should employees legitimately back up their data?  What happens when they need to copy confidential files to a USB drive?  Do they get to figure out the proper controls to apply?

This extends to policies that apply to IT and infosec teams, too.  Define the desired outcomes and the guard rails that need to be applied, at the appropriate level of specificity for the type of documentation (policy, process, procedure, and so on).  Ensure employees are familiar with those documents, provide help interpreting the requirements for edge cases, and fold any lessons learned back into policy enhancements and FAQs.

NCSAM Day 7: Monitor Those AV Logs

Much of the security industry is pretty down on anti-virus, and for good reason: it’s not very effective at blocking many malware infections.  Still, when installed, it is a tool in the toolbox and can be quite valuable.  One major problem with AV is that it’s not always a great tool to monitor, because if it can detect malware, it can probably block it.  As with many things, though, context is important.

For example, if your AV product detects and blocks an attempted infection on a workstation, that might be interesting, but it likely will not result in any kind of investigation, leading one to question why AV logs should be monitored at all.  But if that detection happens as the result of a full scan, depending on what was detected and where, some investigation to find out what happened, or a wipe and reinstall of the system, is likely in order.

The story is a bit different on servers: if a server’s AV detects malware, regardless of when it was detected, investigation is likely warranted, since servers should generally not encounter malware; if they do, something is wrong in the environment and should be investigated.  File servers are different still, since endpoints can and will copy malware-laden files onto a file server, and that does not indicate that the file server itself is “under attack”; however, such events should still be investigated to find and address the culprit.
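
To show how that context might be encoded, here is a minimal triage sketch; the event fields (host_role, detection_context) are hypothetical and would come from however your AV product structures its logs.

```python
# Minimal sketch of context-aware AV event triage, following the logic above.

def triage(event: dict) -> str:
    role = event.get("host_role")              # "workstation", "server", "file_server"
    context = event.get("detection_context")   # "realtime" or "full_scan"
    if role == "server":
        return "investigate"                   # servers should never see malware
    if role == "file_server":
        return "investigate_source"            # find the endpoint that wrote the file
    if context == "full_scan":
        return "investigate_or_reimage"        # something got past real-time protection
    return "log_only"                          # real-time block on a workstation
```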

I once worked on an incident where a web server was compromised.  In the analysis, we could see that an adversary had found a file upload vulnerability and a separate local file inclusion vulnerability in the web application on the server.  Upon inspecting the AV logs, we found that the engine had dutifully detected and quarantined various versions of a web shell the adversary was uploading over several days.  Eventually, the adversary found a web shell that the AV engine didn’t detect, and the rest is history.

In summary, collect your AV logs and apply some form of analysis to them.  AV is far from perfect, but it does work at times, and we should pay attention when it does.

NCSAM Day 6: Defend Your Tools

As IT continues to commoditize and organizations drive more efficient operations, IT and security departments increasingly implement and rely on automation, management, and orchestration tools.  Many of these tools exist to manage or enhance security, yet the tools themselves may not be properly protected.  Tools like Chef, Puppet, Ansible, Vagrant, vulnerability scanners, Active Directory, and many others can provide one-stop shopping for an adversary to compromise an environment, due to the functionality and level of access these tools have in an IT environment.  Fortunately, we’ve not yet seen widespread exploitation of these tools, but it is happening, and I expect they will become an increasingly important target for adversaries, and likely even for automated malware attacks.

The environments these tools operate in need to be resilient against attack.  Here are some guidelines for doing so:

  • Require multi-factor authentication to the operating system and any applications
  • Dedicate the system to the function
  • Prevent inbound and outbound Internet access from the servers these systems operate on, and allow inbound traffic only from authorized management hosts. Inbound and outbound traffic should be allowed, as necessary, only to and from those devices the system needs to connect to as part of the application’s functionality and to retrieve software updates, and only on the required network ports.  I *strongly* recommend such systems NOT be managed by Active Directory.
  • Monitor the systems and applications for any evidence of compromise, including file integrity monitoring and/or whitelisting, A/V logs, and firewall logs – particularly looking for unexpected inbound or outbound connection attempts (a minimal monitoring sketch follows this list).
  • Workstations that administrators use to manage these tools must similarly be secured, including:
    • Dedicating the workstation to the purpose of administering these tools – no email access, web access, Office applications, and so on.
    • Blocking inbound and outbound Internet access from these workstations.
    • Blocking ALL inbound network traffic.
    • Limiting outbound connections to only the systems being managed and those needed to retrieve software updates.
    • Monitoring the systems and applications for any evidence of compromise, including file integrity monitoring and/or whitelisting, A/V logs, and firewall logs – particularly looking for unexpected inbound or outbound connection attempts.
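
For the monitoring guideline above, here is a minimal sketch of connection monitoring on a dedicated admin host, assuming the psutil package is available; the allow-list is a placeholder for your managed systems and update servers.

```python
# Minimal sketch: flag connections from an admin host to anything not on
# the allow-list. Assumes psutil; run with sufficient privileges to see
# all processes' connections.
import psutil

ALLOWED = {"10.0.5.10", "10.0.5.11"}  # placeholder: managed hosts only

def unexpected_connections():
    for conn in psutil.net_connections(kind="inet"):
        if conn.raddr and conn.raddr.ip not in ALLOWED:
            yield conn

if __name__ == "__main__":
    for conn in unexpected_connections():
        print(f"unexpected connection to {conn.raddr.ip}:{conn.raddr.port} (pid {conn.pid})")
```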

This all may seem like overkill, but consider the level of access these systems have and the destruction an adversary can create by abusing them.

NCSAM Day 5: Wipe that Drive

Despite our best attempts to prevent it, malware infections happen. When they do, we need to respond appropriately to prevent the problem from becoming worse. In my experience, many IT personnel do not understand infections and compromises very well, and often employ very basic response techniques, such as relying on antivirus scans, or the ever-popular Malwarebytes scan.  Apparently nothing can evade Malwarebytes. (Side note: despite my cynical tone, I think Malwarebytes is very good, and I pay to run it on all of my and my family’s laptops, but it’s not perfect.)

Depending on the nature of the infection (a subsequent post will cover this), the only sure way to remove the infection is to wipe the drive and perform a reinstall. Malware authors and intruders can employ a wide range of techniques to maintain persistence, even if the malware itself is removed. These persistence mechanisms can reinfect the system with the same or new malware, provide other forms of access to an adversary, or destroy data.

For this reason, the only effective way to “clean” an infected system is wiping the drive and reinstalling the OS, applications, and data from a backup. It’s important for IT staff to understand this nuance and treat infections with the proper diligence. There are emerging techniques that can alter hardware components, such as UEFI and drive firmware, which may render even a wipe and reinstall ineffective, but fortunately these techniques are not yet common.

In summary: train your IT organization on the appropriate response to malware infections, which should start with disconnecting the system from the network, then may include making a forensic copy of the affected system and its memory, and finally should generally conclude with the affected system being wiped and reinstalled.

NCSAM Day 4: Understanding Lateral Movement Opportunities

As discussed previously, lateral movement is an important technique of many adversaries.  I described using port isolation to limit it, but there are many more avenues for lateral movement, particularly between servers, where port isolation may not be possible, and between systems that need to talk to each other over the network.

In the aftermath of one particularly bad breach, the IT team for the organization I was helping did not understand the problem that can arise from placing an Active Directory domain controller on an external DMZ network.  The placement of this device brought all of the benefits of AD, like single sign-on, ID deactivation, privilege assignment, and so on.  But it also required certain network ports to be opened to other domain controllers on the organization’s internal networks.  Once a server in the external DMZ was compromised, the adversary obtained administrative access that allowed connection to the domain controller on the same network, and the credentials obtained from that domain controller, combined with the open network paths to other domain controllers, allowed complete compromise of the internal network.

There are many such examples where we implement a control intended to provide some security benefit, but it instead creates a means for lateral movement.  Another example is using Citrix servers as a gateway between trusted and untrusted networks.  While a compromised Citrix server may seem benign from the perspective of a workstation connecting to it, adversaries can propagate to connecting workstations through their mapped drives.

The net point is this: look at all the places that serve as a demarcation point between different zones of trust, like the firewall separating the DMZ from the internal network, or the Citrix server separating an untrusted network from a trusted one; work to identify the means by which an adversary could move through the boundary; and then implement an appropriate fix to address that lateral movement opportunity, if one exists.
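
One way to start identifying those means is simply to probe what actually crosses the boundary.  Here is a minimal sketch, assuming you can run it from a host in the lower-trust zone; the target address and port list are placeholders for your own environment, and of course you should only probe networks you are authorized to test.

```python
# Minimal sketch: check which ports on an internal host are reachable
# from a lower-trust zone (e.g., the DMZ). Standard library only.
import socket

TARGET = "10.1.2.3"  # placeholder: internal host to test from the DMZ
PORTS = [88, 135, 389, 445, 636, 3389]  # common AD/management ports

def probe(host: str, ports: list[int]) -> None:
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(2)
            if s.connect_ex((host, port)) == 0:
                print(f"{host}:{port} is reachable across the boundary")

if __name__ == "__main__":
    probe(TARGET, PORTS)
```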

NCSAM Day 3: Watch What You Write

Many of us in the cyber security field came up through IT administration roles.  We are often troubleshooters at heart.  We look at a problem and try to figure out the cause to get things back up and running.  In the security world, these are useful traits.  When responding to a security incident, we are generally inhibited by the “fog of war”: we have incomplete information about what is happening and are forced to hypothesize about potential causes, sources, actors, and so on.  As we learn more about the situation, our hypotheses are refined until we know who did what, where, and how.

These skills are vital, but they can also cause problems for your organization if you are not careful.  Sometimes a security incident turns out to be a breach where data is stolen or destroyed, and that can lead to legal action – either by government agencies or by customers, employees, or others impacted by the incident.  The things we write, particularly in emails, text messages, and so on, may be discoverable in court, and our words used against our organizations.  Investigative discussion conducted over email, for example, may include speculation about the cause or extent of the incident, and that speculation may turn out to be wrong, or at least an incomplete picture of the situation.  If I’m working with a team to investigate a compromised server we just learned about, I might be inclined to email an update saying “You know, I’ll bet that server team forgot to patch Apache Struts again.  That team is very understaffed and keeps missing patches.”  Hopefully, it’s not hard to see how that statement could be used as evidence that we not only knew about the problem, we actually caused it through our actions.

At the same time, we need to communicate during incidents, and we necessarily need to hypothesize about causes.  But we can do so without running ourselves and our organizations up the flagpole.  Here are some recommendations:

  1. Speak to your organization’s legal counsel about this topic, and if possible, set some ground rules for how to manage communications during an incident. Rules vary by jurisdiction, and I am not a lawyer in any of them, so you should seek the advice of an expert in your area.
  2. Do not make legal conclusions in written communications. For example, do not write “we just violated HIPAA by sending that file to the wrong person.”  There is a lot of nuance in law that we in IT land may not understand.  Instead, communicate what is known without a conclusion.  In this example, a better statement would be “We appear to have sent a file containing PHI to the wrong person.”  I am sensitive to the fact that making the harsh statement can be more motivational than the factual statement, but keep in mind that it may end up being your words printed in a media article or presented in court about the breach.
  3. Keep communication clear, concise, and free of emotion and speculation. This can be difficult to do in an incident situation, where time is short, tension is high, and we may be tempted to release some inter-office drama.  But this is not the time or place for such things.  For example, do not write “I don’t know who started it, but Jerry has already managed to open a malicious attachment and cause an outbreak four times this month, so I’ll bet it’s him.  His management team just doesn’t care, though, because they love his hair.”  Instead, say “The infection appears to have originated from a workstation.  We will prioritize investigating sources we’ve seen in the recent past.”
  4. If and when you do need to hypothesize and speculate about causes, do so on a phone call where the issue can be discussed and resolved without leaving a potentially ugly paper trail of incorrect speculation.
  5. Above all else, we must act ethically. The intent of this post is not to provide guidance on how to hide incidents, but rather to ensure that any reviews of the incident are not contaminated with office politics, incorrect speculation, hyperbole, and IT people declaring the organization “guilty” of some bad act.