Thursday, October 30, 2008

Beware the key

USB keys are prevalent. They are used heavily by many incident response teams and first responders. They're less fragile than CDs, faster, and offer greater storage.

They are also weapons of destruction and can quickly become victims of compromised systems. It's been estimated that 10% of malware has the ability to infect removable media devices. Recall, if you will, that old is new. When you respond to an incident, you'll want to take some precautions if you use USB devices.

1) Make sure your devices are wiped and formatted after each case. If your device is infected, it becomes a weapon.

2) Create a directory named Autorun.inf in the root of your devices. This offers some protection against autorun malware. A quick sketch of both steps follows.
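If you're prepping keys from a Linux box, a minimal sketch of both steps might look like the following. The device name /dev/sdb, the IRKEY label and the mount point are assumptions; triple-check the device name before wiping anything.

dd if=/dev/zero of=/dev/sdb bs=1M          # wipe the entire key
mkfs.vfat -F 32 -n IRKEY /dev/sdb          # lay down a fresh FAT32 filesystem
mount /dev/sdb /mnt/usbkey                 # mount it
mkdir /mnt/usbkey/Autorun.inf              # a directory of this name blocks most autorun.inf droppers
umount /mnt/usbkey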


To protect your Windows workstations, if you haven't already, do the following things.

Copy and paste this into a .reg file and merge it.

REGEDIT4
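; Redirects parsing of any Autorun.inf file to a registry key that doesn't exist, so its contents are ignored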
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\IniFileMapping\Autorun.inf]
@="@SYS:DoesNotExist"


Follow the instructions here:
http://support.microsoft.com/kb/953252

Monday, October 20, 2008

OODA at play

Imagine the following scenario:

You've identified a system communicating with a botnet C&C over IRC. It happens to be a system that should never be communicating via IRC: it's a webserver, running multiple vhosts and holding multiple IP addresses. The IRC connection is active. You check in with the system administrator and inform him of the situation. You discover that the webserver is merely serving public data; it doesn't process or store sensitive information. It's a good case for root cause analysis, eradication and rebuilding.

The system administrator calls you back and says it looks like SSH binaries have been replaced on the system. The administrator happens to be running cfengine and informs you that a large number of systems have had ssh binaries replaced. What was a run-of-the-mill investigation and analysis just blew up and turned into an incident for which there is no playbook. Friends, this is a triage situation.


Triage is less about solving the problem than it is about prioritizing systems and stopping the bleeding to buy time to properly assess the situation and react appropriately. The problem with triage is business continuity. Triage situations would be much easier if we could identify all of the affected systems, contain them based on priority and threat, and move to more thorough response and analysis. Unfortunately we can't do that. More often than not, the systems that need to be contained can't be, because they are critical to operations, meaning they can't be shut down.

Returning to the incident at hand: over 50 systems have had SSH binaries replaced. At this point we need to triage the situation. Were we dealing with human beings, this would be a mass casualty incident (MCI), and a methodology called START would be applied. When dealing with human beings in an MCI, priority goes to the most critical patient who can't survive long without immediate treatment. The job of the people performing triage is to assess only. No care is provided except opening airways and tending to patients who are bleeding severely. A good starting point is here. People get classified into the following categories:

Dead
Immediate
Minor
Delayed

There's a lot that can be taken from this type of real triage in a mass casualty situation and applied to Incident Response when dealing with a lot of systems.


What kind of systems do we typically come across? Let's use the incident I mentioned above. Assume 50 systems. Assume the attacker is actively attacking and compromising systems. There are obvious limitations to physically visiting each system. So what can we do? Assess the situation from the network. In a few easy steps we can triage the situation. With 50 systems it's rare that you would find different attackers and different methodologies being used against you. So we make an assumptive hypothesis based on the following premise: cfengine detected ssh binary replacement on 50 systems, therefore the attack signature will be similar across systems. In addition, we can assume that very few remote systems will be used in such an attack. So what can be done to triage?
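One quick way to run that assessment from the network is a checksum sweep of the suspect binary across all 50 hosts. This is only a sketch - the hosts.txt list, the known-good hash, and even the idea of using the (possibly trojaned) ssh channel to reach the systems are all assumptions you'd weigh in a real case:

GOOD=d41d8cd98f00b204e9800998ecf8427e   # placeholder - use the md5 from the vendor's ssh package
while read host; do
  sum=$(ssh "$host" md5sum /usr/bin/ssh | awk '{print $1}')
  [ "$sum" != "$GOOD" ] && echo "$host: /usr/bin/ssh does not match the known-good hash"
done < hosts.txt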

We can quickly divide the systems into the following categories:

4) Systems that can't be blocked at the perimeter
3) Systems that can't be taken offline (network or power)
2) Systems that can be blocked at the perimeter (internally critical systems)
1) Systems that can be taken offline (network or power)


Now you might be asking: why is priority 1 a system that can be taken offline rather than one that can't? The idea is simple. If I can take it offline, then I should do so by whatever means are necessary. If I can't take the system offline, the task of response is more advanced. Assign the system administrators or other tech staff the role of identifying and containing the systems that can be quickly contained. The idea being that if you are hemorrhaging from 50 holes and can close 30 of them, you've cut down the tedious work by 60%. Get them under control and off of the immediate concern list.

If I can block a host at the perimeter, then I should do so, quickly. This can work to directly cut off the attacker; however, with so many systems there is no way to guarantee the effectiveness of this type of action. An indirect attack is still very possible. Sometimes, though, you just have to make a decision and adapt.

If I can't take a system offline, and I can't block it at the perimeter then I need to respond quickly and carefully. These are the business continuity cases that hamper triage and response. So what can be done to triage them? Remember we're buying time, not solving the problem 100%.

If we work based on our assumptive hypotheses, we can enable a perimeter block to stop the remote sites from being accessed by any of the compromised or soon-to-be-compromised systems.
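What that block looks like depends entirely on your perimeter gear. As a minimal illustration only - assuming a Linux box in the forwarding path and a placeholder address of 203.0.113.50 for the attacker's remote host - it could be as simple as:

iptables -I FORWARD -d 203.0.113.50 -j DROP                          # drop traffic to the remote host
iptables -I FORWARD -s 203.0.113.50 -j DROP                          # drop traffic from it
iptables -I FORWARD -d 203.0.113.50 -j LOG --log-prefix "C2-block: " # log attempts; the block doubles as an intelligence source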

Have you noticed the OODA loops?

As systems are being contained - via network blocks and physical containment - more compromised systems begin actively attacking. Port scanning begins on internal hosts. Initial triage, while containing 60% of the systems, left an opening. Once again, division of forces is key to success. With two IR staff, one can work on active containment while the other works to gather more intelligence.

F-response is a fantastic intelligence gathering tool in this case. Using it, a remote system can be analyzed in real time. Connecting to a system and being able to identify the files created/modified/accessed during the attack lends itself to a more rapid action cycle. Combined with traffic captures and live access to disk-based data, we can break into the OODA loop of the attacker. We can predict what the tools being used are for, what files get replaced, and what the attacker will do at each step, and we can develop a rapid active response to stop him before he begins. With the situation unfolding and new information coming in, further containing the systems that couldn't be taken offline or blocked at the perimeter becomes simple. With a tool like cfengine, a few commands can remove the active threat and we can continue working the problem.
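On the live-analysis side, the created/modified/accessed question can be asked with nothing fancier than find, once you've bracketed the attack window. A sketch - the timestamps are placeholders:

touch -t 200810200300 /tmp/attack_start     # start of the suspected attack window
touch -t 200810200500 /tmp/attack_end       # end of the window
find / -xdev -type f -newer /tmp/attack_start ! -newer /tmp/attack_end -ls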

As the situation is contained, a signature is developed and active monitoring is implemented to watch for other systems showing signs of intrusion.

Saturday, October 18, 2008

The clock is ticking

When an incident has been detected, or an event has been escalated to incident status, a timer starts. The attacker is now inside your OODA loop. Every minute wasted could mean money lost, identities stolen, disruption of operations. He/she controls, or has access to, something of yours and can disrupt your ability to determine the correct measures. The speed and accuracy of your response will make all the difference.

Regarding the OODA loop, there's one thing to remember. The attacker has the initiative; we have to play catch-up and out-maneuver. In a non-automated attack scenario, an attacker has presumably done a reasonable amount of homework on the target host or target network. In the majority of scenarios the attacker has been entrenched in a system for hours, days, weeks or even months before their presence is detected. They are already two or more steps ahead of us. As incident responders we are at an immediate disadvantage and we have many foes working against us. Not just the intruder, but many times the local IT staff work against us, albeit unintentionally for the most part.

So, how do we, as incident responders react? What must we do to be effective?

Our response must be fast, accurate, appropriate. What does our OODA loop look like?


Observe:
Confirm Incident - Are we certain we're dealing with an incident?

Threat assessment - What's the threat doing? Is it actively attacking or scouring systems for data? What's the depth of the penetration? What has local staff done already?

Prior Reference - Have we seen this before? What happened then? What's different?

Victim assessment - Is sensitive data present on the system? Where is sensitive data stored, how is it processed?

Business Continuity assessment - Can the system be shut down? How long can the system be down? If the system goes down, what is impacted?

Defense mechanism assessment - What options do we have for containment? How quickly can we enable them?

Orient:

In this portion of the loop we take our various assessments and synthesize and analyze the results. We must weigh them against each other, and they feed one another. This is by far the most thought-intensive portion of the process. We take large amounts of data and must process it quickly, as time is of the essence.

Decide:

Decisions need to be made. In a recent incident this phase was done on a whiteboard with a co-worker. We identified what we knew about the scope and gravity of the situation, and what options were available to us. We then, on the best information available at the time, made a decision to do a specific set of things. An evaluation takes place during the decision making, generally along the lines of "If I do X, what will happen?"


Act:
At this point we act upon those decisions that make the most sense. Not all decisions get acted upon, because not all decisions are appropriate. Action feeds back into Observation and Orientation.


Now, recall that this is a loop. It's not a step-by-step protocol. It's a thinking, living, breathing course of assessment, testing, action, reaction, and adaptation. We tend to do these things naturally. Assuming the intruder is in the system during a response, they will be working through their own OODA loop and will be attempting to subvert and disrupt yours.

But wait. What advantages do we have, or rather what advantages does your client have?
The good news is that the battlefield is one of our choosing. We know the landscape and have the opportunity to plan ahead. This is a great place to inject an Incident Response Playbook.

What is an IR playbook? It's a set of protocols that educate responders - from the first responder to the tier 1 responder - and it allows the incident handler to make faster decisions and provides a control structure for handling the incident. It allows everyone involved a chance to orient themselves to the landscape, thereby speeding up the defender's OODA loop. In a playbook, many things can be decided ahead of time, and the answers to many questions are already present. For instance, we can walk into an incident already knowing:

1) Whether sensitive data is on the system, and how it gets processed and stored.
2) What containment options are available.
3) What the business continuity constraints are, since they can be pre-assessed.

With a playbook we can short circuit the initial OODA loop and improve our response accuracy and speed. Of course we can't always rely upon a playbook. There will be times when the playbook must be thrown out because it doesn't apply to the situation at hand.

All the credit for OODA obviously belongs to John Boyd. A fantastic book is here.

Wednesday, October 15, 2008

Remotely examining a disk image

Recently I've been exploring new modes of operation. One such mode is working on disk images remotely. This has become increasingly important as cases roll in constantly. Disk-to-disk or disk-to-image imaging is great for small operations, and it's mandatory when working a criminal case. A new mode of operation I'm exploring is image to SAN and analyze from SAN. Imaging to SAN is a great way of operating if you can swallow the cost. One of the best, and most cost effective, tools on the market today is F-response. It's now cross platform. As I've mentioned to Matt Shannon - it's Mac-tested and mother approved. I've used it several times in such cases. It sure beats doing target disk mode, let me tell you. Imaging is a pretty straightforward activity, and imaging to a SAN is no different. Where it gets interesting is analysis. How do you do it?

My current tests center around using multiple platforms.

I have a Linux box attached to the SAN. It has access to the disk images that are stored there.

I mount the disk image in a particular directory as follows:

mount -o loop,offset=32256,ro,noatime,noexec,nosuid,nodev,show_sys_files,nls=utf8 image.dd /path/to/case
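For the record, 32256 is just 63 sectors times 512 bytes - the classic start of the first partition. If you're not sure where a partition begins, The Sleuth Kit's mmls will tell you, and the offset is simply the starting sector multiplied by the sector size:

mmls image.dd    # note the "Start" sector of the partition you want; offset = start sector x 512 (63 x 512 = 32256)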

This is a straightforward method of mounting a disk image for analysis in Linux. Now, how to gain access to it from Windows, where the armory of analysis tools exists?

I've been using sftpdrive. I enter all the requisite login information and folder location and voila: sftpdrive is really the Windows equivalent of the native Linux sshfs. After logging in I now have an SSH secure tunnel to a mounted disk image, listed as a drive letter in Windows. I can run any and all analysis tools I need to. One caveat I've run into has been a timeout issue; the fix has been to "always trust" the SSH key within sftpdrive.
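For what it's worth, the same trick works Linux-to-Linux with sshfs itself. A sketch, with the host and paths made up for illustration:

sshfs analyst@linuxbox:/path/to/case /mnt/remote_case -o ro   # read-only view of the mounted image over SSH
fusermount -u /mnt/remote_case                                # unmount when done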

Do you image and analyze disk images from a SAN? What methodology do you use if you use a non-traditional method?

Old is new - Tales from the field

If we believe in certain truths such as "past is prologue" or "life is circular" or "security is cyclical" and "what was, is, and what is, will be again", then what we are seeing in the field is easily explained and indicates the cycle is starting all over again. If I've confused you, then my job is done here.

No really, let me explain. In the past two months I've seen compromises that take me back to the days of yore, when we had malware like Stoned floating around on boot sectors of floppy disks. I can recall one bit of malware a friend shared with me at the time that would play a "drip drip drip" sound as it deleted the contents of the system from underneath you. Why did these types of malware work? Well, think about it. Floppies were prevalent, and they were the primary storage medium for users. In order to move data from computer to computer, you put it on a floppy disk and carried it to the next system you needed to use. This was otherwise known as using a sneakernet.

Stepping forward, developers figured out how to largely prevent this type of infection from taking root in a system. Soon we had BIOS detection of boot sector viruses, we had antivirus detection of removable media, and the threat changed. This type of malware was removed from our forward-thinking minds. Self-spreading worms were the threat of the day soon after, and the world suffered. We then moved into the user being our weakest link. Browser-based exploitation of systems is popular, phishing works, and so on.

Apply brakes here.


In the past 30 days I've seen compromises that are based on MBR rootkits and removable media. I will not spend a lot of time detailing the intricacies of the malware, because others have done so really well. This was detected as... well, it wasn't detected by antivirus. It couldn't be. Because of the way the malware loads and runs, antivirus could not detect it on a live running system. Oddly enough, antispyware software identified the malware through the presence of registry keys. RegRipper was able to assist in this identification. Harlan, has the world said "thank you" yet? Full detection could only occur by mounting the disk image I had. In testing after the fact, I was able to mount the disk remotely and detect the trojan using F-response. This trojan, while using OLD concepts, uses new techniques. Discovery of files in C:\Windows\Temp is what really got the blood boiling. One contained a logfile of keystrokes entered in Internet Explorer. Another was XOR'ed. Thanks to Didier Stevens' fantastic tool XORSearch I was able to determine the key used (an 11-byte XOR, found using the keyword "bank") and the file was un-XOR'ed. A list of over 900 banks was uncovered. The malware's intent was revealed and the case moved on.
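If you haven't used XORSearch, the basic invocation is just the file and the plaintext string you expect to find; the filename here is made up:

XORSearch suspect_temp_file.bin bank    # brute-forces simple encodings of "bank" and reports the key and offset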


Next....


Let's now discuss removable media malware. This is floppy malware all over again. Except this is 2008 and self-spreading worms are dead, right? Well, not quite. They've simply ducked under the external threat radar and they are using the internal threat agent - the user. The class of malware is Worm.Autorun, and the name says it all. It functions by creating autorun.inf files on removable media and fixed drives and relies upon the user and Windows autorun functionality. Once the malware makes its way onto a host, the real fun begins.

The malware hooks into the registry and replaces the Userinit value under
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon to point to a new userinit.exe that is created on the system by the malware. Why is this important? userinit.exe is responsible for executing scripts and other programs that need to run before explorer.exe loads, such as establishing network connections. That's under normal operation. This replacement userinit.exe is a keystroke logger that has a degree of shell integration, since it's loaded before, and injected into, explorer.exe. It grabs window titles, the contents of balloon windows, and everything else that exists in explorer.
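A quick way to eyeball this on a suspect host is to query the value from a command prompt; on a clean XP/2003 box it typically points to %SystemRoot%\system32\userinit.exe (with a trailing comma):

reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v Userinit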

Attempt to remove or rename the autorun.inf from the infected system, and 2-10 seconds later a new one is created. Upon attempted deletion you receive a dialog box asking if you're sure you want to delete it. Guess what generates that dialog?


Lots of things are happening, and lots of cyclical security issues are reaching their return time, so it may be time to revisit old and forgotten response models, brush the dust off and update them.