Friday, June 20, 2008

Finding PII data

In case you didn't know, I live in New York. New York has a fantastic law on par with California's SB 1386. In case you're not sure if your state has a similar law, check out this article. Odds are your state has one of these great laws enacted.


Why is this important? In every security breach the following question MUST be asked: "Was there PII data on the system?" If you're dealing with security breaches and you're not asking that question, or not addressing it in a timely fashion, you're not doing your job. If you don't believe me, ask the company identified in this article. Waiting six weeks to notify, even if data was not accessed, is considered NEGLIGENCE, and it'll cost you $60,000 in New York. That's just negligence...the cost of the investigation usually starts around $500,000 for a small incident involving this type of data.

This law generally applies to every business, including educational institutions, that suffers a security breach where Personally Identifiable Information is at risk of unauthorized disclosure. So what is PII data?
In New York it's:
(1) social security number;
(2) driver's license number or non-driver identification card number; or
(3) account number, credit or debit card number, in combination with any required security code, access code, or password that would permit access to an individual's financial account;

"Private information" does not include publicly available information which is lawfully made available to the general public from federal, state, or local government records.

What I find absolutely amazing is that the commercial world doesn't seem to give a damn about this information, or about losing it on a regular basis, yet the proverbial security industry punching bag (read: higher education) has taken the lead in this arena.

Cornell University has a tool, and a feature list for the latest version is here. The use of this tool has been mandated in a few places already.

Virginia Tech has a tool
UT Austin has a tool
The University of Illinois has a tool as well.
Sippy has released a tool called WHACK as well that will work with web sites, though I've not tried it yet.


Oh yeah..Identity Finder, a commercial outfit, has, well, uhm.."borrowed" code from some of these applications and is charging for it. I wonder if they've heard of the GPL.

You can also use tools like PowerGREP to search for PII data.
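As a rough illustration of what a pattern search like this boils down to, here is a minimal Python sketch. The pattern and the function names (find_ssn_candidates, scan_tree) are assumptions made up for this example and are not taken from any of the tools above. It only looks at plain text, and a format match alone will still throw false positives, so treat every hit as a candidate to review.

```python
import os
import re

# Candidate SSNs written as 123-45-6789, 123 45 6789, or 123456789.
# The lookarounds keep a match from sitting inside a longer digit run.
# Area 000, 666, and 9xx, group 00, and serial 0000 are never issued, so skip them.
SSN_RE = re.compile(
    r"(?<!\d)(?!000|666|9\d\d)(\d{3})([- ]?)(?!00)(\d{2})\2(?!0000)(\d{4})(?!\d)"
)


def find_ssn_candidates(text):
    """Return the SSN-shaped strings found in text, in order of appearance."""
    return [match.group(0) for match in SSN_RE.finditer(text)]


def scan_tree(root):
    """Yield (path, line_number, candidate) for readable files under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        for hit in find_ssn_candidates(line):
                            yield path, lineno, hit
            except OSError:
                continue  # unreadable file: skip it and keep going
```

Something like `for hit in scan_tree("/some/share"): print(hit)` would walk a directory and list candidates. Binary formats such as spreadsheets, Word docs, and scanned PDFs need text extraction first, which this sketch does not attempt.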

You could also use expensive tools like EnCase to do searches, but that will cost you about $3,800, and you can only do one machine at a time. Many of the tools listed above are FREE as in BEER and have many more capabilities. You can also run the tools above over an F-Response connection during a live response, but I definitely prefer to see a proactive scanning methodology. If you have the consultant edition, you could provide a service..hint hint. Figure out a technique that works for you.

My point in this post is simply to deliver a small wakeup call to whoever is reading. I have conducted numerous searches for PII data in recent years and it's EVERYWHERE. It's in email, it's in databases, spreadsheets, Word docs, scanned PDFs, CVs, and on and on. It's in your organization right now, and I guarantee that the majority of organizations are doing nothing about it. The worst thing about this data is that it's been so heavily overused in the past few decades that it's on computers and people don't even realize it. Search for it. Get rid of it. You don't need it, and consumers do NOT need to provide an SSN for many of the transactions that take place.

I also promise this..if I ever receive a notification letter from your company, you'll be hearing from me.

10 comments:

Anonymous said...

This is something that I have looked into recently. I've used this tool with limited success (lots of false positives when scanning a ~1 TB file share):

https://agora.cs.uiuc.edu/display/tsg/SSNiper+SSN+Scanner

I didn't stumble upon those other tools in my quick search, but I will definitely check them out. I planned on RTFM for ssniper to cut down on the false positives, but got caught up on some other projects. This post came at a good time!

Unknown said...

Nice article on this topic! It's the sad truth that companies are not taking this as seriously as they should be. That is, until a disaster is on their hands.

Have you ever used EnCase's eDiscovery suite? I used it on an incident once, and it was pretty powerful.

Thanks for the WHACK reference. Let me know if you have any feedback on it.

Sippy

H. Carvey said...

First off, there are commercial companies doing work in this area...they call it "data leakage prevention", or DLP. One of the best I've seen so far is Vontu, owned by Symantec.

The interesting thing about many of the freeware tools you mentioned is that they will not only throw a lot (no, sorry...I meant to say "A LOT") of false positives for CCNs, but they also miss a great number of CCNs that are part of the Visa PCI program. For instance, one of the tools only looks for numbers 15 and 16 digits long...it doesn't account for spaces or dashes, and doesn't appear to include anything to reduce your data, such as running a check for BINs or the Luhn formula.
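To make the data reduction steps concrete, here is a minimal Python sketch of the three checks mentioned above: tolerate spaces and dashes, filter on the leading digits, and run the Luhn formula. The names (CANDIDATE_RE, PREFIXES, find_card_numbers) are made up for this example, and the prefix list is a deliberately coarse, hypothetical stand-in for a real BIN table. It is not how any of the tools discussed here work.

```python
import re

# 13 to 19 digits, with at most one space or dash between neighboring digits,
# and not embedded in a longer run of digits.
CANDIDATE_RE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")

# Hypothetical, coarse prefix list; a real BIN table is far more detailed.
PREFIXES = ("4", "51", "52", "53", "54", "55", "34", "37", "6011")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings that look like card numbers and pass both checks."""
    hits = []
    for match in CANDIDATE_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        if digits.startswith(PREFIXES) and luhn_ok(digits):
            hits.append(digits)
    return hits
```

Even with both filters, roughly one in ten random digit strings with a plausible prefix will pass Luhn, so the output is a shortlist for review, not a final answer.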

Further, the freeware tools appear to only work for live file systems, and not on images, such as those used when performing a PCI forensic audit (following a suspected breach).

The issue with EnCase is that the built-in tools are not adequate for the job of locating CCNs, and you have to develop your own EnScript, or you will be inundated with false positives.

..if I ever receive a notification letter from your company, you'll be hearing from me.

Just out of curiosity, what would you say? Let's say that you had been contacted as a result of one of the larger breaches...TJX, Hannover, etc...what would you say?

hogfly said...

First off, there are commercial companies doing work in this area...they call it "data leakage prevention", or DLP. One of the best I've seen so far is Vontu, owned by Symantec.

Sure, and how many companies have it deployed? How many have it configured properly, and how many are monitoring it actively? Did Choicepoint? Did Axiom? Did Hannaford? Did TJX?



The interesting thing about many of the freeware tools you mentioned is that they will not only throw a lot (no, sorry...I meant to say "A LOT") of false positives for CCNs, but they also miss a great number of CCNs that are part of the Visa PCI program. For instance, one of the tools only looks for numbers 15 and 16 digits long...it doesn't account for spaces or dashes, and doesn't appear to include anything to reduce your data, such as running a check for BINs or the Luhn formula.


See, that's the beauty of what's called open source software. If you don't like it, you can modify it to do what you need it to do.


Further, the freeware tools appear to only work for live file systems, and not on images, such as those used when performing a PCI forensic audit (following a suspected breach).


You're kidding me, right? And examiners don't know how to mount a DD image for reading?


Just out of curiosity, what would you say? Let's say that you had been contacted as a result of one of the larger breaches...TJX, Hannover, etc...what would you say?

I think you mean Hannaford. What would I say? Well I suppose that would be for me and my lawyer, and the EFF to decide.

Ronald Weiss said...

OK, great free tools using standard techniques to locate data that may or may not be relevant to the investigation!
Why the dig at EnCase? Why not FTK? Why not X-Ways, ProDiscover, or any other tool? IMHO it is not the tool (insert your software tool here), it is the TOOL who runs the tool.

If the examiner lacks a creative, problem-solving mindset, the investigation results will not be adequate. I see it all the time in the law enforcement and corporate "investigations" that I review and/or encounter. They see the obvious (or don't see it) and do not follow through with researching and developing a different approach to the problem.

None of us knows everything, and we all encounter new operating systems, new applications, and new data structures as part of investigations that require us to research, develop, test, and employ new techniques to find data that is relevant to our investigation. Once we find the data relevant to the matter being investigated, we are required to justify and explain how we found it and, if possible, how it got on that system. The tools can help us reach that goal.... but if there is a limitation, don't blame the tool! It's the examiner who runs it and makes it work.

But let's also put this issue into perspective.. free and commercial tools are great, but will they be used the right way? That is the question for any tool, be it free or commercial. Will it be used to produce accurate results in a reliable and trustworthy manner?

I believe in free, open source tools, and I have used them to obtain effective results. I have also obtained relevant data with almost every commercial tool out there (and found good and bad about those tools too). Sure, the tool had limitations, but so did my brain. And when any examiner hits an obstacle in the tool and/or their brain, they need to research the problem and come up with an alternate solution. It is not the fault of the tool! I remember some big clients telling me what they wanted in a tool: "We want a tool that will grab everything we need to get the bad guy, find it quickly, be admissible in court, work on every system, be able to be run by everyone with a click of a button, and it should be free and open source!" Them and everyone else!

What always concerns me, though, is the proper application of a tool (free or commercial), proper documentation and testing, and the knowledge of the examiner. We all have this concern, but how do we properly assess the tool? The technique? The examiner?

My concern is the application of any tool to an investigation by someone untrained and without the proper methodology and mindset. Operating in this manner is risky and can lead to poor results that sink a case. Either the guilty go free, the innocent get punished, or the corporation loses the civil/regulatory hearing. This is dangerous because the client (whether it be management or an external customer) does not know how to assess the results of a digital investigation or how those results were obtained. They also want to push for quick, free, and easy!

But beyond that......... how does the examiner put the data found into context? How do they confirm it is relevant to the matter being investigated? How do they handle data? Are they prepared to recognize an incident when it happens, and are they then prepared to respond and deal with the situation............

Great post...........obviously stirred some ideas in me!

hogfly said...

Ron,
No dig at EnCase or GSI intended. I was simply using them as an example, because Lance's blog showed the EnScript to search for CCNs. You could definitely insert any vendor there, but let's face it, EnCase is the most expensive.

The rest of your comments are right on, and I couldn't agree more with your assessments.

Ronald Weiss said...

True, EnCase can be expensive! And we will not touch on the impression of many about how quickly EnCase, FTK, and other tools are released with or without the appropriate testing!

I just wanted to throw in a rant, lol..... sorry. Good blog, good thoughts......... I also responded because I see clients who want their investigation wrapped like a CSI TV show.. quick, cheap, and easy! And as you know.. there's always more to it than that! Thanks for the ideas.. good stuff!

H. Carvey said...

Sure and how many companies have it deployed?

Wow, what's with the backlash? Sheesh. Sorry I said anything! I have visited several customers who have this deployed, and several others who are implementing it.

Did Choicepoint? Did Axiom? Did Hannaford? Did TJX?

You know, I really have to wonder why you even bother to ask these questions. You know as well as anyone else that any consultant who worked those engagements is under NDA...if they were to tell you this information, they'd be in violation of that NDA. And besides, what's the point of the question at all?

You're kidding me, right? And examiners don't know how to mount a DD image for reading?

I'm sure many do...but organizations such as Visa PCI require that the unallocated space be searched, as well. In fact, the incident dashboard in Visa PCI reports (a component of the report required by the PCI council) has an explicit section that deals w/ PCI data found only in unallocated space.

...I suppose that would be for me and my lawyer, and the EFF to decide.

Understood...I just thought that since you were announcing publicly that you would say something, you'd be willing to share what it is you'd be saying.

hogfly said...

Wow, what's with the backlash? Sheesh. Sorry I said anything! I have visited several customers who have this deployed, and several others who are implementing it.

Guess that came across wrong. I applaud those companies that have it deployed, but simply deploying these tools isn't enough and not enough have even considered it.

You know, I really have to wonder why you even bother to ask these questions. You know as well as anyone else that any consultant who worked those engagements is under NDA...if they were to tell you this information, they'd be in violation of that NDA. And besides, what's the point of the question at all?

Yes, of course there would be NDAs. My point is simply that there is not enough accountability when it comes to consumer data and these companies that serve as data warehouses need to own some responsibility for those whose data they possess. These companies don't make a reasonable effort to protect the data of millions. That is negligence in every sense of the word. Yet they simply get away with sending out a letter, paying a fine and perhaps credit auditing.

I'm sure many do...but organizations such as Visa PCI require that the unallocated space be searched, as well. In fact, the incident dashboard in Visa PCI reports (a component of the report required by the PCI council) has an explicit section that deals w/ PCI data found only in unallocated space.

True. But that still isn't a problem, since unallocated space can be extracted without much trouble. At least one of the tools I mentioned searches unallocated space as well.
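For what it's worth, a raw byte scan does not care whether the bytes sit in a file or in unallocated space. Here is a minimal Python sketch that reads a raw dd image (or an extracted unallocated blob) in chunks and reports runs of 13 to 16 ASCII digits with their byte offsets. The function name scan_raw and its parameters are made up for this example, it is not how any of the tools above work, and it will not see data that is compressed, encrypted, or stored in a non-ASCII encoding.

```python
import re

# Runs of 13 to 16 ASCII digits that are not part of a longer digit run.
DIGIT_RUN_RE = re.compile(rb"(?<!\d)\d{13,16}(?!\d)")


def scan_raw(path, chunk_size=1 << 20, overlap=32):
    """Yield (byte_offset, digits) for candidate digit runs in a raw file.

    overlap must be at least 17 (longest match plus one byte of context) so
    that a hit near a chunk boundary is only judged once its full context
    has been read.
    """
    with open(path, "rb") as handle:
        window = b""
        base = 0      # file offset of window[0]
        first = True  # True until the window has been trimmed once
        while True:
            chunk = handle.read(chunk_size)
            window += chunk
            final = not chunk
            # Hits starting at or beyond `limit` wait for the next pass.
            limit = len(window) if final else max(len(window) - overlap, 0)
            # After a trim, window[0] is only context for the lookbehind.
            lowest = 0 if first else 1
            for match in DIGIT_RUN_RE.finditer(window):
                if lowest <= match.start() < limit:
                    yield base + match.start(), match.group(0).decode("ascii")
            if final:
                break
            if limit > 0:
                keep_from = limit - 1  # keep one byte of left-hand context
                base += keep_from
                window = window[keep_from:]
                first = False
```

The offsets make it possible to go back to the image and look at what surrounds each hit, and the digit runs can then be fed through whatever validation you trust before anything goes into a report.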

As far as what I would say...it would first truly depend on the nature of the incident. If the company did their due diligence then I couldn't truly blame them, since compromises do occur. I would, through the lawyers, request access to the report information and determine the level of negligence. If there was a case, I would begin a grassroots campaign to contact other affected individuals and push for strict legislation - not PCI "requirements" but actual legislation that holds these companies accountable in a proactive manner, with stiffer penalties. I don't consider the ability to 'settle' a penalty. It's a cop-out solution to ease the burden on taxpayers, but as a taxpayer, that's something I think my taxes would be well spent on. Too many people I know have had to suffer the pains of credit problems because of these companies.

So as far as what I would say? I don't know..but that's what I would do.

Naturally this would not be a simple process and could very well be fruitless but someone has to try...

H. Carvey said...

My point is simply that there is not enough accountability when it comes to consumer data and these companies that serve as data warehouses need to own some responsibility for those whose data they possess. These companies don't make a reasonable effort to protect the data of millions. That is negligence in every sense of the word. Yet they simply get away with sending out a letter, paying a fine and perhaps credit auditing.

Well, it's better to light a candle than to curse the darkness. What do you suggest? What would you do differently? How would you do it better, and how do you plan to do it better?

Don't get me wrong...I agree with you completely. However, all I'm seeing is complaining. The fact of the matter is that the PCI DSS (which is about compliance, NOT security) is a step in the right direction. Every one of the major breaches you've mentioned has been in an infrastructure with a significant investment already in place, and for them, just meeting compliance is an expensive and time-consuming proposition. Why? Because adding in security (or in this case, compliance) after the fact is always more expensive than doing it right in the first place.

My point is that glaciers move slowly...but they do move. Until someone comes along with a better idea, this is what we've got. Does PCI have an effect? Sure...even a tiny, almost imperceptible effect is still something. Does the list of breaches at the Privacy Rights Clearinghouse have an effect? I hope so, although I doubt it.