How anonymous is anonymous data? The question seems absurd, but it turns out that anonymized data isn't always anonymous. For example, in the mid-1990s the Massachusetts Group Insurance Commission (GIC) released anonymized data showing every hospital visit made by state employees. Latanya Sweeney, then a graduate student, was able to re-identify then-Governor William Weld's hospital records by cross-referencing the data with Cambridge's complete voter rolls, which she purchased for a mere $20. Sweeney later showed that 87% of all Americans could be uniquely identified using only their ZIP code, birth date, and gender (Anderson, 2009). Since Sweeney's findings, computer scientists have found that nearly any piece of information can become personal once it is combined with enough other relevant data. Paul Ohm, a professor at the University of Colorado Law School, documents these widespread failures of anonymization in his August 2009 research paper on the topic. Ohm notes that as more of our information accumulates online, anonymization alone cannot keep data from becoming personally identifiable or from falling into the hands of others (Anderson, 2009).
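To make the mechanism concrete, here is a minimal sketch of the kind of linkage Sweeney's work illustrates: a table with names removed is joined to a public table that still has names, using only ZIP code, birth date, and sex. All records, names, and the `link` helper below are hypothetical illustrations, not Sweeney's actual data or procedure.

```python
from collections import defaultdict

# Hypothetical "anonymized" hospital records: names removed, but ZIP code,
# birth date, and sex (the quasi-identifiers) left in place.
hospital_records = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1962-05-17", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_date": "1962-05-17", "sex": "M", "diagnosis": "condition C"},
]

# Hypothetical public voter roll: names alongside the same three fields.
voter_roll = [
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birth_date": "1962-05-17", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "birth_date": "1962-05-17", "sex": "F"},
    {"name": "Voter Four", "zip": "02139", "birth_date": "1962-05-17", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(records, roll):
    """Return (name, diagnosis) pairs for records that match exactly one voter."""
    voters_by_key = defaultdict(list)
    for voter in roll:
        voters_by_key[key(voter)].append(voter["name"])
    matches = []
    for record in records:
        candidates = voters_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


if __name__ == "__main__":
    for name, diagnosis in link(hospital_records, voter_roll):
        print(f"{name}: {diagnosis}")
```

A record is re-identified only when its combination of fields matches exactly one voter; the second hospital record stays ambiguous because two hypothetical voters share its fields. Sweeney's 87% figure suggests that, for most Americans, this combination is unique, so the ambiguous case is the exception.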
Major digital companies such as AOL and Netflix have run into the same problem. In 2006, AOL released a database of search queries in which usernames had been replaced with anonymous numeric IDs, yet New York Times reporters were still able to trace one user's queries back to her. The release led to a lawsuit against AOL by its subscribers and a major PR disaster for the company. Netflix faced a similar problem when it released anonymized movie ratings from its subscribers for a public research competition, the Netflix Prize. Despite a similar scrubbing process, computer scientists were able to link individual users to their ratings by comparing the dataset with public reviews on the Internet Movie Database (IMDb) (Anderson, 2009).
One factor that enables the re-identification of anonymized data is the lack of careful laws to prevent it. Admittedly, it is difficult for the law to keep pace with the ever-changing culture and methods of information collection, whether the data is anonymized or not. Ohm suggests that regulators and lawmakers should expand privacy rules and regulations each time a new situation appears. However, this is tedious and nearly impossible to do, because the culture and methods of collecting information (particularly personally identifiable information) are changing so fast. Another proposed method is to aggregate the personal data that is collected. However, this approach has limits, because aggregating data can diminish its usefulness. What solutions are we left with? Ultimately, officials and lawmakers dealing with these issues must be shrewd and forward-thinking; they must be able to judge when it is appropriate to collect personal data and when it is not. As the case studies above show, even the most seemingly harmless pieces of data can be used to wreak havoc in our lives.