Anonymous Data Isn’t so Anonymous After All

Who will find out what you watch?

How anonymous is anonymous data? The question may seem absurd, but it turns out that anonymous data isn’t always unidentifiable. For example, in the mid-1990s the Massachusetts Group Insurance Commission (GIC) released anonymized data on state employees that showed every hospital visit. Latanya Sweeney, then a graduate student, was able to re-identify former Governor William Weld’s hospital information by matching it against Cambridge’s complete voter rolls, which she purchased for a mere $20. Sweeney later went on to show that 87% of all Americans could be uniquely identified using only ZIP code, birthdate, and gender (Anderson, 2009). Since Sweeney’s findings, computer scientists have discovered that nearly any piece of information can become personal when it is combined with the right additional data. Paul Ohm, a professor at the University of Colorado Law School, also documents the vast failures of anonymized data in his August 2009 research paper on the topic. Ohm notes that as the amount of our information online grows, anonymizing data isn’t enough to keep it from becoming personally identifiable and from falling into the hands of others (Anderson, 2009).
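To make the mechanics concrete, here is a minimal sketch of the kind of linkage described above: an "anonymized" dataset is joined to a public list on ZIP code, birthdate, and gender. Every record, name, and field name below is invented for illustration, and this is not a reconstruction of Sweeney's actual method or data.

```python
from collections import defaultdict

# Toy "anonymized" hospital records: names removed, quasi-identifiers kept.
# All values are hypothetical.
hospital_records = [
    {"zip": "02138", "birthdate": "1950-03-14", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birthdate": "1972-11-02", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birthdate": "1972-11-02", "sex": "F", "diagnosis": "condition C"},
]

# Toy public voter roll: names plus the same three fields. Also hypothetical.
voter_roll = [
    {"name": "Voter One", "zip": "02138", "birthdate": "1950-03-14", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birthdate": "1972-11-02", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "birthdate": "1972-11-02", "sex": "F"},
    {"name": "Voter Four", "zip": "02140", "birthdate": "1985-07-21", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")


def key(row):
    """Return the combination of fields shared by both datasets."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(records, roll):
    """Attach a name to every record whose key matches exactly one voter."""
    names_by_key = defaultdict(list)
    for voter in roll:
        names_by_key[key(voter)].append(voter["name"])

    matches = []
    for record in records:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(hospital_records, voter_roll)
print(reidentified)
```

In this toy data only the first record is re-identified, because its combination of fields belongs to a single voter. The other two records match two voters each and stay ambiguous, which is why the uniqueness of that three-field combination matters so much.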

This issue has also affected major digital companies like AOL and Netflix. When AOL released its database of anonymized search queries after cleaning the data of personal information, computer scientists were still able to pinpoint individual users from their search queries. This led to a major lawsuit brought against AOL by its subscribers and a PR disaster for the company. Netflix faced similar issues when it released its database of movie recommendations for study. Even after a similar data-scrubbing process, computer scientists were able to identify individual users in the data (Anderson, 2009).

One factor that enables the re-identification of anonymized data is the lack of carefully written laws to prevent it. It is admittedly difficult to keep up with the ever-changing culture and methods of information collection, both anonymized and not. Ohm suggests that regulators and lawmakers should expand privacy rules and regulations each time a new situation appears. However, this is far too tedious and nearly impossible to do, since the culture and the methods of collecting information (particularly personally identifiable information) are changing so fast. Another proposed method is to aggregate the personal data that is collected. However, aggregation discards detail, so the more the data is aggregated, the less useful it becomes. What solutions are we left with? Ultimately, officials and lawmakers dealing with these issues must be shrewd and forward-thinking; they must be able to understand when personal data is appropriate to collect and when it is not. As the case studies above show, even the most seemingly harmless pieces of data can be used to create havoc in our lives.
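The trade-off with aggregation mentioned above can be sketched in a few lines. The example below coarsens each record (ZIP code cut to three digits, birthdate cut to the year) and then measures the size of the smallest group of identical records. This group-size measure is borrowed from the k-anonymity literature, which the post itself does not name, and the records and the `generalize` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records: (ZIP code, birthdate, sex).
records = [
    ("02138", "1950-03-14", "M"),
    ("02138", "1950-09-30", "M"),
    ("02139", "1972-11-02", "F"),
    ("02139", "1972-01-19", "F"),
    ("02140", "1950-07-21", "M"),
]


def generalize(record):
    """Coarsen a record: keep 3 ZIP digits and only the birth year."""
    zip_code, birthdate, sex = record
    return (zip_code[:3] + "**", birthdate[:4], sex)


def smallest_group(rows):
    """Size of the smallest set of rows that look identical."""
    return min(Counter(rows).values())


raw_k = smallest_group(records)
coarse_k = smallest_group([generalize(r) for r in records])

print("smallest group before coarsening:", raw_k)
print("smallest group after coarsening:", coarse_k)
```

Before coarsening, every record is unique (smallest group of 1). Afterwards, each person hides among at least one other (smallest group of 2), but the exact ZIP code and birthdate are gone, which is the loss of usefulness described above.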


2 Responses to Anonymous Data Isn’t so Anonymous After All

  1. dkoleanb says:

    Lucas,
    Episodes of this kind are only becoming more and more frequent. For example, John McAfee, who was wanted in connection with the murder of his neighbor in Belize (apparently committed while he was high on bath salts), was caught by Guatemalan police after the metadata on an iPhone picture revealed his geographical location. He has since evaded deportation to Belize after faking a heart attack and is back in the United States. What a guy… But anyway, because there has been a huge increase in geotagging, not only is your privacy at risk, but your geographical whereabouts at an exact time are also published online, making it easy for anyone to find you.
    Another example that I read about last year and found quite humorous involves TomTom, a GPS manufacturer. TomTom changed its privacy policy to indicate that it has the right to send completely anonymous information to law enforcement agencies about vehicles disobeying the law (i.e., speeding). Is this information really anonymous if the GPS reports that a vehicle sits in the same spot for 12 hours every night (your home)? Hmmm…
    This is definitely a very controversial issue, especially because law enforcement officials need significant technological background to understand the laws they are enforcing.

  2. blevz says:

    With so much data being created and stored, it seems inevitable that we will reach a point where it becomes impossible to remain truly anonymous online. Without either a new architecture for computers or a redesigned way to peruse the internet, we will be “stuck” in our online identities just as we are stuck within our bodies. For certain applications, such as online commenting on political articles and product reviews, this will be helpful and will make these services better; however, for services like medical advice and job swapping (looking for a job while you already have one), these privacy issues will have to be addressed either culturally or legally. With Congress seemingly blind to most matters of technological importance, it seems to me that the response will emerge from end users either accepting the increased infringement on privacy or demanding the ability to use certain services anonymously.
