Legislators in both the US and EU have been closely eyeing the collection of personal identity information by search companies, raising the specter of mandated time limits on the retention of this data. That has caught the attention of many companies, leading a number to set more rigorous privacy policies. Google is among those paying attention: last year, it bowed to pressure from the EU and shaved six months off its retention of identity information. In a further move to avert potential legislation, Google announced yesterday that it would cut the remaining retention period in half: IP addresses in its logs will now be anonymized after nine months.
Google has some obvious interests in keeping track of specific IP addresses. All of its businesses, from search to directed advertising, rely on identifying connections among content and its readers, and IP addresses can help with that process. They're also essential in the identification of click fraud, which can reduce the value of the ad services Google provides. Finally, they can help the search giant identify malware attacks that either target its servers, or spread by using information obtained there.
But it's also in Google's business interests not to disclose exactly what's done with IP address information, lest competitors use that to reverse-engineer its secret sauce, and that secrecy has drawn significant public scrutiny. Its data retention policies have come up during a number of Congressional hearings, but the most significant concerns have been raised by the EU, where a working party on data protection called Article 29 has been formulating recommendations for legal policies on identity retention under the aegis of the Justice and Home Affairs office (motto: "Freedom, Security, and Justice").
Google's initial scaling back of identity retention came in response to Article 29, and the new policy comes along with a detailed response (PDF) to concerns raised by the group. The search giant now claims that it has improved the computer algorithms it uses to analyze visits to its site, and can extract sufficient information from nine months of data; the logs will be retained afterward, but IP addresses will be anonymized.
Google undoubtedly has legitimate business and security reasons for retaining specific IP addresses in its logs for a finite period of time, and it definitely has business reasons for not revealing precisely what it does with that information. Still, it's hard to read the company's defense of its policies without the sense that it's engaging in some significant hyperbole. Unless you read carefully, the frequent references to user security would lead you to suspect that merely performing a search would leave users at risk were IP addresses not retained. An earlier letter (PDF) to the chair of Article 29 actually argues that retention may be required by Sarbanes-Oxley regulations.
It seems likely that Google could afford to be more transparent about how retention of IP information in logs meets its various needs without being so transparent that it torpedoes its own business. And that, more than any algorithmic struggles to cut the retention times, may be what's needed to get legislators off its back.