In response to regulatory pressure, Google has announced a new data retention policy that reduces the duration that user IP addresses are stored in the company's logs. Google claims that IP addresses are now anonymized after nine months instead of 18 months.
Google's data retention policies have been a topic of significant contention. The company has faced enormous pressure from the European Commission's Article 29 working group, which is tasked with monitoring data protection issues. Google decided to implement the 18-month cycle for IP anonymization last year after receiving criticism from EU officials. Google's latest move to cut the retention period to nine months appears to be similarly motivated. If so, it seems to have been well received by EU officials. EU Justice Commissioner Jacques Barrot told Reuters that the nine-month policy was "a good step in the right direction" even though it falls short of the EU's recommended retention period of six months.
Google also submitted to the Article 29 working group an open letter which explains in detail the reasons why Google believes that log data needs to be retained. According to Google, the logs are used to combat click fraud and search poisoning, to improve the overall quality of search results, and to detect abusive exploitation of search results. One example that Google cites is the recent Santy search worm, which used search queries to locate vulnerable targets. Google used the logs to identify Santy attack patterns and then implemented a filter to block them.
Google also discusses the privacy implications of ad-supported services relative to conventional commercial services. Google acknowledges that it uses the log data to provide contextually relevant advertisements in order to make its service financially sustainable. The company contends that this business model offers a higher level of protection for consumer privacy than a conventional subscription-based business model.
"Google's search business is offered to the public for free, and is thus inherently superior from a privacy perspective to paid services because it does not require users' real names, billing addresses, credit card numbers or mandatory tax and accounting records," Google wrote in its letter to the Article 29 working group. "To support this free service, Google primarily relies on being able to serve relevant advertising to its users."
Although Google touts its plans for log anonymization as a major win for consumer privacy, some critics—such as security researcher Chris Soghoian—believe that Google's anonymization practices are inadequate and that the company's public statements are misleading.
We asked Google to explain how they anonymize the logs and got a response explaining that the exact method hasn't been determined yet, but that it will probably involve randomizing a few bits of the IP.
"We are still working on figuring out the anonymization algorithm we will use. After nine months, we will likely change some of the bits in the IP address in the logs (we have not yet determined how many); after 18 months we remove the last eight bits in the IP address and change the cookie information," a Google spokesperson told Ars. "We have focused on IP addresses, because we recognize that users cannot control IP addresses in logs. On the other hand, users can control their cookies. When a user clears cookies, s/he will effectively break any link between the cleared cookie and our raw IP logs once those logs hit the 9-month anonymization point. Moreover, we are continuing to focus on ways to help users exert better controls over their cookies."
Soghoian argues that removing the last eight bits doesn't provide adequate protection. As he points out, each truncated IP value in the database after 18 months would be associated with queries from a theoretical maximum of 256 addresses, since eight bits can take only 256 values.
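To make the arithmetic concrete, here is a minimal sketch of what dropping the last eight bits of an IPv4 address might look like. It is not Google's method, which the company says is still undetermined; the function names are hypothetical, the sample addresses come from ranges reserved for documentation, and zeroing the low byte is only one plausible reading of "remove the last eight bits."

```python
import ipaddress


def truncate_last_octet(ip: str) -> str:
    """Zero the low eight bits of an IPv4 address (assumed truncation scheme)."""
    as_int = int(ipaddress.IPv4Address(ip))
    # Keep the top 24 bits and clear the bottom 8.
    return str(ipaddress.IPv4Address(as_int & 0xFFFFFF00))


def addresses_sharing_prefix(ip: str) -> int:
    """Count how many distinct addresses collapse to the same truncated value."""
    network = ipaddress.ip_network(f"{truncate_last_octet(ip)}/24")
    return network.num_addresses


if __name__ == "__main__":
    # Documentation-only sample addresses, not real users.
    for ip in ("203.0.113.7", "203.0.113.200", "198.51.100.42"):
        print(ip, "->", truncate_last_octet(ip))
    print("addresses per truncated value:",
          addresses_sharing_prefix("203.0.113.7"))
```

The first two sample addresses collapse to the same truncated value, and each truncated value covers only 2^8 = 256 possible addresses (somewhat fewer usable hosts in practice). That small pool is what the criticism turns on.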
Although Google's approach would make it impossible to recover the exact IP address behind an individual query, anyone with access to the data could still use patterns in the queries associated with each group of IP addresses to work out the likely identity of some of the users behind them, in much the same way that attackers did with the search data that was accidentally leaked by AOL in 2006.
Google may have deflected a regulatory smackdown in the short term, but its privacy practices are still not up to the same standards as those of its competitors. Microsoft, for instance, removes the entire IP address and all other identifiers after 18 months, and Ask.com launched a new feature last year that allows users to search anonymously.