Last week, Google accused Bing, Microsoft’s search engine, of copying its search results. Since Google first made this accusation, Bing has vigorously defended itself and the spat has mushroomed into an all-out public feud.
This dispute between two market giants fascinates me. It raises all sorts of interesting questions: What is ethically appropriate for a search engine? Is there an objectively ideal set of search results or are searches necessarily shaped by their designers? And, most importantly, are there any viable intellectual property claims against Bing? I will focus primarily on this last question in this blog post.
First, a word regarding Bing. As most people know, Microsoft is the company behind the Bing search engine. Throughout this post, I will refer to the organization within Microsoft that designs and maintains the Bing search engine as “Bing.” Sometimes I may refer to Microsoft instead; but where I choose one name over the other, I am not attempting to draw any meaningful distinction. For the purposes of this post, the two are interchangeable.
Some Background. In May of last year, Google started noticing that Bing was suddenly much better at returning results for unusual misspellings, often returning the same set of results that Google might return. Danny Sullivan, over at Search Engine Land, provides a few examples: “torsoraphy” instead of “tarsorrhaphy” or “bombilete” instead of “bombilate”. Because Google has worked hard on what it characterizes as a cutting-edge spelling correction system, it took immediate notice of this apparent copying by Bing and started watching Bing more closely. In addition to correcting misspellings, Google also prides itself on its fairly robust “long-tail” searches—searches for obscure terms or phrases where there are few results. It seemed to Google that Bing was leveraging Google’s hard work in this area and getting a free ride.
Then, in October of last year, Google saw a “marked rise” in similarities between Google search results and Bing search results, including the result placed in the top spot for any given search. Because the results were so strikingly similar, Google decided to initiate a sting operation. You can read about exactly how Google’s sting operation worked at Search Engine Land. Suffice it to say that Google introduced artificial or synthetic search results for truly random words like “hiybbprqag” and “indoswiftjobinproduction”. At the start of the Google sting, these words produced no results on either search engine. Once Google hard-coded the fake results pages for these queries and sent its engineers home to run seed searches using Internet Explorer and the Bing toolbar, it wasn’t long before these results began to show up at bing.com, in roughly 7–9 percent of searches. Looks like a clear-cut case of copying, right?
IE and the Bing Toolbar. Google suspected that Microsoft was doing its copying primarily by collecting information about Web surfing habits from users of two of its products: Internet Explorer (IE) and the Bing toolbar. IE offers users a “suggested sites” feature that, when enabled, transmits clickstream data (lists of sites users visit and searches performed at those sites) back to Microsoft. Similarly, Microsoft’s Bing toolbar, which users can install in IE or Firefox, collects and sends clickstream data back to Microsoft. Microsoft does notify users of its collection of such data in its install dialog, which links to its terms and conditions and privacy policy, and permits users to opt out of such collection. Pundits and lawyers disagree about whether Microsoft does enough to notify users about precisely what it’s doing with the data. I won’t examine that issue in this post.
Microsoft admits to using such clickstream data as input to its search engine algorithms. It is, in essence, crowd-sourcing its search results by watching what users search for and click on at websites across the Internet. However, Bing is careful to point out that the clickstream data is just “one small piece” of over 1,000 different signals that feed into the Bing ranking algorithm. Furthermore, Bing isn’t targeting Google per se; it’s just that, because of its relative popularity, Google makes up a significant portion of the Bing clickstream.
To complicate matters further, it is possible that Google is using clickstream data collected via its own Google toolbar to improve its own searches. Google has so far denied this, but there is some evidence to the contrary.
Aether. What happened in October to cause a noticeable change in Bing’s search results? Bing rolled out a new (and presumably better) ranking algorithm (something code-named “Aether”). According to Bing, that explains the sudden shift Google witnessed.
So far, Google has not made any intellectual property infringement or misappropriation claims against Bing. And, as you’ll see below, it is doubtful that it will. Google’s argument in its very public dispute with Microsoft is that Bing is behaving unfairly, and possibly unethically, by relying (in part) on Google results as input for Bing’s search algorithm.
The refrain from Google since this story broke has been, “you’re copying our hard-earned search results!” And Bing’s has been, “Google’s results are 1 of 1,000 data points Bing uses to improve its search results.” As far as I can tell, they’re both right. So the question is, does this skirmish need a referee? Should the law, the government, step in to decide a winner and a loser in at least this battle of the larger search engine wars? The answer is probably no.
Within the legal regime that we have here in the US, in light of the intellectual property laws and internet/cyber laws that might apply, this type of competitive behavior is permitted. In short, Bing is not copying anything that’s protected by copyright. Whether Bing infringes any patents in gathering clickstream data is conceptually unrelated to the question of improper copying here; either Bing infringes a patent covering clickstream analysis or it does not, and any such claim would not be specific to Google. Perhaps there is an open question around whether Microsoft is using improper means to discover a trade secret. But even that is probably a stretch, as we’ll see below. Regarding trademarks, copying search results does not raise a trademark issue beyond those raised by generating search results as a general matter.
Let’s explore the relevant intellectual property doctrines in more detail.
Trade Secrets. Google and Microsoft both rely on trade secret protection (and, to some extent, copyright) for their search algorithms. As with the formula for Coca-Cola or the manufacturing process of K-2 skis, when the value of an invention to its owner is directly proportional to its secrecy, then it makes perfect sense to choose trade secret protection over patent protection. Also, patents only provide protection for about 20 years while trade secrets are protected for as long as they remain secret.
As I have explained in a previous post, a trade secret is information that is not generally or publicly known and that its owner has taken reasonable precautions to protect from public disclosure. If someone uses improper or wrongful means to discover a trade secret, that person can be liable for trade secret misappropriation. Improper or wrongful conduct is that which falls below generally accepted standards of commercial morality and reasonable conduct. Reverse engineering is not typically considered improper in the absence of a binding non-disclosure agreement (NDA).
Given this definition, based on what we know about Microsoft’s actions in using Google’s results in its own search results, the individual search results that Microsoft has collected are likely not trade secrets. They are public; anyone can obtain them by searching Google. And Microsoft’s collection via its user base is authorized by its end-user agreements. Perhaps an enterprising lawyer could argue successfully that Microsoft’s agreements with users of IE or the Bing toolbar are improper or wrongful. After all, Microsoft could do a better job disclosing to users that it may use the user data it collects to improve the Bing search engine. But supporting a claim of misappropriation based on this seems like a stretch, especially given the public nature of what’s being “copied.”
Copyright. What about copyright protection for the search results gleaned from Google by Bing via the IE Suggested Sites feature or the Bing Toolbar? There are several impediments to granting Google copyright protection for individual or collected search results.
Search results are merely a collection of facts about other websites, generated by a machine, whose order is based on an algorithm attempting to offer users useful information. Copyright does not protect facts and requires that copyrightable subject matter be (at least minimally) creative and original, not utilitarian. Of course, there is more to it than that, but that’s a layman’s summary of the relevant threshold to copyrightability for this kind of expression. Copyright lawyers will recognize that search results are more akin to the “sweat of the brow” effort that the US Supreme Court held unprotectable in Feist. At most, Google could claim infringement of its copyright for the entirety of its search results as a compilation, or the look and feel of its results pages. But that would require that Bing (substantially) copy the results for a particular search term en masse, or the specific look and feel of the Google results pages.
Unfair Competition? Perhaps there is a viable state-law claim for unfair competition, maybe based on some sort of legally actionable deceptive practices on the part of either Bing or Google. Given the facts made public so far, however, it is not clear that either party is engaged in any behavior that would qualify under the various state laws in this area, especially once we have ruled out claims related to trade secret misappropriation and trademark infringement.
Conclusion. There do not appear to be any agreed-upon standards or industry norms for how a search engine company is to gather data to improve its search algorithm. In my opinion, Bing’s crowd-sourcing approach is not facially unethical and probably not illegal as a general matter. As suggested by Search Engine Land and others, this is most likely a series of PR maneuvers by Google and Bing that will eventually dissipate.
Bing characterizes Google’s sting operation as a “spy-novelesque stunt to generate extreme outliers in tail query ranking,” a “honeypot attack,” and even “click fraud.” To the engineers at Bing, clickstream data is fair game and Google is abusing its market power in search to stifle or reframe what is otherwise fair competition. In addition, according to Microsoft, Google has chosen the timing of its allegations of copying to deflect attention from the recent media and blogosphere stories about what many are calling a decrease in search results quality at Google.
Is it possible that all search engines, if given enough time and data points, would eventually converge on an ideal set of search results? Is that what’s happening here? Or is search inherently subjective, with search results always the product of an idiosyncratic set of decisions made by the designers of a particular search engine? If it’s the former, then Google will increasingly find other engines returning similar results. However, it does appear that search engines are (and should be) a reflection of the priorities chosen by the designers, what Sullivan calls a search engine’s “search voice.” For Google this might mean weight X on inbound links and weight Y on social graph. For Bing the priorities might be weight X on actual user behavior (clickstream) and weight Y on inbound links. Blekko might emphasize more of a curated approach (as it appears to be doing).
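For readers who want to see the “search voice” idea in concrete terms, here is a minimal sketch in Python. It is purely illustrative: the signal names, the weights, and the page scores are all invented for this example, and nothing here reflects how Google or Bing actually rank pages. It only shows that two engines looking at the same pages and the same signals can produce different orderings simply because their designers weigh those signals differently.

```python
# Hypothetical illustration only: the signal names, weights, and scores below
# are invented and do not describe any real search engine.

def rank(pages, weights):
    """Order page names by a weighted sum of their signal scores, best first."""
    def score(signals):
        # Signals with no weight in this "voice" contribute nothing.
        return sum(weights.get(name, 0.0) * value for name, value in signals.items())
    return sorted(pages, key=lambda page: score(pages[page]), reverse=True)

# Made-up signal scores (0 to 1) for three pages matching the same query.
pages = {
    "page_a": {"inbound_links": 0.9, "clickstream": 0.2, "social_graph": 0.3},
    "page_b": {"inbound_links": 0.4, "clickstream": 0.9, "social_graph": 0.2},
    "page_c": {"inbound_links": 0.5, "clickstream": 0.3, "social_graph": 0.9},
}

# Two hypothetical "search voices": same signals, different priorities.
links_first = {"inbound_links": 0.7, "social_graph": 0.3}
clicks_first = {"clickstream": 0.7, "inbound_links": 0.3}

print(rank(pages, links_first))   # ['page_a', 'page_c', 'page_b']
print(rank(pages, clicks_first))  # ['page_b', 'page_a', 'page_c']
```

With these made-up numbers, the links-heavy engine puts page_a on top while the clickstream-heavy engine puts page_b on top, even though neither engine is looking at anything the other cannot see.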
Since a diversity in “search voice” ultimately benefits consumers and gives a search engine its competitive edge, Bing should naturally be incentivized to place less emphasis on the Google clickstream. In any case, it appears that Google will have to give Bing the leeway to innovate with the clickstream and its crowd-sourced approach to search. In the end, Blekko may have the most to gain from this squabble.