Verizon Reveals the Secrets of Yahoo Search

Three months after acquiring Yahoo, Verizon is giving away the secrets of a key Yahoo search tool. Today, Oath, the Verizon-owned company born of the merger between AOL and Yahoo, released the source code of a data-crunching tool called Vespa, which has long powered many features across the Yahoo empire.1 Now that it's open source, any company or individual can use or modify Vespa to power its own products or websites.

Open sourcing search technology might sound a little quaint, given that these days Yahoo actually uses Microsoft's Bing to power most of its web searches. But Vespa underlies searches within Yahoo, on sites like Flickr, which hosts millions of images. Yahoo also uses Vespa to power related-article recommendations and ad targeting on many Yahoo-branded sites, including Yahoo News, Yahoo Sports, Yahoo Finance, and its advertising network. Oath systems architect Jon Bratseth says Vespa processes billions of requests per day.

Vespa's history traces back to the Norwegian search engine AlltheWeb, which Yahoo acquired in 2003. After the acquisition, the AlltheWeb team started retooling its search technology into a more general-purpose tool that Yahoo developers could use internally to power different applications. The code has been almost completely rewritten since those early days.

Oath VP of engineering for big data Peter Cnudde says that by making Vespa open source, the company hopes to replicate the benefits it has reaped from supporting Hadoop, an open source software framework for managing big data. Yahoo hired Hadoop co-creator Doug Cutting in 2006, and paid other engineers to work on it as well. Eventually, Hadoop was adopted by the likes of Facebook, Twitter, eBay, and many others, whose employees added features and fixed bugs. As more people used Hadoop, it became easier for Yahoo to recruit people who were already familiar with the software. Cnudde says Oath hopes Vespa will follow the same path.

Hadoop isn't as good as Vespa for returning real-time results. And many real-time processing tools, such as Apache Storm, aren't designed to serve results to end users. So Oath uses Vespa, Hadoop, and Storm together. Until now, Vespa hasn't been available to developers outside of Oath, Yahoo, and Yahoo Japan.

"We would have loved to do it earlier," says Cnudde. "But open source doesn't come for free. You have to write the documentation, make sure it's acceptable, and be ready to manage a community."

It's unclear whether there's demand for Vespa outside of Oath. Hadoop was born open source, and came along just as companies needed it. But most large-scale internet companies have already solved the search problems that Vespa was designed to address. Plus, there are several open source search engines available, including Solr and Elasticsearch. And let's face it: the Yahoo brand has seen better days. But for new and growing companies, Vespa might just fill an important niche.

1 Correction appended 7:05 pm ET: Vespa powers search and other features of Yahoo's network of sites. An earlier version of this story incorrectly implied that Vespa previously powered Yahoo web-search features that now are handled by Bing.