A Case Study in Web Search using TREC Algorithms

This work was performed when the authors were employees of AT&T Labs. Current contact information for both authors: Google, Inc., 2400 Bayshore Pkwy., Mountain View, CA 94043, USA, {singhal, martink}@google.com

Amit Singhal
AT&T Labs -- Research
180 Park Avenue
Florham Park, NJ 07932, USA

Marcin Kaszkiel
AT&T Labs -- Research
180 Park Avenue
Florham Park, NJ 07932, USA

Abstract:

Web search engines rank potentially relevant pages/sites for a user query. Ranking documents for user queries has also been at the heart of the Text REtrieval Conference (TREC for short) under the label ad-hoc retrieval. The TREC community has developed document ranking algorithms that are known to be the best for searching the document collections used in TREC, which consist mainly of newswire text. However, the web search community has developed its own methods to rank web pages/sites, many of which use the link structure of the web and are quite different from the algorithms developed at TREC. This study evaluates the performance of a state-of-the-art keyword-based document ranking algorithm (coming out of TREC) on a popular web search task: finding the web page/site of an entity, e.g., a company, university, organization, or individual. This form of querying is quite prevalent on the web. The results from the TREC algorithms are compared to those from four commercial web search engines. Results show that for finding the web page/site of an entity, commercial web search engines are notably better than a state-of-the-art TREC algorithm. These results are in sharp contrast to results from several previous studies.

Keywords: Search engines, TREC ad-hoc, keyword-based ranking, link-based ranking