Search engine optimisation using past queries

Garcia, S 2007, Search engine optimisation using past queries, Doctor of Philosophy (PhD), Computer Science and Information Technology, RMIT University.

Document type: Thesis
Collection: Theses

Attached Files
Name Description MIMEType Size
Garcia.pdf Garcia.pdf application/pdf 1.67MB
Title Search engine optimisation using past queries
Author(s) Garcia, S
Year 2007
Abstract World Wide Web search engines process millions of queries per day from users all over the world. Efficient query evaluation is achieved through the use of an inverted index, where, for each word in the collection the index maintains a list of the documents in which the word occurs. Query processing may also require access to document specific statistics, such as document length; access to word statistics, such as the number of unique documents in which a word occurs; and collection specific statistics, such as the number of documents in the collection. The index maintains individual data structures for each these sources of information, and repeatedly accesses each to process a query.

A by-product of a web search engine is a list of all queries entered into the engine: a query log. Analyses of query logs have shown repetition of query terms in the requests made to the search system. In this work we explore techniques that take advantage of the repetition of user queries to improve the accuracy or efficiency of text search. We introduce an index organisation scheme that favours those documents that are most frequently requested by users and show that, in combination with early termination heuristics, query processing time can be dramatically reduced without reducing the accuracy of the search results.

We examine the stability of such an ordering and show that an index based on as little as 100,000 training queries can support at least 20 million requests. We show the correlation between frequently accessed documents and relevance, and attempt to exploit the demonstrated relationship to improve search effectiveness. Finally, we deconstruct the search process to show that query time redundancy can be exploited at various levels of the search process. We develop a model that illustrates the improvements that can be achieved in query processing time by caching different components of a search system. This model is then validated by simulation using a document collection and query log. Results on our test data show that a well-designed cache can reduce disk activity by more than 30%, with a cache that is one tenth the size of the collection.
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Computer Science and Information Technology
Keyword(s) information retrieval
search engines
Version Filter Type
Access Statistics: 556 Abstract Views, 2010 File Downloads  -  Detailed Statistics
Created: Wed, 16 Feb 2011, 12:58:49 EST
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us