The 30th IPP Symposium

The MERL SpokenQuery System

Bhiksha Raj, Peter Wolf, MERL

In information retrieval systems, one usually types keyterms into a query engine, such as Google, which then returns the documents pertinent to the query. There are, however, several situations where typing a query is inconvenient or impossible: the device used for retrieval may be too small for a keyboard, as with PDAs and cellphones, or hands-free operation may be required, as when driving a car. In these cases it is much more convenient for the user to speak the query rather than type it. The SpokenQuery system is an enabling technology for such a spoken interface to information retrieval systems or, more generally, to database access.

The conventional approach to this task would be to use a speech recognizer to convert the spoken utterance to a text transcription, which would then be passed to an information retrieval (IR) engine. The IR engine would be unaware that the query was spoken rather than typed. There are several problems with this approach: 1) speech recognizers make mistakes, which degrade retrieval performance; 2) the vocabularies of speech recognizers may not include the specialized terms that characterize many documents; 3) text-based IR systems have no indexing mechanism that can cope with errors in the query. A short sketch of this cascade, and of how it fails, follows.
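To make the brittleness concrete, here is a minimal, purely illustrative Python sketch of the conventional cascade. The functions recognize() and search(), the toy index, and the misrecognition itself are our own hypothetical stand-ins, not the APIs or behavior of any real engine:

    def recognize(audio):
        # Hypothetical one-best recognizer: it commits to a single
        # transcription, so any errors are frozen into the query.
        return "wreck a nice beach"  # contrived misrecognition of "recognize speech"

    def search(index, query_text):
        # Hypothetical text IR engine: plain term matching, with no
        # notion that the query might contain recognition errors.
        terms = set(query_text.split())
        return [doc for doc, text in index.items() if terms & set(text.split())]

    index = {"asr-paper": "recognize speech from lattices",
             "beach-guide": "a nice beach vacation"}
    print(search(index, recognize(None)))  # ['beach-guide'], not the intended document

Because the recognizer emits only its single best hypothesis, the downstream IR engine has no way to recover the intended query once that hypothesis is wrong.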

The MERL SpokenQuery system architecture tackles all three problems. First, it performs retrieval using the recognizer's entire search space, as represented by the recognition lattice, rather than the single best recognition output. Second, it passes information from the document index back to the recognizer, enabling the recognizer to better identify important keyterms in the spoken query. Finally, the document index uses an SVD-based representation that permits vector representations of the recognizer's search space to be compared directly against the index. In this talk we will describe all of these components and discuss their merits and limitations.
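As a rough illustration of the third component, the sketch below builds a small SVD-based index with NumPy and scores a query vector of expected term counts, of the kind one might derive from lattice word posteriors, against the projected documents. The function names, the toy counts, and the use of cosine similarity are assumptions made here for illustration; the abstract does not specify these details:

    import numpy as np

    def build_index(term_doc_counts, k):
        # term_doc_counts: (n_terms, n_docs) matrix of raw term counts.
        U, s, Vt = np.linalg.svd(term_doc_counts, full_matrices=False)
        U_k = U[:, :k]                                 # term projection basis
        doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T   # one k-dim vector per document
        return U_k, doc_vectors

    def lattice_query_vector(expected_counts, U_k):
        # expected_counts: per-term expected counts, e.g. summed word
        # posterior probabilities over the lattice (hypothetical values here).
        return expected_counts @ U_k       # project the query into the latent space

    def retrieve(q_vec, doc_vectors, top_n=3):
        # Rank documents by cosine similarity in the latent space.
        sims = doc_vectors @ q_vec
        sims /= (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec) + 1e-12)
        return np.argsort(-sims)[:top_n]

    # Toy example: 5 terms x 4 documents, k = 2 latent dimensions.
    counts = np.array([[3., 0., 1., 0.],
                       [2., 0., 0., 1.],
                       [0., 4., 0., 0.],
                       [0., 1., 2., 0.],
                       [1., 0., 0., 3.]])
    U_k, doc_vecs = build_index(counts, k=2)
    q = np.array([0.9, 0.7, 0.0, 0.1, 0.0])   # soft counts from lattice posteriors
    print(retrieve(lattice_query_vector(q, U_k), doc_vecs))

Because both the documents and the lattice-derived query live in the same reduced space, the query need not match document terms exactly for retrieval to succeed, which is what allows such an index to tolerate uncertainty and errors in the recognized query.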