The 30th IPP Symposium

Search the Speech, Browse the Video

Alex Cozzi, IBM Almaden

This talk describes the search and browse technologies in CueVideo, a multimedia research project at IBM Almaden Research that consists of an automatic multimedia indexing system and a client-server video-retrieval system. The project's approach to multimedia retrieval is "Search the speech, browse the video". The video and audio are treated as two parallel streams of information related by a common time line. The system exploits both streams in a complementary manner, using the audio stream for search and the video stream for quick visual browsing, to provide the desired video search functionality. The video indexing automatically detects shot boundaries, generates a shots table, and extracts representative key-frames from each shot as JPEG files. Several browsable video summaries are generated for rapid browsing. The audio processing starts with speech recognition, followed by text analysis and information retrieval. Several searchable speech indexes are created, including an inverted word index, a phonetic index, and a phrase glossary index. For search, the talk focuses on word and phonetic speech indexing. For browsing, it focuses on video summaries and visualizations that assist in rapid browsing.
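
The idea of two streams joined by a common time line can be illustrated with a minimal sketch. It is not CueVideo's implementation: the transcript, the shots table, and the function names build_inverted_index, shot_for_time, and search are all hypothetical, and the phonetic and phrase glossary indexes are left out. The sketch builds an inverted word index from time-stamped recognizer output, then maps each hit to the shot whose time range contains it, which is the point where a key-frame could be shown for browsing.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical speech-recognizer output: (word, start time in seconds).
transcript = [
    ("welcome", 0.5), ("to", 0.9), ("the", 1.0), ("symposium", 1.2),
    ("video", 12.4), ("retrieval", 12.9), ("demo", 31.0), ("video", 33.5),
]

# Hypothetical shots table: sorted start time of each shot, in seconds.
shot_starts = [0.0, 10.0, 30.0]


def build_inverted_index(words):
    """Map each lower-cased word to the list of times at which it is spoken."""
    index = defaultdict(list)
    for word, t in words:
        index[word.lower()].append(t)
    return index


def shot_for_time(t, starts):
    """Return the index of the shot containing time t (shots are contiguous)."""
    return bisect_right(starts, t) - 1


def search(index, query, starts):
    """Return (time, shot index) pairs for every occurrence of the query word."""
    return [(t, shot_for_time(t, starts)) for t in index.get(query.lower(), [])]


index = build_inverted_index(transcript)
hits = search(index, "video", shot_starts)
print(hits)  # [(12.4, 1), (33.5, 2)]
```

Because both the word occurrences and the shot boundaries are expressed on the same time line, a binary search over the shot start times is enough to go from a spoken-word hit to the shot to display.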