The 30th IPP Symposium

Model-Based Retrieval of Multimodal Information and Biosurveillance

Chung-Sheng Li & John R. Smith, IBM Watson

Most existing information retrieval applications are based on similarity retrieval of templates or examples, such as similarity retrieval of text and image documents. In such retrievals, the query usually consists of a set of keywords or phrases (for text), or of features extracted from an image or an image segment. Each document (text or image) in the database or digital library is usually represented as one or more vectors in a multi-dimensional feature space. Processing such a similarity query usually involves identifying, in the feature space, those vectors with the smallest Euclidean distance to the vector corresponding to the query target. This similarity retrieval paradigm, however, is not entirely suitable for many scientific and business decision support applications, which are mostly model-based.
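To make this concrete, the following is a minimal sketch of the nearest-neighbor style of similarity retrieval described above. The feature dimensionality, the random document vectors, and the function name similarity_retrieval are hypothetical placeholders for illustration, not part of any particular system.

    import numpy as np

    def similarity_retrieval(query_vec, doc_vecs, k=5):
        """Return the indices of the k documents whose feature vectors
        are closest to the query vector under Euclidean distance."""
        dists = np.linalg.norm(doc_vecs - query_vec, axis=1)
        return np.argsort(dists)[:k]

    # Hypothetical example: 1000 documents, each represented as a
    # 64-dimensional feature vector.
    doc_vecs = np.random.rand(1000, 64)
    query_vec = np.random.rand(64)
    print(similarity_retrieval(query_vec, doc_vecs, k=5))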

The main challenge in applying models to large archives is scalability. Although most applications require retrieving only a very small subset of the results, namely those that maximize or minimize the model, almost all existing methods apply the model sequentially over the entire dataset. In this talk, I describe the SPIRE project, which uses a model-based information retrieval framework to address this challenge. In particular, the focus will be on models for extracting and searching complex geological structures (reflectors, horizons, faults, river delta lobes), locating high-risk regions that might be vulnerable to Hantavirus pulmonary syndrome, extracting wetland boundaries, and the latest effort on detecting bioterrorism from nontraditional data sources.
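For context, the sketch below illustrates the exhaustive baseline that the model-based framework aims to avoid at scale: the model is evaluated over every candidate region, and only the few top-scoring regions are kept. The scoring function, region sizes, and names here are hypothetical stand-ins chosen for illustration; they are not the SPIRE algorithms themselves.

    import numpy as np

    def sequential_model_scan(regions, model, k=3):
        """Baseline: evaluate the model over every region and keep the
        top-k scores. Its cost grows linearly with the archive size."""
        scores = np.array([model(r) for r in regions])
        return np.argsort(scores)[::-1][:k]

    # Hypothetical model: score a region by its mean intensity, standing in
    # for a domain model such as a disease-risk predictor.
    model = lambda region: region.mean()

    regions = [np.random.rand(32, 32) for _ in range(10_000)]
    print(sequential_model_scan(regions, model, k=3))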