Tech Report CS-94-27

Path Caching: A Technique for Optimal External Searching

Sridhar Ramaswamy and Sairam Subramanian

May 1994

Abstract:

External 2-dimensional searching is a fundamental problem with many applications in relational, object-oriented, spatial, and temporal databases. For example, interval intersection can be reduced to 2-sided, 2-dimensional searching and indexing class hierarchies of objects to 3-sided, 2-dimensional searching. Path caching is a new technique that can be used to transform a number of time/space efficient data structures for internal 2-dimensional searching (such as segment trees, interval trees, and priority search trees) into I/O efficient external ones. Let n be the size of the database, B the page size, and t the output size of a query. Using path caching, we provide the first data structure with optimal I/O query time $O(\log_B n + t/B)$ for 2-sided, 2-dimensional searching. Furthermore, we show that path caching requires a small space overhead $O((n/B)\log_2\log_2 B)$ and is simple enough to admit dynamic updates in optimal $O(\log_B n)$ amortized time. We also extend this data structure to handle 3-sided, 2-dimensional searching with optimal I/O query-time, at the expense of slightly higher storage and update overheads.

(complete text in pdf or gzipped postscript)