Tech Report CS-01-09

Index Structures and Algorithms for Efficient Profile Matching

Nesime Tatbul

April 2001

Abstract:

In an environment like the web, where new data is continuously being generated and where users continuously need to receive new data, traditional data management techniques fall short. When users periodically poll the data sources for updates, unacceptably high server loads result. Publish/subscribe systems provide a solution to the polling problem by delivering relevant data to interested users as new data gets generated. Users declare the data they are interested in through long-term queries called profiles, and the system sends data to these users as new data gets generated.

A major problem in a publish/subscribe system is efficiently deciding which profiles match new data when there are a large number of profiles. In this research, we have investigated several index structures and profile matching algorithms to provide scalable operation of a publish/subscribe system. We have also analyzed the effect of overlap among profiles on the performance of the indices.

(complete text in pdf or gzipped postscript)