BOOKQUE

No image available

Using Constraints to Query R*-trees

by University of Wisconsin--Madison. Computer Sciences Department , Jonathan Goldstein , Raghu Ramakrishnan , Uri Shaft , Jie-Bing Yu · 1996

ISBN: Unavailable

Category: Unavailable

Page count: 34

Abstract: "The R*-Tree index is a popular multidimensional index used in several extensible and GIS-oriented database systems. In this paper, we show that a simple refinement of the search algorithm of the R*- Tree -- which is common to all variants of the R-Tree -- offers significant speedups in most cases, with little or no worst-case performance penalty. The idea is essentially to use a conjunction of linear constraints (rather than a minimum bounding retangle) [sic] to approximate the query and to use this tighter bounding envelope to determine when the query overlaps with an R*-Tree node. This raises an important question: How can we efficiently check whether the query envelope overlaps the minimum bounding box for a tree node? Linear Programming (LP) offers one solution, but it is susceptible to numeric approximation errors. One of the contributions of this paper is a new algorithm for performing this check check that is more efficient than LP and free from numeric errors. We also present several theoretical results characterizing this algorithm. From a practical standpoint, adding the proposed constraint query refinement to existing R*-Tree implementations is straightforward. Using implementations of R*-Trees on top of the SHORE storage manager, we present experimental results (using TIGER census data for California and Wisconsin, and the Sequoia 2000 benchmark data set) that provide strong evidence in support of the proposed refinement. Our results demonstrate that the CPU overhead of our more complex overlap test, vis-a-vis the traditional minimum bounding box intersection test, is minor. On the other hand, the (CPU and I/O) gains can be considerable, especially for queries that are asymmetrically oriented with respect to the R*-Tree axes. The results are especially surprising given that we make no changes at all to the insertion and deletion algorithms. Finally, linear constraints are a very powerful tool for formulating queries, and a very important application of our results is that we can efficiently support a broad new class of such queries on multidimensional datasets drawn from a variety of domains that go well beyond GIS and spatial applications. We illustrate this, with experimental results, using queries over a five-dimensional projection of the widely used Compustat financial database (which contains over 300 dimensions!)."