https://www.sqlite.org/rtree.html
SQLite
Small. Fast. Reliable.
Choose any three.
* Home
* Menu
* About
* Documentation
* Download
* License
* Support
* Purchase
* Search
* About
* Documentation
* Download
* Support
* Purchase
[Search Documentation] [ ] [Go]
The SQLite R*Tree Module
> Table Of Contents
1. Overview
2. Compiling The R*Tree Module
3. Using the R*Tree Module
3.1. Creating An R*Tree Index
3.1.1. Column naming details
3.2. Populating An R*Tree Index
3.3. Querying An R*Tree Index
3.4. Roundoff Error
3.5. Reading And Writing At The Same Time
4. Using R*Trees Effectively
4.1. Auxiliary Columns
4.1.1. Limitations
5. Integer-Valued R-Trees
6. Custom R-Tree Queries
6.1. The Legacy xGeom Callback
6.2. The New xQueryFunc Callback
6.3. Additional Considerations for Custom Queries
7. Implementation Details
7.1. Shadow Tables
7.2. Integrity Check using the rtreecheck() SQL function
1. Overview
An R-Tree is a special index that is designed for doing range
queries. R-Trees are most commonly used in geospatial systems where
each entry is a rectangle with minimum and maximum X and Y
coordinates. Given a query rectangle, an R-Tree is able to quickly
find all entries that are contained within the query rectangle or
which overlap the query rectangle. This idea is easily extended to
three dimensions for use in CAD systems. R-Trees also find use in
time-domain range look-ups. For example, suppose a database records
the starting and ending times for a large number of events. A R-Tree
is able to quickly find all events that were active at any time
during a given time interval, or all events that started during a
particular time interval, or all events that both started and ended
within a given time interval. And so forth.
The R-Tree concept originated with Toni Guttman: R-Trees: A Dynamic
Index Structure for Spatial Searching, Proc. 1984 ACM SIGMOD
International Conference on Management of Data, pp. 47-57. The
implementation found in SQLite is a refinement of Guttman's original
idea, commonly called "R*Trees", that was described by Norbert
Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The
R*-Tree: An Efficient and Robust Access Method for Points and
Rectangles. SIGMOD Conference 1990: 322-331.
2. Compiling The R*Tree Module
The source code to the SQLite R*Tree module is included as part of
the amalgamation but is disabled by default. To enable the R*Tree
module, simply compile with the SQLITE_ENABLE_RTREE C-preprocessor
macro defined. With many compilers, this is accomplished by adding
the option "-DSQLITE_ENABLE_RTREE=1" to the compiler command-line.
3. Using the R*Tree Module
The SQLite R*Tree module is implemented as a virtual table. Each
R*Tree index is a virtual table with an odd number of columns between
3 and 11. The first column is always a 64-bit signed integer primary
key. The other columns are pairs, one pair per dimension, containing
the minimum and maximum values for that dimension, respectively. A
1-dimensional R*Tree thus has 3 columns. A 2-dimensional R*Tree has 5
columns. A 3-dimensional R*Tree has 7 columns. A 4-dimensional R*Tree
has 9 columns. And a 5-dimensional R*Tree has 11 columns. The SQLite
R*Tree implementation does not support R*Trees wider than 5
dimensions.
The first column of an SQLite R*Tree is similar to an integer primary
key column of a normal SQLite table. It may only store a 64-bit
signed integer value. Inserting a NULL value into this column causes
SQLite to automatically generate a new unique primary key value. If
an attempt is made to insert any other non-integer value into this
column, the r-tree module silently converts it to an integer before
writing it into the database.
The min/max-value pair columns are stored as 32-bit floating point
values for "rtree" virtual tables or as 32-bit signed integers in
"rtree_i32" virtual tables. Unlike regular SQLite tables which can
store data in a variety of datatypes and formats, the R*Tree rigidly
enforce these storage types. If any other type of value is inserted
into such a column, the r-tree module silently converts it to the
required type before writing the new record to the database.
3.1. Creating An R*Tree Index
A new R*Tree index is created as follows:
CREATE VIRTUAL TABLE USING rtree();
The is the name your application chooses for the R*Tree index
and is a comma separated list of between 3 and 11
columns. The virtual table creates three shadow tables to
actually store its content. The names of these shadow tables are:
_node
_rowid
_parent
The shadow tables are ordinary SQLite data tables. You can query them
directly if you like, though this unlikely to reveal anything
particularly useful. And you can UPDATE, DELETE, INSERT or even DROP
the shadow tables, though doing so will corrupt your R*Tree index. So
it is best to simply ignore the shadow tables. Recognize that they
hold your R*Tree index information and let it go as that.
As an example, consider creating a two-dimensional R*Tree index for
use in spatial queries:
CREATE VIRTUAL TABLE demo_index USING rtree(
id, -- Integer primary key
minX, maxX, -- Minimum and maximum X coordinate
minY, maxY -- Minimum and maximum Y coordinate
);
3.1.1. Column naming details
In the argments to "rtree" in the CREATE VIRTUAL TABLE statement, the
names of the columns are taken from the first token of each argument.
All subsequent tokens within each argument are silently ignored. This
means, for example, that if you try to give a column a type affinity
or add a constraint such as UNIQUE or NOT NULL or DEFAULT to a
column, those extra tokens are accepted as valid, but they do not
change the behavior of the rtree. In an RTREE virtual table, the
first column always has a type affinity of INTEGER and all other data
columns have a type affinity of NUMERIC.
Recommended practice is to omit any extra tokens in the rtree
specification. Let each argument to "rtree" be a single ordinary
label that is the name of the corresponding column, and omit all
other tokens from the argument list.
3.2. Populating An R*Tree Index
The usual INSERT, UPDATE, and DELETE commands work on an R*Tree index
just like on regular tables. So to insert some data into our sample
R*Tree index, we can do something like this:
INSERT INTO demo_index VALUES(
1, -- Primary key -- SQLite.org headquarters
-80.7749, -80.7747, -- Longitude range
35.3776, 35.3778 -- Latitude range
);
INSERT INTO demo_index VALUES(
2, -- NC 12th Congressional District in 2010
-81.0, -79.6,
35.0, 36.2
);
The entries above might represent (for example) a bounding box around
the main office for SQLite.org and bounding box around the 12th
Congressional District of North Carolina (prior to the 2011
redistricting) in which SQLite.org was located.
3.3. Querying An R*Tree Index
Any valid query will work against an R*Tree index. But the R*Tree
implementation is designed to make two kinds of queries especially
efficient. First, queries against the primary key are efficient:
SELECT * FROM demo_index WHERE id=1;
Of course, an ordinary SQLite table will also do a query against its
integer primary key efficiently, so the previous is no big deal. The
real reason for using an R*Tree is so that you can efficiently do
inequality queries against the coordinate ranges. To find all
elements of the index that are contained within the vicinity of
Charlotte, North Carolina, one might do:
SELECT id FROM demo_index
WHERE minX>=-81.08 AND maxX<=-80.58
AND minY>=35.00 AND maxY<=35.44;
The query above would very quickly locate the id of 1 even if the
R*Tree contained millions of entries. The previous is an example of a
"contained-within" query. The R*Tree also supports "overlapping"
queries. For example, to find all bounding boxes that overlap the
Charlotte area:
SELECT id FROM demo_index
WHERE maxX>=-81.08 AND minX<=-80.58
AND maxY>=35.00 AND minY<=35.44;
This second query would find both entry 1 (the SQLite.org office)
which is entirely contained within the query box and also the 12th
Congressional District which extends well outside the query box but
still overlaps the query box.
Note that it is not necessary for all coordinates in an R*Tree index
to be constrained in order for the index search to be efficient. One
might, for example, want to query all objects that overlap with the
35th parallel:
SELECT id FROM demo_index
WHERE maxY>=35.0 AND minY<=35.0;
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
3.4. Roundoff Error
By default, coordinates are stored in an R*Tree using 32-bit floating
point values. When a coordinate cannot be exactly represented by a
32-bit floating point number, the lower-bound coordinates are rounded
down and the upper-bound coordinates are rounded up. Thus, bounding
boxes might be slightly larger than specified, but will never be any
smaller. This is exactly what is desired for doing the more common
"overlapping" queries where the application wants to find every entry
in the R*Tree that overlaps a query bounding box. Rounding the entry
bounding boxes outward might cause a few extra entries to appears in
an overlapping query if the edge of the entry bounding box
corresponds to an edge of the query bounding box. But the overlapping
query will never miss a valid table entry.
However, for a "contained-within" style query, rounding the bounding
boxes outward might cause some entries to be excluded from the result
set if the edge of the entry bounding box corresponds to the edge of
the query bounding box. To guard against this, applications should
expand their contained-within query boxes slightly (by 0.000012%) by
rounding down the lower coordinates and rounding up the top
coordinates, in each dimension.
3.5. Reading And Writing At The Same Time
It is the nature of the Guttman R-Tree algorithm that any write might
radically restructure the tree, and in the process change the scan
order of the nodes. For this reason, it is not generally possible to
modify the R-Tree in the middle of a query of the R-Tree. Attempts to
do so will fail with a SQLITE_LOCKED "database table is locked"
error.
So, for example, suppose an application runs one query against an
R-Tree like this:
SELECT id FROM demo_index
WHERE maxY>=35.0 AND minY<=35.0;
Then for each "id" value returned, suppose the application creates an
UPDATE statement like the following and binds the "id" value returned
against the "?1" parameter:
UPDATE demo_index SET maxY=maxY+0.5 WHERE id=?1;
Then the UPDATE might fail with an SQLITE_LOCKED error. The reason is
that the initial query has not run to completion. It is remembering
its place in the middle of a scan of the R-Tree. So an update to the
R-Tree cannot be tolerated as this would disrupt the scan.
It is also possible to express this kind of simultaneous read and
write on an R-Tree within a single query, for example if an UPDATE
statement tries to change the value of one row of the R-Tree based on
a complicated query from another row of the same R-Tree, perhaps
something like this:
UPDATE demo_index
SET maxY = (SELECT max(maxX) FROM demo_index AS x2
WHERE x2.maxY>demo_index.x2)
WHERE maxY>=35.0 AND minY<=35.0;
This is a limitation of the R-Tree extension only. Ordinary tables in
SQLite are able to read and write at the same time. Other virtual
tables might (or might not) also that capability. And R-Tree can
appear to read and write at the same time in some circumstances, if
it can figure out how to reliably run the query to completion before
starting the update. But you shouldn't count on that for every query.
Generally speaking, it is best to avoid running queries and updates
to the same R-Tree at the same time.
If you really need to update an R-Tree based on complex queries
against the same R-Tree, it is best to run the complex queries first
and store the results in a temporary table, then update the R-Tree
based on the values stored in the temporary table.
4. Using R*Trees Effectively
For SQLite versions prior to 3.24.0 (2018-06-04), the only
information that an R*Tree index stores about an object is its
integer ID and its bounding box. Additional information needs to be
stored in separate tables and related to the R*Tree index using the
primary key. For the example above, one might create an auxiliary
table as follows:
CREATE TABLE demo_data(
id INTEGER PRIMARY KEY, -- primary key
objname TEXT, -- name of the object
objtype TEXT, -- object type
boundary BLOB -- detailed boundary of object
);
In this example, the demo_data.boundary field is intended to hold
some kind of binary representation of the precise boundaries of the
object. The R*Tree index only holds an axis-aligned rectangular
boundary for the object. The R*Tree boundary is just an approximation
of the true object boundary. So what typically happens is that the
R*Tree index is used to narrow a search down to a list of candidate
objects and then more detailed and expensive computations are done on
each candidate to find if the candidate truly meets the search
criteria.
Key Point: An R*Tree index does not normally provide the exact
answer but merely reduces the set of potential answers from
millions to dozens.
Suppose the demo_data.boundary field holds some proprietary data
description of a complex two-dimensional boundary for an object and
suppose that the application has used the sqlite3_create_function()
interface to created application-defined functions "contained_in" and
"overlaps" accepting two demo_data.boundary objects and return true
or false. One may assume that "contained_in" and "overlaps" are
relatively slow functions that we do not want to invoke too
frequently. Then an efficient way to find the name of all objects
located within the North Carolina 12th District, one may be to run a
query like this:
SELECT objname FROM demo_data, demo_index
WHERE demo_data.id=demo_index.id
AND contained_in(demo_data.boundary, :boundary)
AND minX>=-81.0 AND maxX<=-79.6
AND minY>=35.0 AND maxY<=36.2;
In the query above, one would presumably bind the binary BLOB
description of the precise boundary of the 12th district to the
":boundary" parameter.
Notice how the query above works: The R*Tree index runs in the outer
loop to find entries that are contained within the bounding box of
longitude -81..-79.6 and latitude 35.0..36.2. For each object
identifier found, SQLite looks up the corresponding entry in the
demo_data table. It then uses the boundary field from the demo_data
table as a parameter to the contained_in() function and if that
function returns true, the objname field from the demo_data table is
returned as the next row of query result.
One would get the same answer without the use of the R*Tree index
using the following simpler query:
SELECT objname FROM demo_data
WHERE contained_in(demo_data.boundary, :boundary);
The problem with this latter query is that it must apply the
contained_in() function to millions of entries in the demo_data
table. The use of the R*Tree in the penultimate query reduces the
number of calls to contained_in() function to a small subset of the
entire table. The R*Tree index did not find the exact answer itself,
it merely limited the search space.
4.1. Auxiliary Columns
Beginning with SQLite version 3.24.0 (2018-06-04), r-tree tables can
have auxiliary columns that store arbitrary data. Auxiliary columns
can be used in place of secondary tables such as "demo_data".
Auxiliary columns are marked with a "+" symbol before the column
name. Auxiliary columns must come after all of the coordinate
boundary columns. There is a limit of no more than 100 auxiliary
columns. The following example shows an r-tree table with auxiliary
columns that is equivalent to the two tables "demo_index" and
"demo_data" above:
CREATE VIRTUAL TABLE demo_index2 USING rtree(
id, -- Integer primary key
minX, maxX, -- Minimum and maximum X coordinate
minY, maxY, -- Minimum and maximum Y coordinate
+objname TEXT, -- name of the object
+objtype TEXT, -- object type
+boundary BLOB -- detailed boundary of object
);
By combining location data and related information into the same
table, auxiliary columns can provide a cleaner model and reduce the
need to joins. For example, the earlier join between demo_index and
demo_data can now be written as a simple query, like this:
SELECT objname FROM demo_index2
WHERE contained_in(boundary, :boundary)
AND minX>=-81.0 AND maxX<=-79.6
AND minY>=35.0 AND maxY>=36.2;
4.1.1. Limitations
For auxiliary columns, only the name of the column matters. The type
affinity is ignored. Constraints such as NOT NULL, UNIQUE,
REFERENCES, or CHECK are also ignored. However, future versions of
SQLite might start paying attention to the type affinity and
constraints, so users of auxiliary columns are advised to leave both
blank, to avoid future compatibility problems.
5. Integer-Valued R-Trees
The default virtual table ("rtree") normally stores coordinates as
single-precision (4-byte) floating point numbers. If integer
coordinates are desired, declare the table using "rtree_i32" instead:
CREATE VIRTUAL TABLE intrtree USING rtree_i32(id,x0,x1,y0,y1,z0,z1);
An rtree_i32 stores coordinates as 32-bit signed integers. But it
still using floating point computations internally as part of the
r-tree algorithm.
6. Custom R-Tree Queries
By using standard SQL expressions in the WHERE clause of a SELECT
query, a programmer can query for all R*Tree entries that intersect
with or are contained within a particular bounding-box. Custom R*Tree
queries, using the MATCH operator in the WHERE clause of a SELECT,
allow the programmer to query for the set of R*Tree entries that
intersect any arbitrary region or shape, not just a box. This
capability is useful, for example, in computing the subset of objects
in the R*Tree that are visible from a camera positioned in 3-D space.
Regions for custom R*Tree queries are defined by R*Tree geometry
callbacks implemented by the application and registered with SQLite
via a call to one of the following two APIs:
int sqlite3_rtree_query_callback(
sqlite3 *db,
const char *zQueryFunc,
int (*xQueryFunc)(sqlite3_rtree_query_info*),
void *pContext,
void (*xDestructor)(void*)
);
int sqlite3_rtree_geometry_callback(
sqlite3 *db,
const char *zGeom,
int (*xGeom)(sqlite3_rtree_geometry *, int nCoord, double *aCoord, int *pRes),
void *pContext
);
The sqlite3_rtree_query_callback() became available with SQLite
version 3.8.5 (2014-06-04) and is the preferred interface. The
sqlite3_rtree_geometry_callback() is an older and less flexible
interface that is supported for backwards compatibility.
A call to one of the above APIs creates a new SQL function named by
the second parameter (zQueryFunc or zGeom). When that SQL function
appears on the right-hand side of the MATCH operator and the
left-hand side of the MATCH operator is any column in the R*Tree
virtual table, then the callback defined by the third argument
(xQueryFunc or xGeom) is invoked to determine if a particular object
or subtree overlaps the desired region.
For example, a query like the following might be used to find all
R*Tree entries that overlap with a circle centered a 45.3,22.9 with a
radius of 5.0:
SELECT id FROM demo_index WHERE id MATCH circle(45.3, 22.9, 5.0)
The SQL syntax for custom queries is the same regardless of which
interface, sqlite3_rtree_geometry_callback() or
sqlite3_rtree_query_callback(), is used to register the SQL function.
However, the newer query-style callbacks give the application greater
control over how the query proceeds.
6.1. The Legacy xGeom Callback
The legacy xGeom callback is invoked with four arguments. The first
argument is a pointer to an sqlite3_rtree_geometry structure which
provides information about how the SQL function was invoked. The
second argument is the number of coordinates in each r-tree entry,
and is always the same for any given R*Tree. The number of
coordinates is 2 for a 1-dimensional R*Tree, 4 for a 2-dimensional
R*Tree, 6 for a 3-dimensional R*Tree, and so forth. The third
argument, aCoord[], is an array of nCoord coordinates that defines a
bounding box to be tested. The last argument is a pointer into which
the callback result should be written. The result is zero if the
bounding-box defined by aCoord[] is completely outside the region
defined by the xGeom callback and the result is non-zero if the
bounding-box is inside or overlaps with the xGeom region. The xGeom
callback should normally return SQLITE_OK. If xGeom returns anything
other than SQLITE_OK, then the r-tree query will abort with an error.
The sqlite3_rtree_geometry structure that the first argument to the
xGeom callback points to has a structure shown below. The exact same
sqlite3_rtree_geometry structure is used for every callback for same
MATCH operator in the same query. The contents of the
sqlite3_rtree_geometry structure are initialized by SQLite but are
not subsequently modified. The callback is free to make changes to
the pUser and xDelUser elements of the structure if desired.
typedef struct sqlite3_rtree_geometry sqlite3_rtree_geometry;
struct sqlite3_rtree_geometry {
void *pContext; /* Copy of pContext passed to s_r_g_c() */
int nParam; /* Size of array aParam */
double *aParam; /* Parameters passed to SQL geom function */
void *pUser; /* Callback implementation user data */
void (*xDelUser)(void *); /* Called by SQLite to clean up pUser */
};
The pContext member of the sqlite3_rtree_geometry structure is always
set to a copy of the pContext argument passed to
sqlite3_rtree_geometry_callback() when the callback is registered.
The aParam[] array (size nParam) contains the parameter values passed
to the SQL function on the right-hand side of the MATCH operator. In
the example "circle" query above, nParam would be set to 3 and the
aParam[] array would contain the three values 45.3, 22.9 and 5.0.
The pUser and xDelUser members of the sqlite3_rtree_geometry
structure are initially set to NULL. The pUser variable may be set by
the callback implementation to any arbitrary value that may be useful
to subsequent invocations of the callback within the same query (for
example, a pointer to a complicated data structure used to test for
region intersection). If the xDelUser variable is set to a non-NULL
value, then after the query has finished running SQLite automatically
invokes it with the value of the pUser variable as the only argument.
In other words, xDelUser may be set to a destructor function for the
pUser value.
The xGeom callback always does a depth-first search of the r-tree.
6.2. The New xQueryFunc Callback
The newer xQueryFunc callback receives more information from the
r-tree query engine on each call, and it sends more information back
to the query engine before it returns. To help keep the interface
manageable, the xQueryFunc callback sends and receives information
from the query engine as fields in the sqlite3_rtree_query_info
structure:
struct sqlite3_rtree_query_info {
void *pContext; /* pContext from when function registered */
int nParam; /* Number of function parameters */
sqlite3_rtree_dbl *aParam; /* value of function parameters */
void *pUser; /* callback can use this, if desired */
void (*xDelUser)(void*); /* function to free pUser */
sqlite3_rtree_dbl *aCoord; /* Coordinates of node or entry to check */
unsigned int *anQueue; /* Number of pending entries in the queue */
int nCoord; /* Number of coordinates */
int iLevel; /* Level of current node or entry */
int mxLevel; /* The largest iLevel value in the tree */
sqlite3_int64 iRowid; /* Rowid for current entry */
sqlite3_rtree_dbl rParentScore; /* Score of parent node */
int eParentWithin; /* Visibility of parent node */
int eWithin; /* OUT: Visiblity */
sqlite3_rtree_dbl rScore; /* OUT: Write the score here */
/* The following fields are only available in 3.8.11 and later */
sqlite3_value **apSqlParam; /* Original SQL values of parameters */
};
The first five fields of the sqlite3_rtree_query_info structure are
identical to the sqlite3_rtree_geometry structure, and have exactly
the same meaning. The sqlite3_rtree_query_info structure also
contains nCoord and aCoord fields which have the same meaning as the
parameter of the same name in the xGeom callback.
The xQueryFunc must set the eWithin field of sqlite3_rtree_query_info
to one of the values NOT_WITHIN, PARTLY_WITHIN, or FULLY_WITHIN
depending on whether or not the bounding box defined by aCoord[] is
completely outside the region, overlaps the region, or is completely
inside the region, respectively. In addition, the xQueryFunc must set
the rScore field to a non-negative value that indicates the order in
which subtrees and entries of the query should be analyzed and
returned. Smaller scores are processed first.
As its name implies, an R*Tree is organized as a tree. Each node of
the tree is a bounding box. The root of the tree is a bounding box
that encapsulates all elements of the tree. Beneath the root are a
number of subtrees (typically 20 or more) each with their own smaller
bounding boxes and each containing some subset of the R*Tree entries.
The subtrees may have sub-subtrees, and so forth until finally one
reaches the leaves of the tree which are the actual R*Tree entries.
An R*Tree query is initialized by making the root node the only entry
in a priority queue sorted by rScore. The query proceeds by
extracting the entry from the priority queue that has the lowest
score. If that entry is a leaf (meaning that it is an actual R*Tree
entry and not a subtree) then that entry is returned as one row of
the query result. If the extracted priority queue entry is a node (a
subtree), then sub-subtrees or leaves contained within that entry are
passed to the xQueryFunc callback, one by one. Those subelements for
which the xQueryFunc callback sets eWithin to PARTLY_WITHIN or
FULLY_WITHIN are added to the priority queue using the score supplied
by the callback. Subelements that return NOT_WITHIN are discarded.
The query runs until the priority queue is empty.
Every leaf entry and node (subtree) within the R*Tree has an integer
"level". The leaves have a level of 0. The first containing subtree
of the leaves has a level of 1. The root of the R*Tree has the
largest level value. The mxLevel entry in the
sqlite3_rtree_query_info structure is the level value for the root of
the R*Tree. The iLevel entry in sqlite3_rtree_query_info gives the
level for the object being interrogated.
Most R*Tree queries use a depth-first search. This is accomplished by
setting the rScore equal to iLevel. A depth-first search is usually
preferred since it minimizes the number of elements in the priority
queue, which reduces memory requirements and speeds processing.
However, some application may prefer a breadth-first search, which
can be accomplished by setting rScore to mxLevel-iLevel. By creating
more complex formulas for rScore, applications can exercise detailed
control over the order in which subtree are searched and leaf R*Tree
entries are returned. For example, in an application with many
millions of R*Tree entries, the rScore might be arranged so that the
largest or most significant entries are returned first, allowing the
application to display the most important information quickly, and
filling in smaller and less important details as they become
available.
Other information fields of the sqlite3_rtree_query_info structure
are available for use by the xQueryFunc callback, if desired. The
iRowid field is the rowid (the first of the 3 to 11 columns in the
R*Tree) for the element being considered. iRowid is only valid for
leaves. The eParentWithin and rParentScore values are copies of the
eWithin and rScore values from the containing subtree of the current
row. The anQueue field is an array of mxLevel+1 unsigned integers
that tell the current number of elements in the priority queue at
each level.
6.3. Additional Considerations for Custom Queries
The MATCH operator of a custom R*Tree query function must be a
top-level AND-connected term of the WHERE clause, or else it will not
be usable by the R*Tree query optimizer and the query will not be
runnable. If the MATCH operator is connected to other terms of the
WHERE clause via an OR operator, for example, the query will fail
with an error.
Two or more MATCH operators are allowed in the same WHERE clause, as
long as they are connected by AND operators. However, the R*Tree
query engine only contains a single priority queue. The priority
assigned to each node in the search is the lowest priority returned
by any of the MATCH operators.
7. Implementation Details
The following sections describe some low-level details of the R*Tree
implementation, that might be useful for trouble-shooting or
performance analysis.
7.1. Shadow Tables
The content of an R*Tree index is actually stored in three ordinary
SQLite tables with names derived from the name of the R*Tree. These
three tables are called "shadow tables". This is their schema:
CREATE TABLE %_node(nodeno INTEGER PRIMARY KEY, data BLOB)
CREATE TABLE %_parent(nodeno INTEGER PRIMARY KEY, parentnode INTEGER)
CREATE TABLE %_rowid(rowid INTEGER PRIMARY KEY, nodeno INTEGER)
The "%" in the name of each shadow table is replaced by the name of
the R*Tree virtual table. So, if the name of the R*Tree table is
"xyz" then the three shadow tables would be "xyz_node", "xyz_parent",
and "xyz_rowid".
There is one entry in the %_node table for each R*Tree node. An
R*Tree node consists of one or more entries that are proximate to one
another. The nodes of an R*Tree for a tree. All nodes other than the
root have an entry in the %_parent shadow table that identifies the
parent node. Each entry in an R*Tree has a rowid. The %_rowid shadow
table maps entry rowids to the node that contains that entry.
7.2. Integrity Check using the rtreecheck() SQL function
The scalar SQL function rtreecheck(R) or rtreecheck(S,R) runs an
integrity check on the rtree table named R contained within database
S. The function returns a human-language description of any problems
found, or the string 'ok' if everything is ok. Running rtreecheck()
on an R*Tree virtual table is similar to running PRAGMA
integrity_check on a database.
Example: To verify that an R*Tree named "demo_index" is well-formed
and internally consistent, run:
SELECT rtreecheck('demo_index');
The rtreecheck() function performs the following checks:
1. For each cell in the r-tree structure (%_node table), that:
a. for each dimension, (coord1 <= coord2).
b. unless the cell is on the root node, that the cell is bounded
by the parent cell on the parent node.
c. for leaf nodes, that there is an entry in the %_rowid table
corresponding to the cell's rowid value that points to the
correct node.
d. for cells on non-leaf nodes, that there is an entry in the
%_parent table mapping from the cell's child node to the node
that it resides on.
2. That there are the same number of entries in the %_rowid table as
there are leaf cells in the r-tree structure, and that there is a
leaf cell that corresponds to each entry in the %_rowid table.
3. That there are the same number of entries in the %_parent table
as there are non-leaf cells in the r-tree structure, and that
there is a non-leaf cell that corresponds to each entry in the
%_parent table.