Berkeley DB Reference Guide: Query Processor

Berkeley DB Reference Guide:
Berkeley DB XML

Query Processor

Berkeley DB XML queries are expressed as XPath expressions. Berkeley DB XML query processing proceeds in four stages: the XPath Parser parses the expression, the Query Plan Generator creates an initial query plan, the Query Plan Optimizer optimizes the plan, and the Query Plan Execution Engine executes the optimized plan against a document container to produce the results.

The following sections describe each part of the XPath query execution process in detail.

XPath Parser

The XPath parser validates the query expression against the XPath grammar and transforms it into a form understood by the Query Plan Generator. The parser supports the syntax specified in the W3C XPath 1.0 Recommendation.

Query Plan Generation

The Query Plan Generator transforms the expression into a series of operations that can be performed against a container and its indices. This raw plan may consist of sequential container scans, index lookups, document filters and projections.

Query Plan Optimization

The Query Plan Optimizer transforms the raw query plan into a more efficient plan. Example optimizations include early resolution of constant expressions, short-circuiting of set operations, index lookup optimization, and identification of candidate sets that are known to equal the result set.
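
As an illustration of set operation short-circuiting, the following self-contained sketch intersects two sets of document IDs and skips evaluation of the second operand when the first is already empty. It is not Berkeley DB XML code: the DocIdSet type and the intersectShortCircuit function are hypothetical names introduced for this example, and the real optimizer may work differently.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <set>

// Hypothetical document ID set; not part of the Berkeley DB XML API.
using DocIdSet = std::set<int>;

// Intersect two operand sets. The right operand is passed as a callable so
// that its (possibly expensive) evaluation can be skipped when the left
// operand is empty and the intersection is therefore known to be empty.
DocIdSet intersectShortCircuit(const DocIdSet &left,
                               const std::function<DocIdSet()> &evalRight)
{
	if (left.empty())
		return DocIdSet();	// short-circuit: right operand never evaluated

	const DocIdSet right = evalRight();
	DocIdSet result;
	std::set_intersection(left.begin(), left.end(),
	                      right.begin(), right.end(),
	                      std::inserter(result, result.begin()));
	return result;
}
```

In this sketch, the caller wraps the second lookup in a function object, so the cost of that lookup is paid only when it can affect the result.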

Query Plan Execution

The Query Plan Execution Engine uses run-time optimizations such as Boolean expression optimization, set operation optimization, filter optimization, and cost-based operation ordering to improve the efficiency of query plan execution.

Query plan execution proceeds in three phases: candidate set creation, candidate set filtering, and result set projection.

The index lookup operations produce a candidate set of documents that may match the query expression. The candidate set is then filtered to produce the set of result documents: each candidate document is parsed into a DOM tree and evaluated against the XPath expression to determine whether it matches. When the candidate set and the result set are known to be equal, the filtering phase is skipped. For performance reasons, the filtering and projection phases are combined.
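
The first two phases can be pictured with a small self-contained model. The sketch below is an assumption-laden toy, not the Berkeley DB XML implementation: ToyDocument, ElementIndex, lookupCandidates, and filterCandidates are hypothetical names, and a substring search stands in for parsing each candidate into a DOM tree and evaluating the XPath expression against it.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for a container and its index (illustrative only).
struct ToyDocument {
	int id;
	std::string content;
};

// Maps an element name to the IDs of the documents that contain it.
using ElementIndex = std::map<std::string, std::set<int>>;

// Phase 1: candidate set creation. The index lookup returns every document
// that contains the element, so the set may include false positives.
std::set<int> lookupCandidates(const ElementIndex &index,
                               const std::string &element)
{
	ElementIndex::const_iterator it = index.find(element);
	if (it == index.end())
		return std::set<int>();
	return it->second;
}

// Phase 2: candidate set filtering. Each candidate is examined in full and
// kept only if it really matches. A substring test replaces the DOM-based
// XPath evaluation that a real engine would perform.
std::vector<int> filterCandidates(const std::vector<ToyDocument> &docs,
                                  const std::set<int> &candidates,
                                  const std::string &needle)
{
	std::vector<int> results;
	for (const ToyDocument &doc : docs) {
		if (candidates.count(doc.id) == 0)
			continue;	// not a candidate; never examined
		if (doc.content.find(needle) != std::string::npos)
			results.push_back(doc.id);
	}
	return results;
}
```

With an index entry for "title" that lists two books, the lookup yields both as candidates, and filtering on "<title>Databases</title>" keeps only the one that actually matches. Documents outside the candidate set are never inspected, which is the point of building the candidate set first.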

The following example demonstrates how to query a container and iterate through the result set:

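#include <iostream>
#include <string>
// Assumed header path and namespace for the Berkeley DB XML C++ API;
// adjust to match your installation.
#include "dbxml/DbXml.hpp"
using namespace DbXml;

void example()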
{
	// Create and open a container.
	XmlContainer container(0,"test.dbxml");
	container.open(0,DB_CREATE);

	// Insert a document into the container.
	XmlDocument document;
	std::string content("<book><title>Databases</title></book>");
	document.setContent(content);
	container.putDocument(0,document);

	// Query the container for the document.
	XmlResults results(container.queryWithXPath(0,"/book"));
	XmlValue value;
	while(!results.next(0,value))
	{
		// Convert the value to a document so that its ID can be read.
		XmlDocument result(value.asDocument(0));
		std::cout
		  << result.getID() << " = " << value.asString(0) << "\n";
	}
	container.close();
}

Query Context

All queries are executed within a context. Berkeley DB XML provides a class that encapsulates the context within which a query is performed against a container. The query context consists of a namespace mapping, variable bindings, and flags that indicate how the query result set should be determined and returned to the caller.

Namespaces

XPath query expressions can refer to namespace prefixes, but cannot define them. The Berkeley DB XML query context class provides methods that allow the application to manage namespace prefix to URI mappings. By default the prefix 'dbxml' is defined to be 'http://www.sleepycat.com/2002/dbxml'.

The following code example demonstrates the definition of a namespace within a query context:

void example()
{
	// Create and open a container.
	XmlContainer container(0,"test.dbxml");
	container.open(0,DB_CREATE);

	// Create a context, and define a namespace prefix.
	XmlQueryContext context;
	context.setNamespace("books","http://foo.bar.com/books.dtd");

	// Perform a query against the container within the context.
	container.queryWithXPath(
		0, "/*[books:title='Databases']", &context);
	container.close();
}

Variable Bindings

XPath expressions can refer to variables, but cannot define values for them. The Berkeley DB XML query context class provides methods that allow the caller to manage variable-to-value bindings.

The following code example demonstrates how to bind and reference variables within a query:

void example()
{
	// Create and open a container.
	XmlContainer container(0,"test.dbxml");
	container.open(0,DB_CREATE);

	// Create a context and define a variable.
	XmlQueryContext context;
	context.setVariableValue("title","Databases");

	// Query the container within a context referring to a variable.
	container.queryWithXPath(0,"//*[title=$title]",&context);
	container.close();
}

Result Type

The Berkeley DB XML query context class allows the application to define whether the query should return candidate documents, result documents, or result values. A candidate document is a document that may match the XPath expression, a result document is a document that does match the XPath expression, and a result value is the value produced by evaluating the XPath expression against a result document.

For some expressions it might be known that the candidate set is equivalent to the result set. For these expressions there is no need to pass the candidate documents through a filter to eliminate false positives. The query processor can detect some expressions of this nature, but not all. The client application may request that the system return candidate documents so that the application may perform its own false-positive elimination.

Evaluation Type

The client application can specify that query results be computed either eagerly or lazily. Eager query evaluation means that the results will be returned to the client application once they have been fully generated and loaded into memory. Lazy query evaluation means that results will be gradually streamed back to the client application as the client application iterates through the result set.
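
The difference between the two modes can be sketched without any Berkeley DB XML classes. In the following self-contained example, evaluateEagerly, LazyResults, and Predicate are hypothetical names introduced for illustration, and plain integers stand in for documents. Note that the toy next() returns true when it produces a value, which is this sketch's own convention and not a statement about the library's XmlResults interface.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical match test applied to each candidate ID.
using Predicate = std::function<bool(int)>;

// Eager evaluation: every candidate is tested and the complete result set
// is held in memory before the caller sees the first result.
std::vector<int> evaluateEagerly(const std::vector<int> &candidates,
                                 const Predicate &matches)
{
	std::vector<int> results;
	for (int id : candidates)
		if (matches(id))
			results.push_back(id);
	return results;
}

// Lazy evaluation: each call to next() tests only as many candidates as are
// needed to find one more result.
class LazyResults {
public:
	LazyResults(std::vector<int> candidates, Predicate matches)
		: candidates_(std::move(candidates)),
		  matches_(std::move(matches)), pos_(0) {}

	// Stores the next matching ID in 'out' and returns true, or returns
	// false once the candidates are exhausted.
	bool next(int &out)
	{
		while (pos_ < candidates_.size()) {
			const int id = candidates_[pos_++];
			if (matches_(id)) {
				out = id;
				return true;
			}
		}
		return false;
	}

private:
	std::vector<int> candidates_;
	Predicate matches_;
	std::size_t pos_;
};
```

In this model, a caller that stops after the first result pays for only the candidates examined up to that point under lazy evaluation, whereas eager evaluation tests every candidate up front in exchange for having the full result set available at once.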
