The luceneIndex accessor is exported by the urn:org:ten60:netkernel:mod:lucene module.
Import this module to gain access to the accessor.
Syntax
URI: active:luceneIndex

| Argument | Rules | Description |
| --- | --- | --- |
| operand | Optional | the document to index |
| operator | Mandatory | an operator document containing an index URI, optionally a flag to reset and empty the index, and a mandatory document id to index the optional operand document under. Arbitrary fields can also be added to the index as name-value pairs. |

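// Create a sub-request addressed to the luceneIndex accessor
req = context.createSubRequest("active:luceneIndex");
// Supply the mandatory operator document
req.addArgument("operator", [resource representation, aspect, or URI] );
// Issue the request
result = context.issueSubRequest(req);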
Purpose
Lucene is a full text indexing and searching technology from Apache.
Lucene provides low-level text indexing and searching facilities. This accessor adds
a layer over Lucene to support indexing the content of XML documents while preserving
the XPath locations of that content. This approach allows content to be located down
to the element level across multiple documents.
The luceneIndex accessor supports creation and clearing of indexes.
Indexes are specified using the <index> element in the operator document.
The index location must be an ffcpl: schemed URI that points to a directory used only
by this index. The physical Lucene index will occupy this directory and create files
within it.
Before documents can be indexed the index must be created. Specifying a <reset/>
flag in the operator document will initialise a new index or empty an existing one.
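As an illustration, an operator document that creates (or empties) an index might look like the sketch below. The <index> and <reset/> elements are the ones described above; the root element name luceneIndex and the path ffcpl:/example/index/ are assumptions made for this example and should be checked against your installation.

```xml
<!-- Hypothetical operator: root element name and index path are assumed -->
<luceneIndex>
  <!-- Directory reserved for the physical Lucene index -->
  <index>ffcpl:/example/index/</index>
  <!-- Initialise a new index, or empty an existing one -->
  <reset/>
</luceneIndex>
```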
After adding to an index, it must be closed and optimised for searching. Do this by issuing a single
luceneIndex request whose operator contains just the index specification and a <close/>
tag.
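A closing operator could be sketched as follows. Only <index> and <close/> come from the description above; the root element name luceneIndex and the index path are illustrative assumptions.

```xml
<!-- Hypothetical operator: root element name and index path are assumed -->
<luceneIndex>
  <index>ffcpl:/example/index/</index>
  <!-- Close and optimise the index so that it is ready for searching -->
  <close/>
</luceneIndex>
```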
By default, the accessor indexes the content of the operand document against the XPath of
that content within the document. This is useful when indexing well-structured documents and searching
for particular fields. When the <flat/> flag is specified in the operator document,
all the text of the document is merged and indexed at the root. This results in smaller indexes and
more efficient searches over more freeform, text-based documents.
The URI of each indexed document is stored within the index, along with an independent id
field. The id is specified with the <id> element within the operator
document.
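For example, an operator for indexing an operand document under an id might resemble the sketch below. The <index>, <id>, and <flat/> elements are those named in the text; the root element name luceneIndex, the index path, the id value doc-0001, and the element order are assumptions for illustration. The operand document itself is passed separately as the operand argument of the request.

```xml
<!-- Hypothetical operator: root element name, path, and id value are assumed -->
<luceneIndex>
  <index>ffcpl:/example/index/</index>
  <!-- Independent id under which the operand document is indexed -->
  <id>doc-0001</id>
  <!-- Optional: merge all text of the operand and index it at the root -->
  <flat/>
</luceneIndex>
```

Omitting <flat/> would leave the content indexed against its XPath locations.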
Entries can be deleted from the index by specifying the <delete/>
element within the operator document and supplying an id. All entries with that id will be
deleted.
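A delete operator might be written along these lines. The <delete/> and <id> elements are taken from the description above; the root element name luceneIndex, the index path, and the id value doc-0001 are hypothetical.

```xml
<!-- Hypothetical operator: root element name, path, and id value are assumed -->
<luceneIndex>
  <index>ffcpl:/example/index/</index>
  <!-- Remove every entry stored under the id given below -->
  <delete/>
  <id>doc-0001</id>
</luceneIndex>
```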
Additional, arbitrary fields can also be stored within the index (enabling searching by a
human-readable name instead of a system id, for example) by adding a <fields>
element to the operator document. It contains the required fields as name-value pairs,
where the field name is the element name and the field value is the text of that element.
NOTE - as these field names and values are stored within the index, it is good practice to keep
them as short as possible.
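As a sketch, an operator that stores two extra fields alongside the id could look like this. The <fields> element and its name-value convention come from the text; the field names title and author, their values, the root element name luceneIndex, the index path, and the id value are invented for this example.

```xml
<!-- Hypothetical operator: root element name, path, id, and fields are assumed -->
<luceneIndex>
  <index>ffcpl:/example/index/</index>
  <id>doc-0001</id>
  <fields>
    <!-- Element name is the field name; element text is the field value -->
    <title>User Guide</title>
    <author>jsmith</author>
  </fields>
</luceneIndex>
```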