luceneIndex
Lucene Full Text Indexing


Module

urn:org:ten60:netkernel:mod:lucene

The luceneIndex accessor is exported by the urn:org:ten60:netkernel:mod:lucene module. Import this module to gain access to the accessor.

Syntax

URI
active:luceneIndex

Argument | Rules | Description
operand | Optional | The document to index.
operator | Mandatory | An operator document containing an index URI, optionally a flag to reset and empty the index, and a mandatory document id to index the optional operand document under. Arbitrary fields can also be added to the index as name-value pairs.

Example Usage

DPML

<instr>
  <type>luceneIndex</type>
  <operand>file:/addressbook.xml</operand>
  <operator>
    <luceneIndex>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <reset />
      <close />
      <delete />
      <flat />
      <id>My Addressbook</id>
      <fields>
        <name1>value1</name1>
        <name2>value2</name2>
        <name3>value3</name3>
      </fields>
    </luceneIndex>
  </operator>
</instr>

NetKernel Foundation API

req=context.createSubRequest("active:luceneIndex");
req.addArgument("operator", [resource representation, aspect, or URI] );
result=context.issueSubRequest(req);

Purpose

Lucene is a full text indexing and searching technology from Apache. Lucene provides low-level text indexing and searching facilities. This accessor adds a layer over Lucene to support indexing the content of XML documents while preserving the XPath locations of that content. This approach allows content to be located down to the element level across multiple documents.

The luceneIndex accessor supports creation and clearing of indexes. Indexes are specified using the <index> element in the operator document. The index location must be an ffcpl: schemed URI that points to a unique directory space. The physical Lucene index will occupy this directory and create files within it.

Before documents can be indexed the index must be created. Specifying a <reset/> flag in the operator document will initialise a new index or empty an existing one.
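
As a minimal sketch, an instruction that creates or empties an index might look like the following. It reuses the index URI from the Example Usage section and assumes that the elements not needed for a reset (operand, id, fields) can simply be left out.

```xml
<instr>
  <type>luceneIndex</type>
  <operator>
    <luceneIndex>
      <!-- Index location reused from the usage example; must be an ffcpl: URI -->
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- Initialise a new index or empty an existing one -->
      <reset />
    </luceneIndex>
  </operator>
</instr>
```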

After documents have been added to an index, it must be closed and optimised for searching. To do this, issue a single luceneIndex instruction whose operator contains only the index and a <close/> tag.
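
A close instruction of this kind could be sketched as follows. The index URI is the one used in the Example Usage section and stands in for whatever index was being written to.

```xml
<instr>
  <type>luceneIndex</type>
  <operator>
    <luceneIndex>
      <!-- The index that documents were previously added to -->
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- Close and optimise the index so it is ready for searching -->
      <close />
    </luceneIndex>
  </operator>
</instr>
```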

Normally, the accessor indexes the content of the operand document against the XPath of that content within the document. This is useful when indexing well-structured documents and searching for particular fields. When the <flat/> flag is specified in the operator document, all the text of the document is merged and indexed at the root. This results in smaller indexes and more efficient searches over more free-form, text-based documents.
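
For illustration, a flat indexing request might be written as below. The operand, index URI and id are taken from the Example Usage section. Treating the absence of <flat/> as the switch back to XPath-based indexing is an assumption drawn from the description above.

```xml
<instr>
  <type>luceneIndex</type>
  <!-- Document whose text will be indexed -->
  <operand>file:/addressbook.xml</operand>
  <operator>
    <luceneIndex>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- Merge all text and index it at the root rather than per XPath -->
      <flat />
      <!-- Id under which the operand document is indexed -->
      <id>My Addressbook</id>
    </luceneIndex>
  </operator>
</instr>
```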

The URI of each indexed document is stored within the index, in addition to an independent id field. The id is specified with the <id> element within the operator document.

Entries can be deleted from the index by specifying the <delete/> element within the operator document and supplying an id. All entries with that id will be deleted.
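
A deletion could be sketched like this, using the index URI and id from the Example Usage section. It assumes that no operand is needed when only deleting.

```xml
<instr>
  <type>luceneIndex</type>
  <operator>
    <luceneIndex>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- Delete rather than add entries -->
      <delete />
      <!-- All entries stored under this id will be removed -->
      <id>My Addressbook</id>
    </luceneIndex>
  </operator>
</instr>
```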

Additional, arbitrary fields can also be stored within the index, which makes it possible, for example, to search by a human-readable name instead of a system id. To do this, add a <fields> element to the operator document that contains the required fields as name-value pairs, where the field name is the element name and the field value is the text of that element. Note: because these field names and values are stored within the index, it is good practice to keep them as short as possible.

© 2003-2007, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.