luceneSearch
Lucene Full Text Indexing

Module

urn:org:ten60:netkernel:mod:lucene

The luceneSearch accessor is exported by the urn:org:ten60:netkernel:mod:lucene module. Import this module to gain access to the accessor.

Syntax

URI
active:luceneSearch

Argument    Rules        Description
operator    Mandatory    An operator document containing an index URI and search criteria

Example Usage

DPML

<instr>
  <type>luceneSearch</type>
  <operator>
    <luceneSearch>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <query>red cat</query>
      <unique />
    </luceneSearch>
  </operator>
</instr>

NetKernel Foundation API

req=context.createSubRequest("active:luceneSearch");
req.addArgument("operator", [resource representation, aspect, or URI] );
result=context.issueSubRequest(req);
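
Because the operator argument is an XML document, it can be assembled programmatically before it is attached to the request. The following is a minimal sketch in Python using only the standard library. The helper name build_operator is hypothetical and is not part of NetKernel; the element names are taken from the DPML example above.

```python
import xml.etree.ElementTree as ET


def build_operator(index_uri, query, unique=False):
    """Return a luceneSearch operator document as an XML string.

    build_operator is a hypothetical helper, not a NetKernel API.
    """
    root = ET.Element("luceneSearch")
    ET.SubElement(root, "index").text = index_uri
    ET.SubElement(root, "query").text = query
    if unique:
        # An empty <unique/> element asks for one best match per docid.
        ET.SubElement(root, "unique")
    # ElementTree escapes characters such as & and < in the query text.
    return ET.tostring(root, encoding="unicode")


operator = build_operator("ffcpl:/org/ten60/test/myIndex/", "red cat", unique=True)
print(operator)
```

Building the document with an XML library rather than string concatenation keeps the operator well-formed when the query contains characters that need escaping.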

Purpose

Lucene is a full text indexing and searching technology from Apache. Lucene provides low-level text indexing and searching facilities. This accessor adds a layer over Lucene to support indexing the content of XML documents while preserving the XPath locations of that content. This approach allows content to be located down to the element level across multiple documents.

The luceneSearch accessor supports searching over a single Lucene index.

The <unique/> tag in the operator document causes the search results to be filtered so that only the best match per indexed docid is returned.
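
The effect of <unique/> can be illustrated outside NetKernel. The sketch below is not the accessor's implementation; it assumes that "best match" means the highest score, and it represents each match as a plain dictionary with hypothetical sample values.

```python
def best_per_docid(matches):
    """Keep only the highest-scoring match for each docid.

    Assumption: "best" means the highest score. Illustration only.
    """
    best = {}
    for match in matches:
        current = best.get(match["docid"])
        if current is None or match["score"] > current["score"]:
            best[match["docid"]] = match
    # Return the survivors ordered from best to worst score.
    return sorted(best.values(), key=lambda m: m["score"], reverse=True)


# Hypothetical sample data: two matches in doc1.xml, one in doc2.xml.
matches = [
    {"docid": "doc1.xml", "xpath": "/root/name[1]", "score": 1.0},
    {"docid": "doc1.xml", "xpath": "/root", "score": 0.54},
    {"docid": "doc2.xml", "xpath": "/root", "score": 0.4},
]
for match in best_per_docid(matches):
    print(match["docid"], match["xpath"], match["score"])
```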

Query Syntax

By default the search looks for complete words in the text content of the document. Multiple words can be specified and these are 'OR'ed together (matches containing all of them score highest). 'AND' can be used to find only matches that contain all of the keywords.

Examples:

  • cow - finds only documents that mention the word cow
  • blue cow - finds only documents that mention the word cow or the word blue
  • blue AND cow - finds only documents that mention both the word cow and the word blue
  • cow AND basis:/animal/name - finds only documents with the word cow in elements with the path /animal/name
  • blue AND basis:colour - finds only documents with the word blue in any element with the name colour
  • docid:addressbook.xml - finds only matches in the document indexed under the id addressbook.xml
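
Query strings of the forms listed above can be composed with a few small helpers. This is a sketch only: the function names are hypothetical, the helpers cover just the forms shown in the examples, and they do not escape characters that may have special meaning to the Lucene query parser.

```python
def any_of(*words):
    """Words separated by spaces are ORed together by default."""
    return " ".join(words)


def all_of(*clauses):
    """Clauses joined with AND must all match."""
    return " AND ".join(clauses)


def basis(path):
    """Restrict a match to elements with the given path or element name."""
    return "basis:" + path


def docid(document_id):
    """Restrict matches to the document indexed under this id."""
    return "docid:" + document_id


print(any_of("blue", "cow"))                 # blue cow
print(all_of("blue", "cow"))                 # blue AND cow
print(all_of("cow", basis("/animal/name")))  # cow AND basis:/animal/name
print(docid("addressbook.xml"))              # docid:addressbook.xml
```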

This may not be the whole story; digging deeper into the Lucene documentation may reveal more.

Search result document

Example result document:

<luceneQuery>
  <match>
    <basis>/root/name</basis>
    <xpath>/root/name[1]</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>1.0</score>
  </match>
  <match>
    <basis>/root</basis>
    <xpath>/root</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>0.53795576</score>
  </match>
</luceneQuery>

<basis> contains a basis XPath expression that describes the effective element type. Multiple elements may have the same basis.
<xpath> contains an XPath expression that locates a single unique element within the document.
<uri> contains the URI of the originally indexed document.
<docid> contains the id that the document was indexed under.
<score> contains a score for the match, normalized between zero and one, with one being a perfect match. The score is lower if the match is found within a larger body of text, or if not all of multiple keywords matched.
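
A client that receives the result document as text can turn it into typed records for further processing. The following is a minimal sketch using Python's standard library; the Match class and parse_results function are hypothetical names, and the sample input is the example result document shown above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Match:
    basis: str
    xpath: str
    uri: str
    docid: str
    score: float


def parse_results(xml_text):
    """Parse a <luceneQuery> result document into a list of Match records."""
    root = ET.fromstring(xml_text)
    return [
        Match(
            basis=m.findtext("basis", ""),
            xpath=m.findtext("xpath", ""),
            uri=m.findtext("uri", ""),
            docid=m.findtext("docid", ""),
            score=float(m.findtext("score", "0")),
        )
        for m in root.findall("match")
    ]


SAMPLE = """<luceneQuery>
  <match>
    <basis>/root/name</basis>
    <xpath>/root/name[1]</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>1.0</score>
  </match>
  <match>
    <basis>/root</basis>
    <xpath>/root</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>0.53795576</score>
  </match>
</luceneQuery>"""

for match in parse_results(SAMPLE):
    print(match.docid, match.xpath, match.score)
```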

Search results remain valid as long as no indexing operations are performed. Updating the index causes previous search results to expire.

© 2003-2007, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.