11.6. Search Tutorial

This section provides useful tutorials for the search service.

11.6.1. Making a domain object searchable

Making a domain object in an application searchable requires implementing and registering a metadata provider class. This adapter lets the search service extract the metadata it needs from domain objects without any specific knowledge of the application domain. The APIs required for this task are provided by a handful of classes in the com.arsdigita.search package.

11.6.1.1. Core search metadata

The first task is to implement the MetadataProvider interface, which provides the core search metadata: the object's locale, title and summary blurb. Every method in this interface takes a DomainObject as its first argument. The object supplied for this argument will be an instance of the object type specified when the provider was registered with the MetadataProviderRegistry, or of one of its subtypes.

package com.example.binder;

import com.arsdigita.search.MetadataProvider;
import com.arsdigita.domain.DomainObject;
import com.arsdigita.util.StringUtils;

import java.util.Locale;

public class NoteMetadataProvider implements MetadataProvider {

     public Locale getLocale(DomainObject dobj) {
         // Note doesn't store a locale yet, so no locale is returned
         return null;
     }

     public String getTitle(DomainObject dobj) {
         Note note = (Note)dobj;
         return note.getTitle();
     }

     public String getSummary(DomainObject dobj) {
         Note note = (Note)dobj;
         // Truncate body text & remove any HTML.
         return StringUtils.truncateString(note.getBody(),
                                           200,
                                           true);
     }

     ...

Example 11-10. Core search metadata
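The summary returned by getSummary is plain text cut to a fixed length. As a rough, self-contained illustration of that idea (not the actual StringUtils.truncateString implementation, whose exact rules may differ), the hypothetical SummarySketch class below strips tags with a simple regular expression, collapses whitespace and truncates on a word boundary:

```java
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of building a search summary from HTML body text.
 * Not the WAF StringUtils implementation.
 */
public class SummarySketch {
    // Matches anything that looks like a tag: "<" up to the next ">".
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    public static String summarize(String html, int maxLength) {
        if (html == null) {
            return null;
        }
        // Replace tags with spaces, then collapse runs of whitespace.
        String text = SPACES.matcher(TAGS.matcher(html).replaceAll(" "))
                            .replaceAll(" ")
                            .trim();
        if (text.length() <= maxLength) {
            return text;
        }
        // Cut at the last space at or before the limit so words stay whole;
        // fall back to a hard cut when there is no such space.
        int cut = text.lastIndexOf(' ', maxLength);
        if (cut <= 0) {
            cut = maxLength;
        }
        return text.substring(0, cut);
    }
}
```

A regular expression is adequate for short summaries but is not a full HTML parser; markup such as comments containing '>' would be handled imperfectly.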

11.6.1.2. Supplementary search metadata

The next set of optional methods provides auditing data that is useful for narrowing search results: the creation and last modification dates, and the party who performed each of these actions. This information is trivial to provide if the domain object already uses the com.arsdigita.auditing package:

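    // Additionally requires imports of java.util.Date,
    // com.arsdigita.kernel.Party and com.arsdigita.auditing.BasicAuditTrail
    ...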

    public Date getCreationDate(DomainObject dobj) {
         Note note = (Note)dobj;
         BasicAuditTrail audit = BasicAuditTrail.retrieveForACSObject(note);
         return audit == null ? null : audit.getCreationDate();
    }

    public Party getCreationParty(DomainObject dobj) {
         Note note = (Note)dobj;
         BasicAuditTrail audit = BasicAuditTrail.retrieveForACSObject(note);
         return audit == null ? null : audit.getCreationParty();
    }

    public Date getLastModifiedDate(DomainObject dobj) {
         Note note = (Note)dobj;
         BasicAuditTrail audit = BasicAuditTrail.retrieveForACSObject(note);
         return audit == null ? null : audit.getLastModifiedDate();
    }

    public Party getLastModifiedParty(DomainObject dobj) {
         Note note = (Note)dobj;
         BasicAuditTrail audit = BasicAuditTrail.retrieveForACSObject(note);
         return audit == null ? null : audit.getLastModifiedParty();
    }

    ...

Example 11-11. Supplementary search metadata
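To see why these dates matter, consider how a search UI might use them to narrow results. The following sketch is purely illustrative: Hit and modifiedBetween are hypothetical names, and it assumes that a hit with no last-modified date (the null returned above when there is no audit trail) should be excluded from a date-restricted search.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Hypothetical illustration of narrowing results by last-modified date. */
public class DateFilterSketch {

    /** Hypothetical search hit carrying the date supplied by the provider. */
    public static final class Hit {
        public final String title;
        public final Date lastModified; // null when the object has no audit trail

        public Hit(String title, Date lastModified) {
            this.title = title;
            this.lastModified = lastModified;
        }
    }

    /** Keeps hits last modified within [from, to], inclusive; undated hits are dropped. */
    public static List<Hit> modifiedBetween(List<Hit> hits, Date from, Date to) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            Date d = hit.lastModified;
            if (d != null && !d.before(from) && !d.after(to)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```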

11.6.1.3. Content provision

The remaining method to be implemented in the MetadataProvider interface is the one that actually provides the searchable content. There are currently three formats in which searchable content can be provided, each identified by a static constant in the com.arsdigita.search.ContentType class. Each search indexer supports a different set of formats, so applications should implement as many formats as make sense.

  • TEXT - plain text with no markup, equivalent to the text/plain MIME type.

  • RAW - an arbitrary document format such as HTML, OpenOffice, PDF or RTF. The indexer automatically detects the format and extracts content for building the search index.

  • XML - a well-formed XML document containing arbitrary elements. The indexer extracts content from the elements, keeping track of each element's XPath, so that searches can be restricted by element name.
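For the XML format, the element structure is what allows searches to be restricted by element name. A minimal sketch of generating such a document for a note follows; the note, title and body element names and the XmlContentSketch class are hypothetical choices, and the body is escaped as plain text rather than embedded as markup:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of XML search content for a note. */
public class XmlContentSketch {

    /** Escapes the characters that are significant in XML text content. */
    static String escape(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a well-formed document; each field lives in its own element. */
    public static byte[] toXml(String title, String body) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<note>"
                + "<title>" + escape(title) + "</title>"
                + "<body>" + escape(body) + "</body>"
                + "</note>";
        // Encode with the charset declared in the XML prolog.
        return xml.getBytes(StandardCharsets.UTF_8);
    }
}
```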

The Lucene implementation in WAF supports the TEXT content type, while InterMedia supports the RAW and XML formats. The ContentProvider interface defines the API for providing content. Since the Note object allows its body text attribute to store HTML, there can be two implementations of ContentProvider: one for TEXT and one for RAW (HTML). These are typically written as package-private inner classes of the MetadataProvider implementation.

    ...

    class TextContentProvider implements ContentProvider {
        private Note m_note;

        public TextContentProvider(Note note) {
          m_note = note;
        }

        public String getTag() {
          return "Body Text";
        }

        public ContentType getType() {
          return ContentType.TEXT;
        }

        public byte[] getBytes() {
          String body = m_note.getBody();
          // Strip out html tags
          return StringUtils.htmlToText(body).getBytes();
        }
    }

    ...

Example 11-12. Text content provider

    ...

    class HTMLContentProvider implements ContentProvider {
        private Note m_note;

        public HTMLContentProvider(Note note) {
          m_note = note;
        }

        public String getTag() {
          return "Body Text";
        }

        public ContentType getType() {
          return ContentType.RAW;
        }

        public byte[] getBytes() {
          String body = m_note.getBody();
          // Wrap the body HTML in a header and footer
          return ("<html><head><title>" + 
            m_note.getTitle() + 
            "</title></head><body>" +
            body + 
            "</body></html>").getBytes();
        }
    }

    ...

Example 11-13. Raw content provider
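Example 11-13 concatenates the note title straight into the markup, so a title containing characters such as '<' or '&' would produce malformed HTML, and getBytes() encodes with the platform's default charset. One possible hardening, shown here as a standalone sketch with a hypothetical HtmlWrapSketch class, escapes the title and encodes the document as UTF-8 with a matching charset declaration. The body is deliberately left unescaped because it is already HTML.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical hardened version of the HTML wrapping in Example 11-13. */
public class HtmlWrapSketch {

    /** Escapes text so it can be placed safely inside an HTML element. */
    static String escapeHtml(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps existing body HTML in a complete document with an escaped title. */
    public static byte[] wrap(String title, String bodyHtml) {
        String doc = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
                + "<title>" + escapeHtml(title) + "</title>"
                + "</head><body>"
                + (bodyHtml == null ? "" : bodyHtml)
                + "</body></html>";
        // Encode explicitly so the bytes match the declared charset.
        return doc.getBytes(StandardCharsets.UTF_8);
    }
}
```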

Finally, the getContent method in the MetadataProvider interface wires in the supported ContentProvider implementations. It returns an array of ContentProvider objects for each requested type; this allows, for example, an object with '0..n' file attachments to return each file as raw content.

         ...

    public ContentProvider[] getContent(DomainObject dobj,
                                        ContentType type) {
        Note note = (Note)dobj;
        
        if (type == ContentType.TEXT) {
          return new ContentProvider[] {
             new TextContentProvider(note)
          };
        } else if (type == ContentType.RAW) {
          return new ContentProvider[] {
             new HTMLContentProvider(note)
          };
        } else {
          return null;
        }
    }

Example 11-14. Wiring up the content providers
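The same method can return several providers of one type. The sketch below shows the '0..n' attachment case using simplified, hypothetical stand-ins (Provider, Attachment and rawProviders are not part of com.arsdigita.search): one RAW provider per attachment, and an empty array when there are none. Whether the real search service prefers an empty array or null for "nothing to index" is an open assumption here; Example 11-14 returns null for unsupported types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one raw content provider per file attachment. */
public class AttachmentContentSketch {

    /** Simplified stand-in for ContentProvider. */
    public interface Provider {
        String getTag();
        byte[] getBytes();
    }

    /** Hypothetical attachment: a file name plus its raw bytes. */
    public static final class Attachment {
        public final String fileName;
        public final byte[] data;

        public Attachment(String fileName, byte[] data) {
            this.fileName = fileName;
            this.data = data;
        }
    }

    /** Returns one provider per attachment (possibly none). */
    public static Provider[] rawProviders(List<Attachment> attachments) {
        List<Provider> providers = new ArrayList<>();
        for (final Attachment a : attachments) {
            providers.add(new Provider() {
                public String getTag() {
                    return a.fileName; // tag each provider with its file name
                }
                public byte[] getBytes() {
                    return a.data.clone(); // defensive copy of the file content
                }
            });
        }
        return providers.toArray(new Provider[0]);
    }
}
```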

11.6.1.4. Metadata provider registration

The search service maintains a registry of the metadata provider adapter to use for each object type. To make the Note domain object searchable, its adapter must be registered. Registration is typically performed at startup, from the init(DomainInitEvent e) method of the application's com.arsdigita.runtime.Initializer. When determining which adapter to use for extracting metadata, the search service walks up the object type inheritance tree until it finds a registered adapter. Thus, if an application has a number of object types all inheriting from a common parent, it may be sufficient to register the metadata provider against the parent type.

package com.example.binder;

import com.arsdigita.runtime.CompoundInitializer;
import com.arsdigita.runtime.DomainInitEvent;

import com.arsdigita.search.MetadataProviderRegistry;

public class NoteInitializer extends CompoundInitializer {

    public void init(DomainInitEvent evt) {
        super.init(evt);

        MetadataProviderRegistry.registerAdapter(
           Note.BASE_DATA_OBJECT_TYPE,
           new NoteMetadataProvider());
    }
}

Example 11-15. Registering a provider
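The inheritance-aware lookup described above can be sketched as follows. This is a hypothetical illustration, not the MetadataProviderRegistry implementation: object types are plain strings, and each type's parent is recorded explicitly through declareType in place of however the real service discovers the type hierarchy.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry that resolves adapters by walking up a type tree. */
public class AdapterRegistrySketch<A> {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, A> adapters = new HashMap<>();

    /** Records that type directly inherits from parent. */
    public void declareType(String type, String parent) {
        parentOf.put(type, parent);
    }

    /** Registers an adapter against exactly one object type. */
    public void register(String type, A adapter) {
        adapters.put(type, adapter);
    }

    /**
     * Returns the adapter for the type or its nearest registered ancestor,
     * or null if none. Visited types are tracked so a cycle cannot loop forever.
     */
    public A lookup(String type) {
        Set<String> visited = new HashSet<>();
        for (String t = type; t != null && visited.add(t); t = parentOf.get(t)) {
            A adapter = adapters.get(t);
            if (adapter != null) {
                return adapter;
            }
        }
        return null;
    }
}
```

With an adapter registered for Note, a lookup for a hypothetical TaskNote subtype that declares Note as its parent returns the Note adapter, which is why registering against a common parent can be sufficient; registering TaskNote directly would take precedence.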

11.6.2. Creating a search user interface

This section of the tutorial outlines the steps required to add a user interface for searching content within an application. The com.arsdigita.search.ui package contains a library of components that serve as the building blocks for an application's search UI; the main classes of interest are introduced in the following subsections.

11.6.2.1. Basic search form

The first step in building a search UI is to create a form containing a text entry widget for the query string. The QueryComponent class is a general-purpose, abstract Bebop container implementing the QueryGenerator interface. This interface provides a mechanism for retrieving a query specification from the current page state without requiring any knowledge of the structure of the data entry form (for that matter, there might not be any form at all!).

The BaseQueryComponent class is a simple subclass of QueryComponent that provides a widget for entering the query string, plus APIs for adding filter components (more on these later). This is the component to start with when creating a search UI for an application. The only prerequisite is that it be contained within a Bebop form.

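       QueryComponent query = new BaseQueryComponent();

       // The query component must live inside a Bebop form
       Form form = new Form("search");
       form.add(query);
       form.add(new Submit("Search notes"));

       // Add the form (which contains the query component) to this container
       add(form);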

Example 11-16. Creating a basic search form

11.6.2.2. Displaying results

The next stage is to add functionality for processing the query specification and displaying the results. The ResultsPane class takes care of both of these tasks. It requires only that an implementation of the QueryGenerator interface be passed to its constructor, a requirement the QueryComponent class already satisfies.

       ResultsPane results = new ResultsPane(query);
       add(results);

Example 11-17. Adding a result pane
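The value of the QueryGenerator abstraction is that the results display never needs to know where the query came from. The following self-contained sketch illustrates that decoupling with simplified stand-ins; State, Generator, fromParameter, fixed and render are hypothetical names and do not reproduce the actual com.arsdigita.search.ui signatures.

```java
/** Hypothetical illustration of decoupling query input from result display. */
public class QueryGeneratorSketch {

    /** Stand-in for per-request page state: looks up a request parameter. */
    public interface State {
        String param(String name);
    }

    /** Stand-in for QueryGenerator. */
    public interface Generator {
        boolean hasQuery(State state);
        String getQuery(State state);
    }

    /** Form-backed generator: reads the query terms from a named parameter. */
    public static Generator fromParameter(final String name) {
        return new Generator() {
            public boolean hasQuery(State state) {
                String v = state.param(name);
                return v != null && !v.trim().isEmpty();
            }
            public String getQuery(State state) {
                return state.param(name).trim();
            }
        };
    }

    /** Form-less generator: always supplies the same canned query. */
    public static Generator fixed(final String terms) {
        return new Generator() {
            public boolean hasQuery(State state) { return true; }
            public String getQuery(State state) { return terms; }
        };
    }

    /** Stand-in for a results pane: works with any Generator implementation. */
    public static String render(Generator generator, State state) {
        return generator.hasQuery(state)
                ? "Results for: " + generator.getQuery(state)
                : "No query";
    }
}
```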

11.6.2.3. Filtering results

Since search is a system-wide service, the result of the previous stage is a component that searches every object type for which a search metadata provider is registered. To search only within an application, a filter is required to restrict results to one or more object types. Two classes can be used for this task, both part of the com.arsdigita.search.ui.filters package.

  • ObjectTypeFilterComponent - a static component that restricts results to the list of object types passed to its constructor.

  • ObjectTypeFilterWidget - a dynamic widget that presents the list of object types passed to its constructor in a form, enabling the user to restrict the search.

For the binder application there is only a single object type, Note, so ObjectTypeFilterComponent is the most appropriate.

       FilterComponent objTypeFilter =
          new ObjectTypeFilterComponent(Note.BASE_DATA_OBJECT_TYPE);
       query.add(objTypeFilter);

Example 11-18. Filtering by object type
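Conceptually, an object type filter keeps only the hits whose type is in an allowed set. The following sketch shows that idea as a simple post-filter over hypothetical Hit records with made-up type names; the real filter components may well contribute to the query specification instead of filtering afterwards, and this sketch matches type names exactly without considering subtypes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of restricting hits to a set of object types. */
public class TypeFilterSketch {

    /** Hypothetical search hit tagged with its object type name. */
    public static final class Hit {
        public final String title;
        public final String objectType;

        public Hit(String title, String objectType) {
            this.title = title;
            this.objectType = objectType;
        }
    }

    /** Keeps only hits whose object type is exactly one of the allowed types. */
    public static List<Hit> restrictTo(List<Hit> hits, Collection<String> allowedTypes) {
        Set<String> allowed = new HashSet<>(allowedTypes);
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (allowed.contains(hit.objectType)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```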

11.6.2.4. Note search component

These three stages combine to form a NoteSearchComponent that can be dropped into pages in the Binder application (or indeed any other application wishing to search Note objects).

package com.example.binder.ui;

import com.example.binder.Note;

import com.arsdigita.search.ui.BaseQueryComponent;
import com.arsdigita.search.ui.QueryComponent;
import com.arsdigita.search.ui.ResultsPane;
import com.arsdigita.search.ui.FilterComponent;
import com.arsdigita.search.ui.filters.ObjectTypeFilterComponent;

import com.arsdigita.bebop.Form;
import com.arsdigita.bebop.SimpleContainer;

import com.arsdigita.bebop.form.Submit;

public class NoteSearchComponent extends SimpleContainer {

    private QueryComponent m_query;
    private FilterComponent m_typeFilter;
    private ResultsPane m_results;

    public NoteSearchComponent() {
       super("note:search", Note.XML_NS);

       m_typeFilter = new ObjectTypeFilterComponent(Note.BASE_DATA_OBJECT_TYPE);
       m_query = new BaseQueryComponent();
       m_query.add(m_typeFilter);

       m_results = new ResultsPane(m_query);

       // The query component must be contained within a Bebop form
       Form form = new Form("search");
       form.add(m_query);
       form.add(new Submit("Search notes"));

       add(form);
       add(m_results);
    }
}

Example 11-19. Completed search component

11.6.3. Providing a new Query engine

This section of the tutorial outlines the steps required to implement a new query engine for performing searches against an external index.


11.6.4. Providing a new Search indexer

This section of the tutorial outlines the steps required to implement a search indexer, which receives update notifications when a domain object is changed, retrieves its metadata and content, and builds an index.
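The indexer's job of receiving change notifications and maintaining an index can be illustrated with a small, self-contained sketch. This is a hypothetical in-memory inverted index, not the Lucene or InterMedia indexer used by WAF; IndexerSketch, onChange, onDelete and search are invented names, and an object's id and TEXT content stand in for the metadata and ContentProvider output described earlier.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory indexer driven by change notifications. */
public class IndexerSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();  // term -> object ids
    private final Map<String, Set<String>> termsById = new HashMap<>(); // object id -> terms

    /** Lower-cases the text and splits it on non-word characters. */
    private static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Notification that an object was created or modified: reindex it. */
    public void onChange(String id, String text) {
        onDelete(id); // drop stale postings from the previous version first
        Set<String> terms = tokenize(text);
        termsById.put(id, terms);
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(id);
        }
    }

    /** Notification that an object was deleted: remove it from the index. */
    public void onDelete(String id) {
        Set<String> old = termsById.remove(id);
        if (old == null) {
            return;
        }
        for (String t : old) {
            Set<String> ids = postings.get(t);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    postings.remove(t);
                }
            }
        }
    }

    /** Returns ids of objects containing every query term (AND semantics). */
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String t : tokenize(query)) {
            Set<String> ids = postings.getOrDefault(t, Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

Removing an object's old postings before adding new ones keeps the index consistent when an author edits content, at the cost of touching every term of the previous version.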

From Dan ...: to perform offline rebuilds of the search index, manually hack the PL/SQL for the 'sync_index' method and remove the keyword 'online' from the code. The potential problem with this is that it locks out any other threads touching the search index, including end-user searches and metadata updates from authors editing content. As such it could severely impact scalability, so EE is definitely preferable to this approach.
