org.apache.nutch.clustering
Interface OnlineClusterer

All Known Implementing Classes:
Clusterer

public interface OnlineClusterer

An extension point interface for online search results clustering algorithms.

Here, online search results clustering means a clusterer that works on the set of HitDetails retrieved for a user's query and produces a set of HitsCluster objects that can be displayed to help the user gain insight into the topics found in the results.

Other clustering options include predefined categories and off-line preclustered groups, but those are not investigated any further here.

Version:
$Id: OnlineClusterer.java,v 1.1 2004/08/09 23:23:52 johnnx Exp $
Author:
Dawid Weiss

Field Summary
static String X_POINT_ID
          The name of the extension point.
 
Method Summary
 HitsCluster[] clusterHits(HitDetails[] hitDetails, String[] descriptions)
          Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).
 

Field Detail

X_POINT_ID

public static final String X_POINT_ID
The name of the extension point.

Method Detail

clusterHits

public HitsCluster[] clusterHits(HitDetails[] hitDetails,
                                 String[] descriptions)
Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).

The arguments to this method may seem very low-level, but they are by-products of the regular search process, so they are simply reused here instead of duplicating part of the usual Nutch functionality. Other ideas are welcome.

This method must be thread-safe (many threads may invoke it concurrently on the same instance of a clusterer).

Returns:
An array of HitsCluster objects.
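
Example:
The thread-safety requirement above is easiest to meet with an implementation that keeps no mutable instance state. The following is a minimal sketch of that idea only, not the Nutch implementation. SketchHit, SketchCluster and FirstWordClusterer are hypothetical stand-ins invented for this example so that it compiles on its own; they are not the real HitDetails, HitsCluster or Clusterer types. Grouping by the first word of each summary is likewise a placeholder for a real clustering algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a search hit; NOT the real Nutch HitDetails.
final class SketchHit {
    final String url;

    SketchHit(String url) {
        this.url = url;
    }
}

// Hypothetical stand-in for a cluster of hits; NOT the real Nutch HitsCluster.
final class SketchCluster {
    final String label;
    final List<SketchHit> hits;

    SketchCluster(String label, List<SketchHit> hits) {
        this.label = label;
        this.hits = hits;
    }
}

// Hypothetical clusterer with the same method shape as clusterHits.
// It has no fields, so concurrent calls on one instance share no state.
final class FirstWordClusterer {

    SketchCluster[] clusterHits(SketchHit[] hitDetails, String[] descriptions) {
        if (hitDetails.length != descriptions.length) {
            throw new IllegalArgumentException("one description is required per hit");
        }

        // All working state is local to this call, which is what makes
        // the method safe to invoke from many threads at once.
        Map<String, List<SketchHit>> groups = new LinkedHashMap<>();
        for (int i = 0; i < hitDetails.length; i++) {
            String label = firstWord(descriptions[i]);
            groups.computeIfAbsent(label, key -> new ArrayList<>()).add(hitDetails[i]);
        }

        // Convert the groups to an array, preserving first-seen order.
        SketchCluster[] result = new SketchCluster[groups.size()];
        int next = 0;
        for (Map.Entry<String, List<SketchHit>> entry : groups.entrySet()) {
            result[next++] = new SketchCluster(entry.getKey(), entry.getValue());
        }
        return result;
    }

    // Placeholder "algorithm": the lower-cased first word of the summary.
    private static String firstWord(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        int space = trimmed.indexOf(' ');
        return space < 0 ? trimmed : trimmed.substring(0, space);
    }
}
```

Called with three hits whose summaries are "Java clustering", "Python tips" and "java threads", this sketch returns two clusters: "java" with two hits, then "python" with one. An implementation that does need shared state (for example a cache) would have to guard it explicitly to keep honoring the contract.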


Copyright © 2006 The Apache Software Foundation