What is an Indexer?
Goal of an indexer
The main goal of an indexer is to create and set up resources
automatically. Resources can be created based on their
name or their extension. Once a resource has been created, the
indexer is also in charge of attaching the right frames to it,
such as the HTTP frame, the filters and so on.
Each DirectoryResource (and its subclasses) is associated with an
indexer. If no indexer is specified, the DirectoryResource is
associated with the "default" indexer.
Description of an indexer
- Class and attributes of an indexer
- Class:
- Usually, the indexer's class is
org.w3c.tools.resources.indexer.SampleResourceIndexer
- Identifier
- The name of the indexer, e.g. "icons"
- Last Modified
- Unused, but present because, internally, an indexer is a resource.
- Super Indexer
- The name of the parent indexer used when the current indexer
fails to index. By default, the super indexer is the "default"
indexer.
- The children of an indexer
- directories
- Used to index entries whose name matches exactly, mainly
directories. For example, you can specify that an "Icons"
directory will always be negotiable. The default name (i.e.
the one matching all directory names) is "*default*".
- extensions
- Used to index files with a specific extension. For example,
"html" can be defined as a FileResource with an HTTPFrame that
gives the file the "text/html" content type. Every "foo.html"
file is then indexed as a "text/html" object when accessed
over HTTP. The default extension (i.e. the one matching all
extension names) is "*default*". To index files with no
extension, you must use the name "*noextension*".
- content-types (only for the Content Type Indexer)
- In some cases the file extension is not the only criterion.
For example, when a PUT request occurs, the indexer should use
the Content-Type header sent with the request (if there is
one). This is the job of the Content Type Indexer
(org.w3c.jigsaw.indexer.ContentTypeIndexer), which has one
more child than other indexers: the content-types node. The
associations between mime types and resources are stored in
this additional child.
Since 2.0.2 the ContentTypeIndexer accepts generic mime
types like text:*, *:xml or even *:*. For example, if you
define text:* as a FileResource using an HTTPFrame (with a
content-type set to *none*), all content types like
text/html, text/plain and text/xml will be accepted.
Note: The mime types stored in the indexer are not
"real" mime types: the '/' has been replaced by a ':'.
This choice was made because the '/' can conflict with
URLs in Jigsaw.
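The generic mime type matching described above can be illustrated with a short sketch. This is not the ContentTypeIndexer source: the class MimePatternSketch and its methods toStoredForm and matches are hypothetical names, and stripping parameters such as "; charset=..." before matching is an assumption made for the example.

```java
import java.util.Locale;

// Hypothetical illustration only; this is not the ContentTypeIndexer source.
class MimePatternSketch {

    // Convert a real mime type ("text/html; charset=utf-8") into the
    // ':'-separated form used as a node name ("text:html").
    // Dropping the parameters is an assumption of this sketch.
    static String toStoredForm(String mimeType) {
        int semi = mimeType.indexOf(';');
        String bare = (semi >= 0) ? mimeType.substring(0, semi) : mimeType;
        return bare.trim().toLowerCase(Locale.ROOT).replace('/', ':');
    }

    // Return true when a stored pattern such as "text:*", "*:xml" or "*:*"
    // covers the given real mime type.
    static boolean matches(String pattern, String mimeType) {
        String[] p = pattern.split(":", 2);
        String[] m = toStoredForm(mimeType).split(":", 2);
        if (p.length != 2 || m.length != 2) {
            return false; // malformed pattern or mime type
        }
        boolean typeOk = p[0].equals("*") || p[0].equals(m[0]);
        boolean subtypeOk = p[1].equals("*") || p[1].equals(m[1]);
        return typeOk && subtypeOk;
    }

    public static void main(String[] args) {
        System.out.println(toStoredForm("text/html"));
        System.out.println(matches("text:*", "text/plain"));
        System.out.println(matches("*:xml", "text/xml"));
        System.out.println(matches("text:*", "image/png"));
    }
}
```

With a text:* entry, the sketch accepts text/html, text/plain and text/xml and rejects image/png, which mirrors the behaviour described for the 2.0.2 release.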
You can find a sample indexer configuration in this page.
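The lookup rules described for the directories and extensions nodes can be sketched as follows. This is a simplified model, not the SampleResourceIndexer implementation: the class IndexerSketch, its fields and the template strings are hypothetical. The order used here (exact name, then "*default*", then the super indexer) is an assumption, as is treating a leading dot (".profile") as "no extension".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an indexer; templates are plain strings here.
class IndexerSketch {
    static final String DEFAULT = "*default*";
    static final String NO_EXTENSION = "*noextension*";

    final String name;
    final IndexerSketch superIndexer; // parent indexer, may be null
    final Map<String, String> extensions = new HashMap<>();
    final Map<String, String> directories = new HashMap<>();

    IndexerSketch(String name, IndexerSketch superIndexer) {
        this.name = name;
        this.superIndexer = superIndexer;
    }

    // Key used in the extensions node for a given file name.
    static String extensionKey(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return NO_EXTENSION; // "README", ".profile", "file."
        }
        return fileName.substring(dot + 1);
    }

    // Find a template for a file, or null when nothing matches.
    String indexFile(String fileName) {
        String template = extensions.get(extensionKey(fileName));
        if (template == null) {
            template = extensions.get(DEFAULT);
        }
        if (template == null && superIndexer != null) {
            template = superIndexer.indexFile(fileName); // current indexer failed
        }
        return template;
    }

    // Find a template for a directory, or null when nothing matches.
    String indexDirectory(String dirName) {
        String template = directories.get(dirName);
        if (template == null) {
            template = directories.get(DEFAULT);
        }
        if (template == null && superIndexer != null) {
            template = superIndexer.indexDirectory(dirName);
        }
        return template;
    }

    public static void main(String[] args) {
        IndexerSketch def = new IndexerSketch("default", null);
        def.extensions.put("html", "FileResource + HTTPFrame(text/html)");
        def.directories.put(DEFAULT, "DirectoryResource");

        IndexerSketch icons = new IndexerSketch("icons", def);
        icons.extensions.put("gif", "FileResource + HTTPFrame(image/gif)");

        System.out.println(icons.indexFile("logo.gif"));   // found in "icons"
        System.out.println(icons.indexFile("foo.html"));   // found via super indexer
        System.out.println(icons.indexDirectory("Icons")); // "*default*" in super indexer
    }
}
```

In this model a file such as "README" is only indexed if some indexer in the chain defines "*noextension*" or "*default*" under extensions; otherwise the lookup returns null.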
Indexers in JigAdmin
The Indexers Space works exactly like the Documents Space, except that
indexer classes are available in the "Available Resources"
window. You can still add, delete and configure resources
and frames, but only in the indexer nodes (directories,
extensions and, for some indexers, content-types). You can
also create new indexers (under the Indexers node).