Chapter 8. JSEM - JSON to Search Engine Mapping

8.1. Introduction

Compass provides the ability to map JSON to the underlying Search Engine through simple XML mapping files. We call this technology JSEM (JSON to Search Engine Mapping). The JSEM files are used by Compass to extract the required JSON elements at run-time and insert the required meta-data into the Search Engine index. Mappings can be defined explicitly for each JSON element, or Compass can dynamically add all JSON elements under a certain JSON element recursively.

Let's start with a simple example. The following is a sample JSON that we will work with:

{
    "id": 1,
    "name": "Mary Lebow",
    "address": {
      "street": "5 Main Street",
      "city": "San Diego, CA",
      "zip": 91912
    },
    "phoneNumbers": [
      "619 332-3452",
      "664 223-4667"
    ]
}

Now, let's look at the different ways we can map this JSON into the search engine. The first option is to use fully explicit mappings:

<root-json-object alias="addressbook">
    <json-id name="id" />
    <json-property name="name" />
    <json-object name="address">
        <json-property name="street" />
        <json-property name="city" />
        <json-property name="zip" index="not_analyzed" />
        <json-array name="phoneNumbers" index-name="phoneNumber">
            <json-property />
        </json-array>
    </json-object>
</root-json-object>

Here is the same mapping configuration using JSON:

{
  "compass-core-mapping" : {
    "json" : [
      {
        alias : "addressbook",
        id : {
          name : "id"
        },
        property : [
          {name : "name"}
        ],
        object : [
          {
            name : "address",
            property : [
              {name : "street"},
              {name : "city"},
              {name : "zip", index : "not_analyzed"}
            ],
            array : {
              name : "phoneNumbers",
              "index-name" : "phoneNumber",
              property : {}
            }
          }
        ]
      }
    ]
  }
}

Here is the same mapping configuration using programmatic builder API:

import static org.compass.core.mapping.jsem.builder.JSEM.*;

conf.addMapping(
    json("addressbook")
        .add(id("id"))
        .add(property("name"))
        .add(object("address")
                .add(property("street"))
                .add(property("city"))
                .add(property("zip").index(Property.Index.NOT_ANALYZED))
                .add(array("phoneNumbers").indexName("phoneNumber").element(property()))
        )
);

The above explicit mapping defines how each JSON element will be mapped to the search engine. In the above case, we will have several searchable properties named after their respective JSON element names (the name can be changed by using the index-name attribute). We can now perform search queries such as city:diego, or phoneNumber:619*, or even (using dot path notation): addressbook.address.city:diego.
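The relationship between JSON element names and searchable property names can be modeled with a few lines of plain Java. The following is only a sketch under stated assumptions: it does not use or reproduce Compass code, the class PropertyNamingSketch and its fullPath flag are hypothetical names made up for this illustration, and the JSON document is represented as nested java.util collections.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only (not Compass code): flattens nested JSON-like data into named properties. */
final class PropertyNamingSketch {

    /**
     * Collects "propertyName=value" pairs for every leaf value.
     * fullPath is this sketch's own switch: false uses the plain element name,
     * true uses the dot path starting at the given prefix (for example the alias).
     */
    static List<String> flatten(String prefix, Map<String, Object> json, boolean fullPath) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : json.entrySet()) {
            String path = prefix + "." + e.getKey();
            String name = fullPath ? path : e.getKey();
            collect(name, path, e.getValue(), fullPath, out);
        }
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String name, String path, Object value, boolean fullPath, List<String> out) {
        if (value instanceof Map) {
            // Nested object: descend, extending the dot path.
            out.addAll(flatten(path, (Map<String, Object>) value, fullPath));
        } else if (value instanceof List) {
            // Array: every item is added under the same property name.
            for (Object item : (List<Object>) value) {
                collect(name, path, item, fullPath, out);
            }
        } else {
            out.add(name + "=" + value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "5 Main Street");
        address.put("city", "San Diego, CA");

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("name", "Mary Lebow");
        root.put("address", address);
        root.put("phoneNumbers", List.of("619 332-3452", "664 223-4667"));

        System.out.println(flatten("addressbook", root, false)); // plain names
        System.out.println(flatten("addressbook", root, true));  // dot path names
    }
}
```

The sketch leaves out index-name renaming, analysis, and storage options; it only shows why the same value can be reached through a short name and through a dot path.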

Often, though, explicitly mapping every JSON element is tedious, and it does not work when building a generic indexing service. In this case, Compass allows JSON elements to be mapped dynamically and recursively. Here is an example where the JSON address element is mapped dynamically, so that any element within it is automatically added to the search engine:

<root-json-object alias="addressbook">
    <json-id name="id" />
    <json-property name="name" />
    <json-object name="address" dynamic="true" />
</root-json-object>

The dynamic attribute can even be set on the root-json-object itself, which allows creating a completely generic JSON indexing service that only requires setting the id JSON element.
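The difference between explicit and dynamic mapping of a single JSON object can be sketched in plain Java. This is a simplified model, not Compass code: DynamicMappingSketch and indexedElements are hypothetical names, the set of explicit names stands in for hand-written json-property mappings, and nested objects are only followed when the dynamic flag is set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative model only (not Compass code): explicit versus dynamic mapping of one JSON object. */
final class DynamicMappingSketch {

    /**
     * Returns the names of the leaf elements that would be added to the index.
     * explicit: leaf names that were mapped by hand.
     * dynamic: when true, unmapped elements are added too, recursively.
     */
    static List<String> indexedElements(Map<String, Object> json, Set<String> explicit, boolean dynamic) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : json.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map) {
                if (dynamic) {
                    @SuppressWarnings("unchecked")
                    Map<String, Object> child = (Map<String, Object>) value;
                    // Everything below a dynamic object is picked up without further mappings.
                    out.addAll(indexedElements(child, Set.of(), true));
                }
            } else if (dynamic || explicit.contains(e.getKey())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> geo = new LinkedHashMap<>();
        geo.put("lat", 32.7);
        geo.put("lon", -117.1);

        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "5 Main Street");
        address.put("city", "San Diego, CA");
        address.put("country", "US"); // not mapped explicitly
        address.put("geo", geo);      // not mapped explicitly

        Set<String> explicit = Set.of("street", "city");
        System.out.println(indexedElements(address, explicit, false)); // only the mapped elements
        System.out.println(indexedElements(address, explicit, true));  // mapped and unmapped elements
    }
}
```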

Now, in order to index, search, and load JSON objects, we can use the JsonObject API abstraction. Here is a simple example that uses DefaultAliasedJSONObject, a JsonObject implementation that is bundled with Compass and is based on the code from the json.org site:

JsonObject jsonObject = new DefaultAliasedJSONObject("addressbook", "json string goes here");
// this will index the provided JSON
session.save(jsonObject);

// now we can load the Resource that represents it
Resource resource = session.loadResource("addressbook", 1);
resource.getValue("name"); // will get Mary Lebow

// we can also get back the JSON content and actual object when using content mapping (see later)
jsonObject = (JsonObject) session.load("addressbook", 1);

// Last, we can search
CompassHits hits = session.find("mary");
hits.length(); // will return 1
resource = hits.resource(0);
jsonObject = (JsonObject) hits.data(0);

8.2. JSON API Abstraction

Since there is no single object-based API for working with JSON, Compass has an interface-based abstraction that can be used with any available JSON implementation. The APIs can be found under org.compass.core.json and include JsonObject, AliasedJsonObject, and JsonArray. Compass comes with several built-in implementations. The first is taken from the json.org site and is bundled with Compass under org.compass.core.json.impl. The second supports Grails JSON objects and has wrapper implementations around them under org.compass.core.json.grails. Another supports Jettison JSON objects and has a wrapper implementation around them under org.compass.core.json.jettison. The last (and probably the fastest) supports Jackson-based JSON objects and is implemented under org.compass.core.json.jackson.

Implementing support for another framework that bundles its own JSON object implementation should be straightforward: implement the Compass API interfaces, most likely by wrapping the framework's own objects. The jettison implementation can be used as a reference for how this can be done.
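The wrapping approach can be illustrated with a small adapter. The sketch below is an assumption-laden illustration, not the Compass API: MiniJsonObject is a hypothetical, trimmed-down interface invented for this example (the real interfaces live under org.compass.core.json and should be consulted for their actual methods), and a plain java.util.Map plays the role of the third-party JSON object model.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, trimmed-down stand-in for an interface-based JSON abstraction (not the Compass interface). */
interface MiniJsonObject {
    Iterator<String> keys();

    Object opt(String key);

    boolean isNullValue(String key);
}

/** Adapter that wraps an existing object model; here a plain Map stands in for that model. */
final class MapBackedJsonObject implements MiniJsonObject {
    private final Map<String, Object> delegate;

    MapBackedJsonObject(Map<String, Object> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<String> keys() {
        return delegate.keySet().iterator();
    }

    @Override
    public Object opt(String key) {
        return delegate.get(key); // null when the key is absent
    }

    @Override
    public boolean isNullValue(String key) {
        // Distinguishes an explicit null value from a missing key.
        return delegate.containsKey(key) && delegate.get(key) == null;
    }
}

final class WrapperSketch {
    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("id", 1);
        raw.put("name", "Mary Lebow");

        MiniJsonObject json = new MapBackedJsonObject(raw);
        for (Iterator<String> it = json.keys(); it.hasNext(); ) {
            String key = it.next();
            System.out.println(key + " -> " + json.opt(key));
        }
    }
}
```

The adapter holds no state of its own, so the indexing code only ever sees the abstraction while the wrapped library object remains the single source of truth.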

8.3. Content Mapping

By default, when mapping JSON using Compass, the JSON content itself is not stored in the search engine. If the JSON content is not stored, then only the Resource API can be used when searching and getting back results from the search engine. The mapping definition can, however, specify that the actual JSON content be stored. This allows getting the JSON itself from the search engine using the Resource API, as well as converting the search results back into the actual JSON object (jackson, jettison, grails, the default, or a custom one).

The following mapping definition shows how to map JSON to also store its content:

<root-json-object alias="addressbook">
    <json-id name="id" />
    <json-property name="name" />
    <json-object name="address" dynamic="true" />
    <json-content name="content" />
</root-json-object>

This will cause Compass to store the actual JSON content under a Resource Property named content. Here is an example of how it can be retrieved back from the search engine:

Resource resource = session.loadResource("addressbook", 1);
resource.getValue("content"); // will get actual json string

In order to convert back to the actual JSON object, a converter instructing Compass how to convert the JSON string back to your preferred JSON object model should be registered with Compass. For example, to register the jettison based converter, the setting named compass.jsem.contentConverter.type should be set to org.compass.core.json.jettison.converter.JettisonContentConverter. To register the grails converter, the setting should be set to org.compass.core.json.grails.converter.GrailsContentConverter. To register the jackson converter, the setting should be set to org.compass.core.json.jackson.converter.JacksonContentConverter. Finally, to use the default built-in Compass implementation, the setting should be set to org.compass.core.json.impl.converter.DefaultJSONContentConverterImpl.

By default, the content converter registered with Compass is the default one.
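As a small illustration, the setting can be prepared as an ordinary key/value pair. The sketch below only builds a java.util.Properties object with the setting name and the jackson converter class name given above; the class ContentConverterSettings is a hypothetical helper, and how the settings are handed to Compass depends on the configuration style in use, which this sketch does not cover.

```java
import java.util.Properties;

/** Hypothetical helper: collects the content converter setting as a plain key/value pair. */
final class ContentConverterSettings {

    /** Setting name as given in this chapter. */
    static final String CONVERTER_TYPE_KEY = "compass.jsem.contentConverter.type";

    /** Builds settings that select the jackson based content converter. */
    static Properties jacksonConverterSettings() {
        Properties settings = new Properties();
        settings.setProperty(CONVERTER_TYPE_KEY,
                "org.compass.core.json.jackson.converter.JacksonContentConverter");
        return settings;
    }

    public static void main(String[] args) {
        // Prints the key/value pair; passing it on to Compass is left out on purpose.
        jacksonConverterSettings().list(System.out);
    }
}
```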

8.4. Raw Json Object

Once a JSON content converter is configured, Compass knows how to convert a JsonObject to and from a JSON string. This makes it possible to use the Compass RawJsonObject and RawAliasedJsonObject. The raw JSON objects are simple JSON string holders that Compass automatically converts into the preferred JsonObject when they are saved. This simplifies saving JSON objects, since the raw objects can be constructed directly from JSON strings. Here is an example:

JsonObject jsonObject = new RawAliasedJsonObject("addressbook", "json string goes here");
// this will index the provided JSON
session.save(jsonObject);

8.5. Mapping Definitions

8.5.1. root-json-object

The root mapping of JSON. Maps to a JSON object.

<root-json-object
      alias="aliasName"
      analyzer="name of the analyzer"
      dynamic="false|true"
      dynamic-naming-type="plain|full"
      spell-check="optional spell check setting"
/>
    all?,
    sub-index-hash?,
    json-id*,
    (json-analyzer?),
    (json-boost?),
    (json-property|json-array|json-object)*,
    (json-content?)

Table 8.1. root-json-object mapping

Attribute | Description
alias | The name of the alias that represents the JsonObject.
sub-index (optional, defaults to the alias value) | The name of the sub-index that the alias will map to.
analyzer (optional, defaults to the default analyzer) | The name of the analyzer that will be used to analyze ANALYZED properties. Defaults to the default analyzer, which is one of the internal analyzers that come with Compass. Note that when the json-analyzer mapping is used (a child mapping of the root json object mapping, naming a json element that controls the analyzer), the analyzer attribute will have no effect.
dynamic (optional, defaults to false) | Whether unmapped json elements should be added to the search engine automatically (and recursively).

8.5.2. json-id

The JSON element within the json object that represents the id of the resource.

<json-id
    name="the name of the json id element"
    value-converter="value converter lookup name"
    converter="converter lookup name"
    format="an optional format string"
    omit-norms="true|false"
    spell-check="spell check setting"
/>

Table 8.2. json-id mapping

Attribute | Description
name | The name of the JSON element within the JSON object whose value is the id of the element/resource.
value-converter (optional, defaults to Compass SimpleJsonValueConverter) | The global converter lookup name registered with the configuration. This is the converter used to convert the actual value of the json-id. It acts as a convenient extension point for custom value converter implementations (for example, date formatters). SimpleJsonValueConverter will usually act as a base class for such extensions. The value can also reference one of the Compass built-in converters, such as int (in this case, the format attribute can also be used).
converter (optional) | The global converter lookup name registered with the configuration. The converter is responsible for converting the json-id mapping.

8.5.3. json-property

The JSON element within the json object that represents a property of the resource.

<json-property
    name="the name of the json property element"
    index-name="the name it will be stored under, defaults to the element name"
    naming-type="plain|full"
    store="yes|no|compress"
    index="analyzed|not_analyzed|no"
    omit-norms="false|true"
    null-value="a value to store in the index in case the element value is null"
    boost="boost value for the property"
    analyzer="name of the analyzer"
    reverse="no|reader|string"
    override="true|false"
    exclude-from-all="no|yes|no_analyzed"
    value-converter="value converter lookup name"
    format="a format string for value converters that support this"
    converter="converter lookup name"
    spell-check="spell check setting"
/>

Table 8.3. json-property mapping

Attribute | Description
name | The name of the JSON element within the JSON object whose value is mapped to a property of the resource.
index-name (optional, defaults to the element name) | The name of the resource property that will be stored in the index.
store (optional, defaults to yes) | Whether the value of the json property is going to be stored in the index.
index (optional, defaults to analyzed) | Whether the value of the json property is going to be indexed (searchable). If it is, controls whether the value is going to be broken down and analyzed (analyzed), or used as is (not_analyzed).
boost (optional, defaults to 1.0f) | Controls the boost level for the json property.
analyzer (optional, defaults to the json mapping analyzer decision scheme) | The name of the analyzer that will be used to analyze ANALYZED json property mappings defined for the given property. Defaults to the json mapping analyzer decision scheme, based on the analyzer set or the json-analyzer mapping.
exclude-from-all (optional, defaults to no) | Excludes the property from participating in the "all" meta-data. If set to no_analyzed, not_analyzed properties will be analyzed when added to the all property (the analyzer can be controlled using the analyzer attribute).
override (optional, defaults to false) | If there is another definition with the same mapping name, controls whether it will be overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
reverse (optional, defaults to no) | The meta-data will have its value reversed. Possible values are no (no reverse will happen), string (the reverse will happen and the value stored will be a reversed string), and reader (a special reader will wrap the string and reverse it). The reader option is more performant, but the store and index settings will be discarded.
value-converter (optional, defaults to Compass SimpleJsonValueConverter) | The global converter lookup name registered with the configuration. This is the converter used to convert the actual value of the json-property. It acts as a convenient extension point for custom value converter implementations (for example, date formatters). SimpleJsonValueConverter will usually act as a base class for such extensions. The value can also reference one of the Compass built-in converters, such as int (in this case, the format attribute can also be used).
converter (optional) | The global converter lookup name registered with the configuration. The converter is responsible for converting the json-property mapping.
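To give an intuition for the reverse attribute: a common reason for indexing a reversed value is that an "ends with" lookup on the original text becomes a "starts with" lookup on the reversed text. The following plain Java sketch demonstrates only that idea; it is not how Compass implements the attribute, and ReverseSketch and its methods are hypothetical names.

```java
/** Illustrative model only (not Compass code): a suffix match expressed as a prefix match. */
final class ReverseSketch {

    /** Reverses the characters of the given value. */
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    /** Checks "original value ends with suffix" using only the reversed form of the value. */
    static boolean endsWith(String reversedStoredValue, String suffix) {
        return reversedStoredValue.startsWith(reverse(suffix));
    }

    public static void main(String[] args) {
        String stored = reverse("Mary Lebow"); // what a reversed property would hold
        System.out.println(stored);
        System.out.println(endsWith(stored, "bow"));  // suffix of the original value
        System.out.println(endsWith(stored, "Mary")); // prefix of the original value, not a suffix
    }
}
```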

8.5.4. json-object

Maps to an embedded JSON object.

<json-object
    name="the name of the json object element"
    converter="optional converter lookup name"
    dynamic="false|true"
    dynamic-naming-type="plain|full"
/>
    (json-property|json-array|json-object)*

Table 8.4. json-object mapping

Attribute | Description
name | The name of the json object element. Not required when mapping a json-object within a json-array.
dynamic (optional, defaults to false) | Whether unmapped json elements should be added to the search engine automatically (and recursively).

8.5.5. json-array

Maps to an embedded JSON array.

<json-array
    name="the name of the json array element"
    index-name="optional, the name the internal mapping will be stored under"
    converter="optional converter lookup name"
    dynamic="false|true"
    dynamic-naming-type="plain|full"
/>
    (json-property|json-array|json-object)*

Table 8.5. json-array mapping

Attribute | Description
name | The name of the json array element.
index-name | The name that the json array internal mapping will be stored under. Note that when using a json array, there is no need to name its internal element; its name is controlled by the json-array name/index-name.
dynamic (optional, defaults to false) | Whether unmapped json elements should be added to the search engine automatically (and recursively).

8.5.6. json-content

Maps the actual JSON string into a resource property to be stored in the search engine.

<json-content
    name="the name to store the json content under"
    store="yes|compress"
    converter="optional converter lookup name"
/>

Table 8.6. json-content mapping

Attribute | Description
name | The name to store the JSON string under in the resource.
store | How the JSON content will be stored. yes for plain storing, compress for compressed storing.

8.5.7. json-boost

Declares a dynamic boost mapping: the json-boost element names the json element whose value controls the boost level.

<json-boost
    name="the json element that holds the boost value"
    default="the boost default value when no property value is present"
    converter="converter lookup name"
/>

Table 8.7. json-boost mapping

Attribute | Description
name | The name of the json element whose value will be used as the boost value.
default (optional, defaults to 1.0) | The default boost value if no value is found.

8.5.8. json-analyzer

Declares an analyzer controller property: the json-analyzer element names the json element whose value selects the analyzer.

<json-analyzer
    name="the json element that holds the analyzer value"
    null-analyzer="analyzer name if value is null"
    converter="converter lookup name"
/>

Table 8.8. json-analyzer mapping

Attribute | Description
name | The name of the json element whose value will be used as the analyzer lookup value.
null-analyzer (optional, defaults to error in case of a null value) | The name of the analyzer that will be used if the property has a null value.

The json-analyzer mapping controls the analyzer that will be used when indexing the JsonObject. If the mapping is defined, it will override the analyzer attribute setting of the json object mapping.

For example, suppose Compass is configured with two additional analyzers, one called an1 (with settings in the form of compass.engine.analyzer.an1.*) and another called an2. The values that the json element can hold are then: default (which is an internal Compass analyzer that can be configured as well), an1, and an2. If the json element may have a null value, and that is acceptable for the application, a null-analyzer can be configured that will be used in that case. If the json element has a value but there is no matching analyzer, an exception will be thrown.
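The lookup rules described above can be summarized in a short plain Java sketch. It is a model of the described behavior only, not Compass code: AnalyzerLookupSketch and resolveAnalyzer are hypothetical names, and the exception types are this sketch's own choice, since the text only states that an error or exception occurs.

```java
import java.util.Set;

/** Illustrative model only (not Compass code): resolving an analyzer name from a json element value. */
final class AnalyzerLookupSketch {

    /**
     * elementValue: value of the json element named by json-analyzer (may be null).
     * nullAnalyzer: value of the null-analyzer attribute, or null when it is not configured.
     * configured: names of all analyzers known to the configuration.
     */
    static String resolveAnalyzer(String elementValue, String nullAnalyzer, Set<String> configured) {
        String name = elementValue;
        if (name == null) {
            if (nullAnalyzer == null) {
                // No null-analyzer configured: a null value is an error.
                throw new IllegalStateException("analyzer element is null and no null-analyzer is set");
            }
            name = nullAnalyzer;
        }
        if (!configured.contains(name)) {
            // A value is present, but no analyzer with that name exists.
            throw new IllegalArgumentException("no analyzer named '" + name + "'");
        }
        return name;
    }

    public static void main(String[] args) {
        Set<String> configured = Set.of("default", "an1", "an2");
        System.out.println(resolveAnalyzer("an1", null, configured));  // uses the element value
        System.out.println(resolveAnalyzer(null, "an2", configured));  // falls back to the null-analyzer
    }
}
```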