Nuxeo Core Documentation

Table of Contents

29.2. Overview
29.2.1. Main goals
29.2.2. Nuxeo Core Components
29.3. Nuxeo Core Architecture
29.3.1. Model Layer (or Internal API)
29.3.2. Implementation Layer
29.3.3. Facade Layer (or Public API)
29.3.4. Deployment
29.3.5. Client Session
29.4. The Repository Model
29.4.1. Document and Schemas
29.4.2. Document Facets
29.4.3. Document Annotations
29.4.4. Document Access Control
29.4.5. Life Cycle
29.4.6. Query Engine
29.4.7. The Public API
29.4.8. Integration with Application Servers
29.5. Extension Points
29.5.1. Session Factories
29.5.2. LifeCycle Managers


This chapter targets developers who would like to use Nuxeo Core directly.

29.2. Overview

Nuxeo Core is the foundation of the Nuxeo ECM project. It defines and provides all the basic services and functionalities needed to build a complete ECM platform and applications:

  1. a repository model,

  2. schema and document type management,

  3. a query service,

  4. a security model,

  5. a document life cycle service,

  6. a flexible core event service.

Like every Nuxeo ECM component, Nuxeo Core runs on top of Nuxeo Runtime, which defines an OSGi-compatible component model.

29.2.1. Main goals

The main goals of Nuxeo Core are:

  1. to provide the common services needed to build a state-of-the-art ECM platform,

  2. to be accessible both remotely and locally (i.e., to provide a common API accessible both from a remote JVM and directly on the local one),

  3. to be deployable anywhere without any modification (through Nuxeo Runtime): in a Java EE application server like JBoss, or embedded in a desktop application like an Eclipse RCP application,

  4. to be extensible and flexible; this is inherited from Nuxeo Runtime which provides an extensible component model.

29.2.2. Nuxeo Core Components

Nuxeo Core is composed of the following components:

  1. NXCore: core ECM model, services and default implementation

  2. NXCoreAPI: defines a client API for NXCore

  3. NXCoreFacade: Java EE facade

  4. NXJCRConnector: a JCR storage backend that leverages Jackrabbit; this is the default NXCore storage backend

All these components run on top of Nuxeo Runtime.

29.3. Nuxeo Core Architecture

The Nuxeo Core top-level components all roughly follow the same style of development, which is structured in three layers:

  1. model layer,

  2. implementation layer,

  3. facade layer.

There are also a number of services used by top-level components to provide them with common functionality, such as the schema service, query service, life cycle or security. These services are simple and cannot operate on their own – they need a context to operate on. They are exposed through top-level components and may not follow the layering presented below.

29.3.1. Model Layer (or Internal API)

Top-level components provide a model (an API) that is internal to the Core – this means that it is not directly accessible from remote JVMs and should not be used directly by clients.

The model provides a generic API that defines the concepts used by the service and that may have several implementations (using different storage backends for example).

Usually this API cannot be accessed remotely since implementations may use local resources that cannot be sent over the network.

For example, the Repository model defines objects like Document, Property, Session, etc. The JCR-based implementation for the Repository model is directly wrapping JCR (Jackrabbit) nodes that cannot be detached from the local JVM and sent over the network.

29.3.2. Implementation Layer

Each service may have one or more implementations of its model. For example, the Repository service may have several implementations of the model it defines – a JCR-based implementation, an SQL-based one, or something else. The same goes for the Directory service: it defines a model that could have an SQL-based implementation or an LDAP-based one.

Implementations may use very specific resources and configuration, and are hidden by the common model defined by the service. This means that implementation-specific objects or APIs are never used directly by other Core components, they are only accessed by the implementation of the internal API.

29.3.3. Facade Layer (or Public API)

On top of their model, components usually define a facade layer that enables external clients to remotely access service implementations.

This layer is also named the Public API because it defines the API exposed to clients. Any client, local or remote, must use the public API of the component, and must not make calls to the internal API.

The main requirement of the public API is to use only serializable objects that can be sent over the network and reconstructed on the client machine.

29.3.4. Deployment

The architecture presented above makes it possible to access the Core services when the Core is running inside the same JVM as the client application (e.g., when embedded in a desktop application) but also when it is on a remote JVM (e.g., deployed as a module inside an application server). In both cases the Core services are accessed in the same way – through the public API.

29.3.4.1. Local Access

29.3.4.2. Remote Access

29.3.5. Client Session

Usually a client opens a session on a Core service through the facade and can then send requests to the Core service until it closes the session.

While a client is connected to a Core service, the latter should track the client session and restore its state (if any) at each client request. When the session is closed by the client, the Core service releases any resource held by that session.

Any data passed between the client and the Core service is serializable and so it can safely be sent over the network. In this way a client can operate identically when running on the same JVM or when running on a remote one.

29.4. The Repository Model

The repository model is the main functionality provided by the Core; it represents the very raison d'être of the Core. Most of the other Core services were written as auxiliary components to address specific needs of the repository model or to enrich it.

The repository model, as its name suggests, describes a software component for managing repositories of documents. Repositories store documents in a tree-like structure that enables grouping documents inside folders in a hierarchical manner.

Besides storage, the repository provides functionalities like:

  1. document versioning,

  2. security management,

  3. document life cycle,

  4. annotations,

  5. SQL-like query.

29.4.1. Document and Schemas

Documents are structured objects described by a set of properties. These properties may be used to store document meta-data (e.g., creation date, author, state, etc.) or the document data itself (e.g., binary or text files, attachments, etc.).

The properties that a document may have and their types and constraints are defined through several schemas.

The repository model natively supports XML Schemas to define document schemas.

A schema is therefore the way the structure and contents of a document are defined. Through schemas you can usually specify things like:

  1. what properties are allowed,

  2. the type of each property,

  3. the default value of the property, if any,

  4. the restrictions on the property values, if any,

  5. whether a property is mandatory or not.

In order to create and use documents, you first need to define their structure. For this, you have to define a document type. Then you can create instances of documents of this type.

In some ways, document types and schemas are similar to Java classes and interfaces. A document type may implement some schemas in the same way that Java classes implement interfaces, and a document type can extend another document type in the same way that a Java class can extend another class.

Document types define one or more schemas that the document structure must satisfy, plus some extra properties such as facets, which will be discussed later.

In conclusion, the unit of work in a repository is the document. To create a new document, you must specify the document type and a path. You can either use existing document types or register new types as we will see in the Extension Points section.
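To make this concrete, here is a sketch of what a document type contribution can look like. The Note type, its schema names and the facet shown are illustrative assumptions; only the extension point target and general shape follow the core type service, and the exact syntax may differ in your version:

```xml
<!-- Hypothetical doctype contribution (names are examples, not shipped types). -->
<extension target="org.nuxeo.ecm.core.schema.TypeService" point="doctype">
  <doctype name="Note" extends="Document">
    <!-- Schemas this type implements, like interfaces in Java -->
    <schema name="common"/>
    <schema name="note"/>
    <!-- Behavioral capability, discussed in the Facets section -->
    <facet name="Versionable"/>
  </doctype>
</extension>
```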

For more information on document types and schemas see the section XXX.

29.4.2. Document Facets

A Facet is a behavioral property of a document. As schemas define the document structure and content, facets are used to describe behaviors or capabilities of a document.

For now, facets are simple strings attached to a document type to specify a capability for documents of that type. In the future, facets may evolve to more complex structures, for example to dynamically provide interfaces to manipulate documents according to a capability they offer.

Currently the Core defines two facets:

  1. Folderish: adds folder capability to a document, so that it can have zero or more children documents,

  2. Versionable: adds versioning capabilities to a document.

29.4.3. Document Annotations

As we've seen, a document's structure and content are strictly defined by the schemas its type implements. But there are many situations where some application-specific data needs to be dynamically attached to the document and retrieved later without having to modify the document schemas.

This is very useful for repository extensions that need to store placeful (i.e., location-sensitive) information on a document – information that cannot be specified by any document schema since its type is not necessarily known in advance.

Annotations are not required to be stored in the same data storage as the document itself. For example, one may choose to store documents in a Jackrabbit-based repository and annotations in a dedicated SQL database.

These annotations usually keep some internal state or data about the document. For example, a tool that may use annotations is the workflow service.

29.4.4. Document Access Control

Usually, manipulating documents requires a set of privileges to be granted to the current user. Privileges given to a user over a document are very dependent on the current context and on the document itself.

Usually, privileges depend on:

  1. the document location (i.e., privileges are placeful),

  2. the access rules defined on document parents in the hierarchy,

  3. the document state,

  4. and generally, on any rule defined at a particular location among the document's parents.

Privileges are a standard example of extra information that needs to be stored on the document in a placeful manner, so they may seem a perfect candidate for the annotation service.

But since privileges are very dynamic and may require expensive computations on every document that is accessed, a separate Security Service exists to manage the storage as it sees fit - and not necessarily through annotations on the document. This is more efficient from a performance point of view.

In the following subsections, we will see what type of information is stored on the document to enforce security and how security checks are done. To ease comprehension of security concepts and evaluation, we will begin the presentation from the smallest unit of security information to the largest one that is stored at the document level.

29.4.4.1. Access Control Entry (ACE)

This is the smallest unit specifying a security rule. It is a very simple object containing three fields:

  1. principal: an authenticated entity. For example the user that opened the session on the repository is a principal – but a principal may also be a group of users.

  2. permission: the kind of action that may be granted or denied for a principal. This may also be a group of permissions. This corresponds to the Java concept of privilege.

  3. granting: specifies whether the given permission is granted or denied to the given principal.

Examples:

  1. DENY, John, Read: an access entry specifying that reading is denied for the principal John.

  2. GRANT, Developers, Drink: an access entry specifying that drinking is granted for any principal in the Developers group.

29.4.4.2. Access Control List (ACL)

An ACL is an ordered list of ACEs, i.e. an ordered set of access rules. Why ordered? Because order matters when evaluating access rules: evaluation stops on the first DENY or GRANT rule that matches the checked criteria.

Here is a simple example showing how ordering may influence the security checks. Suppose that we have a principal John that belongs to the Readers group, and an ACL that contains the following two ACEs:

  1. DENY, John, Read

  2. GRANT, Readers, Read

Suppose we want to check whether the principal John is granted reading. Every entry in the ACL is checked (in the order they were defined) and if an entry matches the security check, the evaluation stops. Using the example above, John will be denied reading even though he is a member of the Readers group. But if you swap the order of the ACEs in the ACL, John will be granted reading.
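The ordered evaluation described above can be sketched in plain Java. The ACE record and the check helper below are illustrative, not the actual Nuxeo security classes:

```java
import java.util.List;
import java.util.Set;

public class AclDemo {
    // An ACE: granting flag, principal, permission.
    record ACE(boolean granted, String principal, String permission) {}

    // Walk the ACL in order; the first matching ACE decides the outcome.
    // Returns null when no entry matches (caller falls back to a default DENY).
    static Boolean check(List<ACE> acl, Set<String> principals, String permission) {
        for (ACE ace : acl) {
            if (principals.contains(ace.principal()) && ace.permission().equals(permission)) {
                return ace.granted();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // John belongs to the Readers group.
        Set<String> john = Set.of("John", "Readers");

        List<ACE> acl = List.of(
            new ACE(false, "John", "Read"),     // DENY, John, Read
            new ACE(true, "Readers", "Read"));  // GRANT, Readers, Read
        System.out.println(check(acl, john, "Read"));     // the DENY matches first

        List<ACE> swapped = List.of(
            new ACE(true, "Readers", "Read"),
            new ACE(false, "John", "Read"));
        System.out.println(check(swapped, john, "Read")); // the GRANT now matches first
    }
}
```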

29.4.4.3. Access Control Policy (ACP)

An ACP is an ordered list of ACLs. Each ACL stored in the ACP is uniquely identified by a name. The ordering is important when security is checked – ACLs at the beginning of the list will be checked first.

The ACP is the object containing the security information that is attached to a document.

Note that ACLs are inherited, so a document inherits any ACLs defined on its parents in the hierarchy. Inherited ACLs are evaluated after the local ACLs, from the nearest parent to the most distant one.

You may wonder why an ACP contains several ACLs, and what the ACL names are for. In a typical situation where security information may only be changed by an administrator through a user interface, a single ACL is enough.

But a complex application may have complex rules to set privileges according to the current document state or context. This is the case for a workflow engine which may decide to revoke or grant privileges depending on the document state or the context.

This means that access rules are changed not only by administrators but also by services like the workflow. To avoid collisions, every tool that needs to change access rules may use its own (named) ACL for setting these rules. If the workflow service considers that its rules are more important than the ones explicitly set by the administrator, it simply places its ACL before the one reserved for the administrator so that it will be evaluated first.

Currently there are two predefined ACLs:

  1. local: the local ACL

    The local ACL is the only ACL an administrator may explicitly change through the User Interface.

  2. inherited: the inherited ACL

    This ACL is computed each time a security check is performed (unless caching is used). The inherited ACL is the ACL obtained by merging all existing ACLs on the document's hierarchy. This ACL is appended to the ACL list, so it will be evaluated last.

So from a simple security unit like the ACE we end up with a sophisticated structure like inheritable ACPs.

These use cases are not artificial, they are real use cases that a mature ECM product should satisfy.

29.4.4.4. Evaluating Privileges

The evaluation mechanism has been described above. Here is an example of how an evaluation is done.

Let's say the principal John is trying to edit the document D. Editing a document requires the Write permission. Suppose the document D has the path /A/B/C/D – it is a child of the document C which is a child of the document B which is the child of the document A.

To decide if the principal John can edit this document the following steps are taken:

  1. The merged ACP for the document D is computed. This ACP is the local ACP set on the document D merged with all parent ACPs. ACLs imported from the parents are appended to the local ACLs so that they will be evaluated last.

  2. Each ACL is evaluated with respect to the order defined by the ACP.

  3. Each ACE is evaluated with respect to the order defined by the ACL.

  4. If an ACE matches a security rule regarding the principal John (or a group to which he belongs) and the permission Write (or a permission group to which Write belongs), then the evaluation ends and the access right of the matching ACE is returned.

  5. If no matching ACE is found, the privilege is denied.

29.4.5. Life Cycle

Within organizations, documents are often regulated. At a given time, a document has a state or is within a phase. The way the document transitions in compliance with regulations from one state to another (or from one phase to another) is in most of the cases defined and managed by business processes or workflows.

Nuxeo Core itself doesn't embed a workflow engine or a BPM engine as such. It only provides a generic way to define document life cycles, the way the document properties related to the life cycle are stored, and a way to specify, at deployment time, which document types follow which life cycles.

Thus, a workflow engine deployed along with Nuxeo Core will leverage the API exposed by Nuxeo Core to set the life cycle properties.

The life cycle APIs defined in Nuxeo Core are highly inspired by the JSR-283 specification, which was still in draft state at the time of writing this document.

Another advantage of such a design is that the life cycle state of a document is independent of the application (i.e., of workflow variables): it is embedded within the document itself at storage time, and thus is exported along with the document properties.

Nuxeo provides a BPM engine that knows how to leverage the Nuxeo Core life cycle API. See http://www.nuxeo.org.

29.4.5.1. Example of document life cycle

Here is a typical lifecycle schema example: XXX TODO.

29.4.5.2. Life cycle definition

Nuxeo Core allows one to define life cycles using extension points. (See the Nuxeo Runtime documentation for more information about extension points.) You will find at the end of this document the complete list of extension points defined by the Core, including an example of a life cycle definition using the life cycle definition extension point.

The life cycle model defined by Nuxeo Core is a simple stateful, state-transition engine, including the following elements:

  1. Life cycle definition

  2. Life cycle state definition

  3. Life cycle state transition definition

Again, no policy regarding transitions is specified here. The workflow or BPM engine will deal with this, for the following reasons:

  1. It gives more flexibility regarding the policy that needs to be applied to documents, by letting dedicated BPM engines deal with that. Thus it is possible to choose which workflow engine to use for your application (see NXWorkflow).

  2. The current JCR specification doesn't include a default policy model regarding life cycle, so it appears logical not to include one ourselves at this layer of the architecture.

  3. It simplifies the model

It is important to note that the life cycle definition is fully independent from the document types themselves, which allows the reuse of a life cycle for different document types.
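A life cycle definition with its states and transitions could be contributed roughly as follows. This is a hedged sketch: the "default" life cycle with project/approved/obsoleted states is a plausible example, and the element names follow the life cycle extension point of the era but may differ in your version:

```xml
<!-- Hypothetical life cycle definition contribution. -->
<extension target="org.nuxeo.ecm.core.lifecycle.LifeCycleService" point="lifecycle">
  <lifecycle name="default" lifecyclemanager="jcrlifecyclemanager" initial="project">
    <transitions>
      <transition name="approve" destinationState="approved"/>
      <transition name="obsolete" destinationState="obsoleted"/>
    </transitions>
    <states>
      <state name="project">
        <!-- Transitions allowed from this state -->
        <transitions>
          <transition>approve</transition>
          <transition>obsolete</transition>
        </transitions>
      </state>
      <state name="approved"/>
      <state name="obsoleted"/>
    </states>
  </lifecycle>
</extension>
```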

29.4.5.3. Life Cycle Manager

The life cycle manager is responsible for the storage of the life cycle related properties. One could store the life cycle property within the JCR, which is what the default implementation provided by NXJCRConnector does, or store it in a separate RDBMS apart from the content storage.

Because of this, Nuxeo provides an abstraction for this storage, allowing one to define a life cycle manager per life cycle definition.

The life cycle manager interface exposed by Nuxeo Core is fairly simple: it basically only specifies how to store and retrieve the state and the life cycle policy of a given document.
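The following is an approximate sketch of that contract, with a trivial in-memory implementation for illustration. The method names and the in-memory class are assumptions; the real interface ships with NXCore (see the JCRLifeCycleManager source below for the JCR-backed implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class LifeCycleManagerDemo {
    public static void main(String[] args) {
        LifeCycleManager mgr = new InMemoryLifeCycleManager();
        mgr.setPolicy("doc-1", "default");   // which life cycle the document follows
        mgr.setState("doc-1", "project");    // the document's current state
        System.out.println(mgr.getPolicy("doc-1") + "/" + mgr.getState("doc-1"));
    }
}

// Sketch of the contract: store and retrieve the state and the life cycle
// policy of a given document.
interface LifeCycleManager {
    String getState(String docId);
    void setState(String docId, String state);
    String getPolicy(String docId);
    void setPolicy(String docId, String policy);
}

// Trivial in-memory implementation, for illustration only; the default
// implementation stores these properties in the JCR.
class InMemoryLifeCycleManager implements LifeCycleManager {
    private final Map<String, String> states = new HashMap<>();
    private final Map<String, String> policies = new HashMap<>();
    public String getState(String docId) { return states.get(docId); }
    public void setState(String docId, String state) { states.put(docId, state); }
    public String getPolicy(String docId) { return policies.get(docId); }
    public void setPolicy(String docId, String policy) { policies.put(docId, policy); }
}
```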

For an example of JCR storage, see the JCRLifeCycleManager definition:

http://fisheye.nuxeo.org/browse/~raw,r=4233/nuxeo/ECMPlatform/NXJCRConnector/trunk/src/org/nuxeo/ecm/core/jcr/JCRLifeCycleManager.java

Note that this is how the current JSR-283 specification specifies life cycle storage on the repository side.

You can register your own life cycle managers using the lifecyclemanager extension point defined on the Nuxeo Core side. See the extension points chapter of this document for an example.

29.4.5.4. Document types to life cycles mapping definition

Once your life cycle definitions are in place and you have specified the life cycle managers that will take care of the storage, you need to specify the associations between document types and life cycles.

To achieve this, Nuxeo Core defines an extension point allowing one to specify such associations independently from the document type definition. Please check the example at the end of this document.
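Such a mapping contribution can be sketched as follows. The "types" point name, the File/Note type names and the "default" life cycle name are illustrative assumptions; check the extension points chapter for the exact syntax:

```xml
<!-- Hypothetical mapping of document types to a life cycle. -->
<extension target="org.nuxeo.ecm.core.lifecycle.LifeCycleService" point="types">
  <types>
    <type name="File">default</type>
    <type name="Note">default</type>
  </types>
</extension>
```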

29.4.5.5. Core life cycle service

Nuxeo Core defines a dedicated life cycle service that is used by the Nuxeo Core internals. This service is not exposed at the facade layer because it is not needed there. It directly manipulates the repository documents themselves (not references), and is thus not suitable for remoting purposes.

However, the document model has been extended so that you can invoke this service indirectly through the document session at the facade layer. See the next chapter for an overview of the API.

This service is defined under the namespace org.nuxeo.ecm.core.lifecycle.LifeCycleService.

29.4.5.6. The life cycle document API and the exposure at the facade layer

The document model exposes a life cycle related API, which you can use directly from the document itself when working at the core level.
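The shape of that API can be sketched as below. The interface follows the life cycle methods of the era's document model (current state, policy, allowed transitions, following a transition), but the exact method names are assumptions; the toy document driven by a fixed transition table exists only to make the sketch runnable:

```java
import java.util.List;
import java.util.Map;

public class LifeCycleApiDemo {
    // Sketch of the life-cycle-related methods exposed on documents;
    // verify the exact names against your Nuxeo version.
    interface LifeCycleAware {
        String getCurrentLifeCycleState();
        String getLifeCyclePolicy();
        List<String> getAllowedStateTransitions();
        boolean followTransition(String transition);
    }

    // Toy document driven by a fixed transition table, for illustration only.
    static class ToyDocument implements LifeCycleAware {
        private String state = "project";
        private static final Map<String, Map<String, String>> TRANSITIONS =
            Map.of("project", Map.of("approve", "approved", "obsolete", "obsoleted"));

        public String getCurrentLifeCycleState() { return state; }
        public String getLifeCyclePolicy() { return "default"; }
        public List<String> getAllowedStateTransitions() {
            return List.copyOf(TRANSITIONS.getOrDefault(state, Map.of()).keySet());
        }
        public boolean followTransition(String transition) {
            String next = TRANSITIONS.getOrDefault(state, Map.of()).get(transition);
            if (next == null) return false; // transition not allowed from this state
            state = next;
            return true;
        }
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        System.out.println(doc.getCurrentLifeCycleState());
        doc.followTransition("approve");
        System.out.println(doc.getCurrentLifeCycleState());
    }
}
```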

29.4.5.7. Core events and listeners

Nuxeo Core defines a service dedicated to core events. This service is responsible only for core events and allows third-party code to register listeners that get notified when events occur (and that can take specific actions themselves).

At this level, this service doesn't rely on an event service such as JMS or the NXRuntime event service, because it needs to be really fast at event processing so as not to degrade repository performance.

By using event listener extensions, you can hook up and bridge to other synchronous or asynchronous messaging systems. Let's take some examples.

  1. Nuxeo Core defines a bridge to Nuxeo Runtime, forwarding events to the NXRuntime event service in an asynchronous way. It thereby defines a local event loop shared by all components running on top of NXRuntime.

  2. The NXEvents component, not part of Nuxeo Core, registers a JMS listener bridging Nuxeo Core events to a dedicated JMS topic. It allows message-driven beans in the Nuxeo Enterprise Platform to receive the Nuxeo Core events (for instance NXAudit).

You can define whatever listeners you need to forward the Nuxeo Core events to an external messaging system. See the end of this document for an example of such a registration.
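The synchronous dispatch that makes listener speed so important can be sketched as follows. These are not the actual Nuxeo classes, just an illustration of why a slow listener would directly slow down every repository operation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CoreEventDemo {
    // A core event carries at least an event id and the document it concerns.
    record CoreEvent(String id, String docPath) {}

    static final List<Consumer<CoreEvent>> listeners = new ArrayList<>();

    // Listeners run in-line: each one completes before the repository
    // operation that fired the event returns.
    static void fireEvent(CoreEvent event) {
        for (Consumer<CoreEvent> listener : listeners) {
            listener.accept(event);
        }
    }

    public static void main(String[] args) {
        // A real listener could bridge to JMS or the NXRuntime event service here,
        // handing the event off asynchronously to stay fast.
        listeners.add(e -> System.out.println(e.id() + " " + e.docPath()));
        fireEvent(new CoreEvent("documentCreated", "/workspace/report"));
    }
}
```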

29.4.6. Query Engine

The query engine is designed to provide an SQL-like language, called NXQL, to perform document and directory queries.

NXQL offers standard SQL functionality to search records, but can also take advantage of the hierarchical nature of the content repository to provide path-based searches. NXQL is used as the uniform query syntax to access several kinds of repositories. The query engine itself must process and optimize the query, and dispatch it to the different backends and tables that are referenced in the query.
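For a flavor of the syntax, here are two hedged NXQL samples. The dc:title property and the ecm:path / STARTSWITH path construct are illustrative; refer to the NXQL documentation for the exact supported properties and operators:

```sql
-- Standard SQL-like search on a document property:
SELECT * FROM Document WHERE dc:title = 'Annual Report'

-- Path-based search leveraging the hierarchical repository:
SELECT * FROM Document WHERE ecm:path STARTSWITH '/workspaces/projects'
```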

Updates or creation statements are not covered and must be performed through the repository API.

For more information about the query engine, refer to the document about NXQL.

29.4.7. The Public API

As we've seen the internal repository model is not remotely accessible. Because the Nuxeo Core deployment model requires supporting both local and remote clients, the APIs are separated between an internal API and a Public API, designed to fulfill the deployment needs. Any client should use the public API to connect to a Nuxeo Repository.

This public API has only one limitation: any object transferred between the client and the core must be serializable. This way it can be sent over the network and restored on the client side.

So the public API is in fact a serializable view of the repository model. This has a performance drawback compared to the internal API, since it must transform any model object (like a Document) into a serializable form, but it has the benefit of being totally independent from the JVM where the Core runs.

The main interfaces composing the public API are:

  1. DocumentModel: the serializable view of a Document.

  2. DataModel: the serializable view of a document subpart described by a schema.

  3. CoreSession: a session to the Core repository.

  4. CoreInstance: the gateway to the Core. It uses session factories to create new sessions (connections) to the Core.

29.4.7.1. DocumentModel

The document model is a data object that completely describes a document. You can see it as a serializable view of a document.

Apart from being a data object, this object also provides some logic. For example, a document model is able to lazily load data from the storage if not already loaded, or to check permissions for a given user on the document it represents.

The data contained by a document model is grouped in DataModel objects.

For each document schema, there is a DataModel that contains concrete data as specified by the corresponding schema. You can see a DataModel as a data object described by a schema (i.e. a schema instance).

A document also contains data that is not defined by schemas, like its internal ID, its name, its parent, etc. Thus, apart from these data models, some information is stored as members on the document model, like the document ID, the document name, a reference to the parent document, the ACP information (used for security checks), the session ID, etc.

The document model also contains the list of facets that the document type defines.

One of the most important abilities of the document model is to lazily load data the first time it is required by the client. This feature is important because a document may contain many schemas and fields, and it would be a performance problem to load all this data from the storage each time a document model is created.

Usually, the client application uses only a few DataModel fields like the Title, Description, CreationDate, etc. These are the fields commonly displayed by a tree-like explorer of the repository.

When the client displays or edits the document properties, the document model loads the missing data models.

To achieve this, some schemas or fields are declared to be lazily loaded. When creating a document model from a document, only the non-lazy schemas and fields are fetched from the storage. For example, a blob field will always be lazy.

29.4.7.2. DataModel

As detailed above, the data model is an object containing the concrete data for a document schema.

Each data model is described by the schema name and the map of fields. The data model contains no logic, it is a pure data object.

Apart from the fields map, the data model contains information about dirty fields (fields that have been modified by the client), so that when saving changes to the repository only modified fields are saved.
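Dirty-field tracking can be sketched like this. The class below is an illustration of the mechanism, not the actual Nuxeo DataModel implementation, and the dirtySnapshot helper is an invented name:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DataModelDemo {
    // Sketch: a schema name, a map of fields, and the set of dirty fields.
    static class DataModel {
        private final String schema;
        private final Map<String, Object> fields = new HashMap<>();
        private final Set<String> dirtyFields = new HashSet<>();

        DataModel(String schema, Map<String, Object> initial) {
            this.schema = schema;
            fields.putAll(initial);
        }

        void setData(String field, Object value) {
            fields.put(field, value);
            dirtyFields.add(field); // remember what the client modified
        }

        // Only modified fields would be written back to the repository on save.
        Map<String, Object> dirtySnapshot() {
            Map<String, Object> dirty = new HashMap<>();
            for (String f : dirtyFields) {
                dirty.put(f, fields.get(f));
            }
            return dirty;
        }
    }

    public static void main(String[] args) {
        DataModel dm = new DataModel("dublincore",
            Map.of("title", "Draft", "description", "First version"));
        dm.setData("title", "Final");
        System.out.println(dm.dirtySnapshot()); // only the modified field
    }
}
```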

29.4.7.3. CoreSession

The CoreSession is a session to the Nuxeo Core. The session is opened and closed by a client and gives the client the possibility to interact with the Core.

The Core a session connects to can be located in a separate JVM or in the current one. To create remote or local sessions, you need to use a specific CoreSessionFactory object. These objects are usually specified using extension points, but you can also use them programmatically.

After creating a session, you can begin to retrieve and modify documents through the API exposed by the CoreSession interface.

Example of creating and using a session:
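The following is a hedged sketch of such a session. The class and method names (CoreInstance, CoreSession, DocumentModelImpl, the "default" repository name and the "Note" type) follow the public API of the era but may differ in your version, and this code only runs inside a deployed Nuxeo/NXRuntime environment:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.nuxeo.ecm.core.api.CoreInstance;
import org.nuxeo.ecm.core.api.CoreSession;
import org.nuxeo.ecm.core.api.DocumentModel;
import org.nuxeo.ecm.core.api.impl.DocumentModelImpl;

public class SessionExample {
    public void createNote() throws Exception {
        Map<String, Serializable> context = new HashMap<>();
        context.put("username", "Administrator");

        // Open a session on the repository named "default"; the returned
        // session may be local or remote depending on the registered factory.
        CoreSession session = CoreInstance.getInstance().open("default", context);
        try {
            DocumentModel root = session.getRootDocument();
            DocumentModel doc =
                new DocumentModelImpl(root.getPathAsString(), "note1", "Note");
            doc = session.createDocument(doc);
            doc.setProperty("dublincore", "title", "My first note");
            session.saveDocument(doc);
            session.save(); // commit pending changes to the repository
        } finally {
            CoreInstance.getInstance().close(session);
        }
    }
}
```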

29.4.7.4. CoreInstance

This is the gateway to a Core instance. As mentioned above, the Core may be located in a remote JVM. The CoreInstance uses CoreSessionFactory objects (declared through extension points) to connect to a Core instance and to create a session.

29.4.8. Integration with Application Servers

The repository is plugged into an application server using a resource adapter, as specified by the J2EE Connector Architecture (JCA).

The resource adapter is written over the repository model, so it does not depend on the repository implementation (for example, Jackrabbit).

Currently the resource adapter has been tested only on JBoss AS.

The resource adapter enables the repository to take part in transactions managed by the application server.

29.5. Extension Points

This section aims to cover all existing extension points defined by core components and to give some examples of creating new extensions.

29.5.1. Session Factories

Declaring component: org.nuxeo.ecm.core.api.CoreService

Extension point name: sessionFactory

This extension point is for registering new session factories. Session factories are used to create new Core sessions.

Currently two session factories are provided:

  1. a local session factory – creates sessions to a local Core (running in the same JVM as the client)

  2. a remote session factory – creates sessions to a remote Core (running in a remote application server)
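A contribution to this extension point might look like the sketch below. The factory class name shown is a hypothetical placeholder; only the declaring component and point name come from the text above:

```xml
<!-- Hypothetical session factory registration. -->
<extension target="org.nuxeo.ecm.core.api.CoreService" point="sessionFactory">
  <factory class="org.example.ecm.LocalSessionFactory"/>
</extension>
```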


Example 29.1. Example Title XXX


29.5.2. LifeCycle Managers

Declaring component: org.nuxeo.ecm.core.lifecycle.LifeCycleService

Extension point name: lifecyclemanager

This extension point is for registering new life cycle managers. A life cycle manager is responsible for managing and storing document life cycle information.
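A contribution registering the JCR-based manager mentioned earlier might be sketched as follows. The declaring component, point name and JCRLifeCycleManager class come from this document; the descriptor element and its attributes are assumptions:

```xml
<!-- Hypothetical life cycle manager registration. -->
<extension target="org.nuxeo.ecm.core.lifecycle.LifeCycleService" point="lifecyclemanager">
  <lifecyclemanager name="jcrlifecyclemanager"
                    class="org.nuxeo.ecm.core.jcr.JCRLifeCycleManager"/>
</extension>
```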


Example 29.2. Example Title XXX