Frequently Asked Questions

A.1. Deployment and Operations

Frequently Asked Questions: Nuxeo EP Deployment and Operations

A.1.1. Operation Platforms
Q: Which platforms are supported/certified (hardware, OS, RDBMS, application server)?
Q: Which is the reference platform used for development, deployment and testing? Which is the reference platform recommended by the software vendor?
Q: What are the minimum requirements for the software installation (CPU, memory, available disk space, etc.)?
A.1.2. Available Documentation
Q: Are Release Notes available alongside each software release (including fixed bugs, improvements, known issues and limitations)?
Q: Is there an Installation Guide (including migration procedures between software releases)?
Q: Is there an Operation Guide?
Q: Is there a User Guide?
Q: Is there a Developer Guide?
Q: Is there a Reference Manual?
Q: Are there other documents?
Q: What is the reference language for the documentation? Into which languages is this documentation translated?
Q: Does Nuxeo EP's reference documentation include the documentation of software packages bundled with it (such as Lucene, Jena, etc.)? If not included, how is the documentation of bundled software accessible?
A.1.3. Upgrades
Q: What is the release policy of Nuxeo (major / minor, release cycle, etc.)?
Q: What configuration steps are required to set up a system ready for operations? A complete procedure is expected.
Q: How can Nuxeo EP (and applications built on it) be monitored using common monitoring tools: processes to monitor, alert levels, scenarios, etc.?
A.1.4. Service Continuity
Q: Does Nuxeo EP work with Sun Cluster 3.x (and is it certified for it)? Is an HA agent available? Are there references in operation?
Q: Is it possible to set up Nuxeo EP so that it can be bound to a dedicated virtual IP address (and only answer requests on this one), separated from the physical node IP address? Since we do not have a cluster for testing, this could be tested using a virtual/alias interface using a dedicated IP.
Q: Is Nuxeo EP (and related software packages) absolutely independent from the hostname of the physical cluster node? This could be tested/validated by changing the hostname of the test server and restarting the system.
Q: Fail-over mode: automatic or manual? Is it transparent for the end-user? What are the other requirements (ex: switches, load-balancers, etc.)? If data replication is required: what are the replication mechanisms and what about the risk of data integrity loss / desync?
Q: How can Nuxeo EP enforce data integrity after a software or hardware crash on a node hosting a Nuxeo Content Repository (since it uses an RDBMS database and a filesystem storage that can even be stored on different nodes)?
Q: Is there any procedure for data integrity check and/or repair of inconsistency, if it appears?
Q: If a repository is corrupted, what is the maximal impact on the service's operations for end-users? How are errors due to data corruption reported (ex: logs, notifications, etc.)?
A.1.5. Backup & Restore
Q: Does Nuxeo EP offer applicative incremental backup and restore features? If not, is there any third party software or component that offers this?
Q: What is the measured performance of backup and restore tools (processed content objects by time unit, backup time / restore time ratio, impact on application)?
Q: What is the procedure to achieve a consistent backup / restore of a repository? What are the measured performances? Impacts on service / operations?
Q: What is the procedure to restore one content object (files + metadata)? Likewise for a set of content objects (ex: all documents for a given user, all documents from a folder, all documents matching a set of criteria)?
Q: How can a job scheduling / management service (such as cron, anacron, CA Unicenter Autosys Job Management, etc.) manage backup procedures? Are there any supported scripts?

A.1.1. Operation Platforms

Q:

Which platforms are supported/certified (hardware, OS, RDBMS, application server)?

A:

Nuxeo supports and certifies the following hardware and software:

Hardware:

  • Intel/AMD 32-bit & 64-bit

  • SPARC 32-bit & 64-bit

Operating Systems:

  • RedHat 3.x, 4.x, 5.x

  • Debian 4.0, Ubuntu Server 6.06 LTS and 7.04

  • Solaris 10

  • Windows Server 2003

  • MacOS X 10.4.x

RDBMS:

  • PostgreSQL 8.x

  • MySQL 5.x

  • Oracle Database 9i, Oracle Database 10g

Java Runtime Environment (JRE):

  • Java 5 aka 1.5.0 (update 11 recommended)

  • Java 6 aka 1.6.0 (update 11 recommended)

Java EE application servers:

  • JBoss AS 4.0.4 GA and 4.0.5 GA

  • JBoss AS 4.2.0 GA (in progress)

  • Glassfish v2 (in progress)

  • BEA WebLogic 10 (in progress)

Q:

Which is the reference platform used for development, deployment and testing? Which is the reference platform recommended by the software vendor?

A:

The most commonly used configuration is JBoss AS 4.0.4 GA using JRE 1.5.0_11 on RedHat AS 4.x running on Intel x86 hardware.

Q:

What are the minimum requirements for the software installation (CPU, memory, available disk space, etc.)?

A:

Intel-based hardware (bi-Dual Core, Quad Core or bi-Quad Core) with 4GB of RAM. The required disk space depends only on the volume of data to store; as a rough safe estimate, plan for twice the total size of the files to manage.
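
As a rough illustration of the sizing rule above, the following Python sketch doubles the total size of the files to manage. The function name and the safety_factor parameter are illustrative assumptions, not part of Nuxeo EP.

```python
def required_disk_space(file_sizes_bytes, safety_factor=2):
    """Estimate disk space as total file size times a safety factor.

    The factor of 2 comes from the rule of thumb above; the function
    itself is a hypothetical helper, not a Nuxeo EP tool.
    """
    return sum(file_sizes_bytes) * safety_factor


GIB = 1024 ** 3
# Example: 150 GiB of documents to manage, split over two file sets.
estimate = required_disk_space([100 * GIB, 50 * GIB])
print(estimate // GIB, "GiB")
```

The estimate covers stored content only; space for the operating system, the application server and logs comes on top of it.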

A.1.2. Available Documentation

Q:

Are Release Notes available alongside each software release (including fixed bugs, improvements, known issues and limitations)?

A:

Each release (major and minor) is delivered with an upgrade procedure, new features list, improvements list, fixed bugs list and known bugs/limitations list. Moreover the issue tracker is public (it allows everybody to see the status of the software, known/ongoing bugs and issues, features/improvement roadmap, etc.).

Q:

Is there an Installation Guide (including migration procedures between software releases)?

A:

An installation guide is available. Upgrade procedures are delivered with each release.

Q:

Is there an Operation Guide?

A:

An administration guide is available and updated with each release.

Q:

Is there a User Guide?

A:

A User Guide is delivered with each release.

Q:

Is there a Developer Guide?

A:

A developer guide is delivered and constantly improved.

Q:

Is there a Reference Manual?

A:

The reference manual assembles all the documentation available for users, developers, operation teams, etc.

Q:

Are there other documents?

A:

Nuxeo provides several other documents/resources such as the API (Javadoc), some tutorials, specific Archetypes for Maven 2 (useful to quickly bootstrap new plugins / projects), etc.

Q:

What is the reference language for the documentation? Into which languages is this documentation translated?

A:

The documentation is available in English. Translations into French, Spanish, German and Italian are supported/provided by the community (if you require a specific language, you can order it from Nuxeo).

Q:

Does Nuxeo EP's reference documentation include the documentation of software packages bundled with it (such as Lucene, Jena, etc.)? If not included, how is the documentation of bundled software accessible?

A:

The documentation of bundled software is either included in the reference documentation, when it is useful for common operations, or linked from it otherwise. All the documentation of Nuxeo EP and bundled software packages is freely available.

A.1.3. Upgrades

Q:

What is the release policy of Nuxeo (major / minor, release cycle, etc.)?

A:

Nuxeo delivers one major release per year and one minor release per quarter. The high-level roadmap is published by Nuxeo and updated frequently. A detailed roadmap is available from the issue tracker (all details are available on each issue, such as comments, status, votes, related commits in the SCM, etc.).

Q:

What configuration steps are required to set up a system ready for operations? A complete procedure is expected.

A:

See the installation procedure.

Q:

How can Nuxeo EP (and applications built on it) be monitored using common monitoring tools: processes to monitor, alert levels, scenarios, etc.?

A:

The "Administration and Operation Guide" describes the available monitoring points. In short, Nuxeo EP offers a set of JMX services to monitor all critical points of the application (the standard monitoring system for Java EE applications). Moreover, logs can be broadcast using log4j capabilities (SNMP, email, etc.). Both should be usable by all major monitoring software.

A.1.4. Service Continuity

Q:

Does Nuxeo EP work with Sun Cluster 3.x (and is it certified for it)? Is an HA agent available? Are there references in operation?

A:

Nuxeo EP is fully based on Java EE 5 and supports related clustering and HA features. JBoss Clustering is the recommended clustering and HA solution for Nuxeo EP's services. Nuxeo EP services can be configured for performance clustering and/or HA clustering (depending on the capabilities and requirements of each service).

Q:

Is it possible to set up Nuxeo EP so that it can be bound to a dedicated virtual IP address (and only answer requests on this one), separated from the physical node IP address? Since we do not have a cluster for testing, this could be tested using a virtual/alias interface using a dedicated IP.

A:

This is possible through configuration of the application server (ex: JBoss AS / Tomcat). Nuxeo EP relies on the application server for all the network configuration.
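
To illustrate what binding to a dedicated address means at the socket level, here is a minimal Python sketch. It is not Nuxeo EP or application server configuration: the listen_on helper is hypothetical, and 127.0.0.1 merely stands in for the virtual IP address. A listener bound this way only accepts connections addressed to that IP, which is the behaviour the application server is expected to provide once configured.

```python
import socket


def listen_on(address, port=0):
    """Open a TCP listener bound to one specific local address.

    Binding to a single address (instead of 0.0.0.0) means that only
    connections sent to that address are accepted. Port 0 lets the
    operating system pick a free port, which is convenient for a test.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))
    sock.listen(1)
    return sock


# 127.0.0.1 stands in for the dedicated virtual IP address.
server = listen_on("127.0.0.1")
print(server.getsockname())
server.close()
```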

Q:

Is Nuxeo EP (and related software packages) absolutely independent from the hostname of the physical cluster node? This could be tested/validated by changing the hostname of the test server and restarting the system.

A:

Nuxeo EP depends entirely on the Java EE application server for all network-related configuration. It is not bound in any way to the physical network configuration of the server. Hence it is possible to change the hostname of the server and restart the machine without causing any problem for Nuxeo EP.

Q:

Fail-over mode: automatic or manual? Is it transparent for the end-user? What are the other requirements (ex: switches, load-balancers, etc.)? If data replication is required: what are the replication mechanisms and what about the risk of data integrity loss / desync?

A:

Fail-over relies on JBoss Clustering for Nuxeo EP services. Here is the HA system used for each category of services:

  • Nuxeo Core (Content Repository): HA clustering only. It relies on the native RDBMS replication system (Oracle RAC or PostgreSQL replication solutions). The replication must be reliable and must enforce data integrity.

  • Nuxeo Search (Search Engine): HA and performance clustering. It can use a shared filesystem (if indexes are stored on the filesystem) or can rely on the RDBMS replication solution. If the indexes become corrupted, reindexing the content is sufficient to restore them.

  • EJB3-based Services: HA and performance clustering. They use native EJB3 clustering and load-balancing from JBoss Clustering. Services using data persistence rely on RDBMS replication (for HA), which must be reliable and must enforce data integrity.

  • Web Client/App: can use HA and performance clustering (using JBoss Clustering). Does not need data sync.

Q:

How can Nuxeo EP enforce data integrity after a software or hardware crash on a node hosting a Nuxeo Content Repository (since it uses an RDBMS database and a filesystem storage that can even be stored on different nodes)?

A:

To achieve the highest level of data integrity, Nuxeo recommends storing binary files as BLOBs directly in the database, and therefore using an RDBMS that offers optimized BLOB storage (such as Oracle or PostgreSQL). With this setup, Nuxeo EP can store all its data in the RDBMS (including query/search engine indexes) and relies on it to enforce data integrity. Moreover, Nuxeo EP is fully transactional and relies on JTA (with XA) for transaction management, which enforces data integrity across data sources.

Q:

Is there any procedure for data integrity check and/or repair of inconsistency, if it appears?

A:

Nuxeo EP has been designed to rely completely on RDBMS data integrity (which can be considered trustworthy nowadays). RDBMS tools can be used to check data integrity and consistency and to detect data failures, if any. If the data model is corrupted, Nuxeo EP warns about it when the repository starts. Indexes (from Nuxeo Search) can be verified and easily rebuilt by reindexing the content if any problem occurs.

Q:

If a repository is corrupted, what is the maximal impact on the service's operations for end-users? How are errors due to data corruption reported (ex: logs, notifications, etc.)?

A:

The maximal impact is service downtime and data restoration from backups. Data integrity errors are reported in the logs and can then be forwarded via email notifications, SNMP or any other log4j capability.

A.1.5. Backup & Restore

Q:

Does Nuxeo EP offer applicative incremental backup and restore features? If not, is there any third party software or component that offers this?

A:

Nuxeo EP offers an applicative data import/export service (using XML serialization of documents) that can be used as an incremental backup/restore system. For an efficient backup system, Nuxeo recommends using native RDBMS tools (that can offer incremental backups, snapshots, hot restore, etc.).

Q:

What is the measured performance of backup and restore tools (processed content objects by time unit, backup time / restore time ratio, impact on application)?

A:

Restoration speed is bounded by the native write performance of the database. No further statistics are available yet; benchmarks are in progress on this point and results should be available by July 2007.

Q:

What is the procedure to achieve a consistent backup / restore of a repository? What are the measured performances? Impacts on service / operations?

A:

When all datasources for storage are using the same database (the recommended setup), the RDBMS can achieve a consistent backup (usually at low cost for the user service). Restore can only be launched when the system is stopped.

Q:

What is the procedure to restore one content object (files + metadata)? Likewise for a set of content objects (ex: all documents for a given user, all documents from a folder, all documents matching a set of criteria)?

A:

Content object restore can be done using the import/export service. Here is the procedure to achieve this:

  1. Get the IDs of the content objects to restore using, for example, the audit service/log (ex: get all DocIds from "CreateObject" log entries for a particular user).

  2. Get those documents from a backup (made via the export service) and copy them into a directory (the standard export format uses one directory per content object, which makes this operation much easier).

  3. Use a command-line import/export client to (re-)import the documents in this directory.

  4. You're done.
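
Step 2 of the procedure above can be sketched in Python as follows. This is a minimal sketch under a stated assumption: each content object's directory in the export is taken to be named after its document ID, which is hypothetical and should be checked against the actual export layout. The stage_documents helper is illustrative and is not part of the Nuxeo EP import/export client.

```python
import shutil
from pathlib import Path


def stage_documents(export_dir, staging_dir, doc_ids):
    """Copy the export directories of the given documents into staging_dir.

    Assumes (hypothetically) that each document's directory in the export
    is named after its document ID. Returns the IDs that were not found,
    so that missing documents can be looked up in another backup.
    """
    export_dir, staging_dir = Path(export_dir), Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for doc_id in doc_ids:
        source = export_dir / doc_id
        if source.is_dir():
            # Copy the whole directory: files and metadata stay together.
            shutil.copytree(source, staging_dir / doc_id, dirs_exist_ok=True)
        else:
            missing.append(doc_id)
    return missing
```

The staging directory is then the input of step 3, where the import/export client re-imports the documents it contains.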

Q:

How can a job scheduling / management service (such as cron, anacron, CA Unicenter Autosys Job Management, etc.) manage backup procedures? Are there any supported scripts?

A:

RDBMS backups can be handled as usual, using the existing backup scripts for the RDBMS in use. Applicative backups can be launched using the import/export client CLI. There are no supported scripts at the moment (but they could easily be written).
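
As an example of what such a script could look like, here is a minimal Python sketch of a wrapper that a job scheduler could call. It is not a supported script: the run_backup helper is hypothetical, and the command to run (an RDBMS dump tool or the import/export client) is site-specific and deliberately left to the caller.

```python
import subprocess
import sys


def run_backup(command):
    """Run a backup command and return its exit status.

    `command` is a list of arguments, for example the command line of
    the RDBMS dump tool or of the import/export client. The actual
    command is site-specific and not defined here.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Forward the tool's error output so the scheduler can log it.
        sys.stderr.write(result.stderr)
    return result.returncode
```

Job schedulers generally treat a non-zero exit status as a failed job, so a wrapper of this kind would pass the returned status on to the scheduler, for instance through sys.exit.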