Author: Rob van Maris
Date: 2004-11-04
This software is OSI Certified Open Source Software. OSI Certified is a certification mark of the Open Source Initiative.
The license (Mozilla version 1.0) can be read at the MMBase site. See http://www.mmbase.org/license
The most important new concept introduced by XML Importer is merging objects.
The XML Importer code handles most of the details for you. To put it to work, you only have to provide implementations of these interfaces:
- SimilarObjectFinder
- ObjectMerger
The XML Importer provides basic implementations for both of these, but some additional work will be necessary to meet your needs.
In this document we'll have a look at some of the issues involved, and give some guidelines.
When we have populated a transaction with (access and input) objects, we can merge all objects of a given type. In order to do so, the XML Importer performs these actions:
1. Walk through the list of objects of the given type in the transaction, in the order they were added to the transaction.
2. For each such object, look for a similar object.
3. If a similar object is found, merge both objects into a single object.
4. If more than one similar object is found, the transaction cannot proceed, unless the user can choose the object to merge with.
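The walk described above can be sketched as follows. This is a minimal illustration, not the XML Importer's code: `SketchObject`, `SketchFinder` and `MergeLoopSketch` are hypothetical names, the finder only looks at objects earlier in the same transaction, and "merging" is reduced to keeping the earlier object and dropping the later one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an object in a transaction; not the MMBase class. */
class SketchObject {
    final String type;
    final String title;

    SketchObject(String type, String title) {
        this.type = type;
        this.title = title;
    }
}

/** Hypothetical, simplified finder: returns every earlier object considered similar. */
interface SketchFinder {
    List<SketchObject> findSimilar(List<SketchObject> earlier, SketchObject candidate);
}

class MergeLoopSketch {
    /** Merges all objects of the given type and returns the objects that remain. */
    static List<SketchObject> mergeType(List<SketchObject> transaction, String type,
                                        SketchFinder finder) {
        List<SketchObject> kept = new ArrayList<>();
        // Step 1: walk the objects in the order they were added.
        for (SketchObject obj : transaction) {
            if (!obj.type.equals(type)) {
                kept.add(obj); // other types are left alone
                continue;
            }
            // Step 2: look for similar objects among those seen so far.
            List<SketchObject> similar = finder.findSimilar(kept, obj);
            if (similar.isEmpty()) {
                kept.add(obj); // nothing to merge with
            } else if (similar.size() > 1) {
                // Step 4: ambiguous match; this sketch has no user to ask.
                throw new IllegalStateException(
                        similar.size() + " similar objects found for " + obj.title);
            }
            // Step 3: exactly one match; obj is merged into it and not kept itself.
        }
        return kept;
    }

    /** Small usage example: two spellings of the same movie collapse into one. */
    static int demo() {
        List<SketchObject> tx = new ArrayList<>();
        tx.add(new SketchObject("movies", "Alien"));
        tx.add(new SketchObject("persons", "Ridley Scott"));
        tx.add(new SketchObject("movies", "alien"));
        tx.add(new SketchObject("movies", "Blade Runner"));
        SketchFinder byTitle = (earlier, candidate) -> {
            List<SketchObject> hits = new ArrayList<>();
            for (SketchObject other : earlier) {
                if (other.type.equals(candidate.type)
                        && other.title.equalsIgnoreCase(candidate.title)) {
                    hits.add(other);
                }
            }
            return hits;
        };
        return mergeType(tx, "movies", byTitle).size();
    }
}
```

In `demo()`, the second "alien" is merged into the first, so three objects remain in the transaction.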
The SimilarObjectFinder is needed to implement step 2. One fairly general way to do this is implemented by BasicFinder, which distinguishes between exact matches (i.e. indistinguishable objects) and non-exact matches (i.e. objects that differ, but are considered to be the same based on some specified criteria, as in a fuzzy comparison):
1. Walk through the list of objects in the transaction that were added before this one, compare them with this object, and keep the results as a list of exact matches and a list of non-exact matches.
2. Look for exact matches in the persistent cloud.
3. If exact matches were found in step 1 or 2, these are returned as the result of the search.
4. Otherwise, look for objects in the persistent cloud that are close enough to warrant further inspection, and compare these with this object.
5. If non-exact matches were found in step 1 or 4, these are returned as the result of the search.
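The search order above can be illustrated with a small sketch. It is not BasicFinder itself: objects are reduced to plain strings, the persistent cloud is a list, and the three comparison rules (`isExact`, `isSimilar`, `isCloseEnough`) are assumptions chosen only to make the example concrete. In a real setup, steps 2 and 4 would be database queries.

```java
import java.util.ArrayList;
import java.util.List;

class FinderStrategySketch {
    /** Assumed rule: exact means identical strings. */
    static boolean isExact(String a, String b) {
        return a.equals(b);
    }

    /** Assumed rule: non-exact means equal when ignoring case and outer whitespace. */
    static boolean isSimilar(String a, String b) {
        return a.trim().equalsIgnoreCase(b.trim());
    }

    /** Assumed cheap preselection: same first character, ignoring case. */
    static boolean isCloseEnough(String a, String b) {
        String x = a.trim();
        String y = b.trim();
        return !x.isEmpty() && !y.isEmpty()
                && Character.toLowerCase(x.charAt(0)) == Character.toLowerCase(y.charAt(0));
    }

    static List<String> findSimilar(List<String> earlier, List<String> cloud, String obj) {
        List<String> exact = new ArrayList<>();
        List<String> nonExact = new ArrayList<>();
        // Step 1: compare with objects added to the transaction earlier.
        for (String other : earlier) {
            if (isExact(other, obj)) {
                exact.add(other);
            } else if (isSimilar(other, obj)) {
                nonExact.add(other);
            }
        }
        // Step 2: look for exact matches in the persistent cloud.
        for (String other : cloud) {
            if (isExact(other, obj)) {
                exact.add(other);
            }
        }
        // Step 3: exact matches win; non-exact matches are ignored.
        if (!exact.isEmpty()) {
            return exact;
        }
        // Step 4: preselect close candidates, then run the full comparison on those only.
        for (String other : cloud) {
            if (isCloseEnough(other, obj) && isSimilar(other, obj)) {
                nonExact.add(other);
            }
        }
        // Step 5: return the non-exact matches (possibly none).
        return nonExact;
    }
}
```

Note that the comparatively expensive step 4 is only reached when steps 1 and 2 produce no exact match.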
This strategy has these characteristics:
- If the transaction introduces a number of similar objects, these are merged one by one, in the order they were added to the transaction.
- If an exact match is found, the non-exact matches are ignored.
- Searching the persistent cloud for non-exact matches occurs only if no exact match is found (performance optimization).
- Searching the persistent cloud for non-exact matches is performed in two parts: selecting objects that are close enough, followed by comparing these objects with this object (performance optimization, since this reduces the number of objects to be compared).
For example implementations based on BasicFinder, see MoviesFinder and PersonsFinder in the XML Importer examples code.
The ObjectMerger is needed to implement merging two objects into a single object. In order to do so, the XML Importer performs these actions:
1. If one of the objects represents a persistent object, this object is made the merge target, i.e. the object that will hold the merge result.
2. The fields of both objects are merged - the resulting fields are set on the merge target.
3. The relations of both objects are merged - the resulting relations are set on the merge target.
4. If step 3 results in duplicate relations, the duplicating relations are deleted.
5. Of the two objects, only the merge target is retained - the object that is not the merge target is deleted.
6. If there was no similar object to merge with, this object will only be kept in the transaction if the ObjectMerger specifies so (see method isAllowedToAdd() in ObjectMerger).
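As a rough illustration of target selection and field merging in the procedure above, consider the sketch below. `MergeNode`, `FieldPolicy` and `MergeProcedureSketch` are hypothetical names and do not reflect the actual ObjectMerger interface. Two assumptions are made: when neither object is persistent, the first one becomes the merge target, and relations are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical object with a persistence flag and named fields. */
class MergeNode {
    final boolean persistent;
    final Map<String, String> fields = new LinkedHashMap<>();
    boolean deleted;

    MergeNode(boolean persistent) {
        this.persistent = persistent;
    }
}

/** Hypothetical field policy: decides the merged value of one field. */
interface FieldPolicy {
    String merge(String name, String targetValue, String otherValue);
}

class MergeProcedureSketch {
    static MergeNode merge(MergeNode a, MergeNode b, FieldPolicy policy) {
        // Prefer a persistent object as merge target (assumption: otherwise take a).
        MergeNode target = (b.persistent && !a.persistent) ? b : a;
        MergeNode other = (target == a) ? b : a;

        // Merge the fields of both objects and set the result on the target.
        Set<String> names = new LinkedHashSet<>(target.fields.keySet());
        names.addAll(other.fields.keySet());
        for (String name : names) {
            String merged = policy.merge(name, target.fields.get(name), other.fields.get(name));
            if (merged != null) {
                target.fields.put(name, merged);
            }
        }

        // Only the merge target is retained.
        other.deleted = true;
        return target;
    }

    /** Usage example with a policy that fills in fields missing on the target. */
    static String demo() {
        MergeNode input = new MergeNode(false);
        input.fields.put("title", "Alien (new)");
        input.fields.put("year", "1979");
        MergeNode stored = new MergeNode(true);
        stored.fields.put("title", "Alien");

        FieldPolicy fillMissing = (name, targetValue, otherValue) ->
                targetValue != null ? targetValue : otherValue;
        MergeNode result = merge(input, stored, fillMissing);
        return result.fields.get("title") + "|" + result.fields.get("year") + "|" + input.deleted;
    }
}
```

Here the persistent object is chosen as target, keeps its own title, picks up the missing year from the input object, and the input object is marked deleted.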
A fairly general implementation is provided by BasicMerger, which has these characteristics:
- The fields of the merge target are unaffected (i.e. the merge result has the same field values as the merge target).
- The relations of both objects are moved to the merge result.
- Relations are considered duplicates when they are of the same type and have the same source and destination (in this case the duplicating relations are deleted).
- Objects for which no similar object is found are kept in the transaction.
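The duplicate rule for relations can be sketched as follows. `RelationSketch` and `RelationMergeSketch` are hypothetical; object ids are plain strings here, and moving a relation to the merge result is modelled as replacing the id of the deleted object with the id of the merge target.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical relation: a type plus the ids of its source and destination. */
final class RelationSketch {
    final String type;
    final String source;
    final String destination;

    RelationSketch(String type, String source, String destination) {
        this.type = type;
        this.source = source;
        this.destination = destination;
    }

    /** Returns a copy in which every endpoint equal to 'from' points to 'to'. */
    RelationSketch repoint(String from, String to) {
        return new RelationSketch(type,
                source.equals(from) ? to : source,
                destination.equals(from) ? to : destination);
    }

    /** Duplicates: same type, same source and same destination. */
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RelationSketch)) {
            return false;
        }
        RelationSketch r = (RelationSketch) o;
        return type.equals(r.type) && source.equals(r.source)
                && destination.equals(r.destination);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, source, destination);
    }
}

class RelationMergeSketch {
    /** Moves relations of 'otherId' to 'targetId' and drops the resulting duplicates. */
    static List<RelationSketch> moveToTarget(List<RelationSketch> relations,
                                             String otherId, String targetId) {
        Set<RelationSketch> unique = new LinkedHashSet<>();
        for (RelationSketch r : relations) {
            unique.add(r.repoint(otherId, targetId)); // the set silently skips duplicates
        }
        return new ArrayList<>(unique);
    }

    static int demo() {
        List<RelationSketch> relations = new ArrayList<>();
        relations.add(new RelationSketch("directed", "movie1", "person7"));
        relations.add(new RelationSketch("directed", "movieTmp", "person7"));
        relations.add(new RelationSketch("acted", "movieTmp", "person9"));
        return moveToTarget(relations, "movieTmp", "movie1").size();
    }
}
```

In `demo()`, the second "directed" relation becomes identical to the first once it is moved to `movie1`, so only two relations remain.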
Merging objects can put a heavy stress on the MMBase server and database, so it is important to be aware of the following performance issues.
- As a general rule, keep transactions small.
- If a number of transactions involve merging with the same objects over and over, it is worthwhile to combine these into a single transaction.
- Within a transaction, the same object in the persistent cloud can be accessed repeatedly (e.g. <accessObject mmbaseId="12345" id="id12345">), provided the same id is used each time. This proves handy when the XML file containing the TCP code is generated by a stylesheet transformation, where it can be hard to establish whether an object has already been accessed within the same transaction. Just use the mmbaseId to create a unique id, so the same id will be used when accessing the same object again.
- When it comes to performance and resources, merging objects can be expensive, so use the merging mechanism only when it is really needed. As an alternative, in many cases objects can be accessed directly using their mmbase id.
- If you want to see which objects are merged, set the logging priority for the class Transaction to "debug", and look at the log output of the Transaction method commit().
- Set the timeOut to a value sufficient for the transaction to be completed under normal circumstances. Keep in mind that transactions take longer to complete in interactive mode.
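The tip about deriving the id from the mmbaseId comes down to a fixed naming rule. The sketch below shows that rule in Java purely for illustration (the document suggests doing this in a stylesheet); `AccessObjectIdSketch` and its methods are hypothetical helpers, and only the `mmbaseId` and `id` attributes shown in the example above are used.

```java
class AccessObjectIdSketch {
    /** Derives the transaction-local id from the MMBase id, e.g. 12345 -> "id12345". */
    static String idFor(int mmbaseId) {
        return "id" + mmbaseId;
    }

    /** Builds the opening accessObject tag in the form used in the example above. */
    static String accessObjectTag(int mmbaseId) {
        return "<accessObject mmbaseId=\"" + mmbaseId + "\" id=\"" + idFor(mmbaseId) + "\">";
    }
}
```

Because the id is a pure function of the mmbaseId, accessing the same persistent object twice always yields the same id, without having to track which objects were accessed before.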
For a full understanding of the XML Importer, it is recommended to read the following documents, available on the MMBase website:
- TCP 1.0 documentation (see Temporary Cloud Project).
- XML Importer overview (see XML Importer Project).
- The javadoc documentation of the org.mmbase.applications.xmlimporter package.