Object Interconnections: CORBA and XML, Part 2: XML as CORBA Data

Douglas C. Schmidt and Steve Vinoski

(Source: http://www.cuj.com)

In our previous column [1], we discussed the relationship between CORBA and XML. We noted that the hype surrounding XML often conceals its real utility, which ultimately boils down to providing for structured — yet flexible — description and definition of data. XML's capabilities therefore mesh well with those of CORBA, which primarily focuses on system functionality rather than system data.

We then explored the CORBA versioning problem, using an example bug tracking system to show how modifying IDL-defined data types to match system evolution could cause the system to break. Any time we changed our data types using a type-safe approach, we had to recompile and redeploy all our client and server applications. If we used a non-type-safe approach instead, our code became unwieldy to develop and maintain. We then briefly explained how using XML to define the data allowed our type definitions to evolve as needed without requiring us to recompile and redeploy every client and server application based on that data.

In this column, we explore more of the relationship between CORBA and XML. Since our previous column advocated the passing of XML-defined data between client and server, we first discuss various alternatives for doing this. We then conclude the column with a brief discussion of SOAP and Web Services and how they relate to CORBA.

XML as Strings

Because XML is textual, passing it around a CORBA system as a string is the most obvious way to handle it. Using the bug tracking system definitions from our previous column, for example, we could modify the BugTracker interface slightly to support XML-based strings for the Bug type definition:

interface BugTracker {   
   exception NoSuchBug { long bugnum; };
   typedef string Bug; // assume XML string contents
   Bug get_details(in long bugnum) raises(NoSuchBug);
   // ...
};

There are multiple problems, however, with this approach:

  • There is no way to guarantee that every implementation of the BugTracker interface will return an XML string describing the details of the desired bug. Because the get_details operation is defined to return a string, the ORB's marshaling subsystem will guarantee that a valid string is returned, or it will throw a CORBA::MARSHAL exception. But the ORB can do nothing more than that — it's not allowed to examine the contents of the string on behalf of the application to guarantee the string contents are valid XML.
  • The string return type cannot handle character sets other than ISO Latin-1 (ISO 8859-1), the CORBA standard character set for IDL strings, which will not work for some BugTracker implementations. It's reasonable to expect that under some circumstances a bug might contain internationalized text; for example, a bug report could contain localized error messages as part of its detailed description.
  • This approach can be inefficient. For example, the target might implement the get_details operation by creating an in-memory DOM (Document Object Model) tree [2], populating it with data for the requested bug, converting the DOM tree into an XML string, and returning it. Then, to examine or display the bug data, the invoking client might convert the XML string back into a DOM tree. This means that the application is essentially taking on the burden of performing its own marshaling and demarshaling above the ORB level. In other words, it's letting the ORB handle the marshaling steps required to send data over the network, but it's treating the IDL string data that's exchanged at the ORB level as another marshaled form — specifically, a stringified DOM tree — representing its own application-specific data.
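To make the third problem concrete, here is a minimal C++ sketch of the work a target must do before it can hand XML back as an IDL string. The names bug_to_xml and escape_xml are hypothetical, and the sketch builds the text directly rather than serializing a DOM tree; the point is that escaping and formatting are application-level marshaling that the ORB knows nothing about.

```cpp
#include <cassert>
#include <string>

// Replace the five characters that XML reserves with predefined entities.
std::string escape_xml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: flatten one bug record into the XML text that
// get_details would return as an IDL string. The client has to parse
// this text again before it can use any of the fields.
std::string bug_to_xml(long bugnum, const std::string& synopsis,
                       const std::string& owner,
                       const std::string& reported_by) {
    assert(bugnum >= 0);  // assumption: bug numbers are non-negative
    return "<bug>"
           "<bugnum>" + std::to_string(bugnum) + "</bugnum>"
           "<synopsis>" + escape_xml(synopsis) + "</synopsis>"
           "<owner>" + escape_xml(owner) + "</owner>"
           "<reported_by>" + escape_xml(reported_by) + "</reported_by>"
           "</bug>";
}
```

The client needs the inverse operation, an XML parser that rebuilds the structure from this text, which is exactly the double marshaling described above.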

Naturally, the severity of these problems depends entirely on the nature of the application. After all, there are CORBA applications already in production today that pass XML data between client and target in the form of IDL strings, and they work just fine. But not all applications fit this description. In particular, the severity of the third problem for a given application depends on the lengths of the XML strings the application deals with, and on the efficiency of the DOM implementation that the application employs.

As always, there are alternative approaches that applications can use to avoid these problems. For example, one obvious way to avoid the character set internationalization problem described above is to use wstring rather than string, which allows your XML documents to contain characters from character sets other than just ISO Latin-1. There are drawbacks to even this simple fix, however. For one thing, not all ORBs support the wchar or wstring types yet. Even for those ORBs that do support wstring, the needless complexity of the code-set negotiation aspects of GIOP might mean that your ORB's handling of wide characters is neither robust nor efficient, and thus not production-ready. Moreover, if for some reason your application requires the use of ORBs from two or more suppliers, the same code-set negotiation complexities may cause interoperability problems between the different ORBs. (In an ideal world, ORB suppliers would deliver functional and conforming implementations fully supporting wchar and wstring, but in practice this is not always the case.)
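The Latin-1 limitation is easy to state in code: text can travel in an IDL string only if every character's code point is at most 0xFF. The C++ sketch below, using the hypothetical name fits_latin1, checks a UTF-32 string for that property; it deliberately ignores the details of GIOP code-set negotiation.

```cpp
#include <cassert>
#include <string>

// Returns true if every code point in text lies in ISO 8859-1 (Latin-1),
// which occupies exactly the range U+0000..U+00FF.
bool fits_latin1(const std::u32string& text) {
    for (char32_t cp : text) {
        if (cp > 0xFF) {
            return false;  // would need a wider code set, i.e. wstring
        }
    }
    return true;
}
```

A German error message such as "Fehler: Überlauf" still fits, because Ü is U+00DC, but a Japanese message does not, so a BugTracker serving such reports would have to switch to wstring.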

Alternative Representations?

Given that passing XML data around your CORBA systems in string format seems fragile, we need to find another way to represent XML data in a system. Reverting to structs to represent our bug data is one approach, but since we started out with structs in our previous column and abandoned them in favor of XML because of versioning issues, it doesn't make much sense to go back there. The IDL valuetype suffers from the same versioning issues as structs.

Let's reexamine an example of our XML-based bug definition from our previous column:

<bug>   
   <bugnum>49938</bugnum>   
   <synopsis>DynStruct broken</synopsis>   
   <owner>vinoski</owner>   
   <reported_by>schmidt</reported_by>   
</bug>

In XML terms, we have a bug element with four child elements named bugnum, synopsis, owner, and reported_by. Because XML fundamentally provides a way to represent structured information, this bug definition, like any XML document, can naturally be thought of as a hierarchy of nodes. Every document contains a "root node" that holds all other nodes in the document as its descendants, and any node can have child nodes of its own. In our example, the root node has one child node named bug, which in turn has the four child nodes listed above. When handling such an XML hierarchy programmatically (using DOM, for example), your program manipulates this tree structure. A purely XML-oriented program might first parse an XML file to create an in-memory tree representing the XML data, then manipulate the nodes in the tree, the contents of the nodes, or both, and finally write the tree back out to another file.
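The parse, manipulate, and write cycle operates on exactly this kind of hierarchy. As a minimal C++ sketch, the hypothetical Node type below holds an element name, its text, and its children, and a depth-first walk prints the bug document as an indented outline. Naming the root "#document" follows the DOM convention for document nodes; everything else is illustrative and is not a DOM API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A minimal, hypothetical node: element name, text content, and an
// ordered list of child nodes.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Depth-first walk that appends one line per node, indenting each level
// by two spaces so the shape of the hierarchy is visible.
void outline(const Node& node, int depth, std::string& out) {
    assert(depth >= 0);
    out += std::string(static_cast<std::size_t>(depth) * 2, ' ') + node.name;
    if (!node.text.empty()) {
        out += " = " + node.text;
    }
    out += '\n';
    for (const Node& child : node.children) {
        outline(child, depth + 1, out);
    }
}

// The bug document from the text, under a document (root) node.
Node make_bug_document() {
    Node bug{"bug", "", {
        {"bugnum", "49938", {}},
        {"synopsis", "DynStruct broken", {}},
        {"owner", "vinoski", {}},
        {"reported_by", "schmidt", {}},
    }};
    return Node{"#document", "", {bug}};
}
```

Calling outline(make_bug_document(), 0, out) yields the root on the first line, bug indented once, and the four child elements indented twice, mirroring the description above.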

An alternative to a string XML representation in a CORBA application, therefore, might be as an XML data tree. One way to represent a tree structure in IDL is to use structs and sequences. For example, you could define a tree node in IDL like this:

   struct XMLElement;   
   typedef sequence<XMLElement> XMLElementSeq;   

   struct XMLAttr {   
      wstring name;   
      wstring value;   
   };   
   typedef sequence<XMLAttr> XMLAttrSeq;   

   struct XMLElement {   
      wstring name;   
      wstring value;   
      XMLAttrSeq attributes;   
      XMLElementSeq children;   
      // ...   
   };   

Like a string, this tree structure can be passed from one CORBA application to another. It differs from the string approach, however, in that it avoids the need for the application to convert its XML data to and from a string representation when sending it to another process. Instead, it allows the application to send and receive its XML data directly as a tree structure, avoiding XML string parsing and its associated overhead.

Note that XMLElement contains a sequence of itself to simulate a tree structure. This type of definition normally requires the use of an anonymous type for the sequence member, but because anonymous types are problematic [3], we've used a new IDL feature for avoiding anonymous types: the forward struct declaration. By forward declaring the XMLElement type, we can then correctly declare a sequence of its type, which we can then use inside the XMLElement type definition. Your ORB may not yet support this, but if your ORB supplier keeps up to date with respect to CORBA versions, you should have support for it soon. The old way of declaring a recursive struct type using an anonymous sequence will also still work, but it has been deprecated as of CORBA 2.4 [4].
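C++ has a close counterpart to this recursive IDL definition. Since C++17, std::vector may be instantiated with an incomplete element type, so a struct can hold a vector of itself without a separate forward-declared typedef. The sketch below mirrors the XMLAttr and XMLElement fields with std::wstring; it is a plain C++ analogy, not the code an IDL compiler generates, and count_elements is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct XMLAttr {
    std::wstring name;
    std::wstring value;
};

// Like the IDL version, XMLElement contains a sequence of itself;
// std::vector accepts the still-incomplete XMLElement here (C++17).
struct XMLElement {
    std::wstring name;
    std::wstring value;
    std::vector<XMLAttr> attributes;
    std::vector<XMLElement> children;
};

// Count this element plus all of its descendants.
std::size_t count_elements(const XMLElement& e) {
    std::size_t total = 1;
    for (const XMLElement& child : e.children) {
        total += count_elements(child);
    }
    return total;
}
```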

While this approach to modeling an XML tree in IDL will indeed work within an application, it's somewhat cumbersome and error-prone. In particular, the XMLElement type is a plain old data structure, and we know from years of OO programming that exposing fields in data structures is asking for trouble. Moreover, sequences have a minimal interface in the OMG C++ Mapping, which makes it awkward to add or remove children in the middle of a sequence.
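To see why editing is awkward, recall that a mapped IDL sequence gives the application little beyond length(), length(n), and operator[]. The sketch below uses a hypothetical MiniSeq class that exposes only those three operations (a stand-in backed by std::vector, not a real ORB type) and shows what inserting an element in the middle takes: grow the sequence, shift the tail by hand, then assign.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a mapped IDL sequence: only length(),
// length(n), and operator[] are available to the application.
template <typename T>
class MiniSeq {
public:
    unsigned long length() const { return static_cast<unsigned long>(items_.size()); }
    void length(unsigned long n) { items_.resize(n); }
    T& operator[](unsigned long i) { return items_[i]; }
    const T& operator[](unsigned long i) const { return items_[i]; }
private:
    std::vector<T> items_;
};

// Insert value at position pos by growing the sequence by one slot and
// shifting every later element one position to the right.
template <typename T>
void insert_at(MiniSeq<T>& seq, unsigned long pos, const T& value) {
    unsigned long len = seq.length();
    assert(pos <= len);
    seq.length(len + 1);
    for (unsigned long i = len; i > pos; --i) {
        seq[i] = seq[i - 1];
    }
    seq[pos] = value;
}
```

Removing a child needs the mirror-image loop followed by a call to length(len - 1), and every piece of code that edits children must repeat this bookkeeping.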

In April 2001, the OMG formally adopted a new specification that solves these issues for us. The specification is called "XMLDOM: DOM/Value Mapping Specification" [5]. The DOM/Value mapping, like the original DOM, specifies an API that allows your application to build and traverse XML parse trees in memory. Unlike the original DOM specification, though, the DOM/Value mapping uses IDL valuetypes to represent the nodes in the XML tree. This approach has certain advantages, which we describe below.

DOM/Value Mapping

If you look closely at the XMLElement type defined above, you'll see that it differs significantly from our previous approaches to defining data in IDL. Specifically, it avoids defining the data directly as we did with our Bug struct in our previous column, like this:

struct Bug {   
   long bugnum;   
   string synopsis;   
   string owner;   
};

Instead, the XMLElement type defines the characteristics of an XML element independent of the data it might hold. This is an important distinction because it allows XMLElement and its associated types to represent any XML data hierarchy, regardless of the shape of the hierarchy or the data it contains.

This approach to representing an XML tree in IDL is a key aspect of the OMG DOM/Value mapping. The specification contains definitions for node types that can compose an XML tree, but these node types are specified using IDL valuetypes rather than IDL interfaces as in the original DOM specification. Because valuetypes support operations, nodes can supply operations for their own manipulation rather than directly exposing their data members to the whole application. More importantly, because valuetypes are transmissible, unlike interfaces (remember, CORBA objects themselves are not transmissible, only their object references are), applications can send and receive XML data directly as a tree structure rather than having to convert it to and from a stringified format. For applications that manipulate a lot of XML data, this feature can help greatly with performance by avoiding the overhead of those conversions. Note that this differs from the original DOM, where trees are represented as hierarchies of CORBA objects, meaning that references into a tree may be passed from one process to another, but not the tree itself. (Keep in mind that DOM was defined before valuetypes or local interfaces were added to IDL.)
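The contrast between exposed fields and node-supplied operations can be sketched in C++. The hypothetical ValueNode class below keeps its children private and offers operations loosely modeled on the W3C DOM Node methods appendChild, insertBefore, and removeChild; it is an analogy for a valuetype, not the DOM/Value mapping's actual API. Because it has ordinary value semantics, copying a node copies its whole subtree, much as marshaling a valuetype sends the tree itself rather than a reference to it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ValueNode {
public:
    explicit ValueNode(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }
    std::size_t child_count() const { return children_.size(); }
    const ValueNode& child(std::size_t i) const { return children_.at(i); }

    // Append a copy of node after the existing children.
    void append_child(const ValueNode& node) { children_.push_back(node); }

    // Insert a copy of node before the first child named ref_name,
    // or append it if no child has that name.
    void insert_before(const ValueNode& node, const std::string& ref_name) {
        auto it = std::find_if(children_.begin(), children_.end(),
                               [&](const ValueNode& n) { return n.name() == ref_name; });
        children_.insert(it, node);
    }

    // Remove the first child named name; return false if none matched.
    bool remove_child(const std::string& name) {
        auto it = std::find_if(children_.begin(), children_.end(),
                               [&](const ValueNode& n) { return n.name() == name; });
        if (it == children_.end()) {
            return false;
        }
        children_.erase(it);
        return true;
    }

private:
    std::string name_;
    std::vector<ValueNode> children_;
};
```

Application code never touches children_ directly, so the node can enforce its own invariants, and a copy taken before an edit is unaffected by that edit.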

The DOM/Value mapping supports two different usage scenarios for XML documents:

  • In the dynamic information scenario, the meaning of the elements in the document is not defined. In other words, the application has only the contents of the XML document available to it via the DOM/Value mapping, which represents those contents as an in-memory tree. An application can access or modify the document contents by performing operations on the general valuetypes that compose the tree.
  • The static information scenario builds on the dynamic information scenario by including statically known information about the elements making up the XML document. This information is represented in DTDs (Document Type Definitions), which are essentially type definitions for XML elements (in many ways, a DTD is to an XML document what a C++ class is to a class instance). Rather than using general valuetypes to represent the XML data, specific valuetypes are code-generated from the DTD, allowing the data to be represented using DTD-specific valuetypes. These specific valuetypes are derived from the general valuetypes used for the dynamic information scenario, so the general view of the tree is always available to the application. This can be important if your application sends a valuetype tree to another application that lacks definitions for some or all of the DTD-specific valuetypes. Unfortunately, the DOM/Value specification achieves this by using truncatable inheritance between valuetypes, and not all ORBs support truncatable valuetypes, even if they support valuetypes in general. (The truncatable IDL keyword specifies that a valuetype can safely be truncated to a base valuetype in the marshaling process without loss of information for the application.)
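The relationship between general and DTD-specific types can be illustrated with ordinary C++ inheritance. In the hypothetical sketch below, Element plays the role of a general node type and BugElement stands in for a type a code generator might produce from a bug DTD; both names are ours, not the specification's. Because all the data lives in the general structure, copying a BugElement into a plain Element (C++ slicing, used here only as an analogy for truncation) leaves a tree that code knowing only Element can still read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// General view: any element, accessed by name and child list.
struct Element {
    std::string name;
    std::string text;
    std::vector<Element> children;

    // Return the text of the first child named child_name, or "".
    std::string child_text(const std::string& child_name) const {
        for (const Element& c : children) {
            if (c.name == child_name) {
                return c.text;
            }
        }
        return "";
    }
};

// Specific view: typed accessors layered on the general structure,
// in the spirit of DTD-generated types (hypothetical, not generated code).
struct BugElement : Element {
    BugElement(long number, const std::string& owner_name) {
        name = "bug";
        children.push_back({"bugnum", std::to_string(number), {}});
        children.push_back({"owner", owner_name, {}});
    }
    long bugnum() const { return std::stol(child_text("bugnum")); }
    std::string owner() const { return child_text("owner"); }
};

// A receiver that knows only the general type.
std::size_t count_children(const Element& e) { return e.children.size(); }
```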

We don't know of any implementations of the DOM/Value mapping available today, but if you're interested in using this capability to handle XML data in your CORBA applications, you should ask your ORB supplier about their plans for supporting this specification.

But What about SOAP?

All this talk about XML and CORBA naturally raises the question of how SOAP, the Simple Object Access Protocol [6], fits in with CORBA. SOAP is an XML-based protocol for exchanging information between senders and receivers in decentralized, distributed environments, usually over HTTP. The protocol consists of three parts:

  1. An envelope that describes the contents of a message and how to process it,
  2. A set of encoding rules for expressing instances of application-defined datatypes, and
  3. A convention for representing remote procedure calls and responses.
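As a rough illustration of how these three parts fit together, the C++ sketch below builds a SOAP 1.1 request for the get_details operation as a string. The envelope and encoding namespace URIs are those defined by SOAP 1.1; the method namespace urn:example:bugtracker and the function name are hypothetical, and a real SOAP toolkit would also handle the HTTP binding and the response.

```cpp
#include <cassert>
#include <string>

// Build a SOAP 1.1 request for a hypothetical get_details call.
// Part 1 (the envelope) frames the message; part 3 (the RPC convention)
// puts an element named after the operation in the body, with one child
// element per in parameter; part 2 (the encoding rules, selected by
// encodingStyle) governs how the long is written as text.
std::string soap_get_details_request(long bugnum) {
    const std::string env_ns = "http://schemas.xmlsoap.org/soap/envelope/";
    const std::string enc_ns = "http://schemas.xmlsoap.org/soap/encoding/";
    const std::string method_ns = "urn:example:bugtracker";  // hypothetical
    return "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + env_ns + "\" "
           "SOAP-ENV:encodingStyle=\"" + enc_ns + "\">"
           "<SOAP-ENV:Body>"
           "<m:get_details xmlns:m=\"" + method_ns + "\">"
           "<bugnum>" + std::to_string(bugnum) + "</bugnum>"
           "</m:get_details>"
           "</SOAP-ENV:Body>"
           "</SOAP-ENV:Envelope>";
}
```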

Our discussion of the relationship between XML and CORBA has thus far focused on how CORBA applications can make use of XML. We've explored alternatives for sending and receiving XML messages in CORBA, but we haven't focused at all on how IIOP represents such messages. Like IIOP, SOAP is a protocol for conveying messages between applications. Unlike IIOP, which represents message data in binary format, SOAP represents its message content using XML.
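The difference in representation is easy to quantify for a single value. The C++ sketch below, with hypothetical function names, writes the bug number 49938 the way CDR (the binary encoding that GIOP, and therefore IIOP, uses) represents an IDL long in big-endian order, and then as an XML element of the kind a SOAP body would carry.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Encode an IDL long as CDR does in big-endian byte order: exactly four
// bytes, most significant first. Alignment padding and the byte-order
// flag carried in the GIOP message header are ignored here.
std::array<std::uint8_t, 4> cdr_long_big_endian(std::int32_t value) {
    std::uint32_t u = static_cast<std::uint32_t>(value);
    return {static_cast<std::uint8_t>(u >> 24),
            static_cast<std::uint8_t>(u >> 16),
            static_cast<std::uint8_t>(u >> 8),
            static_cast<std::uint8_t>(u)};
}

// The same value as an XML element, the way a SOAP body might carry it.
std::string xml_long(const std::string& element, std::int32_t value) {
    return "<" + element + ">" + std::to_string(value) + "</" + element + ">";
}
```

Here the CDR form is the four bytes 00 00 C3 12, while "<bugnum>49938</bugnum>" is 22 characters before any envelope is added. Neither encoding is visible to the application, which sees only the IDL long.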

Using XML at the application level is neither easier nor harder with SOAP than with IIOP. This is because one of the central ideas of CORBA is to hide the complexity and detail of the underlying transports and protocols used to convey messages between applications. In other words, SOAP's use of XML as its transfer syntax is completely orthogonal to whether an application uses XML for its own data.

Nevertheless, the relationship between SOAP and CORBA is definitely worth exploring. Today SOAP is equated with Web Services, which is being widely hyped as the next Silver Bullet that will magically solve all of your computing problems (though not necessarily all of your venture capital acquisition problems if you work for a dot com...). In our next column, we'll explore how CORBA, SOAP, and Web Services all relate to one another.

Concluding Remarks

In this column, we explored alternatives for representing and manipulating XML data in CORBA applications. Although a typical approach used by some CORBA applications today is to represent XML data as IDL strings, this comes at a price:

  1. It's inefficient because it requires transformation to or from string form whenever an application sends XML data to, or receives it from, another CORBA application.
  2. It's error-prone because it relies heavily on programming conventions: the ORB cannot verify that a string actually contains valid XML.

In April 2001, the OMG published the DOM/Value Mapping Specification, which allows applications to represent and manipulate XML data as valuetype trees. These standard valuetypes are based on the original DOM specification, which used IDL interfaces to represent XML tree nodes. Like interfaces, valuetypes support attributes and operations, meaning that XML data can be encapsulated per standard OO best practices. Unlike interfaces, however, valuetypes are transmissible, allowing entire XML data trees to be sent and received by CORBA applications. Programmatic access and transmissibility make the DOM/Value mapping a superior alternative to string-based systems or homegrown mappings based on structs.

In our next column, we'll finish our exploration of CORBA and XML by delving into SOAP and Web Services.

References

[1] D.C. Schmidt and S. Vinoski. "CORBA and XML, Part 1: Versioning," C/C++ Users Journal C++ Experts Forum, May 2001, www.cuj.com/experts/1905/vinoski.htm.

[2] Document Object Model (DOM) Level 2 Core Specification. W3C Recommendation. World Wide Web Consortium, http://www.w3.org/TR/DOM-Level-2-Core/, November 13, 2000.

[3] M. Henning and S. Vinoski. Advanced CORBA Programming with C++. Addison Wesley, 1999.

[4] Object Management Group. CORBA 2.4.2 Specification. http://www.omg.org/cgi-bin/doc?formal/01-02-33, 2001.

[5] Object Management Group. XMLDOM: DOM/Value Mapping Specification. http://www.omg.org/cgi-bin/doc?ptc/2001-04-04, 2001.

[6] Simple Object Access Protocol (SOAP) 1.1. W3C Note. World Wide Web Consortium, http://www.w3.org/TR/SOAP/, May 8, 2000.

About the Authors

Steve Vinoski is chief architect and vice president of Platform Technologies for IONA Technologies and is also an IONA Fellow. A frequent speaker at technical conferences, he has been giving CORBA tutorials around the globe since 1993. Steve helped put together several important OMG specifications, including CORBA 1.2, 2.0, 2.2, and 2.3; the OMG IDL C++ Language Mapping; the ORB Portability Specification; and the Objects By Value Specification. In 1996, he was a charter member of the OMG Architecture Board. He is currently the chair of the OMG IDL C++ Mapping Revision Task Force. He and Michi Henning are the authors of Advanced CORBA Programming with C++, published in January 1999 by Addison Wesley Longman.

Doug Schmidt is an associate professor at the University of California, Irvine. His research focuses on patterns, optimization principles, and empirical analyses of object-oriented techniques that facilitate the development of high-performance, real-time distributed object computing middleware on parallel processing platforms running over high-speed networks and embedded system interconnects. He is the lead author of the book Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, published in 2000 by Wiley and Sons.