Object Interconnections
CORBA and XML, Part 2: XML as CORBA Data
Douglas C. Schmidt and Steve Vinoski

In our previous column [1], we discussed the relationship between CORBA and XML. We noted that the hype surrounding XML often conceals its real utility, which ultimately boils down to providing structured yet flexible description and definition of data. XML's capabilities therefore mesh well with those of CORBA, which primarily focuses on system functionality rather than system data.

We then explored the CORBA versioning problem, using an example bug tracking system to show how modifying IDL-defined data types to match system evolution could cause the system to break. Any time we changed our data types using a type-safe approach, we needed to recompile and redeploy all our client and server applications. If we instead used a non-type-safe approach, our code became unwieldy to develop and maintain. We then briefly explained how defining the data in XML instead allowed our type definitions to evolve as needed without requiring us to recompile and redeploy all client and server applications based on that data.

In this column, we explore more of the relationship between CORBA and XML. Since our previous column advocated passing XML-defined data between client and server, we first discuss various alternatives for doing this. We then conclude the column with a brief discussion of SOAP and Web Services and how they relate to CORBA.

XML as Strings

Because XML is textual, passing it around a CORBA system as a string is the most obvious way to handle it. Using the bug tracking system definitions from our previous column, for example, we could modify the BugTracker interface slightly to support XML-based strings for the Bug type definition:

    interface BugTracker {
        exception NoSuchBug {
            long bugnum;
        };
        typedef string Bug; // assume XML string contents
        Bug get_details(in long bugnum) raises(NoSuchBug);
        // ...
    };

There are multiple problems, however, with this approach:
Naturally, the severity of these problems depends entirely on the nature of the application. After all, there are CORBA applications already in production today that pass XML data between client and target in the form of IDL strings, and they work just fine. But not all applications fit this description. In particular, the severity of the third problem for a given application depends on the lengths of the XML strings the application deals with, and on the efficiency of the DOM implementation that the application employs.

As always, there are alternative approaches that applications can use to avoid these problems. For example, one obvious way to avoid the character set internationalization problem described above is to use wstring rather than string, which allows your XML documents to contain characters from character sets other than just ISO Latin-1. There are drawbacks to even this simple fix, however. For one thing, not all ORBs support the wchar or wstring types yet. Even for those ORBs that do support wstring, the needless complexity of the code-set negotiation aspects of GIOP might mean that your ORB's handling of wide characters is neither robust nor efficient, and thus not production-ready. Moreover, if for some reason your application requires the use of ORBs from two or more suppliers, the same code-set negotiation complexities may cause interoperability problems between the different ORBs. (In an ideal world, ORB suppliers would deliver functional and conforming implementations fully supporting wchar and wstring, but in practice this is not always the case.)

Alternative Representations?

Given that passing XML data around your CORBA systems in string format seems to be fragile, we need to find still another format for representing XML data in a system.
Reverting to structs to represent our bug data is one approach, but given that we started out with structs in our previous column and abandoned them in favor of XML due to versioning issues, it doesn't make much sense to go back there. The IDL valuetype suffers from the same versioning issues as structs.

Let's reexamine an example of our XML-based bug definition from our previous column:

    <bug>
        <bugnum>49938</bugnum>
        <synopsis>DynStruct broken</synopsis>
        <owner>vinoski</owner>
        <reported_by>schmidt</reported_by>
    </bug>

In XML terms, we have a bug element with four child elements named bugnum, synopsis, owner, and reported_by. Because XML fundamentally provides a way to represent structured information, this bug definition, like any XML document, can be naturally thought of as a hierarchy of nodes. All documents contain a "root node" that contains all other nodes in the document as child nodes. Other nodes might themselves have child nodes too. In our example, the root node has one child node named bug, which in turn has the four child nodes listed above.

When handling such an XML hierarchy programmatically (using DOM, for example), your program manipulates such a tree structure. For example, a pure XML-oriented program might first parse an XML file to create an in-memory tree representing the XML data, then manipulate either the nodes in the tree or the contents of the nodes (or both), and then finally write the tree back out to another file.

An alternative to a string XML representation in a CORBA application, therefore, might be an XML data tree. One way to represent a tree structure in IDL is to use structs and sequences. For example, you could define a tree node in IDL like this:

    struct XMLElement;
    typedef sequence<XMLElement> XMLElementSeq;

    struct XMLAttr {
        wstring name;
        wstring value;
    };
    typedef sequence<XMLAttr> XMLAttrSeq;

    struct XMLElement {
        wstring name;
        wstring value;
        XMLAttrSeq attributes;
        XMLElementSeq children;
        // ...
    };

Like a string, this tree structure can be passed from one CORBA application to another. It differs from the string approach, however, in that it avoids the need for the application to convert its XML data to and from a string representation when sending it to another process. Instead, it allows the application to send and receive its XML data directly as a tree structure, avoiding XML string parsing and its associated overhead.

Note that XMLElement contains a sequence of itself to simulate a tree structure. This type of definition normally requires the use of an anonymous type for the sequence member, but because anonymous types are problematic [3], we've used a new IDL feature for avoiding anonymous types: the forward struct declaration. By forward declaring the XMLElement type, we can correctly declare a sequence of its type, which we can then use inside the XMLElement type definition. Your ORB may not yet support this, but if your ORB supplier keeps up to date with respect to CORBA versions, you should have support for it soon. The old way of declaring a recursive struct type using an anonymous sequence will also still work, but it has been deprecated as of CORBA 2.4 [4].

While this approach to modeling an XML tree in IDL will indeed work within an application, it's somewhat cumbersome and error-prone. In particular, the XMLElement type is a plain old data structure, and we know from years of OO programming that exposing fields in data structures is asking for trouble. Moreover, sequences have a minimal interface in the OMG C++ Mapping, making it hard to add or remove children in the middle of the sequence.

In April 2001, the OMG formally adopted a new specification that solves these issues for us. The specification is called "XMLDOM: DOM/Value Mapping Specification" [5]. The DOM/Value mapping, like the original DOM, specifies an API that allows your application to build and traverse XML parse trees in memory.
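
To make the drawbacks of the struct-based tree concrete, here is a minimal sketch, written in Python purely for brevity, of a plain data structure that mirrors the XMLElement and XMLAttr structs above. The helper names (build_bug, insert_child, child_names) and the severity element are hypothetical and exist only for illustration; insert_child merely mimics the grow-and-shift work that a sequence with a minimal interface pushes onto the application, and is not a rendering of any actual language mapping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class XMLAttr:
    name: str
    value: str


@dataclass
class XMLElement:
    # Mirrors the IDL struct: every field is public, nothing is encapsulated.
    name: str
    value: str = ""
    attributes: List[XMLAttr] = field(default_factory=list)
    children: List["XMLElement"] = field(default_factory=list)


def build_bug() -> XMLElement:
    """Build the bug element from the example as a tree of plain structs."""
    return XMLElement(
        name="bug",
        children=[
            XMLElement("bugnum", "49938"),
            XMLElement("synopsis", "DynStruct broken"),
            XMLElement("owner", "vinoski"),
            XMLElement("reported_by", "schmidt"),
        ],
    )


def insert_child(parent: XMLElement, index: int, child: XMLElement) -> None:
    """Hypothetical helper: insert in the middle by hand, as a sequence with
    only length and indexing would force the application to do."""
    kids = parent.children
    kids.append(child)                         # grow the sequence by one slot
    for i in range(len(kids) - 1, index, -1):
        kids[i] = kids[i - 1]                  # shift the tail right by one
    kids[index] = child                        # drop the new child into place


def child_names(element: XMLElement) -> List[str]:
    return [c.name for c in element.children]


if __name__ == "__main__":
    bug = build_bug()
    # "severity" is an invented element, used only to show a mid-sequence insert.
    insert_child(bug, 2, XMLElement("severity", "high"))
    print(child_names(bug))
```

Every caller that wants to change the tree has to repeat this kind of bookkeeping and can reach into any field directly, which is the maintenance risk described above.
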
Unlike the original DOM specification, though, the DOM/Value mapping uses IDL valuetypes to represent the nodes in the XML tree. This approach has certain advantages, which we describe below.

DOM/Value Mapping

If you look closely at the XMLElement type defined above, you'll see that it differs significantly from our previous approaches to defining data in IDL. Specifically, it avoids defining the data directly as we did with our Bug struct in our previous column, like this:

    struct Bug {
        long bugnum;
        string synopsis;
        string owner;
    };

Instead, the XMLElement type defines the characteristics of an XML element independent of the data it might hold. This is an important distinction because it allows XMLElement and its associated types to represent any XML data hierarchy, regardless of the shape of the hierarchy or the data it contains.

This approach to representing an XML tree in IDL is a key aspect of the OMG DOM/Value mapping. The specification contains definitions for node types that can compose an XML tree, but these node types are specified using IDL valuetypes, rather than IDL interfaces as in the original DOM specification. Because valuetypes support operations, nodes can supply operations for their own manipulation, rather than directly exposing their data members to the whole application. More importantly, because valuetypes are transmissible, unlike interfaces (remember, CORBA objects are themselves not transmissible, only their object references are), applications can send and receive XML data directly as a tree structure rather than having to convert to and from a stringified format. For applications that manipulate a lot of XML data, this feature can help greatly with performance by avoiding the inefficiencies associated with stringified format conversions.
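
As a rough illustration of nodes that supply their own manipulation operations, the sketch below uses Python's standard xml.dom.minidom module, which implements the W3C DOM rather than the OMG DOM/Value mapping. The helper names (add_before, element_names, text_of) and the severity element are assumptions made for this example. Note that the first step, parseString, is exactly the string-to-tree conversion that sending the tree itself would let a receiver skip.

```python
from xml.dom import minidom

BUG_XML = (
    "<bug>"
    "<bugnum>49938</bugnum>"
    "<synopsis>DynStruct broken</synopsis>"
    "<owner>vinoski</owner>"
    "<reported_by>schmidt</reported_by>"
    "</bug>"
)


def add_before(doc, new_name, text, before_name):
    """Hypothetical helper: insert <new_name>text</new_name> ahead of the
    first <before_name> child of the document's root element."""
    root = doc.documentElement
    new_node = doc.createElement(new_name)
    new_node.appendChild(doc.createTextNode(text))
    ref = root.getElementsByTagName(before_name)[0]
    root.insertBefore(new_node, ref)   # the node does the mid-list insert itself
    return new_node


def element_names(doc):
    """Names of the root element's child elements, in document order."""
    return [n.tagName for n in doc.documentElement.childNodes
            if n.nodeType == n.ELEMENT_NODE]


def text_of(doc, name):
    """Concatenated text content of the first element called name."""
    node = doc.getElementsByTagName(name)[0]
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE)


if __name__ == "__main__":
    doc = minidom.parseString(BUG_XML)             # string -> tree conversion
    add_before(doc, "severity", "high", "owner")   # "severity" is invented here
    print(element_names(doc))
    print(text_of(doc, "bugnum"))
```

The application never touches a child list directly: insertion, creation, and lookup all go through operations on the nodes, which is the style of encapsulation that valuetype operations make possible.
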
Note that this transmissibility sets the DOM/Value mapping apart from the original DOM, where trees are represented as hierarchies of CORBA objects, meaning that references into a tree may be passed from one process to another, but not the tree itself. (Keep in mind that DOM was defined before valuetypes or local interfaces were added to IDL.)

The DOM/Value mapping supports two different usage scenarios for XML documents:
We don't know of any implementations of the DOM/Value mapping available today, but if you're interested in using this capability to deal with XML data in your CORBA applications, you should ask your ORB supplier about their plans for supporting this specification.

But What about SOAP?

All this talk about XML and CORBA naturally leads to the question of how SOAP, the Simple Object Access Protocol [6], fits in with CORBA. SOAP is an XML-based protocol intended to exchange information between senders and receivers, usually over HTTP, in decentralized distributed environments. Its protocol consists of three parts:
Our discussion of the relationship between XML and CORBA has thus far focused on how CORBA applications can make use of XML. We've explored alternatives for sending and receiving XML messages in CORBA, but we haven't focused at all on how IIOP represents such messages. Like IIOP, SOAP is a protocol for conveying messages between applications. Unlike IIOP, which represents message data in binary format, SOAP represents its message content using XML.

Neither SOAP nor IIOP makes using XML at the application level any easier or more difficult. This is because one of the central ideas of CORBA is to hide the complexity and detail associated with the underlying communication transports and protocols used to convey messages between applications. In other words, the use of XML as a transfer syntax for SOAP is completely orthogonal to whether an application uses it for its own data.

Nevertheless, the relationship between SOAP and CORBA is definitely worth exploring. Today SOAP is equated with Web Services, which is being widely hyped as the next Silver Bullet that will magically solve all of your computing problems (though not necessarily all of your venture capital acquisition problems if you work for a dot com...). In our next column, we'll explore how CORBA, SOAP, and Web Services all relate to one another.

Concluding Remarks

In this column, we explored alternatives for representing and manipulating XML data in CORBA applications. Although a typical approach used by some CORBA applications today is to represent XML data as IDL strings, this comes at a price:
In April 2001, the OMG published the DOM/Value Mapping Specification, which allows applications to represent and manipulate XML data as valuetype trees. These standard valuetypes are based on the original DOM specification, which used IDL interfaces to represent XML tree nodes. Like interfaces, valuetypes support attributes and operations, meaning that XML data can be encapsulated per standard OO best practices. Unlike interfaces, however, valuetypes are transmissible, allowing entire XML data trees to be sent and received by CORBA applications. Programmatic access and transmissibility make the DOM/Value mapping a superior alternative to string-based systems or homegrown mappings based on structs.

In our next column, we'll finish our exploration of CORBA and XML by delving into SOAP and Web Services.

References

[1] D.C. Schmidt and S. Vinoski. "CORBA and XML, Part 1: Versioning," C/C++ Users Journal C++ Experts Forum, May 2001, www.cuj.com/experts/1905/vinoski.htm.
[2] Document Object Model (DOM) Level 2 Core Specification. W3C Recommendation. World Wide Web Consortium, http://www.w3.org/TR/DOM-Level-2-Core/, November 13, 2000.
[3] Michi Henning and Steve Vinoski. Advanced CORBA Programming with C++ (Addison Wesley, 1999).
[4] Object Management Group. CORBA 2.4.2 Specification. http://www.omg.org/cgi-bin/doc?formal/01-02-33, 2001.
[5] Object Management Group. XMLDOM: DOM/Value Mapping Specification. http://www.omg.org/cgi-bin/doc?ptc/2001-04-04, 2001.
[6] Simple Object Access Protocol (SOAP) 1.1. W3C Note. World Wide Web Consortium, http://www.w3.org/TR/SOAP/, May 8, 2000.

About the Authors

Steve Vinoski is chief architect and vice president of Platform Technologies for IONA Technologies and is also an IONA Fellow. A frequent speaker at technical conferences, he has been giving CORBA tutorials around the globe since 1993. Steve helped put together several important OMG specifications, including CORBA 1.2, 2.0, 2.2, and 2.3; the OMG IDL C++ Language Mapping; the ORB Portability Specification; and the Objects By Value Specification. In 1996, he was a charter member of the OMG Architecture Board. He is currently the chair of the OMG IDL C++ Mapping Revision Task Force. He and Michi Henning are the authors of Advanced CORBA Programming with C++, published in January 1999 by Addison Wesley Longman.

Doug Schmidt is an associate professor at the University of California, Irvine. His research focuses on patterns, optimization principles, and empirical analyses of object-oriented techniques that facilitate the development of high-performance, real-time distributed object computing middleware on parallel processing platforms running over high-speed networks and embedded system interconnects. He is the lead author of the book Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, published in 2000 by Wiley and Sons. He can be contacted at [email protected].