Enabling Full Service Surrogates Using the Portable Channel Representation
Micah Beck*, Terry Moore*
Innovative Computing Laboratory, Department of Computer Science
University of Tennessee, Knoxville, TN 37996-3450, USA
+1 865 974 3548
{mbeck, tmoore}@cs.utk.edu
Leif Abrahamsson, Christophe Achouiantz, Patrik Johansson
Lokomo Systems AB
Svärdvägen 27, SE-182 33 Danderyd, Sweden
+46 8 5490 4380
{leif, chris, patrik}@lokomo.com
Copyright is held by the author/owner(s). WWW10, May 1-5, 2001, Hong Kong. ACM 1-58113-348-0/01/0005.

* The work of Micah Beck and Terry Moore was supported by the National Science Foundation Internet Technologies Program under grant # ANI-9980203, the University Corporation for Advanced Internet Development, and a grant from the IBM Corporation.
1. INTRODUCTION - TWO VIEWS OF SURROGATES

For those who share the goal of creating wide area information systems that are ubiquitous, universally accessible, and media rich, the simplicity of the original Web model represents both a profound strength and a profound liability. It is a strength because it makes it relatively easy for people who have publishable content to set up a Web site and make that material widely available; this ease of use for content publishers is one of the primary reasons that the Web spread with such incredible speed when it was introduced. But this same simplicity also makes the basic approach liable to scalability problems that have likewise been apparent from early on [5].

In the basic Web model a client generates a Hypertext Transfer Protocol (HTTP) request that can be fulfilled at a unique server, and the server's response takes the form of a set of objects delivered in an HTTP response [4]. For simple cases the response to a given request is stable over at least short periods of time, and when it changes, it changes in a predictable manner [7]. Because each request is fulfilled at a single unique server, only one server must be configured to respond to any particular request. Since Web clients are distributed across the globe, however, the more numerous and media-hungry they become, the more bandwidth the responses to their requests consume across an increasingly congested Internet backbone. Poor performance for users, in the form of high interaction latencies and slow transfer times, tends to be the result.

There are several recent and ongoing commercial efforts to solve this problem by reengineering the Web to create high performance Content Distribution Networks (CDNs). CDNs are based on the use of surrogates, i.e., on the deployment of multiple nodes within the network that can, under the control of the content provider, fulfill the service requests of users in the appropriate manner. According to the working definition currently used in the discussions of IETF's Web Replication and Caching Working Group, a surrogate is

    A gateway co-located with an origin server, or at a different point in the network, delegated the authority to operate on behalf of, and typically working in close co-operation with, one or more origin servers. Responses are typically delivered from an internal cache [12].

Although this wording weights the idea of a surrogate towards cache-based implementations, surrogates of other forms have been well known and widely used to achieve the same purpose for some time. The full service surrogate model that we propose below draws on this alternative tradition in order to create an approach to CDNs that we believe has novel capabilities and strengths.

This "full-service" approach derives from a characteristic analysis of how a Web service, and therefore a service surrogate, is constituted. On this view a Web service node generally consists of a server process running in a conventional operating system environment. The state of this server is defined by two kinds of files: configuration files and stored source objects. When the server process receives a typical request, such as an HTTP GET, it uses the information in the HTTP header to interpret the control information in the configuration files to determine which source objects must be retrieved in order to fulfill the request, what their type is, and how they must be interpreted.
In some cases, the request generates a call-out to some other Web service node, and the response generated by that node is relayed back to the client. Note that this description of a Web service node is quite general, encompassing both Web caches and other Web servers for various protocols. Indeed, in our view a Web cache can be characterized as a Web service node whose stored objects are previous responses to Web requests that have been generated by origin servers and captured by the cache. Capturing HTTP responses according to a cache management policy is the most convenient way to implement a surrogate, since it does not require the operation of the origin server to be duplicated. As the widespread use of Web caching and cache-based CDNs suggests, this approach is highly effective where requests are predictable and sufficiently stable over time.

Unfortunately, many services implemented in the Web today do not fit the simple form of the caching model. Some services, for example, are implemented dynamically through the execution of a program by the Web server [14]. Important types of dynamic, non-cacheable content services include (1) content generated from a database query, (2) quickly changing content (e.g. live content), and (3) highly interactive interfaces (e.g. those needing an applet). The most common mechanisms for implementing such dynamic services are programs invoked through the Common Gateway Interface (CGI) and Java servlets executed as part of the server process [10, 15]. With either mechanism, however, the service request leads the server to a stored, executable object, and this object is subsequently executed using an interpreter determined by its type. To replicate such non-cacheable services, you have to replicate the server itself, creating an identical copy that can act as a full service surrogate, invoking executable replicas of the appropriate source objects on every request it fulfills.

Now the desire to create such a full service approach to content distribution was one of the primary motivations for the Internet2 Distributed Storage Infrastructure (I2-DSI) project. This project attempted to draw on ideas from traditional Internet mirroring in order to implement a general, scalable network of servers for the replication of both static content and dynamic content services across heterogeneous operating environments [2]. In I2-DSI the basic unit of replication is characterized as a channel, i.e. as "… a collection of content which can be transparently delivered to end user communities at a chosen cost/performance point through a flexible, policy-based application of resources." From the beginning this concept of a channel explicitly included the kinds of dynamic content that cache-based approaches have problems addressing.

But the idea of mirroring channels with dynamic content faces challenging problems of its own. This fact is evident to anyone who has tried to use a standard mirroring approach to replicate Web servers with executable source objects. It proves too hard to do because, as our analysis above shows, the behavior of such a Web server depends on two critical factors, and both of them are problematic when you try to replicate them: (1) the configuration files, which are specific to the server software and to the node's operating environment, and (2) the stored source objects, whose interpretation depends on the interpreters and other local resources available at the node.
Taken together these two factors mean that trying to create a surrogate by simply copying the configuration files and source objects to the target node only works where the servers are identical. Where they are not identical, the mirroring operation must take into account any heterogeneity in the architecture, operating system, or server software on the target node, and the resulting copy must also be compatible with the other operations the target node is configured to perform. For this reason, porting a sophisticated Web site to a non-identical server node can be a frustrating, time-consuming task, where even substantial amounts of effort cannot always guarantee that the result will be an identical copy of the site.

The concept of a Portable Channel Representation (PCR) was developed in the context of the I2-DSI project to attack precisely this problem, and thereby make possible a full service alternative to content distribution mechanisms based on caching technology alone [1]. Because PCR focuses on the problems associated with mirroring of source objects, it draws from the substantial experience in cooperative Internet mirroring that predates the Web [9, 11, 13]. It is also informed by more recent work on active networks from the past decade, the same work that has been incorporated into an Extensible Proxy Framework in order to add dynamic services to cache-based content distribution networks, which are still based only on capture and replay of Web responses [18].
2. CONTENT PORTABILITY AND CHANNEL REPLICATION

A useful place to begin the discussion of content portability is with the common distinction between "active" and "static" content. Our view is that in the area of content distribution, the distinction between active (dynamically interpreted) content and static (or passive) data is blurred to the point of being meaningless. One reason for this confusion is that the most common forms of Web programming are declarative, and for that reason are not considered to be forms of programming at all by the author/programmer. But examining a few cases at close range suggests otherwise.

The most legitimately "static" content on the Web is a file delivered by FTP; admittedly in this case, the bits stored on the disk are not interpreted at all by the FTP server, but are simply passed over a network link. But at only a slightly higher level of complexity, simple HTML files are in fact interpreted by the HTTP server to generate an HTTP response, even though they are often thought of as static content. The most universal form of this interpretation occurs when the server rewrites the URLs in hyperlinks, or even more ambitiously, when the server processes directives in the HTML and generates text to replace them in the response. Moreover, HTML files are augmented by metadata that determines how they are processed and what the nature of the response generated should be: <META> tags can cause redirection to another URL entirely, among other altered behaviors, and password protection alters the behavior of the server, although not the contents of the delivered file.

It is convenient to think of an HTTP response as if it were a simple copy of the HTML file which generates it, as this allows us to conceive of the "static" portion of the World Wide Web as a file delivery mechanism, i.e., a form of communication network. Caching technology can then be thought of as an extension of that network. But the conclusion that we draw from a review of the facts is that almost every form of Web content is in some measure interpreted, and therefore liable to encounter portability issues. For that reason a sounder operating assumption to make is that source files are not passive data but programmed objects that must be interpreted in order to generate the server's response. We believe that if solutions to the Web's scalability problems take the distinction between active and static content for granted and focus on caching technology alone, they will be ill equipped to deal with the Web as it really is, i.e., as a distributed system with chronic programming and portability issues.

A few examples from ordinary Web authoring and content management make obvious the relevance of this point of view to questions of portability. While standard HTML processing is usually portable, any server-side includes that pages contain are server-dependent, and they may make reference to auxiliary files by file name (rather than URL); this will tend to make them non-portable. URLs for non-HTML file types, such as streaming media objects, use metadata files that invoke local auxiliary programs and can make explicit reference to file names. These local metadata files are not, as a rule, portable. Again, many HTTP server features, such as multi-lingual processing and security, are controlled through server configuration files that are not standardized and that typically make reference to local directory names.
Finally, CGI programs and servlets commonly make use of local interpreters, files, and other resources through interfaces that are not portable across servers.

Once we begin to think of the Web as a distributed system with standard programming and portability issues, then it becomes clear that, as with other such systems, the key to portability is the use of standard languages and application programming interfaces (APIs) which can be interpreted uniformly on a broad class of execution platforms. As things currently stand, the Web falls far short of satisfying this condition. HTML may be a standard language, but it comes with a distressingly broad choice of APIs. These range from standard HTML with no server-side directives, to HTML with a small class of directives supported by many available HTTP servers, and finally to HTML with powerful database access extensions supported only by a small number of HTTP servers. Similarly, in typical HTTP implementations, although servlets have a well-defined language (Java) and API, CGI programs are arbitrary executable files that are invoked by the HTTP server with no notion of their language or API. CGI programs are typically implemented in an interpreted language such as Perl, but a given CGI program may be a compiled binary. Interpreted languages, such as Java byte code or Perl, have the benefit of placing an intermediary layer of software between the code and the full power of the operating system and the machine architecture. However, incompatibility between different versions of the same language and the use of powerful, non-portable APIs can eliminate such benefits. As a result, CGI programs may be highly non-portable, and there is no metadata available to determine their portability characteristics.

Now there are basically two strategies for achieving portability in the face of this diversity of APIs. One approach says that we must all agree on a single API and use only the features of that API in order to achieve "write once, run anywhere" status. This is what Java promises. The other approach requires only that code port safely, not universally. According to this point of view, it is not necessary for everyone to use a universally implemented language and API, just that the choice be known and that the interpreter be safe even if the API is violated. We term this freer and more open approach descriptive, in contrast to prescriptive, one-language-only approaches to portability.

It is worth noting that the one-language-only strategy for content portability was attempted unsuccessfully in the early Java-only Content Distribution products from Marimba in the context of "desktop push." More generally, we believe that the more expansive goal of "write once, run anywhere" cannot be practically achieved in a world where languages, APIs and execution platforms change constantly and the behavior of the developer community is not under centralized control. The approach to portability we have developed, which is based on the Portable Channel Representation (PCR), is an instance of a descriptive, metadata-based strategy that offers more freedom to developers to choose their language and API, but requires them, in return, to provide the CDN with critical information characterizing the portability of the resulting content.
While PCR does not promise to make every Web site portable to every platform, once the information characterizing portability is encoded as PCR metadata, management software can check to see whether that content can be safely ported to a given target server.
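As a rough illustration of the kind of check this enables (the class and type names below are hypothetical, not part of the PCR specification), a channel declares the types of its source objects in its metadata, and management software simply verifies that a candidate target server supports every declared type:

    import java.util.Set;

    // Sketch of a PCR-style portability check: a channel is portable to a target
    // server only if every type (language + API) declared in its metadata is among
    // the types the server can interpret.
    final class PortabilityCheck {

        // Types are named here as "language/API" strings, e.g. "HTML/Standard",
        // "Perl/MinimalAPI", "JavaBytecode/SQLReadOnly" (illustrative labels).
        static boolean canPort(Set<String> declaredTypes, Set<String> supportedTypes) {
            return supportedTypes.containsAll(declaredTypes);
        }

        public static void main(String[] args) {
            Set<String> channel = Set.of("HTML/Standard", "Perl/MinimalAPI");
            Set<String> targetServer = Set.of("HTML/Standard", "GIF", "MPEG-1");
            System.out.println(canPort(channel, targetServer)); // false: Perl/MinimalAPI unsupported
        }
    }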
3. THE PORTABLE CHANNEL REPRESENTATION

3.1 The PCR Data Model
The PCR description and the file store together define a complete channel description that can be correctly implemented on any platform that correctly interprets both elements. PCR's descriptive approach associates with each user request both a source file and a type. The type specifies the language and API of that file and is used to determine the interpreter that will be used to generate a response. Since these types are not reflected in the contents of the source file, but are specified by the PCR metadata external to the file, a single source file may be treated as having different types when accessed through different requests (e.g. using different protocols and servers). Possible types include standard HTML, GIF, 3Mb/s MPEG-1 video, Perl with a standard minimal API (no file or network access), and Java bytecode with an SQL database access API.

The PCR data model is a language intended for the specification of the behavior of a server. This language does not support arbitrary behaviors (i.e., it is not a general model such as Java), but instead works within a highly structured server behavior model, shown in Figure 1. The central notion in PCR's server behavior model is a request fulfillment rule, which can be thought of as a pair consisting of a pattern and an action. When a request is presented to a server and matches a pattern, the server responds by performing the associated action.
In concrete terms, every type is associated with a method or interpreter, and the action specified by the rule is to invoke that method on the specified data file. For example, a request associated with a stored file with type "Standard HTML" would be interpreted by a standard HTTP server allowing no non-standard extensions.

Every source object type defines syntactic rules for data objects of this type. We refer to these syntactic rules as the language in which the object is expressed. If there is a single language with multiple variants, we refer to these variants as APIs. Thus, HTML is a source object language, but a specific set of allowable server includes defines an API. Perl and Java byte code are also source object languages, and each of them may have multiple APIs. The combination of language and API together determine the object type. The server must perform a combination of install- and run-time checks to ensure that a particular object conforms to its declared type. The sum of all APIs constituting the channel is referred to as the Channel API. Extending the functionality of PCR requires introducing a new type, defining the interpreter, and extending the Channel API to include the new object type. New service types can easily be added as they emerge; thus the PCR approach is highly extensible. The current implementation of PCR (Section 5) supports HTML, streaming media and server-side HTML extensions, but it will be extended to support CGI execution (Perl, Java), database access (read-only) and other APIs.

In Section 4 below we present the RDF schema for PCR in detail. But to understand the different aspects of that schema, as well as the FileStore API that complements it, it is helpful to be familiar with the way in which it is used to create CDNs based on full service surrogates. The next few sections describe the different facets of the PCR approach to creating CDNs.
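To make this server behavior model concrete, the following sketch (in Java; the class names, the use of regular expressions for patterns, and the interpreter interface are our own illustrative assumptions, not part of PCR) shows a rule table that maps a matching request to a FileStore-local source name and declared type, and dispatches to the interpreter bound to that type:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Pattern;

    // Each rule pairs a request pattern with an action: invoke the interpreter
    // associated with the declared type on a named source object in the FileStore.
    interface Interpreter {
        byte[] generateResponse(String fileStoreName, String request);
    }

    final class FulfillmentRule {
        final Pattern pattern;    // matched against the request, e.g. the URL path
        final String sourceName;  // FileStore-local name of the source object
        final String type;        // declared type, e.g. "HTML/Standard" or "Perl/MinimalAPI"

        FulfillmentRule(String pattern, String sourceName, String type) {
            this.pattern = Pattern.compile(pattern);
            this.sourceName = sourceName;
            this.type = type;
        }
    }

    final class ChannelServerSketch {
        private final Map<String, Interpreter> interpreters = new LinkedHashMap<>();
        private final List<FulfillmentRule> rules = new ArrayList<>();

        void bindType(String type, Interpreter interpreter) { interpreters.put(type, interpreter); }
        void addRule(FulfillmentRule rule) { rules.add(rule); }

        byte[] fulfill(String request) {
            for (FulfillmentRule rule : rules) {
                if (rule.pattern.matcher(request).matches()) {
                    Interpreter i = interpreters.get(rule.type);
                    if (i == null) throw new IllegalStateException("no interpreter for type " + rule.type);
                    return i.generateResponse(rule.sourceName, request);
                }
            }
            throw new IllegalArgumentException("no rule matches " + request);
        }
    }

In this reading, extending a channel with a new service type amounts to registering one more type-to-interpreter binding and the rules that use it, which is the extensibility property noted above.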
3.2 PCR-based Content Distribution

The distribution process can be divided into four subprocesses, sequenced as pictured below (Figure 3). They include (1) channel creation (authoring), (2) publication of the channel to a Distribution Server, (3) distribution of the channel to the Channel Servers, and (4) installation and activation of the channel at each Channel Server.

In the distribution of a channel the metadata describing the different aspects of the channel is separated from the source objects (application files) that constitute its substance (Figure 4). This separation is handled by the PCR tools and servers and is transparent to both the content providers and end users. The different software components of this system are shown in Figure 5 below. A PCR encoding of an Internet application is generated using a Channel Creator or Channel Authoring tool. Channel authoring tools, which can be an extension of current Web authoring tools, create the PCR representation of the application and, together with the application files (i.e., the channel source objects), publish the channel to the Distribution Server. The Distribution Server distributes the channel to a number of Channel Servers. The Channel Server (or actually the Installer of the Channel Server) generates a local representation of the channel, using the PCR representation. To complete the distribution process, the local representation of the channel at the Channel Server is activated, giving the end user local access to the channel with enhanced quality of service.
3.3 Content Authoring for Highly Portable Surrogates

Tools that implement a mixed approach are also possible. Structured authoring tools, which maintain their own internal metadata structures, could also generate PCR as a publication option. Thus a tool like Microsoft FrontPage might become a PCR channel-authoring tool, much as Microsoft Word has become an HTML authoring tool. The key point in all these cases, however, is that the discovery and encoding of the essential PCR metadata should be as automated as possible, so as to maximize the completeness and accuracy of the encoding and minimize effort for the user.

3.4 PCR Publication

PCR introduces a level of indirection between the identity of a source file and the service request that accesses it. A file may exist in the file store, but unless the current PCR view of the channel accesses it, it has no impact on the behavior of the node in serving the channel. The ability to switch instantaneously between PCR views allows us to atomically update a channel. Instantaneous switching between PCR views is possible because the PCR view is a much smaller data structure than the source file, and a node can easily hold more than one. Thus, a PCR file can be delivered and interpreted but held in an inactive state until a synchronization event makes it the active view of the channel. This feature allows simple, secure and seamless updates of channels, be it a complete update of the entire channel or a partial update of near-real-time information. It is even possible to maintain different views of the channel for different sessions, if, for instance, a session ID is encoded in the service request.
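A minimal sketch of the switching mechanism this implies, assuming (purely for illustration) that a parsed PCR view can be held behind a single reference on the node:

    import java.util.concurrent.atomic.AtomicReference;

    // The node holds the active PCR view behind one reference. A new view can be
    // delivered, parsed, and staged while inactive; the synchronization event then
    // makes it current in a single atomic swap, so a request sees either the old
    // view or the new one, never a mixture of the two.
    final class ChannelViewSwitch {
        private final AtomicReference<Object> activeView;   // Object stands in for a parsed PCR view

        ChannelViewSwitch(Object initialView) {
            this.activeView = new AtomicReference<>(initialView);
        }

        Object currentView() { return activeView.get(); }                      // consulted per request

        void synchronizeTo(Object stagedView) { activeView.set(stagedView); }  // the atomic switch
    }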
4. THE RDF SCHEMA FOR PCR

Figure 6: The PCR RDF Schema

4.1 The Channel Schema
One of the basic services offered by Web servers, in addition to interpretation of source objects, is protection of requests in a variety of protocols. In this simple version of PCR, a single pattern specifies a set of object names to protect, and then a set of access rules can be specified for objects matching that pattern.
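As an illustration (the representation of an access rule below is ours, not the PCR schema's), this can be read as a pattern over object names plus a list of access rules consulted for any matching object:

    import java.util.List;
    import java.util.regex.Pattern;

    // A single pattern selects the object names to protect; a list of access rules
    // (reduced here to allowed principal identifiers) is applied to requests for
    // any object whose name matches the pattern.
    final class ProtectionSketch {
        private final Pattern protectedNames;
        private final List<String> allowedPrincipals;

        ProtectionSketch(String namePattern, List<String> allowedPrincipals) {
            this.protectedNames = Pattern.compile(namePattern);
            this.allowedPrincipals = allowedPrincipals;
        }

        boolean permits(String objectName, String principal) {
            if (!protectedNames.matcher(objectName).matches()) return true;  // not protected
            return allowedPrincipals.contains(principal);
        }
    }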
4.2 Publication Metadata
4.3 Resolution Metadata
4.4 Published Channel
4.5 The PCR FileStore
We have chosen instead to factor the storage of files and their binding to names into a separate facility we call the PCR FileStore. A FileStore is simply a mechanism for storing files and associating them with names that are local to the FileStore. When a FileStore is moved between servers, the binding of names to files does not change. This means that FileStore names can be used in the PCR description file without any loss of portability. The metadata associated with requests that access the files is still implemented in the PCR description.

The problem with using the native file system for name-to-file binding becomes more acute when files are accessed during the interpretation of content, be it from an HTTP server-side include or as input to a program written in Perl. In every case, current implementations result in a local file name being used by the interpreted code, and any use of naming which reflects global knowledge of the file system directory structure will be non-portable. The FileStore also solves this problem, as PCR allows for the naming of local files that reside in the FileStore. The combination of PCR and the FileStore can thus implement some services normally provided by the local file system, eliminating dependencies on non-portable mechanisms.

A FileStore can be as simple as a tar or zip file archive which is copied between servers as a single file transfer, or it can be a directory which is distributed using an efficient differential update mechanism such as rsync, or even a proprietary block-level replication mechanism implemented by a mass storage archive [19]. The intent is to allow data movement to be implemented in the most efficient and cost-effective manner, independent of the distribution of the PCR description.
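A minimal sketch of the name-to-file binding a FileStore provides (the interface below is illustrative; the actual API is outlined in Section 4.5.1):

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    // Names are local to the FileStore, so moving the store between servers (as an
    // archive, via rsync, etc.) preserves every binding. PCR rules and interpreted
    // content refer only to these local names, never to native file system paths.
    final class FileStoreSketch {
        private final Map<String, Path> bindings = new HashMap<>();

        void bind(String localName, Path file) { bindings.put(localName, file); }

        byte[] read(String localName) throws IOException {
            Path p = bindings.get(localName);
            if (p == null) throw new FileNotFoundException("no binding for " + localName);
            return Files.readAllBytes(p);
        }
    }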
4.5.1 The PCR FileStore API

Examples of FileStore delivery mechanisms are transfer of an archival image, rsync between file systems, or block-level mirroring of disks. PCR is usually delivered as a stream to a connected socket, although file transfer protocols such as FTP may also be used.

4.5.1.1 Transaction Management
4.5.1.2 FileStore Management
4.5.1.3 Object Management
5. SOFTWARE IMPLEMENTATION OF PCR-BASED FULL SERVICE SURROGATES

Swedish-based Lokomo Systems has developed a solution to the problem of end-to-end management of Content Distribution Networks in heterogeneous environments using a full service surrogate approach based on PCR. Lokomo's software suite supports the creation and management of Web sites and other content-based services that can be automatically replicated to CDN edge servers acting as full service surrogates. More detailed examples showing how PCR technology makes this approach possible are given below. Lokomo's CDN software suite is divided into four components (Figure 7).
Since a key goal of this approach is to allow content providers to focus on a single version of their content, and yet support the local distribution of that content on CDNs managed by different Internet Service Providers, this suite supports a variety of operating platforms, network protocols, and web server types. The Lokomo software suite has been designed to embody all the important characteristics that CDN builders will expect from an adequate CDN software suite, regardless of the approach it takes: the scalability and redundancy to handle a large number of complex nodes under extremely heavy traffic; the flexibility to support various special system configurations and network topologies; requisite portability to manage heterogeneity in the edge server environment; robust extensibility to third party application servers that can provide new types of content and services to end users; and ease of use that supports agile control of the CDN from a single point in the network.

Experiences we have already had with two widely different applications illustrate some of the qualities that set the PCR approach apart. In particular, Lokomo software has been used to create PCR-based full service surrogates (i.e., channel servers) that support active content in the form of Apache Server Side Includes (SSIs) and Java Servlets. In addition, the use of PCR extends beyond web sites to interactive services; we have successfully used the representation for distributing services like streaming media and games. Both these application examples have provided useful experience in supporting active web content with PCR technology. The two examples described below illustrate very different requirements on the interpretation of content in the surrogate.
5.1 Example 1: Distributed Web Hosting with Server Side Includes

Web sites located at the ISP's Web hotel are translated into PCR by an authoring tool (the Channel Creator) that automatically extracts metadata from the installed source files. The language and API used by the content provider in constructing their Web site is standard HTML extended with Lokomo's dialect of Apache Server Side Includes (Lokomo Apache SSIs). SSIs are directives embedded in HTML pages that the server parses and interprets before the page is served to the Web client. Interpretation of a directive generates an HTML fragment that replaces the SSI directive in the page. While there are several different dialects of HTML augmented with different directives implemented by specific Web servers, Apache was chosen due to its popularity. Every Apache SSI directive takes an argument that specifies some value or file from the server's execution environment, and the safety of the directive depends on which directives are allowed and the values allowed for their parameters.
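A sketch of the kind of safety check this implies (the directive names follow Apache's SSI syntax, but the particular allow-list and parameter constraints below are illustrative assumptions about such a dialect, not Lokomo's actual rules):

    import java.util.Map;
    import java.util.Set;

    // A directive is treated as safe only if it is on the dialect's allow-list and
    // its argument is of a permitted kind (e.g. a relative, FileStore-resolvable
    // name rather than an absolute local path).
    final class SsiSafetyCheck {
        private static final Map<String, Set<String>> ALLOWED = Map.of(
                "include", Set.of("virtual"),   // include another object by name
                "echo",    Set.of("var"));      // expand a server variable

        static boolean isSafe(String directive, String argName, String argValue) {
            Set<String> args = ALLOWED.get(directive);
            if (args == null || !args.contains(argName)) return false;
            return !argValue.startsWith("/") && !argValue.contains("..");
        }
    }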
5.2 Example 2: Distributed Media Management using Java Servlets

The executable portion of this service is implemented using Java servlets (JSP), which are invoked from directives embedded in standard HTML pages just as they are for SSI. And as in the case of SSI, file names are mapped to the FileStore using PCR bindings. However, the standard Java servlet API includes very powerful features. For example, a servlet can create a socket and connect to another server, or access a local read-only SQL database. Such features can cause conflicts on a surrogate shared by many different services and content providers. In this specific example, few restrictions have been applied to the standard API, and the ASP has designed its service carefully, making sure that the servlets will port to its distributed environment. Misuse of this API could cause the service to fail.
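The kind of capability at issue is easy to see in a minimal servlet sketch (the database URL and query below are placeholders): the standard API happily lets such code open JDBC connections or sockets, which is precisely what a shared surrogate may need to restrict or audit.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Generates an HTML fragment from a read-only SQL query. Nothing in the servlet
    // API itself prevents this code from reaching arbitrary local resources, so the
    // surrogate must rely on the declared type/API and the provider's discipline.
    public class CatalogServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            resp.setContentType("text/html");
            try (Connection c = DriverManager.getConnection("jdbc:hsqldb:mem:catalog"); // placeholder URL
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT title FROM media");              // placeholder query
                 PrintWriter out = resp.getWriter()) {
                out.println("<ul>");
                while (rs.next()) out.println("<li>" + rs.getString("title") + "</li>");
                out.println("</ul>");
            } catch (SQLException e) {
                resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
            }
        }
    }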
6. CONCLUSION

The World Wide Web was created around a simple and highly structured notion of content (HTML) and a standard protocol for delivering it (HTTP). Of the many benefits that could accrue from the adoption of the Web as a universal fabric for information interchange, some derive from the use of HTML as a uniform content language, including the ability to use a single encoding of content for many diverse purposes and the ability to use a single encoding of content across many different computing platforms. Other benefits accrue from using HTTP as a uniform delivery mechanism, including the ability for a single server platform to fulfill requests from diverse clients and the ability to develop networking infrastructure which is adapted to the characteristics of that protocol. As the Web has developed, the growing domination of the latter has meant the progressive diminution of the former.

The source of the problem is that, as a language, HTML succumbed to a universal tendency in the development of computing systems: programmers will modify any tool until it becomes a fully general computing environment, with little or no respect for the strong properties intended by the original designer. That is how functional programming languages get augmented with imperative constructs, graphics formats become multimedia scripts, and declarative text markup languages become page layout tools. The Web has been augmented by sources of content, such as CGI scripts, which bear no resemblance to HTML, but which do conform to HTTP and magnify the power of the Web as a delivery mechanism. As a consequence, the Web has become something amazing: a medium for commerce and entertainment, a competitor for the television and the telephone, a fabric for human interactions of all sorts. In the process, however, many of the strong properties that might have made Web content more manageable have been lost.

The move towards the use of XML in the Web is providing a framework for many communities to define highly structured notions of content that are intended to provide manageability, and their intent is to defend those tools against extensions which would violate their fundamental design principles. The Portable Channel Representation is an attempt to define a language that factors out, from the myriad mechanisms (languages and APIs) for generating HTTP responses, enough commonality and structure to allow for automated management of content. If it succeeds it will restore to the management of Web content a property that some people are not even fully aware has been lost - the independence of content from the execution environment of the server, i.e. portability.

Standards activity in Web content has focused on the format and interpretation of source objects: HTML, XML, GIF, JPEG, MPEG, etc. These activities have enabled a generation of content authorship and management tools that can accurately preview the behavior of Web browsers, publish entire Web sites into heterogeneous operating environments, and modify and combine Web content that has been developed independently. It has not been possible, however, to achieve the same degree of platform independence for more highly interpreted content due to a lack of standards, and this has limited the degree to which content management can be automated in an interoperable manner.
Acceptance of a representation standard for interpreted content generally, such as PCR, would overcome this limitation and enable a much greater degree of automation in content management across heterogeneous platforms.