This page has not been updated since release 1.0alpha3. Although the
ideas have not changed much, some classes have been renamed as
part of an API cleanup (check the Release Notes). A new architecture
overview is under construction here.
Jigsaw Architecture
Jigsaw is made of two distinct modules, linked through a set of
interfaces:
-
The daemon module deals with the HTTP protocol:
it handles new incoming connections, creates new client objects, decodes
requests, and sends replies.
-
The resource module is some representation
of your information space. It is responsible for generating reply objects
out of the incoming request objects.
As these two modules are linked through a set of Java interface specifications,
you can replace each of them independently of the other, provided the
replacement implements the interfaces adequately.
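The separation described above can be illustrated with a minimal sketch. All the names used here (ModulesSketch, Request, Reply, ResourceModule, Daemon) are hypothetical and chosen for illustration only; they are not the actual Jigsaw interfaces. The point is only that the daemon side depends on an interface, so any resource module implementing it can be plugged in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these types only illustrate the idea and are not
// the actual Jigsaw interfaces.
class ModulesSketch {
    /** What the daemon module hands to the resource module. */
    static final class Request {
        final String method;
        final String url;

        Request(String method, String url) {
            this.method = method;
            this.url = url;
        }
    }

    /** What the resource module hands back to the daemon module. */
    static final class Reply {
        final int status;
        final String body;

        Reply(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** The only thing the daemon knows about the resource module. */
    interface ResourceModule {
        Reply perform(Request request);
    }

    /** Daemon stand-in: depends on the interface, not on an implementation. */
    static final class Daemon {
        private final ResourceModule resources;

        Daemon(ResourceModule resources) {
            this.resources = resources;
        }

        /** Decodes a line such as "GET /index.html" and formats the reply. */
        String handle(String requestLine) {
            String[] parts = requestLine.split(" ", 2);
            Reply reply = resources.perform(new Request(parts[0], parts[1]));
            return reply.status + " " + reply.body;
        }
    }

    public static void main(String[] args) {
        // One possible resource module: an in-memory map from URL to content.
        Map<String, String> pages = new HashMap<>();
        pages.put("/index.html", "hello");
        ResourceModule inMemory = request -> {
            String body = pages.get(request.url);
            return body == null ? new Reply(404, "not found") : new Reply(200, body);
        };
        System.out.println(new Daemon(inMemory).handle("GET /index.html"));
    }
}
```

Swapping the in-memory map for another implementation of the interface would not require any change to the Daemon class.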
The daemon module
As a server administrator, you probably won't have to deal much with this
part of Jigsaw, although it might be a good idea to read this section
(or at least the part on terminology), just to get a feeling for what's
happening behind the scenes. This section goes through a bit of terminology,
then steps through the life-time of a connection, and introduces the
resource module.
Terminology
The daemon module (also called the protocol module below) deals with a
number of objects. To get you into this world, we start by describing the
most important ones.
-
httpd
-
This is the object whose main
method actually runs the server. It has two purposes. The first
is to run the accepting thread, i.e. the thread that loops waiting
for new incoming connections. The second
is to manage the set of other objects, each responsible for part of
the server behavior. Among them are the logger
(responsible for logging requests), the authentication
realm manager (responsible for the list of authentication realms
defined for the server), the client pool (responsible
for handling accepted connections), the root resource
of the server, and last but not least, the resource
store manager. We describe all these objects more precisely in
the coming sections.
-
logger
-
Each time the processing of a request terminates (successfully or not),
the server calls back the logger so that it can keep track of all handled
requests. The current version of Jigsaw comes with a simple logger,
compliant with the Common Log file format (i.e. it emits a one-line record
for each processed request).
-
realm manager
-
The realm manager keeps track of all the authentication realms defined
by the server. Each authentication realm is assigned a symbolic
name, which the web administrator uses to refer to it when configuring the
server. This name is also used as the HTTP realm name, so it should
be unique within the server scope. The sample implementation of this object
manages a persistent catalog of realms that can be edited through a special
tool called JigAdmin (see the Jigsaw configuration manual).
-
client pool
-
The client pool object is responsible for handling new incoming connections.
It should make its best effort to guess what protocol the other end wants
to speak on this connection, and create an appropriate client object to
handle it. The current sample implementation always assumes that new
connections speak the HTTP/1.0 protocol (with the addition of
persistent connections). The other role of the client pool is to minimize
thread creation. Thread creation can be a costly process,
so it is worth avoiding as much as possible. The sample implementation
maintains a ready-to-run set of client objects, so that it doesn't re-create
them from scratch for each new connection.
-
root resource
-
The root resource is the object that links the protocol module
to the resource module. This object should
implement the appropriate interface (right now, it should be an instance
of ContainerResource, but this is likely to change in the very near future).
-
resource store manager
-
As you will see in the next section, Jigsaw serves each file or
directory by wrapping it into some Resource
instance. As their number might become fairly large, the server keeps
track of the ones that haven't been accessed for a while, and unloads them
from memory. The resource store manager is responsible for this piece of
the server behavior: it keeps track of all the loaded resources, and unloads
them when it thinks appropriate.
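The unloading behavior of the resource store manager can be pictured with a small sketch. It assumes a simple least-recently-used policy with a fixed limit on loaded resources. This policy and the names StoreManagerSketch and ResourceStore are assumptions made for illustration; the actual Jigsaw store manager may decide differently when to unload.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real resource store manager may use another policy.
class StoreManagerSketch {
    /** Keeps at most maxLoaded resources, unloading the least recently used. */
    static final class ResourceStore<R> extends LinkedHashMap<String, R> {
        private final int maxLoaded;

        ResourceStore(int maxLoaded) {
            super(16, 0.75f, true); // true = keep entries in access order
            this.maxLoaded = maxLoaded;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
            // Called after each insertion: drop the stalest entry when over the limit.
            return size() > maxLoaded;
        }
    }

    public static void main(String[] args) {
        ResourceStore<String> store = new ResourceStore<>(2);
        store.put("/a", "resource A");
        store.put("/b", "resource B");
        store.get("/a");               // /a becomes the most recently used
        store.put("/c", "resource C"); // over the limit, so /b is unloaded
        System.out.println(store.keySet()); // prints [/a, /c]
    }
}
```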
Given these definitions, we can now explain how the server handles new
incoming connections.
Life-time of a connection
The life-time of a connection can be divided into the following stages:
1. The accepting thread is notified of the new connection.
2. The connection gets handled by the client pool object.
3. A client thread starts waiting for incoming requests.
4. The request is handed over to the resource module, for actual processing.
5. The reply generated by the resource module is emitted.
6. The server logger is called back to log the request.
The first stage in processing a new connection is to hand it over as quickly
as possible to the client pool (so that the accepting thread can return
as fast as possible to the accept system call). The client pool
then looks for an idle Client object. If one is found, it is bound
to the accepted connection, which makes it run its main loop. If no idle
client is available and the maximum allowed number of connections has been
reached, the new connection gets rejected (by closing it); otherwise a fresh
client is created and bound to the connection.
By the end of stage 2, the client pool has either rejected the connection,
found an idle client to handle it, or created a fresh client for this connection.
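The stage 2 decision can be sketched as follows. The class and method names (ClientPoolSketch, ClientPool, handleConnection, clientIdle) are hypothetical, and the connection is represented by a plain Object; this is not the actual Jigsaw client pool, only an illustration of the three possible outcomes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Hypothetical sketch of the stage 2 decision; names are illustrative only.
class ClientPoolSketch {
    static final class Client {
        private Object connection; // null while the client is idle

        void bind(Object connection) { this.connection = connection; }
        void unbind() { this.connection = null; }
        boolean isIdle() { return connection == null; }
    }

    static final class ClientPool {
        private final Deque<Client> idle = new ArrayDeque<>();
        private final int maxConnections;
        private int busy;

        ClientPool(int maxConnections) { this.maxConnections = maxConnections; }

        /** Returns the client bound to the connection, or empty if rejected. */
        synchronized Optional<Client> handleConnection(Object connection) {
            Client client = idle.poll();          // first, look for an idle client
            if (client == null) {
                if (busy >= maxConnections) {
                    return Optional.empty();      // limit reached: caller closes it
                }
                client = new Client();            // otherwise create a fresh client
            }
            client.bind(connection);
            busy++;
            return Optional.of(client);
        }

        /** Called by a client once its connection is finished. */
        synchronized void clientIdle(Client client) {
            client.unbind();
            busy--;
            idle.push(client);                    // kept for future connections
        }
    }

    public static void main(String[] args) {
        ClientPool pool = new ClientPool(1);
        Optional<Client> first = pool.handleConnection("connection-1");
        Optional<Client> second = pool.handleConnection("connection-2");
        System.out.println("first accepted: " + first.isPresent());
        System.out.println("second accepted: " + second.isPresent());
        pool.clientIdle(first.get());
        Optional<Client> third = pool.handleConnection("connection-3");
        System.out.println("third reuses first client: " + (third.get() == first.get()));
    }
}
```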
At stage 3, the client object is bound to the connection, and awakened to
actually process it. The client thread enters the client main loop.
The client main loop starts by getting any available request. When such
a request has been read from the network, it is handed over to the resource
module. The latter is responsible for generating an appropriate
Reply object, which is then emitted (stage 5) by the client thread, back to
the browser. Finally, the server logger is invoked with the request, its
reply and the number of bytes sent back to the browser.
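As an illustration of what the logger does with this information, here is a minimal sketch that formats one Common Log Format record (host, identity, user, date, request line, status, byte count). The class and method names are hypothetical and the sample values are made up; this is not the Jigsaw logger itself.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch of a Common Log Format line; not the Jigsaw logger.
class CommonLogSketch {
    // Date layout used by the Common Log Format, e.g. 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter CLF_DATE =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    /** Builds one record; a null user is logged as "-". */
    static String format(String host, String user, ZonedDateTime when,
                         String requestLine, int status, long bytes) {
        return String.format(Locale.ROOT, "%s - %s [%s] \"%s\" %d %d",
            host, user == null ? "-" : user, CLF_DATE.format(when),
            requestLine, status, bytes);
    }

    public static void main(String[] args) {
        ZonedDateTime when =
            ZonedDateTime.of(2000, 10, 10, 13, 55, 36, 0, ZoneOffset.ofHours(-7));
        System.out.println(
            format("192.0.2.1", null, when, "GET /index.html HTTP/1.0", 200, 1024));
    }
}
```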
At the end of this request processing, the client object tests to see
if it can keep the connection alive. If so, it loops back to stage 3; otherwise,
the client notifies the client pool that it has become idle. The client pool,
in turn, decides whether this client object should be kept for future use
or not.
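The client main loop can be sketched as below. This is a simplified, single-threaded illustration under stated assumptions: requests arrive through an Iterator rather than a socket, the resource module is a plain function, and the "log" is a list of strings. All names (ClientLoopSketch, Request, serve) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the client main loop; not the Jigsaw Client class.
class ClientLoopSketch {
    static final class Request {
        final String url;
        final boolean keepAlive;

        Request(String url, boolean keepAlive) {
            this.url = url;
            this.keepAlive = keepAlive;
        }
    }

    /** Serves the requests of one connection and returns the log records. */
    static List<String> serve(Iterator<Request> incoming,
                              Function<Request, String> resourceModule) {
        List<String> log = new ArrayList<>();
        while (incoming.hasNext()) {
            Request request = incoming.next();              // get the next request
            String reply = resourceModule.apply(request);   // let the resource module reply
            int bytesSent = reply.getBytes(StandardCharsets.UTF_8).length; // "emit"
            log.add(request.url + " " + bytesSent);         // call back the logger
            if (!request.keepAlive) {
                break;                                      // connection is not kept alive
            }
        }
        return log; // at this point the client would tell the pool it is idle
    }

    public static void main(String[] args) {
        List<Request> requests = Arrays.asList(
            new Request("/a", true), new Request("/b", false), new Request("/c", true));
        // The third request is never read: the second one closed the connection.
        System.out.println(serve(requests.iterator(), r -> "content of " + r.url));
    }
}
```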
The Resource module
The resource module is the one that manages the information space. In
Jigsaw, each exported object is mapped to an HTTPResource
instance, which is created at configuration time, either manually or by
the resource factory. We describe here what resources are, and then
sketch the way Jigsaw looks up the target resource of a request. Finally,
we present the filter concept.
Resources
Resources are full Java objects, defined by two characteristics:
-
Their Java class defines their behavior (how they implement the HTTP
methods)
-
Their state is described through a set of attributes
The AttributeRegistry keeps track of all the attributes of each class. Like
instance variables, attributes are inherited along the normal subclass
relationship. Each resource attribute is described by an instance of the
Attribute class (or of one of its subclasses). This description is made of:
-
the name of the attribute,
-
some flags indicating whether the attribute is mandatory, editable, computed,
and so on
-
the methods to pickle (dump) and unpickle (restore) this attribute's values
-
a method to check if some Object instance can be used as a value for this
attribute.
Given this description, Jigsaw is able to make resources persistent,
just by dumping their class name and all their attribute values. Unpickling
(i.e. restoring) a resource just means creating an empty instance of its
class, and filling its attributes with their saved values.
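The persistence idea can be sketched in a few lines. This is an in-memory illustration under stated assumptions: attribute values are kept in a map, the pickled form is a plain object rather than a byte stream, and the names (PickleSketch, SketchResource, FileSketchResource, Pickled) are hypothetical. It does not reproduce the Jigsaw Attribute or AttributeRegistry classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "class name + attribute values" persistence.
class PickleSketch {
    /** A resource whose whole state lives in named attribute values. */
    static class SketchResource {
        final Map<String, Object> values = new LinkedHashMap<>();
    }

    /** A subclass, to show that the concrete class survives the round trip. */
    static class FileSketchResource extends SketchResource {
    }

    /** The pickled form: the class name plus a copy of the attribute values. */
    static final class Pickled {
        final String className;
        final Map<String, Object> values;

        Pickled(String className, Map<String, Object> values) {
            this.className = className;
            this.values = values;
        }
    }

    static Pickled pickle(SketchResource resource) {
        return new Pickled(resource.getClass().getName(),
                           new LinkedHashMap<>(resource.values));
    }

    /** Creates an empty instance of the saved class, then fills its attributes. */
    static SketchResource unpickle(Pickled pickled) {
        try {
            SketchResource resource = (SketchResource) Class.forName(pickled.className)
                .getDeclaredConstructor().newInstance();
            resource.values.putAll(pickled.values);
            return resource;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot restore " + pickled.className, e);
        }
    }

    public static void main(String[] args) {
        FileSketchResource original = new FileSketchResource();
        original.values.put("filename", "index.html");
        SketchResource restored = unpickle(pickle(original));
        System.out.println(restored.getClass().getSimpleName() + " " + restored.values);
    }
}
```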
Resource instances are the basic resources. The HTTPResource
class is the base class for resources that are accessible through the
HTTP protocol. This class defines a number
of attributes along with the methods that implement the HTTP methods
(e.g. GET, POST, PUT, etc., which are mapped respectively to the get,
post, put, etc. methods of the class). These methods are all triggered
through the perform method of the class, whose role is to dispatch a
request to the appropriate handler.
Remember that in the previous section, we said that requests were handed over
to the resource module. The perform method of HTTPResource is
the one that gets called by the daemon module once the target resource of
the request has been looked up. The next section explains how this lookup
is performed.
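The dispatching role of perform can be sketched as follows. The class SketchHTTPResource and its string-based signatures are hypothetical simplifications, not the real HTTPResource API; replying with 501 for handlers that a subclass does not override is an assumption of this sketch.

```java
// Hypothetical sketch of method dispatch; not the real HTTPResource API.
class PerformSketch {
    static class SketchHTTPResource {
        /** Dispatches the request to the handler matching its HTTP method. */
        String perform(String method, String url) {
            switch (method) {
                case "GET":  return get(url);
                case "POST": return post(url);
                case "PUT":  return put(url);
                default:     return "501 Not Implemented";
            }
        }

        // Default handlers: subclasses override the methods they support.
        String get(String url)  { return "501 Not Implemented"; }
        String post(String url) { return "501 Not Implemented"; }
        String put(String url)  { return "501 Not Implemented"; }
    }

    /** A resource that only answers GET. */
    static class HelloResource extends SketchHTTPResource {
        @Override
        String get(String url) {
            return "200 hello from " + url;
        }
    }

    public static void main(String[] args) {
        SketchHTTPResource resource = new HelloResource();
        System.out.println(resource.perform("GET", "/hello"));
        System.out.println(resource.perform("PUT", "/hello"));
    }
}
```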
Looking up resources
In the paragraph about the life-time of a connection, we mentioned that at
stage 4 the parsed request was handed over to the resource module. The first
thing the resource module does when it receives a request is to look up the
target resource. This paragraph briefly explains how this is performed.
Jigsaw defines a special subclass of the HTTPResource
class, the ContainerResource,
whose role is to implement the lookup strategy for the sub-space it controls.
All servers (i.e. all instances of the httpd
class) keep a pointer to their root resource. This root resource
must be a container resource: it must implement the lookup
method. Given a request, this method must return a suitable target resource
for processing it. However, there are no constraints on how the lookup
is performed. We will briefly sketch how directory resources implement
their lookup method.
The directory resource's lookup
method starts by checking that it has an up-to-date list of children. What
is meant by up-to-date here might not be what you expect: Jigsaw's
caching strategy can make this notion quite complex. Anyway, once the directory
resource thinks its list of children is up-to-date, it looks up the first
component of the URL in its children set. For example, if the URL is /foo/bar,
it starts by looking up foo in itself. This can lead to one of the following
cases, depending on the result:
-
The directory resource doesn't have a child resource named foo.
In this case, it throws an exception to signal a not found error.
-
The directory resource has a child named foo. As the looked up
URL contains more components, the directory resource checks that the found
resource is a container resource. If this is not the case, then a not
found error is signaled by throwing an appropriate exception. Otherwise,
the looked up component is removed from the lookup state, and the directory
resource calls the found child resource's lookup method.
This lookup process is just one example of how the lookup operation can
be implemented. It has several advantages in the specific case of handling
directory resources, but other situations may require other algorithms.
One important property of the directory resource's lookup algorithm is
that it is able to delegate sub-space naming to the resource
that actually handles the sub-space.
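The directory lookup described above can be sketched as a short recursive method. The names (LookupSketch, Resource, Directory, NotFoundException, split) are hypothetical, children are held in a plain map with no caching, and returning the child when no components remain is an assumption of this sketch for the end of the recursion.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the directory lookup; not the Jigsaw classes.
class LookupSketch {
    static class NotFoundException extends RuntimeException {
        NotFoundException(String name) {
            super("not found: " + name);
        }
    }

    static class Resource {
        final String name;

        Resource(String name) {
            this.name = name;
        }
    }

    /** A container resource that names its own sub-space. */
    static class Directory extends Resource {
        private final Map<String, Resource> children = new HashMap<>();

        Directory(String name) {
            super(name);
        }

        Directory add(Resource child) {
            children.put(child.name, child);
            return this;
        }

        /** Consumes the first remaining component and delegates the rest. */
        Resource lookup(Deque<String> components) {
            if (components.isEmpty()) {
                return this;                      // e.g. a lookup of "/"
            }
            String name = components.poll();      // remove it from the lookup state
            Resource child = children.get(name);
            if (child == null) {
                throw new NotFoundException(name); // no such child
            }
            if (components.isEmpty()) {
                return child;                     // assumption: last component is the target
            }
            if (!(child instanceof Directory)) {
                throw new NotFoundException(name); // more components, but not a container
            }
            return ((Directory) child).lookup(components); // delegate the sub-space
        }
    }

    /** Splits "/foo/bar" into the components foo and bar. */
    static Deque<String> split(String url) {
        Deque<String> parts = new ArrayDeque<>();
        for (String part : url.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Directory root = new Directory("");
        root.add(new Directory("foo").add(new Resource("bar")));
        System.out.println(root.lookup(split("/foo/bar")).name);
    }
}
```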
Resource filters
We have briefly described Jigsaw's resource module. The last thing
you need to understand is Jigsaw's concept of resource filters.
You might have been surprised that until now, we haven't said a word about
authentication. In Jigsaw, authentication is implemented as a special
resource filter. Resource filters are a special kind of resource (i.e.
they are persistent, and can define any kind of attributes) that are attached
to some target resource. Filter instances are called back twice
during request processing:
-
At lookup stage, the filter's ingoingFilter
method is called. It is provided with the request whose URL we are looking
up.
-
After the target resource has generated its reply, the filter's outgoingFilter
method gets (optionally, see below) called, with both the original request
and the reply as parameters.
For a resource to support filters, its class must be a subclass of the
FilteredResource class. Most resource classes provided with the
Jigsaw distribution are subclasses of it.
Back to authentication now. As we said above, authentication is handled
by a special filter, whose ingoingFilter method tries to authenticate the
request. If this succeeds, normal processing of the request continues: it is
performed by its target resource, and the corresponding reply is emitted back
to the browser. In the case of the authentication filter, as all the work is
done on the ingoing way (while the target resource is being looked up),
there is no need to have the outgoingFilter method called. A filter's
ingoingFilter method can return a special value, DontCallOutgoing,
to indicate that it has done all of its work; in such cases, the server
won't spend time invoking its empty outgoingFilter method. More
return codes are available; see the API
documentation for the ResourceFilter to get into the details.
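The two filter callbacks can be sketched as follows. Everything here is hypothetical: the SketchFilter interface, the Ingoing enum standing in for the real return codes, the string-based request and reply, and the "user=alice" credential check are all simplifications for illustration and do not reproduce the ResourceFilter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of ingoing/outgoing filter calls; not the real API.
class FilterSketch {
    /** Stand-in for the return codes of the ingoing call. */
    enum Ingoing { CALL_OUTGOING, DONT_CALL_OUTGOING }

    interface SketchFilter {
        Ingoing ingoingFilter(String request);
        String outgoingFilter(String request, String reply);
    }

    /** Does all its work on the ingoing way, like the authentication filter. */
    static final class AuthFilter implements SketchFilter {
        public Ingoing ingoingFilter(String request) {
            if (!request.contains("user=alice")) {   // made-up credential check
                throw new SecurityException("authentication required");
            }
            return Ingoing.DONT_CALL_OUTGOING;
        }

        public String outgoingFilter(String request, String reply) {
            return reply; // never called by process(), see the return code above
        }
    }

    /** A filter that needs the outgoing call to modify the reply. */
    static final class FooterFilter implements SketchFilter {
        public Ingoing ingoingFilter(String request) {
            return Ingoing.CALL_OUTGOING;
        }

        public String outgoingFilter(String request, String reply) {
            return reply + " [filtered]";
        }
    }

    static String process(String request, List<SketchFilter> filters) {
        List<SketchFilter> outgoing = new ArrayList<>();
        for (SketchFilter filter : filters) {        // ingoing way
            if (filter.ingoingFilter(request) == Ingoing.CALL_OUTGOING) {
                outgoing.add(filter);
            }
        }
        String reply = "reply to " + request;        // the target resource's reply
        for (SketchFilter filter : outgoing) {       // outgoing way, only if requested
            reply = filter.outgoingFilter(request, reply);
        }
        return reply;
    }

    public static void main(String[] args) {
        List<SketchFilter> filters = Arrays.asList(new AuthFilter(), new FooterFilter());
        System.out.println(process("GET /doc user=alice", filters));
    }
}
```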
Further reading
The best way to continue your Jigsaw tour now is to install
it, and to read the following tutorials:
-
The Jigsaw administration guide is
the reference guide for Jigsaw configuration.
-
The configuration tutorials
go through the configuration process, explaining the basics of
Jigsaw configuration.
-
Once you are familiar with the configuration process, you might want to
know how to define new frame classes. The frames
tutorial walks through a full example of this.
-
You might also want to know how to define a new resource class. The resources
tutorial walks through a full example of this.
-
Finally, you may want to learn more about resource filters, by reading
the filter tutorial, which
explains how to write new resource filters.