Xiangping Chen, Prasant Mohapatra, Huamin Chen
3115 Engineering Building
Department of Computer Science and Engineering
Michigan State University,
East Lansing, MI 48824
Email: {chenxia1, prasant, chenhuam}@cse.msu.edu
Copyright is held by the author/owner.
WWW10, May 2-5, 2001, Hong Kong
ACM 1-58113-348-0/01/0005.
Keywords: Admission Control, Bounded Response Time, Internet, PACERS, QoS, Service Differentiating Internet Servers
The response delay of an Internet service is determined by two factors: the quality (delay, delay jitter, loss rate, etc.) of network transmission, and the processing capacity of the server. Studies of quality of service (QoS) in network transmission have been active in recent years, including efforts to build the Integrated Services (IntServ) architecture [3] with end-to-end QoS guarantees, and the Differentiated Services (DiffServ) architecture [4] with alternative levels of service provisioning. However, network layer QoS is not sufficient to provide user perceivable performance if the server does not support any service differentiation. A premium class of flow with end-to-end QoS guarantees may still experience service rejection when the server is overloaded. With the increase in resource requirements of web based applications, value added services will emerge with competitive differentiation of service offerings based on profits instead of the best-effort service discipline.
Recent studies on QoS support in web servers have addressed the technology of prioritized processing in web servers [5,6,7,8]. We refer to this kind of server as a service differentiating Internet server (SDIS), which complements the network-level QoS efforts. The basic attributes of SDIS include classification of client requests into groups with different service requirements, resource allocation based on the task groups, and prioritized scheduling and task assignment schemes. A detailed study on the concept and performance evaluation of SDIS is reported in [9].
Prioritized scheduling in web servers has been proven effective in providing significantly better delay performance to high priority tasks at relatively low cost to lower priority tasks. However, it is still possible that a significant number of high priority tasks are dropped under extremely high load, and user acceptable response delay cannot be ensured. Effective admission control (AC) mechanisms are needed to assure the drop rate and the delay bounds of tasks.
Most contemporary web servers use a rather naive AC scheme, namely tail-dropping AC, in which incoming requests are dropped when the number of waiting tasks exceeds a predefined threshold. The tail-dropping AC scheme requires careful system capacity planning and works well only in steady workload situations. In a highly variable workload environment, more flexible AC schemes are needed to adapt to the dynamics of the traffic. For example, secure transactions on the Web, which are popular in E-commerce, consume much more CPU resources than regular transactions due to encryption/decryption and multiple handshaking overheads. The AC algorithm should be designed to consider the characteristics of a wide variety of requests.
In this study we propose a simple and effective AC algorithm, PACERS (Periodical Admission Control based on Estimation of Request rate and Service time), to provide bounds on response delay for incoming requests under highly variable workload environments. The PACERS algorithm dynamically allocates system resources to each priority group by estimating the request rate of tasks. Admission of a request is decided by comparing the available computation power for a duration equivalent to the predetermined delay bound with the estimated service time of the request. The service time estimation is done on the basis of the request type. A double-queue organization is used to handle inaccuracy in estimation while exploiting the spare capacity of the system. Theoretical analysis and experimental study show that the PACERS algorithm bounds the response delay for tasks in different priority groups, assures service availability to high priority tasks even under high load, and preserves system throughput under various workloads.
The rest of the paper is organized as follows. Section 2 describes how to estimate request rates and service times. Section 3 discusses the PACERS algorithm in detail and its implementation issues. Section 4 provides a theoretical proof of the bounded delay provided by the PACERS algorithm. Simulation results and performance evaluation of the algorithm are reported in Sections 5 and 6, respectively. Related work is discussed in Section 7. Section 8 concludes the study.
Admission control policies are used extensively to provide congestion control and to enforce desirable performance in computer and communication systems. The workload on web servers is different from that of network traffic, so existing network-level AC algorithms might not suit web server environments well. Apparent self-similarity and long-term dependency are prevalent in the WWW traffic pattern [10,11,12]. Processing of web requests is much more complicated than the handling of network packets. To explore a simple and effective admission control algorithm which fits the web server environment, we analyze the traces of a busy web server to track trends and characteristics of the resource requirements of the requests. The trace data was collected for a week from the web server of the Department of Computer Science and Engineering at Michigan State University, which encountered about a million requests during the period.
Figures 1 and 2 show the request intensity and traffic volume per hour over a week. Distinct seasonal/periodic behaviors in the time series can be observed from the figures. For example, the access intensity increases through the morning, reaches a high intensity region around 10AM, stays high till late afternoon, and then decreases slowly to the lowest point of the day, at about 3AM the next day. A similar traffic pattern repeats around the same time each day, which suggests that both short and long term history should be used as predictors of future workloads.
The fluctuation in traffic intensity can be several orders of magnitude, which causes high variances in access rate and bandwidth requirements, thus resulting in a high average waiting time of tasks in the system. Care should be taken to eliminate the effects of these variances. An interesting phenomenon is that the access pattern of the web server tends to be consistent within a short time window. For example, the coefficient of variation (CoV) of the access rate decreases to around 1 when it is measured on an hourly basis. The decrease in CoV is even more pronounced during busy hours. Similar observations have been reported in [13,14,15], which suggest that a multi-state Markov-Modulated Poisson Process (MMPP) can be used to approximate or predict the burstiness of the aggregate input of the web server; this is discussed in more detail in Section 3.3.
Note that during the week-long observation period, there is one overload point at which the request rate is about 10 times higher than the usual peak load of the day. We examined the logs and found that the abnormal overload was caused by a group of careless CGI based requests. Those requests directly caused the web server to fail to respond to any other request for 21 minutes. This is one example that justifies the need for appropriate AC to prevent system resources from being wasted on careless or malicious requests.
Table 1 lists the requested web object types and the corresponding traffic distribution of the traces. Web object types are categorized on the same basis as reported in [16]. It can be observed that requests for small and static web objects, i.e., HTML and image files, still dominate the incoming requests and web traffic. The data show that the CoV of web traffic is as low as 1 to 2 for each type of web object. However, the CoV of the aggregated web traffic can be as high as 14.41.
There is a non-negligible percentage of dynamic objects. Previous studies [17] have estimated that a dynamic object requires 10 to 100 times the service time of a typical static object. The processing time difference was confirmed by recent benchmark tests of popular commercial web servers [18] and by the service time characterization described below. The throughput when serving dynamic tasks is usually lower than when serving static tasks by a factor of 10. Continuous media (CM) objects (audio and video clips) are now used more extensively in web environments (compared to the previous results in [16]). These data types require high storage capacity (thus they are uncacheable), high density I/O processing, and sustained bandwidth with bounded delay constraints. The average CPU time for serving a CM object is also much higher than that for serving small static objects. The significant differences in size between different types of web objects, the high volume of CM objects, and computation-intensive dynamic requests suggest that the service time variance of the requested web objects should be considered in estimating the server capacity and response time.
Previous studies [19,20,21] suggested that the service time of retrieving static web objects such as HTML and image files can be divided into two parts: an almost fixed URL processing time, and a content transfer time which increases linearly with the file size. To collect the service time attributes, we set up an experimental environment consisting of one Apache (version 1.3.12) web server and three WebStone (version 2.5) clients, connected through 10 Mbps Ethernet. The web server hardware is based on a Pentium II 233 MHz CPU with 128 MB memory. The operating system used in the server was Linux 2.2.9. The clients were installed on Sun UltraSparc 10 machines with 128 MB memory, running Solaris 2.6. CPU cycles are used as the time scale to obtain a precise time resolution. The performance monitoring counters provided by the Intel P6 family processors are used to record the elapsed busy CPU cycles of the current process.
The service time is defined as the CPU cycles consumed between the time the server accepts a client request and the time the server finishes sending the response. Figure 3 depicts the mean service time of static objects versus requested file size under different values of the maximum number of processes (MNP) in the server. The MNP has a direct influence on the service time distribution, which is discussed in detail in the following paragraphs. CPU cycles have been converted into seconds in the figure.
It can be observed from the figure that the curves have two phases. If the files are smaller than 64 Kbytes, the service times for the same MNP are almost constant. If the file sizes exceed 64 Kbytes, the mean service times increase linearly with the file size and the MNP value. We call this phenomenon the 64 KB leap. The 64 KB leap is mainly due to the underlying operating system kernel. In the Linux socket implementation, a 64 KB send buffer limit is imposed for every transmission. Any send request of more than 64 KB is fragmented: the first 64 KB is processed while the remaining parts wait for buffer space. The kernel can thus avoid crashing due to imprudent code asking for more memory than the kernel can provide. This limit is tunable via the ioctl() interface. However, a larger send buffer will not necessarily improve overall performance, since smaller transmissions may spend more time in the queue waiting for larger requests to be transmitted. Some other factors also contribute to the 64 KB leap. For example, asynchronous disk I/O is widely used in current UNIX operating systems, and the default size for a read-ahead operation is 64 KB; at most 64 KB can be loaded from the hard disk in one I/O operation.
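As an illustration of the tunability of the per-socket send buffer, the following minimal sketch inspects and enlarges the buffer from user space. It is an assumption of the sketch that the socket-option interface (setsockopt with SO_SNDBUF, the interface commonly used today) rather than ioctl() is available; the kernel may silently cap or adjust the requested size.

    import socket

    # Minimal sketch: inspect and enlarge a socket's send buffer.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    default_sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    print("default send buffer: %d bytes" % default_sndbuf)

    # Request a 256 KB send buffer instead of the default.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)
    print("send buffer now: %d bytes" %
          sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

    sock.close()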
The slope of the service time curve increases with the maximum number of processes, because a larger number of processes causes a higher probability of page faults, more context switching, and greater synchronization overhead.
Based on the observations from Figure 3, the service time $T_s$ of a task can be estimated from the requested file size $s$ (in KB) and the MNP value $p$ as

$$T_s(s, p) = T_0 + \alpha \cdot \max(s - 64, 0) + \beta \cdot p \qquad (1)$$

where $T_0$ is the service time for small static web objects, i.e., requests for files smaller than 64 KB, $\alpha$ is the data transferring time factor, and $\beta$ is the context switching overhead factor. The relative values of $T_0$, $\alpha$, and $\beta$ are obtained by linear regression on the measured data.
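For concreteness, the estimator of Equation (1) can be coded directly. The coefficient values below are placeholders for illustration only (the regression constants from the original measurements are not reproduced here); only the functional form follows the equation.

    # Sketch of the service time estimator of Equation (1).
    # T0, ALPHA and BETA are illustrative placeholders, not the measured
    # regression constants from the paper.
    T0 = 0.005      # base service time for objects < 64 KB (seconds)
    ALPHA = 0.0001  # data transfer factor (seconds per KB above 64 KB)
    BETA = 0.0002   # context switching factor (seconds per server process)

    def estimate_service_time(file_size_kb, mnp):
        """Estimate the service time of a static request from its size and the MNP."""
        return T0 + ALPHA * max(file_size_kb - 64, 0) + BETA * mnp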
Rechecking the file size distributions, we find that less than 1% of HTML and image files are larger than 64 KB. Thus the service times of most HTML and image files are rather uniform. Service times of dynamic objects, mainly CGI requests, depend on the computational complexity of the URL processing rather than on the response size. Using the CGI scripts provided by the WebStone 2.5 test set, the average service time of a CGI request is around one order of magnitude higher than the service times of static objects with a file size less than 64 KB. The experimental results indicate that the object type, which can be derived from the requested URL, is a good indicator of the requested CPU time. Besides, classification of object types introduces less overhead than retrieving file size information, which requires one fstat() system call in UNIX operating systems.
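A request classifier of this kind can be as simple as a suffix match on the URL. The sketch below shows one possible mapping; the extension lists and CQ weights are assumptions made for illustration, not the paper's exact classification or measured costs.

    # Illustrative URL-based object type classifier; extension lists and
    # CQ weights are assumptions for the sketch, not measured values.
    CQ_WEIGHTS = {"static": 1, "dynamic": 10, "audio": 20, "video": 50}

    def classify(url):
        """Map a request URL to an object type without touching the file system."""
        path = url.split("?")[0].lower()
        if path.endswith((".cgi", ".pl", ".php")) or "/cgi-bin/" in path:
            return "dynamic"
        if path.endswith((".wav", ".mp3", ".au")):
            return "audio"
        if path.endswith((".mpg", ".mpeg", ".avi", ".mov")):
            return "video"
        return "static"   # HTML, images and other small objects

    def request_cost(url):
        """Weighted computation quantums (CQ) charged to the request."""
        return CQ_WEIGHTS[classify(url)]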
This section presents how the web server workload characteristics impact the design of our AC algorithm. The goal of the PACERS algorithm is to provide response delay bounds for incoming requests while preserving the system throughput of busy web servers.
Usually a queue of incoming requests awaiting processing is maintained in a web server. Under the tail-dropping AC scheme, incoming requests are put in the queue until the queue is full. Queue length is not always a good indicator of system load, especially when the variance of processing time is high. Without effective AC, the server response time and throughput deteriorate drastically when the aggregate request rate exceeds the server capacity, indiscriminately affecting all clients. Abdelzaher and Bhatti [19] reported that as much as half of a web system's processing capacity can be wasted on eventually aborted or rejected requests when the load is high.
To accommodate the high variance in request rate and service time of web servers, we propose a simple and adaptive AC algorithm, Periodic AC based on Estimation of Request rate and Service time (PACERS), to ensure the availability and delay performance of prioritized tasks. The predictive strategy estimates the periodic request rate of each priority group, and guarantees the performance of higher priority tasks by restricting admission of lower priority tasks. The request rate estimation is based on the history of the access pattern. The service time of each task is estimated based on the request type, which is more or less delineated by the size of the response. Inaccurate estimations are dynamically adjusted by a double-queue architecture as described later.
We first examine a non-prioritized system to develop a systematic understanding of the proposed algorithm. Assume that time is divided into discrete slots $t_0, t_1, \ldots, t_n, \ldots$, each of length $T$. For simplicity, we use the beginning point of each time period, $t_n$, to represent the duration $[t_n, t_{n+1})$.

Let $C$ be the unit processing capacity of the system, $P(t_n)$ be the predicted system processing capacity in period $t_n$, $s_i(t_n)$ be the service time needed by the $i$th task in period $t_n$, and $N(t_n)$ be the number of admitted tasks in period $t_n$. If the server has precise knowledge of the service time needed by each task, the admission decision can be made based on the following expression:

$$\sum_{i=1}^{N(t_n)} s_i(t_n) \le P(t_n) \qquad (2)$$

If expression (2) is true, the request is admitted; otherwise it is rejected. The maximum response delay of each task is bounded by $2T$ if the service time of each task is known prior to the admission decision phase.
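A per-period admission check following expression (2) can be sketched as below. The class and variable names are assumptions of the sketch, and estimated (rather than exactly known) service times are used in place of the ideal values assumed by expression (2).

    class PeriodicAdmission:
        """Minimal sketch of the per-period capacity check of expression (2)."""

        def __init__(self, period_capacity):
            self.period_capacity = period_capacity  # P(t_n), in service-time units
            self.committed = 0.0                    # sum of admitted service times

        def new_period(self, period_capacity):
            """Reset the budget at the start of each period t_n."""
            self.period_capacity = period_capacity
            self.committed = 0.0

        def admit(self, estimated_service_time):
            """Admit the request only if the period budget is not exceeded."""
            if self.committed + estimated_service_time <= self.period_capacity:
                self.committed += estimated_service_time
                return True
            return False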
In a prioritized system, the periodic system capacity seen by different priority groups changes with the prediction of the resource requirements of each priority group. By adjusting the system capacity assigned to each priority group, the PACERS algorithm provides service quality assurance to prioritized tasks. Two kinds of service disciplines can be provided by the PACERS algorithm: prioritized resource allocation (PRA) and weighted fair allocation (WFA). PRA is implemented by assigning resources equal to the whole system capacity to the highest priority tasks (or premium tasks); lower priority tasks get the remaining resources. WFA is realized by setting shares of system resources for each priority group, where each priority group gets at most (or at least) its share. In this study we only discuss the PRA control scheme. The WFA scheme can be easily extended from the PRA scheme by setting a minimum or maximum resource ratio for each priority group.
The objective of the server is to ensure QoS to high priority tasks whenever their arrival rate is lower than the system capacity. Thus, for high priority tasks, the available resources are equal to the system period capacity. For lower priority tasks, the available resources are the system capacity minus the predicted resource requirements of higher priority requests during a time period. Since the priority ratio and distribution of the types of incoming tasks vary over time, dynamic assignment of system capacity is needed to preserve the system throughput.
Assume all requests are classified and assigned a priority $p \in \{1, 2, \ldots, P\}$, wherein $p = 1$ denotes the highest priority. Let $\lambda_p(t_n)$ be the predicted arrival rate (number of arrivals per period $T$) of priority-$p$ tasks in period $t_n$, and $\bar{s}_p(t_n)$ be their mean service time. The system capacity available to priority group $p$ at time $t_n$, denoted $C_p(t_n)$, is:

$$C_p(t_n) = P(t_n) - \sum_{q=1}^{p-1} \lambda_q(t_n)\, \bar{s}_q(t_n) \qquad (3)$$

A task with priority $p$ is admitted if its service time is less than or equal to the available capacity $C_p(t_n)$.
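Under the PRA discipline, the per-priority budget of Equation (3) is simply the period capacity minus the predicted load of all higher-priority groups. The helper below extends the earlier sketch; the function and parameter names are assumptions made for illustration.

    def available_capacity(period_capacity, predicted_arrivals, mean_service_time, priority):
        """Capacity left for priority group `priority` (1 = highest), per Equation (3).

        predicted_arrivals[q] and mean_service_time[q] describe priority group q+1
        for the coming period; both lists are indexed from 0.
        """
        reserved = sum(predicted_arrivals[q] * mean_service_time[q]
                       for q in range(priority - 1))
        return max(period_capacity - reserved, 0.0)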
The resources allocated to each priority group are based on the prediction of the request rate of incoming prioritized tasks.
Apparent seasonal workload patterns corresponding to daily cycles
discussed in the previous section can be used to predict current
traffic intensity based on the workload history.
On the other hand, reports in [14,15] suggested
that the aggregate web traffic tends to smooth out as Poisson traffic
in short observation time windows. This fact
was further proved by Morris and Lin in [13].
Based on the above published results, we decided to use the Markov-Modulated Poisson Process (MMPP) described in [22] to capture the seasonal behavior of the aggregate workload of web servers, while preserving the tractability of a modulated Poisson process.
The request arrival process is described as a Markov process with state space $\{1, 2, \ldots, m\}$. State $i$ has arrivals following a Poisson process with rate $\lambda_i$. To follow the seasonal behavior, especially the day/night cyclic behavior of the web server load, the observed traffic data is divided into subsets for each hour on a daily basis. The Markov transition matrix $Q = [q_{ij}]$ for each hour can be easily calculated by quantizing the inter-arrival rate in each observation period, and counting the frequency at which the process switches from state $i$ to state $j$. The predicted access rate $\hat{\lambda}(t_{n+1})$ can be expressed by the following equation:

$$\hat{\lambda}(t_{n+1}) = \pi(t_n)\, Q\, \lambda^{\mathsf T} \qquad (4)$$

where $\pi(t_n)$ is the state vector of the measured inter-arrival rate in the previous period and $\lambda = (\lambda_1, \ldots, \lambda_m)$ is the vector of state rates. $Q$ can be further adjusted by comparing the differences between predicted and measured data, to catch up with the long term trends of the web server load. In the experiment, we use three-state ($m = 3$) transition matrices to capture the traffic trends in each observation period: the three states correspond to an increased request rate, an unchanged request rate, and a decreased request rate, where the increment and decrement are 10% of the measured value. The experiment shows that there is not much difference in the capability of capturing the traffic trends between a three-state transition matrix and more complicated state transition matrices.
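A minimal sketch of this hour-by-hour, three-state predictor follows. How the measured rates are binned into up/same/down transitions, and all names, are assumptions of the sketch; only the prediction step mirrors Equation (4).

    # Sketch of the three-state rate predictor (Equation (4)), assuming the
    # hourly transition matrix has already been estimated from the trace.
    # States: 0 = rate decreases by 10%, 1 = rate unchanged, 2 = rate increases by 10%.
    STEP = 0.10

    def build_transition_matrix(state_sequence, num_states=3):
        """Estimate Q by counting observed state-to-state switches."""
        counts = [[0] * num_states for _ in range(num_states)]
        for cur, nxt in zip(state_sequence, state_sequence[1:]):
            counts[cur][nxt] += 1
        matrix = []
        for row in counts:
            total = sum(row) or 1
            matrix.append([c / total for c in row])
        return matrix

    def predict_rate(measured_rate, current_state, transition_matrix):
        """Expected request rate for the next period, weighted over next states."""
        factors = [1.0 - STEP, 1.0, 1.0 + STEP]
        probs = transition_matrix[current_state]
        return measured_rate * sum(p * f for p, f in zip(probs, factors))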
When deciding whether to accept a request, the system should ensure that the sum of the service times of accepted tasks does not exceed the system capacity. In reality, it is not possible to know the service time in advance, and high variance in resource requirements is a widely recognized characteristic of web server workloads. As indicated in the previous section, however, the service time and bandwidth requirement of the same type of request are more or less consistent. The service time of a web object can thus be estimated based on the distribution of the types of requests to the server. We approximate the service time of each task by using weighted computation quantums (CQ) matching the CPU processing time of the different types of tasks.
When a new request arrives, the system checks if
there are enough CQ available to grant to the request.
Only requests that are granted enough CQ can be enqueued and served eventually.
The number of CQ issued to each task is determined
by the resource requirement of the request.
Here we denote by $M$ the number of types of requests to a web server, which can be derived from the workload characterization of the server. Let $n_j(t_n)$ be the number of requests of type $j$ in period $t_n$, and $w_j$ be the weighted CQ matching the CPU processing time of type-$j$ tasks. Then Equations (2) and (3) can be approximated as:

$$\sum_{j=1}^{M} n_j(t_n)\, w_j \le P(t_n) \qquad (5)$$

$$C_p(t_n) = P(t_n) - \sum_{q=1}^{p-1} \sum_{j=1}^{M} n_{q,j}(t_n)\, w_j \qquad (6)$$

where $n_{q,j}(t_n)$ is the predicted number of type-$j$ requests of priority $q$ in period $t_n$. As in the case of Equations (2) and (3), a request is admitted if Equations (5) and (6) hold true; otherwise it is rejected.
Since Equations (5) and (6) are approximations of Equations (2) and (3), care needs to be taken to amortize the accumulated delay caused by over-admission during a time period. Assume that restricted prioritized processing is enforced inside a queue, i.e., no lower priority task gets served if a higher priority task is waiting; incoming requests are queued in order from high priority to low priority, and requests of the same priority are queued in FCFS order. When over-admission happens, it is possible that low priority tasks stay in the queue for a long time awaiting service while high priority tasks get dropped due to lack of queue space. On the other hand, under-admission wastes system capacity. A double-queue structure is used to handle the over/under admission problems: a primary queue is used as the incoming task queue, and a secondary queue is added for the backed up requests (we call this the backup queue). The system structure is shown in Figure 4.
An incoming request is first sent to the AC manager (ACM). The ACM classifies the request priority and decides whether the request can be enqueued. Enqueued requests wait to be served in the primary queue Qp. At the beginning of the next period, unfinished requests are sent to the backup queue Qb. When Qb becomes full, it is cleared and the queued tasks are dropped; other methods for expunging tasks from Qb can be explored. The task scheduler (TS) picks up requests from the queues and sends them to the server pool. No request in Qb is picked up unless Qp is empty. Replies are sent back through the server network interface Nb. By using the double-queue structure, newly accepted requests need not wait long for service, and thus bounded delay is achieved for most requests.
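The queue handling described above can be summarized in a few lines. The class and method names are assumptions of the sketch, and the scheduling loop is simplified to a single-threaded picker.

    from collections import deque

    class DoubleQueue:
        """Sketch of the Qp/Qb organization: Qb is served only when Qp is empty."""

        def __init__(self, max_backup):
            self.primary = deque()            # Qp: requests admitted this period
            self.backup = deque()             # Qb: unfinished work from earlier periods
            self.max_backup = max_backup

        def end_of_period(self):
            """Move unfinished primary-queue requests into the backup queue."""
            while self.primary:
                if len(self.backup) >= self.max_backup:
                    self.backup.clear()       # Qb full: drop the backed-up tasks
                self.backup.append(self.primary.popleft())

        def next_request(self):
            """Strict preference for the primary queue."""
            if self.primary:
                return self.primary.popleft()
            if self.backup:
                return self.backup.popleft()
            return None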
The PACERS algorithm provides delay bounds and average waiting time assurance as well as throughput regulation. In this section, we analyze the delay bounds and the waiting time for the requests.
When the short term arrival rate is faster than the processing rate, newly arrived requests have to wait in the queue for service. If the variations of job arrival rate and processing rate are high, more queue space is needed, and the average queueing time and response delay tend to be high. If enough buffer space is not available, denial of service occurs when the queue is full.
One way to increase the response time predictability of a system is to limit the requests admitted in a given ``short time'' period to no more than what the system can handle in that period of time. Borowsky [23] provided a theoretical proof that the service time required by pending jobs is no more than $T$ in a FCFS work-conserving system if the sum of the service times required by the jobs arriving in any interval of length $T$ is bounded by $T$. Thus the response time (or delay) of all requests can be bounded by $2T$ by restricting the workload arriving in every interval of length $T$.
In a prioritized environment where each request is assigned a specific priority, the processing order and resource allocation are based on the priority of each request, so the FCFS processing assumption in the above result no longer holds. In [7], we proved that the mean response time of a request is bounded in terms of the arrival rate of tasks with equal or higher priority and the service rate of the system, provided that requests with different priorities have the same service requirement distribution and strict priority scheduling is used.
Let $p$ denote the priority of a request and $P$ be the total number of priorities, with $p = 1$ the highest priority. Let $s_{p,i}(t_n)$ be the service time requirement of the $i$th task of priority $p$, and $N_p(t_n)$ be the number of requests of priority $p$ arriving during period $t_n$.

Lemma: In a prioritized preemptive work-conserving environment with no pending work, if all arriving requests belong to a single priority level $p$, then their response time is bounded by $2T$ if $\sum_{i=1}^{N_p(t_n)} s_{p,i}(t_n) \le CT$ in every period (proved in [23]).

Theorem: If tasks are served in increasing order of $p$ (highest priority first) and $\sum_{q=1}^{p} \sum_{i=1}^{N_q(t_n)} s_{q,i}(t_n) \le CT$ in every period, then the response time of a task with priority $p$ is bounded by $2T$. If the condition holds for $p = P$, the response time of any task is bounded by $2T$.
Proof: Let $r_{p,i}(t_n)$ be the response time of the $i$th task of priority $p$ admitted during period $t_n$, and let $W(t_n)$ be the amount of admitted but unfinished work carried into period $t_n$. Partition the request time series into periods of length $T$.

For $t_0 = 0$, without loss of generality, let all requests arriving during $(0, T)$ be reshuffled and served in priority order (highest priority first). Since the admitted work satisfies $\sum_{q=1}^{P} \sum_i s_{q,i}(t_0) \le CT$, all work admitted in $(0, T)$ completes within the following period, so $W(t_1) \le CT$ and $r_{p,i}(t_0) \le 2T$.

Assume the above expressions hold in period $t_{n-1}$, and reshuffle the requests in the queue in priority order during period $t_n$. Let $S_p(t_n) = \sum_{i=1}^{N_p(t_n)} s_{p,i}(t_n)$ be the sum of the service times of the $p$th priority tasks admitted in period $t_n$; then

$$W(t_{n+1}) \le \max\Bigl(0,\; W(t_n) + \sum_{p=1}^{P} S_p(t_n) - CT\Bigr). \qquad (7)$$

Clearly, $W(t_n) \le CT$ by the induction hypothesis. By the admission condition, $\sum_{p=1}^{P} S_p(t_n) \le CT$, so

$$W(t_{n+1}) \le CT. \qquad (8)$$

Hence, by induction and from Equation (8), we get $r_{p,i}(t_n) \le 2T$ for all $n$, i.e., the response delay bound of each prioritized task can be achieved by controlling the requested service time in each period of length $T$. As long as the sum of the service times requested by high priority tasks does not exceed $CT$, the system response time can be assured by restricting the number of admitted low priority tasks.
The granularity of $T$ affects the system performance. Increasing the value of $T$ smoothes out the access variance between adjacent periods and allows more requests to be admitted. However, a larger value of $T$ also increases the variance of the response time and degrades the system responsiveness. Both user perceivable delay and system throughput should be considered in determining an appropriate value of $T$.
The service time of
a web object can be estimated from statistics of the same or similar
type of web object with acceptable variance.
The number of requests allowed by the system can be derived from
the expected waiting time and the job arrival rate.
Let $W_p(t_n)$ denote the expected waiting time of tasks with priority $p$ at time $t_n$, and $\lambda_p(t_n)$ be the expected arrival rate of tasks with priority $p$ at time $t_n$. Let $W(t_n)$ denote the expected waiting time of a task in the system at time $t_n$, and $\lambda(t_n)$ be the expected arrival rate of tasks in the system at time $t_n$. In period $t_n$, the expected number of tasks in the system equals:

$$N(t_n) = \lambda(t_n)\, W(t_n) = \sum_{p=1}^{P} \lambda_p(t_n)\, W_p(t_n). \qquad (9)$$
According to Little's Law [24], the waiting time of each priority group equals:

$$W_p(t_n) = \frac{N_p(t_n)}{\lambda_p(t_n)}. \qquad (10)$$

On the other hand, the waiting time of a task in the $p$th priority group equals the residual life of the executing task, plus the sum of the service times of equal or higher priority tasks in the queue, plus the sum of the service times of higher priority tasks arriving while it waits in the queue. The mean service time of group $q$ in period $t_n$ is represented as $\bar{s}_q(t_n)$. The waiting time is thus equal to:

$$W_p(t_n) = R + \sum_{q=1}^{p} N_q(t_n)\, \bar{s}_q(t_n) + \sum_{q=1}^{p-1} \lambda_q(t_n)\, W_p(t_n)\, \bar{s}_q(t_n). \qquad (11)$$

Let $\rho_p(t_n) = \sum_{q=1}^{p} \lambda_q(t_n)\, \bar{s}_q(t_n)$. By combining the results of Equations (10) and (11) we get:

$$W_p(t_n) = \frac{R}{\bigl(1 - \rho_{p-1}(t_n)\bigr)\bigl(1 - \rho_p(t_n)\bigr)}, \qquad R \approx \frac{1}{C}. \qquad (12)$$

$C$ is the unit processing capacity of the system; its inverse is the worst case expected residual life of an executing task, which occurs when the utilization factor approaches 1. Equation (12) gives the mean waiting time of prioritized tasks in terms of the estimated arrival rates of equal or higher priority tasks.
Similarly, we can derive the acceptable task arrival rate and number of tasks in each period from a target expected waiting time. Inverting Equation (12) yields:

$$\lambda_p(t_n) \le \frac{1}{\bar{s}_p(t_n)} \left( 1 - \rho_{p-1}(t_n) - \frac{1}{C\, W_p(t_n)\bigl(1 - \rho_{p-1}(t_n)\bigr)} \right), \qquad N_p(t_n) \le \lambda_p(t_n)\, T. \qquad (13)$$
Note that the above analysis is based on the assumption of periodic data collection. In fact, all the data used in the above equations can be easily obtained from a few counters which are reset periodically. The workload of each priority level is estimated at the end of each period, and the expected number of tasks is determined from the expected inter-arrival rate. During one period, if the number of incoming tasks of one priority level exceeds the expected number, then no new requests of the same priority level are accepted until the beginning of the next period.
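The bookkeeping this implies is small: one counter and one quota per priority level, reset at each period boundary. A possible shape is sketched below; the names and the per-task quota granularity are assumptions of the sketch.

    class PeriodCounters:
        """Per-priority quota enforcement with periodically reset counters."""

        def __init__(self, num_priorities):
            self.quota = [0] * num_priorities     # expected admissible tasks per group
            self.admitted = [0] * num_priorities  # tasks admitted so far this period

        def new_period(self, predicted_quota):
            """Called at each period boundary with the freshly predicted quotas."""
            self.quota = list(predicted_quota)
            self.admitted = [0] * len(self.quota)

        def try_admit(self, priority):
            """Admit a task of the given priority level if its quota is not exhausted."""
            if self.admitted[priority] < self.quota[priority]:
                self.admitted[priority] += 1
                return True
            return False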
In this study, we simulate an event driven server model using empirical workload from real trace files. The reference locality, size and object type distribution are extracted from the logs of the Computer Science departmental web server at Michigan State University and are used for the workload generation. We only consider two priority levels to simplify the study since the primary concern is to examine the effectiveness of the PACERS algorithm. The system configuration is shown in Table 2.
Parameter | value |
Priority Level | 2 |
Scheduling Period | 1 sec. |
Static Obj. Throughput | 1000 req./sec |
Dynamic Obj. Throughput | 100 req./sec |
Network Bandwidth | 100 Mbps |
Disk Bandwidth | 100 Mbps |
Caching hit ratio | 0.7 |
Maximum open connections | 1000 |
Maximum queue length | 1000 |
Maximum server process number | 30 |
Response delay bound | 1 sec. |
Three performance metrics are used in the simulation: server throughput, mean response time, and the response delay bound miss rate of each priority group. System throughput indicates the server capability. Mean response time and delay bound miss rate quantify the service quality in each priority group; the delay bound miss rate is the proportion of tasks whose response time exceeds the bounded delay.
Based on the data reported in Section 2, web objects are classified into 4 types: static, dynamic, audio and video objects. The CQs consumed by each object type and their input percentages are listed in Table 3. The CQs allocated to each object type are based on the results in Section 2.3.
Our experiments compare the performance of the PACERS algorithm to that of a simple admission control (SAC) algorithm, which is analogous to the leaky bucket algorithm [25] used in traffic engineering for network transmission. The SAC algorithm produces CQ at a constant rate matching the server throughput, and each admitted task consumes one CQ. Incoming tasks are dropped if there is no CQ available; estimation of the request rates is not implemented in the SAC algorithm. The SAC scheme outperforms the tail-drop scheme, since it smoothes out the web server traffic and provides preliminary overload protection.
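For reference, the SAC baseline behaves like a token (leaky) bucket whose refill rate matches the server throughput. A minimal sketch follows; the bucket depth and all names are assumptions made for illustration.

    class SimpleAdmissionControl:
        """SAC baseline: CQ generated at a fixed rate, one CQ per admitted task."""

        def __init__(self, rate_per_sec, bucket_depth):
            self.rate = rate_per_sec      # CQ produced per second (server throughput)
            self.depth = bucket_depth     # maximum CQ that may accumulate (assumed)
            self.tokens = bucket_depth
            self.last_time = 0.0

        def admit(self, now):
            """Refill tokens for the elapsed time, then spend one token if available."""
            self.tokens = min(self.depth,
                              self.tokens + (now - self.last_time) * self.rate)
            self.last_time = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False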
In the experiment, the ratio of high priority to low priority tasks is 1 to 1, and both types of tasks are randomly distributed throughout the arrival sequence. The web server trace is used to generate requests at various rates, and the performance is recorded from the simulated server environment.
Figures 5 and 6 plot the throughput of the stress test for the two priority groups. The normalized load on the x-axis is the ratio of the real load to the system capacity, and the normalized throughput is the ratio of the throughput to the system capacity. In the stress test, the aggregate request rate increases continuously to 2.5 times the system capacity. The objective of this experiment is to explore the capability of the algorithm in preserving the system throughput and delay performance under extremely high load.
Figure 5 shows the throughput variation using the SAC algorithm. It can be observed that the throughput of each priority group is proportional to the traffic percentage of that group. High priority tasks suffer from the same low throughput as low priority tasks when the aggregate workload is high. The predefined QoS profile, i.e., strictly prioritized admission and scheduling, is not ensured.
Figure 6 shows the throughput of each priority group when the PACERS scheme is deployed. The system stops serving low priority tasks when the high priority load approaches about 80% of the system capacity. On the other hand, the throughput of high priority tasks equals the request rate of the high priority group until the system utilization reaches 0.8, which is the highest throughput of the system using the traces as input workload. The throughput of high priority tasks remains at the 80% level when the high priority tasks overload the system. It can also be observed that the aggregate throughput of the two AC schemes remains at the same level, which confirms that the PACERS scheme does not degrade the aggregate system throughput.
The average delay of each priority group is also well controlled under the PACERS scheme. As shown in Figures 7 and 8, the average delay of each priority group is much lower than the predefined delay bound of 1 second.
Figure 7 shows the delay and throughput variation of low priority tasks under the two AC algorithms. Using the SAC algorithm, the throughput of low priority tasks is proportional to its traffic ratio; however, extremely long delays are introduced when prioritized processing is used. In contrast, the PACERS algorithm blocks admission of low priority tasks during high load periods in exchange for a low delay bound miss ratio. The prediction and CQ assignment behavior of the PACERS scheme avoids unnecessary waiting of low priority tasks, thus decreasing the system overload during high load periods.
Figure 8 shows the delay and throughput performance of high priority tasks under the two AC algorithms. Using the SAC algorithm, high priority task throughput is proportional to the ratio of high priority traffic to the total traffic volume, even though the high priority traffic is low. Delays of high priority tasks are kept low because prioritized processing is used. In contrast, the PACERS algorithm preserves the throughput of high priority tasks during high load periods at the cost of increased average response delay, which still remains within the delay bound.
We also test the performance of the PACERS algorithm under fluctuating load. The results show that the PACERS algorithm is very sensitive in detecting variations of system load, thus preserving throughput and delay performance of the different priority groups under varying loads. Figures 9, 10 and 11 plot the throughput and delay variation under fluctuating workload. The experiment is aimed at examining the sensitivity of the PACERS algorithm to variations in the workload. The workload can be classified as sustained light load or overload, and occasional light load or overload. The occasional overload duration is 2 time units, and the sustained overload duration is 10 time units. Each time unit is 1000 times the observation period.
The maximum resource requirements of high priority requests equal the system capacity. It can be observed from Figure 9 that the PACERS scheme is very sensitive to load fluctuation: it blocks the low priority tasks under high load to preserve the throughput of high priority tasks, and resumes service of low priority tasks during medium to low load. Figure 10 shows the delay variation of the two priority groups. The delay of high priority tasks follows the trend of the incoming high priority workload, but the maximum value is only about one fifth of the delay bound. The delay of low priority tasks is also kept well under the delay bound. Figure 11 shows the delay bound miss ratio of the two priority groups. High priority tasks experience zero delay bound miss ratio under all load situations, while the delay bound miss ratio of low priority tasks occasionally exceeds zero.
When a server is unable to provide satisfactory service to all requests, selective resource allocation is a promising technique to assure service to the requests which are more important to clients or servers. In this study, we presented a novel and effective admission control algorithm, PACERS, which adapts to the highly variable access patterns and processing requirements of web server environments. Tasks are admitted based on the estimation of the periodic behavior of prioritized task groups and of service times. Theoretical proof and simulation results demonstrate that the PACERS algorithm is able to provide assurance of response time and throughput performance for each priority group under various workload situations. The algorithm can be extended to web server session control if session/connection establishment requests are assigned lower priority to protect existing sessions in the server. Estimation of the average response time of the server system can be used to determine the appropriate queue space, and thus can also be used for capacity planning.