Next Previous Contents

9. Analyzing Squid's access.log

9.1 Cache Result

The cache result code appears in the fourth column of Squid's native access.log format. It indicates how the request was handled by Squid.

The following codes, beginning with TCP_, are for requests to the HTTP port (3128).

TCP_HIT

A valid copy of the requested object is in the cache. Squid does not forward the request.

TCP_MISS

The requested object is not in the cache.

TCP_REFRESH_HIT

The object is in the cache, but stale. Squid has forwarded an If-Modified-Since request and received a Not Modified response.

TCP_REF_FAIL_HIT

The object is in the cache, but stale. Squid has forwarded an If-Modified-Since request but it failed (e.g. connection timeout), so Squid sends the old (stale) object to the client.

TCP_REFRESH_MISS

The object is in the cache, but stale. Squid has forwarded an If-Modified-Since request and received a response with the new content.

TCP_CLIENT_REFRESH

The client issued a request with the no-cache pragma, so Squid forwards the request.

TCP_IMS_HIT

The client issued an If-Modified-Since request and the object is in the cache, and still fresh. Squid does not forward the request.

TCP_IMS_MISS

The client issued an If-Modified-Since request for a stale object. Squid forwards the request as a miss.

TCP_SWAPFAIL

The object is believed to be in the cache, but could not be accessed, so Squid forwards the request.

TCP_DENIED

Access is denied for this request.

The following codes, beginning with UDP_, are for requests to the ICP port (3130).

UDP_HIT

A fresh copy of the requested object is in the cache.

UDP_HIT_OBJ

Same as UDP_HIT, but the object data is small enough to also be sent in the UDP reply packet.

UDP_MISS

The requested object is either stale or not in the cache at all.

UDP_DENIED

Access is denied for this request.

UDP_INVALID

An invalid request is received.

UDP_MISS_NOFETCH

The replying cache recommends that this request NOT be made at this time. Squid uses this reply code when reloading metadata at startup, and when the failure ratio exceeds its threshold.

There are also numerous error codes, which begin with ERR_, for requests to the HTTP port. We will not discuss these codes here.

9.2 HTTP Status

Every HTTP response includes a status code on the first line. This three-digit numeric code also appears in the fourth column of Squid's native access.log. Section 6.1.1 of the HTTP/1.1 RFC includes a full list of these codes. We are primarily interested in only two of them.

200 OK

This, the most common status code, indicates a successful request.

304 Not Modified

This means the request included an If-Modified-Since header with a timestamp, and the resource has not been modified since the given time.

9.3 Hierarchy Code

The ninth column of Squid's native access.log is a code that indicates how the next-hop cache was selected.

DIRECT

Squid forwards the request directly to the origin server.

FIREWALL_IP_DIRECT

Squid forwards the request directly to the origin server because the origin server's IP address is inside your firewall.

FIRST_PARENT_MISS

Squid forwards the request to the parent cache with the fastest weighted round trip time.

FIRST_UP_PARENT

Squid forwards the request to the first available parent in your list.

LOCAL_IP_DIRECT

Squid forwards the request directly to the origin server because the origin server's IP address matched your local_ip list.

SIBLING_HIT

Squid forwards the request to a sibling cache that sent us a UDP_HIT reply.

NO_DIRECT_FAIL

Squid cannot forward the request because of firewall restrictions and no parent caches are available.

PARENT_HIT

Squid forwards the request to a parent cache that sent us a UDP_HIT reply.

SINGLE_PARENT

The request is forwarded to the only parent cache appropriate for this request. Also requires that single_parent_bypass be enabled.

SOURCE_FASTEST

Squid forwards the request to the origin server because its source_ping reply arrived first.

PARENT_UDP_HIT_OBJ

Squid received the object in a UDP_HIT_OBJ reply from a parent cache.

SIBLING_UDP_HIT_OBJ

Squid received the object in a UDP_HIT_OBJ reply from a sibling cache.

DEFAULT_PARENT

Squid forwarded the request to a default parent, without sending any ICP queries first.

ROUNDROBIN_PARENT

Squid forwarded the request to the round-robin parent with the lowest use count, without sending any ICP queries first.

CLOSEST_PARENT_MISS

Squid forwarded the request to the the parent whose ICP_MISS reply has the lowest measured RTT to the origin server. This only appears with query_icmp enabled in the configuration file.

CLOSEST_DIRECT

Squid forwarded the request directly to the origin server because Squid measured a lower RTT to the origin than any of its parent caches.

NONE

Squid does not forward the request at all.

Note, when the two second timeout occurs waiting for ICP replies, the word TIMEOUT_ is prepended to the hierarchy code.

9.4 Hierarchy Categorization

We find it useful to categorize request forwarding through the hierarchy in one of five ways. For the NLANR caches, we examine the access.log fields for some of the codes mentioned in the previous sections. This analysis is somewhat specific to our cache configuration. For example, we do not use the default option, and therefore do not need to look for DEFAULT_PARENT requests. Much of our information comes from utilizing ICP.

  1. Local cache hits are requests satisfied from the local cache, which in our analysis includes anything with `HIT' in the cache result code, even TCP_REFRESH_HIT and TCP_IMS_HIT.
  2. Remote cache hits are requests forwarded to a neighbour cache, and which are cache hits there. The only way we can know this statistic is through ICP usage, by counting the number of requests forwarded due to an ICP_HIT reply.
  3. Remote cache misses are requests forwarded to a neighbour cache, and which are cache misses there. Again, we know this statistic through ICP, by counting the number of requests forwarded due to an ICP_MISS reply.
  4. Direct retrievals are requests forwarded directly to origin servers. It is likely that such requests are either 1) non-hierarchical, 2) because no ICP replies were received, or 3) forwarded to an origin server that is closer to the local cache than any of the cache's neighbors are.
  5. Other requests do not fall into the above categories, such as requests denied by the access controls.
We are also interested in the number of requests suffering the two second neighbour_timeout.

We use the following simple Perl script to parse Squid's native access.log and output the percentage of each category:

        #!/usr/local/bin/perl
        while (<>) {
                chop;
                @F = split;
                $L = $F[3];                     # local cache result code
                $H = $F[8];                     # hierarchy code
                next unless ($L =~ /TCP_/);     # skip UDP and errors
                $N++;
                $timeout++ if ($H =~ /TIMEOUT/);
                if ($L =~ /HIT/) {
                        $local_hit++;
                } elsif ($H =~ /HIT/) {
                        $remote_hit++;
                } elsif ($H =~ /MISS/) {
                        $remote_miss++;
                } elsif ($H =~ /DIRECT/) {
                        $direct++;
                } else {
                        $other++;
                }
        }
        printf "TCP-REQUESTS %d\n", $N;
        printf "TIMEOUT %f\n", 100*$timeout/$N;
        printf "LOCAL-HIT %f\n", 100*$local_hit/$N;
        printf "REMOTE-HIT %f\n", 100*$remote_hit/$N;
        printf "REMOTE-MISS %f\n", 100*$remote_miss/$N;
        printf "DIRECT %f\n", 100*$direct/$N;
        printf "OTHER %f\n", 100*$other/$N;
This script may or may not be appropriate to your particular situation, but should help you get started developing one of your own.

9.5 The NLANR Caches

Next we present a case study analysis of the NLANR cache hierarchy. Presently there are six cache servers in our top-level configuration, distributed across the U.S., with two on the Eastern coast, two in the Central region, and two on the West coast. Five of these caches communicate with each other via the vBNS, an unsaturated nationwide OC3/OC12 network with roughly 40 millisecond round-trip times between sites. The other cache, `SV', is located at FIX-West, a popular exchange point in Silicon Valley, California.

The five vBNS caches use multicast to send ICP queries to one another for almost every request. The `SV' cache does not send ICP to the other five because of the congested, commodity Internet paths between them. However, `SV' and all of the other caches do have parent relationships with caches located in other countries.

The NLANR caches are all configured as mutual parents. That is, every cache is a parent for every other. Normally this would result in forwarding loops, but we eliminate them with the cache_host_acl directive:

        cache_host nlanr.mcast.ircache.net multicast 3128 3130 ttl=4
        acl nlanr_peers src it.cache.nlanr.net
        acl nlanr_peers src pb.cache.nlanr.net
        acl nlanr_peers src uc.cache.nlanr.net
        acl nlanr_peers src bo.cache.nlanr.net
        acl nlanr_peers src sd.cache.nlanr.net
        cache_host_acl nlanr.mcast.ircache.net !nlanr_peers
which lets us query the multicast group, unless the request is from one of our own caches.

Also, for the five vBNS caches, we use the proxy-only option to effect a large, distributed cache. We also use the ICMP measurements to route requests to the closest neighbour.

Note that one of the machines, `SD', is currently out of operation due to hardware failure, so we are unable to present its statistics here.

Hierarchy Analysis By Category

Here we present graphs of the hierarchy category values for the NLANR caches, plotted over a 30-day time period. These are figures 2--6.

The percentage of local hits hovers in the 20-30% range. Note these are hits by request count, not by byte volume. The occasional dips are due to times when the entire cache was lost, or perhaps going through a slow reload after a crash. The interesting peak at 40% for `SV' is due to an erroneous change in its configuration file. We inadvertently increased its cache size when we should not have.

The percentage of remote hits is comparatively low, but not unexpectedly. The value normally ranges from 0-10%, but occasionally up to 15%. Notice that `IT' is usually near zero, except for a couple of peaks. This is most likely due to multicast connectivity problems at that site. This looks like multicast connectivity was usually broken, but actually working for a small number of days. We will see this same pattern for `IT' in almost all of the graphs.

There is much more range in the percent of requests which are remote misses. Note that `SV' is down in the 1% range because it does not peer with the other NLANR caches (but it does peer with other non-NLANR caches). The `BO' cache usually forwards 35-40% of its requests to one of the other caches. Since the NLANR caches utilize the ICMP measurements, this likely means that `BO' generally measures larger RTTs to the origin servers. At the other extreme, `UC' forwards relatively few requests. Presumably, then, `UC' is rather well-connected. Again notice the peaks for `IT' due to changing multicast connectivity.

There is a lot of range in the percent of requests forwarded directly to origin servers. The `SV' cache is near the top with 75%, as is `IT' when it becomes cut off from the others. Whereas `BO' is the highest for remote misses, it is the lowest for direct retrievals.

The percent of requests that timeout while waiting for ICP replies is very important to us because timeouts will adversely affect latency experienced by end users. Ideally we should see 0-1% timeouts. However, on a handful of days, it reaches as high as 10%. One interesting note is that even when `IT' has lost multicast connectivity, it does not also have a large amount of timeouts, an indication that our technique for counting multicast peers works well. Indeed, most timeouts arise from unstable network conditions (either multicast or unicast). If the network is stable, then Squid will adapt to its neighbours being up or down. However, if things are unstable, then Squid too will be unstable and not able to accurately predict the number of ICP replies it should expect.

Hierarchy Analysis By Cache

In figures 7--11 we show the same information, except this time grouped by cache instead of by category. The categories are shown cumulatively in each graph: (1) local hits, (2) remote hits, (3) remote misses, (4) direct retrievals, and (5) other. For each graph, the final sum should lie at exactly 100%. Note that the plots for direct and other coincide exactly because the other category is very near zero.

This figure indicates strong spikes when `IT' is able to reach its neighbours. At the other times, we can see there are relatively few remote hit and remote miss requests; the few present are likely due to non-NLANR peer caches.

`PB' shows a nice distribution of hierarchy categories. Approximately half of all the requests `PB' forwards directly.

Here again it is apparent that `UC' forwards many requests directly to origin servers, and relatively few to other caches. The dip near the end of July is due a temporary disk failure which caused the entire cache contents to be lost.

The `BO' cache also shows a nice distribution of categories. Because this cache is less-well-connected than some of the others, it ends up forwarding many requests to its neighbours.

This graph clearly shows that `SV' handles almost all of its requests as cache hits, or forwards directly to origin servers. Recall that `SV' does not send ICP queries to the other NLANR caches. The increase in cache hits in early August was due to a misconfiguration error. The subsequent dip is due to a slow reload.

During this 30-day period, three of the caches were upgraded from 256 to 512 MB of RAM, and from 8 to 16 Gbytes of disk allocated for caching. Can you spot any corresponding changes in these graphs? Although difficult to detect, there are slight increases (roughly 5%) in the local hit rates for the three caches which were upgraded (`IT', `UC', and `BO'). This highlights another interesting thing about caching; doubling your capacity of a given resource does not necessarily double your hit rates!

9.6 Checking on your Siblings

If you are a sibling for another cache, you can use a simple command to search for anomalous requests. You should expect that a request from a sibling would result in a cache hit for you. If the IP address of the other cache is 10.0.0.1, then this command will locate any questionable requests:

        awk '$2 == "10.0.0.1" && $4 ~ /MISS/' access.log

A few such requests likely does not merit concern; ICP and HTTP do not match up perfectly in all situations, so some of these requests may slip by. However, if you see a large number of anomalous access.log entries, the other cache could be using you as a parent instead of a sibling.


Next Previous Contents