Squid: A User's Guide
Chapter 9. Transparent Caching

The Transparent Caching Process

Let's look at what happens when you use transparency. First, though, you need to know something of what happens to IP packets at the ethernet level.

Some Routing Basics

An IP packet sent over ethernet carries four addresses:

The destination mac address. When a packet is transmitted down the ethernet wire, all ethernet cards on the network will check the destination mac address value. Each ethernet card has a (supposedly) unique mac address. If the ethernet card's mac address matches the destination mac address of the packet, the ethernet card will pass the packet to the operating system, which will then deal with the contents of the packet.
The source mac address: set by the sending ethernet card.
The destination IP address: set by the application sending the packet.
The source IP address: set by the operating system of the source host (or, in some circumstances, by the application on the source machine). Ordinary routers do not change this value along the way: they forward the IP addresses and the contents of the packet unchanged, and rewrite only the mac addresses for the next link. If each router changed the source address, it would have to keep state for every connection passing through it. This way, it can simply forward packets and forget about them.
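
The four addresses, and what a router does to them, can be sketched as a small Python model. This is only an illustration: the `Frame` class, the `forward` helper and all mac and IP values are invented for the example (192.0.2.80 is a documentation address, not one used in this guide). Note that a real router also writes its own mac address into the source field when it retransmits the packet.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """The four addresses carried by an IP packet on ethernet."""
    dst_mac: str
    src_mac: str
    dst_ip: str
    src_ip: str


def forward(frame: Frame, router_mac: str, next_hop_mac: str) -> Frame:
    """Return the frame as a router would retransmit it on the next link.

    Only the mac addresses change; both IP addresses are copied as-is.
    """
    return replace(frame, src_mac=router_mac, dst_mac=next_hop_mac)


# Hypothetical values: a client sending to a web server via its gateway.
client_frame = Frame(
    dst_mac="00:00:5e:00:53:01",   # gateway's interface (assumed)
    src_mac="00:00:5e:00:53:50",   # client's ethernet card (assumed)
    dst_ip="192.0.2.80",           # end host the client wants to reach
    src_ip="10.0.0.50",            # client's own IP address
)

forwarded = forward(
    client_frame,
    router_mac="00:00:5e:00:53:02",    # gateway's outgoing interface (assumed)
    next_hop_mac="00:00:5e:00:53:03",  # next-hop router (assumed)
)

print(forwarded.src_ip, forwarded.dst_ip)    # IP addresses are unchanged
print(forwarded.src_mac, forwarded.dst_mac)  # mac addresses are new
```

Because the router only rewrites the link-level addresses, it needs no memory of the packet once it has been sent on.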

When a host wants to communicate with a machine that isn't on the local network, it relies on a router to find the path to that network. To send a packet through a router, the client sets the destination mac address of the packet to that of the router's interface, and sets the destination IP address to the required end host. It's important to know that the destination IP address of the packet isn't set to the router's IP address; only the destination mac address points at the router. When a router accepts a packet, it decides which host to forward it to, based on its routing tables. The router then sets the destination mac address of the packet to the next-hop router's ethernet address, and sends the packet to that machine. The receiving machine then repeats this process: if it's the destination machine, it uses the packet, but if it's another router, it will try to move the packet closer to its final destination.
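
The next-hop decision described above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not how any particular operating system implements routing: the `ROUTES` table and the `next_hop` helper are hypothetical, and the table assumes a host on 10.0.0.0/24 whose default gateway is 10.0.0.1.

```python
import ipaddress

# Hypothetical routing table for a host on the 10.0.0.0/24 network.
# A gateway of None means "directly attached: deliver on the local wire".
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), None),
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.0.1"),  # default gateway
]


def next_hop(dst_ip: str, routes=ROUTES) -> str:
    """Return the IP of the machine whose mac address the packet is sent to.

    The most specific (longest prefix) matching route wins. The packet's
    destination IP address is never replaced by the gateway's address.
    """
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, gw) for net, gw in routes if dst in net]
    net, gateway = max(matches, key=lambda match: match[0].prefixlen)
    return gateway if gateway is not None else dst_ip


print(next_hop("10.0.0.7"))    # local host: sent straight to 10.0.0.7
print(next_hop("192.0.2.80"))  # remote host: sent via the gateway 10.0.0.1
```

The sender then looks up the mac address of the returned machine and uses it as the destination mac address, while the destination IP address stays that of the end host.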

Packet Flow with Transparent Caches

Transparent caches essentially look out for TCP connections destined for port 80. The cache server will intercept these packets, convert them to a standard TCP stream and pass them to Squid. When Squid sends reply data to the client, the operating system fakes the source address of the packets, so that the client believes it is connected to the server that it originally sent the request to.

You can't simply plug a transparent cache into the network and get it to transparently cache pages. The cache server needs to be in a position where it can fake the reply packets without the real server interrupting the conversation and confusing things. In other words, the cache server needs to be the gateway to the outside world.

Let's look at the simplest transparent cache setup. The client machine (10.0.0.50) treats the cache server's internal interface (10.0.0.1) as its default gateway. This way, all packets arrive on the cache server before they reach the rest of the Internet. The filter looks for packets destined for TCP port 80 and passes them to Squid, but allows all other packets to be passed to the routing layer, which passes the packets to the router's IP (172.31.0.2).
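
The filter's decision in this setup can be sketched as a single function. This is a toy model of the logic only, not real packet-filter configuration: the `classify` helper is hypothetical, and the assumption that Squid listens on port 3128 is made for the example.

```python
SQUID_PORT = 3128           # assumption: the port Squid listens on
NEXT_ROUTER = "172.31.0.2"  # router's IP from the example setup


def classify(protocol: str, dst_port: int):
    """Decide what the cache server does with a packet from a client.

    Web traffic (TCP, destination port 80) is handed to Squid on the
    local machine; everything else is routed onwards as usual.
    """
    if protocol == "tcp" and dst_port == 80:
        return ("redirect-to-squid", SQUID_PORT)
    return ("route", NEXT_ROUTER)


print(classify("tcp", 80))   # web request: handed to Squid
print(classify("tcp", 25))   # mail: routed to 172.31.0.2
print(classify("udp", 53))   # DNS: routed to 172.31.0.2
```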

Once the connection is established, Squid needs to communicate with the client. Squid doesn't do any strange packet assembly: that's left to the transparency layer. When Squid sends reply data to the client, the kernel automatically changes the packet's source address, so it appears to the client that the cache server is just routing the replies from the outside world. When Squid connects to the remote server, however, the connection comes from the external interface of the cache server (172.31.0.1, in the example). This is where IP-based authentication breaks: the remote server sees the request coming from the cache's address rather than from the client's real address (10.0.0.50).
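
The addresses each side sees can be made concrete with a short sketch. The `Packet` type and both helper functions are hypothetical, and 192.0.2.80 stands in for an arbitrary remote web server; the client and cache addresses are those of the example setup.

```python
from typing import NamedTuple

CLIENT = "10.0.0.50"           # client machine from the example
CACHE_EXTERNAL = "172.31.0.1"  # cache server's external interface
ORIGIN = "192.0.2.80"          # hypothetical remote web server


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str


def reply_to_client(request: Packet) -> Packet:
    """Reply from Squid as the kernel sends it: the source address is
    the server the client originally addressed, not the cache."""
    return Packet(src_ip=request.dst_ip, dst_ip=request.src_ip)


def request_to_origin(request: Packet) -> Packet:
    """Squid's own connection to the remote server: it originates from
    the cache server's external interface."""
    return Packet(src_ip=CACHE_EXTERNAL, dst_ip=request.dst_ip)


intercepted = Packet(src_ip=CLIENT, dst_ip=ORIGIN)

print(reply_to_client(intercepted))    # client sees a reply from 192.0.2.80
print(request_to_origin(intercepted))  # server sees a request from 172.31.0.1
```

A remote server that grants access based on the client's IP address would be comparing against 172.31.0.1 here, never 10.0.0.50.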

Effectively, we need to get four things right to get transparency right:

Correct network layout
Filtering out the appropriate packets
Kernel Transparency: redirecting port 80 connections to Squid
Squid settings. Squid needs to know that it's supposed to act in transparent mode.
