Ok, there are a lot of parameters which can be modified. We try to list them all. Also documented (partly) in Documentation/ip-sysctl.txt.
Some of these settings have different defaults based on whether you answered 'Yes' to 'Configure as router and not host' while compiling your kernel.
Oskar Andreasson also has a page on all these flags and it appears to be better than ours, so also check http://ipsysctl-tutorial.frozentux.net/.
As a generic note, most rate limiting features don't work on loopback, so don't test them locally. The limits are supplied in 'jiffies', and are enforced using the earlier mentioned token bucket filter.
The kernel has an internal clock which runs at 'HZ' ticks (or 'jiffies') per second. On Intel, 'HZ' is mostly 100. So setting a *_rate file to, say 50, would allow for 2 packets per second. The token bucket filter is also configured to allow for a burst of at most 6 packets, if enough tokens have been earned.
Several entries in the following list have been copied from /usr/src/linux/Documentation/networking/ip-sysctl.txt, written by Alexey Kuznetsov <[email protected]> and Andi Kleen <[email protected]>
If the kernel decides that it can't deliver a packet, it will drop it, and send the source of the packet an ICMP notice to this effect.
Don't act on echo packets at all. Please don't set this by default, but if you are used as a relay in a DoS attack, it may be useful.
If you ping the broadcast address of a network, all hosts are supposed to respond. This makes for a dandy denial-of-service tool. Set this to 1 to ignore these broadcast messages.
The rate at which echo replies are sent to any one destination.
Set this to ignore ICMP errors caused by hosts in the network reacting badly to frames sent to what they perceive to be the broadcast address.
A relatively unknown ICMP message, which is sent in response to incorrect packets with broken IP or TCP headers. With this file you can control the rate at which it is sent.
This is the famous cause of the 'Solaris middle star' in traceroutes. Limits the rate of ICMP Time Exceeded messages sent.
Maximum number of listening igmp (multicast) sockets on the host. FIXME: Is this true?
FIXME: Add a little explanation about the inet peer storage? Miximum interval between garbage collection passes. This interval is in effect under low (or absent) memory pressure on the pool. Measured in jiffies.
Minimum interval between garbage collection passes. This interval is in effect under high memory pressure on the pool. Measured in jiffies.
Maximum time-to-live of entries. Unused entries will expire after this period of time if there is no memory pressure on the pool (i.e. when the number of entries in the pool is very small). Measured in jiffies.
Minimum time-to-live of entries. Should be enough to cover fragment time-to-live on the reassembling side. This minimum time-to-live is guaranteed if the pool size is less than inet_peer_threshold. Measured in jiffies.
The approximate size of the INET peer storage. Starting from this threshold entries will be thrown aggressively. This threshold also determines entries' time-to-live and time intervals between garbage collection passes. More entries, less time-to-live, less GC interval.
This file contains the number one if the host received its IP configuration by RARP, BOOTP, DHCP or a similar mechanism. Otherwise it is zero.
Time To Live of packets. Set to a safe 64. Raise it if you have a huge network. Don't do so for fun - routing loops cause much more damage that way. You might even consider lowering it in some circumstances.
You need to set this if you use dial-on-demand with a dynamic interface address. Once your demand interface comes up, any local TCP sockets which haven't seen replies will be rebound to have the right address. This solves the problem that the connection that brings up your interface itself does not work, but the second try does.
If the kernel should attempt to forward packets. Off by default.
Range of local ports for outgoing connections. Actually quite small by default, 1024 to 4999.
Set this if you want to disable Path MTU discovery - a technique to determine the largest Maximum Transfer Unit possible on your path. See also the section on Path MTU discovery in the Cookbook chapter.
Maximum memory used to reassemble IP fragments. When ipfrag_high_thresh bytes of memory is allocated for this purpose, the fragment handler will toss packets until ipfrag_low_thresh is reached.
Set this if you want your applications to be able to bind to an address which doesn't belong to a device on your system. This can be useful when your machine is on a non-permanent (or even dynamic) link, so your services are able to start up and bind to a specific address when your link is down.
Minimum memory used to reassemble IP fragments.
Time in seconds to keep an IP fragment in memory.
A boolean flag controlling the behaviour under lots of incoming connections. When enabled, this causes the kernel to actively send RST packets when a service is overloaded.
Time to hold socket in state FIN-WAIT-2, if it was closed by our side. Peer can be broken and never close its side, or even died unexpectedly. Default value is 60sec. Usual value used in 2.2 was 180 seconds, you may restore it, but remember that if your machine is even underloaded WEB server, you risk to overflow memory with kilotons of dead sockets, FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1, because they eat maximum 1.5K of memory, but they tend to live longer. Cf. tcp_max_orphans.
How often TCP sends out keepalive messages when keepalive is enabled. Default: 2hours.
How frequent probes are retransmitted, when a probe isn't acknowledged. Default: 75 seconds.
How many keepalive probes TCP will send, until it decides that the connection is broken. Default value: 9. Multiplied with tcp_keepalive_intvl, this gives the time a link can be non-responsive after a keepalive has been sent.
Maximal number of TCP sockets not attached to any user file handle, held by system. If this number is exceeded orphaned connections are reset immediately and warning is printed. This limit exists only to prevent simple DoS attacks, you _must_ not rely on this or lower the limit artificially, but rather increase it (probably, after increasing installed memory), if network conditions require more than default value, and tune network services to linger and kill such states more aggressively. Let me remind you again: each orphan eats up to 64K of unswappable memory.
How may times to retry before killing TCP connection, closed by our side. Default value 7 corresponds to 50sec-16min depending on RTO. If your machine is a loaded WEB server, you should think about lowering this value, such sockets may consume significant resources. Cf. tcp_max_orphans.
Maximal number of remembered connection requests, which still did not receive an acknowledgment from connecting client. Default value is 1024 for systems with more than 128Mb of memory, and 128 for low memory machines. If server suffers of overload, try to increase this number. Warning! If you make it greater than 1024, it would be better to change TCP_SYNQ_HSIZE in include/net/tcp.h to keep TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog and to recompile kernel.
Maximal number of timewait sockets held by system simultaneously. If this number is exceeded time-wait socket is immediately destroyed and warning is printed. This limit exists only to prevent simple DoS attacks, you _must_ not lower the limit artificially, but rather increase it (probably, after increasing installed memory), if network conditions require more than default value.
Bug-to-bug compatibility with some broken printers. On retransmit try to send bigger packets to work around bugs in certain TCP stacks.
How many times to retry before deciding that something is wrong and it is necessary to report this suspicion to network layer. Minimal RFC value is 3, it is default, which corresponds to 3sec-8min depending on RTO.
How may times to retry before killing alive TCP connection. RFC 1122 says that the limit should be longer than 100 sec. It is too small number. Default value 15 corresponds to 13-30min depending on RTO.
This boolean enables a fix for 'time-wait assassination hazards in tcp', described in RFC 1337. If enabled, this causes the kernel to drop RST packets for sockets in the time-wait state. Default: 0
Use Selective ACK which can be used to signify that specific packets are missing - therefore helping fast recovery.
Use the Host requirements interpretation of the TCP urg pointer field. Most hosts use the older BSD interpretation, so if you turn this on Linux might not communicate correctly with them. Default: FALSE
Number of SYN packets the kernel will send before giving up on the new connection.
To open the other side of the connection, the kernel sends a SYN with a piggybacked ACK on it, to acknowledge the earlier received SYN. This is part 2 of the threeway handshake. This setting determines the number of SYN+ACK packets sent before the kernel gives up on the connection.
Timestamps are used, amongst other things, to protect against wrapping sequence numbers. A 1 gigabit link might conceivably re-encounter a previous sequence number with an out-of-line value, because it was of a previous generation. The timestamp will let it recognize this 'ancient packet'.
Enable fast recycling TIME-WAIT sockets. Default value is 1. It should not be changed without advice/request of technical experts.
TCP/IP normally allows windows up to 65535 bytes big. For really fast networks, this may not be enough. The window scaling options allows for almost gigabyte windows, which is good for high bandwidth*delay products.
DEV can either stand for a real interface, or for 'all' or 'default'. Default also changes settings for interfaces yet to be created.
If a router decides that you are using it for a wrong purpose (ie, it needs to resend your packet on the same interface), it will send us a ICMP Redirect. This is a slight security risk however, so you may want to turn it off, or use secure redirects.
Not used very much anymore. You used to be able to give a packet a list of IP addresses it should visit on its way. Linux can be made to honor this IP option.
Accept packets with source address 0.b.c.d with destinations not to this host as local ones. It is supposed that a BOOTP relay daemon will catch and forward such packets.
The default is 0, since this feature is not implemented yet (kernel version 2.2.12).
Enable or disable IP forwarding on this interface.
See the section on Reverse Path Filtering.
If we do multicast forwarding on this interface
If you set this to 1, this interface will respond to ARP requests for addresses the kernel has routes to. Can be very useful when building 'ip pseudo bridges'. Do take care that your netmasks are very correct before enabling this! Also be aware that the rp_filter, mentioned elsewhere, also operates on ARP queries!
See the section on Reverse Path Filtering.
Accept ICMP redirect messages only for gateways, listed in default gateway list. Enabled by default.
If we send the above mentioned redirects.
If it is not set the kernel does not assume that different subnets on this device can communicate directly. Default setting is 'yes'.
FIXME: fill this in
Dev can either stand for a real interface, or for 'all' or 'default'. Default also changes settings for interfaces yet to be created.
Maximum for random delay of answers to neighbor solicitation messages in jiffies (1/100 sec). Not yet implemented (Linux does not have anycast support yet).
Determines the number of requests to send to the user level ARP daemon. Use 0 to turn off.
A base value used for computing the random reachable time value as specified in RFC2461.
Delay for the first time probe if the neighbor is reachable. (see gc_stale_time)
Determines how often to check for stale ARP entries. After an ARP entry is stale it will be resolved again (which is useful when an IP address migrates to another machine). When ucast_solicit is greater than 0 it first tries to send an ARP packet directly to the known host When that fails and mcast_solicit is greater than 0, an ARP request is broadcast.
An ARP/neighbor entry is only replaced with a new one if the old is at least locktime old. This prevents ARP cache thrashing.
Maximum number of retries for multicast solicitation.
Maximum time (real time is random [0..proxytime]) before answering to an ARP request for which we have an proxy ARP entry. In some cases, this is used to prevent network flooding.
Maximum queue length of the delayed proxy arp timer. (see proxy_delay).
The time, expressed in jiffies (1/100 sec), between retransmitted Neighbor Solicitation messages. Used for address resolution and to determine if a neighbor is unreachable.
Maximum number of retries for unicast solicitation.
Maximum queue length for a pending arp request - the number of packets which are accepted from other layers while the ARP address is still resolved.
This parameters are used to limit the warning messages written to the kernel log from the routing code. The higher the error_cost factor is, the fewer messages will be written. Error_burst controls when messages will be dropped. The default settings limit warning messages to one every five seconds.
Writing to this file results in a flush of the routing cache.
Values to control the frequency and behavior of the garbage collection algorithm for the routing cache. This can be important for when doing fail over. At least gc_timeout seconds will elapse before Linux will skip to another route because the previous one has died. By default set to 300, you may want to lower it if you want to have a speedy fail over.
Also see this post by Ard van Breemen.
See /proc/sys/net/ipv4/route/gc_elasticity.
See /proc/sys/net/ipv4/route/gc_elasticity.
See /proc/sys/net/ipv4/route/gc_elasticity.
See /proc/sys/net/ipv4/route/gc_elasticity.
Maximum delay for flushing the routing cache.
Maximum size of the routing cache. Old entries will be purged once the cache reached has this size.
FIXME: fill this in
Minimum delay for flushing the routing cache.
FIXME: fill this in
FIXME: fill this in
Factors which determine if more ICMP redirects should be sent to a specific host. No redirects will be sent once the load limit or the maximum number of redirects has been reached.
See /proc/sys/net/ipv4/route/redirect_load.
Timeout for redirects. After this period redirects will be sent again, even if this has been stopped, because the load or number limit has been reached.