Gentoo Linux rsync Mirrors Policy
1. Hardware Request
Machine Donation
Gentoo Linux relies upon two different kinds of mirrors: main rotation mirrors
and community mirrors. Main rotation mirrors are dedicated rsync servers and
are responsible for handling the bulk of our rsync traffic. All main rotation
mirrors run Gentoo Linux and are managed by members of the Gentoo development
team. Community mirrors are servers which are provided and managed by members
of the community. These servers may or may not be dedicated to rsync usage and
they may or may not run Gentoo Linux.
At this time, we have enough community mirrors and are actively seeking
additional main rotation mirrors. Specifications for main rotation servers
include:
- Minimum of a 2GHz Pentium 4 processor (or equivalent)
- Minimum of 1GB RAM (1.5GB - 2GB is ideal)
- 10GB of disk space (IDE is fine)
These servers may be donated with bandwidth and colocation space if you have
them. Otherwise, we can provide these services and you can simply ship the
machine to our colocation facility. Average bandwidth consumption for a main
rotation mirror is currently ~7Mbps. As the number of main rotation mirrors
increases, this number should decrease accordingly.
If you would like to donate your machine, please email Jeffrey Forman with the pertinent information.
2. Short FAQ (provided as a reference for current mirror admins)
Q: Who should I contact regarding rsync issues and maintenance?
A: Visit http://bugs.gentoo.org and file a bug against the product "Rsync".
Q: I run a private rsync mirror for my company. Can I still access rsync1.us.gentoo.org?
A: Because our resources are limited, we need to allocate them in a way that
provides the maximum benefit to our users. As such, we limit connections to our
master rsync and distfile mirrors to public mirrors only. Users are welcome to
use our regular mirror system to establish a private rsync mirror, though they
are asked to follow certain basic rsync etiquette guidelines.
Q: Is it important that I sync my mirror twice an hour?
A: Yes, it is important. You do not need to perform the syncs at exactly :00 and :30,
but each sync should start within one of the following two windows:
- :00 and :10
- :30 and :40
Additionally, please make sure that your syncs are exactly 30 minutes apart. So, if
you schedule the first sync of each hour for :08, please schedule the second sync of
the hour for :38.
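As a rough illustration of this scheduling rule, the shell sketch below builds a
crontab entry from a chosen first-sync minute. The function name sync_cron_line
and the script path are hypothetical. Only the :00 to :10 window and the
30-minute spacing come from the answer above.

```shell
#!/bin/sh
# Sketch: build a crontab line for two syncs per hour, 30 minutes apart.
# sync_cron_line and the script path passed to it are illustrative names.
sync_cron_line() {
    first="$1"
    script="$2"
    case "$first" in
        [0-9]) ;;
        0[0-9]) first="${first#0}" ;;   # drop a leading zero so it is not read as octal
        [0-9][0-9]) ;;
        *) echo "minute must be a number from 0 to 59" >&2; return 1 ;;
    esac
    # The first sync must fall in the :00-:10 window.
    if [ "$first" -gt 10 ]; then
        echo "first sync must be between :00 and :10" >&2
        return 1
    fi
    # The second sync is exactly 30 minutes later, so it lands in :30-:40.
    echo "${first},$((first + 30)) * * * * ${script}"
}

sync_cron_line 8 /usr/local/bin/rsync-gentoo-portage.sh
```

The printed line can be added with crontab -e. A first minute outside the window
is rejected instead of being silently adjusted.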
Q: How do I find the mirror nearest to me?
A: netselect was designed to do this for you. If you have not installed it yet,
run emerge netselect first. Then run: netselect rsync.gentoo.org.
After a minute or so, netselect will print an IP address. Pass this address to
rsync as its only parameter, with two colons appended, e.g.
rsync 1.2.3.4:: (note the two trailing colons). The banner message should tell
you which mirror that is. Update your /etc/make.conf accordingly.
Q: Can I use compression when syncing against rsync1.us.gentoo.org?
A: No. Compression utilizes too many resources on the server, so we have
forcibly disabled it on rsync1.us.gentoo.org. Please do not
attempt to use compression when syncing against this server.
Q: I'm seeing a lot of old and probably dead rsync processes. How can I get rid of them?
A: Please see the Example Scripts section.
Q: There are many users who connect to my rsync server very frequently,
sometimes even causing a DoS to my mirror, is there any way to prevent this?
A: Again, please see the Example Scripts section.
3. Example Scripts
Note:
You will find sample configuration and script files in the gentoo-rsync-mirror
package. To install it, run emerge gentoo-rsync-mirror.
|
Right now, mirroring our Portage tree requires around 250MB, so it isn't space
intensive; having at least 500MB free should leave room for growth. Setting
up a Portage tree mirror is simple: first, ensure that your mirror has rsync
installed. Then, set up your rsyncd.conf file to look something like this:
Code Listing 3.1: rsyncd.conf |
uid = nobody
gid = nobody
use chroot = yes
max connections = 15
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
log file = /var/log/rsync.log
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300
[gentoo-x86-portage]
#this entry is for compatibility
path = /space/gentoo/rsync
comment = Gentoo Linux Portage tree
[gentoo-portage]
#modern versions of portage use this entry
path = /space/gentoo/rsync
comment = Gentoo Linux Portage tree mirror
exclude = distfiles
|
Above, the gentoo-x86-portage mirror points to the same data as gentoo-portage.
Although we have recently changed the official name of our mirror to
gentoo-portage, gentoo-x86-portage is still needed for backwards compatibility,
so include both entries.
For security reasons, the use of a chrooted environment is required!
Now, you need to mirror the Gentoo Linux Portage tree. You should use the
following script to do so:
Code Listing 3.2: rsync-gentoo-portage.sh |
#!/bin/bash
RSYNC="/usr/bin/rsync"
OPTS="--quiet --recursive --links --perms --times --devices --delete --timeout=300"
#Uncomment the following line only if you have been granted access to rsync1.us.gentoo.org
#SRC="rsync://rsync1.us.gentoo.org/gentoo-portage"
#If you are waiting for access to our master mirror, select one of our mirrors to mirror from:
SRC="rsync://rsync2.de.gentoo.org/gentoo-portage"
DST="/space/gentoo/rsync/"
echo "Started update at $(date)" >> "$0.log" 2>&1
logger -t rsync "re-rsyncing the gentoo-portage tree"
# OPTS is left unquoted on purpose so that it splits into separate options
"${RSYNC}" ${OPTS} "${SRC}" "${DST}" >> "$0.log" 2>&1
echo "End: $(date)" >> "$0.log" 2>&1
|
Code Listing 3.3: /etc/init.d/rsyncd |
#!/sbin/runscript
# Copyright 1999-2004 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/net-misc/rsync/files/rsyncd.init.d,v 1.2 2004/05/02 22:45:02 mholzer Exp $
depend() {
    need net
}
# FYI: --sparse seems to cause problems.
RSYNCOPTS="--daemon --safe-links --timeout=300"
start() {
    ebegin "Starting rsync daemon"
    start-stop-daemon --start --quiet --pidfile /var/run/rsyncd.pid --nicelevel 15 --exec /usr/bin/rsync -- ${RSYNCOPTS}
    eend $?
}
stop() {
    ebegin "Stopping rsync daemon"
    start-stop-daemon --stop --quiet --pidfile /var/run/rsyncd.pid
    eend $?
}
|
Your rsyncd.motd should contain your IP address and other relevant information
about your mirror, such as information about the host providing the Portage
mirror and an administrative contact. After you have been approved as an
official rsync mirror, your host will be aliased with a name of the form:
rsync[num].[country code].gentoo.org
The following command helps you kill old rsync processes that sometimes linger
because of connection problems. It is important to kill them because they
count as valid connections for the 'max connections' option. You may run this
command from crontab every hour. It finds and kills rsync processes that are
more than one hour old.
Code Listing 3.4: Kill old rsync processes |
/bin/kill -9 `/bin/ps --no-headers -Crsync -o etime,user,pid,command|/bin/grep nobody | \
/bin/grep "[0-9]\{2\}:[0-9]\{2\}:" |/bin/awk '{print $3}'`
|
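To make the selection rule in Code Listing 3.4 easier to follow: ps prints
elapsed time as [[dd-]hh:]mm:ss, and the grep pattern keeps only entries that
contain an hours field, that is, processes that have run for at least one hour.
The sketch below expresses the same age test numerically. etime_to_seconds is a
hypothetical helper for illustration and is not part of the
gentoo-rsync-mirror package.

```shell
#!/bin/sh
# Sketch: convert a ps "etime" value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    echo "$1" | awk '{
        days = 0
        t = $1
        # Split off an optional "dd-" day prefix.
        if (index(t, "-") > 0) {
            split(t, d, "-")
            days = d[1]
            t = d[2]
        }
        # Three fields mean hh:mm:ss, two fields mean mm:ss.
        n = split(t, p, ":")
        if (n == 3) { h = p[1]; m = p[2]; s = p[3] }
        else        { h = 0;    m = p[1]; s = p[2] }
        print days * 86400 + h * 3600 + m * 60 + s
    }'
}

# A process is "old" in the sense used above once this reaches 3600.
etime_to_seconds 59:59
etime_to_seconds 01:00:00
etime_to_seconds 2-03:04:05
```

Comparing the result against 3600 gives the same cut-off as the grep pattern,
and a different threshold could be chosen by changing that one number.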
In some cases, a few inconsiderate users abuse the rsync mirror system by
syncing more than 1-2 times per day. In the most extreme cases, users schedule
cron jobs to sync every 15 minutes or so. This often amounts to a denial of
service, because such users continually occupy an rsync slot that could
otherwise have gone to another user. To try to prevent this, you may use the
following Perl script, which scans through your rsync log files, picks out IP
addresses that have already connected more than N times that day and
dynamically creates an rsyncd.conf file that lists the offending IP addresses
in the 'hosts deny' directive. The following line controls what N equals:
Code Listing 3.5: Define maximum number of connections per IP |
@badhosts=grep {$hash{$_}>4} keys %hash;
|
If you use this script, please remember to rotate your rsync log files daily
and modify the script to match the location of your rsyncd.conf file. The
script has been tested on Gentoo Linux, but should also work on other systems
that support both rsync and Perl.
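As a rough shell sketch of the counting step, the following lists IP addresses
that appear more often than a given threshold. It assumes that the rsync daemon
writes one "connect from HOST (IP)" line per connection to its log. The function
name list_bad_hosts and the sample log are made up for illustration, and the
sketch does not rewrite rsyncd.conf.

```shell
#!/bin/sh
# Sketch: list IPs with more than N "connect from" lines in an rsync log.
# Assumed line format: <date> <time> [pid] connect from HOST (IP)
list_bad_hosts() {
    logfile="$1"
    max="$2"
    awk -v max="$max" '
        /connect from/ {
            # The address is the last field, wrapped in parentheses.
            ip = $NF
            gsub(/[()]/, "", ip)
            count[ip]++
        }
        END {
            for (ip in count)
                if (count[ip] > max)
                    print ip
        }
    ' "$logfile" | sort
}

# Demo with a made-up log: 192.0.2.7 connects three times, 192.0.2.9 once.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
2004/05/02 10:00:01 [101] connect from a.example.org (192.0.2.7)
2004/05/02 11:00:01 [102] connect from a.example.org (192.0.2.7)
2004/05/02 11:30:01 [103] connect from b.example.org (192.0.2.9)
2004/05/02 12:00:01 [104] connect from a.example.org (192.0.2.7)
EOF
list_bad_hosts "$tmp" 2
rm -f "$tmp"
```

A threshold of 4 would mirror the value shown in Code Listing 3.5. The resulting
addresses are the ones that would go into the 'hosts deny' directive.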
4. Setting up your own local rsync mirror
Introduction
Many users run Gentoo on several machines and need to run emerge --sync
on all of them. Syncing each machine against a public mirror wastes bandwidth
at both ends. Syncing only one machine against a public mirror and all others
against that computer saves resources on Gentoo mirrors and saves users'
bandwidth.
All you need to do is select which of your machines is going to be your own local
rsync mirror and set it up. You should choose a computer that can handle the
CPU and disk load that an rsync operation requires. Your local mirror also needs
to be available whenever any of your other computers syncs its Portage tree.
It should also have a static IP address or a name that always resolves to
your server. Configuring a DHCP and/or a DNS server is beyond the scope of
this guide.
Setting up the server
There is no extra package to install, as the required software is already on
your computer. Setting up your own local rsync mirror is just a matter of
configuring the rsync daemon to make your /usr/portage directory available for
syncing. Create the following /etc/rsync/rsyncd.conf configuration file:
Code Listing 4.1: Sample /etc/rsync/rsyncd.conf |
pid file = /var/run/rsyncd.pid
max connections = 5
use chroot = yes
uid = nobody
gid = nobody
hosts allow = 192.168.0.1 192.168.0.2 192.168.1.0/24
hosts deny = *
[portage]
path=/usr/portage
comment=Gentoo Portage
exclude=distfiles/ packages/
|
You do not have to use the hosts allow and hosts deny options. By
default, all clients will be allowed to connect. The order in which you write
the options is not relevant. The server will always check the hosts
allow option first and grant the connection if the connecting host matches
any of the listed patterns. The server will then check the hosts deny
option and refuse the connection if any match is found. Any host that does not
match anything will be granted a connection. Please read the man page (man
rsyncd.conf) for more information.
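The decision order described above can be sketched in shell as follows. This is
an illustration of the rule only, not how rsync itself is implemented. The
function names are hypothetical, and patterns are limited to plain addresses and
shell wildcards, so address/mask forms such as 192.168.1.0/24 are not handled.

```shell
#!/bin/sh
# Sketch of the allow/deny decision order (not rsync's actual code).
matches_any() {
    host="$1"
    shift
    for pattern in "$@"; do
        case "$host" in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

# host_decision HOST "ALLOW PATTERNS" "DENY PATTERNS"
host_decision() {
    set -f    # keep "*" in the pattern lists from expanding to file names
    if matches_any "$1" $2; then
        result=allow        # 1. a match in "hosts allow" grants access
    elif matches_any "$1" $3; then
        result=deny         # 2. otherwise a match in "hosts deny" refuses it
    else
        result=allow        # 3. no match at all: access is granted
    fi
    set +f
    echo "$result"
}

host_decision 192.168.0.2 "192.168.0.1 192.168.0.2" "*"
host_decision 10.0.0.5 "192.168.0.1 192.168.0.2" "*"
host_decision 10.0.0.5 "" ""
```

The first call is allowed because the host is listed, the second is denied by
the catch-all deny pattern, and the third is allowed because neither option is
set.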
Now, start your rsync daemon with the following command as the root user:
Code Listing 4.2: Starting the rsync daemon |
# /etc/init.d/rsyncd start
# rc-update add rsyncd default
|
Let's test your rsync mirror. You do not need to try from another machine but
it would be a good idea to do so. If your server is not known by name to all
your computers, you can use its IP address instead.
Code Listing 4.3: Testing your mirror |
# rsync 192.168.0.1::
portage Gentoo Portage
# rsync your_server_name::portage
|
Your rsync mirror is now set up. Keep running emerge --sync as you have
done so far to keep your server up-to-date.
Note:
Please note that most public mirror administrators consider syncing more than
once or twice a day to be abuse.
|
Configuring your clients
Now, make your other computers use your own local rsync mirror instead of a
public one. Edit your /etc/make.conf and make the SYNC
variable point to your server.
Code Listing 4.4: Define SYNC in /etc/make.conf |
# Use only one of the following lines: either the IP address...
SYNC="rsync://192.168.0.1/portage"
# ...or the server name
SYNC="rsync://your_server_name/portage"
|
You can check that your computer has been properly set up and sync against your
own local mirror for the first time:
Code Listing 4.5: Checking and syncing |
# emerge --info|grep SYNC
SYNC="rsync://your_server_name/portage"
# emerge --sync
|
That's it! All your computers will now use your local rsync mirror whenever you
run emerge --sync.
The contents of this document are licensed under the Creative Commons - Attribution / Share Alike license.
|