
Gentoo Linux rsync Mirrors Policy

Content:

1. Hardware Request

Machine Donation 

Gentoo Linux relies upon two different kinds of mirrors: main rotation mirrors and community mirrors. Main rotation mirrors are dedicated rsync servers and are responsible for handling the bulk of our rsync traffic. All main rotation mirrors run Gentoo Linux and are managed by members of the Gentoo development team. Community mirrors are servers which are provided and managed by members of the community. These servers may or may not be dedicated to rsync usage and they may or may not run Gentoo Linux.

At this time, we have enough community mirrors and are actively seeking additional main rotation mirrors. Specifications for main rotation servers include:

  • Minimum of a 2GHz Pentium 4 processor (or equivalent)
  • Minimum of 1GB RAM (1.5GB - 2GB is ideal)
  • 10GB of disk space (IDE is fine)

These servers may be donated with bandwidth and colocation space if you have them. Otherwise, we can provide these services and you can simply ship the machine to our colocation facility. Average bandwidth consumption for a main rotation mirror is currently ~7Mbps. As the number of main rotation mirrors increases, this number should decrease accordingly.

If you would like to donate your machine, please email Jeffrey Forman with the pertinent information.

2. Short FAQ (provided as a reference for current mirror admins)

Q: Who should I contact regarding rsync issues and maintenance? 

A: Visit http://bugs.gentoo.org and file a bug on the product "Rsync".

Q: I run a private rsync mirror for my company. Can I still access rsync1.us.gentoo.org? 

A: Because our resources are limited, we need to allocate them in a way that provides the maximum benefit to our users. As such, we limit connections to our master rsync and distfile mirrors to public mirrors only. Users are welcome to use our regular mirror system to establish a private rsync mirror, though they are asked to follow certain basic rsync etiquette guidelines.

Q: Is it important that I sync my mirror twice an hour? 

A: Yes, it is important. You do not need to perform the syncs at exactly :00 and :30, but each sync should take place within one of the following two windows:

  1. :00 and :10
  2. :30 and :40

Additionally, please make sure that your syncs are exactly 30 minutes apart. So, if you schedule the first sync of each hour for :08, please schedule the second sync of the hour for :38.
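
The scheduling rule above can be sketched as a small shell helper that prints a matching crontab line. This is only an illustration: the function name sync_cron_line and the script path /usr/local/bin/rsync-gentoo-portage.sh are assumptions, not part of the Gentoo tooling.

```shell
#!/bin/bash
# Print a crontab line that runs the sync script twice an hour, exactly
# 30 minutes apart. The first minute must lie in the :00-:10 window, which
# places the second sync in the :30-:40 window.
sync_cron_line() {
    local first="$1"
    # Assumed install location of the sync script; adjust to your setup.
    local script="${2:-/usr/local/bin/rsync-gentoo-portage.sh}"

    case "$first" in
        [0-9]|10) ;;    # accept 0 through 10, written without a leading zero
        *)
            echo "first sync minute must be a number from 0 to 10" >&2
            return 1
            ;;
    esac

    echo "${first},$(( first + 30 )) * * * * ${script}"
}

sync_cron_line 8
```

Called with 8, the helper prints a line that runs the script at :08 and :38 of every hour; the result can be pasted into the crontab of the user that owns the mirror directory.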

Q: How do I find the mirror nearest to me? 

A: netselect was designed to do this for you. If you have not already done so, run emerge netselect. Then run netselect rsync.gentoo.org. After a minute or so, netselect will print an IP address. Pass this address to rsync as its only parameter, with two colons appended, e.g. rsync 1.2.3.4::. The banner message should tell you which mirror that is. Update your /etc/make.conf accordingly.
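
As a rough sketch, the lookup can be scripted. The exact layout of netselect's output is an assumption here: the helper below (the hypothetical best_mirror_ip) simply takes the last field of the last non-empty line, which works whether the address is printed alone or after a score.

```shell
#!/bin/bash
# Extract the address from netselect's result. Assumption: the address is
# the last whitespace-separated field on the last non-empty output line.
best_mirror_ip() {
    awk 'NF { last = $NF } END { if (last != "") print last }'
}

# Usage (needs network access and an installed netselect):
#   ip=$(netselect rsync.gentoo.org | best_mirror_ip)
#   rsync "${ip}::"     # shows the mirror's banner and module list
```

The rsync call is left commented out so the snippet can be sourced without touching the network.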

Q: Can I use compression when syncing against rsync1.us.gentoo.org? 

A: No. Compression utilizes too many resources on the server, so we have forcibly disabled it on rsync1.us.gentoo.org. Please do not attempt to use compression when syncing against this server.

Q: I'm seeing a lot of old and probably dead rsync processes. How can I get rid of them?

A: Please see the Example Scripts section.

Q: There are many users who connect to my rsync server very frequently, sometimes even causing a DoS to my mirror, is there any way to prevent this? 

A: Again please see the Example Scripts section.

3. Example Scripts

Note: You will find sample configuration and script files in the gentoo-rsync-mirror package. Just run emerge gentoo-rsync-mirror.

Right now, mirroring our Portage tree requires around 250MB, so it isn't space intensive; having at least 500MB free should allow room for growth. Setting up a Portage tree mirror is simple. First, ensure that your mirror has rsync installed. Then, set up your rsyncd.conf file to look something like this:

Code Listing 3.1: rsyncd.conf

uid = nobody
gid = nobody
use chroot = yes
max connections = 15
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
log file = /var/log/rsync.log
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300

[gentoo-x86-portage]
#this entry is for compatibility
path = /space/gentoo/rsync
comment = Gentoo Linux Portage tree

[gentoo-portage]
#modern versions of portage use this entry
path = /space/gentoo/rsync
comment = Gentoo Linux Portage tree mirror
exclude = distfiles

Above, the gentoo-x86-portage mirror points to the same data as gentoo-portage. Although we have recently changed the official name of our mirror to gentoo-portage, gentoo-x86-portage is still needed for backwards compatibility, so include both entries.

For security reasons, the use of a chrooted environment is required!

Now, you need to mirror the Gentoo Linux Portage tree. You should use the following script to do so:

Code Listing 3.2: rsync-gentoo-portage.sh

#!/bin/bash

RSYNC="/usr/bin/rsync"
OPTS="--quiet --recursive --links --perms --times --devices --delete --timeout=300"
#Uncomment the following line only if you have been granted access to rsync1.us.gentoo.org
#SRC="rsync://rsync1.us.gentoo.org/gentoo-portage"
#If you are waiting for access to our master mirror, select one of our mirrors to mirror from:
SRC="rsync://rsync2.de.gentoo.org/gentoo-portage"
DST="/space/gentoo/rsync/"

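LOG="$0.log"

echo "Started update at $(date)" >> "${LOG}" 2>&1
logger -t rsync "re-rsyncing the gentoo-portage tree"
# OPTS is deliberately left unquoted so that it splits into separate options
${RSYNC} ${OPTS} "${SRC}" "${DST}" >> "${LOG}" 2>&1

echo "End: $(date)" >> "${LOG}" 2>&1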

The rsync daemon itself is started by an init script such as the one shipped with Gentoo's rsync package:

Code Listing 3.3: /etc/init.d/rsyncd

#!/sbin/runscript
# Copyright 1999-2004 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/net-misc/rsync/files/rsyncd.init.d,v 1.2 2004/05/02 22:45:02 mholzer Exp $

depend() {
    need net
}

# FYI: --sparse seems to cause problems.
RSYNCOPTS="--daemon --safe-links --timeout=300"

start() {
    ebegin "Starting rsync daemon"
    start-stop-daemon --start --quiet --pidfile /var/run/rsyncd.pid --nicelevel 15 --exec /usr/bin/rsync -- ${RSYNCOPTS}
    eend $?
}

stop() {
    ebegin "Stopping rsync daemon"
    start-stop-daemon --stop --quiet --pidfile /var/run/rsyncd.pid
    eend $?
}

Your rsyncd.motd should contain your IP address and other relevant information about your mirror, such as information about the host providing the Portage mirror and an administrative contact. After you have been approved as an official rsync mirror, your host will be aliased with a name of the form rsync[num].[country code].gentoo.org.

The following command helps you kill old rsync processes that sometimes linger because of connection problems. It is important to kill them because they count as valid connections toward the 'max connections' option. You can run this command from crontab every hour; it finds and kills rsync processes that are more than one hour old.

Code Listing 3.4: Kill old rsync processes

/bin/kill -9 `/bin/ps --no-headers -Crsync -o etime,user,pid,command|/bin/grep nobody | \
             /bin/grep "[0-9]\{2\}:[0-9]\{2\}:" |/bin/awk '{print $3}'` 
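
A slightly more defensive variant of the command above can be sketched as a filter that only selects the process IDs, so that kill is not called when nothing matches. The function name old_rsync_pids is hypothetical, and the sketch assumes the ps elapsed-time format [[dd-]hh:]mm:ss, in which only processes at least one hour old show an hours field.

```shell
#!/bin/bash
# Read "etime user pid command" lines, as printed by
#   ps --no-headers -C rsync -o etime,user,pid,command
# and print the PIDs of processes owned by the given user whose elapsed
# time contains an hours field (hh:mm:ss or dd-hh:mm:ss).
old_rsync_pids() {
    local user="${1:-nobody}"
    awk -v user="$user" '
        $2 == user && $1 ~ /^([0-9]+-)?[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            print $3
        }'
}

# Usage, for example from an hourly cron job:
#   pids=$(ps --no-headers -C rsync -o etime,user,pid,command | old_rsync_pids nobody)
#   [ -n "$pids" ] && kill -9 $pids
```

Matching on the user column rather than grepping the whole line avoids selecting processes that merely have "nobody" somewhere in their command line.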

In some cases, a few inconsiderate users abuse the rsync mirror system by syncing more than once or twice per day. In the most extreme cases, users schedule cron jobs to sync every 15 minutes or so. This can amount to a denial of service, because such a client continually occupies an rsync slot that could otherwise have gone to another user. To help prevent this, you may use the following perl script, which scans your rsync log files, picks out IP addresses that have already connected more than N times that day, and dynamically creates an rsyncd.conf file that lists the offending IP addresses in the 'hosts deny' directive. The following line controls what N equals:

Code Listing 3.5: Define maximum number of connections per IP

@badhosts=grep {$hash{$_}>4} keys %hash;

If you use this script, please remember to rotate your rsync log files daily and modify the script to match the location of your rsyncd.conf file. This script has been tested on Gentoo Linux, but should work on other platforms that support both rsync and perl.
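
To illustrate only the counting step, and not the Perl script itself, here is a minimal shell sketch. It assumes that each connection is logged on a line of the form "rsync on MODULE from HOST (IP)"; check your own log before relying on that. The function name abusive_hosts is hypothetical.

```shell
#!/bin/bash
# Read an rsync daemon log on stdin and print, one per line, every IP
# address that opened more than LIMIT connections (default 4).
# Assumption: connections are logged as "... rsync on MODULE from HOST (IP)".
abusive_hosts() {
    local limit="${1:-4}"
    awk -v limit="$limit" '
        / rsync on / && match($0, /\([^()]*\)$/) {
            # Strip the surrounding parentheses to get the bare address.
            ip = substr($0, RSTART + 1, RLENGTH - 2)
            count[ip]++
        }
        END {
            for (ip in count)
                if (count[ip] > limit)
                    print ip
        }' | sort
}

# Usage, with the log file named in Code Listing 3.1:
#   abusive_hosts 4 < /var/log/rsync.log
```

The printed addresses could then be joined into a 'hosts deny' line. Because the count covers the whole file, this only approximates a per-day limit when the log is rotated daily.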

4. Setting up your own local rsync mirror

Introduction 

Many users run Gentoo on several machines and need to run emerge --sync on all of them. Syncing every machine against public mirrors wastes bandwidth at both ends. Syncing only one machine against a public mirror, and all the others against that machine, saves resources on the Gentoo mirrors and saves the user's bandwidth.

All you need to do is select which of your machines is going to be your own local rsync mirror and set it up. You should choose a computer that can handle the CPU and disk load that an rsync operation requires. Your local mirror also needs to be available whenever any of your other computers syncs its Portage tree. In addition, it should have a static IP address or a name that always resolves to your server. Configuring a DHCP and/or a DNS server is beyond the scope of this guide.

Setting up the server 

There is no extra package to install as the required software is already on your computer. Setting up your own local rsync mirror is just a matter of configuring the rsyncd daemon to make your /usr/portage directory available for syncing. Create the following /etc/rsync/rsyncd.conf configuration file:

Code Listing 4.1: Sample /etc/rsync/rsyncd.conf

pid file = /var/run/rsyncd.pid
max connections = 5
use chroot = yes
uid = nobody
gid = nobody
# Optional: restrict access to your Gentoo boxes
hosts allow = 192.168.0.1 192.168.0.2 192.168.1.0/24
hosts deny  = *

[portage]
path=/usr/portage
comment=Gentoo Portage
exclude=distfiles/ packages/

You do not have to use the hosts allow and hosts deny options. By default, all clients will be allowed to connect. The order in which you write the options is not relevant. The server will always check the hosts allow option first and grant the connection if the connecting host matches any of the listed patterns. The server will then check the hosts deny option and refuse the connection if any match is found. Any host that does not match anything will be granted a connection. Please read the man page (man rsyncd.conf) for more information.

Now, start your rsync daemon with the following command as the root user:

Code Listing 4.2: Starting the rsync daemon

(Start the daemon now)
# /etc/init.d/rsyncd start
(Add the daemon to your default runlevel)
# rc-update add rsyncd default

Let's test your rsync mirror. You do not need to try from another machine but it would be a good idea to do so. If your server is not known by name from all your computers, you can use its IP address instead.

Code Listing 4.3: Testing your mirror

(You may use the server name or its IP)
# rsync 192.168.0.1::
portage         Gentoo Portage
# rsync your_server_name::portage
(You should see the content of /usr/portage on your mirror)

Your rsync mirror is now set up. Keep running emerge --sync as you have done so far to keep your server up-to-date.

Note: Most public mirror administrators consider syncing more than once or twice a day to be abuse.

Configuring your clients 

Now, make your other computers use your own local rsync mirror instead of a public one. Edit your /etc/make.conf and make the SYNC variable point to your server.

Code Listing 4.4: Define SYNC in /etc/make.conf

(Use your server IP address)
SYNC="rsync://192.168.0.1/portage"
(Or use your server name)
SYNC="rsync://your_server_name/portage"

You can check that your computer has been properly set up and sync against your own local mirror for the first time:

Code Listing 4.5: Checking and syncing

(Check that the SYNC variable has been set up)
# emerge --info|grep SYNC
SYNC="rsync://your_server_name/portage"
(Sync against your local mirror)
# emerge --sync

That's it! All your computers will now use your local rsync mirror whenever you run emerge --sync.


The contents of this document are licensed under the Creative Commons - Attribution / Share Alike license.
Updated October 2, 2004
Author: Gentoo Mirror Administrators
Editor: Xavier Neys
Summary: This document explains how to set up an official rsync mirror and your own local mirror.
Copyright 2001-2004 Gentoo Foundation, Inc. Questions, Comments, Corrections? Email www@gentoo.org.