Version Control: Git, Mercurial

distributed version control: version and help | local and remote repository | working directory | track and commit | branch and merge | history | push and pull | configuration files | external repositories | packaging | integrity check and garbage collection


version and help
git hg
show version $ git version $ hg version
list subcommands $ git help -a

commonly used subcommands only:
$ git help
$ hg [-v] help

-v: show aliases and global options
get help for subcommand $ git help CMD $ hg help CMD
list topic guides $ git help -g $ hg help
get help for topic $ git help TOPIC $ hg help TOPIC
local and remote repository
git hg
create repository from a directory $ git init [DIR] $ hg init [DIR]
create repository with no working directory $ git init --bare [DIR]

puts repo in DIR
none
clone entire repository $ git clone [-b BRANCH] [-o NAME] [--depth NUM] URL [DIR]

-b: checkout BRANCH in working dir
-o: assign NAME to remote repo
--depth: copy commit history to depth NUM
repository url formats ssh://[user@]host.xz[:port]/path/to/repo.git
git://host.xz[:port]/path/to/repo.git
http[s]://host.xz[:port]/path/to/repo.git
ftp[s]://host.xz[:port]/path/to/repo.git
rsync://host.xz/path/to/repo.git
/path/to/repo.git
file:///path/to/repo.git
local/filesystem/path[#revision]
file://local/filesystem/path[#revision]
http://[user[:pass]@]host[:port]/[path][#revision]
https://[user[:pass]@]host[:port]/[path][#revision]
ssh://[user@]host[:port]/[path][#revision]
clone branch from repository $ hg clone [-r REV|-b BRANCH] … URL [DIR]
clone repository; new repository has no working directory $ git clone --bare URL [DIR]

puts repo in DIR
$ hg clone -U URL [DIR]

puts repo in DIR/.hg
list remote repositories $ git remote [-v]

-v: show url of remote
$ hg paths
add remote repository $ git remote add [-t BRANCH] … NAME URL
$ git remote add [-m BRANCH] NAME URL
edit the [paths] section of .hg/hgrc
remove remote repository $ git remote rm REMOTE
rename remote repository $ git remote rename REMOTE NAME
show remote repository details $ git remote show [-n] REMOTE

-n: do not connect to remote repo
edit remote repository details $ git remote set-head REMOTE (-a|-d|BRANCH)
$ git remote set-url --add REMOTE URL
$ git remote set-url --delete REMOTE URL
$ git remote set-branches [--add] REMOTE BRANCH …
working directory
git hg
check out version $ git checkout [-f] (BRANCH|TREEISH)

-f: overwrite changes in working dir and index
$ hg update [-c|-C] -r REV

-c: fail if changes in working dir
-C: discard changes in working dir
list modified files $ git status [-s] [--ignored] [PATH] …

-s:        short format
--ignored: also files excluded by .gitignore
$ hg status [PATH]
ignore file .gitignore .hgignore
check out specific files $ git checkout [TREEISH] -- PATHSPEC … $ hg revert [-C] [-r REV] PATH

-C: do not create backups w/ .orig suffix
check out from index $ git checkout -p PATHSPEC no index
clear index $ git reset
clear index and working directory $ git reset --hard $ hg revert -C -a
list or remove untracked files $ git clean (-f|-n)

-f: remove untracked files
-n: list untracked files
$ hg purge [-p]

-p: list untracked files instead of removing them

without -p untracked files are removed
move working directory changes to shelf $ git stash [push [-m STR]] $ hg shelve [-n STR]
list sets of changes in shelf $ git stash list $ hg shelve --list [-p]

-p: show each set of changes as diff
show set of changes in shelf as diff $ git stash show [STASH] none
restore changes from shelf $ git stash pop [STASH] $ hg unshelve [STR]
delete set of changes from shelf $ git stash drop [STASH] $ hg shelve -d STR
clear shelf $ git stash clear $ hg shelve --clear
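A minimal sketch of the git side of the shelf commands, assuming git 2.13 or later (for git stash push) is on the PATH. It builds a throwaway repository in a temporary directory; the file name, commit message, stash message, and the local "Example User" identity are illustrative only.

```shell
#!/bin/sh
set -e
# Throwaway repository; identity is set locally so commits work anywhere.
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo v1 > f.txt
git add f.txt
git commit -q -m "v1"

echo v2 > f.txt                        # uncommitted change
git stash push -q -m "work in progress"
after_stash=$(cat f.txt)               # working directory is back to v1
listing=$(git stash list)              # stash@{0}: On <branch>: work in progress

git stash pop -q                       # restore the change and drop the stash entry
after_pop=$(cat f.txt)
echo "after stash: $after_stash; after pop: $after_pop"
```

git stash pop combines git stash apply and git stash drop; use apply instead when the shelved changes should stay in the stash list.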
track and commit
git hg
track files no separate step: a file becomes tracked when it is first added to the staging area, and changes must be added to the staging area before each commit $ hg add PATH
track files matching pattern $ hg add -I PATTERN
track all files in working directory $ hg add
add modified or new files to staging area $ git add [-u] PATHSPEC

-u: files already under version control only
working directory is the staging area
add part of modified file to staging area $ git add -e PATH
track any new files in working directory, and remove any tracked files not in working directory $ hg addremove [PATH] …
remove files from working directory and next commit $ git rm [-f] PATH

-f: force if changed in index
$ hg remove [-f] PATH

-f: force if added or changed
remove files matching pattern $ git rm [-f] PATHSPEC

-f: force if changed in index
$ hg remove [-f] -I PATTERN

-f: force if added or changed
remove files in next commit which are no longer in working directory $ hg remove -A PATH
mark file to be removed in next commit without removing from working directory $ hg forget PATH
remove subdirectory from working directory and next commit $ git rm -r DIR $ hg remove DIR
remove files from index $ git rm --cached FILE
copy files from head to index $ git reset -p FILE
copy files from index to working directory $ git checkout -p PATH
move file in working directory and next commit $ git mv OLDPATH NEWPATH $ hg rename OLDPATH NEWPATH
move files into directory $ git mv PATH … DIR $ hg rename PATH … DIR
copy file in working directory and next commit $ hg copy [-A|-f] SRC_PATH DEST_PATH

-A: record a copy that has already been made
-f: overwrite DEST_PATH even if it is tracked
show difference between staging area and head $ git diff --cached [PATHSPEC …] $ hg diff [PATH …]
show difference between working directory and staging area $ git diff [PATHSPEC …] working directory and staging area are the same
diff options --name-only: list modified file names
--name-status: status (M, A, D, R, …) and modified file names
--stat: histogram of changes by file
--dirstat: histogram of changes by directory
--word-diff: show word-level changes inline
--word-diff-regex=REGEX: set regex used by --word-diff
-W: show entire modified function in context
-R: reverse direction of diff
-w: ignore whitespace differences
--ignore-blank-lines: ignore blank lines
--quiet: no output; exit status 1 if changes, otherwise 0

hg:
--stat: histogram of changes by file
-p: show which function each change is in
-U NUM: show NUM lines of context
--reverse: reverse direction of diff
-w: ignore whitespace differences
-B: ignore blank lines
-X PATTERN: exclude files matching PATTERN
-S: recurse into subrepos
grep staging area $ git grep --cached [-i] [-v] [-E|-F|-P] [-h|-H] [-l|-L] [-n] \
    -e STR [--] [PATHSPEC …]
commit changes in staging area $ git commit [-a] [-m STR]

-a: stage modifications and deletions of tracked files before committing; new untracked files are not added.
$ hg commit [-m STR]
commit changes to selected files in staging area $ git commit [-m STR] PATH $ hg commit [-m STR] PATH …
commit changes in working directory $ hg commit -A [-m STR]
amend most recent commit $ git commit --amend $ hg commit --amend
change author of most recent commit $ git commit --amend --author=STR $ hg commit --amend -u STR
commit identifiers 40 hex digit hashes, e.g:

  bbf3837d6c9bb54f11a4c620a1c81975156c2a49

A prefix can be used to refer to a commit if it uniquely specifies it; often an 8 digit prefix is sufficient.

HEAD refers to the most recent commit of the current branch.
Each revision has a 40 hex digit hash and a sequential integer identifier. The hash is unique across all repositories, but the sequential integer is local to the repository.

A prefix can be used to refer to a revision if it uniquely specifies it; often an 8 digit prefix is sufficient.

A period . refers to the parent revision of the working directory.

tip refers to the most recent revision in the repository.
other ways to refer to commits A circumflex postfix can be used to refer to the parent commit of a commit, e.g.

  bbf3837d6^

The circumflex can be used multiple times to refer to a grandparent, great-grandparent, and so on:

  bbf3837d6^^
  bbf3837d6^^^

An alternative to multiple circumflexes is tilde notation:

  bbf3837d6~2
  bbf3837d6~3

When a commit has multiple parents (i.e. the commit is a merge), use a circumflex followed by a number:

  bbf3837d6^2
resolve commit notation $ git rev-parse COMMIT

e.g. to get the commit id of the parent of HEAD:

  git rev-parse HEAD^
create commit which reverts another commit $ git revert [-n] COMMIT

-n: no commit, just change working directory
$ hg backout -r REV
create commit which reverts a merge commit $ git revert [-n] -m NUM COMMIT

NUM is the number of the parent (e.g. 1, 2, …) to restore
create commits which revert a sequence of commits $ git revert [-n] COMMIT1..COMMIT2

reverts the commits after COMMIT1 up to and including COMMIT2; COMMIT1 itself is not reverted.
tag a commit $ git tag [-f] NAME [COMMIT]

-f: replace existing tag with same NAME

if no commit specified, HEAD is tagged
$ hg tag [-r REV] NAME
delete a tag $ git tag -d TAG $ hg tag --remove NAME
list tags $ git tag $ hg tags
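A sketch of the revert range semantics from the table above, assuming git is on the PATH. Four commits each add a separate file (the file names and identity are illustrative), so the reverts cannot conflict; reverting the range from the first to the third commit removes only the files added by the second and third commits.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# c1..c4: each commit adds one new file.
for n in 1 2 3 4; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "add file$n"
done

c1=$(git rev-parse 'HEAD~3')
c3=$(git rev-parse 'HEAD~1')

# Reverts c2 and c3, creating one new commit for each; c1 is excluded.
git revert --no-edit "$c1..$c3" > /dev/null
ls
```

Adding -n would leave both reversals in the index and working directory as a single uncommitted change instead.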
branch and merge
git hg
current branch $ git rev-parse --abbrev-ref HEAD
list branches $ git branch [-r|-a]

-r: list remote tracking branches
-a: list local and remote tracking branches
$ hg bookmarks
list branches by commit $ git branch (--contains|--merged) COMMIT

--contains: branches whose heads have COMMIT in their history
--merged: branches whose heads are ancestors of COMMIT
checkout branch $ git checkout [-f] BRANCH

-f: discard changes in index and working directory
$ hg update [-c|-C] BRANCH

-C: discard changes in working directory
-c: fail if changes in working directory
move branch head without changing working directory $ git reset [--soft] COMMIT
create new branch by cloning branch $ git checkout -b NAME [BRANCH] $ hg bookmarks NAME
create a tracking branch $ git branch --track NAME [BRANCH]
create a branch from a commit $ git branch NAME COMMIT $ hg bookmarks -r REV NAME
rename a branch $ git branch -m BRANCH NAME $ hg bookmarks -m NAME1 NAME2
delete a branch $ git branch (-d|-D) BRANCH

-d: fail if BRANCH is not fully merged into its upstream branch (or into HEAD if it has no upstream)
-D: delete even if unmerged commits exist
$ hg bookmarks -d NAME
list "dedicated commit" branches $ hg branches
show current "dedicated commit" branch $ hg branch
change branch of next commit $ hg branch BRANCH
list branch tips $ hg heads [-c]

-c: include closed tips
close branch tip $ hg commit --close-branch
merge $ git merge [--squash] COMMIT …

--squash: make changes to index and working directory only; do not create a commit
$ hg merge [[-r] REV]

Only modifies the working directory; must be followed by hg commit to create a new changeset.
show conflicts $ git status $ hg resolve -l
mark file with conflicts as resolved $ git add PATH $ hg resolve PATH
unmark file with conflicts as resolved $ hg resolve -u PATH
abort merge $ git merge --abort $ hg update -C .
rebase current branch $ git rebase BRANCH
continue rebase $ git rebase --continue
abort rebase $ git rebase --abort
squash commits $ git rebase -i COMMIT

commits after COMMIT can be squashed
apply commit to current branch $ git cherry-pick COMMIT $ hg graft -r REV
continue cherry pick $ git cherry-pick --continue
abort cherry pick $ git cherry-pick --abort
create detached head from branch and sequence of commits $ git rebase --onto BRANCH COMMIT1 COMMIT2
history
git hg
write version of file to standard out $ git show COMMIT:FILE $ hg cat -r REV FILE
annotate lines of file with commit info $ git blame [-l] [-s] [-w] [COMMIT] [--] PATH

-l: show full commit id (default is 8 chars)
-s: suppress author name and timestamp
-w: ignore whitespace differences
$ hg annotate -cudln [-r REV] [PATH]

-c: changeset
-u: author
-d: date
-l: line number
-n: local revision number
commits which are ancestors of head/tip $ git log [--parents]

--parents: after commit identifier show parent commit identifiers
$ hg log
commits as graph $ git log --graph $ hg log -G
first parent history $ git log --first-parent
chronological order $ git log --reverse
all commits in repository $ git log [--source] --all

--source: print ref name after each commit
limit commits $ git log [--skip=NUM] (-n NUM|-NUM)

--skip: skip first NUM commits
commits which touched files $ git log [--follow] [--] PATH …

--follow: follow renamed files
--: prevent interpreting PATH as option
$ hg log [-f] PATH …

-f: follow copied and renamed files
commits which touched lines $ git log -L NUM,NUM:PATH
one line commits $ git log --oneline
format string for commit $ git log --pretty=format:FORMAT

format string specifiers:

%H commit hash
%h abbrev. commit hash
%T tree hash
%t abbrev. tree hash
%P parent hash(es)
%p abbrev. parent hash(es)
%s subject
%b body
%an author name
%ae author email
%ad author date
%cn committer name
%ce committer email
%cd committer date
%n newline
%% percent sign
show commit diffs $ git log -p
show commits touching lines matching regular expression $ git log -p [--pickaxe-all] -G REGEX $ hg grep PATTERN
grep commit messages $ git log --grep=REGEX $ hg log --keyword STR

case insensitive search; also searches user names and names of changed files
show changes to head $ git reflog

The output format is:

  <commit> HEAD@{<num>}: <description>

<num> is the number of changes since HEAD was at <commit>. HEAD@{<num>} can be used as an alias for <commit>.


This outputs reflog info in log style:

$ git log -g
difference between commit and its parent $ git diff [--name-only] COMMIT^ COMMIT [--] [PATH …] $ hg diff -c REV [PATH …]
difference between two commits $ git diff [--name-only] COMMIT1 COMMIT2 [--] [PATH …] $ hg diff -r REV1 -r REV2 [PATH …]
diff options --name-only: list modified file names
--name-status: status (M, A, D, R, …) and modified file names
--stat: histogram of changes by file
--dirstat: histogram of changes by directory
--word-diff: show word-level changes inline
--word-diff-regex=REGEX: set regex used by --word-diff
-W: show entire modified function in context
-R: reverse direction of diff
-w: ignore whitespace differences
--ignore-blank-lines: ignore blank lines
--quiet: no output; exit status 1 if changes, otherwise 0

hg:
--stat: histogram of changes by file
-p: show which function each change is in
-U NUM: show NUM lines of context
--reverse: reverse direction of diff
-w: ignore whitespace differences
-B: ignore blank lines
-X PATTERN: exclude files matching PATTERN
-S: recurse into subrepos
grep commit $ git grep [-i] [-v] [-E|-F|-P] [-h|-H] [-l|-L] [-n] \
    -e STR TREEISH [--] [PATHSPEC …]
start bisection $ git bisect start BAD_COMMIT GOOD_COMMIT
mark bisection commit as good $ git bisect good
mark bisection commit as bad $ git bisect bad
mark bisection commits automatically $ git bisect run SCRIPT [ARG]…

Exit status of 0 indicates a good commit. Exit status of 125 indicates an untestable commit. Any other status in the range 1…127 indicates a bad commit.
show bisection decisions $ git bisect log
terminate bisection $ git bisect reset
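A sketch of git bisect run, assuming git and a POSIX sh are available. The history is synthetic: six commits, where the string bug first appears in file.txt in the fourth. The test script (a hypothetical check.sh written next to the repository) follows the exit status convention in the table: 0 for a good commit, 1 for a bad one.

```shell
#!/bin/sh
set -e
base=$(mktemp -d)
git init -q "$base/repo"
cd "$base/repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Commits 1..6; "bug" is introduced in commit 4.
for n in 1 2 3 4 5 6; do
  if [ "$n" -ge 4 ]; then echo "bug $n" > file.txt; else echo "ok $n" > file.txt; fi
  git add file.txt
  git commit -q -m "commit $n"
done

# Exit 0 (good) unless file.txt contains "bug", in which case "!" turns grep's success into 1 (bad).
cat > "$base/check.sh" <<'EOF'
#!/bin/sh
! grep -q bug file.txt
EOF
chmod +x "$base/check.sh"

git bisect start HEAD 'HEAD~5' > /dev/null   # bad: commit 6, good: commit 1
git bisect run "$base/check.sh" > "$base/bisect.log" 2>&1 || true
first_bad=$(git rev-parse refs/bisect/bad)   # read before reset removes the ref
git bisect reset > /dev/null 2>&1
git log -1 --format=%s "$first_bad"
```

Keeping the script outside the working tree means checking out old commits during the bisection cannot change or remove it.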
push and pull
git hg
pull commits from remote $ git fetch [-f] [-t] [-p] [REPO]

-f: force if update not a fast-forward
-t: copy tags
-p: remove local references gone from remote

Pulls from origin if REPO not specified.

Updates remote tracking branches per refspec of REPO in .git/config.
$ hg pull [-u] [SOURCE]

-u: update working directory to most recent changeset

Pulls all revisions in remote repository not in local repository.

If no SOURCE specified, uses value of default in the [paths] section of .hg/hgrc.
pull commits from remote for branch $ git fetch [-f] [-t] [-p] REPO REFSPEC $ hg pull (-b BRANCH|-B BOOKMARK|-r REV) … [SOURCE]

-b: pull revisions on BRANCH and parents
-B: pull BOOKMARKed revision and parents
-r: pull REV and parents
pull commits from multiple remotes $ git fetch [-f] [-t] [-p] (--all|--multiple REPO …)
show remote commits available for pulling $ hg incoming
pull commits from remote and merge current branch with remote head $ git pull [-f] REPO [REFSPEC]
push commits to remote $ git push [-f] [--prune] [--tags] [REPO] $ hg push [-f] [SOURCE]
push commits to remote for branch $ git push [-f] [-u] REPO [BRANCH] $ hg push [--new-branch] (-b BRANCH|-B BOOKMARK|-r REV) … [SOURCE]
delete remote branches $ git push --delete REPO BRANCH
show commits which have not been pushed $ hg outgoing
move commits by archive $ git bundle $ hg bundle
$ hg unbundle
configuration files
git hg
repository config file .git/config .hg/hgrc
user config file ~/.gitconfig ~/.hgrc
show config options $ git help config $ hg help config
set config value when cloning $ git clone [-c SECTION.KEY=VAL] URL [DIR]
list configuration settings $ git config -l [--global] $ hg showconfig
show specific configuration setting $ git config [--global] --get SECTION.KEY
external repositories
git hg
external repository file .gitmodules .hgsub
external repository file format [submodule "vendor/modules/system"]
  path = vendor/modules/system
  url = ../dm-kohana-core.git
add submodule $ git submodule add URL PATH

Records submodule in .gitmodules. If PATH does not exist, URL is cloned there.
register submodules $ git submodule init

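Copies data from .gitmodules in index to .git/config.
$ git submodule update

Clones missing submodules and checks out, in each one, the commit recorded in the index of the containing repository.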
clone repository and external repositories $ git clone --recursive URL
packaging
git hg
create tarball $ git archive --format=tar TREEISH > repo.tar $ hg archive -t tar ../NAME.tar
create gzipped tarball $ git archive --format=tgz TREEISH > repo.tgz
create zip archive $ git archive --format=zip TREEISH > repo.zip
create a patch $ git format-patch ??? $ hg export ???
apply patch $ git apply ??? $ hg import ???
integrity check and garbage collection
git hg
integrity check $ git fsck $ hg verify
garbage collection $ git gc none

Metasyntactic Variables

git
BRANCH the name of a branch.
CMD the name of a version control command: the first argument of the base command.
COMMIT the HASH for a commit. A commit can be referenced indirectly via a branch or tag name or via commit notation. The symbolic references HEAD or FETCH_HEAD can also be used to reference commits.
DIR a directory on the file system. In some cases it must exist; in others it will be created.
FILE a regular file on the file system. In some cases it must exist; in others it will be created.
HASH a 40 digit hex string used as an identifier for something in the object database.
HEAD the literal string HEAD.
NAME a name for an entity which will be created. Usually there are restrictions on the characters that can be used.
PATH a path on the file system. In some cases it must exist; in others it will be created.
PATHSPEC like a file glob pattern, except that ? and * can match the directory separator: /. Characters special to the shell must be escaped.
REFSPEC [+]SRC_REF:DEST_REF where SRC_REF and DEST_REF are ref paths relative to the .git directory. SRC_REF is on the remote repository in a fetch or a pull and on the local repository in a push.

An asterisk can be used in place of a component of the relative path to match everything in the directory. If the SRC_REF has an asterisk, the DEST_REF must also have one.

A plus sign prefix + is used to indicate that the update should be made even when it is not a fast-forward.

In a push, if SRC_REF is the empty string, then DEST_REF is deleted from the remote repository.

Leading components of SRC_REF or DEST_REF can be omitted if no ambiguity results.
REMOTE the name of a remote.
REPO A REMOTE or a URL.
STASH stash identifier format: stash@{0}, stash@{1}, …
STR a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted.
TREEISH the HASH for a tree, a commit, or a tag. If the HASH is for a commit or a tag the tree in the commit is used.
URL a url for a repository.
hg
BRANCH the name of a branch.
CMD the name of a version control command: the first argument of the base command.
DIR a directory on the file system. In some cases it must exist; in others it will be created.
FILE a regular file on the file system. In some cases it must exist; in others it will be created.
NAME a name for an entity which will be created. Usually there are restrictions on the characters that can be used.
PATH a path on the file system. In some cases it must exist; in others it will be created.
PATTERN a file glob pattern. The metacharacters ?, *, and ** are supported. Characters special to the shell must be escaped.
REV the revision number for a changeset. It can be either the local revision number, which is a small decimal integer, or the 40 hex digit changeset identifier; Mercurial displays the first 12 digits, and any unique prefix is accepted.
SOURCE A URL or a name for a URL in the [paths] section of the .hg/hgrc file
STR a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted.
URL a url for a repository.

Help

show version

list subcommands

get help for subcommand

list topic guides

get help for topic

Local and Remote Repository

When a set of files is edited, we can view the set of files as a sequence of versions or revisions which are ordered in time. A repository is a record of the versions of a set of files; it permits recovering the set of files as they were at any of the recorded points in time.

A working directory (compare the working directory of a process) can contain a copy of one of the versions of the file set; it can also be edited to create a new version of the file set. A commit is the act of recording the new version of the file set in the repository. This is done by comparing the working directory with the previous version, or with an empty file set in the case of an initial commit. The commit can also be regarded as the difference (that is, the output of diff -r) between the previous and the new version. This difference is also called the changeset.

When calculating the changeset, the version control system may ignore files in the working directory which are not tracked files.

A set of files and directories under version control is called a repository.

git

A file or directory under version control has one or more versions. One adds new versions to the repository by making a commit. The set of all files and directories in the repository can also be seen as having versions; these versions are called commits; they consist of at most one version of each file or directory in the repository.

hg

A file or directory under version control has one or more revisions. One adds new revisions to the repository by making a commit. The set of files and directories in the repository can also be seen as having revisions; these revisions are called changesets.

Working Directory

check out version

git:

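Both of these commands make the index and working directory match COMMIT, discarding local changes to tracked files:

$ git checkout -f COMMIT
$ git reset --hard COMMIT

They differ in what happens to the current branch. git checkout -f COMMIT leaves the current branch where it was (HEAD is detached unless COMMIT is a branch name, in which case that branch is checked out), whereas git reset --hard COMMIT moves the current branch to COMMIT.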
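A sketch contrasting the two commands, assuming git is on the PATH; the repository, its two commits, and the identity are throwaway examples. After git checkout -f the branch still points at the newer commit and HEAD is detached; after git reset --hard the branch itself has moved back.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2; do
  echo "$n" > f.txt
  git add f.txt
  git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)
first=$(git rev-parse 'HEAD^')
tip=$(git rev-parse HEAD)

# checkout -f: HEAD is detached at $first; the branch still points at $tip.
git checkout -q -f "$first"
head_after_checkout=$(git symbolic-ref -q HEAD || echo detached)
branch_after_checkout=$(git rev-parse "refs/heads/$branch")

# reset --hard, run while on the branch: the branch itself moves to $first.
git checkout -q "$branch"
git reset -q --hard "$first"
head_after_reset=$(git symbolic-ref --short HEAD)
branch_after_reset=$(git rev-parse "refs/heads/$branch")
echo "checkout: HEAD=$head_after_checkout, $branch=$branch_after_checkout"
echo "reset:    HEAD=$head_after_reset, $branch=$branch_after_reset"
```

After the reset, the second commit is no longer reachable from any branch; it can still be found through git reflog until it is garbage collected.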

list modified files

ignore file

git

man gitignore

A list of file patterns, one per line. The patterns specify files that git status and git add should ignore. Shell glob syntax (i.e. the asterisk: *) can be used.

A .gitignore can be placed in any directory in the repository. The rules in a given .gitignore file apply only to the directory containing it and the directories beneath it.

Lines starting with a pound sign: # are ignored.

A pattern starting with an exclamation point: ! negates the pattern. This can be used to re-include files that were excluded by an earlier pattern in the file that matches a broader set of files.
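A sketch of the pattern rules above, assuming git is on the PATH. It uses git check-ignore -q, which exits 0 when a single path is ignored and 1 when it is not. The file and directory names are illustrative; the example shows a negated pattern re-including one file and a nested .gitignore whose rules apply only below its own directory.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
mkdir -p build sub
touch build/app.log build/keep.log notes.txt sub/notes.txt

# Root .gitignore: ignore every .log file, then re-include one of them.
cat > .gitignore <<'EOF'
# ignore log files ...
*.log
# ... except this one
!build/keep.log
EOF

# A nested .gitignore only applies to sub/ and the directories beneath it.
echo 'notes.txt' > sub/.gitignore

ignore_state() {
  if git check-ignore -q "$1"; then echo ignored; else echo not-ignored; fi
}
app=$(ignore_state build/app.log)
keep=$(ignore_state build/keep.log)
root_notes=$(ignore_state notes.txt)
sub_notes=$(ignore_state sub/notes.txt)
echo "$app $keep $root_notes $sub_notes"
```

Running git check-ignore -v PATH instead prints the .gitignore file, line number, and pattern that decided the outcome, which helps when several files interact.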

hg

Unlike .gitignore, an .hgignore file must be in the root of the working directory.

The format is one Perl regular expression per line. All files which match the regular expression will be ignored.

Comments start with the pound sign: #

It is also possible to use glob syntax:

# regexp to ignore twiddle files:
~$

# glob to ignore compiled python files:
syntax: glob
*.pyc

# additional patterns will use regexp format:
syntax: regexp

Track and Commit

git

Git keeps copies of all versions of files and directories that have been committed, as well as the commits themselves, in the directory .git/objects. All objects are identified by their 40 character SHA-1 checksum, called the hash. The objects in this directory include: blobs, each holding the contents of a file; trees, each corresponding to a file system directory and listing the names and hashes of its entries, which can be blobs (regular files) or trees (subdirectories); and commits, each containing the hash of the top level tree for the commit and the hashes of its parents. There are zero parents for the initial commit and more than one parent for a commit which was created by a merge. Annotated tags are stored as a fourth type of object. Git initially stores a separate, compressed copy of each version of a file, tree, or commit in .git/objects; git gc later consolidates objects into pack files, which can store them as deltas against similar objects.

The git cat-file -p HASH command, though not needed for day-to-day use, provides a way to inspect a git object. It shows the additional information stored in trees and commits which we have not mentioned here.
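A sketch of inspecting the three object types with git cat-file, assuming git is on the PATH; the repository, directory, and file names are illustrative. It uses the COMMIT:PATH and COMMIT^{tree} revision syntax to reach the tree and blob objects of a commit, and shows that a blob's hash depends only on the file contents.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
mkdir src
echo hello > src/hello.txt
git add src/hello.txt
git commit -q -m "add hello"

commit_type=$(git cat-file -t HEAD)                # commit
tree_type=$(git cat-file -t 'HEAD^{tree}')         # tree (top level directory)
subtree_type=$(git cat-file -t HEAD:src)           # tree (src/ directory)
blob_type=$(git cat-file -t HEAD:src/hello.txt)    # blob
blob_content=$(git cat-file -p HEAD:src/hello.txt)

git cat-file -p HEAD            # tree hash, author, committer, message (no parent line: initial commit)
git cat-file -p 'HEAD^{tree}'   # one entry per line: mode, type, hash, name
```

git hash-object src/hello.txt computes the same hash as the committed blob without writing anything, because the blob stores only the contents, not the file name.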

hg

Mercurial uses a storage format called a revlog to store the versions of a file. Most revlogs are kept in .hg/store/data. A revlog usually consists of two files: one with an .i suffix and another with a .d suffix. If the file is small and has little or no history, the revlog might consist of only a .i file. A revlog which tracks the history of a file is called a filelog. When the file is first committed, it is written to the filelog. Each time a commit is made which alters it, a delta describing the change is appended to the filelog. Thus, to reconstruct the current version of a file, all the deltas must be applied in order to the original version of the file. As a performance optimization, Mercurial sometimes appends the full version of the file to the filelog instead of a delta. Thus, when reconstructing the current version, one need only apply the deltas appended since the most recent full version.

Mercurial stores a manifest for each revision of the repository. A revision of the repository is called a changeset. The manifest is a list of the pathnames, relative to the repository root, of all files in the changeset, together with the hash identifying each file's revision in its filelog. Rather than store the manifests in separate files, all the manifests for the repository are stored in a single revlog in .hg/store. Each time a new changeset is added to a repository by a push, pull, or commit command, it is assigned a local revision number, which is its position in the order in which changesets were added to the local repository. If a changeset was pulled from a different repository, its local revision number might differ from its number in that repository.

Information about changesets is also stored in the changelog, which is another revlog. The changelog has a pointer to the manifest revision, pointers to the parents of the changeset, and information about the committer.

git

Git has four types of objects: commits, trees, blobs, and annotated tags. Each is assigned a unique hash ID which is a 40 digit hex string. The identifier is called the hash, SHA1, object name, or object identifier with no difference in meaning. When the underlying object is a commit or tree it is also called a tree-ish.

Commit hashes are the hashes the user most commonly sees and needs to reference. Only as many digits as are necessary to uniquely identify an object in the object database need to be provided to a git command; usually the first 6 or 7 are sufficient.

HEAD is a special name which refers to the most recent commit of the current branch. It is stored in .git/HEAD, which normally names the current branch rather than containing a hash. The previous commit is HEAD^ and the commit before that is HEAD^^. There is also numerical notation: HEAD~4 is the commit 4 generations before HEAD, following first parents. If HEAD is the result of a merge, then its parents can be referenced with HEAD^1 and HEAD^2.
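A sketch of the commit notation, assuming git is on the PATH. It builds a throwaway history of three commits plus a side branch (the branch name side and the messages are illustrative), merges the side branch with a real merge commit, and resolves the circumflex, tilde, and abbreviated-hash forms with git rev-parse.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Linear history c1 <- c2 <- c3 on the default branch.
for n in 1 2 3; do
  echo "$n" > f.txt
  git add f.txt
  git commit -q -m "commit $n"
done
main=$(git symbolic-ref --short HEAD)

# A side branch from c3 with one commit, merged back with a merge commit.
git checkout -q -b side
echo side > side.txt
git add side.txt
git commit -q -m "side commit"
git checkout -q "$main"
git merge -q --no-ff --no-edit side

parent1=$(git rev-parse 'HEAD^1')    # c3: first parent (the branch merged into)
parent2=$(git rev-parse 'HEAD^2')    # "side commit": second parent
tilde2=$(git rev-parse 'HEAD~2')     # follows first parents back two steps: c2
carets=$(git rev-parse 'HEAD^^')     # same commit as HEAD~2
short=$(git rev-parse --short HEAD)  # abbreviated hash that is still unique
echo "HEAD^1=$parent1 HEAD^2=$parent2 HEAD~2=$tilde2 short=$short"
```

The quotes around HEAD^1 and similar arguments are not required by sh, but they keep shells that treat ^ specially (e.g. zsh with extended globbing) from interpreting it.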

hg

In Mercurial, every commit is assigned two identifiers: a local revision number and a universal changeset identifier. The local revision number is a small integer that is meaningful only in the local repository. The first local revision number issued is zero, and it increments with each changeset added to the repository. The changeset identifier is a 40 digit hex hash which is unique across all repositories; Mercurial usually displays only its first twelve digits.

The -r option is used to pass a Mercurial revision identifier to a command. The argument can be a local revision number or a changeset identifier (or a unique prefix of one). Two identifiers separated by a colon, e.g. -r 3:7, specify a range of revisions.

move file in working directory and next commit

It is desirable for a version control system to track file name changes. Otherwise commands like blame and log, when used on a single path, will not show activity before the name change. If the version control system is aware of a name change, it can also correctly merge a file which was renamed on one branch and edited on the other.

git

Although Git provides a git mv subcommand, it does not record name changes. Instead, it infers that a name change occurred when a commit removes one file and adds another with similar contents. Hence, even if the user renames a file with the Unix mv command and then runs git rm and git add, Git can still show the file's earlier history, e.g. with git log --follow.
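A sketch of rename inference, assuming a git version recent enough that rename detection is on by default (2.9 or later). The file is renamed with plain mv followed by git rm and git add; the file names and contents are illustrative.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Rename without git mv: Unix mv, then record the deletion and the addition.
mv old.txt new.txt
git rm -q old.txt
git add new.txt
git commit -q -m "rename old.txt to new.txt"

# Git infers the rename from content similarity (R100 means 100% similar).
git show --name-status --format= HEAD
# --follow continues the history of new.txt past the rename.
history=$(git log --follow --format=%s -- new.txt)
echo "$history"
```

Because the rename is inferred, a commit that both renames a file and rewrites most of it may be reported as a deletion plus an addition; keeping renames and large edits in separate commits avoids this.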

hg

Mercurial keeps track of the name a file had in each revision of a filelog. The hg rename subcommand must be used to preserve history.

move file

commit changes in staging area

In the case of Git, the staging area is the index.

In the case of Mercurial, the staging area is the files in the working directory which are tracked.

commit identifiers

mercurial:

null refers to an empty revision which is the parent of revision 0.

Branch and Merge

git

Git has a low level feature called a ref which it uses to implement branches and tags. A ref is a file in .git/refs which contains the hash of a commit (or, for an annotated tag, of a tag object); refs may also be consolidated into the single file .git/packed-refs. Branches are in .git/refs/heads and tags are in .git/refs/tags. Whenever a commit is made, the value in .git/refs/heads/BRANCH is updated where BRANCH is the current branch. The values in .git/refs/tags/TAG do not change unless a tag is replaced with git tag -f.

The name of the branch which is currently checked out is stored in .git/HEAD as a line of the form ref: refs/heads/NAME. When HEAD is detached, .git/HEAD contains a commit hash instead.
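A sketch of how branch and tag refs behave, assuming git is on the PATH. It reads refs through git symbolic-ref and git rev-parse rather than opening the files directly, since refs may be packed into .git/packed-refs. The tag name v1 and the commit messages are illustrative.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo a > a.txt
git add a.txt
git commit -q -m first
git tag v1                                    # lightweight tag at "first"

branch=$(git symbolic-ref --short HEAD)
head_ref=$(git symbolic-ref HEAD)             # refs/heads/<branch>: what .git/HEAD points at
tag_before=$(git rev-parse refs/tags/v1)

echo b > b.txt
git add b.txt
git commit -q -m second                       # advances refs/heads/<branch> only

branch_after=$(git rev-parse "refs/heads/$branch")
tag_after=$(git rev-parse refs/tags/v1)
echo "$head_ref: $branch_after; v1: $tag_after"
```

The commit moved the branch ref but left the tag ref pointing at the first commit.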

Git also stores remote branches in .git/refs/remotes/REPO. The git branch -r command can be used to list them. Remote branches have names of the form REPO/BRANCH. A remote branch can have a tracking branch, which is a local branch, usually also named BRANCH, configured to use REPO/BRANCH as its upstream. git clone creates a tracking branch only for the branch it checks out; for other branches, git checkout BRANCH creates one when exactly one remote has a matching REPO/BRANCH. When a remote is added with git remote add -t BRANCH NAME URL, only BRANCH is fetched from it. git fetch will only update remote branches. git pull runs git fetch and then merges the upstream of the current branch into the current branch.

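The default branch is called master unless the init.defaultBranch configuration setting names another branch. It is created by git init. If no branch is specified with -b, git clone checks out the branch that the remote repository's HEAD points to.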

Commits have zero or more parent commits. git commit creates a commit with one parent, except in the case of the initial commit. git merge creates a commit with two or more parent commits. If the commit has three or more parents, the merge is called an octopus merge.

staging numbers:

To perform a merge, Git puts the items of the tree in the common ancestor into the staging area with staging number 1, the items of the current branch's tree with staging number 2, and the items of the other branch's tree with staging number 3. Paths which merge cleanly are collapsed back to staging number 0; paths with conflicts keep their stage 1, 2, and 3 entries until the conflict is resolved and the file is staged with git add.
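A sketch that makes the staging numbers visible, assuming git is on the PATH. Both branches change the same line of one file (the branch name other and the contents are illustrative), so the merge stops with a conflict; git ls-files -u then lists the unmerged entries, and the :N:PATH syntax reads each stage.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo base > f.txt
git add f.txt
git commit -q -m base
main=$(git symbolic-ref --short HEAD)

git checkout -q -b other
echo theirs > f.txt
git commit -q -am theirs

git checkout -q "$main"
echo ours > f.txt
git commit -q -am ours

# Both sides changed the same line, so the merge stops with a conflict.
git merge other > /dev/null 2>&1 || true

git ls-files -u -- f.txt        # mode, hash, staging number, path
stage1=$(git show :1:f.txt)     # common ancestor
stage2=$(git show :2:f.txt)     # current branch
stage3=$(git show :3:f.txt)     # branch being merged
```

Once the file is edited and staged with git add, the three entries are replaced by a single stage 0 entry.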

fast-forward merges do not create commits:

Suppose that bar is a branch of foo. If commits have subsequently been made to foo but not to bar, then running the following when bar is the current branch will perform a fast-forward:

git merge foo

In a fast-forward no merge commit is created. Instead the head of bar is simply moved to point to the same commit as the head of foo.
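A sketch of the fast-forward case described above, assuming git is on the PATH and that merge.ff has not been set to false in the user's configuration. It uses the branch names foo and bar from the text; the file and messages are illustrative.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 1 > f.txt
git add f.txt
git commit -q -m one
git branch -m foo                 # name the current branch foo
git branch bar                    # bar is a branch of foo, at the same commit

echo 2 > f.txt
git commit -q -am two             # foo moves ahead; bar does not

git checkout -q bar
git merge -q foo                  # fast-forward: bar's head moves to foo's head
git log --oneline
```

The history still contains only the two original commits; passing --no-ff to git merge would force a merge commit instead.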

hg

A Mercurial branch is a name which is stored in a changeset. When a commit is made, the new changeset inherits the branch name of the previous changeset, unless a different name was specified before the commit with hg branch. To switch to a new branch one must make a commit.

Mercurial branches differ from Git branches in that:

  • every commit belongs to a single branch
  • a branch can have multiple heads

Mercurial tags are names for changesets. They are stored in the .hgtags file at the repository root. Creating a tag requires making a commit.

Mercurial does not support octopus merges, so changesets have at most two parents. The changeset that records the result of hg merge takes the branch of the working directory, that is, the branch of its first parent.

Changesets committed without an explicit branch name belong to the branch named default.

bookmarks:

Mercurial bookmarks work like Git branches, with the exception that Mercurial does not have the equivalent of Git tracking branches.

list "dedicated commit" branches

"dedicated commit" branches is the term used in this sheet to distinguish Mercurial-style branches from Git-style branches. Mercurial bookmarks are equivalent to Git-style branches.

In Git, branches are names which contain the hash of a commit. This commit is the head of the branch, and because each commit knows its parents, the entire history of the branch can be constructed. Git commits can belong to multiple branches, or they can belong to none at all. If a Git commit is not reachable from any branch or tag, it is at risk of being garbage collected.
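The garbage-collection risk is a reachability question. A minimal Python sketch, assuming a hypothetical parent-link dict and a dict of refs (branch and tag names mapped to commit ids); real Git also keeps objects referenced from the reflog until those reflog entries expire.

```python
def reachable(parents, refs):
    """Commits reachable from any ref by following parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

parents = {"C1": [], "C2": ["C1"], "C3": ["C2"], "D1": ["C1"]}
refs = {"refs/heads/master": "C3", "refs/tags/v1": "C2"}
# D1 was committed on a branch that has since been deleted.
garbage = set(parents) - reachable(parents, refs)
print(garbage)  # {'D1'}
```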

In Mercurial, each changeset has a branch name associated with it. Thus, each changeset belongs to exactly one branch. Also, a Mercurial branch can have multiple heads, whereas a Git branch always has a single head.
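Because the branch name lives in each changeset, the heads of a named branch can be found from the changesets alone: a head is a changeset on the branch with no child on that same branch. A Python sketch, assuming a hypothetical dict from changeset number to a (branch name, parent list) pair:

```python
def branch_heads(changesets, branch):
    """Changesets on branch with no children on that same branch."""
    on_branch = {c for c, (b, _) in changesets.items() if b == branch}
    has_child = set()
    for c in on_branch:
        _, parents = changesets[c]
        has_child.update(p for p in parents if p in on_branch)
    return on_branch - has_child

changesets = {
    0: ("default", []),
    1: ("default", [0]),
    2: ("default", [0]),  # a second line of development: default has two heads
    3: ("stable", [1]),   # a child on another branch does not end the head at 1
}
print(sorted(branch_heads(changesets, "default")))  # [1, 2]
```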

Mercurial does not provide a mechanism for renaming or deleting branches. The recommended way to get rid of unwanted branches is to rename the repository and then clone only the revisions worth keeping back to the original name. REV is a head to keep, and -r may be repeated:

$ hg clone -r REV RENAMED_REPO ORIGINAL_REPO

History

Push and Pull

git

The basic command for getting changes from a remote repository origin is:

$ git fetch

Which branches are fetched is controlled by the fetch key in the remote section of .git/config. If the local repository was created by git clone, here is a likely value:

[remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*

In this case, git fetch origin connects to the remote repository, copies into the local object database any objects reachable from the remote branches that it lacks, and then updates the corresponding remote branches under refs/remotes/origin. It also records the fetched branch heads in FETCH_HEAD. The + means that the branches under refs/remotes/origin are updated even if the change is not a fast-forward.
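The refspec is a pattern rewrite from remote ref names to local ref names. The Python sketch below applies a refspec with at most one * on each side to a hypothetical dict of the remote's refs; the function name is made up for illustration, and this is not how Git parses refspecs internally.

```python
def apply_refspec(refspec, remote_refs):
    """Return (force, {local ref: commit}) for a simple refspec."""
    force = refspec.startswith("+")  # + allows non-fast-forward updates
    src, dst = refspec.lstrip("+").split(":")
    mapping = {}
    if "*" not in src:
        # Exact refspec such as refs/heads/master:refs/heads/master.
        if src in remote_refs:
            mapping[dst] = remote_refs[src]
        return force, mapping
    src_prefix, src_suffix = src.split("*")
    dst_prefix, dst_suffix = dst.split("*")
    for name, commit in remote_refs.items():
        if (name.startswith(src_prefix) and name.endswith(src_suffix)
                and len(name) >= len(src_prefix) + len(src_suffix)):
            # The part matched by * is copied into the destination pattern.
            middle = name[len(src_prefix):len(name) - len(src_suffix)]
            mapping[dst_prefix + middle + dst_suffix] = commit
    return force, mapping

remote = {"refs/heads/master": "a1b2", "refs/heads/dev": "c3d4", "refs/tags/v1": "e5f6"}
force, local = apply_refspec("+refs/heads/*:refs/remotes/origin/*", remote)
print(force, sorted(local))
# True ['refs/remotes/origin/dev', 'refs/remotes/origin/master']
```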

The basic command for sending changes back to the remote repository origin is:

$ git push

Which branches are pushed is controlled by the push key in the remote section of .git/config. Here is an example entry which pushes commits on the master branch, and fails if the commits are not fast-forwards:

[remote "origin"]
        push = refs/heads/master:refs/heads/master

A git pull is a git fetch followed by a git merge FETCH_HEAD. git fetch writes the head of the current branch's remote branch to FETCH_HEAD, so that is the commit which gets merged.

hg

hg pull pulls all changesets from the remote repository unless branches are listed explicitly with the -b flag. hg pull -u is equivalent to hg pull followed by hg update. Pulling can create local branches with multiple heads, in which case an hg update will fail. An hg merge is used to merge the two heads, or an hg commit --close-branch is used to mark one of them as closed.

hg push pushes changesets for all local branches that are also remote branches unless branches are listed explicitly with the -b flag. A push which would create a branch with multiple heads will fail unless the -f flag is used. The --new-branch flag must be used to create a new branch.

Configuration Files

External Repositories

Packaging

Integrity Check and Garbage Collection

integrity check

garbage collection

Version Control

svn | cvs | rcs | sccs

distributed client-server local
$ git CMD $ hg CMD $ svn CMD $ cvs CMD $ rcs CMD
online documentation help help help $ man cvs $ man rcs
repository
create new repository init init init $ mkdir RCS
get local copy of repository from server or existing repository clone clone checkout/co checkout/co
show remote repositories remote -v show paths
add remote repository remote add
working directory
update working directory to most recent version of a branch checkout update/up update/up update co
lock a file lock co -l
unlock a file unlock co -u
make working directory match the most recent commit reset revert revert
show files in working directory which don't match the most recent commit status status/st status/st status
show difference between file in working directory and most recent commit diff diff diff/di diff diff
store uncommitted working directory changes in a temporary location stash shelve
tracking and committing
put file under version control add add add add ci -i
change name of a file under version control mv rename/mv move/mv
mark file as not present in the next commit rm remove/rm delete/rm remove
create new commit commit commit/ci commit/ci commit/ci ci
create commit which undoes the result of a previous commit revert backout
branching and merging
create branch branch branch(es) copy/cp tag -b ci -r
merge branches merge merge merge
move commits on a branch to the end of another branch rebase rebase
mark file with merge conflicts as resolved add resolve resolve
history
annotate lines of source code with commit info blame annotate blame/ann annotate
show commit information for current branch in reverse chronological order log log -b tip log log log
show difference between two commits diff diff diff -rREV1 -rREV2
find commit which introduced a change log -S grep
write contents of a file version to standard out show cat cat checkout -p co
give name to a commit tag tag(s) copy/cp tag $ rcs -nTAG:REV
pulling and pushing
show commits available to be pulled or update tracking branches fetch incoming/in
get commits from a remote repository pull pull
push commits to a remote repository push push
configuration
add user information config

sccs (1972)

In his 1975 paper Rochkind describes SCCS as a "radical departure from conventional methods for controlling source code". SCCS was initially implemented in 1972 on the IBM 370. The implementation language was SNOBOL. Rochkind was an employee of Bell Laboratories and SCCS was soon ported to Unix where it became a cornerstone of the "Programmer's Workbench", a suite of software distributed with early Unix.

The radical departure of SCCS appears to be the decision to store every version of each file under source control. This is done in a space efficient manner by means of deltas: the original file is stored with a delta for each change. To get the most recent version of the file all of the deltas must be applied to the original file. Also stored with each delta is the name of the user who made the change, the date and time of the change, and a user supplied comment explaining the change.

SCCS introduces a file format so that the original file, the deltas, and the meta-information can all be stored in a single history file. If the original file was foo.c, a common early convention was for the history file to be named s.foo.c. In the original Unix implementation the SCCS commands were standalone Unix commands. Starting with the version of SCCS which Allman wrote for BSD Unix in 1980 the SCCS commands became arguments or subcommands to a sccs executable.

Here is a sample SCCS session. The file foo.txt is put under source control. It is then checked out for editing, edited, and the change committed. The two revisions are compared, and finally a non-editable copy of the most recent version is checked out.

$ echo "foo" > foo.txt
$ sccs admin -ifoo.txt s.foo.txt
$ rm foo.txt
$ sccs get -e s.foo.txt
$ vi foo.txt
$ sccs delta s.foo.txt
$ sccs sccsdiff -r1.1 -r1.2 s.foo.txt
$ sccs get -p s.foo.txt > foo.txt

The SCCS history file is a text file in which control lines begin with the Ctrl-A (ASCII 1) character. The file is divided into headers, which contain the meta-information, and the body, which contains the original file and the deltas. The original file is given revision number 1, and the number is incremented with each change.

The body consists of the original file interspersed with nested insert blocks and delete blocks. The format for an insert block is

^AI REV
added line one
added line two
...
^AE REV

where REV is the revision number for which the lines were added. Similarly the format for a delete block is

^AD REV
deleted line one
deleted line two
...
^AE REV

When extracting a version of the file, the desired version is compared with each block. The lines of an insert block are skipped if its number is higher than the desired version, and the lines of a delete block are skipped if its number is lower than or equal to the desired version.
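The extraction rule can be sketched in Python. This assumes a simplified body held as a list of lines, with control lines written as ^AI N, ^AD N and ^AE N (^A being the character \x01), and a linear history; real SCCS decides which deltas apply from the delta table in the header rather than by a plain numeric comparison.

```python
def sccs_get(body, version):
    """Extract one version from a simplified SCCS body (list of lines)."""
    open_blocks = []  # stack of (kind, number) for the enclosing I/D blocks
    out = []
    for line in body:
        if line.startswith("\x01"):
            kind, number = line[1], int(line[2:])
            if kind in ("I", "D"):
                open_blocks.append((kind, number))
            elif kind == "E":
                # Close the innermost open block with this number.
                for i in range(len(open_blocks) - 1, -1, -1):
                    if open_blocks[i][1] == number:
                        del open_blocks[i]
                        break
            continue
        # A text line is kept only if every enclosing block keeps it.
        if all(number <= version if kind == "I" else number > version
               for kind, number in open_blocks):
            out.append(line)
    return out

body = [
    "\x01I 1",
    "foo",
    "\x01D 2",
    "old line",   # added in 1, deleted in 2
    "\x01E 2",
    "\x01I 2",
    "new line",   # added in 2
    "\x01E 2",
    "\x01E 1",
]
print(sccs_get(body, 1))  # ['foo', 'old line']
print(sccs_get(body, 2))  # ['foo', 'new line']
```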

rcs (1982)

In RCS, the history file is always identified with a ,v suffix; the history file for foo.txt is foo.txt,v. Because this convention is used consistently, RCS commands can take the original file as an argument instead of the history file, which SCCS commands require.

One can keep the ,v files in a separate directory. RCS has built-in support for using a subdirectory named RCS in the same directory as the source. When this convention is used, it is not necessary to specify both the ,v file and the source file when using rcs ci and rcs co. If the source code tree has subdirectories, each subdirectory should contain an RCS subdirectory.

RCS supports multiline commit messages and it adds the log command for getting all the commit messages for a file.

example session

Here is an example work session using RCS. It is equivalent to the SCCS work session in the previous section.

$ echo "foo" > foo.txt
$ rcs ci -i foo.txt
$ rcs co -l foo.txt
$ vi foo.txt
$ rcs ci foo.txt
$ rcs co foo.txt

make has a built-in rule for creating a file from its ,v file. The file will be checked out as read-only.

rcs file format

An RCS history file has four sections:

  • head
  • deltas
  • description
  • deltatexts

The head contains the revision number of the current version. If any of the revision numbers have been assigned symbolic names (i.e., tags), they are listed here. If there

There is a delta section for each revision. It contains the time the revision was added to the history file and the author.

The description is a string describing the file. Strings are delimited by at signs (@). An at sign inside a string is escaped by doubling it.

There is a deltatext section for each revision. It contains a log, which is the commit message, and the text, which is either the full text of the revision or an ed-style edit script describing how to generate the revision from another revision. Both the log and the text are @-delimited strings with @ escaped by doubling.
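The quoting rule for these strings is small enough to show in full. A Python sketch; the function names are hypothetical:

```python
def rcs_quote(text):
    """Encode text as an RCS string: wrap it in @ and double every @ inside."""
    return "@" + text.replace("@", "@@") + "@"

def rcs_unquote(field):
    """Decode an RCS string produced by rcs_quote."""
    if len(field) < 2 or field[0] != "@" or field[-1] != "@":
        raise ValueError("not an RCS string")
    return field[1:-1].replace("@@", "@")

print(rcs_quote("mail joe@example.com"))  # @mail joe@@example.com@
```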

Here is an example of a text which adds two lines after line 6:

@a6 2
added line one
added line two
@

Here is an example of a text which deletes two lines starting at line 6:

@d6 2
@
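Applying such a text can be sketched in Python. The sketch assumes the edit script is a list of lines without the surrounding @ delimiters and that, as in diff -n output, every line number refers to the original text, so an offset tracks how earlier commands shift later positions.

```python
def apply_rcs_edit(lines, script):
    """Apply an RCS-style edit script of a/d commands to a list of lines."""
    result = list(lines)
    offset = 0  # net number of lines inserted minus deleted so far
    i = 0
    while i < len(script):
        op = script[i][0]
        start, count = (int(n) for n in script[i][1:].split())
        if op == "d":
            # Delete count lines starting at original line start.
            pos = start - 1 + offset
            del result[pos:pos + count]
            offset -= count
            i += 1
        elif op == "a":
            # Insert the next count script lines after original line start.
            pos = start + offset
            result[pos:pos] = script[i + 1:i + 1 + count]
            offset += count
            i += 1 + count
        else:
            raise ValueError("unknown command: " + script[i])
    return result

original = ["line %d" % n for n in range(1, 9)]
added = apply_rcs_edit(original, ["a6 2", "added line one", "added line two"])
print(added[5:9])
# ['line 6', 'added line one', 'added line two', 'line 7']
deleted = apply_rcs_edit(original, ["d6 2"])
print(deleted[4:])  # ['line 5', 'line 8']
```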

revision numbers and branching

RCS revision numbers consist of an even number 2n of positive integers separated by periods, where n is itself positive. By default the revision number given to a file when it is first placed under version control is 1.1.

RCS revision numbers with 2 integers refer to revisions on the trunk. RCS revision numbers with 4 integers refer to branches off the trunk. The first two integers indicate the trunk revision that is the root of the branch. RCS revision numbers with 6 integers refer to branches off of a branch of the trunk.

When a commit is made, by default the last integer of the revision is incremented.

When a branch is created off of a revision that does not have a branch, the revision number of the new revision is created from the revision number of the root of the branch by appending .1.1 to it. If the root revision already has a branch off of it, the new branch revision number will have a .2.1 appended to it.
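The numbering rules above can be sketched as two small Python functions. The existing_branches argument is a simplification introduced here; RCS itself works out the next free branch number from the history file.

```python
def next_revision(rev):
    """Number for the next commit on the same line: increment the last integer."""
    parts = rev.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)

def first_branch_revision(root, existing_branches):
    """First revision of a new branch off root.

    existing_branches: how many branches already start at root (hypothetical input).
    """
    return "%s.%d.1" % (root, existing_branches + 1)

print(next_revision("1.1"))             # 1.2
print(first_branch_revision("1.2", 0))  # 1.2.1.1
print(first_branch_revision("1.2", 1))  # 1.2.2.1
print(next_revision("1.2.2.1"))         # 1.2.2.2
```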

cvs (1990)

CVS uses RCS to manage the history of each file under version control.

How to set up a CVS repository:

$ mkdir cvsroot
$ export CVSROOT=/PATH/TO/cvsroot
$ cvs init

For CVS commands to work, either the CVSROOT environment variable must be set or the location of the repository root must be passed to the command with each invocation using the -d option.

It is possible to set up a password server (pserver) on the repository host which listens on port 2401. By default it authenticates clients against user accounts on the host, so the client must have an account there. To use the server, the client sets the CVSROOT environment variable to something like this:

$ export CVSROOT=:pserver:foo.com:/PATH/TO/cvsroot

How to create a project in the repository:

$ mkdir foo
$ cd foo
$ touch README
$ cvs import foo FOO_CORP V1

The 2nd and 3rd arguments to cvs import are required; they are the vendor tag and the release tag.

Here is an example of how to check out the foo project but name the working directory bar:

$ cvs checkout -d bar foo

Files only need to be added once, as in Mercurial and unlike Git:

$ cd bar
$ vim Makefile
$ cvs status
$ cvs diff
$ cvs add Makefile
$ cvs commit -m 'adding a Makefile'
$ cvs log

Branching is a three step process: (1) tag the commit you are branching from, (2) create the branch, and (3) update the working directory to the branch.

$ cvs tag WUMPUS_ROOT
$ cvs tag -r WUMPUS_ROOT -b WUMPUS
$ cvs update -r WUMPUS

How to merge

svn (2000)

How to set up an SVN server:

$ svnadmin create /PATH/TO/svn
$ vim /PATH/TO/svn/conf/svnserve.conf
$ vim /PATH/TO/svn/conf/passwd
$ mkdir /tmp/empty
$ svn import /tmp/empty file:///PATH/TO/svn/NAME
$ svnserve -d  -r /PATH/TO/svn --log-file /PATH/TO/svn/svnserve.log
$ svn co svn://localhost/NAME

When editing svn/conf/svnserve.conf, add these lines to the [general] section:

anon-access = none
auth-access = write
password-db = passwd

In svn/conf/passwd, create a username and password:

joe = passwerd123

To have svnserve start automatically at boot on an Ubuntu server, put this script in /etc/init.d. Change the value of DAEMON_ARGS from -d -r /usr/local/svn/repos to -d -r /PATH/TO/svn --log-file /PATH/TO/svnserve.log.

Archive and Patch Tools

diff | cpio | diff3 | ar | tar | patch | zip | diffstat | jar | rsync | colordiff

diff (1974)

To implement an efficient version control system one needs to find a minimal delta or difference between two similar text files. The problem led to the development of the Unix diff utility. Regarding a file as a sequence of lines, the problem can be treated as an example of the longest common subsequence problem. The standard solution to this problem has O(nm) performance in both time and space, where n and m are the lengths of the two files. To facilitate quick comparison of lines, each line is replaced with a hash code. When implementing diff McIlroy developed an algorithm that was more efficient than the standard solution in most cases.
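The textbook O(nm) solution can be sketched in Python. This is the standard dynamic-programming algorithm, not McIlroy's; it compares lines as strings rather than as hash codes, and it only marks each line as common, removed (<) or added (>).

```python
def lcs_diff(a, b):
    """Line diff of lists a and b via the longest-common-subsequence table."""
    n, m = len(a), len(b)
    # L[i][j] is the length of the LCS of a[i:] and b[j:].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk the table, emitting common lines and the differences between them.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append("  " + a[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            out.append("< " + a[i])
            i += 1
        else:
            out.append("> " + b[j])
            j += 1
    out += ["< " + x for x in a[i:]]
    out += ["> " + x for x in b[j:]]
    return out

print(lcs_diff(["a", "b", "c"], ["a", "c", "d"]))
# ['  a', '< b', '  c', '> d']
```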

The standard diff notation prefixes lines with < and > to indicate whether the line originated in the first or second file. It also uses the letters a, c, and d to indicate lines being added, changed, or deleted:

$ echo "foo" > foo.txt

$ echo "bar" > bar.txt

$ diff foo.txt bar.txt 
1c1
< foo
---
> bar

$ diff foo.txt /dev/null
1d0
< foo

$ diff /dev/null foo.txt 
0a1
> foo

These letters used in diff notation are also ed commands. In fact, diff -e will output an ed script which can be used to convert the first file into the second:

$ diff -e foo.txt bar.txt > diff.ed

$ ( cat diff.ed ; echo "w" ) | ed foo.txt

The version of diff released with BSD 2.8 in 1981 added the -c option to show three lines of context around each change. This is called the context format.

The BSD 2.8 diff also added an -r option to perform a recursive diff on directories.

In 1990 the -u option was added, which gives a diff in unified format. In the context format, if a line is changed, the context is repeated: once around the old version of the line and once around the new. The unified format puts both versions of the line in the same context, reducing the size of the diff file.

The -C NUM and -U NUM options are like the -c and -u options, except that they show NUM lines of context.
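Python's standard difflib module produces both formats, so it can be used to compare them on the foo.txt and bar.txt files from earlier. In this sketch the file names are only labels and timestamps are omitted; the n parameter plays the role of NUM in -C NUM and -U NUM.

```python
import difflib

old = ["foo\n"]
new = ["bar\n"]

unified = list(difflib.unified_diff(old, new, "foo.txt", "bar.txt", n=3))
context = list(difflib.context_diff(old, new, "foo.txt", "bar.txt", n=3))

print("".join(unified), end="")
# --- foo.txt
# +++ bar.txt
# @@ -1 +1 @@
# -foo
# +bar
print("".join(context), end="")
```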

normal format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff /etc/passwd /tmp/passwd
12c12
< root:*:0:0:System Administrator:/var/root:/bin/sh
---
> ROOT:*:0:0:System Administrator:/var/root:/bin/sh

ed script format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff -e /etc/passwd /tmp/passwd
12c
ROOT:*:0:0:System Administrator:/var/root:/bin/sh
.

context format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff -c /etc/passwd /tmp/passwd
*** /etc/passwd    2013-10-24 17:38:39.000000000 -0700
--- /tmp/passwd    2014-04-26 12:57:57.000000000 -0700
***************
*** 9,15 ****
  # Open Directory.
  ##
  nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
! root:*:0:0:System Administrator:/var/root:/bin/sh
  daemon:*:1:1:System Services:/var/root:/usr/bin/false
  _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
  _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
--- 9,15 ----
  # Open Directory.
  ##
  nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
! ROOT:*:0:0:System Administrator:/var/root:/bin/sh
  daemon:*:1:1:System Services:/var/root:/usr/bin/false
  _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
  _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

unified format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff -u /etc/passwd /tmp/passwd
--- /etc/passwd    2013-10-24 17:38:39.000000000 -0700
+++ /tmp/passwd    2014-04-26 12:57:57.000000000 -0700
@@ -9,7 +9,7 @@
 # Open Directory.
 ##
 nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
-root:*:0:0:System Administrator:/var/root:/bin/sh
+ROOT:*:0:0:System Administrator:/var/root:/bin/sh
 daemon:*:1:1:System Services:/var/root:/usr/bin/false
 _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
 _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

recursive format:

$ mkdir /tmp/a /tmp/b

$ cp /etc/passwd /tmp/a

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/b/passwd

$ diff -r /tmp/a /tmp/b
diff -r /tmp/a/passwd /tmp/b/passwd
12c12
< root:*:0:0:System Administrator:/var/root:/bin/sh
---
> ROOT:*:0:0:System Administrator:/var/root:/bin/sh

cpio (1977)

An old Unix archiving tool, unfamiliar to most people, which is roughly equivalent to tar. The suffix .cpio is often used for cpio archive files.

The format is used by RPM packages, though RPM 5.0 and later also support the xar format. The Linux kernel since version 2.6 has a cpio archive called initramfs which it uses during the boot process. cpio is also used by the Mac OS X .pkg format.

The cpio file format is similar to the tar file format in that for each file which is added to an archive, a header and the file contents are appended to the archive file. In the case of cpio the header is smaller (76 bytes vs 512 bytes). This is in part because the header only contains the file name length; the actual file name is appended to the archive file between the header and the file contents. By contrast the tar format stores the name in fixed length fields, putting a limit on the possible path length. Another difference is that the cpio format lacks a checksum.

header format
offset length field description
0 6 c_magic The identifying value "070707"
6 6 c_dev
12 6 c_ino c_dev and c_ino together must be unique for each file in the archive
18 6 c_mode
24 6 c_uid
30 6 c_gid
36 6 c_nlink number of links to the file in the archive; can be incorrect if the -a flag was used to append files
42 6 c_rdev a place for implementations to store character or block special file information
48 11 c_mtime
59 6 c_namesize
65 11 c_filesize
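The fixed-width fields above hold octal numbers written as ASCII digits (GNU cpio calls this format odc). A Python sketch that writes and reads one header; the helper names are hypothetical, and the name size counts the trailing NUL byte.

```python
FIELDS = [  # (name, width) in header order; everything after the magic is octal
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
]

def pack_header(name, **values):
    """Return a 76-byte header followed by the NUL-terminated file name."""
    values["c_namesize"] = len(name) + 1  # the size includes the NUL byte
    header = "070707" + "".join(
        "%0*o" % (width, values.get(field, 0)) for field, width in FIELDS[1:])
    if len(header) != 76:
        raise ValueError("a field value does not fit in its width")
    return header.encode("ascii") + name.encode("ascii") + b"\0"

def parse_header(data):
    """Return (fields, name, offset where the file contents start)."""
    if data[:6] != b"070707":
        raise ValueError("not a 070707 cpio header")
    fields, pos = {}, 6
    for field, width in FIELDS[1:]:
        fields[field] = int(data[pos:pos + width], 8)
        pos += width
    namesize = fields["c_namesize"]
    name = data[76:76 + namesize - 1].decode("ascii")  # drop the NUL
    return fields, name, 76 + namesize

raw = pack_header("foo.txt", c_mode=0o100644, c_nlink=1, c_filesize=4) + b"foo\n"
fields, name, start = parse_header(raw)
print(name, oct(fields["c_mode"]), raw[start:start + fields["c_filesize"]])
# foo.txt 0o100644 b'foo\n'
```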

Another difference between tar and cpio is that whereas tar takes the files to be archived on the command line, recursively descending any arguments which are directories, cpio when used with the -o flag reads the list of files to be archived from standard input. cpio was designed to be used with the find command. Similarly, when used with the -i flag, cpio reads the archive to be extracted from standard input.

diff3 (1979)

diff3 displays the differences between three versions of the same file.

The three-way diff is the foundation of branch merging. A two-way diff is insufficient for merging because deleting a line in one branch looks like adding a line in the other branch. Only by comparing both branches with the original can these two cases be distinguished.

diff3 has three basic invocations:

diff3 EDIT1 ORIG EDIT2
diff3 -e EDIT1 ORIG EDIT2
diff3 -m EDIT1 ORIG EDIT2

The first invocation writes a description of the three-way diff to standard out.

The second invocation writes an ed script to standard out which will merge the changes between ORIG and EDIT2 into EDIT1.

The third invocation performs the merge. It writes a version of the file with changes from both EDIT1 and EDIT2 to standard out.
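The decision a merge makes for each changed line can be sketched in Python. To keep it short, the sketch assumes all three versions have the same number of lines, so lines were only changed in place; real diff3 first aligns the files with two-way diffs against ORIG.

```python
def merge3(edit1, orig, edit2):
    """Merge two edited versions line by line against their common original.

    Returns (merged lines, 1-based line numbers that conflict).
    """
    if not len(edit1) == len(orig) == len(edit2):
        raise ValueError("this sketch only handles in-place line changes")
    merged, conflicts = [], []
    for n, (x, o, y) in enumerate(zip(edit1, orig, edit2), start=1):
        if x == y or y == o:
            merged.append(x)  # same change on both sides, or only edit1 changed
        elif x == o:
            merged.append(y)  # only edit2 changed this line
        else:
            merged.append(x)  # both changed it differently: keep edit1, report it
            conflicts.append(n)
    return merged, conflicts

orig = ["a", "b", "c", "d", "e"]
edit1 = ["a", "b1", "c", "d", "e"]
edit2 = ["a", "b", "c", "d1", "e"]
print(merge3(edit1, orig, edit2))
# (['a', 'b1', 'c', 'd1', 'e'], [])
```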

Here is an example of the output format used by the first invocation:

$ cat /tmp/orig.txt 
a
b
c
d
e

$ cat /tmp/edit1.txt 
a
b1
c
d
e
f

$ cat /tmp/edit2.txt 
a
b
c
d1
e

$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====1
1:2c
  b1
2:2c
3:2c
  b
====3
1:4c
2:4c
  d
3:4c
  d1
====1
1:6c
  f
2:5a
3:5a

Each hunk of the diff3 output starts with four equals signs. All of the hunks in the example above are two-way hunks, meaning that two of the three files are the same. In this case the number of the differing file, by its position in the diff3 arguments, is placed after the equals signs.

Here is an example of a three-way hunk, where all three files differ and no number is placed after the equals signs:

$ cat /tmp/orig.txt 
a

$ cat /tmp/edit1.txt                               
a1

$ cat /tmp/edit2.txt 
a2

$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====
1:1c
  a1
2:1c
  a
3:1c
  a2

ar (1979)

A tool on Unix systems to create static libraries from compiled objects. In other words, to create a .a file from a set of .o files. The format is understood by the link editor ld (historically called the loader), which compilers usually invoke on the programmer's behalf.

The command line interface is broadly similar to tar. Here is how to create an archive; remove files from an archive; list the archive contents; extract files from an archive:

ar -rc NAME.a FILE ...
ar -d ARCHIVE FILE ...
ar -t ARCHIVE
ar -x ARCHIVE FILE ...

The ar file format is not standardized and may differ between systems.

The file format used by GNU ar on Linux starts with the new line terminated string "!<arch>".

Each member starts with a 60-byte header, followed by the file contents. The header has the following fixed-width fields:

offset length name
0 16 file name in ASCII
16 12 file modification timestamp
28 6 uid
34 6 gid
40 8 file mode
48 10 file size in bytes
58 2 0x60 0x0A

The space allocated for the file name in the header is quite short. GNU ar therefore stores a special file named "//" in the archive with a newline-separated list of long file names. A header can reference a name in this special file by storing "/" and the decimal offset of the name in the "//" file. When file names are stored directly in the header, a "/" is used to mark the end of the file name and the rest of the field is space padded. This supports spaces in the file name.
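A Python sketch of writing and reading members with short names, following the header table above. It assumes GNU-style "/"-terminated names and zero timestamps and ids (similar to what GNU ar writes in its deterministic mode), pads odd-sized members with a newline so every header starts on an even offset, and does not handle the "//" long-name table. The function names are hypothetical.

```python
def ar_pack(members):
    """Build an archive from (name, data) pairs with names of at most 15 chars."""
    out = [b"!<arch>\n"]
    for name, data in members:
        if len(name) > 15 or "/" in name:
            raise ValueError("long names need the // table; not handled here")
        header = (
            (name + "/").ljust(16)        # name, "/"-terminated, space padded
            + "0".ljust(12)               # modification timestamp
            + "0".ljust(6)                # uid
            + "0".ljust(6)                # gid
            + "644".ljust(8)              # file mode, octal
            + str(len(data)).ljust(10)    # file size in bytes, decimal
            + "`\n"                       # 0x60 0x0A
        ).encode("ascii")
        out.append(header + data)
        if len(data) % 2:
            out.append(b"\n")             # keep the next header on an even offset
    return b"".join(out)

def ar_list(archive):
    """Return (name, data) pairs from an archive written by ar_pack."""
    if not archive.startswith(b"!<arch>\n"):
        raise ValueError("missing !<arch> magic")
    pos, members = 8, []
    while pos < len(archive):
        header = archive[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[:16].decode("ascii").rstrip(" ")
        if name.endswith("/"):
            name = name[:-1]
        size = int(header[48:58].decode("ascii"))
        members.append((name, archive[pos + 60:pos + 60 + size]))
        pos += 60 + size + (size % 2)
    return members

archive = ar_pack([("a.o", b"abc"), ("b.o", b"defg")])
print(ar_list(archive))  # [('a.o', b'abc'), ('b.o', b'defg')]
```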

GNU ar also stores a special file named "/" in the archive for a symbol table. The format is

  • a 32-bit integer containing the number of symbols
  • a list of 32-bit integers, one for each symbol, containing the offset of the header in the archive for the file containing the symbol
  • a list of null terminated strings, in the same order as the previous list, containing the symbol names
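The symbol table layout can be sketched with Python's struct module. The 32-bit integers are stored big-endian; the symbol names and offsets in the example are made up.

```python
import struct

def pack_symtab(symbols):
    """symbols: list of (symbol name, offset of the member header holding it)."""
    data = struct.pack(">I", len(symbols))                      # symbol count
    data += b"".join(struct.pack(">I", off) for _, off in symbols)
    data += b"".join(name.encode("ascii") + b"\0" for name, _ in symbols)
    return data

def unpack_symtab(data):
    """Inverse of pack_symtab."""
    (count,) = struct.unpack_from(">I", data, 0)
    offsets = struct.unpack_from(">%dI" % count, data, 4)
    names = data[4 + 4 * count:].split(b"\0")[:count]
    return [(n.decode("ascii"), off) for n, off in zip(names, offsets)]

table = pack_symtab([("main", 8), ("helper", 8), ("util", 72)])
print(unpack_symtab(table))  # [('main', 8), ('helper', 8), ('util', 72)]
```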

tar (1979)

The more portable twin of ar. Originally used for creating and using magnetic tape archives.

How to create a tar file; list the contents of a tar file; compare a tar file with the file system; and extract the contents of a tar file:

tar [-]cf NAME.tar DIR
tar [-]tf TARFILE
tar [-]df TARFILE [DIR]
tar [-]xf TARFILE

The -v option can be used with -c or -x to write the files being added or extracted to standard error.

Tar files store the files in sequential order. Each file is preceded by a 512-byte header. The file itself is null-byte padded to a multiple of 512 bytes.

Tar can write an archive to standard output and read one from standard input. The following two invocations behave identically:

tar cf - . | (cd DIR ; tar xf -)
tar cf - . | tar xf - -C DIR

Tar can append data to an existing tar file. These commands append the contents of a directory to a tar file; append only the files in the directory which are newer than the copies already in the tar file; and append subsequent tar files to the first tar file:

tar [-]rf TARFILE DIR
tar [-]uf TARFILE DIR
tar [-]Af TARFILE1 TARFILE2 ...

How to create a compressed tar file:

tar [-]czf NAME.tar.gz DIR
tar [-]cjf NAME.tar.bz2 DIR
tar [-]cJf NAME.tar.xz DIR

In 1988 POSIX extended the format of the header block in a backwardly compatible way. Additional header type flags were added in 2001.

header format
offset length original format ustar
0 100 file name
100 8 file mode
108 8 owner user id
116 8 group id
124 12 file size in bytes
136 12 last modification time
148 8 header checksum
156 1 type flag
157 100 name of linked file
257 6 "ustar"
263 2 "00"
265 32 owner user name
297 32 group name
329 8 device major number
337 8 device minor number
345 155 filename prefix
header type flags
flag original meaning ustar 2001
'\0' normal file
'0' normal file
'1' hard link
'2' symlink
'3' character device
'4' block device
'5' directory
'6' FIFO
'7' contiguous file
'g' global extended header
'x' extended header for the next file
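Python's standard tarfile module can build a single ustar header, which makes the layout above easy to inspect. The sketch below reads a few fields by offset and recomputes the header checksum, defined as the sum of all 512 header bytes with the 8-byte checksum field counted as spaces.

```python
import tarfile

def tar_checksum(header):
    """Sum of the 512 header bytes, counting the checksum field as 8 spaces."""
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:512])

info = tarfile.TarInfo("foo.txt")
info.size = 4
header = info.tobuf(format=tarfile.USTAR_FORMAT)

name = header[0:100].rstrip(b"\0").decode("ascii")   # offset 0, length 100
size = int(header[124:136].rstrip(b"\0 "), 8)         # octal digits
stored = int(header[148:156].replace(b"\0", b" ").strip(), 8)
print(len(header), name, size, header[156:157], header[257:265])
print(stored == tar_checksum(header))  # True
```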

patch (1985)

The patch command can apply the output of diff to the file that was the first argument of diff to recover the file that was the second argument of diff. patch reads the output of diff from standard input:

$ echo "foo" > foo.txt
$ echo "bar" > bar.txt 
$ diff foo.txt bar.txt > foo.patch
$ patch foo.txt < foo.patch 
patching file foo.txt
$ cat foo.txt 
bar

The above is only a slight improvement over what could have been achieved with diff -e and ed. The novelty of patch is its ability to apply a patch file to an entire directory:

$ mkdir old
$ echo "bar" > old/bar.txt
$ echo "baz" > old/baz.txt
$ cp -R old new
$ echo "qux" > new/bar.txt
$ diff -Naur old new > foo.patch
$ rm -rf new
$ patch -Np0 < foo.patch
patching file old/bar.txt
$ cat old/bar.txt 
qux

This is a good way to create a patch file:

diff -Naur OLD NEW

When creating the patch file with diff, the -u or -c flag is needed so that the patch records the file names; the normal format does not include them. The -N flag is necessary if files are added or removed. The -a flag prevents diff from skipping files which it thinks are binary.

If the diff was performed outside of the directories, then the patch should be applied from outside the directory to be patched with the -p0 flag. Alternatively the patch can be applied from inside the directory to be patched with the -p1 flag, which strips the first path component (such as old/) from the file names. The -N flag instructs patch not to make a change if the patch appears to be reversed or already applied.
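The effect of -pNUM on a file name taken from the patch can be sketched in Python. This simplified version only splits on single slashes, with a leading slash counting as an empty first component; GNU patch also collapses repeated slashes. The function name is hypothetical.

```python
def strip_components(path, n):
    """Mimic patch -pNUM: drop the first n slash-separated components."""
    parts = path.split("/")
    if n > len(parts) - 1:
        raise ValueError("cannot strip %d components from %r" % (n, path))
    return "/".join(parts[n:])

print(strip_components("old/bar.txt", 0))  # old/bar.txt
print(strip_components("old/bar.txt", 1))  # bar.txt
```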

zip (1989)

zip combines file compression and archiving. It is a better choice for sharing files with Windows hosts than tar, which most Windows hosts don't have installed.

zip [-r] [-0] ARCHIVE FILE ...
zip -d ARCHIVE FILE ...
zip -u ARCHIVE [FILE ...]

unzip -l ARCHIVE
unzip ARCHIVE [FILE ...]

Compression is the DEFLATE algorithm, or no compression if the -0 flag is used.

zip stores the file name, file size, and last modification time of the file. The information is in a header which precedes the file itself and in the "central directory" at the end of the file.
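Python's standard zipfile module can show both the -0 distinction and this layout: each file is preceded by a local file header (signature PK\x03\x04), and the archive ends with the end-of-central-directory record (signature PK\x05\x06), which is 22 bytes long when there is no archive comment. A sketch that builds an archive in memory:

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Compressed with DEFLATE, as zip does by default.
    zf.writestr("foo.txt", "foo\n" * 100, compress_type=zipfile.ZIP_DEFLATED)
    # Stored without compression, like zip -0.
    zf.writestr("bar.txt", "bar\n" * 100, compress_type=zipfile.ZIP_STORED)
data = buf.getvalue()

print(data[:4])       # b'PK\x03\x04': local file header of the first file
print(data[-22:-18])  # b'PK\x05\x06': end of central directory record

# Reading starts from the central directory at the end of the archive.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    infos = {info.filename: info for info in zf.infolist()}
for name, info in infos.items():
    print(name, info.file_size, info.compress_size, info.date_time)
```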

By default zip does not recursively descend directories, adding their contents to the archive. Use the -r flag to get this behavior.

diffstat (1992)

Summarizes a recursive diff on two directories. It lists the files that were modified, added, or deleted with the number of lines which changed.
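A rough Python sketch of the counting that diffstat does, assuming the input is a unified diff. It omits the histogram that diffstat draws, and a removed line whose text starts with "-- " would be misread as a file header; the function name is hypothetical.

```python
def diffstat(patch_text):
    """Return {file name: (lines added, lines removed)} for a unified diff."""
    stats, current = {}, None
    for line in patch_text.splitlines():
        if line.startswith("+++ "):
            # The new file name, up to an optional tab-separated timestamp.
            current = line[4:].split("\t")[0]
            stats.setdefault(current, [0, 0])
        elif line.startswith("--- ") or line.startswith("@@"):
            continue  # old file name or hunk header
        elif current is not None and line.startswith("+"):
            stats[current][0] += 1
        elif current is not None and line.startswith("-"):
            stats[current][1] += 1
    return {name: tuple(counts) for name, counts in stats.items()}

patch = """--- /etc/passwd\t2013-10-24
+++ /tmp/passwd\t2014-04-26
@@ -9,7 +9,7 @@
 # Open Directory.
-root:*:0:0:System Administrator:/var/root:/bin/sh
+ROOT:*:0:0:System Administrator:/var/root:/bin/sh
"""
print(diffstat(patch))  # {'/tmp/passwd': (1, 1)}
```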

jar (1995)

jar supports some of the tar commands:

jar cf NAME.jar DIR
jar tf JARFILE
jar xf JARFILE
jar uf JARFILE DIR

jar can write to standard output and read from standard input; the syntax is different from tar:

jar c . | (cd DIR ; jar x)
jar c . | jar x -C DIR

Use jar -e to make a jar file runnable by java. The argument to -e is a class with a main method which will be used as the entry point.

$ mkdir foo

$ cat > foo/A.java
package foo;

public class A {
    public static void main(String[] args) {
        System.out.println("A");
    }
}

$ sed s/A/B/ foo/A.java > foo/B.java

$ javac foo/*.java

$ jar cef foo.A foo.jar foo

$ java -jar foo.jar        
A

A jar file is a zip file; unzip can also be used to extract the contents. jar stores extra information about the jar file in META-INF/MANIFEST.MF:

$ unzip foo.jar

$ cat META-INF/MANIFEST.MF 
Manifest-Version: 1.0
Created-By: 1.6.0_26 (Sun Microsystems Inc.)
Main-Class: foo.A

rsync (1996)

A tool for copying files and directories between hosts. Usually it uses ssh. It is faster than scp when some of the files are already on the destination or when copying files that have been modified.

Here is the usage for putting and getting:

  rsync -a PATH ... HOST:PATH
  rsync -a HOST:'PATH ...' PATH

The -a flag is equivalent to the flags -rptoglD, which respectively (1) copy directories recursively, (2) copy file permissions, (3) copy file times, (4) copy the owner, (5) copy the group, (6) copy symlinks as symlinks, and (7) copy device and special files.

Other useful flags are -v for verbose mode and --exclude which takes a file glob pattern to specify files to skip.

Unless the source path ends with a trailing slash /, rsync creates a directory with the same name as the source inside the target. With a trailing slash on the source, rsync instead copies the contents of the source directly into the target.

rsync can be used to back up a directory on a remote host. With the --backup flag, destination files which would be overwritten or deleted are first renamed with a tilde (~) suffix. The --backup-dir flag moves these backups into a separate directory instead.

colordiff (2002)

A version of diff which colorizes the output. It takes the same options as diff.

content of this page licensed under creative commons attribution-sharealike 3.0