distributed version control: version and help | local and remote repository | working directory | track and commit | branch and merge | history | push and pull | configuration files | external repositories | packaging | integrity check and garbage collection
version and help | ||
---|---|---|
git | hg | |
show version | $ git version | $ hg version |
list subcommands | $ git help -a commonly used subcommands only: $ git help | $ hg [-v] help -v: show aliases and global options |
get help for subcommand | $ git help CMD | $ hg help CMD |
list topic guides | $ git help -g | $ hg help |
get help for topic | $ git help TOPIC | $ hg help TOPIC |
local and remote repository | ||
git | hg | |
create repository from a directory | $ git init [DIR] | $ hg init [DIR] |
create repository with no working directory | $ git init --bare [DIR] puts repo in DIR | none |
clone entire repository | $ git clone [-b BRANCH] [-o NAME] [--depth NUM] URL [DIR] -b: checkout BRANCH in working dir -o: assign NAME to remote repo --depth: copy commit history to depth NUM | |
repository url formats | ssh://[user@]host.xz[:port]/path/to/repo.git git://host.xz[:port]/path/to/repo.git http[s]://host.xz[:port]/path/to/repo.git ftp[s]://host.xz[:port]/path/to/repo.git rsync://host.xz/path/to/repo.git /path/to/repo.git file:///path/to/repo.git | local/filesystem/path[#revision] file://local/filesystem/path[#revision] http://[user[:pass]@]host[:port]/[path][#revision] https://[user[:pass]@]host[:port]/[path][#revision] ssh://[user@]host[:port]/[path][#revision] |
clone branch from repository | | $ hg clone [-r REV|-b BRANCH] … URL [DIR] |
clone repository; new repository has no working directory | $ git clone --bare URL [DIR] puts repo in DIR | $ hg clone -U URL [DIR] puts repo in DIR/.hg |
list remote repositories | $ git remote [-v] -v: show url of remote | $ hg paths |
add remote repository | $ git remote add [-t BRANCH] … NAME URL $ git remote add [-m BRANCH] NAME URL | edit the [paths] section of .hg/hgrc |
remove remote repository | $ git remote rm REMOTE | |
rename remote repository | $ git remote rename REMOTE NAME | |
show remote repository details | $ git remote show [-n] REMOTE -n: do not connect to remote repo | |
edit remote repository details | $ git remote set-head REMOTE (-a|-d|BRANCH) $ git remote set-url --add REMOTE URL $ git remote set-url --delete REMOTE URL $ git remote set-branches [--add] REMOTE BRANCH … | |
working directory | ||
git | hg | |
check out version | $ git checkout [-f] (BRANCH|TREEISH) -f: overwrite changes in working dir and index | $ hg update [-c|-C] -r REV -c: fail if changes in working dir -C: discard changes in working dir |
list modified files | $ git status [-s] [--ignored] [PATH] … -s: short format --ignored: also files excluded by .gitignore | $ hg status [PATH] |
ignore file | .gitignore | .hgignore |
check out specific files | $ git checkout [TREEISH] -- PATHSPEC … | $ hg revert [-C] [-r REV] PATH … -C: do not create backups w/ .orig suffix |
check out from index | $ git checkout -p PATHSPEC … | no index |
clear index | $ git reset | |
clear index and working directory | $ git reset --hard | $ hg revert -C -a |
list or remove untracked files | $ git clean (-f|-n) -f: remove untracked files -n: list untracked files | $ hg purge [-p] -p: list untracked files without removing them; without -p untracked files are removed |
move working directory changes to shelf | $ git stash [save [STR]] | $ hg shelve [-n STR] |
list sets of changes in shelf | $ git stash list | $ hg shelve --list [-p] -p: show each set of changes as diff |
show set of changes in shelf as diff | $ git stash show [-p] [STASH] -p: show full diff instead of diffstat | none |
restore changes from shelf | $ git stash pop [STASH] | $ hg unshelve [STR] |
delete set of changes from shelf | $ git stash drop [STASH] | $ hg shelve -d STR |
clear shelf | $ git stash clear | $ hg shelve --cleanup |
track and commit | ||
git | hg | |
track files | files are not tracked; changes must be added to staging area before each commit | $ hg add PATH … |
track files matching pattern | | $ hg add -I PATTERN |
track all files in working directory | | $ hg add |
add modified or new files to staging area | $ git add [-u] PATHSPEC … -u: files already under version control only | working directory is the staging area |
add part of modified file to staging area | $ git add -e PATH | |
track any new files in working directory, and remove any tracked files not in working directory | | $ hg addremove [PATH] … |
remove files from working directory and next commit | $ git rm [-f] PATH … -f: force if changed in index | $ hg remove [-f] PATH … -f: force if added or changed |
remove files matching pattern | $ git rm [-f] PATHSPEC … -f: force if changed in index | $ hg remove [-f] -I PATTERN -f: force if added or changed |
remove files in next commit which are no longer in working directory | | $ hg remove -A PATH … |
mark file to be removed in next commit without removing from working directory | | $ hg forget PATH … |
remove subdirectory from working directory and next commit | $ git rm -r DIR | $ hg remove DIR |
remove files from index | $ git rm --cached FILE … | |
copy files from head to index | $ git reset -p FILE … | |
copy files from index to working directory | $ git checkout -p PATH … | |
move file in working directory and next commit | $ git mv OLDPATH NEWPATH | $ hg rename OLDPATH NEWPATH |
move files into directory | $ git mv PATH … DIR | $ hg rename PATH … DIR |
copy file in working directory and next commit | | $ hg copy [-A|-f] SRC_PATH DEST_PATH -A: if file already copied -f: if target is tracked |
show difference between staging area and head | $ git diff --cached [PATHSPEC …] | $ hg diff [PATH …] |
show difference between working directory and staging area | $ git diff [PATHSPEC …] | working directory and staging area are the same |
diff options | --name-only: list modified file names --name-status: status (M, A, D, R, ..) and modified file names --stat: histogram of changes by file --dirstat: histogram of changes by directory --word-diff: show changes to line inline --word-diff-regex=REGEX: set regex used by --word-diff -W: show entire modified function in context -R: reverse direction of diff -w: ignore whitespace differences --ignore-blank-lines: ignore blank lines --quiet: no output; exit status 1 if changes, otherwise 0 | --stat: histogram of changes by file -p: show which function each change is in -U NUM: show NUM lines of context --reverse: reverse direction of diff -w: ignore whitespace differences -B: ignore blank lines -X PATTERN: exclude files matching fileglob -S: recurse into subrepos |
grep staging area | $ git grep --cached [-i] [-v] [-E|-F|-P] [-h|-H] [-l|-L] [-n] -e STR [--] PATHSPEC | |
commit changes in staging area | $ git commit [-a] [-m STR] -a: add all modified files with version history to staging area before commit. | $ hg commit [-m STR] |
commit changes to selected files in staging area | $ git commit [-m STR] PATH … | $ hg commit [-m STR] PATH … |
commit changes in working directory | | $ hg commit -A [-m STR] |
amend most recent commit | $ git commit --amend | $ hg commit --amend |
change author of most recent commit | $ git commit --amend --author=STR | $ hg commit --amend -u STR |
commit identifiers | 40 hex digit hashes, e.g. bbf3837d6c9bb54f11a4c620a1c81975156c2a49 A prefix can be used to refer to a commit if it uniquely specifies it; often an 8 digit prefix is sufficient. HEAD refers to the most recent commit of the current branch. | Each revision has a 40 hex digit hash and a sequential integer identifier. The hash is unique across all repositories, but the sequential integer is local to the repository. A prefix can be used to refer to a revision if it uniquely specifies it; often an 8 digit prefix is sufficient. A period . refers to the parent revision of the working directory. tip refers to the most recent revision in the repository. |
other ways to refer to commits | A circumflex postfix can be used to refer to the parent commit of a commit, e.g. bbf3837d6^ The circumflex can be used multiple times to refer to a grandparent, great-grandparent, and so on: bbf3837d6^^ bbf3837d6^^^ An alternative to multiple circumflexes is tilde notation: bbf3837d6~2 bbf3837d6~3 When a commit has multiple parents (i.e. the commit is a merge), use a circumflex followed by a number: bbf3837d6^2 | |
resolve commit notation | $ git rev-parse COMMIT e.g. to get commit id of parent of tip: git rev-parse HEAD^ | |
create commit which reverts another commit | $ git revert [-n] COMMIT -n: no commit, just change working directory | $ hg backout -r REV |
create commit which reverts a merge commit | $ git revert [-n] -m NUM COMMIT NUM is the number of the parent (e.g. 1, 2, …) to restore | |
create commits which revert a sequence of commits | $ git revert [-n] COMMIT1..COMMIT2 reverts the commits after COMMIT1 up to and including COMMIT2; COMMIT1 itself is not reverted. | |
tag a commit | $ git tag [-f] NAME [COMMIT] -f: replace existing tag with same NAME if no commit specified, HEAD is tagged | $ hg tag [-r REV] NAME |
delete a tag | $ git tag -d TAG | $ hg tag --remove NAME |
list tags | $ git tag | $ hg tags |
branch and merge | ||
git | hg | |
current branch | $ git rev-parse --abbrev-ref HEAD | |
list branches | $ git branch [-r|-a] -r: list remote tracking branches -a: list local and remote tracking branches | $ hg bookmarks |
list branches by commit | $ git branch (--contains|--merged) COMMIT --contains: branches descended from COMMIT --merged: branches which are ancestors of COMMIT | |
checkout branch | $ git checkout [-f] BRANCH -f: discard changes in index and working directory | $ hg update [-c|-C] BRANCH -C: discard changes in working directory -c: fail if changes in working directory |
move branch head without changing working directory | $ git reset [--soft] COMMIT | |
create new branch by cloning branch | $ git checkout -b NAME [BRANCH] | $ hg bookmarks NAME |
create a tracking branch | $ git branch --track NAME [BRANCH] | |
create a branch from a commit | $ git branch NAME COMMIT | $ hg bookmarks -r REV NAME |
rename a branch | $ git branch -m BRANCH NAME | $ hg bookmarks -m NAME1 NAME2 |
delete a branch | $ git branch (-d|-D) BRANCH -d: fail if tracking branch with unmerged commits -D: delete even if unmerged commits exist | $ hg bookmarks -d NAME |
list "dedicated commit" branches | | $ hg branches |
show current "dedicated commit" branch | | $ hg branch |
change branch of next commit | | $ hg branch BRANCH |
list branch tips | | $ hg heads [-c] -c: include closed tips |
close branch tip | | $ hg commit --close-branch |
merge | $ git merge [--squash] COMMIT … --squash: make changes to index and working directory only; do not create a commit | $ hg merge [[-r] REV] Only modifies working directory; must be followed by hg commit to create a new changeset. |
show conflicts | $ git status | $ hg resolve -l |
mark file with conflicts as resolved | $ git add PATH | $ hg resolve -m PATH |
unmark file with conflicts as resolved | | $ hg resolve -u PATH |
abort merge | $ git merge --abort | $ hg update -C |
rebase current branch | $ git rebase BRANCH | |
continue rebase | $ git rebase --continue | |
abort rebase | $ git rebase --abort | |
squash commits | $ git rebase -i COMMIT commits after COMMIT can be squashed | |
apply commit to current branch | $ git cherry-pick COMMIT | $ hg graft -r REV |
continue cherry pick | $ git cherry-pick --continue | |
abort cherry pick | $ git cherry-pick --abort | |
create detached head from branch and sequence of commits | $ git rebase --onto BRANCH COMMIT1 COMMIT2 | |
history | ||
git | hg | |
write version of file to standard out | $ git show COMMIT:FILE | $ hg cat -r REV FILE |
annotate lines of file with commit info | $ git blame [-l] [-s] [-w] PATH [COMMIT] -l: show full commit id (default is 8 chars) -s: suppress author name and timestamp -w: ignore whitespace differences | $ hg annotate -cudln [-r REV] [PATH] -c: changeset -u: author -d: date -l: line number -n: local revision number |
commits which are ancestors of head/tip | $ git log [--parents] --parents: after commit identifier show parent commit identifiers | $ hg log |
commits as graph | $ git log --graph | $ hg log -G |
first parent history | $ git log --first-parent | |
chronological order | $ git log --reverse | |
all commits in repository | $ git log [--source] --all --source: print ref name after each commit | |
limit commits | $ git log [--skip=NUM] (-n NUM|-NUM) --skip: skip first NUM commits | |
commits which touched files | $ git log [--follow] [--] PATH … --follow: follow renamed files --: prevent interpreting PATH as option | $ hg log [-f] PATH … -f: follow copied and renamed files |
commits which touched lines | $ git log -L NUM,NUM:PATH | |
one line commits | $ git log --oneline | |
format string for commit | $ git log --pretty=format:FORMAT format string specifiers: %H commit hash %h abbrev. commit hash %T tree hash %t abbrev. tree hash %P parent hash(es) %p abbrev. parent hash(es) %s subject %b body %an author name %ae author email %ad author date %cn committer name %ce committer email %cd committer date %n newline %% percent sign | |
show commit diffs | $ git log -p | |
show commits touching lines matching regular expression | $ git log -p [--pickaxe-all] -G REGEX | $ hg grep PATTERN |
grep commit messages | $ git log --grep=REGEX | $ hg log --keyword STR case insensitive search; also searches source |
show changes to head | $ git reflog The output format is: <commit> HEAD@{<num>} <description> <num> is the number of changes since HEAD was at <commit>. HEAD@{<num>} can be used as an alias for <commit>. This outputs reflog info in log style: $ git log -g | |
difference between commit and its parent | $ git diff [--name-only] COMMIT^ COMMIT [--] [PATH …] | $ hg diff -c REV [PATH …] |
difference between two commits | $ git diff [--name-only] COMMIT1 COMMIT2 [--] [PATH …] | $ hg diff -r REV1 -r REV2 [PATH …] |
diff options | --name-only: list modified file names --name-status: status (M, A, D, R, ..) and modified file names --stat: histogram of changes by file --dirstat: histogram of changes by directory --word-diff: show changes to line inline --word-diff-regex=REGEX: set regex used by --word-diff -W: show entire modified function in context -R: reverse direction of diff -w: ignore whitespace differences --ignore-blank-lines: ignore blank lines --quiet: no output; exit status 1 if changes, otherwise 0 | --stat: histogram of changes by file -p: show which function each change is in -U NUM: show NUM lines of context --reverse: reverse direction of diff -w: ignore whitespace differences -B: ignore blank lines -X PATTERN: exclude files matching fileglob -S: recurse into subrepos |
grep commit | $ git grep [-i] [-v] [-E|-F|-P] [-h|-H] [-l|-L] [-n] -e STR TREEISH [--] PATHSPEC | |
start bisection | $ git bisect start BAD_COMMIT GOOD_COMMIT | |
mark bisection commit as good | $ git bisect good | |
mark bisection commit as bad | $ git bisect bad | |
mark bisection commits automatically | $ git bisect run SCRIPT [ARG]… Exit status of 0 indicates a good commit. Exit status of 125 indicates an untestable commit. Any other status in the range 1…127 indicates a bad commit. | |
show bisection decisions | $ git bisect log | |
terminate bisection | $ git bisect reset | |
push and pull | ||
git | hg | |
pull commits from remote | $ git fetch [-f] [-t] [-p] [REPO] -f: force if update not a fast-forward -t: copy tags -p: remove local references gone from remote Pulls from origin if REPO not specified. Updates remote tracking branches per refspec of REPO in .git/config. | $ hg pull [-u] [SOURCE] -u: update working directory to most recent changeset Pulls all revisions in remote repository not in local repository. If no SOURCE specified, uses value of default in the [paths] section of .hg/hgrc. |
pull commits from remote for branch | $ git fetch [-f] [-t] [-p] REPO REFSPEC | $ hg pull (-b BRANCH|-B BOOKMARK|-r REV) … [SOURCE] -b: pull revisions on BRANCH and parents -B: pull BOOKMARKed revision and parents -r: pull REV and parents |
pull commits from multiple remotes | $ git fetch [-f] [-t] [-p] (--all|--multiple REPO …) | |
show remote commits available for pulling | | $ hg incoming |
pull commits from remote and merge current branch with remote head | $ git pull [-f] REPO [REFSPEC] | |
push commits to remote | $ git push [-f] [--prune] [--tags] [REPO] | $ hg push [-f] [SOURCE] |
push commits to remote for branch | $ git push [-f] [-u] REPO [BRANCH] … | $ hg push [--new-branch] (-b BRANCH|-B BOOKMARK|-r REV) … [SOURCE] |
delete remote branches | $ git push --delete REPO BRANCH … | |
show commits which have not been pushed | | $ hg outgoing |
move commits by archive | $ git bundle | $ hg bundle $ hg unbundle |
configuration files | ||
git | hg | |
repository config file | .git/config | .hg/hgrc |
user config file | ~/.gitconfig | ~/.hgrc |
show config options | $ git help config | $ hg help config |
set config value when cloning | $ git clone [-c SECTION.KEY=VAL] URL [DIR] | |
list configuration settings | $ git config -l [--global] | $ hg showconfig |
show specific configuration setting | $ git config [--global] --get SECTION.KEY | |
external repositories | ||
git | hg | |
external repository file | .gitmodules | .hgsub |
external repository file format | [submodule "vendor/modules/system"] path = vendor/modules/system url = ../dm-kohana-core.git | |
add submodule | $ git submodule add URL PATH Records submodule in .gitmodules. If PATH does not exist, URL is cloned there. | |
register submodules | $ git submodule init Copies data from .gitmodules in index to .git/config. | |
update submodules | $ git submodule update | |
clone repository and external repositories | $ git clone --recursive URL | |
packaging | ||
git | hg | |
create tarball | $ git archive --format=tar TREEISH > repo.tar | $ hg archive -t tar ../NAME.tar |
create gzipped tarball | $ git archive --format=tgz TREEISH > repo.tgz | |
create zip archive | $ git archive --format=zip TREEISH > repo.zip | |
create a patch | $ git format-patch ??? | $ hg export ??? |
apply patch | $ git apply ??? | $ hg import ??? |
integrity check and garbage collection | ||
git | hg | |
integrity check | $ git fsck | $ hg verify |
garbage collection | $ git gc | none |
Metasyntactic Variables
git | |
---|---|
BRANCH | the name of a branch. |
CMD | the name of a version control command: the first argument of the base command. |
COMMIT | the HASH for a commit. A commit can be referenced indirectly via a branch or tag name or via commit notation. The symbolic references HEAD or FETCH_HEAD can also be used to reference commits. |
DIR | a directory on the file system. In some cases it must exist; in others it will be created. |
FILE | a regular file on the file system. In some cases it must exist; in others it will be created. |
HASH | a 40 digit hex string used as an identifier for something in the object database. |
HEAD | the literal string HEAD. |
NAME | a name for an entity which will be created. Usually there are restrictions on the characters that can be used. |
PATH | a path on the file system. In some cases it must exist; in others it will be created. |
PATHSPEC | like a file glob pattern, except that ? and * can match the directory separator: /. Characters special to the shell must be escaped. |
REFSPEC | [+]SRC_REF:DEST_REF where SRC_REF and DEST_REF are ref paths relative to the .git directory. SRC_REF is on the remote repository in a fetch or a pull and on the local repository in a push. An asterisk can be used in place of a component of the relative path to match everything in the directory. If the SRC_REF has an asterisk, the DEST_REF must also have one. A plus sign prefix + is used to indicate that the update should be made even when it is not a fast-forward. If the SRC_REF is the empty string, then the DEST_REF is deleted. Leading components of SRC_REF or DEST_REF can be omitted if no ambiguity results. |
REMOTE | the name of a remote. |
REPO | A REMOTE or a URL. |
STASH | stash identifier format: stash@{0}, stash@{1}, … |
STR | a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted. |
TREEISH | the HASH for a tree, a commit, or a tag. If the HASH is for a commit or a tag the tree in the commit is used. |
URL | a url for a repository. |
hg | |
BRANCH | the name of a branch. |
CMD | the name of a version control command: the first argument of the base command. |
DIR | a directory on the file system. In some cases it must exist; in others it will be created. |
FILE | a regular file on the file system. In some cases it must exist; in others it will be created. |
NAME | a name for an entity which will be created. Usually there are restrictions on the characters that can be used. |
PATH | a path on the file system. In some cases it must exist; in others it will be created. |
PATTERN | a file glob pattern. The metacharacters ?, *, and ** are supported. Characters special to the shell must be escaped. |
REV | the revision identifier for a changeset. It can be either the local revision number, which is a small decimal integer, or the hexadecimal changeset identifier, which is 40 digits long and usually abbreviated to 12. |
SOURCE | A URL or a name for a URL in the [paths] section of the .hg/hgrc file |
STR | a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted. |
URL | a url for a repository. |
Help
show version
list subcommands
get help for subcommand
list topic guides
get help for topic
Local and Remote Repository
When a set of files is edited, we can view the set of files as a sequence of versions or revisions which are ordered in time. A repository is a record of the versions of a set of files; it permits recovering the set of files as they were at any of the recorded points in time.
A working directory (cf. the working directory of a process) can contain a copy of one of the file set versions; it can also be edited to create a new version of the file set. A commit is the act of recording the new version of the file set in the repository. This is done by comparing the working directory with the previous version, or with an empty file set in the case of an initial commit. The commit can also be regarded as the difference (that is, the output of diff -r) between the previous and the new version. This difference is also called the changeset.
When calculating the changeset, the version control system may ignore files in the working directory which are not tracked files.
A set of files and directories under version control is called a repository.
git
A file or directory under version control has one or more versions. One adds new versions to the repository by making a commit. The set of all files and directories in the repository can also be seen as having versions; these versions are called commits; they consist of at most one version of each file or directory in the repository.
hg
A file or directory under version control has one or more revisions. One adds new revisions to the repository by making a commit. The set of files and directories in the repository can also be seen as having revisions; these revisions are called changesets.
Working Directory
check out version
git:
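These two commands leave the index and the tracked files in the working directory in the same state:
$ git checkout -f COMMIT
$ git reset --hard COMMIT
They differ in what they do to branches: git checkout -f changes which commit or branch HEAD refers to and leaves the current branch where it was, whereas git reset --hard moves the current branch to COMMIT.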
list modified files
ignore file
git
A list of file patterns, one per line. The patterns specify files that git status and git add should ignore. Shell glob syntax (i.e. the asterisk: *) can be used.
A .gitignore can be placed in any directory in the repository. The rules in a given .gitignore file will only apply to the directory which contains it and the directories beneath it.
Lines starting with a pound sign: # are ignored.
A pattern starting with an exclamation point: ! will negate a pattern. This can be used to include files that were excluded by a pattern higher in the file matching a broader set of files.
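The interaction between ordinary and negated patterns can be sketched in Python. This is a simplified model, not Git's matcher: it only handles patterns for a single file name in one directory, the function name is_ignored is made up for illustration, and the one rule it demonstrates is that the last matching pattern in the file decides the outcome.

```python
from fnmatch import fnmatchcase

def is_ignored(name, patterns):
    """Return True if `name` is ignored; the last matching pattern wins."""
    ignored = False
    for line in patterns:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # blank lines and comments are skipped
        negate = line.startswith("!")     # a leading ! re-includes a file
        pattern = line[1:] if negate else line
        if fnmatchcase(name, pattern):
            ignored = not negate
    return ignored

# Hypothetical .gitignore contents: ignore all logs except keep.log.
rules = ["# build output", "*.log", "!keep.log"]
for name in ("debug.log", "keep.log", "main.c"):
    print(name, is_ignored(name, rules))
```

Because later lines override earlier ones, the negated pattern has to come after the broader pattern it carves an exception from.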
hg
Unlike .gitignore, an .hgignore file must be in the root of the working directory.
The format is one regular expression per line, in Python syntax, which is close to Perl's. All files which match the regular expression will be ignored.
Comments start with the pound sign: #
It is also possible to use glob syntax:
# regexp to ignore twiddle files:
~$
# glob to ignore compiled python files:
syntax: glob
*.pyc
# additional patterns will use regexp format:
syntax: regexp
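The way a syntax: line switches the interpretation of the lines that follow it can be sketched in Python. This is an approximation under stated assumptions, not Mercurial's implementation: glob patterns are matched against the file's base name only, regexps are searched anywhere in the path, and the function names parse_hgignore and hg_ignored are invented for the example.

```python
import posixpath
import re
from fnmatch import fnmatchcase

def parse_hgignore(lines):
    """Return a list of (syntax, pattern) pairs from .hgignore-style lines."""
    rules = []
    syntax = "regexp"                     # regexp is the default syntax
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.startswith("syntax:"):
            syntax = line[len("syntax:"):].strip()
            continue                      # applies to all following patterns
        rules.append((syntax, line))
    return rules

def hg_ignored(path, rules):
    """Return True if any rule matches `path` (simplified matching)."""
    for syntax, pattern in rules:
        if syntax == "glob":
            if fnmatchcase(posixpath.basename(path), pattern):
                return True
        elif re.search(pattern, path):    # regexps are not anchored
            return True
    return False

lines = ["# regexp to ignore twiddle files:", "~$", "syntax: glob", "*.pyc"]
rules = parse_hgignore(lines)
for path in ("notes.txt~", "pkg/mod.pyc", "pkg/mod.py"):
    print(path, hg_ignored(path, rules))
```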
Track and Commit
git
Git keeps copies of all versions of files and directories that have been committed, as well as the commits themselves, in the directory .git/objects. All objects are identified by their 40 character SHA-1 checksum, called the hash. There are three main types of git objects in this directory. A blob is the contents of a file. A tree corresponds to a file system directory; it contains the file system names of its entries, which can be blobs (regular files) or trees (directories), and their hashes. Finally, a commit contains the top level tree for the commit and the parents of the commit. There will be zero parents for the initial commit and more than one parent for a commit which was created by a merge. Git stores a separate, albeit compressed, copy of each version of a file, tree, or commit in the .git/objects directory.
The git cat-file -p HASH command, though not needed for day-to-day use, provides a way to inspect a git object. It shows the additional information stored in trees and commits which we have not mentioned here.
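As a small illustration of how a hash identifies an object, the hash of a blob can be reproduced outside of Git. The sketch below assumes the SHA-1 object format, in which the hash is taken over the header "blob SIZE" followed by a NUL byte and then the file contents; the function name git_blob_hash is made up for the example.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 hash Git assigns to a blob with these contents."""
    # The header records the object type and the content length in bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known hash.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_hash(b"hello world\n"))
```

The result for a file's contents should agree with the output of git hash-object FILE in a repository that uses SHA-1.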
hg
Mercurial uses a storage format called a revlog to store the versions of a file. Most revlogs are kept in .hg/store/data. A revlog usually consists of two files: one with an .i suffix and another with a .d suffix. If the file is small and has little or no history, the revlog might consist of only a .i file. A revlog which tracks the history of a file is called a filelog. When the file is first committed, it is written to the filelog. Each time a commit is made which alters it, a delta describing the change is appended to the filelog. Thus, to fetch the current version of a file, all the deltas must be applied in order to the original version of the file. As a performance optimization, Mercurial will sometimes append the full version of the file to a filelog. Thus, when reconstructing the current version, one need only apply the deltas starting from the last time the full version was stored.
Mercurial stores a manifest for each revision of the repository. A revision of the repository is called a changeset. The manifest is a list of the pathnames, relative to the root, of all files in the changeset. Rather than store the manifests in separate files, all the manifests for the repository are stored in a revlog in .hg/store. Each time a new changeset is added to a repository by a push, pull, or commit command, it is assigned a local revision number which is the order in which it was appended to the local manifest revlog. If the changeset was pulled from a different repository, the local revision numbers might not match.
Information about changesets is also stored in the changelog, which is another type of revlog. The changelog has a pointer to the manifest revision, pointers to the parents of the changeset, and information about the committer.
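The snapshot-plus-deltas idea can be sketched with a toy model in Python. This is not the revlog file format: the entry layout ("full" text or a "delta" given as start, end, replacement) and the function name reconstruct are assumptions made for illustration only.

```python
def reconstruct(entries, rev):
    """Rebuild revision `rev` from a toy log of snapshots and deltas.

    Each entry is ("full", text) or ("delta", (start, end, replacement)),
    where a delta replaces text[start:end] of the previous revision.
    """
    # Walk back to the most recent full snapshot at or before rev.
    base = rev
    while entries[base][0] != "full":
        base -= 1
    text = entries[base][1]
    # Apply only the deltas recorded after that snapshot.
    for _kind, (start, end, replacement) in entries[base + 1 : rev + 1]:
        text = text[:start] + replacement + text[end:]
    return text

entries = [
    ("full", "a\nb\n"),           # rev 0: first commit stores the whole file
    ("delta", (2, 4, "B\n")),     # rev 1: change the second line
    ("delta", (4, 4, "c\n")),     # rev 2: append a third line
    ("full", "a\nB\nc\n"),        # rev 3: periodic full snapshot
    ("delta", (0, 2, "")),        # rev 4: delete the first line
]
print(repr(reconstruct(entries, 2)))
print(repr(reconstruct(entries, 4)))
```

Reconstructing revision 4 touches only the snapshot at revision 3 and one delta, which is the point of storing an occasional full version.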
git
Git has three main types of objects: commits, trees, and blobs (annotated tags are a fourth). Each is assigned a unique hash ID which is a 40 digit hex string. The identifier is called the hash, SHA1, object name, or object identifier with no difference in meaning. When the underlying object is a commit or tree it is also called a tree-ish.
Commit hashes are the hashes the user most commonly sees and needs to reference. Only as many digits as are necessary to uniquely identify an object in the object database need to be provided to a git command; usually the first 6 or 7 are sufficient.
HEAD is a special name which refers to the most recent commit of the current branch. It is stored in .git/HEAD. The previous commit is HEAD^ and the commit before that is HEAD^^. There is also a numerical notation: HEAD~4 is the commit four generations before HEAD, following first parents. If HEAD is the result of a merge, then the antecedents can be referenced with HEAD^1 and HEAD^2.
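The circumflex and tilde suffixes can be sketched as a walk over a parent table. The history below is hypothetical, the names PARENTS, REFS, and resolve are made up for the example, and only the ^, ^N, and ~N suffixes are handled, not the rest of Git's revision syntax.

```python
import re

# Hypothetical history: each commit maps to its parents, first parent first.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],   # a merge commit with two parents
    "E": ["D"],
}
REFS = {"HEAD": "E"}

def resolve(expr, refs):
    """Resolve NAME followed by any mix of ^, ^N and ~N suffixes."""
    m = re.match(r"[^~^]+", expr)
    commit = refs.get(m.group(0), m.group(0))
    for op, num in re.findall(r"([~^])(\d*)", expr[m.end():]):
        n = int(num) if num else 1
        if op == "^":
            if n > 0:                     # ^0 names the commit itself
                commit = PARENTS[commit][n - 1]   # ^N selects the N-th parent
        else:
            for _ in range(n):            # ~N follows first parents N times
                commit = PARENTS[commit][0]
    return commit

for expr in ("HEAD^", "HEAD^^", "HEAD~2", "HEAD^^2", "HEAD~3"):
    print(expr, "->", resolve(expr, REFS))
```

In this model HEAD^^ and HEAD~2 name the same commit, while HEAD^^2 first steps to the merge commit and then selects its second parent.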
hg
In Mercurial, every commit is assigned two identifiers: a local revision number and a universal changeset identifier. The local revision number is a small integer that is unique only to the local repository. The first local revision number issued is zero, and it increments with each local commit. The changeset identifier is a 40 digit hex number, usually abbreviated to its first twelve digits, which is unique across all repositories.
The -r option is used to pass a Mercurial commit identifier to a command. The argument can be a local revision number or a changeset identifier. Two identifiers separated by a colon specify a range of revisions.
move file in working directory and next commit
It is desirable for a version control system to track file name changes. Otherwise commands like blame and log when used on a single path will not show activity before the name change. If the version control system is aware of a name change, it can correctly handle the case when merging where the name was changed on one branch and edited on the other.
git
Although Git provides a git mv subcommand, it does not actually track name changes. Instead, it infers that a name change occurred in a commit when one file disappeared, another appeared, and the two have similar contents. Hence, even if the user renames the file with the Unix mv command and then runs git rm and git add, Git can still follow the history of the file.
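Similarity-based rename inference can be sketched in Python. The sketch uses difflib's ratio as a stand-in for Git's own similarity measure, which works differently, and a 50% threshold; the function name detect_renames and the file contents are invented for the example.

```python
from difflib import SequenceMatcher

def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths whose contents are similar.

    `deleted` and `added` map path -> content for one commit.
    Returns a list of (old_path, new_path, score) tuples.
    """
    renames = []
    unmatched = dict(added)
    for old_path, old_text in deleted.items():
        best = None
        for new_path, new_text in unmatched.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (new_path, score)
        if best:
            renames.append((old_path, best[0], best[1]))
            del unmatched[best[0]]        # each added file is used at most once
    return renames

deleted = {"util.py": "def add(a, b):\n    return a + b\n"}
added = {
    "helpers.py": "def add(a, b):\n    return a + b\n",
    "README": "Project notes\n",
}
print(detect_renames(deleted, added))
```

Nothing about the rename is recorded at commit time in this scheme; the pairing is recomputed from the contents whenever the history is examined.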
hg
Mercurial keeps track of the name a file had in each revision of a filelog. The hg rename subcommand must be used to preserve history.
move file
commit changes in staging area
In the case of Git, the staging area is the index.
In the case of Mercurial, the staging area is the files in the working directory which are tracked.
commit identifiers
mercurial:
null refers to an empty revision which is the parent of revision 0.
Branch and Merge
git
Git has a low level feature called a ref which it uses to implement branches and tags. A ref is a file in .git/refs which contains the hash of a commit. Branches are in .git/refs/heads and tags are in .git/refs/tags. Whenever a commit is made, the value in .git/refs/heads/BRANCH is updated where BRANCH is the current branch. The values in .git/refs/tags/TAG do not change.
The name of the branch which is currently checked out is stored in .git/HEAD. It is stored as a line of the form ref: refs/heads/NAME.
Git also stores remote branches in .git/refs/remotes/REPO. The git branch -r command can be used to list remote branches. Remote branches have names of the form REPO/BRANCH, and a remote branch can have a tracking branch, which is a local branch, usually also named BRANCH, configured to follow it. git clone creates a tracking branch for the branch it checks out. The remote branches that are fetched can be restricted when a remote repository is added using git remote add -t BRANCH REPO URL. git fetch will only update remote branches. git pull will update remote branches and then merge the appropriate one into the current branch.
The default branch is called master unless another name is configured with init.defaultBranch. It is created by git init. If no branch is explicitly specified, git clone checks out the branch that HEAD points to on the remote repository, which is usually the default branch.
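Reading the current branch and its commit straight from these files can be sketched in Python. The sketch assumes loose ref files only (it does not read .git/packed-refs), builds a fake directory layout in a temporary directory rather than touching a real repository, and the names read_head and demo are made up for the example.

```python
import os
import tempfile

def read_head(git_dir):
    """Return (ref, commit_hash) for the checked-out branch of a git dir."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return None, head                 # detached HEAD: the file holds a hash
    ref = head[len("ref: "):]             # e.g. refs/heads/master
    ref_path = os.path.join(git_dir, *ref.split("/"))
    if not os.path.exists(ref_path):
        return ref, None                  # no commits yet, or the ref is packed
    with open(ref_path) as f:
        return ref, f.read().strip()

def demo():
    """Build a minimal fake .git layout and read it back."""
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write("ref: refs/heads/master\n")
        with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
            f.write("bbf3837d6c9bb54f11a4c620a1c81975156c2a49\n")
        return read_head(git_dir)

print(demo())
```

In day-to-day use git rev-parse HEAD and git rev-parse --abbrev-ref HEAD give the same information and also handle packed refs.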
Commits have zero or more parent commits. git commit creates a commit with one parent, except in the case of the initial commit. git merge creates a commit with two or more parent commits. If the commit has three or more parents, the merge is called an octopus merge.
staging numbers:
To perform a merge Git gets the tree contained in the common ancestor and puts its items into the staging area with staging number 1. It puts the current branch tree items in the staging area with staging number 2. It puts the tree items of the branch being merged in the staging area with staging number 3. Entries which merge cleanly are collapsed back to staging number 0; only paths with conflicts keep entries with staging numbers 1 to 3.
fast-forward commits aren't actually commits:
Suppose that bar is a branch of foo. If commits have subsequently been made to foo but not to bar, then running the following when bar is the current branch will perform a fast-forward:
git merge foo
In a fast-forward no merge commit is created. Instead the head of bar is simply moved to point to the same commit as the head of foo.
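The fast-forward rule can be sketched with a toy model in Python. The commit graph is a plain dictionary from commit id to parent ids, and the ids and function names are invented for the illustration; this is not how Git stores or names anything.

```python
def ancestors(parents, commit):
    """Return the set containing commit and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge(parents, heads, current, other):
    """Merge branch `other` into branch `current`, fast-forwarding when possible."""
    cur, oth = heads[current], heads[other]
    if cur in ancestors(parents, oth):
        # Fast-forward: no commit is created, the branch head just moves.
        heads[current] = oth
        return "fast-forward"
    if oth in ancestors(parents, cur):
        return "already up to date"
    # Otherwise record a merge commit with two parents (hypothetical id).
    new = "merge-of-%s-and-%s" % (cur, oth)
    parents[new] = [cur, oth]
    heads[current] = new
    return "merge commit"


# bar was branched from foo at c1; foo then gained c2 and c3.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
heads = {"foo": "c3", "bar": "c1"}
result = merge(parents, heads, "bar", "foo")
print(result, heads["bar"])
```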
hg
A Mercurial branch is a name which is stored in a changeset. When a commit is made, the new changeset inherits the branch name of the previous changeset, unless a different name was specified before the commit with hg branch. To switch to a new branch one must make a commit.
Mercurial branches differ from Git branches in that:
- every commit belongs to a single branch
- a branch can have multiple heads
Mercurial tags are names for changesets. They are stored in the .hgtags file at the repository root. Creating a tag requires making a commit.
Mercurial does not support octopus merges. Thus changesets have at most two parents. The changeset committed after hg merge is on the branch of its first parent, which is the parent of the working directory at the time of the merge.
Changesets can have no branch specified. This is also called the default branch.
bookmarks:
Mercurial bookmarks work like Git branches, with the exception that Mercurial does not have the equivalent of Git tracking branches.
list "dedicated commit" branches
"dedicated commit" branches is the term used in this sheet to distinguish Mercurial-style branches from Git-style branches. Mercurial bookmarks are equivalent to Git-style branches.
In Git, branches are names which contain the hash of a commit. This commit is the head of the branch, and because each commit knows its parents, the entire history of the branch can be constructed. Git commits can belong to multiple branches, or they can belong to none at all. If a Git commit is not reachable from any branch or tag, it is at risk of being garbage collected.
In Mercurial, each changeset has a branch name associated with it. Thus, each changeset must belong to exactly one branch. Also, Mercurial branches can have multiple tips, whereas a Git branch always has a single head.
Mercurial does not provide a mechanism for renaming or deleting branches. The recommended way to get rid of unwanted branches is to rename the repository and then clone it to the original name with:
$ hg clone -r REV
History
Push and Pull
git
The basic command for getting changes from a remote repository origin is:
$ git fetch
Which branches are fetched is controlled by the fetch key in the remote section of .git/config. If the local repository was created by git clone, here is a likely value:
[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
In this case, git fetch origin connects to the remote repository and copies all of the remote branches to refs/remotes/origin. Then it adds all remote objects referred to by the remote branches to the local objects database. It also puts the remote HEAD into FETCH_HEAD. The + indicates that the refs under refs/remotes/origin should be updated even if the update is not a fast-forward.
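To make the refspec notation concrete, here is a small Python sketch of how a [+]SRC:DST pattern maps a remote ref name to a local one. It only models the single * wildcard seen in the example above; the function names are hypothetical and Git's real matching rules have more cases.

```python
def parse_refspec(spec):
    """Split '[+]SRC:DST' into (force, src, dst)."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, dst = spec.split(":", 1)
    return force, src, dst


def map_ref(spec, remote_ref):
    """Return the local ref that remote_ref is copied to, or None if no match."""
    _, src, dst = parse_refspec(spec)
    if "*" not in src:
        return dst if remote_ref == src else None
    prefix, suffix = src.split("*", 1)
    long_enough = len(remote_ref) >= len(prefix) + len(suffix)
    if long_enough and remote_ref.startswith(prefix) and remote_ref.endswith(suffix):
        # The text matched by * on the left replaces * on the right.
        matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
        return dst.replace("*", matched, 1)
    return None


fetch_spec = "+refs/heads/*:refs/remotes/origin/*"
print(map_ref(fetch_spec, "refs/heads/master"))
```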
The basic command for sending changes back to the remote repository origin is:
$ git push
Which branches are pushed is controlled by the push key in the remote section of .git/config. Here is an example entry which pushes commits on the master branch, and fails if the commits are not fast-forwards:
[remote "origin"]
push = refs/heads/master:refs/heads/master
A git pull is a git fetch followed by a git merge FETCH_HEAD, which git fetch sets to whatever was in HEAD on the remote repository.
hg
hg pull pulls changesets for all the remote branches that are also local branches unless branches are listed explicitly with the -b flag. hg pull -u is equivalent to hg pull followed by hg update. Pulling can create local branches with multiple heads, in which case an hg update will fail. An hg merge is used to merge the two heads, or an hg commit --close-branch is used to mark one of them as closed.
hg push pushes changesets for all local branches that are also remote branches unless branches are listed explicitly with the -b flag. A push which would create a branch with multiple heads will fail unless the -f flag is used. The --new-branch flag must be used to create a new branch.
Configuration Files
External Repositories
Packaging
Integrity Check and Garbage Collection
integrity check
garbage collection
Version Control
distributed | client-server | local | |||
---|---|---|---|---|---|
$ git CMD | $ hg CMD | $ svn CMD | $ cvs CMD | $ rcs CMD | |
online documentation | help | help | help | $ man cvs | $ man rcs |
repository | |||||
create new repository | init | init | init | $ mkdir RCS | |
get local copy of repository from server or existing repository | clone | clone | checkout/co | checkout/co | |
show remote repositories | remote -v show | paths | |||
add remote repository | remote add | ||||
working directory | |||||
update working directory to most recent version of a branch | checkout | update/up | update/up | update | co |
lock a file | lock | co -l | |||
unlock a file | unlock | co -u | |||
make working directory match the most recent commit | reset | revert | revert | ||
show files in working directory which don't match the most recent commit | status | status/st | status/st | status | |
show difference between file in working directory and most recent commit | diff | diff | diff/di | diff | diff |
store uncommitted working directory changes in a temporary location | stash | shelve | |||
tracking and committing | |||||
put file under version control | add | add | add | add | ci -i |
change name of a file under version control | mv | rename/mv | move/mv | ||
mark file as not present in the next commit | rm | remove/rm | delete/rm | remove | |
create new commit | commit | commit/ci | commit/ci | commit/ci | ci |
create commit which undoes the result of a previous commit | revert | backout | |||
branching and merging | |||||
create branch | branch | branch(es) | copy/cp | tag -b | ci -r |
merge branches | merge | merge | merge | ||
move commits on a branch to the end of another branch | rebase | rebase | |||
mark file with merge conflicts as resolved | add | resolve | resolve | ||
history | |||||
annotate lines of source code with commit info | blame | annotate | blame/ann | annotate | |
show commit information for current branch in reverse chronological order | log | log -b tip | log | log | log |
show difference between two commits | diff | diff | diff -rREV1 -rREV2 | ||
find commit which introduced a change | log -S | grep | |||
write contents of a file version to standard out | show | cat | cat | checkout -p | co |
give name to a commit | tag | tag(s) | copy/cp | tag | $ rcs -nTAG:REV |
pulling and pushing | |||||
show commits available to be pulled or update tracking branches | fetch | incoming/in | |||
get commits from a remote repository | pull | pull | |||
push commits to a remote repository | push | push | |||
configuration | |||||
add user information | config | ||||
sccs (1972)
- CSSC Documentation CSSC is the GNU implementation of SCCS
- The Source Code Control System Rochkind 1975
In his 1975 paper Rochkind describes SCCS as a "radical departure from conventional methods for controlling source code". SCCS was initially implemented in 1972 on the IBM 370. The implementation language was SNOBOL. Rochkind was an employee of Bell Laboratories and SCCS was soon ported to Unix where it became a cornerstone of the "Programmer's Workbench", a suite of software distributed with early Unix.
The radical departure of SCCS appears to be the decision to store every version of each file under source control. This is done in a space efficient manner by means of deltas: the original file is stored with a delta for each change. To get the most recent version of the file all of the deltas must be applied to the original file. Also stored with each delta is the name of the user who made the change, the date and time of the change, and a user supplied comment explaining the change.
SCCS introduces a file format so that the original file, the deltas, and the meta-information can all be stored in a single history file. If the original file was foo.c, a common early convention was for the history file to be named s.foo.c. In the original Unix implementation the SCCS commands were standalone Unix commands. Starting with the version of SCCS which Allman wrote for BSD Unix in 1980 the SCCS commands became arguments or subcommands to an sccs executable.
Here is a sample SCCS session. The file foo.txt is put under source control. It is then checked out, edited, and the change committed. Finally a non-editable copy of the most recent version is checked out.
$ echo "foo" > foo.txt
$ sccs admin -ifoo.txt s.foo.txt
$ rm foo.txt
$ sccs get -e s.foo.txt
$ vi foo.txt
$ sccs delta s.foo.txt
$ sccs sccsdiff -r1.1 -r1.2 s.foo.txt
$ sccs get -p s.foo.txt > foo.txt
The SCCS history file format consists of fields separated by Ctrl-A (ASCII 1) characters. The fields are divided into headers, which contain the meta-information, and the body, which contains the original file and the deltas. The original file is given revision number 1, and the number is incremented with each change.
The body consists of the original file interspersed with nested insert blocks and delete blocks. The format for an insert block is
^AI REV
added line one
added line two
...
^AE REV
where REV is the revision number for which the lines were added. Similarly the format for a delete block is
^AD REV
deleted line one
deleted line two
...
^AE REV
When extracting a version of the file, the desired version is compared with the revision number of each block. The lines in an insert block are skipped if the block has a higher number than the desired version, and the lines in a delete block are skipped if the block has a number lower than or equal to the desired version.
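The extraction rule can be sketched in Python as follows. This is a simplified model that assumes plain integer revision numbers on the control lines, as in the description above, and the sample body is invented for the illustration. It is not a parser for real SCCS history files.

```python
SOH = "\x01"  # Ctrl-A, which marks a control line


def extract(body_lines, wanted):
    """Return the lines of revision `wanted` from an SCCS-style body."""
    open_blocks = []  # (kind, rev) for every block we are currently inside
    out = []
    for line in body_lines:
        if line.startswith(SOH):
            kind, rev = line[1], int(line[2:].strip())
            if kind in ("I", "D"):
                open_blocks.append((kind, rev))
            elif kind == "E":
                # Close the most recently opened block with this revision.
                for i in range(len(open_blocks) - 1, -1, -1):
                    if open_blocks[i][1] == rev:
                        del open_blocks[i]
                        break
            continue
        # A text line is kept only if every enclosing block allows it:
        # inserts must not be newer than `wanted`, deletes must be newer.
        visible = all(
            (rev <= wanted) if kind == "I" else (rev > wanted)
            for kind, rev in open_blocks
        )
        if visible:
            out.append(line)
    return out


# Revision 1 is a, b, c; revision 2 replaces b with b2.
body = [
    SOH + "I 1", "a",
    SOH + "D 2", "b", SOH + "E 2",
    SOH + "I 2", "b2", SOH + "E 2",
    "c", SOH + "E 1",
]
print(extract(body, 1), extract(body, 2))
```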
rcs (1982)
In RCS, the history file is always identified with a ,v suffix; the history file for foo.txt is foo.txt,v. Because this convention is used consistently, RCS commands can take the original file as an argument instead of the history file like in SCCS.
One can keep the ,v files in a separate directory. RCS has built in support for using a subdirectory named RCS in the same directory as the source. When this convention is used, it is not necessary to specify both the ,v file and the source file when using rcs ci and rcs co. If the source code tree has subdirectories, each subdirectory should contain an RCS subdirectory.
RCS supports multiline commit messages and it adds the log command for getting all the commit messages for a file.
example session
Here is an example work session using RCS. It is equivalent to the SCCS work session in the previous section.
$ echo "foo" > foo.txt
$ rcs ci -i foo.txt
$ rcs co -l foo.txt
$ vi foo.txt
$ rcs ci foo.txt
$ rcs co foo.txt
make has a built-in rule for creating a file from its ,v file. The file will be checked out as read-only.
rcs file format
- man rcsfile The RCS history file format
An RCS history file has four sections:
- head
- deltas
- description
- deltatexts
The head contains the revision number of the current version. If any of the revision numbers have been assigned symbolic names (i.e. tags), they are listed here.
There is a delta section for each revision. It contains the time the revision was added to the history file and the author.
The description is a string describing the file. Strings are delimited by at signs @. An at sign in the string is escaped by doubling it.
There is a deltatext section for each revision. It contains a log, which is the commit message, and the text, which is either the full text of the revision or an ed style edit describing how to generate the revision from another revision. Both the log and the text are at sign delimited strings with at signs escaped by doubling.
Here is an example of a text which adds two lines after line 6:
@a6 2
added line one
added line two
@
Here is an example of a text which deletes two lines starting at line 6:
@d6 2
@
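A minimal Python sketch of applying such an edit script is shown below. It assumes the script has already been stripped of its @ delimiters and split into lines, that aN M adds M lines after line N, and that dN M deletes M lines starting at line N. The function name is made up for the example.

```python
def apply_edits(lines, script):
    """Apply an RCS-style edit script (a and d commands) to a list of lines.

    Line numbers in the script refer to the original text, so the commands
    are applied from the bottom up to keep the earlier numbers valid.
    """
    commands = []
    i = 0
    while i < len(script):
        op, rest = script[i][0], script[i][1:]
        start, count = (int(n) for n in rest.split())
        i += 1
        if op == "a":
            # The next `count` script lines are the text to add.
            commands.append((op, start, script[i:i + count]))
            i += count
        elif op == "d":
            commands.append((op, start, count))
        else:
            raise ValueError("unknown command: " + op)
    result = list(lines)
    for op, start, arg in reversed(commands):
        if op == "a":
            result[start:start] = arg  # insert after original line `start`
        else:
            del result[start - 1:start - 1 + arg]  # delete `arg` lines from `start`
    return result


original = ["line %d" % n for n in range(1, 9)]
added = apply_edits(original, ["a6 2", "added line one", "added line two"])
deleted = apply_edits(original, ["d6 2"])
print(added[5:9], deleted[4:6])
```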
revision numbers and branching
RCS revision numbers consist of 2n positive integers joined by periods, where n is itself positive. By default the revision number given to a file when it is first placed under version control is 1.1.
RCS revision numbers with 2 integers refer to revisions on the trunk. RCS revision numbers with 4 integers refer to revisions on branches off the trunk. The first two integers indicate the trunk revision that is the root of the branch. RCS revision numbers with 6 integers refer to revisions on branches off of a branch of the trunk.
When a commit is made, by default the last integer of the revision is incremented.
When a branch is created off of a revision that does not have a branch, the revision number of the new revision is created from the revision number of the root of the branch by appending .1.1 to it. If the root revision already has a branch off of it, the new branch revision number will have a .2.1 appended to it.
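The numbering rules above amount to a little string arithmetic. Here is a Python sketch; the function names are hypothetical and it only covers the default numbering described in the text.

```python
def next_revision(rev):
    """Increment the last integer: 1.1 becomes 1.2, 1.2.1.1 becomes 1.2.1.2."""
    parts = [int(p) for p in rev.split(".")]
    parts[-1] += 1
    return ".".join(str(p) for p in parts)


def first_revision_on_new_branch(root, existing_branches):
    """Return the first revision of a new branch rooted at `root`.

    `existing_branches` is the number of branches already rooted there.
    """
    return "%s.%d.1" % (root, existing_branches + 1)


print(next_revision("1.1"))
print(first_revision_on_new_branch("1.2", 0))
print(first_revision_on_new_branch("1.2", 1))
```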
cvs (1990)
CVS uses RCS to manage the history of each file under version control.
How to set up a CVS repository:
$ mkdir cvsroot
$ export CVSROOT=/PATH/TO/cvsroot
$ cvs init
For CVS commands to work, either the CVSROOT environment variable must be set or the location of the repository root must be passed to the command with each invocation using the -d option.
It is possible to set up a server on the repository host which listens on port 2401. The server uses password authentication, checked against the CVSROOT/passwd file in the repository or the system accounts on the host. To use the server, the client sets the CVSROOT environment variable to something like this:
$ export CVSROOT=:pserver:foo.com:/PATH/TO/cvsroot
How to create a project in the repository:
$ mkdir foo
$ cd foo
$ touch README
$ cvs import foo FOO_CORP V1
The 2nd and 3rd arguments to cvs import are required and are used to create tags.
Here is an example of how to check out the foo project but name the working directory bar:
$ cvs checkout -d bar foo
Files only need to be added once, as in Mercurial and unlike Git:
$ cd bar
$ vim Makefile
$ cvs status
$ cvs diff
$ cvs add Makefile
$ cvs commit -m 'adding a Makefile'
$ cvs log
Branching is a three step process: (1) tag the commit you are branching from, (2) create the branch, and (3) update the working directory to the branch.
$ cvs tag WUMPUS_ROOT
$ cvs tag -r WUMPUS_ROOT -b WUMPUS
$ cvs update -r WUMPUS
How to merge
svn (2000)
How to set up an SVN server:
$ svnadmin create /PATH/TO/svn
$ vim /PATH/TO/svn/conf/svnserve.conf
$ vim /PATH/TO/svn/conf/passwd
$ mkdir /tmp/empty
$ svn import /tmp/empty file:///PATH/TO/svn/NAME
$ svnserve -d -r /PATH/TO/svn --log-file /PATH/TO/svn/svnserve.log
$ svn co svn://localhost/NAME
When editing svn/conf/svnserve.conf, add these lines to the [general] section:
anon-access = none
auth-access = write
password-db = passwd
In svn/conf/passwd, create a username and password:
joe = passwerd123
To have svnserve start automatically at boot on an Ubuntu server, put this script in /etc/init.d. Change the value of DAEMON_ARGS from -d -r /usr/local/svn/repos to -d -r /PATH/TO/svn --log-file /PATH/TO/svnserve.log.
Archive and Patch Tools
diff (1974)
- An Algorithm for Differential File Comparison Hunt & McIlroy 1976
- man diff
To implement an efficient version control system one needs to find a minimal delta or difference between two similar text files. The problem led to the development of the Unix diff utility. Regarding a file as a sequence of lines, the problem can be treated as an example of the longest common subsequence problem. The standard solution to this problem has O(nm) performance in both time and space, where n and m are the lengths of the two files. To facilitate quick comparison of lines, each line is replaced with a hash code. When implementing diff McIlroy developed an algorithm that was more efficient than the standard solution in most cases.
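As an illustration of the standard O(nm) approach mentioned above (not the more efficient Hunt and McIlroy algorithm), here is a Python sketch that computes a line diff from a longest common subsequence table. Lines are compared by hash code as the text describes; a real implementation would re-check equal hashes against the actual lines to guard against collisions. The function names and the output tags are choices made for this example.

```python
def lcs_table(a, b):
    """Build the O(nm) table of LCS lengths for the suffixes a[i:] and b[j:]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(old_lines, new_lines):
    """Return (tag, line) pairs; tag is ' ' (common), '<' (old) or '>' (new)."""
    # Replace each line with a hash code so comparisons are cheap.
    a = [hash(line) for line in old_lines]
    b = [hash(line) for line in new_lines]
    table = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append((" ", old_lines[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append(("<", old_lines[i]))
            i += 1
        else:
            out.append((">", new_lines[j]))
            j += 1
    out.extend(("<", line) for line in old_lines[i:])
    out.extend((">", line) for line in new_lines[j:])
    return out


for tag, line in line_diff(["a", "b", "c"], ["a", "x", "c"]):
    print(tag, line)
```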
The standard diff notation prefixes lines with < and > to indicate whether the line originated in the first or second file. It also uses the letters a, c, and d to indicate lines being added, changed, or deleted:
$ echo "foo" > foo.txt
$ echo "bar" > bar.txt
$ diff foo.txt bar.txt
1c1
< foo
---
> bar
$ diff foo.txt /dev/null
1d0
< foo
$ diff /dev/null foo.txt
0a1
> foo
These letters used in diff notation are also ed commands. In fact, diff -e will output an ed script which can be used to convert the first file into the second:
$ diff -e foo.txt bar.txt > diff.ed
$ ( cat diff.ed ; echo "w" ) | ed foo.txt
The version of diff released with BSD 2.8 in 1981 added the -c option to show three lines of context around each change. This is called the context format.
The BSD 2.8 diff also added an -r option to perform a recursive diff on directories.
In 1990 the -u option was added, which gives a diff in unified format. In the context format, if a line is changed, the context is repeated: once around the old version of the line and once around the new. The unified format puts both versions of the line in the same context, reducing the size of the diff file.
The -C NUM and -U NUM options are like the -c and -u options, except that they show NUM lines of context.
normal format:
$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd
$ diff /etc/passwd /tmp/passwd
12c12
< root:*:0:0:System Administrator:/var/root:/bin/sh
---
> ROOT:*:0:0:System Administrator:/var/root:/bin/sh
ed script format:
$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd
$ diff -e /etc/passwd /tmp/passwd
12c
ROOT:*:0:0:System Administrator:/var/root:/bin/sh
.
context format:
$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd
$ diff -c /etc/passwd /tmp/passwd
*** /etc/passwd 2013-10-24 17:38:39.000000000 -0700
--- /tmp/passwd 2014-04-26 12:57:57.000000000 -0700
***************
*** 9,15 ****
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
! root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
--- 9,15 ----
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
! ROOT:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
unified format:
$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd
$ diff -u /etc/passwd /tmp/passwd
--- /etc/passwd 2013-10-24 17:38:39.000000000 -0700
+++ /tmp/passwd 2014-04-26 12:57:57.000000000 -0700
@@ -9,7 +9,7 @@
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
-root:*:0:0:System Administrator:/var/root:/bin/sh
+ROOT:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
recursive format:
$ mkdir /tmp/a /tmp/b
$ cp /etc/passwd /tmp/a
$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/b/passwd
$ diff -r /tmp/a /tmp/b
diff -r /tmp/a/passwd /tmp/b/passwd
12c12
< root:*:0:0:System Administrator:/var/root:/bin/sh
---
> ROOT:*:0:0:System Administrator:/var/root:/bin/sh
cpio (1977)
An ancient and to most people unfamiliar Unix archiving tool which is roughly equivalent to tar. The suffix .cpio is often used for cpio archive files.
The format is used by RPM packages, though RPM 5.0 and later also support the xar format. The Linux kernel since version 2.6 has a cpio archive called initramfs which it uses during the boot process. cpio is also used by the Mac OS X .pkg format.
The cpio file format is similar to the tar file format in that for each file which is added to an archive, a header and the file contents are appended to the archive file. In the case of cpio the header is smaller (76 bytes vs 512 bytes). This is in part because the header only contains the file name length; the actual file name is appended to the archive file between the header and the file contents. By contrast the tar format stores the name in fixed length fields, putting a limit on the possible path length. Another difference is that the cpio format lacks a checksum.
header format | |||
---|---|---|---|
offset | length | field | description |
0 | 6 | c_magic | The identifying value "070707" |
6 | 6 | c_dev | |
12 | 6 | c_ino | c_dev and c_ino together must be unique for each file in the archive |
18 | 6 | c_mode | |
24 | 6 | c_uid | |
30 | 6 | c_gid | |
36 | 6 | c_nlink | number of links to the file in the archive; can be incorrect if the -a flag was used to append files |
42 | 6 | c_rdev | a place for implementations to store character or block special file information |
48 | 11 | c_mtime | |
59 | 6 | c_namesize | |
65 | 11 | c_filesize |
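The header layout in the table can be unpacked with a few lines of Python. This sketch assumes the portable ASCII variant of the format, in which every field is an octal number written as ASCII digits and c_namesize counts a terminating NUL byte. The helper names and the sample entry are invented for the illustration.

```python
FIELDS = [  # (name, length) in the order given in the table above
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
]
HEADER_SIZE = sum(length for _, length in FIELDS)


def build_entry(name, size, mode=0o100644):
    """Build a header plus file name for a hypothetical archive member."""
    raw_name = name.encode("ascii") + b"\0"
    values = {"c_magic": 0o070707, "c_mode": mode, "c_nlink": 1,
              "c_namesize": len(raw_name), "c_filesize": size}
    header = "".join("%0*o" % (length, values.get(field, 0))
                     for field, length in FIELDS)
    return header.encode("ascii") + raw_name


def parse_entry(data):
    """Return (fields, file name) parsed from the start of `data`."""
    if data[:6] != b"070707":
        raise ValueError("not a portable ASCII cpio header")
    fields = {}
    offset = 0
    for field, length in FIELDS:
        fields[field] = int(data[offset:offset + length], 8)  # octal digits
        offset += length
    # The file name sits between the header and the file contents.
    name = data[offset:offset + fields["c_namesize"]].rstrip(b"\0")
    return fields, name.decode("ascii")


fields, name = parse_entry(build_entry("foo.txt", 4))
print(HEADER_SIZE, name, fields["c_filesize"])
```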
Another difference between tar and cpio is that whereas tar takes the files to be archived on the command line, recursively descending any arguments which are directories, cpio when used with the -o flag takes its list of files to be archived from standard input. cpio was designed to be used with the find command. Similarly when using the -i flag cpio reads the archive to be extracted from standard input.
diff3 (1979)
diff3 displays the differences between three versions of the same file.
The three way diff is the foundation of branch merging. A two way diff is insufficient for merging because deleting a line in one branch looks like adding a line in the other branch. Only by comparing both branches with the original can these two cases be distinguished.
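The reasoning above reduces to a simple decision rule for each region where the files differ. Here is a Python sketch of that rule for a single chunk of lines; it is a simplification that takes the chunk boundaries as given, and the names are made up for the example.

```python
CONFLICT = object()  # sentinel returned when both sides changed differently


def merge_chunk(orig, edit1, edit2):
    """Pick the merged content for one chunk, given all three versions."""
    if edit1 == edit2:
        return edit1      # both sides agree
    if edit1 == orig:
        return edit2      # only the second side changed
    if edit2 == orig:
        return edit1      # only the first side changed
    return CONFLICT       # both sides changed, in different ways


# The two edits look the same in a two-way comparison (one side lacks the
# line "x"), but the original tells a deletion apart from an addition.
deleted_in_edit1 = merge_chunk(["x"], [], ["x"])
added_in_edit2 = merge_chunk([], [], ["x"])
print(deleted_in_edit1, added_in_edit2)
```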
diff3 has three basic invocations:
diff3 EDIT1 ORIG EDIT2
diff3 -e EDIT1 ORIG EDIT2
diff3 -m EDIT1 ORIG EDIT2
The first invocation writes a description of the three-way diff to standard out.
The second invocation writes an ed script to standard out which will merge the changes in EDIT2 to EDIT1.
The third invocation performs the merge. It writes a version of the file with changes from both EDIT1 and EDIT2 to standard out.
Here is an example of the output format used by the first invocation:
$ cat /tmp/orig.txt
a
b
c
d
e
$ cat /tmp/edit1.txt
a
b1
c
d
e
f
$ cat /tmp/edit2.txt
a
b
c
d1
e
$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====1
1:2c
b1
2:2c
3:2c
b
====3
1:4c
2:4c
d
3:4c
d1
====1
1:6c
f
2:5a
3:5a
Each hunk of the diff3 output starts with four equals signs. All of the hunks in the example above are two-way hunks, meaning that two of the three files are the same. In this case the number of the differing file as it appears in the diff3 arguments is placed after the equals signs.
Here is an example of a three-way hunk, where all three files differ and no number is placed after the equals signs:
$ cat /tmp/orig.txt
a
$ cat /tmp/edit1.txt
a1
$ cat /tmp/edit2.txt
a2
$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====
1:1c
a1
2:1c
a
3:1c
a2
ar (1979)
A tool on Unix systems to create static libraries from compiled objects. In other words, to create a .a file from a set of .o files. The format is understood by the linker—which these days is usually built into the compiler—and the loader ld.
The command line interface is broadly similar to tar. Here is how to create an archive; remove files from an archive; list the archive contents; extract files from an archive:
ar -rc NAME.a FILE ...
ar -d ARCHIVE FILE ...
ar -t ARCHIVE
ar -x ARCHIVE FILE ...
The ar file format is not standardized and may differ between systems.
The file format used by GNU ar on Linux starts with the new line terminated string "!<arch>".
Each file starts with a 60-byte header, followed by the file contents. The header has the following fixed-width fields:
offset | length | name |
---|---|---|
0 | 16 | file name in ASCII |
16 | 12 | file modification timestamp |
28 | 6 | uid |
34 | 6 | gid |
40 | 8 | file mode |
48 | 10 | file size in bytes |
58 | 2 | 0x60 0x0A |
The space allocated for the file name in the header is quite short. GNU ar actually stores a special file named "//" in the archive with a new line separated list of file names. A header can reference a name in this special file by storing "/" and the decimal offset in the "//" file of the file name. When file names are stored directly in the header, a "/" is used to mark the end of the file name and the rest of the field is space padded. This supports spaces in the file name.
GNU ar also stores a special file named "/" in the archive for a symbol table. The format is
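Here is a Python sketch that walks the 60-byte headers of a small archive built in memory. It assumes the numeric fields are space-padded ASCII (decimal, except the mode which is octal) and that member data is padded with a new line to an even length, a detail not covered in the table above. Long names stored in the "//" file are not resolved, and the helper names are hypothetical.

```python
AR_MAGIC = b"!<arch>\n"


def make_member(name, content, mtime=0, uid=0, gid=0, mode=0o644):
    """Build one archive member: 60-byte header, contents, optional padding."""
    header = "%-16s%-12d%-6d%-6d%-8o%-10d" % (
        name + "/", mtime, uid, gid, mode, len(content))
    padding = b"\n" if len(content) % 2 else b""
    return header.encode("ascii") + b"`\n" + content + padding


def read_members(data):
    """Return a list of (name, mode, contents) tuples from an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("missing ar magic string")
    pos = len(AR_MAGIC)
    members = []
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad header terminator")
        name = header[0:16].decode("ascii").rstrip()
        if name.endswith("/") and name not in ("/", "//"):
            name = name[:-1]  # drop the "/" that marks the end of the name
        mode = int(header[40:48].decode("ascii").strip(), 8)
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, mode, data[pos + 60:pos + 60 + size]))
        pos += 60 + size + (size % 2)  # skip the padding byte, if any
    return members


archive = AR_MAGIC + make_member("a.o", b"abc") + make_member("b.o", b"wxyz")
members = read_members(archive)
print([name for name, _, _ in members])
```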
- a 32-bit integer containing the number of symbols
- a list of 32-bit integers, one for each symbol, containing the offset of the header in the archive for the file containing the symbol
- a list of null terminated strings, in the same order as the previous list, containing the symbol names
tar (1979)
The more portable twin of ar. Originally used for creating and using magnetic tape archives.
How to create a tar file; list the contents of a tar file; compare a tar file with the file system; and extract the contents of a tar file:
tar [-]cf NAME.tar DIR
tar [-]tf TARFILE
tar [-]df TARFILE [DIR]
tar [-]xf TARFILE
The -v option can be used with -c or -x to write the files being added or extracted to standard error.
Tar files store the files in sequential order. Each file is preceded by a 512 byte header. The file itself is null byte padded to a multiple of 512 bytes.
Tar can write to stdout and read from stdin. The following two invocations behave identically:
tar cf - . | (cd DIR ; tar xf -)
tar cf - . | tar xf - -C DIR
Tar can append data to an existing tar file. These commands append the contents of a directory to a tar file; append the contents of the directory which are newer than what is already on a tarfile; append subsequent tar files to the first tar file:
tar [-]rf TARFILE DIR
tar [-]uf TARFILE DIR
tar [-]Af TARFILE1 TARFILE2 ...
How to create a compressed tar file:
tar [-]czf NAME.tar.gz DIR
tar [-]cjf NAME.tar.bz2 DIR
tar [-]cJf NAME.tar.xz DIR
In 1988 POSIX extended the format of the header block in a backwardly compatible way. Additional header type flags were added in 2001.
header format | |||
---|---|---|---|
offset | length | original format | ustar |
0 | 100 | file name | |
100 | 8 | file mode | |
108 | 8 | owner user id | |
116 | 8 | group id | |
124 | 12 | file size in bytes | |
136 | 12 | last modification time | |
148 | 8 | header checksum | |
156 | 1 | type flag | |
157 | 100 | name of linked file | |
257 | 6 | | "ustar" |
263 | 2 | | "00" |
265 | 32 | | owner user name |
297 | 32 | | group name |
329 | 8 | | device major number |
337 | 8 | | device minor number |
345 | 155 | | filename prefix |
header type flags | |||
---|---|---|---|
flag | original meaning | ustar | 2001 |
'\0' | normal file | ||
'0' | normal file | ||
'1' | hard link | ||
'2' | symlink | ||
'3' | character device | ||
'4' | block device | ||
'5' | directory | ||
'6' | FIFO | ||
'7' | contiguous file | ||
'g' | global extended header | ||
'x' | extended header for the next file |
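The header table can be checked against a real archive with Python's tarfile module. The sketch below writes a one-file ustar archive into memory and reads fields at the offsets listed above. It assumes numeric fields are NUL-terminated octal strings and that the checksum is the sum of the header bytes with the checksum field itself counted as eight spaces. The helper names are made up for the example.

```python
import io
import tarfile


def text_field(block, offset, length):
    """Return a header field as text, cut at the first NUL byte."""
    return block[offset:offset + length].split(b"\0", 1)[0].decode("ascii")


def octal_field(block, offset, length):
    """Return a numeric header field, which is stored as octal digits."""
    return int(text_field(block, offset, length).strip() or "0", 8)


def checksum(block):
    """Sum the header bytes, counting the checksum field as eight spaces."""
    return sum(block[:148] + b" " * 8 + block[156:512])


# Write a single small file into an in-memory ustar archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    data = b"foo\n"
    info = tarfile.TarInfo("foo.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

block = buf.getvalue()[:512]  # the header of the first file
print(text_field(block, 0, 100), octal_field(block, 124, 12))
print(octal_field(block, 148, 8) == checksum(block))
print(text_field(block, 257, 6), block[156:157])
```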
patch (1985)
The patch command can apply the output of diff to the file that was the first argument of diff to recover the file that was the second argument of diff. patch reads the output of diff from standard input:
$ echo "foo" > foo.txt
$ echo "bar" > bar.txt
$ diff foo.txt bar.txt > foo.patch
$ patch foo.txt < foo.patch
patching file foo.txt
$ cat foo.txt
bar
The above is only a slight improvement over what could have been achieved with diff -e and ed. The novelty of patch is its ability to apply a patch file to an entire directory:
$ mkdir old
$ echo "bar" > old/bar.txt
$ echo "baz" > old/baz.txt
$ cp -R old new
$ echo "qux" > new/bar.txt
$ diff -Naur old new > foo.patch
$ rm -rf new
$ patch -Np0 < foo.patch
patching file old/bar.txt
$ cat old/bar.txt
qux
This is a good way to create a patch file:
diff -Naur OLD NEW
When creating the patch file with diff, the -u or -c flags seem to be necessary so that patch has the file names. The -N flag is necessary if files are added or removed. The -a flag prevents diff from skipping files which it thinks are binary.
If the diff was performed outside of the directories, then the patch should be performed outside of the directory to be patched with the -p0 flag. Optionally the patch can be performed inside the directory to be patched with the -p1 flag. The -N flag instructs patch to not make a change if the patch appears to be reversed or already applied.
zip (1989)
zip combines file compression and archiving. It is a better choice for sharing files with Windows hosts than tar, which most Windows hosts don't have installed.
zip [-r] [-0] ARCHIVE FILE ...
zip -d ARCHIVE FILE ...
zip -u ARCHIVE [FILE ...]
unzip -l ARCHIVE
unzip ARCHIVE [FILE ...]
Compression is the DEFLATE algorithm, or no compression if the -0 flag is used.
zip stores the file name, file size, and last modification time of the file. The information is in a header which precedes the file itself and in the "central directory" at the end of the file.
By default zip does not recursively descend directories, adding their contents to the archive. Use the -r flag to get this behavior.
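The metadata that zip records can be inspected with Python's zipfile module, which reads the central directory at the end of the file. This sketch builds a small archive in memory with one DEFLATE-compressed entry and one stored entry; the file names and contents are invented for the example.

```python
import io
import zipfile

# Build an archive in memory: one compressed entry and one stored entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("foo.txt", "foo\n" * 100)
    zf.writestr("raw.txt", "bar\n", compress_type=zipfile.ZIP_STORED)

# infolist() returns one record per entry in the central directory.
with zipfile.ZipFile(buf) as zf:
    entries = zf.infolist()
    for entry in entries:
        print(entry.filename, entry.file_size, entry.compress_size,
              entry.date_time, entry.header_offset)

# Each entry is also preceded by a local header, which starts with "PK".
first_bytes = buf.getvalue()[:4]
print(first_bytes)
```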
diffstat (1992)
Summarizes a recursive diff on two directories. It lists the files that were modified, added, or deleted with the number of lines which changed.
jar (1995)
jar supports some of the tar commands:
jar cf NAME.jar DIR
jar tf JARFILE
jar xf JARFILE
jar uf JARFILE DIR
jar can write to stdout and read from stdin; the syntax is different from tar:
jar c . | (cd DIR ; jar x)
jar c . | jar x -C DIR
Use jar -e to make a jar file runnable by java. The argument to -e is a class with a main routine which will be used as the entry point.
$ mkdir foo
$ cat > foo/A.java
package foo;
public class A {
public static void main(String[] args) {
System.out.println("A");
}
}
$ sed s/A/B/ foo/A.java > foo/B.java
$ javac foo/*.java
$ jar cef foo.A foo.jar foo
$ java -jar foo.jar
A
A jar file is a zip file; unzip can also be used to extract the contents. jar stores extra information about the jar file in META-INF/MANIFEST.MF:
$ unzip foo.jar
$ cat META-INF/MANIFEST.MF
Manifest-Version: 1.0
Created-By: 1.6.0_26 (Sun Microsystems Inc.)
Main-Class: foo.A
rsync (1996)
A tool for copying files and directories between hosts. Usually it uses ssh. It is faster than scp when some of the files are already on the destination or when copying files that have been modified.
Here is the usage for putting and getting:
rsync -a PATH ... HOST:PATH
rsync -a HOST:'PATH ...' PATH
The -a flag is equivalent to the flags -rptoglD which (1) recursively copy the contents of directories, (2) copy file permissions, (3) copy file times, (4) copy owner, (5) copy group, (6) copy symlinks, and (7) copy special devices.
Other useful flags are -v for verbose mode and --exclude which takes a file glob pattern to specify files to skip.
If the source is a directory with no trailing slash, rsync will create a directory with the same name as the source inside the target. Putting a trailing slash / on the end of the source suppresses this behavior: rsync will then copy the contents of the source into the target.
rsync can be used to back up a directory on a remote host. With the --backup flag, files which are already on the destination but have been modified on the source are kept as backups, renamed with a tilde (~) suffix, instead of being overwritten. The --backup-dir flag can be used to store the backups in a separate incremental directory.
colordiff (2002)
A version of diff which colorizes the output. It takes the same options as diff.