Description:
Here are some of the conventions we use in the
Metamath Proof Explorer (aka "set.mm"), and how they correspond to
typical textbook language (skipping the many cases
where they are identical).
For conventions related to labels, see conventions-label 27259.
- Notation.
Where possible, the notation attempts to conform to modern
conventions, with variations due to our choice of the axiom system
or to make proofs shorter. However, our notation is strictly
sequential (left-to-right). For example, summation is written in the
form Σ𝑘 ∈ 𝐴 𝐵 (df-sum 14417), which denotes that the index
variable 𝑘 ranges over 𝐴 when evaluating 𝐵. Thus,
Σ𝑘 ∈ ℕ (1 / (2↑𝑘)) = 1 means 1/2 + 1/4 + 1/8 + ...
= 1 (geoihalfsum 14614).
The notation is usually explained in more detail when first introduced.
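As an informal check of this reading of geoihalfsum 14614, the Python sketch below computes the finite partial sums of 1/2^k for k from 1 to n using exact fractions. It is an illustration outside Metamath, and the helper name partial_sum is hypothetical. Each partial sum equals 1 − 1/2^n, which is why the infinite sum over ℕ (which starts at 1) is 1.

```python
from fractions import Fraction


def partial_sum(n: int) -> Fraction:
    """Sum of 1/2**k for k = 1..n: a finite analogue of Σk ∈ ℕ (1 / (2↑k))."""
    # The index k ranges over 1..n while the body 1/2**k is evaluated.
    return sum((Fraction(1, 2**k) for k in range(1, n + 1)), Fraction(0))


for n in (1, 2, 3, 10):
    s = partial_sum(n)
    print(n, s, 1 - s)  # the gap to 1 is exactly 1/2**n
```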
- Axiomatic assertions ($a).
All axiomatic assertions ($a statements)
starting with " ⊢ " have labels starting
with "ax-" (axioms) or "df-" (definitions). A statement with a
label starting with "ax-" corresponds to what is traditionally
called an axiom. A statement with a label starting with "df-"
introduces new symbols or a new relationship among symbols
that can be eliminated; it always extends the definition of
a wff or class. Metamath blindly treats $a statements as new
given facts but does not try to justify them. The mmj2 program
will justify the definitions as sound as discussed below,
except for 4 definitions (df-bi 197, df-cleq 2615, df-clel 2618, df-clab 2609)
that require a more complex metalogical justification by hand.
- Proven axioms.
In some cases we wish to treat an expression as an axiom in
later theorems, even though it can be proved. For example,
we derive the postulates or axioms of complex arithmetic as
theorems of ZFC set theory. For convenience, after deriving
the postulates, we reintroduce them as new axioms on
top of set theory. This lets us easily identify which axioms
are needed for a particular complex number proof, without the
obfuscation of the set theory used to derive them. For more, see
mmcomplex.html. When we wish
to use a previously-proven assertion as an axiom, our convention
is that we use the
regular "ax-NAME" label naming convention to define the axiom,
but we precede it with a proof of the same statement with the label
"axNAME". An example is complex arithmetic axiom ax-1cn 9994,
proven by the preceding theorem ax1cn 9970.
When the names match in this way, the metamath.exe program will warn
if the axiom does not match the preceding theorem that justifies it.
- Definitions (df-...).
We encourage definitions to include hypertext links to proven examples.
- Statements with hypotheses. Many theorems and some axioms,
such as ax-mp 5, have hypotheses that must be satisfied in order for
the conclusion to hold, in this case min and maj. When presented in
summarized form such as in the Theorem List (click on "Nearby theorems"
on the ax-mp 5 page), the hypotheses are connected with an ampersand and
separated from the conclusion with a big arrow, such as in " ⊢ 𝜑
& ⊢ (𝜑 → 𝜓) ⇒ ⊢ 𝜓". These symbols are _not_
part of the Metamath language but are just informal notation meaning
"and" and "implies".
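The same rule can be written in Lean 4, used here purely as an analogy from another prover (plain Lean 4, no Mathlib). The named hypotheses min and maj play the roles of the two hypotheses of ax-mp 5, and the conclusion follows by applying the implication. The theorem name axmp is hypothetical. Also, ax-mp 5 is an axiom in set.mm, whereas in Lean the rule is provable.

```lean
-- Analogy only: Lean 4 counterpart of  ⊢ φ  &  ⊢ (φ → ψ)  ⇒  ⊢ ψ.
-- The binder names min and maj mirror the hypothesis names of ax-mp.
theorem axmp {φ ψ : Prop} (min : φ) (maj : φ → ψ) : ψ :=
  maj min
```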
- Discouraged use and modification.
If something should only be used in limited ways, it is marked with
"(New usage is discouraged.)". This is used, for example, when something
can be constructed in more than one way, and we do not want later
theorems to depend on that specific construction.
This marking is also used if we want later proofs to use proven axioms.
For example, we want later proofs to
use ax-1cn 9994 (not ax1cn 9970) and ax-1ne0 10005 (not ax1ne0 9981), as these
are proven axioms for complex arithmetic. Thus, both
ax1cn 9970 and ax1ne0 9981 are marked as "(New usage is discouraged.)".
In some cases a proof should not normally be changed, e.g., when it
demonstrates some specific technique.
These are marked with "(Proof modification is discouraged.)".
- New definitions infrequent.
Typically, we are minimalist when introducing new definitions; they are
introduced only when a clear advantage becomes apparent for reducing
the number of symbols, shortening proofs, etc. We generally avoid
the introduction of gratuitous definitions because each one requires
associated theorems and additional elimination steps in proofs.
For example, we use only < and ≤ for inequality expressions (rather than
also defining > and ≥), and use ((sin‘(i · 𝐴)) / i) instead of (sinh‘𝐴)
for the hyperbolic sine.
- Minimizing axioms and the axiom of choice.
We prefer proofs that depend on fewer and/or weaker axioms,
even if the proofs are longer. In particular, we prefer proofs that do
not use the axiom of choice (df-ac 8939) where such proofs can be found.
The axiom of choice is widely accepted, and ZFC is the most
commonly-accepted fundamental set of axioms for mathematics.
However, there have been and still are some lingering controversies
about the axiom of choice. Therefore, where a proof
does not require the axiom of choice, we prefer that proof instead.
E.g., our proof of the Schroeder-Bernstein Theorem (sbth 8080)
does not use the axiom of choice.
In some cases, the weaker axiom of countable choice (ax-cc 9257)
or axiom of dependent choice (ax-dc 9268) can be used instead.
Similarly, any theorem in first order logic (FOL) that
contains only set variables that are all mutually distinct,
and has no wff variables, can be proved *without* using
ax-10 2019 through ax-13 2246, by invoking ax10w 2006 through ax13w 2013.
We encourage proving theorems *without* ax-10 2019 through ax-13 2246
and moving them up to the ax-4 1737 through ax-9 1999 section.
- Alternative (ALT) proofs.
If a different proof is significantly shorter or clearer but
uses more or stronger axioms, we prefer to make that proof an
"alternative" proof (marked with an ALT label suffix), even if
this alternative proof was formalized first.
We then make the proof that requires fewer axioms the main proof.
This has the effect of reducing (over time)
the number and strength of axioms used by any particular proof.
There can be multiple alternatives if it makes sense to do so.
Alternative (*ALT) theorems should have "(Proof modification is
discouraged.) (New usage is discouraged.)" in their comment and should
follow the main statement, so that people reading the text in order will
see the main statement first. The alternative and main statement
comments should use hyperlinks to refer to each other (so that a reader
of one will become easily aware of the other).
- Alternative (ALTV) versions.
If a theorem or definition is an alternative/variant of an already
existing theorem or definition, its label should be the same name
with the suffix ALTV. Such alternatives should be temporary only, until it
is decided which alternative should be used in the future. Alternative
(*ALTV) theorems or definitions are usually contained in mathboxes.
Their comments need not contain "(Proof modification is discouraged.)
(New usage is discouraged.)". Alternative statements should follow the
main statement, so that people reading the text in order will see the
main statement first.
- Old (OLD) versions or proofs.
If a proof, definition, axiom, or theorem is going to be removed,
we often stage that change by first renaming its
label with an OLD suffix (to make it clear that it is going to
be removed). Old (*OLD) statements should have "(Proof modification is
discouraged.) (New usage is discouraged.)" and "Obsolete version of
~ xxx as of dd-mmm-yyyy." (not enclosed in parentheses) in the comment.
An old statement should follow the main statement, so that people
reading the text in order will see the main statement first.
This typically happens when a shorter proof to an existing theorem is
found: the existing theorem is kept as an *OLD statement for one year.
When a proof is shortened automatically (using Metamath's minimize_with
command), then it is not necessary to keep the old proof, nor to add
credit for the shortening.
- Variables.
Propositional variables (variables for well-formed formulas or wffs) are
represented with lowercase Greek letters and are normally used
in this order:
𝜑 = phi, 𝜓 = psi, 𝜒 = chi, 𝜃 = theta,
𝜏 = tau, 𝜂 = eta, 𝜁 = zeta, and 𝜎 = sigma.
Individual setvar variables are represented with lowercase Latin letters
and are normally used in this order:
𝑥, 𝑦, 𝑧, 𝑤, 𝑣, 𝑢, and 𝑡.
Variables that represent classes are often represented by
uppercase Latin letters:
𝐴, 𝐵, 𝐶, 𝐷, 𝐸, and so on.
There are other symbols that also represent class variables and suggest
specific purposes, e.g., 0 for poset zero (see p0val 17041) and
connective symbols such as + for some group addition operation.
(See prdsplusgval 16133 for an example of the use of +).
Class variables are selected in alphabetical order starting
from 𝐴 if there is no reason to do otherwise, but many
assertions select different class variables or a different order
to make their intended meaning clearer.
- Turnstile.
"⊢ ", meaning "It is provable that," is the first token
of all assertions
and hypotheses that aren't syntax constructions. This is a standard
convention in logic. For us, it also prevents any ambiguity with
statements that are syntax constructions, such as "wff ¬ 𝜑".
- Biconditional (↔).
There are basically two ways to maximize the effectiveness of
biconditionals (↔):
you can either have one-directional simplifications of all theorems
that produce biconditionals, or you can have one-directional
simplifications of theorems that consume biconditionals.
Some tools (like Lean) follow the first approach, but set.mm follows
the second approach. Practically, this means that in set.mm, for
every theorem that uses an implication in the hypothesis, like
ax-mp 5, there is a corresponding version with a biconditional or a
reversed biconditional, like mpbi 220 or mpbir 221. We prefer this
second approach because the number of duplications in the second
approach is bounded by the size of the propositional calculus section,
which is much smaller than the number of possible theorems in all later
sections that produce biconditionals. So although theorems like
biimpi 206 are available, in most cases there is already a theorem that
combines it with your theorem of choice, like mpbir2an 955, sylbir 225,
or 3imtr4i 281.
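A Lean 4 sketch (an analogy only; these are not set.mm statements or an existing Lean library API) shows what the consuming-side theorems amount to. Each takes a biconditional as its major hypothesis and uses one direction of it, so a user who has proved (𝜑 ↔ 𝜓) does not need a separate step such as biimpi 206. The names mpbi and mpbir simply mirror the set.mm labels; Iff.mp and Iff.mpr are the standard Lean 4 projections of a biconditional.

```lean
-- Analogy only: biconditional variants of modus ponens, mirroring set.mm's
-- mpbi  (⊢ φ & ⊢ (φ ↔ ψ) ⇒ ⊢ ψ)  and  mpbir (⊢ ψ & ⊢ (φ ↔ ψ) ⇒ ⊢ φ).
theorem mpbi {φ ψ : Prop} (min : φ) (maj : φ ↔ ψ) : ψ :=
  maj.mp min   -- use the left-to-right direction

theorem mpbir {φ ψ : Prop} (min : ψ) (maj : φ ↔ ψ) : φ :=
  maj.mpr min  -- use the right-to-left direction
```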
- Substitution.
"[𝑦 / 𝑥]𝜑" should be read "the wff that results from the
proper substitution of 𝑦 for 𝑥 in wff 𝜑." See df-sb 1881
and the related df-sbc 3436 and df-csb 3534.
- Is-a-set.
"𝐴 ∈ V" should be read "Class 𝐴 is a set (i.e. exists)."
This is a convention based on Definition 2.9 of [Quine] p. 19.
See df-v 3202 and isset 3207.
However, instead of using 𝐼 ∈ V in the antecedent of a theorem for
some variable 𝐼, we now prefer to use 𝐼 ∈ 𝑉 (or another
variable if 𝑉 is not available) to make it more general. That way we
can often avoid needing extra uses of elex 3212 and syl 17 in the common
case where 𝐼 is already a member of something.
For hypotheses ($e statement) of theorems (mostly in inference form),
however, ⊢ 𝐴 ∈ V is used rather than ⊢ 𝐴 ∈ 𝑉 (e.g.
difexi 4809). This is because 𝐴 ∈ V is almost always satisfied using
an existence theorem stating "... ∈ V", and a hard-coded V in
the $e statement saves a couple of syntax building steps that substitute
V into 𝑉. Notice that this does not hold for hypotheses of
theorems in deduction form: there, ⊢ (𝜑 → 𝐴 ∈ 𝑉) should still be
used rather than ⊢ (𝜑 → 𝐴 ∈ V).
- Converse.
"◡𝑅" should be read "converse of (relation) 𝑅"
and is the same as the more standard notation R^{-1}
(the standard notation is ambiguous). See df-cnv 5122.
This can be used to define a subset; e.g., df-tan 14802 expresses
"the set of values whose cosine is a nonzero complex number" as
(◡cos “ (ℂ ∖ {0})).
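To make the df-tan idiom concrete, the Python sketch below models a finite relation as a set of ordered pairs and computes the converse and the image of a set under a relation. The toy function f (x ↦ x mod 3 on 0..8) and the helper names are hypothetical illustrations. The point is that taking the image of B under the converse of a function picks out exactly the arguments whose values lie in B, which is how (◡cos “ (ℂ ∖ {0})) describes the points where the cosine is nonzero.

```python
def converse(r):
    """◡R: swap every pair <x, y> to <y, x>."""
    return {(y, x) for (x, y) in r}


def image(r, a):
    """(R “ A): every y such that x R y for some x in A."""
    return {y for (x, y) in r if x in a}


# Toy function given by its graph: x ↦ x mod 3 on 0..8.
f = {(x, x % 3) for x in range(9)}

# (◡f “ {1, 2}): the arguments whose value is 1 or 2, i.e. whose value is nonzero.
nonzero_args = image(converse(f), {1, 2})
print(sorted(nonzero_args))  # [1, 2, 4, 5, 7, 8]
```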
- Function application.
"(𝐹‘𝑥)" should be read "the value
of function 𝐹 at 𝑥" and has the same meaning as the more
familiar but ambiguous notation F(x). For example,
(cos‘0) = 1 (see cos0 14880). The left apostrophe notation
originated with Peano and was adopted in Definition *30.01 of
[WhiteheadRussell] p. 235, Definition 10.11 of [Quine] p. 68, and
Definition 6.11 of [TakeutiZaring] p. 26. See df-fv 5896.
In the ASCII (input) representation there are spaces around the grave
accent; there is a single accent when it is used directly,
and it is doubled within comments.
- Infix and parentheses.
When a function that takes two classes and produces a class
is applied as part of an infix expression, the expression is always
surrounded by parentheses (see df-ov 6653).
For example, the + in (2 + 2); see 2p2e4 11144.
Function application is itself an example of this.
Similarly, predicate expressions
in infix form that take two or three wffs and produce a wff
are also always surrounded by parentheses, such as
(𝜑 → 𝜓), (𝜑 ∨ 𝜓), (𝜑 ∧ 𝜓), and
(𝜑 ↔ 𝜓)
(see wi 4, df-or 385, df-an 386, and df-bi 197 respectively).
In contrast, a binary relation (which compares two _classes_ and
produces a _wff_) applied in an infix expression is _not_
surrounded by parentheses.
This includes set membership 𝐴 ∈ 𝐵 (see wel 1991),
equality 𝐴 = 𝐵 (see df-cleq 2615),
subset 𝐴 ⊆ 𝐵 (see df-ss 3588), and
less-than 𝐴 < 𝐵 (see df-lt 9949). For the general definition
of a binary relation in the form 𝐴𝑅𝐵, see df-br 4654.
For example, 0 < 1 (see 0lt1 10550) does not use parentheses.
- Unary minus.
The symbol - is used to indicate a unary minus, e.g., -1.
It is specially defined because it is so commonly used.
See cneg 10267.
- Function definition.
Functions are typically defined by first defining the constant symbol
(using $c) and declaring that its symbol is a class with the
label cNAME (e.g., ccos 14795).
The function is then defined labeled df-NAME; definitions
are typically given using the maps-to notation (e.g., df-cos 14801).
Typically, there are also theorems such as its
closure labeled NAMEcl (e.g., coscl 14857), its
function application form labeled NAMEval (e.g., cosval 14853),
and at least one simple value (e.g., cos0 14880).
- Factorial.
The factorial function is traditionally a postfix operation,
but we treat it as a normal function applied in prefix form, e.g.,
(!‘4) = ;24 (df-fac 13061 and fac4 13068).
- Unambiguous symbols.
A given symbol has a single unambiguous meaning in general.
Thus, where the literature might use the same symbol with different
meanings, here we use different (variant) symbols for different
meanings. These variant symbols often have suffixes, subscripts,
or underlines to distinguish them. For example, here
"0" always means the value zero (df-0 9943), while
"0g" is the group identity element (df-0g 16102),
"0." is the poset zero (df-p0 17039),
"0𝑝" is the zero polynomial (df-0p 23437),
"0vec" is the zero vector in a normed subcomplex vector space
(df-0v 27453), and
"0" is a class variable for use as a connective symbol
(this is used, for example, in p0val 17041).
There are other class variables used as connective symbols
where traditional notation would use ambiguous symbols, including
"1", "+", "∗", and "∥".
These symbols are very similar to traditional notation, but because
they are different symbols they eliminate ambiguity.
- ASCII representation of symbols.
We must have an ASCII representation for each symbol.
We generally choose short sequences, ideally digraphs, and generally
choose sequences that vaguely resemble the mathematical symbol.
Here are some of the conventions we use when selecting an
ASCII representation.
We generally do not include parentheses inside a symbol because
that confuses text editors (such as emacs).
Greek letters for wff variables always use the first two letters
of their English names, making them easy to type and easy to remember.
Symbols that almost look like letters, such as ∀,
are often represented by that letter followed by a period.
For example, "A." is used to represent ∀,
"e." is used to represent ∈, and
"E." is used to represent ∃.
Single letters are now always variable names, so constants that are
often shown as single letters are now typically preceded with "_"
in their ASCII representation, for example,
"_i" is the ASCII representation for the imaginary unit i.
A script font constant is often the letter
preceded by "~" meaning "curly", such as "~P" to represent
the power class 𝒫.
Originally, all setvar and class variables used only single letters
a-z and A-Z, respectively. A big change in recent years was to
allow the use of certain symbols as variable names to make formulas
more readable, such as a variable representing an additive group
operation. The convention is to take the original constant token
(in this case "+" which means complex number addition) and put
a period in front of it to result in the ASCII representation of the
variable ".+", shown as +, that can
be used instead of say the letter "P" that had to be used before.
Choosing tokens for more advanced concepts that have no standard
symbols but are represented by words in books is hard. A few are
reasonably obvious, like "Grp" for group and "Top" for topology,
but often they seem to end up being either too long or too
cryptic. It would be nice if the math community came up with
standardized short abbreviations for English math terminology,
like they have more or less done with symbols, but that probably
won't happen any time soon.
Another informal convention that we've somewhat followed, that is also
not uncommon in the literature, is to start tokens with a
capital letter for collection-like objects and lower case for
function-like objects. For example, we have the collections On
(ordinal numbers), Fin, Prime, Grp, and we have the functions sin,
tan, log, sup. Predicates like Ord and Lim also tend to start
with upper case, but in a sense they are really collection-like,
e.g. Lim indirectly represents the collection of limit ordinals,
but it can't be an actual class since not all limit ordinals
are sets.
This initial capital vs. lower case letter convention is sometimes
ambiguous. In the past there's been a debate about whether
domain and range are collection-like or function-like, thus whether
we should use Dom, Ran or dom, ran. Both are used in the literature.
In the end dom, ran won out for aesthetic reasons
(Norm Megill simply felt they looked nicer).
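A small, deliberately incomplete lookup table makes these ASCII conventions concrete. The Python sketch below covers only the tokens named in this section plus the eight wff variables; the dictionary and the render helper are hypothetical. Real set.mm typesetting is driven by its own typesetting definitions, not by code like this. Unknown tokens (such as "->" here) are passed through unchanged.

```python
# Illustrative subset of ASCII tokens and the symbols they display as.
ASCII_TO_SYMBOL = {
    # wff variables: first two letters of the Greek letter's English name
    "ph": "𝜑", "ps": "𝜓", "ch": "𝜒", "th": "𝜃",
    "ta": "𝜏", "et": "𝜂", "ze": "𝜁", "si": "𝜎",
    # symbols that look like letters: the letter followed by a period
    "A.": "∀", "e.": "∈", "E.": "∃",
    # constants often shown as single letters get a "_" prefix
    "_i": "i",
    # script constants: "~" for "curly"
    "~P": "𝒫",
    # a class variable derived from the constant "+" by prefixing "."
    ".+": "+",
}


def render(statement: str) -> str:
    """Render a whitespace-separated ASCII statement, leaving unknown tokens as-is."""
    return " ".join(ASCII_TO_SYMBOL.get(tok, tok) for tok in statement.split())


print(render("A. x ( x e. A -> ph )"))  # ∀ x ( x ∈ A -> 𝜑 )
```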
- Typography conventions.
Class symbols for functions (e.g., abs, sin)
should usually not have leading or trailing blanks in their
HTML/LaTeX representation.
This is in contrast to class symbols for operations
(e.g., gcd, sadd, eval), which usually do
include leading and trailing blanks in their representation.
If a class symbol is used for a function as well as an operation
(according to the definition df-ov 6653, each operation value can be
written as function value of an ordered pair), the convention for its
primary usage should be used, e.g. (iEdg‘𝐺) versus
(𝑉iEdg𝐸) for the edges of a graph 𝐺 = 〈𝑉, 𝐸〉.
- Number construction independence.
There are many ways to model complex numbers.
After deriving the complex number postulates we
reintroduce them as new axioms on top of set theory.
This lets us easily identify which axioms are needed
for a particular complex number proof, without the obfuscation
of the set theory used to derive them.
This also lets us be independent of the specific construction,
which we believe is valuable.
See mmcomplex.html for details.
Thus, for example, we don't allow the use of ∅ ∉ ℂ,
as handy as that would be, because that would be
construction-specific. We want proofs about ℂ to be independent
of whether or not ∅ ∈ ℂ.
- Minimize hypotheses
(except for construction independence and number theorem domains).
In most cases we try to minimize hypotheses, that is,
we eliminate or reduce what must be true to prove something, so that
the proof is more general and easier to use.
There are exceptions. For example, we intentionally add hypotheses
if they help make proofs independent of a particular construction
(e.g., the construction of complex numbers ℂ).
We also intentionally add hypotheses for many real and complex
number theorems to expressly state their domains even when they
are not strictly needed. For example, we could show that
(𝐴 < 𝐵 → 𝐵 ≠ 𝐴) without any other hypotheses, but in
practice we also require proving at least some domains
(e.g., see ltnei 10161). Here are the reasons as discussed in
https://groups.google.com/g/metamath/c/2AW7T3d2YiQ:
- Having the hypotheses immediately shows the intended domain of
applicability (is it ℝ, ℝ*, ω, or something else?),
without having to trace back to definitions.
- Having the hypotheses forces the theorem to be used in its intended
domain, which generally is desirable.
- Without the hypotheses, the theorems depend on accidental behavior of
definitions outside of their domains, so they are non-portable and
"brittle".
- Only a few theorems can have their hypotheses removed
in this fashion due to happy coincidences for our particular
set-theoretical definitions. The poor user (especially a
novice learning real number arithmetic) is going to be
confused not knowing when hypotheses are needed and when
they are not. For someone who hasn't traced back the
set-theoretical foundations of the definitions, it is
seemingly random and isn't intuitive at all.
- The consensus of opinion of people on this group seemed to be
against doing this.
- Natural numbers.
There are different definitions of "natural" numbers in the literature.
We use ℕ (df-nn 11021) for the set of positive integers starting
from 1, and ℕ0 (df-n0 11293) for the set of nonnegative integers
starting at zero.
- Decimal numbers.
Numbers larger than nine are often expressed in base 10 using the
decimal constructor df-dec 11494, e.g., ;;;4001 (see 4001prm 15852
for a proof that 4001 is prime).
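The decimal constructor nests: ;AB stands for 10·A + B, so ;;;4001 is read as ;(;(;4 0) 0) 1. The Python sketch below evaluates that nesting and checks primality by trial division. The helper names dec and is_prime are hypothetical, and restricting B to one digit is a choice made for this sketch.

```python
def dec(a: int, b: int) -> int:
    """Decimal constructor ;AB read as 10·A + B (B restricted to one digit here)."""
    if not 0 <= b <= 9:
        raise ValueError("in this sketch the second argument must be a single digit")
    return 10 * a + b


# ;;;4001 nests three constructors: ;(;(;4 0) 0) 1
n = dec(dec(dec(4, 0), 0), 1)


def is_prime(m: int) -> bool:
    """Trial division; adequate for small numbers like 4001."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


print(n, is_prime(n))  # 4001 True
```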
- Theorem forms.
We will use the following descriptive terms to categorize theorems:
- A theorem is in "closed form" if it has no $e hypotheses
(e.g., unss 3787). The term "tautology" is also used, especially in
propositional calculus. This form was formerly called "theorem form"
or "closed theorem form".
- A theorem is in "deduction form" (or is a "deduction") if it
has zero or more $e hypotheses, and the hypotheses and the conclusion
are implications that share the same antecedent. More precisely, the
conclusion is an implication with a wff variable as the antecedent
(usually 𝜑), and every hypothesis ($e statement) is either:
- an implication with the same antecedent as the conclusion, or
- a definition. A definition can be for a class variable (this is a
class variable followed by =, e.g. the definition of 𝐷 in
lhop 23779) or a wff variable (this is a wff variable followed by
↔); class variable definitions are more common.
In practice, a proof of a theorem in deduction form will also contain
many steps that are implications where the antecedent is either that
wff variable (usually 𝜑) or is a conjunction (𝜑 ∧ ...)
including that wff variable (𝜑). E.g. a1d 25, unssd 3789.
Although they are not real deductions, theorems without $e hypotheses
but of the form (𝜑 → ...) are also said to be in "deduction
form". Such theorems usually have a two-step proof, applying a1i 11 to a
given theorem, and are used as convenience theorems to shorten many
proofs. E.g. eqidd 2623, which is used more than 1500 times.
- A theorem is in "inference form" (or is an "inference") if
it has one or more $e hypotheses, but is not in deduction form,
i.e. there is no common antecedent (e.g., unssi 3788).
Any theorem whose conclusion is an implication has an associated
inference, whose hypotheses are the hypotheses of that theorem
together with the antecedent of its conclusion, and whose conclusion is
the consequent of that conclusion. When both theorems are in set.mm,
then the associated inference is often labeled by adding the suffix "i"
to the label of the original theorem (for instance, con3i 150 is the
inference associated with con3 149). The inference associated with a
theorem is easily derivable from that theorem by a simple use of
ax-mp 5. The other direction is the subject of the Deduction Theorem
discussed below. We may also use the term "associated inference" when
the above process is iterated. For instance, syl 17 is an
inference associated with imim1 83 because it is the inference
associated with imim1i 63 which is itself the inference
associated with imim1 83.
"Deduction form" is the preferred form for theorems because this form
allows us to easily use the theorem in places where (in traditional
textbook formalizations) the standard Deduction Theorem (see below)
would be used. We call this approach "deduction style".
In contrast, we usually avoid theorems in "inference form" when that
would end up requiring us to use the deduction theorem.
Deductions have a label suffix of "d", especially if there are other
forms of the same theorem (e.g., pm2.43d 53). The labels for inferences
usually have the suffix "i" (e.g., pm2.43i 52). The labels of theorems
in "closed form" would have no special suffix (e.g., pm2.43 56). When
an inference is converted to a theorem by eliminating an "is a set"
hypothesis, we sometimes suffix the closed form with "g" (for "more
general") as in uniex 6953 vs. uniexg 6955.
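The three forms can be compared side by side in a Lean 4 sketch. This is an analogy only, and the names pm2_43, pm2_43i and pm2_43d are hypothetical renderings of the set.mm labels. The closed form has no hypotheses, the inference form has a hypothesis but no shared antecedent, and the deduction form carries the antecedent 𝜑 through both hypothesis and conclusion. As described above, the inference follows from the closed form by a single modus ponens step, which in Lean is function application.

```lean
-- Analogy only: the pm2.43 family in plain Lean 4.
-- Closed form (pm2.43): no hypotheses.
theorem pm2_43 (φ ψ : Prop) : (φ → φ → ψ) → (φ → ψ) :=
  fun h hφ => h hφ hφ

-- Inference form (pm2.43i): a hypothesis, but no shared antecedent.
theorem pm2_43i {φ ψ : Prop} (h : φ → φ → ψ) : φ → ψ :=
  pm2_43 φ ψ h  -- one modus ponens step on the closed form

-- Deduction form (pm2.43d): hypothesis and conclusion share the antecedent φ.
theorem pm2_43d {φ ψ χ : Prop} (h : φ → ψ → ψ → χ) : φ → ψ → χ :=
  fun hφ hψ => h hφ hψ hψ
```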
- Deduction theorem.
The Deduction Theorem is a metalogical theorem that provides an
algorithm for constructing a proof of a theorem from the proof of its
corresponding deduction (its associated inference). See for instance
Theorem 3 in [Margaris] p. 56. In ordinary mathematics, no one actually
carries out the algorithm, because (in its most basic form) it involves
an exponential explosion of the number of proof steps as more hypotheses
are eliminated. Instead, in ordinary mathematics the Deduction Theorem
is invoked simply to claim that something can be done in principle,
without actually doing it. For more details, see mmdeduction.html.
The Deduction Theorem is a metalogical theorem that cannot be applied
directly in Metamath, and the explosion of steps would be a problem
anyway, so alternatives are used. One alternative we use sometimes is
the "weak deduction theorem" dedth 4139, which works in certain cases in
set theory. We also sometimes use dedhb 3376. However, the primary
mechanism we use today for emulating the deduction theorem is to write
proofs in deduction form (aka "deduction style") as described earlier;
the prefixed 𝜑 → mimics the context in a deduction proof system.
In practice this mechanism works very well. This approach is described
in the deduction form and natural deduction page mmnatded.html; a
list of translations for common natural deduction rules is given in
natded 27260.
- Recursion.
We define recursive functions using various "recursion constructors".
These allow us to define, with compact direct definitions, functions
that are usually defined in textbooks with indirect self-referencing
recursive definitions. This produces compact definitions and much
simpler proofs, and greatly reduces the risk of creating unsound
definitions. Examples of recursion constructors include
recs(𝐹) in df-recs 7468, rec(𝐹, 𝐼) in df-rdg 7506,
seq𝜔(𝐹, 𝐼) in df-seqom 7543, and seq𝑀( + , 𝐹) in
df-seq 12802. These have characteristic function 𝐹 and initial value
𝐼. (Σg in df-gsum 16103 isn't really designed for arbitrary
recursion, but you could do it with the right magma.) The logically
primary one is df-recs 7468, but for the "average user" the most useful
one is probably df-seq 12802, provided that a countable sequence is
sufficient for the recursion.
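The idea behind seq𝑀( + , 𝐹) can be sketched in Python: the value at 𝑀 is 𝐹(𝑀), and each next value combines the previous value with 𝐹 at the next index. The loop below computes that value directly, with no self-referencing definition. The function name seq and its parameter order are hypothetical, and this sketch simply rejects indices below 𝑀 instead of modelling what set.mm does there.

```python
from operator import add, mul
from typing import Callable


def seq(m: int, op: Callable, f: Callable, n: int):
    """Value at n of a seq_M(op, F)-style recursion.

    value(M) = F(M) and value(k + 1) = op(value(k), F(k + 1)),
    so the result is F(M) op F(M+1) op ... op F(n).
    """
    if n < m:
        raise ValueError("this sketch only handles n >= M")
    acc = f(m)
    for k in range(m + 1, n + 1):
        acc = op(acc, f(k))
    return acc


print(seq(1, add, lambda k: k, 10))  # 55, i.e. 1 + 2 + ... + 10
print(seq(1, mul, lambda k: k, 5))   # 120, i.e. 1 · 2 · 3 · 4 · 5
```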
- Extensible structures.
Mathematics includes many structures such as ring, group, poset, etc.
We define an "extensible structure" which is then used to define group,
ring, poset, etc. This allows theorems from more general structures
(groups) to be reused for more specialized structures (rings) without
having to reprove them. See df-struct 15859.
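The reuse argument can be illustrated with a toy Python model in which a structure is a finite mapping from slot numbers to components. A function written only in terms of the base set and the group operation then works unchanged on a ring that merely adds another slot. The slot numbers, helper names, and the ℤ/5ℤ example are hypothetical choices for this sketch, not the actual set.mm definitions.

```python
# Toy model: a structure is a finite mapping from slot numbers to components.
# The slot numbers below are hypothetical choices for this sketch.
BASE, PLUSG, MULR = 1, 2, 3


def base(s):
    """Extract the base set component of a structure."""
    return s[BASE]


def plusg(s):
    """Extract the group operation component of a structure."""
    return s[PLUSG]


def is_group_identity(s, e) -> bool:
    """A group-level fact: e is a two-sided identity for the group operation."""
    op = plusg(s)
    return all(op(e, x) == x and op(x, e) == x for x in base(s))


# Integers mod 5 under addition, as a two-slot structure.
z5_group = {BASE: range(5), PLUSG: lambda a, b: (a + b) % 5}
# The ring adds one more slot; the group-level function is reused unchanged.
z5_ring = {**z5_group, MULR: lambda a, b: (a * b) % 5}

print(is_group_identity(z5_group, 0), is_group_identity(z5_ring, 0))  # True True
```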
- Undefined results and "junk theorems".
Some expressions are only expected to be meaningful in certain contexts.
For example, consider Russell's definite description binder iota,
where (℩𝑥𝜑) is meant to be "the 𝑥 such that 𝜑"
(where 𝜑 typically depends on 𝑥).
What should that expression produce when there is no such 𝑥?
In set.mm we primarily use one of two approaches.
One approach is to make the expression evaluate to the empty set
whenever the expression is being used outside of its expected context.
While not perfect, it makes it a bit more clear when something
is undefined, and it has the advantage that it makes more
things equal outside their domain which can remove hypotheses when
you feel like exploiting these so-called junk theorems.
Note that Quine does this with iota (his definition of iota
evaluates to the empty set when there is no unique value of 𝑥).
Quine has no problem with that and we don't see why we should,
so we define iota exactly the same way that Quine does.
The main place where you see this being systematically exploited is in
"reverse closure" theorems like 𝐴 ∈ (𝐹‘𝐵) → 𝐵 ∈ dom 𝐹,
which is useful when 𝐹 is a family of sets (by this we
mean it is a set of sets even in a type-theoretic interpretation).
The second approach uses "(New usage is discouraged.)" to prevent
unintentional uses of certain properties.
For example, you could define some construct df-NAME whose
usage is discouraged, and prove only the specific properties
you wish to use (and add those proofs to the list of permitted uses
of "discouraged" information). From then on, you can only use
those specific properties without a warning.
Other approaches often have hidden problems.
For example, you could try to "not define undefined terms"
by creating definitions like ${ $d 𝑦 𝑥 $. $d 𝑦 𝜑 $.
df-iota $a ⊢ (∃!𝑥𝜑 → (℩𝑥𝜑) = ∪ {𝑥 ∣ 𝜑}) $. $}.
This will be rejected by the definition checker, but the bigger
theoretical reason to reject this axiom is that it breaks equality:
the metatheorem (𝑥 = 𝑦 → P(𝑥) = P(𝑦)) fails
to hold if definitions don't unfold without some assumptions.
(That is, iotabidv 5872 is no longer provable and must be added
as an axiom.) It is important for every syntax constructor to
satisfy equality theorems *unconditionally*, e.g., expressions
like (1 / 0) = (1 / 0) should not be rejected.
This is forced on us by the context-free term
language, and anything else requires a lot more infrastructure
(e.g., a type checker) to support without making everything else
more painful to use.
Another approach would be to try to make nonsensical
statements syntactically invalid, but that can create its own
complexities; in some cases that would make parsing itself undecidable.
In practice this does not seem to be a serious issue.
No one does these things deliberately in "real" situations,
and some knowledgeable people (such as Mario Carneiro)
have never seen this happen accidentally.
Norman Megill doesn't agree that these "junk" consequences are
necessarily bad anyway, and they can significantly shorten proofs
in some cases. This database would be much larger if, for example,
we had to condition fvex 6201 on the argument being in the domain
of the function. It is impossible to derive a contradiction
from sound definitions (i.e. that pass the definition check),
assuming ZFC is consistent, and he doesn't see the point of all the
extra busy work and huge increase in set.mm size that would result
from restricting *all* definitions.
So instead of implementing a complex system to counter a
problem that does not appear to occur in practice, we use
a significantly simpler set of approaches.
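The first approach, and the reverse-closure theorems it enables, can be mimicked in a few lines of Python. Iota returns the unique witness if there is exactly one and the empty set otherwise, and function application is defined through iota, so applying a function outside its domain yields the empty set. Because nothing is a member of the empty set, 𝐴 ∈ (𝐹‘𝐵) can only hold when 𝐵 ∈ dom 𝐹. This is a finite toy model with hypothetical helper names, not set.mm's actual definitions.

```python
EMPTY = frozenset()  # plays the role of ∅ in this sketch


def iota(pred, universe):
    """(℩x φ) over a finite universe: the unique x with pred(x), else ∅."""
    witnesses = [x for x in universe if pred(x)]
    return witnesses[0] if len(witnesses) == 1 else EMPTY


def fv(graph, x):
    """(F‘x) for F given as a set of pairs: the unique y with <x, y> in F, else ∅."""
    return iota(lambda y: (x, y) in graph, {y for (_, y) in graph})


def dom(graph):
    """dom F: the first components of the pairs in F."""
    return {x for (x, _) in graph}


# A family of sets: F maps 1 ↦ {7, 8} and 2 ↦ {9}.
F = {(1, frozenset({7, 8})), (2, frozenset({9}))}

print(sorted(fv(F, 1)))   # [7, 8]
print(fv(F, 3) == EMPTY)  # True: outside the domain the value is ∅
```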
- Organizing proofs.
Humans have trouble understanding long proofs.
It is often preferable to break longer proofs into
smaller parts (just as with traditional proofs). In Metamath
this is done by creating separate proofs of the separate parts.
A proof with the sole purpose of supporting a final proof is a
lemma; the naming convention for a lemma is the final proof's name
followed by "lem", and a number if there is more than one. E.g.,
sbthlem1 8070 is the first lemma for sbth 8080. Also, consider proving
reusable results separately, so that others will be able to easily
reuse that part of your work.
- Limit proof size.
It is often preferable to break longer proofs into
smaller parts, just as you would do with traditional proofs.
One reason is that humans have trouble understanding long proofs.
Another reason is that it's generally best to prove
reusable results separately,
so that others will be able to easily reuse them.
Finally, the "minimize" routine can take much longer with
very long proofs.
We encourage proofs to be no more than 200 essential steps, and
generally no more than 500 essential steps,
though these are simply guidelines and not hard-and-fast rules.
Much smaller proofs are fine!
We also acknowledge that some proofs, especially autogenerated ones,
should sometimes not be broken up (e.g., because
breaking them up might be useless and inefficient due to many
interconnections and reused terms within the proof).
- Hypertext links.
We strongly encourage comments to have many links to related material,
with accompanying text that explains the relationship. These can help
readers understand the context. Links to other statements, or to
HTTP/HTTPS URLs, can be inserted in ASCII source text by prepending a
space-separated tilde (e.g., " ~ df-prm " results in " df-prm 15386").
When metamath.exe is used to generate HTML, it automatically inserts
hypertext links for syntax used (e.g., every symbol used), every axiom
and definition depended on, the justification for each step in a proof,
and to both the next and previous assertion.
- Hypertext links to section headers.
Some section headers have text under them that describes or explains the
section. However, they are not part of the description of axioms or
theorems, and there is no way to link to them directly. To provide for
this, section headers with accompanying text (indicated with "*"
prefixed to mmtheorems.html#mmdtoc entries) have an anchor in
mmtheorems.html whose name is the first $a or $p statement that
follows the header. For example, there is a glossary under the section
heading called GRAPH THEORY. The first $a or $p statement that follows
is cedgf 25867. To reference it we link to the anchor using a
space-separated tilde followed by the space-separated link
mmtheorems.html#cedgf, which will become the hyperlink
mmtheorems.html#cedgf. Note that no theorem in set.mm is allowed to
begin with "mm" (enforced by "verify markup" in the metamath program).
Whenever the software sees a tilde reference beginning with "http:",
"https:", or "mm", the reference is assumed to be a link to something
other than a statement label, and the tilde reference is used as is.
This can also be useful for relative links to other pages such as
mmcomplex.html.
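The prefix rule for tilde references can be sketched as follows. This is a minimal Python model, assuming a comment is simply split on whitespace; the helper names are hypothetical, and the sketch does not model anything else the software does with markup (for example, "verify markup" checks).

```python
def classify_tilde_reference(ref: str) -> str:
    """Classify the token that follows a space-separated tilde.

    Tokens starting with "http:", "https:", or "mm" are used as is
    (external URLs or relative links such as mmcomplex.html);
    anything else is treated as a statement label.
    """
    if ref.startswith(("http:", "https:", "mm")):
        return "link"
    return "label"


def tilde_references(comment: str) -> list[tuple[str, str]]:
    """Return (reference, kind) for every "~ token" pair in a comment."""
    tokens = comment.split()
    refs = []
    for i, tok in enumerate(tokens[:-1]):  # a final "~" has no reference
        if tok == "~":
            ref = tokens[i + 1]
            refs.append((ref, classify_tilde_reference(ref)))
    return refs


print(tilde_references("see ~ df-prm and ~ mmcomplex.html ."))
# [('df-prm', 'label'), ('mmcomplex.html', 'link')]
```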
- Bibliography references.
Please include a bibliographic reference to any external material used.
A name in square brackets in a comment indicates a
bibliographic reference. The full reference must be of the form
KEYWORD IDENTIFIER? NOISEWORD(S)* [AUTHOR(S)] p. NUMBER -
note that this is a very specific form that requires a page number.
There should be no comma between the author reference and the
"p." (a constant indicator).
Whitespace, comma, period, or semicolon should follow NUMBER.
An example is Theorem 3.1 of [Monk1] p. 22.
The KEYWORD, which is not case-sensitive,
must be one of the following: Axiom, Chapter, Compare, Condition,
Corollary, Definition, Equation, Example, Exercise, Figure, Item,
Lemma, Lemmas, Line, Lines, Notation, Part, Postulate, Problem,
Property, Proposition, Remark, Rule, Scheme, Section, or Theorem.
The IDENTIFIER is optional, as in for example
"Remark in [Monk1] p. 22".
The NOISEWORD(S) are zero or more from the list: from, in, of, on.
The AUTHOR(S) must be present in the file identified with the
htmlbibliography assignment (e.g., mmset.html) as a named anchor
(NAME=). If there is more than one document by the same author(s),
add a numeric suffix (as shown here).
The NUMBER is a page number, and may be any alphanumeric string such as
an integer or Roman numeral.
Note that we _require_ page numbers in comments for individual
$a or $p statements. We allow names in square brackets without
page numbers (a reference to an entire document) in
heading comments.
If this is a new reference, please also add it to the
"Bibliography" section of mmset.html.
(The file mmbiblio.html is automatically rebuilt, e.g.,
using the metamath.exe "write bibliography" command.)
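The reference form above is regular enough to sketch as a regular expression. The following Python code is an approximation written for illustration, not the pattern used by metamath.exe. It assumes the whole pattern (including noisewords) is matched case-insensitively, and it does not check that AUTHOR(S) actually appears as an anchor in the bibliography file.

```python
import re

# KEYWORD list and NOISEWORD list, as given in the text above.
KEYWORDS = [
    "Axiom", "Chapter", "Compare", "Condition", "Corollary", "Definition",
    "Equation", "Example", "Exercise", "Figure", "Item", "Lemma", "Lemmas",
    "Line", "Lines", "Notation", "Part", "Postulate", "Problem", "Property",
    "Proposition", "Remark", "Rule", "Scheme", "Section", "Theorem",
]
NOISEWORDS = ["from", "in", "of", "on"]
KW = "|".join(KEYWORDS)
NW = "|".join(NOISEWORDS)

BIB_REF = re.compile(
    rf"\b(?P<keyword>{KW})\s+"
    rf"(?:(?!(?:{NW})\s)(?P<ident>[^\s\[]+)\s+)?"   # optional IDENTIFIER
    rf"(?P<noise>(?:(?:{NW})\s+)*)"                  # NOISEWORD(S)*
    r"\[(?P<author>[^\]]+)\]\s+p\.\s+"               # [AUTHOR(S)] p. (no comma)
    r"(?P<page>[A-Za-z0-9]+)(?=[\s,.;]|$)",          # NUMBER, then a terminator
    re.IGNORECASE,
)


def find_bib_refs(comment: str) -> list[dict]:
    """Return keyword, identifier, author, and page for each reference."""
    return [
        {
            "keyword": m.group("keyword"),
            "ident": m.group("ident"),
            "author": m.group("author"),
            "page": m.group("page"),
        }
        for m in BIB_REF.finditer(comment)
    ]
```

Note that "Theorem 3.1 of [Monk1], p. 22" does not match, because the pattern requires whitespace (not a comma) between the author reference and "p.".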
- Acceptable shorter proofs.
Shorter proofs are welcome, and any shorter proof we accept
will be acknowledged in the theorem's description. However,
in some cases a proof may be "shorter" or not depending on
how it is formatted. This section provides general guidelines.
Usually we automatically accept shorter proofs that (1)
shorten the set.mm file (with compressed proofs), (2) reduce
the size of the HTML file generated with SHOW STATEMENT xx
/ HTML, (3) use only existing, unmodified theorems in the
database (the order of theorems may be changed, though), and
(4) use no additional axioms.
Usually we will also automatically accept a _new_ theorem
that is used to shorten multiple proofs, if the total size
of set.mm (including the comment of the new theorem, not
including the acknowledgment) decreases as a result.
In borderline cases, we typically place more importance on
the number of compressed proof steps and less on the length
of the label section (since the names are in principle
arbitrary). If two proofs have the same number of compressed
proof steps, we will typically give preference to the one
with the smaller number of different labels, or if these
numbers are the same, the proof with the fewest characters
(a count that depends partly on chance, since label lengths
are included).
A few theorems have a longer proof than necessary in order
to avoid the use of certain axioms, for pedagogical purposes,
and for other reasons. These theorems will (or should) have
a "(Proof modification is discouraged.)" tag in their
description. For example, idALT 23 shows a proof directly from
axioms. Shorter proofs for such cases won't be accepted,
of course, unless the criteria described continue to be
satisfied.
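The tie-breaking order for borderline cases can be expressed as a lexicographic comparison. This Python sketch models only that order (compressed proof steps first, then the number of different labels, then characters with label lengths included); it does not model criteria (1) to (4) or the rule for new theorems. `ProofStats` and its fields are hypothetical, and the numbers in the usage example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProofStats:
    """Hypothetical summary of one candidate proof."""
    name: str
    compressed_steps: int   # number of compressed proof steps
    distinct_labels: int    # number of different labels used
    characters: int         # total characters, label lengths included


def preference_key(p: ProofStats) -> tuple[int, int, int]:
    # Tuples compare lexicographically: steps dominate, then labels,
    # then characters as the final tie-breaker.
    return (p.compressed_steps, p.distinct_labels, p.characters)


def preferred(a: ProofStats, b: ProofStats) -> ProofStats:
    """Return the proof favored by the order (a is kept on a full tie)."""
    return b if preference_key(b) < preference_key(a) else a


old = ProofStats("old", compressed_steps=40, distinct_labels=12, characters=300)
new = ProofStats("new", compressed_steps=40, distinct_labels=11, characters=320)
print(preferred(old, new).name)  # new (same steps, fewer different labels)
```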
- Input format.
The input is in ASCII with two-space indents. Tab characters are not
allowed. Use embedded math comments or HTML entities for non-ASCII
characters (e.g., "&eacute;" for "é").
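A minimal Python sketch of preparing comment text for the ASCII-only input format. It uses the standard library's `html.entities.codepoint2name` table for named entities; the numeric-reference fallback (e.g., "&#337;") for characters without a named entity is an assumption of this sketch, as is the hypothetical function name `to_ascii_source`. It does not produce embedded math comments, which are the other option mentioned above.

```python
from html.entities import codepoint2name


def to_ascii_source(text: str) -> str:
    """Replace non-ASCII characters with HTML entities; reject tabs."""
    if "\t" in text:
        raise ValueError("tab characters are not allowed")
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # plain ASCII: keep
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")                  # numeric fallback (assumption)
    return "".join(out)


print(to_ascii_source("Poincaré"))  # Poincar&eacute;
```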
- Information on syntax, axioms, and definitions.
For a hyperlinked list of syntax, axioms, and definitions, see
mmdefinitions.html.
If you have questions about a specific symbol or axiom, it is best
to go directly to its definition to learn more about it.
The generated HTML for each theorem and axiom includes hypertext
links to each symbol's definition.
- Reserved symbols: 'LETTER.
Some symbols are reserved for potential future use.
Symbols with the pattern 'LETTER are reserved for possibly
representing characters (this is somewhat similar to Lisp).
We would expect '\n to represent newline, 'sp for space, and perhaps
'\x24 for the dollar character.
- Language and spelling.
It is preferred to use American English for comments and symbols, e.g.
we use "neighborhood" instead of the British English "neighbourhood".
An exception is the word "analog", which can be either a noun or an
adjective. Furthermore, "analog" has the confounding meaning "not
digital", whereas "analogue" is often used, also in American English,
in the sense of something that bears an analogy to something else. Therefore,
"analogue" is used for the noun and "analogous" for the adjective in
set.mm.
- Comments and layout.
As for formatting of the file set.mm, and in particular formatting and
layout of the comments, the foremost rule is consistency. The first
sections of set.mm, in particular Part 1 "Classical first-order logic
with equality" can serve as a model for contributors. Some formatting
rules are enforced when using the Metamath program's "WRITE SOURCE"
command with the "REWRAP" option. Here are a few other rules, which are
not enforced, but that we try to follow:
  - The file set.mm should have a double blank line before each section
    header, and at no other places. In particular, there are no triple
    blank lines. If there is a "@( Begin $[ ... $] @)" comment (where "@"
    is actually "$") before the section header, then the double blank line
    should go before that comment.
  - The header comments should be spaced as those of Part 1, namely, with
    a blank line before and after the comment, and an indentation of two
    spaces.
  - Header comments are not rewrapped by the Metamath program [as of
    24-Oct-2021], but similar spacing and wrapping should be used as for
    other comments: double spaces after a period ending a sentence, line
    wrapping with line width of 79, and no trailing spaces at the end of
    lines.
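Some of these layout rules are mechanical enough to check with a short script. The following Python function is a hypothetical sketch, not one of the project's checkers: it reports lines wider than 79 characters, trailing spaces, and triple blank lines. It does not know where section headers are, so it cannot check that double blank lines appear only before them.

```python
def layout_problems(source: str, max_width: int = 79) -> list[str]:
    """Report overly long lines, trailing spaces, and triple blank lines."""
    problems = []
    blank_run = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_width:
            problems.append(f"line {lineno}: longer than {max_width} characters")
        if line != line.rstrip(" "):
            problems.append(f"line {lineno}: trailing spaces")
        if line.strip() == "":
            blank_run += 1
            if blank_run == 3:          # report each run once
                problems.append(f"line {lineno}: triple blank line")
        else:
            blank_run = 0
    return problems


print(layout_problems("fine\n\n\nheader\n"))  # [] (a double blank line is allowed)
```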
The challenge of varying mathematical conventions
We try to follow mathematical conventions, but in many cases
different texts use different conventions.
In those cases we pick some reasonably common convention and stick to
it.
We have already mentioned that the term "natural number" has
varying definitions (some start from 0, others start from 1), but
that is not the only such case.
A useful example is the set of metavariables used to represent
arbitrary well-formed formulas (wffs).
We use an open phi, φ, to represent the first arbitrary wff in an
assertion with one or more wffs; this is a common convention and
this symbol is easily distinguished from the empty set symbol.
That said, it is impossible to please everyone or simply "follow
the literature" because there are many different conventions for
a variable that represents any arbitrary wff.
To demonstrate the point,
here are some conventions for variables that represent an arbitrary
wff and some texts that use each convention:
- open phi φ (and so on): Tarski's papers,
Rasiowa & Sikorski's
The Mathematics of Metamathematics (1963),
Monk's Introduction to Set Theory (1969),
Enderton's Elements of Set Theory (1977),
Bell & Machover's A Course in Mathematical Logic (1977),
Jech's Set Theory (1978),
Takeuti & Zaring's
Introduction to Axiomatic Set Theory (1982).
- closed phi ϕ (and so on):
Levy's Basic Set Theory (1979),
Kunen's Set Theory (1980),
Paulson's Isabelle: A Generic Theorem Prover (1994),
Huth and Ryan's Logic in Computer Science (2004/2006).
- Greek α, β, γ:
Duffy's Principles of Automated Theorem Proving (1991).
- Roman A, B, C:
Kleene's Introduction to Metamathematics (1974),
Smullyan's First-Order Logic (1968/1995).
- script A, B, C:
Hamilton's Logic for Mathematicians (1988).
- italic A, B, C:
Mendelson's Introduction to Mathematical Logic (1997).
- italic P, Q, R:
Suppes's Axiomatic Set Theory (1972),
Gries and Schneider's A Logical Approach to Discrete Math
(1993/1994),
Rosser's Logic for Mathematicians (2008).
- italic p, q, r:
Quine's Set Theory and Its Logic (1969),
Kuratowski & Mostowski's Set Theory (1976).
- italic X, Y, Z:
Dijkstra and Scholten's
Predicate Calculus and Program Semantics (1990).
- Fraktur letters:
Fraenkel et al.'s Foundations of Set Theory (1973).
Distinctness or freeness
Here are some conventions that address distinctness or freeness of a
variable:
- Ⅎ𝑥𝜑 is read " 𝑥 is not free in (wff) 𝜑";
see df-nf 1710 (whose description has some important technical
details). Similarly, Ⅎ𝑥𝐴 is read " 𝑥 is not free in (class)
𝐴"; see df-nfc 2753.
- "$d x y $." should be read "Assume x and y are distinct
variables."
- "$d x 𝜑 $." should be read "Assume x does not occur in phi."
Sometimes a theorem is proved using
Ⅎ𝑥𝜑 (df-nf 1710) in place of
"$d 𝑥 𝜑 $." when a more general result is desired;
ax-5 1839 can be used to derive the $d version. For an example of
how to get from the $d version back to the $e version, see the
proof of euf 2478 from df-eu 2474.
- "$d x A $." should be read "Assume x is not a variable occurring in
class A."
- "$d x A $. $d x ps $. $e |- (𝑥 = 𝐴 → (𝜑 ↔ 𝜓)) $."
is an idiom
often used instead of explicit substitution, meaning "Assume psi results
from the proper substitution of A for x in phi."
- " ⊢ (¬ ∀𝑥𝑥 = 𝑦 → ..." occurs early in some cases, and
should be read "If x and y are distinct
variables, then..." This antecedent provides us with a technical
device (called a "distinctor" in Section 7 of [Megill] p. 444)
to avoid the need for the
$d statement early in our development of predicate calculus, permitting
unrestricted substitutions as conceptually simple as those in
propositional calculus. However, the $d eventually becomes a
requirement, and after that this device is rarely used.
There is a general technique to replace a $d x A or
$d x ph condition in a theorem with the corresponding
Ⅎ𝑥𝐴 or Ⅎ𝑥𝜑; here it is.
Suppose you have a theorem ⊢ T[x, A] where $d 𝑥 𝐴,
and you wish to prove ⊢ Ⅎ𝑥𝐴 ⇒ ⊢ T[x, A].
You apply the theorem substituting 𝑦 for 𝑥 and 𝐴 for 𝐴,
where 𝑦 is a new dummy variable, so that
$d y A is satisfied.
You obtain ⊢ T[y, A], and apply chvar to obtain ⊢
T[x, A] (or just use mpbir 221 if T[x, A] binds 𝑥).
The side goal is ⊢ (𝑥 = 𝑦 → ( T[y, A] ↔ T[x, A] )),
where you can use equality theorems, except
that when you get to a bound variable you use a non-dv bound variable
renamer theorem like cbval 2271. The section
mmtheorems32.html#mm3146s also describes the
metatheorem that underlies this.
Standard Metamath verifiers do not distinguish between axioms and
definitions (both are $a statements).
In practice, we require that definitions (1) be conservative
(a definition should not allow an expression
that previously qualified as a wff but was not provable
to become provable) and be eliminable
(there should exist an algorithmic method for converting any
expression using the definition into
a logically equivalent expression that previously qualified as a wff).
To ensure this, we have additional rules on almost all definitions
($a statements with a label that does not begin with ax-).
These additional rules are not applied in a few cases where they
are too strict (df-bi 197, df-clab 2609, df-cleq 2615, and df-clel 2618);
see those definitions for more information.
These additional rules for definitions are checked by at least
mmj2's definition check (see
mmj2 master file mmj2jar/macros/definitionCheck.js).
This definition check relies on the database being very much like
set.mm, down to the names of certain constants and types, so it
cannot apply to all Metamath databases... but it is useful in set.mm.
In this definition check, a $a-statement with a given label and
typecode ⊢ passes the test if and only if it
respects the following rules (these rules require that we have
an unambiguous tree parse, which is checked separately):
1. The expression must be a biconditional or an equality (i.e., its
root-symbol must be ↔ or =).
If the proposed definition passes this first rule, we then
define its definiendum as its left hand side (LHS) and
its definiens as its right hand side (RHS).
We define the *defined symbol* as the root-symbol of the LHS.
We define a *dummy variable* as a variable occurring
in the RHS but not in the LHS.
Note that the "root-symbol" is the root of the considered tree;
it need not correspond to a single token in the database
(e.g., see w3o 1036 or wsb 1880).
2. The defined expression must not appear in any statement
between its syntax axiom ($a wff ) and its definition,
and the defined expression must not be used in its definiens.
See df-3an 1039 for an example where the same symbol is used in
different ways (this is allowed).
3. No two variables occurring in the LHS may share a
disjoint variable (DV) condition.
4. All dummy variables are required to be disjoint from any
other (dummy or not) variable occurring in this labeled expression.
5. Either
(a) there must be no non-setvar dummy variables, or
(b) there must be a justification theorem.
The justification theorem must be of the form
⊢ ( definiens root-symbol definiens' )
where definiens' is definiens but the dummy variables are all
replaced with other unused dummy variables of the same type.
Note that root-symbol is ↔ or =, and that setvar
variables are simply variables with the setvar typecode.
6. One of the following must be true:
(a) there must be no setvar dummy variables,
(b) there must be a justification theorem as described in rule 5, or
(c) none of the setvar dummy variables may be free in the definiens.
That is, it must be true that
(𝜑 → ∀𝑥𝜑) for each setvar dummy variable 𝑥
where 𝜑 is the definiens.
We use two different tests for non-freeness; one must succeed
for each setvar dummy variable 𝑥.
The first test requires that the setvar dummy variable 𝑥
be syntactically bound
(this is sometimes called the "fast" test, and this implies
that we must track binding operators).
The second test requires a successful
search for the directly-stated proof of (𝜑 → ∀𝑥𝜑).
Part (c) of this rule is how most setvar dummy variables
are handled.
Rule 3 may seem unnecessary, but it is needed.
Without this rule, you can define something like
cbar $a wff Foo x y $.
${ $d x y $. df-foo $a |- ( Foo x y <-> x = y ) $. $}
and now "Foo x x" is not eliminable;
there is no way to prove that it means anything in particular,
because the definitional theorem that is supposed to be
responsible for connecting it to the original language wants
nothing to do with this expression, even though it is well formed.
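To make the variable rules concrete, here is a simplified Python model of three of them: rule 1 (the root symbol must be a biconditional or an equality), rule 3 (no two LHS variables may share a DV condition), and rule 4 (every dummy variable must be disjoint from every other variable). It is a sketch under strong assumptions: parsing is taken as already done, so the caller passes the root symbol and the LHS/RHS variable sets directly, and DV conditions are given as pairs. The function name and representation are hypothetical. The actual check is mmj2's definitionCheck.js, which also handles the remaining rules.

```python
from itertools import combinations


def check_definition(root: str, lhs_vars: set[str], rhs_vars: set[str],
                     dv_pairs: set[frozenset[str]]) -> list[str]:
    """Check rules 1, 3, and 4 on an already-parsed definition."""
    errors = []
    # Rule 1: root symbol must be a biconditional or an equality.
    if root not in ("<->", "="):
        errors.append("rule 1: root symbol must be <-> or =")
    # Rule 3: no two LHS variables may share a disjoint variable condition.
    for a, b in combinations(sorted(lhs_vars), 2):
        if frozenset((a, b)) in dv_pairs:
            errors.append(f"rule 3: LHS variables {a} and {b} share a $d")
    # Rule 4: each dummy (RHS-only) variable needs a $d with every other
    # variable in the expression; each missing pair is reported once.
    dummies = rhs_vars - lhs_vars
    all_vars = lhs_vars | rhs_vars
    missing = {frozenset((d, v)) for d in dummies for v in all_vars - {d}
               if frozenset((d, v)) not in dv_pairs}
    for a, b in sorted(tuple(sorted(p)) for p in missing):
        errors.append(f"rule 4: missing $d {a} {b}")
    return errors


# The df-foo example above: ( Foo x y <-> x = y ) with $d x y.
print(check_definition("<->", {"x", "y"}, {"x", "y"}, {frozenset(("x", "y"))}))
# ['rule 3: LHS variables x and y share a $d']
```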
A justification theorem for a definition (if used this way)
must be proven before the definition that depends on it.
One example of a justification theorem is vjust 3201.
The definition df-v 3202 ⊢ V = {𝑥 ∣ 𝑥 = 𝑥} is justified
by the justification theorem vjust 3201
⊢ {𝑥 ∣ 𝑥 = 𝑥} = {𝑦 ∣ 𝑦 = 𝑦}.
Another example of a justification theorem is trujust 1485;
the definition df-tru 1486 ⊢ (⊤ ↔ (∀𝑥𝑥 = 𝑥 → ∀𝑥𝑥 = 𝑥))
is justified by trujust 1485 ⊢ ((∀𝑥𝑥 = 𝑥 → ∀𝑥𝑥 = 𝑥) ↔ (∀𝑦𝑦 = 𝑦 → ∀𝑦𝑦 = 𝑦)).
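The construction of a justification theorem (the definiens on the left, and on the right the definiens with its dummy variables renamed to unused variables of the same type) can be sketched in a few lines of Python. This is a hypothetical helper working on set.mm ASCII math strings. It assumes the caller supplies the renaming (choosing fresh variables of the right type is not modeled). Following vjust and trujust, it wraps a biconditional in parentheses but writes a class equality without outer parentheses.

```python
def justification_statement(definiens: str, root: str,
                            renaming: dict[str, str]) -> str:
    """Build the text of a justification theorem for a definition.

    `definiens` is a whitespace-separated set.mm math string, `root` is
    "<->" or "=", and `renaming` maps each dummy variable to an unused
    variable of the same type.
    """
    tokens = definiens.split()
    original = " ".join(tokens)
    renamed = " ".join(renaming.get(t, t) for t in tokens)
    if root == "<->":
        return f"|- ( {original} <-> {renamed} )"
    return f"|- {original} = {renamed}"


print(justification_statement("{ x | x = x }", "=", {"x": "y"}))
# |- { x | x = x } = { y | y = y }
```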
Here is more information about our processes for checking and
contributing to this work:
- Multiple verifiers.
This entire file is verified by multiple independently-implemented
verifiers when it is checked in, giving us extremely high
confidence that all proofs follow from the assumptions.
The checkers also check for various other problems such as
overly long lines.
- Maximum text line length is 79 characters.
You can fix comment line length by running the commands scripts/rewrap
or metamath 'read set.mm' 'save proof */c/f'
'write source set.mm/rewrap' quit .
Those commands don't modify the math content of statements.
As a general rule, a math string in a comment should be surrounded
by backquotes on the same line, and if it is too long it should
be broken into multiple adjacent math strings on multiple lines.
In statements we try to break before the outermost important connective
(not including the typecode and perhaps not the antecedent).
For examples, see sqrtmulii 14126 and absmax 14069.
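The comment rule for long math strings (break them into adjacent backquoted math strings rather than letting one cross a line) can be sketched with a greedy packing routine. This Python function is hypothetical; the default indentation of four spaces is an assumption of the sketch, and tokens are never split, so a single token wider than the limit would stay on its own over-long line.

```python
def wrap_math_string(math: str, indent: int = 4, width: int = 79) -> list[str]:
    """Split a comment math string into adjacent backquoted pieces.

    Each returned line has the form "` tok ... `", so no math string
    crosses a line break.
    """
    prefix = " " * indent
    lines, current = [], []
    for tok in math.split():
        candidate = current + [tok]
        if current and len(prefix + "` " + " ".join(candidate) + " `") > width:
            # Adding tok would overflow: close the current math string.
            lines.append(prefix + "` " + " ".join(current) + " `")
            current = [tok]
        else:
            current = candidate
    if current:
        lines.append(prefix + "` " + " ".join(current) + " `")
    return lines


print(wrap_math_string("|- ( ph -> ph )"))  # ['    ` |- ( ph -> ph ) `']
```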
- Discouraged information.
A separate file named "discouraged" lists all
discouraged statements and uses of them, and this file is checked.
If you change the use of discouraged things, you will need to change
this file.
This makes it obvious when there is a change to anything discouraged
(triggering further review).
- LRParser check.
Metamath verifiers ensure that $p statements follow from previous
$a and $p statements.
However, by itself the Metamath language permits certain kinds of
syntactic ambiguity that we choose to avoid in this database.
Thus, we require that this database unambiguously parse
using the "LRParser" check (implemented by at least mmj2).
(For details, see mmj2 master file src/mmj/verify/LRParser.java).
This check
counters, for example, a devious ambiguous construct
developed by saueran at oregonstate dot edu and
posted on Mon, 11 Feb 2019 17:32:32 -0800 (PST),
based on creating definitions with mismatched parentheses.
- Proposing specific changes.
Please propose specific changes as pull requests (PRs) against the
"develop" branch of set.mm, at:
https://github.com/metamath/set.mm/tree/develop
- Community.
We encourage anyone interested in Metamath to join our mailing list:
https://groups.google.com/forum/#!forum/metamath.
(Contributed by DAW, 27-Dec-2016.) (New usage is
discouraged.)