1. CONSTRUCTIVE VERSUS CLASSICAL MATHEMATICS

The classical view of mathematics is essentially descriptive: we try to describe the facts about a static mathematical universe. Thus, for example, we report that every polynomial of odd degree has a root, and that there is a digit that occurs infinitely often in the decimal expansion of π. In opposition to this is the constructive view of mathematics, which focuses attention on the dynamic interaction of the individual with the mathematical universe; in the words of Hao Wang, it is a mathematics of doing, rather than a mathematics of being. The constructive mathematician must show how to construct a root of a polynomial of odd degree, and how to find a digit that occurs infinitely often in the decimal expansion of π.
We picture an idealized mathematician U interacting with the mathematical universe; this is the "you" who finds the δ, and to whom the ε is given, when we say "given ε you can find δ." The phrases "there exists" and "you can find" mean that U can carry out the indicated constructions. The disjunction of two statements "P1 or P2" means that either P1 is true or P2 is true, and that U can determine which of these alternatives holds. As "P1 or P2" means that there exists i in {1,2} such that Pi is true, the meaning of "or" can be derived from the meaning of "there exists", and it is the interpretation of this latter phrase that is fundamental to constructive mathematics.
Classical mathematics can also be encompassed by this picture; the difference lies in what powers we ascribe to U. An omniscient U can decide whether any given mathematical statement is true or false; so U can, for example, survey the decimal expansion of π and determine which digits appear infinitely often. With an omniscient U, our picture is just a more dynamic, anthropomorphic portrayal of classical mathematics.
In constructive mathematics we assume only that U can carry out constructions that are finite in nature. As Errett Bishop put it, "The only way to show that an object exists is to give a finite routine for finding it." In this setting, we are not entitled to say that some digit appears infinitely often in the decimal expansion of π until we are prepared to exhibit such a digit, or at least produce an algorithm that will compute such a digit.
We consider U to be capable of carrying out any finite construction that is specified by an algorithm, but we do not rule out the possibility that U can do other things-even that U might be omniscient. The picture that results when we restrict U to finite constructions is the computational interpretation of mathematics. Because any statement that admits a constructive proof is true under the computational interpretation, we say that constructive mathematics has numerical meaning; because any statement that admits a constructive proof is true under the classical interpretation, we say that constructive mathematics is a generalization of classical mathematics.
Constructive mathematics is pure mathematics done algorithmically in order to respect the computational interpretation. The central notion of a finite routine, or algorithm, is taken as primitive. Any attempt to define what an algorithm is ultimately involves an appeal to the notion of existence-for example, we might demand that there exist a step at which a certain computer program produces an output. If the term "exist" is used here in the classical sense, then we have failed to capture the constructive notion of an algorithm; if it is used in the constructive sense, then the definition is circular.
Consider the difference between the constructive and the classical use of the connective "or". In order to prove "P1 or P2" constructively, we must construct an algorithm that will either prove P1 or prove P2, and by executing that algorithm we (the idealized mathematician) can determine which is true. To prove "P1 or P2" classically, it suffices to show that P1 and P2 cannot both be false. For example, let P1 be the statement:
 there exist positive integers x, y, z, and n such that:

 xn+2 + yn+2 = zn+2,
and let P2, the famous unproved theorem of Fermat, be the denial of P1. If P1 is false then P2 is true, so "P1 or P2" is classically provable; but we don't know, at this time, how to determine which of P1 or P2 holds, so we do not yet have a constructive proof of "P1 or P2".
A constructive proof of a theorem proves more than a classical one: a constructive proof that a sequence of real numbers converges implies that we can compute the rate of convergence; a constructive proof that a vector space is finite dimensional implies that we can construct a basis; a constructive proof that a polynomial is a product of irreducible polynomials implies that we can construct those irreducible polynomials.
Two statements P and Q may be classically equivalent without being constructively equivalent. Let P be the statement that every subgroup of the additive group Z of integers is cyclic. This means that we can always produce a generator from the data specifying the subgroup. Let Q be the statement that no subgroup G of Z can have the property that for each m in G, there is an integer in G that is not a multiple of m. The statements P and Q are immediately equivalent classically, but quite different constructively. The statement Q is true: as 0 is in G, there must be a nonzero integer n in G; as n is in G, there must be an integer properly dividing n in G, and so on until we arrive at a contradiction. But P is unlikely to be true, as we can see by considering the subgroup G generated by the perfect numbers: to construct a generator of G we must construct an odd perfect number or show that all perfect numbers are even.
On the other hand, any two constructively equivalent statements are classically equivalent; indeed any theorem in constructive mathematics is also a theorem in classical mathematics: a constructive proof is a proof.
Suppose we are trying to find a constructive proof of a statement P which is classically true. After many unsuccessful attempts to prove P, we may be inclined to look for a counterexample. But we cannot hope to prove the denial ¬P of P, which is what a bona fide counterexample would entail, because ¬P is classically false. As this avenue is closed to us, we need some other alternative to persisting in trying to prove P.
One approach is to fix a formal language in which P is expressible, specify precisely what sequences of words in that language constitute proofs of P, and show that no such proof can be constructed (possibly by giving an unintended interpretation of the formal language, and showing that P is false in that interpretation). Such a program can be illuminating, but it is often doubtful that the formal system adequately reflects the informal mathematics. A more serious objection is that engaging in such independence arguments requires a drastic shift in point of view; a procedure that stays closer to the subject at hand would be preferable. To this end we introduce the idea of an omniscience principle and of a Brouwerian example.
A rule α that assigns to each positive integer n an element αn of {0,1} is called a binary sequence. An omniscience principle is a classically true statement of the form "P(α) for all binary sequences α" which is considered not to have a constructive proof. For example, from the classical point of view, for each binary sequence α either

(P) there exists n such that αn=1, or
(Q) αn=0 for each positive integer n.
The assertion that either P or Q holds for any binary sequence α is called the limited principle of omniscience (LPO). As Q is the negation of P, the limited principle of omniscience is a form of the law of excluded middle: the assertion that for any statement P, the statement "P or not P" holds. The limited principle of omniscience and, a fortiori, the law of excluded middle, is rejected in the constructive approach as no one seriously believes that we can construct an algorithm that, given α, chooses the correct alternative, P or Q.
Another argument against LPO is that if we restrict our idealized mathematician to specific kinds of algorithms, as is done in the Russian school of constructive mathematics, we can show that LPO is false. Fix a computer programming language capable of expressing the usual number theoretic functions and symbol manipulations. It can be shown that there is no computer program that accepts computer programs as inputs, and when applied to a program that computes a binary sequence, returns 1 if the sequence contains a 1, and returns 0 otherwise. So if we require our rules to be given by computer programs, then LPO is demonstrably false. This argues against accepting LPO because any informal algorithm that we use will undoubtedly be programmable, so our theorems will be true in this computer-program intepretation; but we do not restrict ourselves to computer programs lest we rule out the possibility of intepreting our theorems classically: indeed, LPO is classically true.
If a statement P can be shown to imply LPO, then we abandon the search for a constructive proof of P. We do not assert that P is false, as we denied the existence of the computer program in the preceding paragraph; after all, P may admit a classical proof. Rather, statements like LPO are thought of as independent in the sense that neither they nor their denials are valid.
For example, consider the classically true statement P that every subset of the integers is either empty or contains an element. Given a binary sequence α, let A = {1} and let B = {αn : nN}. Then AB is a subset of the integers. If AB contains an element, then that element must be 1, so, from the definition of B, there exists n such that αn = 1; if AB is empty, then αn = 0 for all n. Thus if P holds, then so does LPO.
A weaker omniscience principle is the lesser limited principle of omniscience (LLPO), which states that for any binary sequence α that contains at most one 1, either αn=0 for all odd n, or αn=0 for all even n. This implies that, given any binary sequence, we can tell whether the first occurrence of a 1, if any, occurs at an even or at an odd index n. As in the case of LPO, if we restrict ourselves to rules given by computer programs, then we can refute LLPO. If you think of α as a black box into which you put n and get out αn, then it is fairly clear that you cannot hope to establish LPO or LLPO. Or consider the sequence α defined by:

 α2n=1 if, and only if, there are 100 consecutive 6's in the first n places of the decimal expansion of π. α2n+1=1 if, and only if, there are 100 consecutive 7's in the first n places of the decimal expansion of π.

As π/4=1−1/3+1/5−1/7+1/9−…, there is an algorithm for computing the terms αn. But unless we chance upon 100 consecutive 6's or 7's, we are hard put to find an algorithm that will tell us the parity of the index n for which αn=1 for the first time (if ever).
A Brouwerian example E is a construction E(α) based on an arbitrary binary sequence α. We say that a Brouwerian example E satisfies a condition C if E(α) satisfies C for each α; we say that E does not satisfy a condition C if there is an omniscience principle "P(α) for all α" such that whenever E(α) satisfies C, then P(α) holds. A Brouwerian counterexample to a statement of the form "C1 implies C2" is a Brouwerian example that satisfies C1 but does not satisfy C2.
Our construction AB above is a Brouwerian example of a subset of the integers that neither contains an element nor is empty. We now construct a Brouwerian example of a bounded increasing sequence of real numbers that does not have a least upper bound. For each binary sequence α let E(α) be the sequence β of real numbers such that βn = supni=1αi. Then E(α) is a bounded increasing sequence of real numbers. Let C be the condition, on a sequence of real numbers, that it has a least upper bound. We shall show that E does not satisfy C. Let P(α) be the property that either αn = 0 for all n, or there exists n such that αn = 1, and suppose E(α) satisfies C. If the least upper bound of E(α) is less than 1, then αn = 0 for all n. If the least upper bound of E(α) is positive, then there exists n such that αn > 0, and hence αn = 1. Thus P(α) holds.
EXERCISES
1. Show that LPO implies LLPO.
2. Each subset of {0,1} has 0, 1 or 2 elements. Construct a Brouwerian counterexample to that statement.
3. Construct a Brouwerian example of a nonempty set of positive integers that does not contain a smallest element.
4. Construct a Brouwerian example of a subgroup of the additive group of integers that is not cyclic.
5. Construct a Brouwerian example of two binary sequences whose sum contains infinitely many 1's, yet neither of the original sequences does.
6. Call a statement simply existential if it is of the form "there exists n such that an=1" for some binary sequence a. Show that LLPO is equivalent to
 ¬ (A and B) if and only if ¬A or ¬B
for each pair of simply existential statements A and B.
7. The weak limited principle of omniscience (WLPO) is the statement that for each binary sequence α either αn=0 for all n or it is impossible that αn=0 for all n. Show that LPO implies WLPO, and that WLPO implies LLPO.
8. Let S be the set of all finite sequences of positive integers. By a finitary tree we mean a subset T of S such that

(i) For each sS, either sT or sT,
(ii) If (x1,...,xn) ∈ T, then (x1,...,xn−1) ∈ T,
(iii) For each (x1,...,xn) ∈ T, there is mN such that if (x1,...,xn,z) ∈ T, then zm.
An infinite path in T is a sequence {xi} of positive integers such that (x1,...,xn) ∈ T for each n. König's lemma states that if T is infinite (has arbitrarily large finite subsets), then T has an infinite path. Show that König's lemma implies LLPO.
2. SETS, SUBSETS, AND FUNCTIONS

We deal with two sorts of collections of mathematical objects: sets and categories. Our notion of what constitutes a set is a rather liberal one.
A set S is defined when we describe how to construct its members from objects that have been, or could have been, constructed prior to S, and describe what it means for two members of S to be equal.
Following Bishop we regard the equality relation on a set as conventional: something to be determined when the set is defined, subject only to the requirement that it be an equivalence relation, that is
reflexive: a=a.
symmetric: If a=b, then b=a.
transitive: If a=b and b=c, then a=c.
An n-ary relation on a set S is a property P that is applicable to n-tuples of elements of S, and is extensional in the sense that if xi=yi, for i=1,...,n, then P(x1,…,xn) if and only if P(y1,…,yn). Note that equality is a binary relation in this sense. The relation P is decidable if for each n-tuple x1,…,xn either P(x1,…,xn) holds or it doesn't hold.
A unary relation P on S defines a subset A={xS : P(x)} of S: an element of A is an element of S that satisfies P, and two elements of A are equal if and only if they are equal as elements of S. If A and B are subsets of S, and if every element of A is an element of B, then we say that A is contained in B, and write AB. Two subsets A and B of a set S are equal if AB and BA; this is clearly an equivalence relation on subsets of S. We have described how to construct a subset of S, and what it means for two subsets of S to be equal. Thus we have defined the set of all subsets, or the power set, of S. A subset of S is nonempty if there exists an element in it.
The union of two subsets A and B of S is the subset of S defined by AB={xS : xA or xB}. The 'or' in this definition is interpreted constructively, so given x in AB we can determine which subset x is in (although we may not be able to tell whether it is in both). In terms of existence, xAB means that there exists i in {1,2} so that if i=1, then xA, and if i=2, then xB. The intersection of A and B is the subset AB={xS : xA and xB}.
We regard the relation of inequality as conventional and not necessarily the denial of equality; the interpretation of the symbol ab will depend on the context. Every set admits the denial inequality defined by ab if a=b is impossible. Some sets admit other more natural inequalities: if a and b are binary sequences, then the good interpretation of ab is that there exists n such that anbn. If a set has no specified inequality, we interpret ab as being the denial inequality. We employ the usual terminology involving inequality: to say a and b are distinct means ab; to say a is nonzero means a ≠ 0.
An inequality on a set may be one or more of the following:
consistent: aa is impossible.
symmetric: if ab, then ba.
cotransitive: if ac, then for any b either ab or bc.
tight:  if ab is impossible, then a=b.
We almost always want an inequality to be symmetric because ab is supposed to embody the idea that a and b are distinct, which should be a symmetric relation. It is also natural to demand consistency, but in practice this property is usually unnecessary. A consistent, symmetric, cotransitive inequality is called an apartness; the inequality specified above for the set of binary sequences is a tight apartness, as is the standard inequality on real numbers (see II.3). The denial inequality need not be an apartness, nor need it be tight.
An inequality is said to be standard if it can be shown to be equivalent to the denial inequality using the law of excluded middle. A tight consistent inequality is standard because \lnot\lnot(ab) is equivalent to \lnot(a=b). The denial inequality is trivially a standard inequality. With one important exception (local rings), we will be interested only in standard inequalities. It should be noted, however, that the requirement that an inequality be standard has very little constructive content: one cannot even prove that every standard inequality on a one-element set is consistent (a statement that can be refuted using LEM need not be refutable).
A set S with a consistent inequality is discrete if for any two elements a and b of S, either a=b or ab; if S has no specified inequality, then S is discrete if it is discrete under the denial inequality. The inequality of a discrete set is the denial inequality, and is a tight apartness. However the assertion that a set is discrete does not a priori refer to the denial inequality, but rather to whatever inequality comes with S: to say that a set S of binary sequences is discrete means that for all a and b in S, either a=b or there exists n such that anbn.
The set Z of integers is discrete. The set Q of rational numbers is also discrete: a rational number is a pair of integers m/n with n ≠ 0, two rational numbers m1/n1 and m2/n2 being considered equal if m1n2 = m2n1. Another example of a discrete set is the ring Z12 of integers modulo 12: the elements of Z12 are integers, and two elements of Z12 are equal if their difference is divisible by 12. As we can decide whether or not an integer is divisible by 12, the set Z12 is discrete.
If xS and A is a subset of S, then we define xA to mean that xa for each aA; if the inequality on S is the denial inequality, or if S has no specified inequality, then xA if and only if x cannot be in A. The complement of A in S is S\A={xS:xA}. A subset A of S is proper if there exists x in S such that xA. A subset A of a set S is detachable if for each x in S either xA or xA.
Given sets S1,S2,…,Sn we define their Cartesian product S1×S2×…×Sn to consist of the n-tuples (x1,x2,…,xn) with xi in Si for each i. Two such n-tuples (x1,x2,…,xn) and (y1,y2,…,yn) are equal if xi=yi for each i. Relations may be identified with subsets of Cartesian products: a binary relation on S is a subset of S×S.
If A and B are sets, then a function from A to B is a rule that assigns to each element a of A an element f(a) of B, and is extensional in the sense that f(a1)=f(a2) whenever a1=a2. We write f : AB to indicate that f is a function from A to B. Two functions f and g from A to B are equal if f(a)=g(a) for each a in A. The identity function f : AA is defined by setting f(a)=a for each a in A.
To construct a function from A to B it suffices to construct a subset S of the Cartesian product A×B with the properties

(i) for each aA there exists bB such that (a,b) ∈ S,
(ii) if (a,b1) and (a,b2) are elements of S, then b1=b2.
In the computational interpretation, the algorithm for the function comes from (i), which specifies the construction of an element b depending on the parameter a. Without (ii), however, the algorithm implicit in (i) need not be extensional. The fact that a subset S of A×B satisfying (i) and (ii) determines a function f such that (a,f(a)) ∈ S for each a in A is known as the axiom of unique choice.
Let f be a function from A to B. We say that f is
one-to-one if a1=a2 whenever f(a1)=f(a2),
onto if for each b in B there exists a in A such that f(a)=b,
strongly extensional if a1a2 whenever f(a1) ≠ f(a2).
Note that any function between sets with denial inequalities is strongly extensional. If SA, then the image of S under f is the set
 f(S)={b ∈ B : b=f(a) for some a ∈ A}.
Thus f is onto if and only if f(A)=B. If SB, then the preimage of S under f is the set
 f−1(S)={a ∈ A : f(a) ∈ S}.
Two sets A and B have the same cardinality if there are functions f from A to B, and g from B to A, such that fg is the identity function on B and gf is the identity function on A; we say that the functions f and g are inverses of each other, and that each is a bijection. If A and B have the same cardinality, we write #A=#B . The axiom of unique choice implies that a one-to-one onto function is a bijection (Exercise 6). Classically we think of sets of the same cardinality as simply having the same size; constructively it is more accurate to think of them as having the same structure.
When we refer to the cardinality of a set we mean the set itself, ignoring any structure other than equality that it might have. The distinction between referring to a set and referring to its cardinality is primarily one of intent: when we refer to the cardinality of a set, we do not plan on paying attention to any characteristics of the set that are not shared by all sets with the same cardinality. For example, if x is an element of a group, then the set S={1,x,x2,x3,…} is the submonoid generated by x, while the cardinality of S is the order of x. It is much the same distinction as that between a fraction and a rational number. A common device for dealing with this kind situation is to introduce equivalence classes but, following Bishop (1967), we prefer to deal with the equivalence directly and not introduce cumbersome new entities.
If a set A has the same cardinality as {1,…,n} (empty if n=0) for some nonnegative integer n, then we say that A is an n-element set, or A has cardinality n, and write #A=n. A finite set A is a discrete set which has cardinality n for some nonnegative integer n. Recall that a discrete set must be discrete in its specified inequality, if any, so a set can have finite cardinality without being finite; such sets are somewhat pathological, which is why we give them the longer name.
A set A is finitely enumerable if it is empty or there is a function from {1,…,n} onto A. Note that a finitely enumerable set is discrete if and only if it is finite. We say that A has at most n elements if whenever a0,…,anA, then there exist 0 ≤ i < jn such that ai=aj. A set is bounded in number, or bounded, if it has at most n elements for some n. A set is infinite if it contains arbitrarily large finite subsets.
A set A is countable if there is a function from a detachable subset of the positive integers onto A. Thus the empty set is countable, as is the set of odd perfect numbers. Nonempty countable sets are the ranges of functions from the positive integers, so their elements can be enumerated (possibly with repetitions) as a1,a2,….
A sequence of elements of a set A, or a sequence in A, is a function from the nonnegative integers N to A. We shall also speak of functions from the positive integers as being sequences. A family of elements of A, indexed by a set I, is a function f from I to A; the image of i in A under f is usually denoted fi rather than f(i). Thus a sequence is a family indexed by the nonnegative integers N. A finite family of elements of A is a family of elements of A indexed by {1,...,n} for some positive integer n.
If {Ai}iI is a family of subsets of S, then its union is defined by ∪iIAi={xS: there exists iI such that xAi}, and its intersection is defined by ∩iIAi={xS:xAi for all iI}.
If S is a set with an inequality, and X is a set, then the set SX of functions from X to S inherits an inequality from S by setting fg if there exists x in X such that f(x) ≠ g(x).
Let S be a set with inequality and let X be a set. If the inequality on S is consistent, symmetric, cotransitive, or tight, then so, respectively, is the inequality on SX.
Consistency and symmetry are clear. Suppose that the inequality on S is tight. If f1f2 is impossible, then there cannot exist x such that f1(x) ≠ f2(x). Thus, given x, it is impossible that f1(x) ≠ f2(x), whence f1(x)=f2(x) for each x, so f1=f2. Now suppose the inequality on S is cotransitive. If f1f3, then for some x we have f1(x) ≠ f3(x) so either f1(x) ≠ f2(x) or f2(x) ≠ f3(x) whereupon either f1f2 or f2f3.

As an example of Theorem , take S to be the discrete set {0,1} and X to be the nonnegative integers. Then SX is the set of binary sequences. As {0,1} is discrete, the inequality on {0,1} is a consistent tight apartness, so the inequality on the set of binary sequences is also a consistent tight apartness. However if the set of binary sequences were discrete, then we could establish LPO.
EXERCISES
1. Give an example (not Brouwerian) of a consistent apartness that is not tight.
2. Show that the set of binary sequences is discrete if and only if LPO holds.
3. A denial inequality that is not an apartness. Let A be the set of binary sequences. For x and y in A define x=y if there exists N such that xn=yn for all nN, and let xy be the denial inequality. Show that if this inequality is an apartness, then WLPO holds; show that if it is a tight apartness, then LPO holds.
4. A notty problem. A difference relation is a symmetric inequality such that any of the following three conditions holds:

(i) xz implies ¬ (¬ xy and ¬ yz)
(ii) ¬ xy and ¬ yz implies ¬ xz
(iii) xz and ¬ xy implies ¬¬ yz.
Show that these conditions are equivalent, and that an apartness is a difference relation.
5. Define a natural tight apartness on the set of detachable subsets of a set. Show that a subset A of a set S is detachable if and only if it has a characteristic function, that is, a function f from S to {0,1} such that
 A={s ∈ S : f(s)=1}.
6. Show that a function is a bijection if and only if it is one-to-one and onto.
7. Show that a finitely enumerable discrete set is finite. Construct a Brouwerian example of a finitely enumerable set, with a tight apartness, that is not finite. Show that a finitely enumerable set is bounded in number. Construct a Brouwerian example of a set that is bounded in number but is not finitely enumerable.
8. Show that a nonempty set A is countable if and only if there exists a function from N onto A. Show that a discrete set is countable if and only if it has the same cardinality as a detachable subset of N.
9. Show that a subset A of N is countable if and only if there is a detachable subset S of N×N such that AS, where π is the projection of N×N on its first component.
10. Show that the set of functions from a discrete bounded set A to {0,1} need not be discrete. (Hint: Let A be the range of a binary sequence)
11. Show that if a set S is bounded in number, then any one-to-one function from S to S is onto.
12. Give a Brouwerian example of a subset A of N such that it is contradictory for A to be finite, but A is not infinite.
13. Let S be a nonempty set with an apartness, and n a positive integer. Show that the following are equivalent.

(i) There exist elements x0,…,xn in S such that xixj for ij.
(ii) Given y1,…,yn in S, there exists z in S such that zyi for i=1,…,n.
14. A set with inequality is Dedekind infinite if it is isomorphic, as a set with inequality, to a proper subset. Show that any Dedekind-infinite set satisfies (i) of Exercise 13 for each n.
15. Call a set S ω-bounded if for each sequence {si} in S, there exists mn such that sm=sn. Show that if S is a discrete ω-bounded set, and {si} is a sequence in S, and m is a positive integer, then there is a finite set I of m positive integers such that si=sj if i and j are in I. Show that if A and B are discrete ω-bounded sets, then so is A×B.
3. CHOICE

The axiom of choice asserts the existence of a function with a certain property, so we might well suspect that its validity would be more dubious in a constructive setting where functions may be interpreted as algorithms. We phrase the axiom of choice as follows:
AXIOM OF CHOICE. Let A and B be sets, and S a subset of A×B such that for each a in A there is an element b in B such that (a,b) ∈ S. Then there is a function f : AB such that (a,f(a)) ∈ S for each a in A.
The axiom of choice can be criticized from the computational point of view on two grounds. The first concerns whether we can find an algorithm f (not necessarily extensional) with the required property. We have already come across this issue with the axiom of unique choice, and we take the position that the algorithm is inherent in the interpretation of the phrase "for each a in A there is an element b in B".
A more serious criticism is that, although we can find an algorithm f, we cannot find a function. In fact we can construct a Brouwerian counterexample to the axiom of choice.
Let α be a binary sequence, let A={x,y} with the equality on A defined by setting x=y if and only if there exists n such that αn=1, and let B={0,1}. Consider the subset S={(x,0),(y,1)} of A×B. Suppose f : AB satisfies (a,f(a)) ∈ S for each aA. If f(x)=f(y), then αn=1 for some n; if f(x) ≠ f(y), then αn=0 for all n.
There are two restricted versions of the axiom of choice which are commonly accepted in constructive mathematics. The weaker of these is:
COUNTABLE AXIOM OF CHOICE. This is the axiom of choice with A being the set of positive integers.
If A is the set of positive integers, then there is no real distinction between an algorithm and a function in the computational interpretation, as each integer has a canonical representation. So this axiom follows from the interpretation of the phrase "for all a there exists b" as entailing the existence of an algorithm for transforming elements of A to elements of B.
Even stronger than the countable axiom of choice is:
AXIOM OF DEPENDENT CHOICES. Let A be a nonempty set and R a subset of A×A such that for each a in A there is an element a' in A with (a,a') ∈ R. Then there is a sequence a0,a1,… of elements of A such that (ai,ai+1) ∈ R for each i.
The axiom of dependent choices implies the countable axiom of choice as follows. Suppose S is a subset of N×B such that for each n in N there is an element b in B with (n,b) ∈ S. Let A consist of all finite sequences b0,b1,…,bm in B such that (i,bi) ∈ S for each i, and let R consist of all pairs (α,α') of elements of A such that deleting the last element of α' gives α. Applying the axiom of dependent choices to R yields a sequence in A whose last elements form the required sequence in B.
The argument for the axiom of dependent choices is much the same as that for the countable axiom of choice. We shall freely employ these axioms, although we will often point out when they are used.
We shall have occasion to refer to the following axiom of choice for which we have no Brouwerian counterexample, yet which we believe is unprovable within the context of constructive mathematics.
WORLD'S SIMPLEST AXIOM OF CHOICE. Let A be a set of two-element sets such that if a1A and a2A, then a1=a2. Then there is a function f from A to {x : xa for some aA} such that f(a) ∈ a for each aA.

EXERCISES
1. Modify Example to show that the axiom of choice implies the law of excluded middle.
2. Show that LLPO, together with dependent choice, implies König's lemma (see Exercise 1.7).
3. Show that the axiom of choice implies the world's simplest axiom of choice.
4. A set P is projective if whenever π : AB is onto, and f : PB, then there exists g : PA such that πg=f. Show that finite sets are projective. Show that countable discrete sets are projective if and only if the countable axiom of choice holds. Show that if discrete sets are projective, then the world's simplest axiom of choice holds.
4. CATEGORIES

The collection of binary sequences forms a set because we know what it means for two binary sequences to be equal. Given two groups, or sets, on the other hand, it is generally incorrect to ask if they are equal; the proper question is whether or not they are isomorphic, or, more generally, what are the homomorphisms between them.
A category, like a set, is a collection of objects. An equality relation on a set constructs, given any two objects a and b in the set, a proposition 'a=b'. To specify a category C, we must show how to construct, given any two objects A and B in C, a set C(A,B). In concrete categories, the objects of C are mathematical structures of some kind, and the set C(A,B) is the set of maps from A to B that respect that structure: if C is the category of sets, then C(A,B) is the set of functions from A to B; if C is the category of groups, then C(A,B) is the set of homomorphisms from A to B.
When we abstract the notion of composition of maps from these concrete situations, we see that a category must have, for any three objects A,B,C in C, a function from C(A,B) ×C(B,C) to C(A,C), called composition and denoted by juxtaposition, and an element 1BC(B,B), such that if fC(C,D), gC(B,C), and hC(A,B), then

(i) 1Bh=h and g1B=g.
(ii) (fg)h=f(gh)
Every set S can be considered a category by setting
 S(a,b)={x ∈ {0} : a=b}.
The reflexive law for equality gives the element 1B, and the transitive law gives (ii).
The sets and functions introduced in the previous section constitute a category: the objects of this category are sets, and the maps are the functions between the sets. We may also consider the category whose objects are sets with inequality, and whose maps are strongly extensional functions. The idea of category theory is to forget about the internal structure of the objects and to concentrate on the way the maps combine under composition. For example, a function f from A to B is one-to-one if a1=a2 whenever f(a1)=f(a2). This definition relies on the internal structure of the sets A and B, that is to say, on the elements of A and B and the equality relations on A and B. The categorical property corresponding to a function f being one-to-one is that if g and h are maps from any set C to A, and fg=fh, then g=h; that is, f is left cancellable. It is routine to show that f is one-to-one if and only if it is left cancellable.
A map f from A to B is onto if for each b in B there exists a in A such that f(a)=b. The corresponding categorical property is that f be right cancellable, that is, if g and h are maps from B to any set C, and gf=hf, then g=h. The proof that a function f is right cancellable if and only if it is onto is less routine than the proof of the corresponding result for left cancellable maps.
A function is right cancellable in the category of sets if and only if it is onto.
Suppose f : AB is onto and gf=hf. If bB, then there exists a in A such that f(a)=b. Thus g(b)=g(f(a))=h(f(a))=h(b), so g=h. Conversely suppose f : AB is right cancellable, and let Ω be the set of all subsets of {0}. Define g : B→Ω by g(b)={0} for all b, and define h : B→Ω by
 h(b)={x ∈ {0} : b=f(a) for some a}.
Thus h(b) is the subset of {0} such that 0 ∈ h(b) if and only if there exists a such that b=f(a). Clearly gf=hf is the map that takes every element of A to the subset {0}. So g=h, whence 0 ∈ h(b), which means that b=f(a) for some a.

An isomorphism between two objects A and B of a category C is an element fC(A,B) such that there is gC(B,A) with fg=1B and gf=1A. The element g is called the inverse of f; it is easily shown to be unique. A bijection between sets is an isomorphism in the category of sets. We say that A and B are isomorphic, and write AB, if there is an isomorphism between A and B.
We will be interested mainly in categories of sets with algebraic structures, in which the maps are the functions that preserve those structures. In this case, the maps are called homomorphisms. If a homomorphism is one-to-one, it is called a monomorphism; if it is onto, it is called an epimorphism. A homomorphism from an object to itself is called an endomorphism, and an endomorphism that is an isomorphism is called an automorphism.
A functor T from a category A to a category B is a rule that assigns to each object AA an object T(A) ∈ B, and to each map fA(A1,A2) a map T(f) ∈ B(T(A1),T(A2)) such that

(i) T : A(A1,A2)→B(T(A1),T(A2)) is a function
(ii) T(fg)=T(f)T(g)
(iii) T(1A)=1T(A).
A functor between two sets, viewed as categories, is simply a function. Note that if f is an isomorphism, then so is T(f).
Using the notion of a functor, we can extend our definition of a family of elements of a set to a family of objects in a category C. Let I be a set. A family A of objects of C indexed by I is a functor from I, viewed as a category, to the category C. We often denote such a family by {Ai}iI. If i=j, then the map from Ai to Aj is denoted by Aji, and is an isomorphism.
An element of the disjoint union of a family {Ai}iI of sets is a pair (i,x) such that iI and xAi. Two elements (i,x) and (j,y) of the disjoint union are equal if i=j and Aji(x)=y. We identify Ai with the subset {(i,x) : xAi} of the disjoint union. Thus after constructing the disjoint union, we may consider the family {Ai}iI to be a family of elements of the power set of the disjoint union.
Let {Ai}iI be a family of sets, and let P be a set. Then a map from P to Ai may be identified with a map f from P to the disjoint union of {Ai}iI such that f(P) ⊆ Ai. Let F denote the set of maps f from P to the disjoint union of {Ai}iI such that f(P) ⊆ Ai for some i in I. By a family of maps πi from P to Ai we mean a family π of elements of F such that πi(P) ⊆ Ai for each i in I.
Let {Ai}iI be a family of objects in a category C. A categorical product of the objects Ai is an object P together with a family of (projection) maps πi from P to Ai such that for any object S and family of maps fi from S to Ai, there exists a unique map f from S to P such that πif=fi for each i in I. A categorical product is unique up to isomorphism in the sense that if (P',π') is another one, then there exists an isomorphism θ from P to P' such that πi'θ = πi for each i. If C is the category of sets, then it is easy to verify that the set of all functions λ from I to the disjoint union of {Ai}iI such that λ(i) ∈ Ai for each i, and πi(λ) defined to be λ(i), is a categorical product of the Ai, which we refer to as the product and denote by ΠiIAi.
If I={1,…,n}, then the product of the sets Ai is the Cartesian product A1×…×An. If Ai=S for each i in I, then we write the product, which is the set of functions from I to S, as SI, or Sn if I={1,…,n}.
EXERCISES
1. Show that a function is one-to-one if and only if it is left cancellable.
2. Show that the categorical product of a family of objects is unique up to isomorphism.
3. Show that, in the category of sets, the set of all functions λ from I to the disjoint union of {Ai}iI, such that λ(i) ∈ Ai for each i, is a categorical product of {Ai}iI
4. Let I be the set of binary sequences, and, for each i in I, let Ai be {x ∈ {0,1} : xij for all j}. Show that the natural map from ΠiAi to A0 is onto if and only if WLPO holds.
5. Consider the category of sets with inequality and strongly extensional functions. Show that the product ΠiAi in this category is the product in the category of sets equipped with the inequality λ ≠ μ if there exists i such that λ(i) ≠ μ(i). Generalize Theorem to this setting.
6. Let a be an object in a category A. Show how T(b)=A(a,b) is a functor from A to the category of sets. Such a functor T is said to be representable.
7. If C is a category, then the dual category C' is defined to have the same objects as C, but C'(a,b)=C(b,a). The coproduct of a family of objects of C is the product in the category C'. Describe the coproduct directly. What is the coproduct in the category of sets?
8. Direct limits. A direct system is a sequence of objects An and maps fn : AnAn+1. An upper bound of a direct system is an object B together with maps bn : AnB such that bn+1fn=bn for each n. A direct limit of a direct system is an upper bound L so that, for any upper bound B, there is a unique map μ : LB such that μln=bn for each n.

(i) Show that any two direct limits are isomorphic.
(ii) Show that the direct limit in the category of sets is the disjoint union of the An with the equality generated by requiring a=fn(a) for each aAn.
(iii) Show that the direct limit of discrete sets need not be discrete, but it is discrete if all the maps are one-to-one.
5. PARTIALLY ORDERED SETS AND LATTICES

A partially ordered set is a set P with a relation ab satisfying:

(i) aa,
(ii) if ab and bc, then ac,
(iii) if ab and ba, then a=b.
A map between two partially ordered sets P1 and P2 is a function f from P1 to P2 such that if ab, then f(a) ≤ f(b). For the most part we will be interested in discrete partially ordered sets; in this case we write a < b for ab and ab.
Let a, b and c be elements of a partially ordered set P. We say that c is the greatest lower bound, or infimum, of a and b, and write c=ab, if for each xL we have xc if and only if xa and xb. It is easily seen that such c is unique, if it exists. Similarly c=ab is the least upper bound, or supremum, of a and b if for each xL we have cx if and only if ax and bx.
A lattice is a partially ordered set in which any two elements have an infimum and a supremum. If S is a set, then the set of all subsets of S, ordered by inclusion, forms a lattice: the supremum of A and B is AB and the infimum is AB. The set of positive integers, ordered by setting ab if b is a multiple of a, is a lattice: the supremum of a and b is their least common multiple, the infimum is their greatest common divisor. Note that ab is decidable in a discrete lattice because it is equivalent to ab=a.
If a lattice has a least element, then we denote that element by 0; if the lattice has greatest element, we denote it by 1.
A lattice is distributive if it satisfies the identity
 a∧(b∨c)=(a∧b)∨(a∧c).
The lattice of all subsets of a set is distributive. The following 5-element discrete lattice is not distributive.
 ∗
 ⁄
 |
 \
 ∗
 ∗
 ∗
 \
 |
 ⁄
 ∗
A lattice is modular if a∨(bc)=b∧(ac) whenever ab. It is easily seen that any distributive lattice is modular; the 5-element nondistributive lattice shown above is also modular. If G is a finite abelian group, then the set of finite subgroups of G is a modular lattice, which is distributive only if G is cyclic. More generally, the set of submodules of an R-module form a modular lattice. The simplest nonmodular lattice is the following 5-element discrete lattice.
 ∗
 |
 \
 |
 ∗
 ∗
 |
 |
 ∗
 |
 ⁄
 ∗
If ab are elements of a partially ordered set P, then we use the interval notation [a,b] to denote the set {xP : axb}. If P is a lattice, then [a,b] is a lattice with the same suprema and infima as in P. A key fact about modular lattices is that [ad,d] and [a,ad] are isomorphic lattices for any elements a and d. We prove this in a slightly disguised form.
Let ab and cd be elements of a modular lattice L. Define
 f(x)=a∨(b∧x)=b∧(a∨x)

 g(y)=c∨(d∧y)=d∧(c∨y).
Then g maps [f(c),f(d)] isomorphically onto [g(a),g(b)] with inverse f.
It suffices to show that if cxd, then fgf(x)=f(x). We can write fgf(x) as
 fgf(x)=b∧(a∨c∨ (d∧b∧(a∨x)))
(∗)
or as
 fgf(x)=a∨(b∧d(c∨a∨ (b∧x))).
(∗∗)
To show that f(x) ≤ fgf(x), use (∗) and f(x)=a∨(bx). To show that fgf(x) ≤ f(x), use (∗∗) and f(x)=b∧(ax).

A subset C of a partially ordered set P is a chain if for each a and b in C, either ab or ba; if P itself is a chain, we say that P is linearly ordered. A maximal chain in a partially ordered set is a chain C such that C∪{a} is a chain only if aC. The simplest nonmodular lattice above has two maximal finite chains, one of length 2 and one of length 3. For modular lattices this can't happen. We say that two linearly ordered sets C and D are piecewise isomorphic if there exist elements c1,…,cn and d1,…,dn such that

(i) {xC : xc1} is isomorphic to {xD : xd1}
(ii) {xC : xcn} is isomorphic to {xD : xdn}
(iii) There is a permutation σ of {1,…,n−1} so that [ci,ci+1] is isomorphic to [dσi,dσi+1] for each i < n.
We leave the proof that piecewise isomorphism is a transitive relation as Exercise 4. If C and D are piecewise isomorphic discrete linearly ordered sets, then C and D have the same cardinality (Exercise 5).
[Jordan-Hölder-Dedekind]If a modular lattice has a maximal finitely enumerable chain X, then each finitely enumerable chain is contained in a maximal finitely enumerable chain that is piecewise isomorphic to X.
Let x0x1 ≤ … ≤ xm be the maximal chain X; we will refer to m as the formal length of X. It is readily verified that x0=0 and xm=1. Let y1 ≤ … ≤ yn be a chain Y. If x1y1=x1, then Y is contained in the lattice [x1,1]. By induction on m, the chain Y is contained in a maximal finitely enumerable chain in [x1,1] that is piecewise isomorphic to x1 ≤ … ≤ xm, and therefore in a maximal finitely enumerable chain that is piecewise isomorphic to X. Otherwise x1y1=0 and we have the following picture
 x1∨y1
 ⁄
 \
 x1
 y1
 \
 ⁄
 0
where [y1,x1y1] is isomorphic to [0,x1], and [x1,x1y1] is isomorphic to [0,y1]. By induction on m the chain x1x1y1 is contained in a maximal finitely enumerable chain in [x1,1], of formal length m−1, consisting of a maximal finitely enumerable chain C in [x1,x1y1] of formal length l, and a maximal finitely enumerable chain D in [x1y1,1] of formal length ml−1. The chain {y1}∪D is a maximal chain in [y1,1] of formal length at most ml so, by induction on m, Y is contained in a maximal chain in [y1,1] that is piecewise isomorphic to {y1}∪D. Lemma shows that the chain C is isomorphic to a maximal chain in [0,y1]. Thus Y is contained in a maximal chain that is piecewise isomorphic to X.

It follows from Theorem that if a discrete modular lattice has a maximal finite chain of length n, then any finite chain is contained in a maximal chain of length n.
A partially ordered set P satisfies the ascending chain condition if for each sequence p1p2p3 ≤ … of elements of P, there is n such that pn=pn+1; the descending chain condition is defined similarly. Classically, if P satisfies the ascending chain condition, then we can find n such that pm=pn for each mn. From a constructive point of view, even the two element set {0,1} fails to satisfy this form of the ascending chain condition.
We say that an element p of a partially ordered set P has depth at most n if whenever p=p0p1p2 ≤ … ≤ pn+1, then pi=pi+1 for some in. If P is discrete, we say that p has depth at least n if there exists a chain p=p0 < p1 < … < pn. An element has bounded depth if it has depth at most n for some n, finite depth if it has depth at most n, and at least n, for some n. Similar definitions apply to height instead of depth.
EXERCISES
1. Show that a lattice is discrete if and only if the relation ab is decidable.
2. Show that a lattice is distributive if and only it satisfies the identity a∨(bc)=(ab)∧(ac).
3. Let L be a modular lattice containing a maximal chain that is finite (denial inequality). Show that L is discrete.
4. Show that if two linearly ordered sets are piecewise isomorphic to a third, then they are piecewise isomorphic to each other.
5. Show that two piecewise isomorphic discrete linearly ordered sets have the same cardinality.
6. Two intervals A and B of a modular lattice are called transposes if they are of the form [a,ad] and [ad,d] (in either order), projective if there is sequence A=I1,…,In=B of intervals such that Ii and Ii+1 are transposes for i=1,…,n−1. Show that any two maximal finitely enumerable chains in a modular lattice are piecewise projective.
7. Show that a partially ordered set may be considered a category C in which the set C(a,b)={x ∈ {0} : ab}. What is the categorical description of the infimum of two elements?
8. Suppose that for each binary sequence a1a2a3 ≤ … we could find m such that an=am whenever nm. Conclude that LPO holds.
6. WELL-FOUNDED SETS AND ORDINALS

Let W be a set with a relation a < b. A subset S of W is said to be hereditary if wS whenever w' ∈ S for each w' < w. The set W (or the relation a < b) is well founded if each hereditary subset of W equals W. A discrete partially ordered set is well founded if the relation a < b (that is, ab and ab) on it is well-founded. An ordinal, or a well-ordered set, is a discrete, linearly ordered, well-founded set.
Well-founded sets provide the environment for arguments by induction. The prototype well-founded set is the set N of nonnegative integers, with the usual order.
The set N of nonnegative integers is well founded.
Let S be an hereditary subset of N. Then 0 ∈ S because the hypothesis w' ∈ S for each w' < 0 holds vacuously. From 0 ∈ S we conclude 1 ∈ S, from 0 ∈ S and 1 ∈ S we conclude 2 ∈ S, etc.

The set N of nonnegative integers, viewed as an ordinal, is denoted by ω. We shall see how to construct other well-founded sets from ω. First we observe that any subset of a well-founded set is well founded. More generally we have:
Let P and W be sets, each with a relation a < b, such that W is well founded. Let φ be a map from P to W such that φ(a) < φ(b) whenever a < b. Then P is well founded.
Let S' be an hereditary subset of P, and let S={wW : φ−1(w) ⊆ S'}. We shall show that S is hereditary, so S=W and therefore S'=P. Suppose vS whenever v < w. If x ∈ φ−1(w) and y < x, then φ(y) < w so φ(y) ∈ S, whence yS'. As S' is hereditary, this implies that xS' for each x in φ−1(w), so wS. Thus S is hereditary.

In particular, any subset of ω is an ordinal. The range of a binary sequence provides an example of an ordinal for which we may be unable to exhibit a first element. It follows from Theorem that any subrelation of a well-founded relation is well founded.
We say that a well-founded set, or relation, is transitive if a < b and b < c implies a < c. An example of a well-founded relation on N that is not transitive is constructed by taking defining a < b to mean a+1=b. This relation is well-founded by Theorem . An induction argument with respect to this relation is proof by induction as ordinarily defined; an induction argument with respect to the usual relation a < b is sometimes called proof by complete induction.
A discrete, transitive well-founded set admits a natural partial ordering by defining ab to mean a < b or a = b; the only nontrivial thing to check is that a = b if ab and ba, but this follows from the fact that a < a is impossible in any well-founded set (Exercise 1). Conversely, the relation a < b on a discrete partially ordered set is transitive.
One way to construct a well-founded set is by adding together previously constructed well-founded sets: we get the ordinal ω+ω by placing two copies of the nonnegative integers side by side. More generally, let I be a well-founded set, and let {Ai}iI be a family of well-founded sets indexed by I. Then the disjoint union ΣiIAi={(a,i) : aAi and iI} admits a relation:
 (a,i) < (b,j) if i < j or if i=j and a < b
If I={1,...,n}, with the usual order, we write A1+…+An. Note that if I and each Ai is discrete and transitive (and linearly ordered), then so is ΣiIAi.
If I is a well-founded set, and {Ai}iI is a family of well-founded sets indexed by I, then WiIAi is a well-founded set.
Suppose S is an hereditary subset of W. For each i in I, let Ai'={aAi : (a,i) ∈ S}, and let I'={iI : Ai'=Ai}. We shall show that I' is hereditary, so I'=I, which says S=W. Suppose i' ∈ I' for each i' < i. We shall show that Ai'=Ai by showing that Ai' is hereditary. Suppose a' ∈ Ai' for each a' < a. Then w' ∈ S for each w' < (a,i), so (a,i) ∈ S whence aAi'. Therefore Ai'=Ai as Ai is well-founded, so iI'.

Let {Ai}iI be a family of well-founded sets indexed by a discrete set I. We say that an element f of ΠiIAi has finite support if there is a finite subset J of I so that for each i in I, either iJ, or a < fi is impossible for any a in Ai (that is, fi is a minimal element of Ai). Note that if I is finite, then every element of ΠiIAi has finite support, while if some element has finite support, then all but finitely many of the Ai have minimal elements. If I is a well-founded set, then the last-difference relation on the elements of finite support in the product set ΠiIAi is defined by f < g if:

(i) there exists iI such that fi < gi, and
(ii) for each iI, either fi=gi, or fj < gj for some ji.
If I, and each Ai, is an ordinal, this may be described as ordering distinct elements according to the last place where they differ (reverse lexicographic order).
Let I be an ordinal, and {Ai}iI a family of well-founded sets indexed by I. Then the set F of elements of finite support in ΠiIAi is well founded under the last-difference relation.
We first note that the theorem is true when I={1,2}; in this case F=A1×A2bBA(b) where B=A2 and A(b)=A1 for each b in B. So by Theorem the last-difference product A1×A2 is well-founded if A1 and A2 are.
Adjoining a greatest element ∞ to I and setting A={0} does not change anything, so we may assume that I has a greatest element ∞. Let Fi be the set of elements of finite support in ΠjiAj and let I'={iI : Fi is well-founded}. We shall show that I' is hereditary, hence equal to I, so F=F is well founded.
Suppose kI' for each k < i. Let Fi be the elements of finite support in Πj < iAj. Then Fi=Fi×Ai, so Fi is well founded if Fi is. Write Fi=∪k < iGk where Gk consists of those elements of Πj < iAj of finite support J such that jk for each j in J . The projection from Gk to Fk preserves the relation x < y, so Gk is well founded by Theorem . If S is an hereditary subset of Fi, then SGk is an hereditary subset of Gk for each k < i, so Fi=S. Thus Fi, and therefore Fi, is well founded, so iI.

If I is an arbitrary set, and Ai is a partially ordered set for each i in I, then the categorical product of the Ai is the set ΠAi equipped with the partial order:
 f ≤ g if fi ≤ gi for each i in I
If I={1,…,n}, and each Ai is discrete and well founded, then the identity map of the categorical product into the last-difference product preserves order (but not nonorder), so the categorical product is well-founded by Theorem .
If α is an ordinal, and β is a well-founded set, then the well-founded set of functions with finite support from α to β is denoted by βα.
If λ and μ are ordinals, then an injection of λ into μ is a function ρ from λ to μ such that if a < b then ρa < ρb, and if c < ρb, then there is a ∈ λ such that ρa=c. We shall show that there is at most one injection from λ to μ.
If λ and μ are ordinals, and ρ and σ are injections of λ into μ, then ρ = σ.
Let S={a ∈ λ : ρaa}, and suppose aS for all a < b. If σb < ρb, then there is a ∈ λ such that ρab < ρb, so a < b, so ρaa; but ρab > σa, a contradiction. Similarly we cannot have ρb < σb. Therefore ρbb, so bS; thus S is hereditary, so S=λ.

If there is an injection from the ordinal λ to the ordinal μ we write λ ≤ μ. Clearly compositions of injections are injections, so this relation is transitive. By Theorem it follows that if λ ≤ μ and μ ≤ λ, then λ and μ are isomorphic, that is, there is an invertible order preserving function from λ to μ. It is natural to say that two isomorphic ordinals are equal.
EXERCISES
1. Show that a < a is impossible in any well-founded set.
2. If a < b is a relation on a set W, define the transitive closure relation a < b to mean a < b or there exists x1,…,xn such that a < x1 < x2 < … < xn < b. Show that a < b is well-founded if a < b is well-founded (mimic the proof that ordinary induction on N implies complete induction on N).
3. A relation a < b is acyclic if a < a is impossible for any a (see Exercise 2). Show that any acyclic relation on a two-element set is well founded. Show that any acyclic relation on a set that is bounded in number is well founded.
4. Show that each discrete, well-founded, partially ordered set satisfies the descending chain condition.
5. Let W be a well-founded set. Let S be a subset of W such that, given w in W, either wS or there exists w' < w such that wS if w' ∈ S. Show that S=W.
6. Show that a discrete partially ordered set that satisfies Exercise 5 for each subset W has the descending chain condition.
7. Show the converse of Exercise 6 holds (this uses dependent choice).
8. Let W be the nonnegative integers with a < b defined to mean ab and ba is odd. Show that the relation a < b is well founded but not transitive.
9. Show that the categorical product of partially ordered sets as defined in this section is the categorical product.
10. Show that if every nonempty ordinal has a least element, then LPO holds.
11. A rank relation on a discrete partially ordered set W is an ordinal A together with a subset R of W×A such that

(i) For each w in W there is a in A such that (w,a) ∈ R.
(ii) If v < w and (w,a) ∈ R, then there is b < a such that (v,b) ∈ R.
Prove Theorem for I a discrete, well-founded, partially ordered set with a rank relation.
12. A Grayson ordinal is a set W with a well-founded relation a < b satisfying:

(i) If a < b and b < c, then a < c (transitivity)
(ii) If c < a is equivalent to c < b for each c, then a=b (extensionality).
Define ab in a Grayson ordinal to mean c < a implies c < b for all c. Show that if the relation a < b is decidable, then W is a Grayson ordinal if and only if W is an ordinal. (Hint: prove that a < b or a=b or b < a in a decidable Grayson ordinal)
13. Let α be a binary sequence, and let S={x,y,z} with x=y if α is identically zero. Define a relation u < v on S by setting

(i) y < z
(ii) x < y if αn=1 for some n
(iii) x < z if x=y or x < y.
Show that this turns S into a Brouwerian example of Grayson ordinal with elements such that xy < z, but not x < z.
NOTES

Bishop's definition of when an object may be said to exist can be found in [Bishop 1967, page 8]. The idea that the notion of an algorithm is primitive has also been advanced by the Russian mathematicians Uspenskii and Semenov (1981):
"The concept of algorithm like that of set and of natural number is such a fundamental concept that it cannot be explained through other concepts and should be regarded as [an] undefinable one."
Attempts to explain constructive existence in classical terms are always somewhat unsatisfactory, but the classical notion of existence is no less mysterious than the constructive one, we are just more familiar with its use. What does it mean for a well-ordering of the real numbers to exist? or a basis of the reals as a rational vector space? or an automorphism of the field of complex numbers that takes e to π? or a noncomputable function? Formal systems that specify the correct use of the phrase "there exists" are available to the constructive mathematician as well as to his classical counterpart.
Our definition of a set is a combination of the formulations in [Bridges 1979, page 2] and [Heyting 1971, 3.2.1]. Many authors use the positive term inhabited to describe nonempty sets; this avoids possible confusion with the notion of a set that cannot be empty. The notion of a (tight) apartness is found in [Heyting 1971, 4.1.1]. The terminology "tight" is due to Scott (1979). Troelstra and van Dalen use the term "pre-apartness" to denote what we call an apartness. The part of Theorem 2.2 that states that if the inequality on S is tight, then so is the inequality on SN, is essentially [Bishop 1967, Lemma 5, page 24], which says that the natural inequality on the real numbers is tight.
A standard inequality on the set {0} is gotten by setting 0 ≠ 0 if LPO is false. As LPO is refutable in two main branches of constructive mathematics-intuitionism and Russian constructivism-we cannot show that this inequality is consistent. For more on intuitionism and Russian constructivism from the point of view of constructive mathematics see [Bridges-Richman 1987].
Difference relations are symmetric inequalities that satisfy
 ¬ x ≠ y and ¬ y ≠ z implies ¬ x ≠ z.
They were studied by van Rootselaar (1960) and by Olson (1977).
We could demand that every set come with an inequality, putting inequality on the same footing as equality; it would then be natural to demand that all functions be strongly extensional. With such an approach, whenever we construct a set we must put an inequality on it, and we must check that our functions are strongly extensional. This is cumbersome and easily forgotten, resulting in incomplete constructions and incorrect proofs. An example of the complications: if H is a subgroup of an abelian group G, then the inequality on G/H as a group may differ from the inequality on G/H as a set because the group operation on G/H need not be strongly extensional with respect to the latter inequality (unless the inequality on G is decidable)-see Exercise II.1.6.
Our definition of a subset agrees with the informal treatment in [Bishop 1967, page 32]. A categorical definition of a subset is found in [Bishop 1967, page 63] where a subset of a set S is defined to be a set A together with a one-to-one map from A to S. The categorical definition is attractive for constructive mathematics, where it is important to keep in mind that an element of a subset, by virtue of belonging to the subset, carries implicitly the additional data that establishes its membership in the subset. The categorical approach allows us to make this additional data explicit. For example, if S is the set of binary sequences, and A is the subset of S consisting of those sequences α such that αm=1 for some m, then to specify a member of A we must not only construct a sequence α in S, but also an integer m for which αm=1. Thus an element of A may be viewed as a pair (α,m), two pairs (α,m) and (α',m') being equal if α = α'. The map from A to S that takes (α,m) to α is one-to-one but is not really an inclusion map as it forgets the additional datum m. We find the informal approach more natural.
The axiom of unique choice allows us to identify a function with its graph; Myhill has called this the axiom of nonchoice. It is easy to see that the axiom of unique choice is equivalent to either of the following.

(i) Every one-to-one onto function has an inverse.
(ii) If S is a set, and S is the set of one-element subsets of S, then S has a choice function: that is, a function f from S to S such that f(x) ∈ x for each x in S.
Bishop employs the notion of a nonextensional function or operation. In almost all applications, one can consider an operation from a set A to a set B to be a function from A to the set of nonempty subsets of B.
Our definition of a finite set differs from that in [Bishop 1967], [Bridges 1979] and [Bishop-Bridges 1985] in that we consider the empty set to be finite. Another, more subtle, difference is that we require that finite sets be discrete with respect to their given inequality. Thus a set S of functions between two discrete sets is finite only if for each f and g in S, either f=g or there exists x such that f(x) ≠ g(x).
The functions that we call onto are called surjective in [Bishop 1967] where the word onto is reserved for a map f from A to B that has a cross-section, that is, for which there exists a function g from B to A such that fg is the identity map on B. In [Bishop 1967] a finitely enumerable set is called subfinite, and a set is said to have at most n elements if it can be written in the form {x1,…,xn}. The term 'subfinite' suggests a subset of a finite set to us, while the [Bishop 1967] usage of 'at most' precludes saying that each subset of {1,...,n} contains at most n elements.
Greenleaf (1981) examines cardinality of sets, and related questions, from a constructive point of view.
Our definition of a countable set differs from that of [Bishop 1967], [Bridges 1979] and [Bishop-Bridges 1985] in that we do not require that a countable set be nonempty (or even that we can decide whether or not it is empty); for discrete sets, it is equivalent to the definition in [Brouwer 1981].
We are unlikely to be able to construct a Brouwerian example of a subset of N that is not countable. However, any acceptable proof of the theorem T that every subset of N is countable could probably be converted into a proof that every subset of N is recursively enumerable, which is false. A well-known variant of T is Kripke's schema, which states that for each proposition P there exists a binary sequence α such that P holds if and only if αn=1 for some n. Kripke's schema has some plausibility within Brouwer's framework of the creating subject, where we imagine the idealized mathematician as effecting a nonpredeterminate sequence of attempts at proving P, and we keep in mind the intuitionistic criterion that to prove \lnot P is to show how to convert any proof of P into a contradiction.
Bishop used an italicized not to indicate the existence of a Brouwerian counterexample. For example, he would say "there is an inequality that is not an apartness" to mean "if every inequality were an apartness, then LPO holds". We shall not employ this convention.
The lesser limited principle of omniscience (LLPO) was introduced in [Bishop 1973]. Both LPO and LLPO have simple interpretations in terms of real numbers: LPO is equivalent to the statement that, for each real number x, either x ≤ 0 or x > 0; LLPO is equivalent to the statement that, for each real number x, either x ≤ 0 or x ≥ 0.
Markov's principle states that if α is a binary sequence, and αn=0 for all n is impossible, then there exists n such that αn=1. The idea is that we can construct the number n by successively computing α1, α2, α3, … until we get a 1. Markov's principle is employed by the Russian constructivist school, which is a constructive form of recursive mathematics; for details see [Bridges-Richman 1987]. An argument against Markov's principle is that we have no prior bound, in any sense, on the length of the computation required to construct n. We regard Markov's principle as an omniscience principle.
The position that "for all a in A there exists b in B" entails the existence of an algorithm (not necessarily extensional) from A to B is suggested by the assertion in [Bishop 1967, page 9] that "A choice function exists in constructive mathematics, because a choice is implied by the very meaning of existence." Bishop defines a function f from B to A to be surjective if there is an algorithm g from A to B such that f(g(a))=a for all a in A.
Our Brouwerian counterexample to the axiom of choice, and Exercise 3.1 that the axiom of choice implies the law of excluded middle, is due to Myhill and Goodman (1978). An earlier proof, in a topos-theoretic setting, was given by Diaconescu (1975) who shows that the axiom of choice implies that every subset A of a set B has a complement in the sense that there is a subset A' such that B=AA' and AA'=ϕ.
Fourman and Scedrov (1982) have shown, using topos-theoretic methods, that the world's simplest axiom of choice is not provable in intuitionistic set theory with dependent choice.
The arguments against countable choice, and dependent choices, must be based on more fundamental grounds than we used for the axiom of choice. That is, we must question the interpretation of the phrase "for all a there exists b" as entailing the existence of an algorithm for transforming elements of A to elements of B. One reason for rejecting this interpretation is that by so doing our theorems will also be theorems in other (unintended) models, such as arise in topos theory, which are of interest in classical mathematics (see, for example, [Scedrov 1986]). A more relevant reason, perhaps, is that arguments that depend essentially on this interpretation often have an unsatisfying feel to them which seems to be connected with the gratuitous "completion of an infinity" that occurs when we subsume a potentially infinite number of items of information into a single algorithm.
A rank function on a well-founded set W is a map φ from W to an ordinal A such that φ(x) < φ(y) whenever x < y. It seems unlikely that we can always construct a rank function although, classically, every well-founded set admits a unique smallest rank function. Induction on rank is a common technique in the classical theory (see Exercise 6.11).

File translated from TEX by TTH, version 4.03.
On 06 May 2012, 20:45.