Lambda Calculus

Lambda calculus is a large and complicated subject. The "standard reference" is [1]. This essay gives the basic definitions and a few main points about the subject.

Lambda and Application

Lambda calculus, or calculus, is a theory of "rules". Whether these rules are programs or some kind of more general (non-computable) "function" we do not specify; we will just write down some axioms about these "rules". We assume that rules are just another kind of "object"; and every object is thought of as some kind of (possibly trivial) rule. We could always make that later condition true by arbitrarily defining every non-rule to behave as a rule in some arbitrary way, for instance to be a constant function returning itself. With this picture it makes sense to have a function symbol Ap and write Ap(f,x) for the result of applying rule (or object) f to object (or rule) x. The result is another object, which may of course be a rule. The distinction between rules and objects is dropped completely in the formalism. The second fundamental symbol in the lambda calculus is λ. It is used to construct rules. You can read it mentally as "as a function of", until you get used to it. For instance, the constant function with value c is &lambda x. c. We write that in computer-readable form ("ascii form") as lambda(x,c). The identity function is λx.x, or in ascii form as lambda(x,x). [We will usually use ascii notation, for two reasons: it's easy to render in HTML, and we have to use it in computer input files, and it's better to stick primarily with one notation.]

The connection between Ap and lambda is illustrated by this example: Ap(lambda(x,x),c) = c. This says, if we apply the identity function to c, we get c.

Ap is an ordinary function symbol, like any function symbol in logic, so Ap(f,x) always has a value. Thus in lambda calculus, all functions are "total", i.e. everywhere defined. There is no such thing as an undefined expression. There is a theory called "partial lambda calculus", but it is much newer and less studied than lambda calculus.

Lambda terms can be nested. For example, lambda(x, lambda(y,x)) is the function whose value at any x is the constant function with value x.

The variable x in lambda(x,t) is bound. That means, intuitively, that the value of the expression does not depend on the value of x. For example, lambda(x,x) and lambda(y,y) both denote the identity function. They are said to "differ by renaming bound variables". A fancier term for this is "alpha-equivalence". The reason for using a fancier term is that not every renaming of bound variables is allowed. You cannot rename a bound variable using the name of a variable that already occurs free in the scope of the binding. (The"scope" of the binding is the lambda term whose first argument is the bound variable in question.) For example, in lambda(x,y), you cannot rename x to y. As an exercise, students should stop here to give a precise definition of alpha equivalence.

The first axiom of the lambda calculus is the "alpha" axiom: t=r if t and r are alpha-equivalent.

Substitution of terms for variables is slightly tricky to define (or implement) correctly, because we have to avoid "accidental capture" of free variables. We use the notation t[x:=r] to denote the result of substituting term r for variable x in term t. Suppose t is lambda(y,x), the constant function with value x. What is t[x:=y]? It is not lambda(y,y), as would be obtained by a simple text-for-text replacement of x by y. No, this is the idenity function, not the constant function with value y. The correct answer is lambda(z,y). The substitution takes place in two steps. First t is replaced by an alpha-equivalent term (in this case lambda(z,x)) in which the bound variables are all distinct from any variables in r or any free variables in t. After that, x is replaced (textually) by the replacement term r (in this case y).

Beta reduction

The connection between lambda and Ap is given by the "beta" axiom:

Ap(lambda(x,t),r) = t[x:=r].

If we use this rule left-to-right to "simplify" lambda-terms, the process is called "beta reduction". Reduction can be applied to any subterm of a term. For example, Ap(lambda(f,Ap(f,x)), lambda(x,x)) beta reduces in one step to Ap(f,x)[f:lambda(x,x)], which is Ap(lambda(x,x),x). This in turn beta reduces in one step to x. So Ap(lambda(f,Ap(f,x)), lambda(x,x)) beta reduces to x.

A term which does permit any beta reductions is called normal. A term which beta reduces to a normal form is said to have a normal form or to be normalizable.

Not every term does have a normal form. For example, let ω = lambda(x,Ap(x,x)), and let Ω = Ap(ω,ω). Then Ω can be beta reduced to itself, as follows: Ω = Ap(ω,ω) = Ap( lambda(x,Ap(x,x)),ω ) = Ap(ω,ω) = Ω. That is also the only way to beta reduce Ω, so every possible way to beta reduce this term leads to the same term again; the chain of beta reductions never comes to an end.

The Church-Rosser theorem

A term of the form Ap(lambda... (i.e. one that matches the left side of the beta axiom) is said to permit beta reduction, or to be a redex. There could be several different subterms of a term t, all of which are redexes. We could perform first one of those, or first another. Thus the same term t could beta-reduce to different terms r1 and r2. If the redexes are all disjoint, we could just perform the remaining reductions, and we would find that both r1 and r2 reduce to the same term r. But if the redexes are nested, this is not quite so obvious. Nevertheless it is true, although a correct proof is quite complicated. The theorem is known as the Church-Rosser theorem:

Theorem. [Church-Rosser] If t beta reduces to r1 and t beta reduces to r2 then there exists a term r such that both r1 and r2 beta reduce to r.

Note that the beta reductions involved can be longer than one step.

Equality in the lambda calculus refers to the least equivalence relation generated by the beta-axiom, the alpha-axiom, and closed under substitution of equals for equals. One can give a system of rules of inference that permit the deduction of t=s exactly when t and s are equal in the sense defined here.

Fixed points and recursion

A fundamental theorem about lambda calculus permits the definition of functions by recursion. The proof is a deep mystery, in the sense that it is very short, but still mysterious. You can and should memorize it but you will always wonder if you have really understood it. The first level of understanding is to be able to reproduce the proof quickly on demand.

Theorem. [Fixed point theorem] Let H be any lambda term. Then there is a lamba term f such that f = H(f).

Proof. Let ω = lambda(x,Ap(H,Ap(x,x))). Define f = Ap(ω,ω). Then

Ap(H,ω) = Ap(lambda(x,Ap(H,Ap(x,x))),ω) = Ap(H,Ap(ω,ω)) = Ap(H,f).

QED.

Corollary. Let G be any lambda term. Then there is a lambda term f such that Ap(f,x) = G(f,x).

Proof. Let H(f) = lambda(x,G(f,x)) and apply the theorem to get f such that f = H(f). Then Ap(f,x) = Ap(H(f),x) = Ap(lambda(x,G(f,x)),x) = G(f,x).

This corollary is often used to define f by recursion.

Church numerals

Church showed that the natural numbers can be defined in lambda calculus. His idea is to define n to be the function, or perhaps a better word is functional, that takes a function f as argument and iterates f n times. More formally, let CN(n) be the n-th Church numeral:

CN(0) = lambda(f,lambda(x,x)).
CN(n+1) = lambda(f,lambda(x, Ap(f,Ap(Ap(CN(n),f),x)))).

To make this a bit clearer we temporarily write Ap(f,x) as fx. Then

CN(3) = lambda(f,lambda(x,fffx)).
CN(4) = lambda(f,lambda(x,ffffx)).
CN(5) = lambda(f,lambda(x,fffffx)).

Although these are known as Church numerals, it was Rosser who constructed lambda terms that define addition, multiplication, and exponentiation with respect to these "numerals". (See [1], bottom of page 144.)

The Church numerals are of some theoretical importance, because they can be used to define the concept of "lambda-definable function" on the natural numbers. F is called lambda-definable if there is a lambda term f such that for all natural numbers n,

f(CN(n)) = CN(F(n)).

It turns out that the lambda-definable functions are exactly the Turing computable total functions on the natural numbers, and also exactly the general recursive functions. These theorems provided evidence that this class of functions was indeed the correct formalization of the intuitive notion of "computable function".

In the theory of automated deduction, the Church numerals figure in certain proofs, for example in the proof of the undecidability of second-order and third-order unification, and hence of lambda-unification.