Tarski Formalization Project: Strategies
We found proofs of all the approximately 200 theorems that we attempted. These proofs fell into the following categories:
- Some were found by OTTER without any special strategy, just using standard values for OTTER parameters like
max_weight
, max_distinct_variables
, etc. Most of these were found in a few minutes, although
a few took closer to an hour.
- Some theorems were found using the diagram strategy. That refers to defining one (or a few) terms corresponding to
the construction of certain points, usually indicated in a geometrical diagram in the book. Such proofs do not rely
on the steps of the book proof; they are found without further use of the book proof, once the diagram is given.
- Proofs found by the subformula strategy also do not rely on the book proof. Since we used this strategy mainly
on difficult theorems, we usually used the diagram too.
- When we could not find a proof without reference to the steps of the book proof, we put (at least some of) those
steps in as intermediate goals; that is, in
list(passive)
). We used lemma adjunction and hint injection,
and often case elimination, to get a proof "by hook or by crook". (All these terms are explained below and in our paper.)
When iteration of this process finally produced a proof, we kept the steps as hints, so that the proof can be quickly
reproduced.
Below we describe each of these strategies briefly; lengthier discussions can be found in our paper.
The diagram strategy
The diagram strategy works because
the defined point is denoted by a single letter instead of a term using, for example, 8 symbols, so the "weight" of such
clauses is smaller by 7, making them less likely to be discarded as exceeding max_weight
and more likely
to be used sooner rather than later once they are kept. This strategy was pioneered by Art Quaife in the 1990s.
Please see our paper for an illustrative example, namely, the derivation of the "crossbar theorem" using Pasch.
The subformula strategy
The subformula strategy is very simple: just include as hints
all subformulas of the formulas in the theorem one is trying to prove. By "include a clause as hints", we mean
to place that clause in list(hints)
or list(hints2)
. Either one results in the clause
being given a low weight, making it more likely to be kept, and more likely to be used sooner rather than later.
The same is also true of formulas that subsume a hint. There is a technical difference between hints
and that is explained in the OTTER manual, and also mentioned in our paper; researcher planning to use these
techniques should be aware of the difference, which will not be very important to readers not intending to conduct
their own experiments.
We had several spectacular successes with the subformula strategy, described on other pages of this website.
Most of the spectacular successes required more than a few minutes to find the proof. To achieve these successes,
we set max_weight
rather low, typically 8. Combined with the subformula strategy, this means
that we are searching for a proof whose steps consist mostly of subformulas of the theorem to be proved, involving just one
literal that is not a subformula of the theorem, or perhaps two short such literals. That this succeeds is
amazing; but remember that the diagram is usually included as part of the theorem statement in these problems,
so short formulas about the points constructed in the diagram are allowed. In other words, we are looking for a proof
that involves short statements about the points constructed in the diagram and the original points, without constructing
new points or mentioning other predicates not mentioned in the theorem.
Lemma adjunction and hint injection
After giving up on finding a proof without using the book proof, our next step was to put the (important)
steps of the book proof into list(passive)
as intermediate goals. For example, if the first step
were $T(a,b,c)$, we would place -T(a,b,c) | $ANS(1).
in list(passive).
. When
proofs were found of such steps, their steps would be put in as hints. We called this hint injection.
Then the process could be iterated. For example, we might get four steps immediately, and the fifth after five minutes.
Then, after hint injection, we would get all five immediately, and three more after five minutes. Eventually this
iteration led either to a proof of the theorem, or to a stalemate situation, in which no more intermediate goals would
be proved (say in an hour).
At that point we turned to lemma adjunction, by which we mean inserting one of the intermediate steps $Q$
that we could not prove as a new hypothesis. If we were lucky that would result in a proof (of the theorem from
the new hypothesis $Q$). If not we could continue with further intermediate goals and hint injection. Eventually,
in many cases, we would obtain a proof of the theorem from (one or more) additional hypotheses $Q$.
Then we would put in the steps of that proof as hints, and then replace $Q$ in the list of assumptions by $\neg Q$.
If that resulted in a proof we would also place the steps of that proof in as hints, and try for a proof without
either assumption $Q$ or $\neg Q$. If that failed we would be in a position to use the last strategy,
case elimination.
Case elimination
As explained above, we often ended up with a proof of some desired theorem $A$ from a hypothesis $Q$, and
another proof of $A$ from $\neg Q$, but even with both proofs inserted as hints, a proof of $A$ was not
forthcoming. In that case sometimes we could succeed by including the tautology Q | -Q
as a
(harmless) assumption; sometimes this worked after hours of runtime. If it did not work, then we resorted
to a more elaborate strategy, described in detail in our paper, that involves first seeking a proof of one case
that only uses binary resolution and paramodulation (not hyperresolution and unit resolution). For example,
Satz 10.2b resisted us for a long time, but eventually we were able to combine the cases. There are, however,
other cases (such as the last theorem, Satz 12.11, Hilbert's parallel axiom), where we could not combine the cases.
Of course, then one can just formulate the two cases as separate input files and derive the theorem in two steps
from the two cases, but we find this less beautiful than having the input files correspond directly to the book theorems.