Is the S combinator universal?

Wolfram has issued a [challenge](https://writings.stephenwolfram.com/2021/06/1920-2020-and-a-20000-prize-announcing-the-s-combinator-challenge/?ref=josephnoelwalker.com) to prove that the S combinator from the [SKI combinator calculus](https://en.wikipedia.org/wiki/SKI_combinator_calculus) is Turing-complete. This is right up my alley, so I'm going to investigate and post whatever I learn on this page. The first thing to note is that it is decidable whether a particular S-expression halts, so if S-expressions are to be Turing complete, both halting and non-halting programs must be embedded within the non-halting S-expressions. This doesn't seem like a huge problem, but it's possible that using an external procedure to determine when the system has completed its computation could improperly imbue a non-Turing-complete system with Turing-completeness, so some caution is necessary. ## Basics The S combinator is defined as a transformation between two binary trees: ``` S x y z = x z (y z) ``` Combinators are left-associative, so we can leave out parentheses on the left side of the binary tree (my code parenthesizes things fully, but just be aware that this convention exists). One simple fact is evident: as a computation proceeds, the number of leaves on the binary tree cannot decrease. This implies that any computations that use intermediate data will probably generate a bunch of garbage that sticks around until the end. When evaluating an S-expression, it may be that the above rule matches in multiple places. In these cases, it's legal to apply the rule to any of the locations, and these may affect your results. My experiments are set up to replace the leftmost match. ## Numerals and a simple counter Wolfram mentions the following as a simple example of a non-terminating expression: ``` (((S (S (S S))) (S (S S))) (S S)) ``` Here's the first 5 steps in the evaluation of this expression. This is unambiguous because this expression happens to only have one possible reduction at each step. ``` (((S (S (S S))) (S (S S))) (S S)) (((S (S S)) (S S)) ((S (S S)) (S S))) (((S S) ((S (S S)) (S S))) ((S S) ((S (S S)) (S S)))) ((S ((S S) ((S (S S)) (S S)))) (((S (S S)) (S S)) ((S S) ((S (S S)) (S S))))) ((S ((S S) ((S (S S)) (S S)))) (((S S) ((S S) ((S (S S)) (S S)))) ((S S) ((S S) ((S (S S)) (S S)))))) ((S ((S S) ((S (S S)) (S S)))) ((S ((S S) ((S S) ((S (S S)) (S S))))) (((S S) ((S (S S)) (S S))) ((S S) ((S S) ((S (S S)) (S S))))))) ``` Let's define two new symbols to make the patterns more evident: ``` Z = S S M = S Z Z ``` Substituting into the above, and adding another 5 lines: ``` (((S (S Z)) (S Z)) Z) (M M) ((Z M) (Z M)) ((S (Z M)) (M (Z M))) ((S (Z M)) ((Z (Z M)) (Z (Z M)))) ((S (Z M)) ((S (Z (Z M))) ((Z M) (Z (Z M))))) ((S (Z M)) ((S (Z (Z M))) ((S (Z (Z M))) (M (Z (Z M)))))) ((S (Z M)) ((S (Z (Z M))) ((S (Z (Z M))) ((Z (Z (Z M))) (Z (Z (Z M))))))) ((S (Z M)) ((S (Z (Z M))) ((S (Z (Z M))) ((S (Z (Z (Z M)))) ((Z (Z M)) (Z (Z (Z M)))))))) ((S (Z M)) ((S (Z (Z M))) ((S (Z (Z M))) ((S (Z (Z (Z M)))) ((S (Z (Z (Z M)))) ((Z M) (Z (Z (Z M))))))))) ((S (Z M)) ((S (Z (Z M))) ((S (Z (Z M))) ((S (Z (Z (Z M)))) ((S (Z (Z (Z M)))) ((S (Z (Z (Z M)))) (M (Z (Z (Z M)))))))))) ``` Simpler, but still not obvious. Before I can explain how this works, I'll need you to understand what [Church numerals](https://en.wikipedia.org/wiki/Church_encoding) are. Basically, they're a representation of the natural numbers such that each number $n$ corresponds to some function composed $n$ times. In our case, the mapping looks like this: ``` 0 = M 1 = (Z M) 2 = (Z (Z M)) ``` and so on. So if I substitute the Church numerals with their natural number counterparts in the above expression, it looks like this: ``` (((S (S Z)) (S Z)) Z) (0 0) (1 1) ((S 1) (M 1)) ((S 1) (2 2)) ((S 1) ((S 2) (1 2))) ((S 1) ((S 2) ((S 2) (0 2)))) ((S 1) ((S 2) ((S 2) (3 3)))) ((S 1) ((S 2) ((S 2) ((S 3) (2 3))))) ((S 1) ((S 2) ((S 2) ((S 3) ((S 3) (1 3)))))) ((S 1) ((S 2) ((S 2) ((S 3) ((S 3) ((S 3) (0 3))))))) ``` You can ignore the first line; the pattern really starts from line two. What happens is that the term `(S n)` is inserted $n$ times, and then $n$ is incremented. So we get two copies of `(S 2)`, three of `(S 3)`, and so on. Unfortunately, this system won't help our search for Turing-completeness much, because I can easily write a formula that directly predicts any step of the computation (so nothing undecidable is going on; a prerequisite for Turing-completeness). Nonetheless, it's good for intuition to understand what's going on, so let me make the behavior more explicit: $0$ (also known as `M`) *increments* the numeral after it, and also duplicates it. For a number $n$: ``` M n = (n + 1) (n + 1) ``` So on the second and third lines we go from `(0 0)` aka `(M M)` to `(Z M) (Z M)` aka `(1 1)`. Then, observe that `Z x y = S y (x y)` by definition of `Z` and `S`. Since `Z` is part of the numeral itself, the result of the previous step `(n + 1) (n + 1)` reduces to `((S (n + 1)) ((n) (n + 1)))`. In this step, the term `(S (n + 1))` is inserted, and the left numeral in the final pair is decremented; this happens n + 1 times, until the left numeral is $0$ (`M`). Then, according to the M rule above, the numeral on the right (n + 1) is incremented and duplicated, restarting the process. (to be continued...)