Showing posts with label computability. Show all posts
Showing posts with label computability. Show all posts

Ordinals and well-founded recursion

 Two closely related questions about ordinals that I found quite confusing at first and couldn't find a satisfactory answer online (self-answering):

  • I've heard sentences like "$\omega^{CK}$ is the least ordinal that cannot be represented on a computer". What does this mean, precisely? Surely after exhausting notations for all computable ordinals, I could just represent $\omega^{CK}$ with something else? E.g. even in the Kleene's O notation, I could just denote $\omega^{CK}$ by $-1$ or something?
  • We say things like "the strength of a theory corresponds to the smallest recursive ordinal it cannot prove well-founded" -- but _we know_ every ordinal is well-founded, right? So are we better than every theory?



implementing the ordinals

The thing that really clarified this for me was to actually implement ordinals in Python. Implementations in Haskell are plentiful of course (e.g.) but Haskell doesn't feel as grounded, since you don't actually see the computations being done.

from typing import Union, Callable
from __future__ import annotations

class Ordinal:
  def __init__(self, children : None | Ordinal | Callable[[int], Ordinal]):
    self.children = children

That's it. An ordinal is defined by its "children" -- if that's None, it's 0; if that's a single ordinal, it's the successor of that ordinal; if that's an infinite sequence of ordinals, it's the limit of that sequence.

You can then define some obvious operations.

Zero = Ordinal(None)

def Succ(x : Ordinal) -> Ordinal:
  return Ordinal(x)

def Lim(xs : Callable[[int], 'Ordinal']) -> Ordinal:
  return Ordinal(xs)

def Add(x : Ordinal, y : Ordinal) -> Ordinal:
  if not y.children:
    return x
  elif type(y.children) == Ordinal:
    return Succ(Add(x, y.children))
  else:
    return Lim(lambda n : Add(x, y.children(n)))

def Mul(x : Ordinal, y : Ordinal) -> Ordinal:
  if not y.children:
    return Zero
  elif type(y.children) == Ordinal:
    return Add(Mul(x, y.children), x)
  else:
    return Lim(lambda n : Mul(x, y.children(n)))

def Pow(x : Ordinal, y : Ordinal) -> Ordinal:
  if not y.children:
    return Succ(Zero)
  elif type(y.children) == Ordinal:
    return Mul(Pow(x, y.children), x)
  else:
    return Lim(lambda n : Pow(x, y.children(n)))

And represent some small ordinals:

# ordinal corresponding to some finite n
def Fin(n : int) -> Ordinal:
  x = Zero
  for i in range(n):
    x = Succ(x)
  return x

Omega = Lim(Fin)

# ordinal corresponding to omega^n
def Foo(n : int) -> Ordinal:
  x = Succ(Zero)
  for i in range(n):
    x = Pow(Omega, x)
  return x

Epsilon0 = Lim(Foo)

explicitly seeing them as functions

What stops us from making an infinite sequence of all (computable) ordinals? That would be paradoxical, of course -- if you had an ordinal whose children/descendants included itself, its definition would not be a well-founded recursion, it would be circular. But it's easy to enumerate all programs, is it really so hard to enumerate specifically programs that represent objects of the Ordinal class?

Let's think of a way to reduce an ordinal into a simple-to-understand program. The natural way to do this is:

def call(x : Ordinal) -> Callable | None:
  if not x.children:
    return None
  elif type(x.children) == Ordinal:
    return lambda: call(x.children)
  else:
    return lambda n: call(x.children(n))

So e.g. call(Epsilon0) returns a multivariable function that you can call as follows:

$$\epsilon_0(2)=\omega^\omega$$

$$\omega^\omega(4) = \omega^4$$

$$\omega^4(5) = \omega^3\cdot 5$$ 

$$\omega^3\cdot 5(2) = \omega^3\cdot4+\omega^2\cdot2$$

$$\omega^3\cdot4+\omega^2\cdot2(4) = \omega^3\cdot4+\omega^2+\omega\cdot4$$

$$\omega^3\cdot4+\omega^2+\omega\cdot4(3)=\omega^3\cdot4+\omega^2+\omega\cdot3+3$$

$$\omega^3\cdot4+\omega^2+\omega\cdot3+3()()()=\omega^3\cdot4+\omega^2+\omega\cdot3$$

... and ultimately $\epsilon_0(2)(4)(5)(2)(4)(3)()()()(1)()(1)()(1)()(2)(2)()()(1)()(0)(0)(0)(1)(2)(3)()()()(2)()()=0$, which you can check by running

print(call(Epsilon0)(2)(4)(5)(2)(4)(3)()()()(1)()(1)()(1)()(2)(2)()()(1)()(0)(0)(0)(1)(2)(3)()()()(2)()())

You may think this is quite long, but I've actually chosen fairly small numbers at each step. The ridiculous thing is that despite these paths being absurdly long -- all of them do, in fact, terminate! This is a statement that should initially surprise you by the way -- you might think you can just provide larger and larger numbers at each step and get a non-terminating sequence, but nope.

$$\epsilon_0(2)(3)(4)(5)(6)()^6(7)()^7(8)()^8(9)()^9(10)(11)(12)()^{12}(13)()^{13}\dots(22)()^{22}(23)(24)^{24}(25)^{25}\dots(45)()^{45}(46)(47)(48)()^{48}(49)^{49}\dots(94)()^{94}=0$$

In general, the main point is that these sequences always terminate, i.e. the ordinals represent total functions. But given an arbitrary such function, it is not clear if it represents an ordinal. This is still a bit of a claim to unpack, at least to me, so it is helpful to think of ordinals from an entirely new perspective--


nested for loops

Remember the first time you actually bothered to implement recursion, because nested for loops wouldn't suffice? while loops would do it, of course, but often that's inelegant.

Generally speaking, if the only function we are allowed to define without recursion is the successor function (as it is the case with one nested `for` loop is sufficient to define addition $f_1(n)$, two to define multiplication $f_2(n)$, three to define exponentiation $f_3(n)$, and so on to define any hyperoperation.

And indeed this demonstrates that the hyperoperations partition (in terms of growth rate) all the primitive recursive functions -- but there are total computable functions that grow faster than any hyperoperation (i.e. than any primitive recursion), which you can define by diagonalization: i.e. $f_\omega(n)$, which uses $n$ for loops where $n$ is the value inputed (this is basically the "Ackermann function"), and you can go further and further. This is called the fast-growing hierarchy.

Quite generally such total computable functions can be written recursively, by Kleene's fixed-point theeorem (such a function is the "fixed point" of an operator like $\lambda (f, x) \mapsto xf(x-1)$ or whatever). What seems to be a good reference:

Fairtlough & Wainer (1992). Ordinal Complexity of Recursive Definitions.

TL;DR: Specifying an ordinal is equivalent to specifying a particular recursion. It is not immediately obvious that a given recursion is well-founded. ZFC can't actually prove "bla bla recursion equals bla bla big ordinal", even though it (non-constructively) "has" the latter ordinal.

Computable functions I: mu-recursion

When you were first introduced to functions, you were presented the idea of functions as processing machines, that took in output and processed it to give out output.

On a tangential note: even those who are educated in the "maps between sets" or "special subsets of the product set" often still think of functions in a way that reveals an implicit belief in "functions as a machine". E.g. noobs who are dismayed that there is no "closed-form expression" for the inverse of $y = x^x$ but are perfectly satisfied with the inverse of $y = x+7$, children and ancients who are uncomfortable with recurring decimals $0.\dot{3}$, $3.141\dots$. It is irrelevant to mathematics that you were taught an algorism [not sic] to compute "subtraction by 7" but not to compute "the Lambert W function", or that you are more familiar with representation in base 10 than in base 3 or base $\pi$.

This is the wrong way to think of a function -- a machine is not a function, it's an algorithm. There are two "ways" in which this distinction manifests:
  • A function may have multiple algorithms to compute it.
  • A function may have no algorithms to compute it.
The second might sound really unintuitive, but should really be completely expected.

In general, defining a function $\mathbb{N}\to\mathbb{N}$ means storing the value of every single $f(x)$ for each natural number $x$. Yes, some functions can do it without -- by just giving a rule that determine $f(n)$ for any given $n$ (i.e. an algorithm for $f$) -- but it is just obvious that this isn't possible for every function. 

Why?

Because ... because ... if we could, then we could store an infinite amount of data in finite memory!

(This suggests that there are functions we just can't write down or imagine -- exactly what this means will be formalized later, when we have the notion of models and languages.)



To formalize this more clearly, we need a precise notion of what exactly an algorithm is -- what exactly computation is.

Here's an example of a function we can actually write down: $f(x)=x^2$, which represents the function $(0, 1, 4, 9, \dots)$. Given any number $x$, we know what $f(x)$ is despite not having a table containing the values of $f(x)$ for every value of $x$.

(Take a moment to think about how amazing that is.)

How, though? Let's "unfold" the definition of $x^2$.

def square(x):
  return multiply(x, x)

def multiply(x, y):
  out = 0
  for i in range(y):
    out = add(out, x)
  return out

def add(x, y):
  out = x
  for i in range(y):
    out = successor(out)
  return out
Hm, interesting -- we have the notion of some basic functions (co-ordinate functions, constant functions, successor function) and some operators that work on them (successive lines of code, for loops, while loops) and free system generated by this is the set of all computable functions.

What about if conditionals, i.e. piecewise functions? Show that these can be written in terms of for loops. Hint: start with a basic function like $f(0)=0,f(s(n))=1$ and see how you can transform other conditionals into something similar?

More formally, we have the following primitive functions which are computable:

  1. Constant functions -- $c(\mathbf{x}):=c$
  2. Projection functions -- $p_i(\mathbf{x}):=x_i$
  3. Successor function -- $s(x)=x+1$

And the following operators:

  1. Composition, i.e. to successive lines of code.
    Given computable functions
    $$f_1:\mathbb{N}^n\to\mathbb{N}^m$$$$f_2:\mathbb{N}^m\to\mathbb{N}$$
    We have a computable function
    $$f:\mathbb{N}^n\to\mathbb{N}$$ Defined as: $$f:=f_2\circ f_1$$
  2. Bounded recursion, i.e. for loops.
    Given computable functions
    $$\text{“base”}\ \ f_0:\mathbb{N}^n\to\mathbb{N}$$$$\mathrm{“recursor”}\ \ h:\mathbb{N}^n\times\mathbb{N}\times\mathbb{N}\to\mathbb{N}$$
    ($h(\mathbf{x},y,z)$ takes as input iterate $y$ and output from previous step $z$)
    We have a computable function
    $$[h:f_0]:\mathbb{N}^n\times\mathbb{N}\to\mathbb{N}$$
    Defined by recursion:
    $$[h:f_0](\mathbf{x}, 0)=f_0(\mathbf{x})$$
    $$[h:f_0](\mathbf{x}, s(y))=h(\mathbf{x}, y, f(\mathbf{x}, y))$$
  3. Unbounded recursion, i.e. while loops.
    Given a computable function
    $$c:\mathbb{N}^n\to\mathbb{N}\to\mathbb{N}$$
    We have a computable function
    $$\mu(c):\mathbb{N}^n\to\mathbb{N}$$
    Defined as the lowest $y$ for which $\mu(\mathbf{x},y)=0$.

This is known as the structure of general recursive functions, or $\mu$-recursive functions, and is one of the formulations of the notion of a computable function.

We immediately see the existence of non-computable functions, since we see that computable functions are countable -- however, it doesn't really capture the idea of non-computable functions requiring an infinite amount of data to describe. 

An abstracted way to think about computer science -- and of machines and engineering in general -- is as a theory of automation. The mathematical abstraction of the machine is the algorithm, which is intuitively thought of as a set of instructions or computational paradigm. Pondering some examples of real-world machines help us pin down another characteristic of importance to an algorithm: the machines are often not completely autonomous, they need to take inputs, so algorithms can be considered procedures to execute functions.

Historically, the theory of machines was often formulated in terms of simple machines. The Ancient Greeks formulated the theory in terms of the "six simple machines" (the inclined plane, lever, wedge, wheel and axle, pulley, and screw); the Ancient Indians formulated the theory in terms of mechanisms that operate on "pressure, weight and rotation". Note that these were mechanistic descriptions (and not very fundamental ones, due to the lack of a serious theory of mechanics).

In general, these areas of machine theory fall into the general subject of control theory.

If I had to distinguish computer science from engineering/machine theory -- I would say that computer science is interested in the generality of machines. Computers are different from jackhammers, because the architecture of a computer allows the simulation of "any possible algorithm" that the jackhammer -- or any other machine -- could follow.

The computer is therefore fundamentally about control -- any kind of actions you may want to encode in a machine, a computer is literally the best thing you can ask for to do so, because it is the device that can produce any physically-possible-to-encode sequence of actions in the machine.



The Church-Turing thesis should therefore be seen as the fundamental thesis of computer science -- the precise specification of what machines/algorithms can do. It leads to the notion of Turing completeness, i.e. a particular machine actually acting as a universal computer (and a particular such instance is the universal Turing machine).

The Church-Turing thesis is the statement that the Turing machine -- or equivalently, mu-recursion, or lambda calculus -- can compute anything that can be computed at all. That it's the "best we can do", the ultimate limit (for which reason the thesis is sometimes known as the maximality thesis).

(Take a moment to appreciate how this means we've actually achieved optimization here. The computers we use are literally universal -- given enough memory and time. They are truly the pinnacle of human invention, to date.)

Note that this is a physical limitation, one that depends very specifically on the universe we live in. For example, a universe with time travel (the ability to send signals back in time) will be able to trivially solve the halting problem. The Church-Turing thesis is not a mathematical fact; it is a physical one. It can be considered an axiom of computer science, and if one day we do find a system that computes a larger set of computation -- called a hypercomputer -- then we'd find that some significant portion of computer science, such as the very notion of a Turing machine, is not relevant to our universe.

A particular result that will be helpful in convincing us of the thesis is that the mu, lambda and Turing notions of computability are indeed equivalent. This is important, because at least to me, lambda calculus and mu recursion are far more intuitive than the Turing machine; I can really see that mu recursion is what I'm doing when I'm executing an algorithm, and I can really see that lambda calculus is the "natural" sense in which one would try to describe functions if they can be described at all.

We will see this result in the next article.