Based on a render by https://www.flickr.com/photos/51339555@N02/.

Gödel, Escher, Bach - A Practical Guide & Summary

I hope you can find some of my notes on the book useful. I certainly don't pretend that this is a complete summary, nor that I understood everything.

I have gathered a list of useful external resources:

Introduction. A Musico-Logical Offering
Three-Part Invention
I. The MU-puzzle
Two-Part Invention
II. Meaning and Form in Mathematics
Sonata for unaccompanied Achilles
III. Figure and Ground
Contracrostipunctus
IV. Consistency, Completeness and Geometry
Little Harmonic Labyrinth
V. Recursive Structures and Processes
Canon by Intervallic Augmentation
VI. The Location of Meaning
Chromatic Fantasy, and Feud
VII. The Propositional Calculus
Crab Canon
VIII. Typographical Number Theory
A Mu Offering
IX. Mumon and Gödel
Prelude…
X. Levels of Description, and Computer Systems
… Ant Fugue
XI. Brains and Thoughts
English French German Suite
XII. Minds and Thoughts
Aria with Diverse Variations
XIII. BlooP and FlooP and GlooP
Air on G’s String
XIV. On Formally Undecidable Propositions of TNT and Related Systems
- Incompleteness of TNT
- Consistency of G
- TNT is $\omega$-incomplete
- Are Supernatural Numbers Real?
- Diophantine Equations
Birthday Cantatatata …
XV. Jumping out of the System
Edifying Thoughts of a Tobacco Smoker
XVI. Self-Ref and Self-Rep
- Typogenetics
- (Real) Genetics
- Isomorphisms
The Magnificrab, Indeed
XVII. Church, Turing, Tarski and Others
- Church’s and Tarski’s Theorem
SHRDLU, Toy of Man’s Designing
XVIII. Artificial Intelligence: Retrospects
- Problem-Space Reduction
Contrafactus
XIX. Artificial Intelligence: Prospects
Sloth Canon
XX. Strange Loops Or Tangled Hierarchies
- Intuition for Consciousness
- Free Will in a Deterministic World
Six-Part Ricercar
Interesting Books from the Bibliography

Introduction. A Musico-Logical Offering

There are “strange loops” in music, art and mathematics. They are related to paradoxes, and resolving paradoxes is the ultimate goal of mathematicians. Gödel proved the futility of this exercise with his incompleteness theorem.

Strange loops:

Music: Bach’s endlessly rising canon “Canon per tonos”, which starts in C minor and modulates to D minor, which then ties perfectly into the beginning again. If you keep repeating it, after six modulations you end up back in C minor, but one octave higher.
Escher: Illusions like the Waterfall, which is a closed loop, or Print Gallery, which depicts itself. There is no way to know what is reality: there is always a level below and a greater reality above, and together they form a loop.
Math: The Epimenides paradox (“All Cretans are liars”, said by a Cretan), or its reduced version “This sentence is false”, are also loops: each is a meta-statement about itself. In set theory, Russell’s paradox asks whether the set of all sets that do not contain themselves contains itself.
- Gödel’s incompleteness theorem: All consistent axiomatic formulations of number theory include undecidable propositions.

A fugue is like a canon (based on a theme) with a freer form of repetition, which allows countersubjects and more variation. First the theme is announced, then a second, third, etc. voice enters. Hofstadter compares improvising a six-voice fugue to playing 60 simultaneous blindfolded chess games and winning. (I find the comparison somewhat lacking, since such chess feats are limited mostly by memory. I don’t think a six-voice fugue is harder to compose than a three-voice one simply because you need to remember more.)

The goal of Russell and Whitehead’s Principia Mathematica was to formulate number theory in a way which would not permit paradoxes. Gödel proved that this project could never fully succeed. It feels like a system powerful enough to make statements about itself must allow paradoxes, because of self-reference.

In computer science, the example of a strange loop given is that of Charles Babbage’s Analytical Engine, a machine “capable of eating its own tail”: of modifying its program input (the punch cards) or of creating a new program, which it then feeds to itself. (A quine would be an example of a program interacting with its own output.)

Discussion notes on music:

Each octave has 12 half-tones. Seven of them are mapped onto the white keys of the piano (the 8th white key is the next C), the other five onto the black keys. E and F (like B and C) are only a half-tone apart, so that C major uses exactly the white keys; this reflects the historical tendency to write music in C major.
A key is a list of relative distances from one note to the next, with a specific note as a starting point. Each key contains 7 notes (8 counting the octave). The distances between the notes are the same for all major keys, and the same for all minor keys; only the starting point varies. There are also other scales with only 5 notes (pentatonic).
Notes harmonise in thirds (minor third = 3 half-tones, major third = 4) and fifths (7 half-tones, spanning five notes of the scale). These work across keys.
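To make the “list of relative distances” concrete, here is a minimal Python sketch. It assumes twelve-tone equal temperament and names every black key with a sharp (real key signatures also use flats); the names `NOTES`, `MAJOR`, `NATURAL_MINOR` and `scale` are my own. The major pattern is 2-2-1-2-2-2-1 half-tones, the natural minor pattern 2-1-2-2-1-2-2, and only the starting note changes.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Step patterns in half-tones (whole tone = 2, half-tone = 1); both sum to 12, one octave.
MAJOR = [2, 2, 1, 2, 2, 2, 1]
NATURAL_MINOR = [2, 1, 2, 2, 1, 2, 2]


def scale(root: str, steps: list[int]) -> list[str]:
    """Walk the step pattern from the root; the last note is the root an octave higher."""
    index = NOTES.index(root)
    result = [root]
    for step in steps:
        index = (index + step) % 12  # wrap around the 12 half-tones of the octave
        result.append(NOTES[index])
    return result


print(scale("C", MAJOR))          # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
print(scale("G", MAJOR))          # ['G', 'A', 'B', 'C', 'D', 'E', 'F#', 'G']
print(scale("A", NATURAL_MINOR))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'A']
```

C major and A minor come out with the same seven notes: they share the white keys and only start in different places.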

Three-Part Invention

Dialogue between Achilles, the Tortoise and Zeno. Zeno himself shows up in the dialogue, telling the story of how a passer-by convinced Achilles and the Tortoise to race. The dialogue is itself a strange loop. Zeno’s paradox of motion purports to show that “motion is impossible”. Ironically, Zeno complains that nobody wants to listen to his argument because everyone is hurrying about (i.e. moving)!

I. The MU-puzzle

Formal system: composed of axioms (assumed true) and rules; theorems are formed by applying the rules of the system to the axioms. A derivation from an axiom to a theorem that strictly follows the rules is a proof.

A decision procedure is a test that always terminates after a finite number of steps and decides whether a given string is a theorem. If one exists, it characterises the set of theorems explicitly.

A requirement of formal systems is that the set of axioms has a decision procedure. Theorems, however, may only be characterised implicitly by the rules: you may not know whether a string is a theorem until you find a derivation, and in a system with both lengthening and shortening rules the search could go on forever.

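The MIU system described here does not have MU as a theorem. The number of I’s starts at 1; rule II doubles it, rule III removes three, and rules I and IV don’t change it, so it is never divisible by 3. But rule III only turns exactly three I’s into a U, so getting rid of all I’s (which MU requires) would need a multiple of 3.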
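A rough way to explore this is to enumerate theorems mechanically. The sketch below (helper names are my own) applies the four MIU rules breadth-first from the axiom MI, but only keeps strings up to a fixed length, so it is a bounded search and not a decision procedure: a derivation that passes through longer strings would be missed. The invariant on the number of I’s is what actually rules MU out.

```python
from collections import deque


def successors(s: str) -> set[str]:
    """Apply each of the four MIU rules wherever it fits and collect the results."""
    out = set()
    if s.endswith("I"):                  # Rule I:   xI  -> xIU
        out.add(s + "U")
    if s.startswith("M"):                # Rule II:  Mx  -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):          # Rule III: any III -> U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):          # Rule IV:  any UU is dropped
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def theorems(max_len: int) -> set[str]:
    """Breadth-first search from MI, discarding strings longer than max_len."""
    seen = {"MI"}
    queue = deque(["MI"])
    while queue:
        for t in successors(queue.popleft()):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


found = theorems(10)
print("MU" in found)                                # False
print(all(t.count("I") % 3 != 0 for t in found))  # True
```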

A human might sense this impossibility after some tinkering, then step out of the system and reason about it on a meta-level. A computer blindly applying the rules inside the system would never do so.

A fundamental question in computing and AI seems to be whether machines could ever shift levels as fluidly as human intelligence can. (The current LLM craze splits people between “reasoning believers” and sceptics who argue that LLMs are not really thinking. Here’s o4-mini solving the MU problem by stepping out of the system and reasoning on a meta-level.)

Two-Part Invention

Dialogue written by Lewis Carroll (“What the Tortoise Said to Achilles”). The Tortoise accepts $A$, $B$ and $C$ (“if $A$ and $B$ are true, $Z$ must be true”) but not $Z$. This shows how reasoning about the system, when taken inside the system as just another premise, leads to an infinite regress. A rule of inference cannot itself be added as one more statement inside the narrow system the Tortoise is reasoning in; it has to simply be followed.

The notebook slowly filling up with ever more premises is again an infinitely rising canon.

II. Meaning and Form in Mathematics

This chapter introduces a new formal system, the “pq-system”. Its rules only lengthen strings, so there is a decision procedure for its theorems.

There is a top-down approach, which starts with the string, looks at all its possible forebears and recurses. Or you use a bottom-up approach, which starts with the axioms and applies all rules until every newly produced theorem is longer than the string you’re checking for. If the string hasn’t appeared by then, it is not a theorem.
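A minimal sketch of the bottom-up procedure, assuming the pq-system’s axiom schema xp-qx- (x being one or more hyphens) and its single rule (if xpyqz is a theorem, so is xpy-qz-). Because both only lengthen strings, generating everything up to the target’s length is enough. The function names are hypothetical.

```python
def pq_theorems_up_to(n: int) -> set[str]:
    """Bottom-up: every pq theorem of length <= n."""
    # Axiom schema xp-qx-: an axiom with k hyphens in x has length 2k + 4.
    frontier = {"-" * k + "p-q" + "-" * k + "-" for k in range(1, n) if 2 * k + 4 <= n}
    found = set(frontier)
    while frontier:
        new = set()
        for s in frontier:
            xpy, z = s.split("q")
            t = xpy + "-q" + z + "-"  # rule: xpyqz -> xpy-qz- (adds 2 characters)
            if len(t) <= n and t not in found:
                new.add(t)
        found |= new
        frontier = new
    return found


def is_pq_theorem(s: str) -> bool:
    """Decision procedure: nothing longer than s can ever shrink back to s."""
    return s in pq_theorems_up_to(len(s))


print(is_pq_theorem("--p---q-----"))  # True
print(is_pq_theorem("--p--q-----"))   # False
```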

There is also the notion of a well-formed string: a string which follows the typographical rules of the system (only allowed characters, in an allowed arrangement), regardless of whether it is a theorem.

The system turns out to be isomorphic to addition, with p mapped to “plus”, q to “equals” and a run of n hyphens to the number n. This is a low-level isomorphism between the individual symbols of the system and mathematical symbols. Then there is a high-level isomorphism, because theorems correspond to true statements of addition: “--p---q-----” is “2+3=5”.
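The two levels of isomorphism can be sketched in a few lines of Python. This is a hedged illustration: the regular expression encodes my reading of well-formedness (hyphens, p, hyphens, q, hyphens, each group non-empty), and `interpret`/`is_true` are made-up names. The low-level mapping turns symbols into numbers and signs; the high-level one says a well-formed string is a theorem exactly when the sum it denotes is true.

```python
import re
from typing import Optional

# Well-formed pq strings: hyphens, p, hyphens, q, hyphens (each group non-empty).
WELL_FORMED = re.compile(r"(-+)p(-+)q(-+)")


def interpret(s: str) -> Optional[str]:
    """Low-level isomorphism: p -> '+', q -> '=', a run of n hyphens -> n."""
    match = WELL_FORMED.fullmatch(s)
    if match is None:
        return None  # not well-formed, so it has no interpretation
    x, y, z = (len(group) for group in match.groups())
    return f"{x}+{y}={z}"


def is_true(s: str) -> bool:
    """High-level isomorphism: theorems are exactly the well-formed strings denoting true sums."""
    match = WELL_FORMED.fullmatch(s)
    return match is not None and len(match[1]) + len(match[2]) == len(match[3])


print(interpret("--p---q-----"))  # 2+3=5
print(interpret("-p-p-q---"))     # None
```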

Not all isomorphisms are meaningful. In addition, the meanings must always remain passive: the addition interpretation might suggest that “-p-p-q---” (1+1+1=3) should be a theorem, but it is not even well-formed in the pq-system. There can also be multiple isomorphisms that apply to the same formal system.

We believe that our methods of calculation (for example long multiplication: multiplying digit by digit, then adding) are correct. But such a method is just a formal system with an isomorphism to multiplication, and if we didn’t reason outside the system, it would be impossible to prove that the correspondence holds 100% of the time. Inside the system we could never try out all possible inputs, yet by reasoning about it we believe the method is correct without checking every possibility.
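Here is what such a “formal system” for multiplication looks like as code: a sketch of schoolbook long multiplication on digit strings (the function name is mine). Comparing it with Python’s built-in `*` on many inputs is exactly the kind of checking that builds confidence but never proves the method correct for all inputs; that still takes reasoning from outside the system.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication of two non-negative decimal digit strings."""
    result = [0] * (len(a) + len(b))  # result digits, least significant first
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            # multiply one digit pair, add what is already in that column plus the carry
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry  # final carry of this partial product
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


print(long_multiply("123", "456"))  # 56088
```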

A proof is just a series of steps linked together, each leading inescapably to the next; yet proofs rely on rules of generalisation.

Sonata for unaccompanied Achilles

Achilles is on a phone call with the Tortoise. The Tortoise is looking for the answer to a word puzzle: a word containing ADAC. Achilles asks his own puzzle: a word beginning and ending with HE. HEADACHE is an answer. It is also the answer to the Tortoise’s puzzle, but Achilles doesn’t realise it.

Achilles’ voice is the figure, the Tortoise’s parts are the ground to be inferred. Together, ground (ADAC) and figure (HE) make up the whole picture.

III. Figure and Ground

The “tq” system is introduced, isomorphic to multiplication instead of addition this time. Then the “compositeness” of a number is codified in a typographical formal system, with C---- meaning that 4 is composite. The typographical rule is: if x-ty-qz is a theorem, then Cz is a theorem (x, y and z being strings of hyphens, so x- and y- each stand for a number of at least 2).

Now we want a system whose theorems express all primes. We could say that all numbers that are not composite (meaning that Cx is not a theorem) are primes, Px. But that would be illegal, because we would be using the “negative space” of the C-system instead of typographical rules. To encode “primeness” correctly as a formal system, we have to define a DND (“does not divide”) system and then use DF (“divisor-free up to”) as an iterator to keep state.

Axiom: xyDNDx, and rule: if xDNDy then xDNDxy. The axiom holds because $x + y > x > 0$, so $x + y$ cannot divide $x$. The rule holds because $x + y \equiv y \pmod{x}$: if $x$ does not divide $y$, it does not divide $x + y$ either.
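A sketch of how the DND/DF rules produce primes without ever consulting the negative space of the C-system. To keep it short, each hyphen string is represented by its length (so “--DND-----” becomes the pair (2, 5)). The rule set is my reading of the book’s system (axiom xyDNDx; xDNDy gives xDNDxy; --DNDz gives zDF--; zDFx and x-DNDz give zDFx-; z-DFz gives Pz-; axiom P--), and `derive_primes` with its `limit` parameter is hypothetical.

```python
def derive_primes(limit: int) -> list[int]:
    """Derive every theorem Pn with n <= limit using only positive production rules."""
    # Axiom schema xyDNDx: (x + y) DND x for non-empty x, y, i.e. a DND b whenever 0 < b < a.
    dnd = {(a, b) for a in range(2, limit + 1) for b in range(1, a)}
    # Rule: if xDNDy then xDNDxy, i.e. (a, b) -> (a, a + b). Repeat until nothing new appears.
    frontier = set(dnd)
    while frontier:
        frontier = {(a, a + b) for a, b in frontier if a + b <= limit} - dnd
        dnd |= frontier

    # Rule: if --DNDz then zDF--  ("z is divisor-free up to 2").
    df = {(z, 2) for z in range(3, limit + 1) if (2, z) in dnd}
    # Rule: if zDFx and x-DNDz then zDFx-.
    frontier = set(df)
    while frontier:
        frontier = {(z, x + 1) for z, x in frontier if (x + 1, z) in dnd} - df
        df |= frontier

    # Axiom P--; rule: if z-DFz then Pz-, i.e. a pair (n, n - 1) in DF makes n prime.
    primes = {2} | {n for n, x in df if x == n - 1}
    return sorted(primes)


print(derive_primes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```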

Art: the figure is the foreground, the ground is what is left, the negative space. Sometimes the ground itself carries a message/meaning (another figure); then the figure is recursive. Most cursively drawable figures are not recursive (the ground doesn’t produce meaning).

In TNT, the hope might be that the false statements can be characterised through the theorems: as the negative space of all theorems, or as altered copies of the theorems (by negation), since $\sim x$ is false if $x$ is true. However, the set of all non-theorems contains some truths, and outside the set of all negated theorems are found some falsehoods. Characterisation by negative space is therefore incomplete: since not all true statements can be proven in TNT, the false statements cannot all be characterised as the negations of theorems (Gödel’s theorem).

In the realm of formal systems:

there exist formal systems whose negative space (set of non-theorems) is not the positive space (set of theorems) of any other formal system (<=> in some formal systems, there isn’t a decision procedure for non-theoremhood)
Stated in other words, there exist recursively enumerable sets which are not recursive.
Or another way: There exist formal systems for which there is no typographical decision procedure.

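Example: if F is a subset of the natural numbers and G its complement, so that F and G together make up all of N, there might be a rule that produces all members of F but none that produces all members of G. (For the composites and the primes, rules for both do exist, as the DND/DF construction above shows, but not every set is like that.) The issue stems from the non-monotonicity underlying some formal systems, whose strings can both shrink and grow, like the MIU system. (MIU itself does turn out to have a decision procedure: a string is a theorem exactly when it is M followed by I’s and U’s, with a number of I’s not divisible by 3.)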

Not all figures in art are recursive, and not all sets in math are recursive, where recursive means that the negative space is also a figure (definable by some procedure). All numbers in F share some common form/shape, but those in G might not. There is no obvious “regularity” in the primes…

What did the Tortoise mean by its Figure/Ground hint for the ADAC puzzle?
- See this post. They ask each other for the complementary parts of their answers in the puzzle.
What is the resemblance between the sequence and FIGURE-FIGURE?
- Sequence A (1, 3, 7, 12, 18, 26, …) and sequence B, made of the gaps between consecutive elements of A (2, 4, 5, 6, 8, …), together contain every positive integer exactly once.
There is also a notion of figure and ground in music, how does it work?
- Two melodies can play at the same time and be harmonised, just offset by a short interval.
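The FIGURE-FIGURE relationship is easy to check numerically. A small sketch (the function name is mine), assuming the standard construction: the figure sequence starts at 1, each next term adds the current gap, and each gap is the smallest positive integer not yet used by either sequence.

```python
def figure_figure(n: int) -> tuple[list[int], list[int]]:
    """First n terms of the figure sequence and of its gap (ground) sequence."""
    figure = [1]
    ground = [2]
    used = {1, 2}
    while len(figure) < n:
        figure.append(figure[-1] + ground[-1])  # next figure term = previous term + gap
        used.add(figure[-1])
        nxt = ground[-1] + 1                     # smallest number not used so far
        while nxt in used:
            nxt += 1
        ground.append(nxt)
        used.add(nxt)
    return figure, ground


figure, ground = figure_figure(10)
print(figure)  # [1, 3, 7, 12, 18, 26, 35, 45, 56, 69]
print(ground)  # [2, 4, 5, 6, 8, 9, 10, 11, 13, 14]
```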

Contracrostipunctus

The Crab is looking for the perfect record player, one that can play every sound perfectly. The Tortoise keeps giving him records titled “I Cannot Be Played on Record Player X”, which destroy the player through resonant vibrations. This is repeated a few times until the Crab buys a record player (Record Player Omega) which inspects each record and modifies itself before playing it.

A low-fidelity record player would also have defeated the Tortoise’s plans, simply by failing to reproduce the destructive sounds. That is the dilemma: a perfect player can’t exist, because if it can reproduce every sound, some record will destroy it.

An acrostic is a message hidden in the initial letters of a poem/piece/etc. A contracrostic would be a secret message hidden in the reverse of such a message. (The initial letters of the dialogue’s lines spell “HOFSTADTERS CONTRACROSTIPUNCTUS ACROSTICALLY BACKWARDS SPELLS ‘J.S. BACH’”, which is an acrostic. “JSBACH” backwards is “HCABSJ”, which are the initials of the words in that message, also making it a contracrostic.)

Achilles gifts the Tortoise a perfect goblet G, which has “BACH” etched at the bottom. When the Tortoise then plays a Contrapunctus from The Art of the Fugue, the goblet is destroyed.

“Tödelization” is mentioned in the index for this chapter, which refers to “Gödelization” by the Tortoise.

Why is a Record Player Omega impossible, and how could you defeat it? If it could exist, would it prove Gödel’s theorem false?

IV. Consistency, Completeness and Geometry

We generally see the meaning without seeing the “isomorphism”. In human language we attribute the meaning to the sound of the word itself, instead of correctly recognising that the meaning arises from the isomorphism between it and the concept it describes.

There is an “isomorphism” between the grooves in the record and the sound waves coming out of the record player. Then there is the inverse mechanism, where the record player itself shakes due to the sound waves it produced: the implicit meaning of the record turned around and destroyed the player. (In the case of the goblet, the notes etched onto it, “BACH”, have an isomorphism with those played by the Tortoise.)

Hofstadter states explicitly: Gödel’s theorem means that for any record player (number theory) there are records which it cannot play because they will cause indirect destruction.

There is a low-level isomorphism (mapping of phonograph terms to number theory) between the record player and Gödel’s theorem.

What makes geometry specifically Euclidean is the fifth axiom, which states a truth about parallel lines. For centuries mathematicians tried to derive it from the first four axioms, but no one succeeded. By instead changing this fifth axiom (negating it, etc.), you arrive at the hyperbolic and elliptic geometries, which are consistent and valuable in their own right.

By reinterpreting terms which have a common-sense meaning, like POINT or LINE, and which Euclid took to have fixed interpretations, we arrive at other consistent interpretations. This shows how important strict formalism (M-mode thinking, from the MIU chapter) is.

Completeness: all statements which are true under the interpretation are also theorems.
- Gödel says: truth transcends theoremhood in any sufficiently powerful formal system. There are some truths which can never be proven (or records which cannot be played).
Consistency: externally consistent (system + interpretation) “when every theorem, upon interpretation, comes out true (in some imaginable world).” It’s a property of the interpretation that was chosen!
- Internally consistent: all theorems come out mutually compatible (no logical contradictions), but this still depends on the interpretation.
  - Could there be some imaginable world in which a logical contradiction is nonetheless consistent? If consistency is to mean anything, there has to be a core which is true in every world! Is it only logic, or also math, or nothing at all? Mathematicians suggest that Peano arithmetic might be this core theory.
  - But since Einstein showed that the geometry of space is not fixed, we know that no geometry is intrinsic to the world, so why should a logic be?

A consistent but incomplete formal system is not powerful enough: its theorems map injectively, but not surjectively, onto the truths of the interpretation. You could remedy this by adding new rules (more theorems) or by tightening up the interpretation (fewer truths to capture, a less powerful system).

Suppose playing a record on a record player introduces vibrations that distort the sound, yet when we test the two independently they both come out fine: then it is their interaction (the isomorphism) that breaks them.

Little Harmonic Labyrinth

The title of the dialogue comes from a Bach piece which somewhat mirrors the nested structure of the story with melodic arrangements.

Achilles and the Tortoise are abducted in the frame story. They then descend one level by reading a recursive story about themselves in a book (push). The Tortoise also mentions push-potion and pop-tonic, which can be used to push into an Escher painting and pop back out. Of course the effects can be nested, by pushing into yet another painting inside the painting.

Inside one of the levels, they encounter a genie who cannot grant meta-wishes, but by asking GOD (“GOD Over Djinn”) he manages to allow a single typeless wish. He does this by asking the meta-genie, who asks the meta-meta-genie, and so on up to the top. Achilles breaks the whole setup by wishing that his wish not be granted.

The story ends without coming back to the frame story, just like the key in Bach’s Little Harmonic Labyrinth.

What is the Schönberg factory, which stops its tonic production and makes cereal instead, a reference to?
- Schönberg is a famous composer of serial (twelve-tone) music: he stopped making tonal (“tonic”) music, and “cereal” is a pun on “serial”.

V. Recursive Structures and Processes

A general notion of recursion (with a push/pop stack, named after stacks of cafeteria trays) is introduced. A recursive definition of the Fibonacci numbers is given, then also in tree form.
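A small sketch of both views, using the book’s convention FIBO(1) = FIBO(2) = 1 (function names are mine). The second version replaces the implicit call stack with an explicit push/pop list: every pushed subproblem is a node of the recursion tree, and each leaf contributes 1.

```python
def fib(n: int) -> int:
    """Recursive definition: FIBO(n) = FIBO(n-1) + FIBO(n-2), FIBO(1) = FIBO(2) = 1."""
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)


def fib_with_stack(n: int) -> int:
    """Same tree walk, but with an explicit stack of pending subproblems."""
    stack = [n]   # push the original problem
    total = 0
    while stack:
        k = stack.pop()          # pop the most recently pushed subproblem
        if k <= 2:
            total += 1           # a leaf of the recursion tree
        else:
            stack.append(k - 1)  # push both branches of the tree
            stack.append(k - 2)
    return total


print([fib(n) for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```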

Hofstadter talks about his solid-state physics thesis and the recursive nature of electron energy levels in crystals. At the end of the chapter the minimax algorithm is introduced. The comment about humans being better than computers at evaluating positions is outdated today: chess engines have long surpassed the strongest human players.
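The idea behind minimax fits in a few lines. This is a generic sketch, not the book’s presentation: the game tree is a made-up nested list whose leaves are position scores from the first player’s point of view. The maximising player picks the best child, assuming the opponent always answers with the child that is worst for the first player.

```python
from typing import Union

# A game tree is either a leaf score (int) or a list of child trees.
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool) -> int:
    """Value of a position, assuming both players play perfectly from here on."""
    if isinstance(node, int):
        return node  # leaf: a static evaluation of the position
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


# Hypothetical two-ply tree: three moves for us, two replies for the opponent each.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, True))  # 3
```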

Recursion in music is also mentioned: switching themes and keys (push), then returning to the original tonic (pop).

Bare particles (considered without any interactions) combined with their recursive interactions through virtual particles form the complex interactions that make up quantum physics. In practice we only approximate what happens at the subatomic level, by reasoning about the most important interactions. An electron may emit and reabsorb multiple photons, even stacking these effects while travelling. A photon may turn into an electron-positron pair for a split second before the two annihilate each other. These effects can accumulate, as virtual particles themselves can also interact. Matter itself seems to be somewhat recursive to define.

Hofstadter introduces a series of recursively defined functions labelled $G$, $H$, $F$ & $M$, and $Q$. $G$ and $H$ define trees: each node $n$ hangs from the parent node given by the function. For $G$, defined by $G(0) = 0$ and $G(n) = n - G(G(n-1))$, we get $G(1) = 1$, so node 1 is the root (its own parent); node 2 hangs from node 1 since $G(2) = 1$, node 3 from node 2, nodes 4 and 5 from node 3, and so on. By introducing recursive calls into the function definitions, the resulting trees have nice repeating properties.
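A short sketch that computes $G$ and lists each node’s children, assuming Hofstadter’s definition $G(0) = 0$, $G(n) = n - G(G(n-1))$ (function names are mine). Nodes 0 and 1 are their own parents and are left out of the child lists; the output shows the alternation between nodes with one child and nodes with two.

```python
from collections import defaultdict


def g_sequence(n_max: int) -> list[int]:
    """G(0) = 0 and G(n) = n - G(G(n - 1)), computed bottom-up."""
    g = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        g[n] = n - g[g[n - 1]]  # g[n - 1] < n, so the inner value is already known
    return g


def g_tree(n_max: int) -> dict[int, list[int]]:
    """Children of each node, where G(n) is the parent of node n (for n >= 2)."""
    g = g_sequence(n_max)
    children = defaultdict(list)
    for n in range(2, n_max + 1):
        children[g[n]].append(n)
    return dict(children)


print(g_sequence(10))  # [0, 1, 1, 2, 3, 3, 4, 4, 5, 6, 6]
print(g_tree(10))      # {1: [2], 2: [3], 3: [4, 5], 4: [6, 7], 5: [8], 6: [9, 10]}
```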

He leaves a puzzle for the “curious reader”: what would a recursive formulation of the mirrored version of the $G$ tree look like? (The answer is pretty far removed from the simple formula for $G$ and was only proven in 2015.)

Canon by Intervallic Augmentation

Tortoise and Achilles are at a Chinese restaurant. Achilles eats half of the fortune cookie’s paper, and he and the Tortoise read two different messages from the same letters on the paper: “ONE WAR TWO EAR EWE” and “O NEW AT WOE ARE WE”.

The Crab has purchased a giant jukebox which, instead of switching the records, switches the record players, which all play the same record, but it always comes out differently. The players are called B-1, A-3 (which multiplies intervals by $3 \frac{1}{3}$) and B-10 (which multiplies them by $10$). (How would this be physically possible? A stiffer or less stiff needle would only shift the frequencies.)

Also Hofstadter really hates John Cage.

VI. The Location of Meaning

A message has three layers:

The frame message: “I am a message, decode me if you can!”
The outer message has no direct translation: to understand it is to build, or to know how to build, the decoding mechanism. It is the same as the decoding mechanism. The outer message is not in any language; it is always the burden of the receiver to interpret it correctly. A text in Japanese might be recognisable as such by the shape of its characters. Adding a literal text saying “I am in Japanese” is of no help to the decoder, as they would have to decode the outer message before understanding that text. In the same way, adding translations is of no use either, as they have their own outer message.
To understand the inner message is to extract the meaning intended by the sender.

There are two possibilities for the location of meaning:

It could be in the message itself, an intrinsic property. This would mean that any intelligence that decodes the message arrives at the same inner message.
It could depend on the context in which it is interpreted.

A record player is a strict decoder: it always extracts the same sounds from the record’s grooves (no information is added during decoding). You can also correlate a specific groove with a specific sound in the played piece.

The decoding of the inner message of DNA into the phenotype, however, is highly dependent on the chemical mechanisms, the cell’s inner workings, etc. From the molecule alone no human will emerge; the DNA is a trigger for these mechanisms. Does the DNA then still contain all the information intrinsically, or does the information get added during the process of extraction?

Thus a record seems to contain intrinsic meaning, while DNA does not and only acquires meaning through the isomorphism to the phenotype. When is the interpretation, the decoding, the actual source of the meaning, and when is it intrinsic then?

The Rosetta Stone required immense labour to decipher, but a different team of linguists would still have arrived at the same translations, so the amount of decoding work alone does not mean that the message is not intrinsic.
A record with random music received by aliens would puzzle them: it would have the right frame and outer message, but the inner one would be garbage (John Cage’s random music, for example). Here the message clearly still depends on the societal context.

When thinking about the intended meaning, we have to assume an intelligent species, otherwise the message would of course always be context-dependent. The fact that all our brains respond to patterns in language the same way (we are hard-wired to understand and interpret messages) is in favour of some messages carrying intrinsic meaning. Babies don’t care about their nationality; they learn whatever language is spoken with them.

Before the structure of DNA was discovered, Schrödinger predicted that the hereditary material encoding life would be an “aperiodic crystal”. A lot of messages could be described in this way. Thus recognising that something is a message could be related to measuring its complexity or entropy. A record player is distinct from a cloud of hydrogen because it is unlikely to occur naturally. Could Kolmogorov complexity be a proxy for measuring the messageness of information?
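Kolmogorov complexity itself is uncomputable, but the length of a compressed file gives a computable upper bound on it, so compression is a common practical stand-in. A rough sketch with Python’s standard `zlib` module (the sample data is made up): a crystal-like periodic sequence compresses to almost nothing, random noise does not compress at all, and natural-language text usually lands somewhere in between, which fits Schrödinger’s “aperiodic crystal” intuition.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size / original size: a crude, computable proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9)) / len(data)


periodic = b"ATCG" * 2500          # crystal-like: one short motif repeated 2500 times
random_bytes = os.urandom(10_000)  # noise: no structure for the compressor to exploit

print(f"periodic: {compression_ratio(periodic):.3f}")
print(f"random:   {compression_ratio(random_bytes):.3f}")
```

Note that the ratio measures statistical structure, not meaning: by this measure, pure noise is the most “complex” input of all, so it can at best be one ingredient of a messageness test.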

Some languages/brains might not even have the capability of representing/understanding concepts from other languages. (See the adjective-only language in “Tlön, Uqbar, Orbis Tertius” by Jorge Luis Borges, or the different blues that can be named in Greek but not in the languages of other seafaring civilisations.) We might be surrounded by messages we can never decode because we aren’t able to understand the frame or outer message.
When chatting with an LLM, we read and perceive the inner meaning of its messages. But can these truly have inner meaning, when the sender didn’t “intend” anything? Is a message from an LLM only meaningful because of our interpretation, or did we prove that our language is special / contains intrinsic meaning because we managed to train “another type of intelligence” on it?
Are animals hard-wired for messages the same way as our brains are? Do they only understand sounds / “language” or could dolphins see that script / text carries information? Is the neuron-structure of brains intrinsically “structured-message” oriented?
If some isomorphisms of a message are easier to understand than others (DNA strands carry information about the context in which they are used thanks to their chemical properties, compared to the ATCG coding, which is textual only), could there be a message that is entirely self-descriptive, in the sense that it encodes both the message and the decoding mechanism itself?

Chromatic Fantasy, and Feud

The dialogue is similar to the Two-Part Invention by Carroll. The Tortoise again plays the role of the “in-system” player that only accepts what is given and doesn’t obey any meta-logic. The point of discussion is what makes a contradiction.

VII. The Propositional Calculus

Basic propositional logic is introduced in the form of a typographical formal system. In a dialogue between Prudence and Imprudence it is made clear that the “correctness” of the system (every theorem comes out true under interpretation, so that $x$ and $\sim x$ can never both be theorems, i.e. consistency) cannot be proved, but has to be taken on faith alone. (This is what philosophers refer to as the problem of justification.)

Starting with a contradiction inside the system like $\langle P \wedge \sim P \rangle$, you can prove anything; it immediately “crashes” the system. This is unlike our minds, which can operate on a meta-level, learn things from a contradiction and then modify the erroneous parts of the system.
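The “crash” can be checked semantically. This sketch swaps the book’s derivation rules for a brute-force truth-table check (helper names are mine): $\langle\langle P \wedge \sim P \rangle \supset Q \rangle$ comes out true under every assignment, so a contradiction implies any $Q$ whatsoever.

```python
from itertools import product
from typing import Callable


def implies(a: bool, b: bool) -> bool:
    """Material implication, the meaning of ⊃ in the propositional calculus."""
    return (not a) or b


def is_tautology(formula: Callable[..., bool], n_vars: int) -> bool:
    """True if the formula holds under every assignment of truth values to its variables."""
    return all(formula(*values) for values in product([True, False], repeat=n_vars))


# <<P ∧ ~P> ⊃ Q>: from a contradiction, any Q follows.
print(is_tautology(lambda p, q: implies(p and not p, q), 2))  # True
# <P ⊃ Q> on its own is not a tautology.
print(is_tautology(lambda p, q: implies(p, q), 2))            # False
```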

There is a temptation to introduce derived rules (metatheorems) into the formal system to speed up the derivations. This is dangerous as reasoning errors can be induced when not clearly distinguishing between I-Mode thinking and M-Mode thinking.

The propositional calculus produces derivations rather than proofs: every step is so simple that the conclusion can be defended.

Crab Canon

This dialogue is another example of a text using the same structure it is describing. The story proceeds up until the Crab shows up and then unwinds again, in the reverse order (like a crab canon).

VIII. Typographical Number Theory

TNT (Typographical Number Theory) is introduced. It’s a typographical system which builds upon the propositional calculus and is able to express statements of number theory. Numbers are encoded in “successor form”: zero is $0$, one is $S0$, two is $SS0$, etc.
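A minimal sketch of successor form in Python (function names are mine, and zero is written as the digit 0, as in the book). `add` recurses on its second argument, mirroring the TNT axioms $(a+0)=a$ and $(a+Sb)=S(a+b)$.

```python
def numeral(n: int) -> str:
    """TNT numeral for n: n copies of S in front of 0."""
    return "S" * n + "0"


def value(numeral_str: str) -> int:
    """Inverse of numeral(); rejects strings that are not of the form S...S0."""
    if not numeral_str.endswith("0") or set(numeral_str[:-1]) - {"S"}:
        raise ValueError(f"not a numeral: {numeral_str!r}")
    return len(numeral_str) - 1


def add(a: str, b: str) -> str:
    """Addition by recursion on b: a + 0 = a, and a + Sb = S(a + b)."""
    if b == "0":
        return a
    return "S" + add(a, b[1:])  # strip one S from b, put it in front of the result


print(add(numeral(2), numeral(3)))  # SSSSS0
```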

TNT’s five axioms are modelled on the Peano postulates, but the fifth postulate, which lays out the principle of induction, is not among them. Without it, TNT can derive each of $(0+0)=0$, $(0+S0)=S0$, $(0+SS0)=SS0$, … individually, but not the general statement $\forall a:(0+a)=a$. This means the system is $\omega$-incomplete. (The interpretation of the symbols is not fully pinned down, the same way Euclid’s geometry hadn’t exactly pinned down the meaning of POINT and LINE without adding the last axiom.) Thus Hofstadter adds a last rule (which cannot be derived from the others) that allows induction inside the system. (But the system would still be perfectly legitimate.)

If TNT were complete, a typographical decision procedure for theoremhood in TNT would be the holy grail of number theorists. This is not possible, however.

Unlike the propositional calculus, which is above reproach due to its simplicity, TNT gives us reason to be sceptical about its consistency. A consistency proof using a simpler system than TNT would convince us, but as Gödel showed, TNT cannot prove its own consistency, so any system strong enough to prove it is at least as strong as TNT itself. We would have to trust that system’s consistency first, and this circularity makes a convincing proof impossible.

Solve the exercise about encoding powers of 2 and 10 into the system.

A Mu Offering

Achilles gives a lecture on Zen to the Tortoise. He tells a number of kōans, then tells the Tortoise that there is actually a way to encode a kōan into a string, which can then be tested for “Buddha-nature” (which is truth).

You first take the kōan and transcribe it into a selection of four shapes (which look very similar to molecules), then fold a long string, coated with ribo, a sort of glue, into a specific shape according to the transcription and the “Geometric Code”. (“Ribo-” is probably a reference to ribonucleic acid, whose prefix stems from “ribose”, a sugar whose name is derived from “arabinose”, which is named thus because it is a sugar obtained from gum arabic. We’ve now come full circle, as gum arabic is an important component of, you guessed it, glue.) The result is a string which encodes the kōan. There is an isomorphism from kōan to string; going in the reverse direction is possible, but feels wrong to Achilles. (As we’ve seen with the pq-system, you aren’t supposed to use the properties of the isomorphism to create new theorems, as the mapping isn’t guaranteed to be reversible, i.e. bijective.)

A discussion of the completeness and consistency of the system follows, including the revelation that any string can be turned into its opposite by attaching a knot at the end; two knots cancel each other out.

The Tortoise tries out the string-to-kōan mapping by creating a randomly folded string. It turns out that his string encodes the long-lost kōan that tells of the origin of this string binding: its invention by the Grand Tortue. The kōan contains the instructions to create itself, though (it’s a quine), except that the copy is negated (by a knot at the end).

Thus, if the Tortoise’s string is genuine and has Buddha-nature (is true), and it contains instructions to make another true string which is its opposite, a paradox emerges.

IX. Mumon and Gödel

The first part of the chapter contains a number of new kōans and a discussion of the meaning of Zen. To be enlightened is to reject dualism, which is difficult since human perception is by nature a dualistic phenomenon. Then the kōans are compared to Escher’s paintings.

In the second half of the chapter, Gödel numbering is introduced. By mapping the symbols of a formal system to numbers, we can encode any string of that formal system as a number. Hofstadter proposes numbering rules for the MIU-system and for TNT itself. (Picking up the genetic metaphor from the previous chapter, Hofstadter calls the three-digit number representing a symbol of TNT in his Gödel numbering a codon, which is what we call a combination of 3 adjacent nucleotides in DNA.) Additionally, according to the Central Proposition, any typographical rule can be transformed into an arithmetical rule. We can thus use number theory (TNT, for example) to operate on any formal system.
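To see the Central Proposition at work on a small scale, here is a sketch for the MIU-system, assuming the digit code M ↔ 3, I ↔ 1, U ↔ 0 (function names are mine). Rules I and II are rewritten as pure arithmetic on the Gödel numbers, and decoding the result gives the same string the typographical rule would have produced.

```python
from typing import Optional

# Digit code for MIU symbols: M -> 3, I -> 1, U -> 0.
CODE = {"M": "3", "I": "1", "U": "0"}
DECODE = {digit: symbol for symbol, digit in CODE.items()}


def godel_number(s: str) -> int:
    """Replace each symbol by its digit and read the result as a decimal number."""
    return int("".join(CODE[c] for c in s))


def godel_string(n: int) -> str:
    """Inverse mapping; safe for MIU strings, which start with M (digit 3, never a leading 0)."""
    return "".join(DECODE[d] for d in str(n))


def rule_i(n: int) -> Optional[int]:
    """Rule I (xI -> xIU) as arithmetic: if the last digit is 1, multiply by 10."""
    return 10 * n if n % 10 == 1 else None


def rule_ii(n: int) -> Optional[int]:
    """Rule II (Mx -> Mxx) as arithmetic: for n = 3 * 10**m + k, return 3 * 10**(2m) + k * 10**m + k."""
    m = len(str(n)) - 1
    if n // 10**m != 3:
        return None  # does not start with M
    k = n - 3 * 10**m  # the number encoding x (leading U's become leading zeros)
    return 3 * 10**(2 * m) + k * 10**m + k


print(godel_number("MIU"))                         # 310
print(godel_string(rule_ii(godel_number("MUI"))))  # MUIUI
```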

We can go one level deeper and create a Gödel numbering for TNT itself, thus expressing TNT strings as numbers, which TNT can operate on. Any formalisation of number theory contains its own metalanguage within itself, through this Gödel numbering. A string of TNT has an interpretation in number theory, and a statement of number theory may have a second meaning as a statement about TNT itself, via the Gödel numbering’s mapping. A TNT theorem can be represented as a number, which has an isomorphism to a meta-TNT theorem.

We now want to create a TNT string $G$ which expresses a statement about itself: “$G$ is not a theorem of TNT”. $G$ should have a Gödel number which, when decoded and interpreted as a TNT string, says that $G$ is not a theorem of TNT.

This is essentially Epimenides’ “Cretan liar” paradox translated into number theory. If $G$ were a theorem, $G$ would have to be a non-theorem, thus making TNT inconsistent. If we reject this, we’re forced to conclude that $G$ is not a theorem. But then $G$ expresses a truth (namely that $G$ is not a theorem), thus making TNT incomplete.

How could $G$ include itself if it isn’t infinitely long? $G$ would have to contain itself, plus the numbers needed to create the proposition that “$G$ is not a theorem”…

Prelude…

The Tortoise, Achilles and the Crab are at the Anteater’s house. The Tortoise gives two records to the Crab as a gift: original recordings of Bach playing his Well-Tempered Clavier on harpsichord. The Tortoise was able to reconstruct the sounds through acoustico-retrieval, by working backwards from the air molecules in the atmosphere. He was able to do this by solving the mystery around Fermat’s last theorem.

He both discovered a counterexample (the only number not to appear in the digits of $\pi$; $\pi$ is widely believed, though not proven, to be normal, in which case every finite string of digits appears in it somewhere) and proved the theorem (the proof depended on a counterexample to the theorem, recursively…). This seems absurd, unless you assume that the only solution is an impossible number.

While listening to the fugue, Achilles, a novice regarding fugues by his own admission, asks whether the others also have difficulty concentrating on both a specific voice and the entire composition at once (Hofstadter draws a parallel to the Escher lithograph Cube with Magic Ribbons, one of his best works). They do, and conclude that you can’t be in the two “modes” at the same time.

What does the [ATTACCA] at the end of the chapter mean?
How could the Tortoise prove Fermat’s last theorem by disproving it? Is it really because the solution he found is a number which doesn’t exist?

X. Levels of Description, and Computer Systems

We can understand genetics, fugues and the Gödel string $G$ from the last chapter on different levels, but only one level at a time, and only by mentally separating the levels from each other. It gets difficult, however, when the levels are similar and use similar words and terms, as in computer science or psychology. Then we can’t separate them as easily any more and get confused. (Hofstadter gives the example of less technically inclined friends confusing computer and program: they didn’t understand that PARRY, the ELIZA-like program they were talking to in the terminal, isn’t the computer. When Hofstadter entered a command that went straight to the OS, they wondered how come PARRY didn’t notice…)

Hofstadter explains that chess masters are better than computer systems (this isn’t true anymore, obviously) and novices simply by thinking at a different level. They see clusters of positions instead of individual pieces (interestingly, a major improvement in Stockfish came in the form of NNUE, which in some ways is basically pattern recognition on a higher level). Just as novices, unlike people who don’t play chess, never consider illegal moves, masters don’t even think about bad moves.

The chapter goes on to explain basic computer science. A word in memory might be a value: a string, an int or even an instruction. Microcode, machine code, Assembly, compilers (Hofstadter goes on an interesting tangent explaining compiler bootstrapping here) and interpreters all operate on different levels and build on each other, translating between layers. (But at some point, this process bottoms out at the hardware level. We don’t have infinite flexibility; we are limited even as programmers, in the same way that you can’t will yourself to be faster or smarter.) Yet we use the same tools to describe and talk about all of these levels, which can confuse the programmer.
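As a small illustration of one word being read at different levels, the Python sketch below reads the same four bytes as text, as an integer and as a floating-point number, using the standard `struct` module. The byte string `b"GEB!"` is an arbitrary choice; reading the bytes as machine instructions is also possible but depends on the CPU, so it is left out.

```python
import struct

# Four bytes of "memory"; nothing in the bytes says which level they belong to.
word = b"GEB!"

as_text = word.decode("ascii")            # read as characters: 'GEB!'
as_int = struct.unpack("<I", word)[0]     # read as a little-endian unsigned 32-bit int: 0x21424547
as_float = struct.unpack("<f", word)[0]   # read as a little-endian 32-bit IEEE 754 float

print(as_text, hex(as_int), as_float)
```

The bytes never change; only the level at which the reader chooses to interpret them does.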

Hofstadter goes on a tangent about an even higher-level programming language, which understands the programmer and can use his intentions to correct mistakes. A simple spelling error shouldn’t cause the compiler to go haywire. (Without this, mistakes appear that are much harder to debug, due to the compiler misinterpreting where a colon should go, for example. He seems to have dreamt of LLMs, yet he doesn’t seem to like them. And here he is, telling us he was wrong about everything because he thinks LLMs are really thinking and there’s nothing special about the human mind.)

It is only when something goes wrong that we notice the leakage in these abstractions. An airline passenger notices the complex system only when his baggage gets lost; a programmer considers machine code only when debugging a nasty issue.

Yet there are abstractions which can be perfectly sealed off, or chunked. A volume of gas can be described using pressure and temperature, even though these quantities don’t mean anything at the lower level of the huge number of individual atoms.

This is the first type of system, where noise gets cancelled out.
The second type of system behaves in a snowball-like way, the same way weather or a pinball machine does.

Computers are both: electricity falls into the first category, programs into the second (a single bit flip might change everything).

Epiphenomena are emergent properties of systems which aren’t obviously connected to the lower levels. In Go, for example, the “two eyes live” rule is an epiphenomenon.

The main question of part two of GEB, according to Hofstadter, is: can the brain’s top level, the mind, be understood like a program, without reading its machine code? Can we understand our mind without needing to know about the neurons? Is consciousness an epiphenomenon?

The rant about “Computers can only do what you tell them to do” seems misdirected, because computers can only calculate the digits of pi if you know how to, and then you could also perform these operations by hand.

… Ant Fugue

The Anteater describes the nature of anthills and their relation to the ants making them up to Achilles, the Tortoise and the Crab. There are parallels being drawn between the anthill and the brain.

Even though the anthill is fully made up of ants (reductionist view), it can be fully described on a higher level (holism). When Dr. Anteater talks with “Aunt Hillary”, the anthill, by drawing lines that the ants follow as paths, the individual ants have no knowledge of this communication (they are not conscious of it). In fact, they write “eat some juicy ants”, fully ignorant that they have just killed themselves.

On the ant level, there are castes (types of ants), which are distributed all over the hill. Then there are ants travelling in packs and interacting with each other through rubbing, pheromones, etc. But on a higher level, there are “signals” (groups of ants, some of which may split off while others join the group) travelling through the hill and interacting with the caste distribution. Then there are active “symbols”, which represent some sort of state and can interact with each other. These symbols are made up of groups of groups of ants (signals).

This ever-changing “state” / distribution of the hill is the hill itself. When “Johant Sebastiant Fermant” (an anthill) was decomposed by a great flood (died), the same ants rebuilt a hill (Aunt Hillary), but it had a completely different state / was a different person.

XI. Brains and Thoughts

In the brain, the ants are neurons and signals are chains of neurons firing together. These make up symbols, which are very flexible (Hofstadter uses the term intensional), meaning they can merge, split up or be invented. The question is whether symbols can be mapped to specific neurons or not.

There is conflicting evidence: either memories are coded locally, but over and over again in different areas of the cortex, or they are a dynamic process triggered by some neurons. (Is there a grandmother symbol mapped to a collection of neurons somewhere that gets activated when you see your grandmother? Or is it the pattern of activations all over the brain that represents your grandmother?) These symbols don’t act in isolation: they trigger others and exist in the context of the brain (a symbol could be fully defined by the way it triggers others).

There are classes and instances (newspaper vs. the edition of a specific day), but each of them can also be an instance or a class themselves. The newspaper might be an instance of a paper publication, while also being a class. Hofstadter concludes that there must be “instance symbols” and “class symbols” that coincide for the same concept.

When two or more symbols act together, they could be said to form a new symbol themselves. Thus the problem of counting or delimiting the symbols becomes difficult. If symbols don’t overlap on the neural level, then there would be a finite number of symbols we could have. If they do, much more complex combinations (possibly an infinity of them) could be represented.

The difference between these theories is that one assigns a hardware mapping to symbols, the other a mix of hardware and software. If we can explain the high-level symbol activations without recourse to the neurons, then we could create intelligence independently of the low-level hardware used to “implement” it.

There exists declarative knowledge in a program (strings, numbers, etc.) and in your brain as facts. There is also procedural knowledge (which could be seen as an epiphenomenon), which can only be seen from the outside, by the programmer for example. (Most people, for instance, have procedural knowledge of the grammar of their language, and at the same time a much weaker declarative knowledge of it.) When asked “How many chairs are in my living room?”, you will imagine your living room and count them in your head. On the other hand, “How many people live in Chicago?” will be stored as a fact somewhere in your head. (This knowledge about where to get knowledge also seems to be stored procedurally: you aren’t aware of how you get it, the same way you aren’t aware of how your brain concocts mental imagery, for example.)

English French German Suite

Jabberwocky, by Lewis Carroll, is printed side by side with German and French translations.

XII. Minds and Thoughts

The question of lifting intelligence from a brain or an anthill onto another system is picked up again, but in the form of an isomorphism between two brains. How does our intelligence differ?

There is obviously no perfect isomorphism, neither neural nor symbolic, between my brain and yours; otherwise we’d think exactly alike. In the same way, you clearly aren’t isomorphic to yourself five minutes ago.

But clearly there is some form of isomorphism, at least between core symbols, as we feel similar emotions (we can understand poems written hundreds of years ago and feel the same love or sadness).

Hofstadter gives us an analogy for his “partial isomorphisms” in the form of a geographical thought experiment. You are asked to create a perfect map of the USA, but only from memory. You are given a blank topographical map to start.

Your ASU (Alternative Structure of the Union) will be identical to that of another person in many respects, but the farther from your hometown / habitual places, the less precise it will be. Still, most people should share big landmarks and cities (which would correspond to “core symbols”).

Therefore, you could navigate between two big cities on both your ASU and another person’s, but you wouldn’t take the exact same route. In the same way, core symbols are related in different people, but not through the exact same thoughts / routes.

Now instead of brains, let’s look at computers. Just as we might all think alike, two completely different systems (let’s say RISC and x86) can get the same results while executing two distinct machine languages. To determine whether two programs are identical, then, you’d have to form a sort of map from the memory dump, carefully deconstructing them. From this, you could lift a sort of conceptual skeleton. Could the brain be mapped for its symbols and their relations in the same way?

Even though people’s neural pathways don’t directly correspond, they could be analysed to extract this skeleton. This must be possible, since we ourselves can construct this type of symbol-chunked map of our own brain at any time, by reflecting on our own thoughts… (It would be a hugely complex skeleton, in conditional form, since so much of our thinking is dependent on external factors.)

Hofstadter then goes on to describe consciousness in terms of symbols, suggesting that there is a symbol for the self, which is part of a subsystem that emulates our own brain and is aware of other symbols’ activations. Awareness would then be the monitoring of the brain by itself. There are other types of subsystems, to emulate a friend’s thoughts for example (these subsystems would share a large number of our own symbols, operating simultaneously).

Hofstadter then quotes an extract of J. R. Lucas’s Minds, Machines and Gödel. A conscious mind is aware of itself, yet it cannot be clearly divided. A machine, on the other hand, can be made aware of its performance, but it cannot take that into account without becoming a different machine. A machine so complex that we could no longer predict its behaviour, thus losing its “machineness”, would be a conscious being of its own.

Hofstadter suggests that the map should contain an infinity of symbols, because there could always be new potential pathways connecting the current symbols. I think this stretches his analogy quite far, as he can’t be suggesting that everybody already knows everything and just has to connect the right paths. Neuroplasticity and new outside information have to count for something; otherwise Descartes and his innate ideas would have dominated the empiricists and their tabula rasa.
I take issue with the subsystem description of the self as explained by Hofstadter. What makes the self-symbol special enough to be aware of other symbols? Awareness feels like there’s a little gremlin sitting in your brain, controlling and observing the machine as it runs, taking charge of the input and output. How else would controlling your own thoughts work?
Lucas claims that a self-modifying and self-reflecting machine would defeat itself, because by modifying itself it would become a different machine, thus making such a machine impossible. However, this process doesn’t seem so different from the mind, which through neuroplasticity certainly doesn’t stay identical to its former self either.

Aria with Diverse Variations

The Tortoise is visiting a sleepless Achilles, who begs to be entertained with number-theoretical stories. The Tortoise tells the story of the Goldberg Variations and the Goldbach conjecture.

Any number that is the sum of two odd primes can be said to have the “Goldbach property”. Achilles and the Tortoise then make up another property, the “Tortoise property”, possessed by any number that can be written as the difference of two odd primes (the “Achilles property” represents the opposite).

What is interesting is that you can always decide whether a number $N$ has the “Goldbach property”, because you only have to test the primes between $2$ and $N$. Thus the decision procedure is guaranteed to terminate. It is also predictably terminating: you can set an upper bound on the numbers to search before you start. For the “Tortoise property”, no such procedure is currently known, as you might have to search through an infinite number of primes, but someone might yet discover one.
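The contrast between the two properties can be sketched in Python (the function names are hypothetical). `has_goldbach_property` only ever loops over numbers below `n`, so it always terminates; `find_tortoise_witness` needs an arbitrary `search_limit` cap, because without one nothing tells it when to give up.

```python
import math
from typing import Optional, Tuple


def is_prime(n: int) -> bool:
    """Trial division up to the square root; always terminates."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def has_goldbach_property(n: int) -> bool:
    """Is n the sum of two odd primes? Only odd p with 3 <= p <= n - 3 need
    checking, so the number of steps is bounded before the loop starts."""
    return any(is_prime(p) and is_prime(n - p) for p in range(3, n - 2, 2))


def find_tortoise_witness(n: int, search_limit: int) -> Optional[Tuple[int, int]]:
    """Look for odd primes p, q with p - q = n, trying odd q below search_limit.

    search_limit is an arbitrary cap added for this sketch: without it the
    search would be unbounded, and if no witness exists it would never stop.
    """
    for q in range(3, search_limit, 2):
        if is_prime(q) and is_prime(q + n):
            return (q + n, q)
    return None  # gave up; this does NOT prove that n lacks the property


print(has_goldbach_property(10))      # True (3 + 7)
print(has_goldbach_property(9))       # False (odd + odd is always even)
print(find_tortoise_witness(2, 100))  # (5, 3)
```

A `None` from the capped search is exactly Achilles’ problem: it tells you nothing about whether a witness exists further out.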

Achilles is upset that a number having the “Achilles property” would then embed an infinite amount of information, of coincidence, into that statement: it would mean that no pair out of the infinitely many pairs of odd primes has the number as its difference (it would be a property of the number system as a whole, not really just of the number). The Tortoise reminds him that the primeness of a number is actually also a collection of infinitely many statements (no other number divides it), but Achilles has no problem with calling a number prime. Still, Achilles rejects the notion that an arithmetical fact caused by an infinity of “coincidences” would not be provable with a finite proof.

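In another effort to convince him, the Tortoise mentions unpredictably terminating tests, presenting the Collatz conjecture as an example: repeatedly halve a number if it is even, or triple it and add one if it is odd. The conjecture is that every starting number eventually reaches 1; you just can never be sure after how many steps.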
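A minimal Python sketch of the Collatz test as an unpredictably terminating procedure (the function name is illustrative): the `while` loop has no bound computed in advance, so each call halts only if the conjecture holds for that starting number.

```python
def collatz_steps(n: int) -> int:
    """Count the steps until n reaches 1 (halve if even, else 3n + 1).

    No bound on the loop is known in advance; that it always ends for
    every positive n is precisely the unproven Collatz conjecture.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


print(collatz_steps(6))   # 8
print(collatz_steps(27))  # 111
```

The small input 27 already needs 111 steps, while its neighbour 26 needs only a handful more than 6 does; nothing about the starting number predicts the running time.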

Hofstadter then makes the dialogue itself an unpredictably terminating test. He lets the Tortoise and Achilles discuss the idea that books should have filler pages at the end, so that the reader is less aware that the book is about to end. They then discuss cleverly disguising the filler as part of the story. But at the point where the real story ends, there should be some obvious mark so that an astute reader would immediately understand that it had ended (he mentions extraneous characters or inconsistent events). (The story thus ends twice: once with the beginning of Achilles and the Tortoise’s discussion about books, and again at the end of the chapter with the appearance of a cop who arrests Achilles.)

XIII. BlooP and FlooP and GlooP

A sufficiently powerful system is a system which can represent (all true instances of the predicate are theorems and all false instances are nontheorems) all primitive recursive truths. (Gödel’s theorem acts like Gantō’s Axe here: a sufficiently powerful system is incomplete because of Gödel’s theorem, yet a system that is not powerful enough is incomplete precisely because it’s not powerful enough. Either way, chop chop.)

Hofstadter then introduces BlooP (Bounded looP), a programming language which only supports bounded loops (for loops). It can thus only represent primitive recursive truths with its functions. It’s powerful enough to write a test for primeness, for example, but not, as far as we know, a test for the “Tortoise property”, as that would seem to require an unbounded loop.
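To show what the BlooP restriction means in practice, here is a loose Python rendering of a BlooP-style primality test. It is not Hofstadter’s BlooP code: Python doesn’t enforce bounded loops, so the sketch simply follows the convention that every loop runs over a range whose size is fixed before the loop starts (the hypothetical helper `minus` imitates truncated subtraction, which never goes below zero).

```python
def minus(m: int, n: int) -> int:
    """BlooP-style subtraction: never goes below 0."""
    return m - n if m >= n else 0


def prime(n: int) -> bool:
    """BlooP-style primality test: the loop count is computed before the loop starts."""
    if n < 2:
        return False
    candidate = 2
    for _ in range(minus(n, 2)):  # like LOOP AT MOST MINUS[N, 2] TIMES
        if n % candidate == 0:
            return False          # a divisor was found: quit with NO
        candidate += 1
    return True                   # the bounded loop ran out: YES


print([n for n in range(20) if prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Because the number of iterations is known the moment `prime` is called, an upper bound on its running time is known too, which is exactly what a search for a Tortoise-property witness lacks.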

TNT is indeed complete with regard to primitive recursive truths, meaning that if a BlooP test can be written for a property of natural numbers, then that property is represented in TNT.

Now Hofstadter proves that not all properties can be tested for in predictable time, meaning that there’s some “jumbliness” in the system of natural numbers, which would very much disappoint Achilles. In other words, not all properties can be encoded in BlooP. He does this by using Cantor’s diagonal argument:

Take the pool of all BlooP programs that consist of a chain of functions, the last (main) of which takes one input: FUNCTION [N].
Order them by length and then alphabetically, to assign a number to each. We can refer to them as Blueprogram {#k} [N].
Now, define Bluediag [N] = 1 + Blueprogram {#N} [N]. Bluediag cannot be anywhere in the pool of all BlooP programs: if it were Blueprogram {#X}, then Bluediag [X] would equal Blueprogram {#X} [X] while also being equal to 1 + Blueprogram {#X} [X]. Contradiction…
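The diagonal construction can be checked mechanically on a finite stand-in. In this Python sketch, `blueprograms` is a hypothetical list of a few total functions playing the role of Blueprogram {#0}, {#1}, …; `bluediag` differs from program #k at input k, so it can’t equal any program in the list. (With a finite list, `bluediag` is of course only defined for the first few inputs.)

```python
from typing import Callable, List

# Hypothetical stand-ins for Blueprogram {#0}, {#1}, ...: any total one-argument
# functions would do, since the argument never looks inside them.
blueprograms: List[Callable[[int], int]] = [
    lambda n: 0,
    lambda n: n,
    lambda n: n * n,
    lambda n: 2 ** n,
    lambda n: n + 7,
]


def bluediag(n: int) -> int:
    """Bluediag [N] = 1 + Blueprogram {#N} [N]."""
    return 1 + blueprograms[n](n)


# On input k, Bluediag disagrees with program #k, so it cannot be program #k.
for k, program in enumerate(blueprograms):
    assert bluediag(k) != program(k)

print([bluediag(k) for k in range(len(blueprograms))])  # [1, 2, 5, 9, 12]
```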

Thus we have a function that cannot be written in BlooP and is therefore not primitive recursive, even though it always terminates. We can bound in advance how long each individual Blueprogram takes to run, but we cannot predict how long the execution of Bluediag will take, since no general bound holds for all programs.

To remedy this, we create FlooP, which is allowed to use unbounded loops (FlooP is Turing complete). We can now write the Collatz test from earlier and a test for the “Tortoise property”, for example.

Since we can’t know in general whether a FlooP program terminates (the halting problem), Hofstadter suggests splitting FlooP programs into two categories: “terminators” and “nonterminators” (a FlooP program is also called general recursive if it terminates, partial recursive if it doesn’t). We might be able to sort them by creating a BlooP program which takes in the Gödelized form of a FlooP program and decides whether it halts, the same way we tested whether a kōan had Buddha-nature through the string-folding test. (We might hypothesise that such a program exists, since the information about halting might lie “closer to the surface” in Gödel-numbered form, for example.) If it existed, we would be sure to get an answer, since the tester is written in BlooP, not FlooP. (Turing shows this is impossible using a diagonal argument: he feeds the termination tester its own code. It’s not that simple, since you need to quote the entire program inside the program, which leads to infinite regress.)

By the same argument as before, we take the pool of all FlooP programs, but this time we filter it to only include terminating programs using our tester. Now we can construct Reddiag [N] = 1 + Redprogram {#N} [N], which halts for all inputs $N$. But thanks to the diagonal argument from before, we can assert that Reddiag is not in the pool of terminating programs. Thus we have a paradox: a human can calculate Reddiag in finite time, but no computer can.

We could now try the same with GlooP, giving our language even more freedom to also be able to represent these programs. But the Church-Turing Thesis asserts that no “freer” or more powerful language exists. Thus we have to assume that the termination test does not exist (as long as we believe the CT-Thesis).

Air on G’s String

Achilles received a phone call from someone screaming “Yields falsehood when preceded by its quotation! Yields falsehood when preceded by its quotation!”.

You have to distinguish between mentioning and using a phrase: you can quote something, thereby talking about it, or you can use it. For example, in “HAS THREE WORDS” has three words, the same phrase is first mentioned (quoted) and then used.

Thanks to this distinction, you can have a sentence talking about itself, as “Yields falsehood when preceded by its quotation” yields falsehood when preceded by its quotation does. Hofstadter calls this “quining” a sentence, after the philosopher and logician W. V. O. Quine. The sentence now refers to itself, asserting something about itself.
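The quining operation itself is easy to write down. In this Python sketch (the function name `quine` is just illustrative), preceding a phrase by its own quotation turns the phrase from the phone call into a sentence that talks about itself.

```python
def quine(phrase: str) -> str:
    """Precede a phrase by its own quotation."""
    return f'"{phrase}" {phrase}'


phrase = "yields falsehood when preceded by its quotation"
sentence = quine(phrase)
print(sentence)
# "yields falsehood when preceded by its quotation" yields falsehood when preceded by its quotation
```

The quoted part is mentioned and the unquoted part is used; what the sentence says about “the phrase preceded by its quotation” is therefore a claim about the sentence itself.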

During this conversation, the Tortoise and Achilles completed a tour of a porridge factory. They enter a courtyard and then walk up the staircase inside the tower in the courtyard. Achilles walks up the flights of stairs, while the Tortoise, because he was getting tired, walks down them (upside down) from the other side (A and T are a pair on a spiralling staircase, which is another reference to DNA). At the top of the tower, they emerge on the level of the courtyard again: they land at the same level they began at, just like a “quined” sentence.

The title of this chapter is an adaptation of the title of Gödel’s original 1931 paper.

Incompleteness of TNT

We now apply the “quine” technique of self-reference to TNT, in order to construct the string $G$, Gödel’s string.

We introduce the notion of $m$ and $n$ being a “proof-pair”. Two numbers form such a pair if $m$ is the Gödel number of a derivation whose last line is the string with Gödel number $n$. This test for “proof-pairedness” is quite different from the test for theoremhood, since we don’t have to search for a proof, just check whether $m$ is one.

The property of being a proof-pair is a primitive recursive property, and can be tested for with a BlooP program (you only need to check whether each line of $m$ is arithmetically related to the next and whether the last line is equal to $n$; this must be possible due to the typographical nature of TNT).
The property of forming a proof-pair is testable in BlooP, therefore it is represented in TNT by some formula having two free variables.

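By existentially quantifying over the variable standing for $m$, we can now express theoremhood: $\exists{a} : \text{TNT-PROOF-PAIR}\{a, a'\}$. This means that there exists some derivation $a$ which proves $a'$. Unlike the proof-pair check, this is no longer a terminating test: confirming it may require searching through infinitely many candidate derivations.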
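The difference between checking a proof-pair and searching for a proof can be sketched on the MIU-system instead of TNT (an assumption made to keep the example small). For readability, the derivation is a list of strings rather than a single Gödel number. `is_proof_pair` only loops over what it is given and always terminates; `search_derivation` plays the role of $\exists{a}$ and needs an arbitrary cap (`max_strings`, a hypothetical parameter) to be sure of halting.

```python
from collections import deque
from typing import List, Optional, Set

AXIOM = "MI"


def successors(s: str) -> Set[str]:
    """Every string obtainable from s by one application of an MIU rule."""
    out: Set[str] = set()
    if s.endswith("I"):                 # Rule I:   xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):               # Rule II:  Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):         # Rule III: xIIIy -> xUy
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):         # Rule IV:  xUUy -> xy
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def is_proof_pair(derivation: List[str], theorem: str) -> bool:
    """Bounded check: each line is the axiom or follows from an earlier line,
    and the last line is the theorem. Loops only over what it is given."""
    if not derivation or derivation[-1] != theorem:
        return False
    for i, line in enumerate(derivation):
        if line != AXIOM and not any(line in successors(prev) for prev in derivation[:i]):
            return False
    return True


def search_derivation(theorem: str, max_strings: int = 5000) -> Optional[List[str]]:
    """Breadth-first search for a derivation of `theorem`. Without the cap this
    search never stops for a non-theorem; None only means "gave up"."""
    queue = deque([[AXIOM]])
    seen = {AXIOM}
    while queue and len(seen) <= max_strings:
        path = queue.popleft()
        if path[-1] == theorem:
            return path
        for nxt in sorted(successors(path[-1])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(is_proof_pair(["MI", "MII", "MIIII", "MUI"], "MUI"))  # True
print(search_derivation("MU", max_strings=500))             # None
```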

To construct our quine, we need another property. It should represent “replacing all free variables of a statement by a specific numeral”. We’ll call it $\text{SUB}\{a, a', a''\}$, where $a$ is the Gödel number of the string to be modified, $a'$ is the number whose numeral is put in place of all free variables, and $a''$ is the Gödel number of the resulting string. The arithmetical (TNT) version of quining, which Hofstadter calls “arithmoquining”, is to replace all free variables in a string by the numeral of its own Gödel number. Thus $\text{ARITHMOQUINE}\{a'', a'\}$ is defined as $\text{SUB}\{a'', a'', a'\}$. This means: replacing all free variables in $a''$ by the numeral for $a''$ itself yields $a'$.
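Arithmoquining can also be sketched in Python. The codon values below are assumptions in the style of Hofstadter’s three-digit codons, and because the numeral for a real Gödel number would be an astronomically long string of S’s, the hypothetical `numeral` helper uses a made-up shorthand `S^{n}0` instead of writing them out.

```python
# Assumed three-digit codons for the few symbols this sketch needs.
CODON = {"0": "666", "S": "123", "=": "111", "a": "262"}


def godel_number(formula: str) -> int:
    """Concatenate the codons of the formula's symbols."""
    return int("".join(CODON[symbol] for symbol in formula))


def numeral(n: int) -> str:
    """The TNT numeral for n is n copies of S followed by 0; this sketch
    abbreviates it as S^{n}0 (hypothetical shorthand, not TNT notation)."""
    return f"S^{{{n}}}0"


def arithmoquine(formula: str, variable: str = "a") -> str:
    """Replace the free variable with the numeral for the formula's own
    Gödel number. Assumes the only free variable is the single symbol
    `variable` (no primed variables such as a')."""
    return formula.replace(variable, numeral(godel_number(formula)))


print(godel_number("a=S0"))   # 262111123666
print(arithmoquine("a=S0"))   # S^{262111123666}0=S0
```

The result is a closed statement (here, a false one claiming that a huge number equals one) whose content was produced from the formula’s own Gödel number; $G$ is built the same way from its uncle $u$.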

To construct $G$ , we only need to put both of these together. The following formula is $G$ ’s uncle $u$ .

$\sim\exists{a} : \exists{a'} : <\text{TNT-PROOF-PAIR}\{a, a'\} \land \text{ARITHMOQUINE}\{a'', a'\}>$

Now, to create $G$, we arithmoquine $u$ itself: we substitute the numeral of $u$’s Gödel number for all occurrences of the free variable $a''$.

Translating the sentence $G$ into English, the first part, $\sim\exists{a} : \exists{a'} : \text{TNT-PROOF-PAIR}\{a, a'\}$, says that there doesn’t exist a derivation of $a'$, which is equivalent to saying that $a'$ is not a theorem of TNT.

The second part is what closes the loop: it says that $a'$ is the “arithmoquined” version of $u$. But that arithmoquined version is exactly the string $G$ itself. Thus $a'$ is (the Gödel number of) $G$, and $G$ says that it has no derivation.

We have now successfully brewed a string $G$ that says: “$G$ is not a theorem of TNT”. If $G$ were a theorem, then (since TNT only proves true statements) what it says would hold, so it would not be a theorem: contradiction. So $G$ is not a theorem, which doesn’t lead to a contradiction. But that is exactly what $G$ asserts, which makes $G$ true. Thus there exists a truth which is not a theorem.

We have blown a massive hole in TNT by proving it incomplete.

Consistency of G

By virtue of $G$’s interpretation being true (“$G$ is not a theorem”), the interpretation of its negation $\sim G$ (“$G$ is a theorem”) is false. Since we can’t derive falsehoods in TNT, $\sim G$ is not a theorem either. Still, $<G \lor \sim G>$ is a theorem (to repeat: neither $G$ nor its negation is a theorem, yet the system which we used to prove this says that one of them has to be true). If both $G$ and $\sim G$ were theorems, however, we could derive the contradiction $<G \land \sim G>$ and, as seen before, use it to prove anything we want in the propositional calculus.

TNT being inconsistent would mean that any well-formed string would be a theorem, provable by starting from a contradiction such as $<G \land \sim G>$. Thus, to show that TNT is consistent, we just have to prove that a single sentence of TNT is a non-theorem, for example: “The formula $\sim 0 = 0$ is not a theorem of TNT”.

It can be shown that as long as TNT is consistent, this oath of consistency is not a theorem of TNT. Thus TNT can only prove its own consistency if TNT is inconsistent.

TNT is $\omega$ -incomplete

TNT has the type of incompleteness we called $\omega$-incompleteness in chapter VIII (this means that there’s some infinite pyramidal family of strings which are all theorems, but whose general case cannot be proven).

One such family is simply the list of all sentences saying “$0$ is not the Gödel number of a derivation of $G$”, “$S0$ is not the Gödel number of a derivation of $G$”, “$SS0$ is not …”, and so on and so forth. Each of these sentences is a theorem (and true), yet we cannot prove in general that “$G$ is not a theorem”, i.e. that there’s no Gödel number for a derivation of $G$.

Just as we did before, and as with Euclidean and non-Euclidean geometry, $\omega$-incompleteness can be addressed by adding a new axiom which can’t be proven from the others alone. In the case of TNT, that would mean adding $G$ (or $\sim G$) as an axiom.

We could call the numbers of the family of derivations of $G$ the “supernatural numbers” (this would work just like getting the complex numbers by allowing $i^2 = -1$ as an axiom, for example, or the negative numbers by saying that $\exists{a} : (a + S0) = 0$; by naming them “supernatural numbers”, we would only be continuing the tradition of giving new concepts names that make them sound suspect: see irrational numbers and imaginary numbers). They are larger than every natural number: infinitely large. They would still obey our previous TNT rules, and could be manipulated the same way as normal numbers.

These infinitely large supernatural numbers, let’s call them $I$, thus correspond to infinitely long derivations of $G$. Just as both $i$ and $-i$ satisfy $x^2 = -1$, there are many $I$’s, but they are all represented by the same character.

Are Supernatural Numbers Real?

The question is then, should we add $G$ or $\sim G$ as a new axiom? Which of these number theories is more “real”?

We could either say that both are “true” and that there is no answer, the same way we have Euclidean and non-Euclidean geometry, or physicists using different spaces (Hilbert, reciprocal or phase space). It is important to choose the correct formal system for your work, and both axioms have merit.

But on the other hand, mathematicians use number theory to talk about formal systems (which includes the natural numbers), thus showing that they believe that these natural numbers are quite real, using them as a base.

Most believe that there is a formalisation of number theory that does not force you into believing in supernatural numbers. But they may be wrong, there might just be one single “perfect” number theory in which supernaturals are necessarily included.

There’s no proof that supernaturals don’t naturally arise in all formulations of number theory we can think of. And until we find one we can’t be sure.

Diophantine Equations

In any sufficiently powerful formal number theory, you can find a Diophantine equation which is equivalent to $G$: interpreted via Gödel numbering, the claim that this equation has no solution asserts its own unprovability. If you found a solution, it would encode a proof that the equation has no solutions.

This is what the Tortoise used to prove Fermat’s last theorem (which is about a Diophantine equation) in the Prelude… dialogue.

Birthday Cantatatata …

It’s Achilles’ birthday, which the Tortoise deduces from the big button on his shirt saying so.

The Tortoise then asks if, from the previous statements, he would be correct in deducing that it’s Achilles’ birthday. Achilles replies yes, which then prompts the same question again from the Tortoise.

After a few rounds, Achilles asserts that the answer to any similar question will be “yes”. Armed with this “Answer Schema $\omega$”, the Tortoise then asks whether knowing that “yes” is the answer to all those questions allows him to deduce that it’s Achilles’ birthday.

Achilles then gives him the yes answer, $\omega + 1$ . The Tortoise asks the same question again. Achilles then gives him the answer schema $2\omega$ . So on and so forth…

Achilles tries to circumvent this by giving the answer schema $\omega^{\omega^{\omega^{\text{…}}}}$ . This is then named answer schema $\epsilon_0$ . But it could go on forever…

Finally, the Tortoise tells Achilles that it’s also his birthday, and that Achilles should take him out for a nice dinner, to which Achilles acquiesces, defeated.

XV. Jumping out of the System

Just as no new answer schema that Achilles devises can ever fully answer the Tortoise’s question, adding $G$ or $\sim G$ to TNT doesn’t defeat Gödel’s trick. You can always construct a new $\text{(TNT+G)-PROOF-PAIR}$ function, on which you can base a new $G$ string.

Couldn’t you create an even more powerful system that plugs all of these holes? This is the same thing that the Crab did with his record player $\omega$. In the story, we were left hanging as to how to construct such a device / formal system.

You try to extend TNT with the axiom collection $G_{\omega}$ (the infinite family $G$, $G'$, $G''$, … of Gödel strings for each successive extension). And yet you can once again create a proof-pair test, using an $\text{OMEGA-AXIOM}\{a\}$ test. Your system $\text{TNT} + G_{\omega}$ wasn’t clever enough to foresee its own embeddability inside number theory. Thus, any such system can be Gödel-numbered and thus defeated.

There are three conditions to using Gödel’s self-reference method:

The system should be rich enough to express all desired statements about numbers (in the Contracrostipunctus, failing this would be like substituting a fridge for a record player: it is simply not able to express any note or melody). Otherwise it’s too weak.
All general recursive relations (those testable by a FlooP program that always terminates) should be represented (failing this would be like having a low-fidelity record player). Without this, the system fails to represent some basic facts.
Axioms and typographical patterns must be recognisable by some terminating decision procedure (failing this would be like a phonograph still on the drawing board, only partially designed: you couldn’t be sure how it would play certain sounds). Without this, you couldn’t be sure whether a derivation was valid at all.

Since all three conditions are needed to produce a sufficiently powerful system, and together they are also the “critical mass” for the Gödel trick, there is no escape.

Hofstadter then engages with J.R. Lucas’ argument that Gödel’s theorem proves machines can never be intelligent. Lucas imagines a program that prints out facts of number theory and says that it “knows” them when it prints them out, thereby anthropomorphising the machine.

Yet, since it can never print out $G$ because the system it works with is incomplete, it never knows as much as we humans do.

Hofstadter takes issue with the claim that humans would “know” $G$ to be true. We might know that it is possible to construct $G$, yet never be able to actually do so, as the formal systems grow increasingly complex.

The computer can’t step outside of itself to assert a meta-fact. But this is not a defect of TNT, rather a mismatch of expectations: even we humans cannot do this. Lucas cannot consistently assert the sentence “Lucas cannot consistently assert this sentence.” So Lucas is also incomplete.

A computer can certainly modify its own program, but only in the way it was instructed to, so it is still only obeying its instructions. (Even if it modified the instructions on how to modify itself, it would still be obeying instructions.)

The same escalating spiral of levels appears with objectivity in Galileo’s Dialogues Concerning Two New Sciences. There are three characters: Simplicio (the simpleton), Salviati (the wise one) and Sagredo (the objective one). Sagredo is supposed to be an objective judge, yet he always agrees with Salviati. Maybe that’s because Salviati is always right, but couldn’t we add another, higher-level judge to assess that? We have to ask why Sagredo is there at all. Why not simply let the other characters speak? Because adding him creates the illusion of “stepping out of the system”.

Edifying Thoughts of a Tobacco Smoker

Achilles is visiting the Crab’s Home. After some discussion of a Bach poem titled “Edifying Thoughts of a Tobacco Smoker”, the Crab wants to show Achilles some of his own songs.

He pulls a record from his shelf and puts it into a new record player, which first scans the record, then swallows it and plays it. Because it chomps any record that doesn’t carry the “label” (the Crab’s unique style), it might no longer be a “perfect” record player, but it can withstand the Tortoise’s attempts to destroy it. (There are some interesting discussions and meta-references in this dialogue. Hofstadter mentions Copper, Silver, Gold: An Indestructible Metal Alloy, a book about metal-logic. The book mentions a “Tobacco Mosaic Virus”, which infects tobacco plants and destroys them. Incidentally, the virus looks like tiny cigarettes: cigarettes destroying the tobacco. This is similar to the “suicide virus” mentioned later, which destroys its own host cell.)

Achilles then plays with the camera from the record player, pointing it at its own screen and creating interesting recursive patterns. Only if the camera films the whole screen will it show an infinitely descending sequence of screens. He then wants to achieve total self-engulfing by also getting the camera itself into the picture. But even with a mirror, he can’t film the back of the mirror, which would also be part of the “device”.

They then discuss the painting Ceci n’est pas une pipe by Magritte. Achilles is wondering if “ceci” refers to only the pipe (which would make the sentence a falsehood) or to the entire painting.

XVI. Self-Ref and Self-Rep

A sentence can refer to itself using the seemingly simple phrase “This sentence”. But this sneakily relies on our understanding of context and of the English language. You could try to create a self-ref (self-referring) sentence without this trick, for example by having it quote itself. This is doomed to fail: a sentence containing a full quotation of itself would have to be longer than itself… The solution is quining, as illustrated in Air on G’s String.
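Quining can be sketched as a one-line string operation. The helper name `quine` is a hypothetical choice for illustration; it simply precedes a phrase by its own quotation, as in Air on G’s String:

```python
def quine(phrase: str) -> str:
    """Precede a phrase by its own quotation (the 'quining' operation)."""
    return f'"{phrase}" {phrase}'


sentence = quine("yields falsehood when quined")
print(sentence)  # "yields falsehood when quined" yields falsehood when quined

# The quoted part of the sentence, when quined, yields the sentence itself,
# so the sentence talks about itself without ever saying "this sentence".
quoted_part = sentence.split('"')[1]
assert quine(quoted_part) == sentence
```

The sentence never contains a full copy of itself, only a copy of its second half plus instructions for reassembling the whole.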

You could also program such a quine. Depending on the language used, this might be pretty complex, or very easy. A language could contain an operator * which prints a copy of the program before executing. This would be the equivalent of cheating by using “This sentence”.
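For comparison, here is a classic two-line Python quine that does not rely on any special “print yourself” operator. It uses the same trick as quining: the string `s` is a template of the program, and the program fills the template with a quotation (`%r`) of that very template. Comments are left out of the block because a quine would have to reproduce them too.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly the two lines above.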

To be self-rep (self-reproducing), a copy has to contain, to the greatest extent possible, explicit directions for the copying operation. Otherwise it is just passively participating. A computer program needs the computer, the compiler, etc. A program which, on execution, created a whole new computer and compiler and then copied itself would be similar to DNA, which is able to copy not only itself but also its whole support machine, the cell. (DNA does this by also containing the instructions for making the enzymes that copy it; it just needs some help to get started.) There is always a limit, though, just as the camera from the dialogue can never film itself, the screen, the mirror, the back of the mirror, etc. all at once.

Typogenetics

Hofstadter now introduces a new typographical system, called Typogenetics.

There are four letters (or bases): A, T, G and C (A and G are purines, C and T are pyrimidines). Put together, they form strands. The derivation rules, called enzymes, operate on one strand at a time: they bind to a specific base and then execute their “program”, one unit (base) at a time. Each enzyme prefers to start at a specific kind of base.

The enzymes can move along a strand and search for bases; they can then copy it to form the complementary strand (A $\Leftrightarrow$ T and G $\Leftrightarrow$ C) or cut the strand. There are fifteen kinds of commands that can make up the programs (enzymes), each denoted by a three-letter abbreviation; these commands are called amino acids. The sequence of amino acids forms the primary structure of an enzyme.

Instead of starting with specific axioms and rules of inference, as in the MIU system, each strand can be translated into enzymes with the Typogenetic Code, which maps each pair of bases to a specific amino acid. The AA pair acts as a delimiter: it tells us where the next enzyme’s code starts. (Thus we have 15 amino acids plus the delimiter, which together cover all 16 possible pairs of bases.) So the strand contains the derivation rules themselves.
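The translation step can be sketched in a few lines. This is a minimal sketch, assuming the pair-to-amino-acid table below matches the Typogenetic Code in the book (it is reproduced from memory, so double-check it against the chapter). It only splits a strand into genes at `AA` and translates each pair, ignoring tertiary structure and binding preference.

```python
# Typogenetic Code: pair of bases -> amino acid (command).
# Assumption: table reproduced from memory of the book; "AA" is punctuation.
TYPOGENETIC_CODE = {
    "AC": "cut", "AG": "del", "AT": "swi",
    "CA": "mvr", "CC": "mvl", "CG": "cop", "CT": "off",
    "GA": "ina", "GC": "inc", "GG": "ing", "GT": "int",
    "TA": "rpy", "TC": "rpu", "TG": "lpy", "TT": "lpu",
}


def translate(strand: str) -> list[list[str]]:
    """Translate a strand into enzymes, one list of amino acids per enzyme."""
    enzymes: list[list[str]] = []
    current: list[str] = []
    # Read two bases at a time; a leftover single base at the end is ignored.
    for i in range(0, len(strand) - 1, 2):
        pair = strand[i:i + 2]
        if pair == "AA":  # delimiter: the current enzyme is finished
            if current:
                enzymes.append(current)
            current = []
        else:
            current.append(TYPOGENETIC_CODE[pair])
    if current:
        enzymes.append(current)
    return enzymes


print(translate("CGGATACTAAACCGA"))
# [['cop', 'ina', 'rpy', 'off'], ['cut', 'cop']]
```

One strand thus yields a small set of programs, which could in turn be run on that same strand.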

To find out the binding preference of an enzyme, we use its tertiary structure. Each amino acid either continues the chain straight on or makes it turn left or right. Depending on the orientation of the final amino acid in the folded chain, the enzyme initially binds to a different base.

We, as the player, do the work of the ribosomes, which are what convert DNA strands into enzymes in a real cell. Via the ribosomes, the strand is mapped to enzymes, which then modify the strand, creating a multi-level (strange) loop.

(Real) Genetics

Real DNA comes in double strands, with each base paired with its complement. The backbone of each strand is held together by strong covalent bonds, while the complementary base pairs between the two strands are joined only by weak hydrogen bonds.

The DNA is protected inside the nucleus of the cell. RNA polymerase transcribes parts of the DNA into mRNA (messenger RNA, which has a U base instead of T), which is then transported out of the nucleus, where the real work happens: the mRNA is “interpreted” by ribosomes to create enzymes (and other types of proteins), which in turn act on the DNA and RNA, etc…

Proteins are made up of amino acids, of which there are 20 different kinds, and all proteins are built from this same set of building blocks. But while DNA consists of hundreds or thousands of millions of nucleotides, a typical protein only consists of about three hundred amino acids.

The ribosome doesn’t translate from mRNA to enzyme by having the translation table memorised. Instead, it hooks into a segment of three bases (a codon) at a time with its “record player”-like playing head. Then tRNA pieces (which are like puzzle pieces) are pulled in from the cytoplasm until a match is found. Each type of tRNA piece fits only one unique combination of three bases, and on its other side it carries a single amino acid, which gets appended to the slowly growing protein.
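Transcription and translation can be sketched with a dictionary lookup standing in for the tRNA matching. This is a toy: the table is only a small excerpt of the standard genetic code (codons outside it raise a `KeyError`), the input is assumed to be the DNA coding strand (so transcription just swaps T for U), and reading starts at the first base rather than searching for a start codon.

```python
# Small excerpt of the standard genetic code (mRNA codon -> amino acid).
# None marks the three stop codons, which end the protein.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "GGC": "Gly",
    "GCU": "Ala", "UGG": "Trp", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read one codon at a time, appending amino acids until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # the "matching tRNA piece"
        if amino_acid is None:  # stop codon: release the finished protein
            break
        protein.append(amino_acid)
    return protein


print(translate(transcribe("ATGTTTGGCTGGTAAGCT")))  # ['Met', 'Phe', 'Gly', 'Trp']
```

The trailing `GCT` after the stop codon is never read, just as the AA delimiter ends an enzyme in Typogenetics.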

How are the tRNA pieces produced? They are themselves transcribed from the DNA, and enzymes, which are also coded for in the DNA, attach the right amino acid to each one, always matching it with the right codon. Thus the genetic code is stored in the DNA itself, indirectly, through the tRNA and these enzymes.

Because the tRNA carries the instructions for interpreting DNA into enzymes, it is, in a sense, the essence of the DNA’s outer message. Thus the DNA contains its own outer message. (There must always be some knowledge of the Genetic Code in the cell beforehand, to allow the manufacture of the enzymes that create tRNA pieces from the DNA master copy. Something always needs to jump-start the process.)

The newly produced protein folds itself up (the tertiary structure) in a consistent way that depends on its primary structure, which ensures that it fits onto the right target and executes its function. This folding is far more complex than the logic of the typogenetic system. (Hofstadter hypothesised that a program could exist which takes the primary structure and infers the tertiary one. This has indeed come true, with the Folding@home project and AlphaFold.)

Just as AA marks the end of an enzyme in the typogenetic system, special stop codons mark the end of a protein on an mRNA strand. (Some genes may overlap, as was discovered in viruses, so the same stretch of DNA may code for totally different proteins. This is what inspired the haiku in the fortune cookie in the Canon by Intervallic Augmentation.)

Isomorphisms

To have self-replicating DNA, we need a support system, as DNA itself can’t bootstrap. There need to be ribosomes, and RNA polymerase (which copies the DNA to make mRNA) to produce the mRNA that feeds the ribosomes. Thus, just as number systems need to be “sufficiently powerful formal systems” to enable self-reference, cells need a “sufficiently strong support system” to replicate.

DNA self-replicates by using enzymes to pull apart the two complementary strands of the double helix. Behind the unzipping enzyme come two copiers, which create the complementary strand for each of the separated strands. At the end, everything is sealed up by another enzyme. This is somewhat similar to quining, in that the enzymes don’t care that the section they’re currently copying might be their own blueprint. Thus the use-mention dichotomy appears again: the DNA makes the enzymes that copy it (DNA as program, or use), and those enzymes copy the DNA itself (DNA as data, or mention).
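The copying scheme, in which each old strand serves as the template for a newly built partner, can be sketched as follows. It is a toy under simplifying assumptions: a double strand is a pair of equal-length strings aligned base by base, and the real antiparallel orientation of the strands is ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Build the complementary strand, base by base (A-T, G-C)."""
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(double_strand: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Unzip a double strand and give each old strand a newly built partner."""
    top, bottom = double_strand  # the unzipping enzyme separates the strands
    # Each copier builds the complement of one old strand; the copiers don't
    # care whether the stretch they copy codes for the copiers themselves.
    return (top, complement(top)), (complement(bottom), bottom)


original = ("ATGCGT", complement("ATGCGT"))
first, second = replicate(original)
print(first == original and second == original)  # True
```

Each daughter double strand keeps one old strand and gains one new one, and both end up identical to the original.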

There are different levels of meaning to DNA. On the lowest level, each strand codes for an equivalent RNA strand: transcription. Then there is translation, which creates enzymes from the RNA. On the highest level, expression, physical properties (the phenotype) are pulled out of the genotype, a process known as epigenesis. There could be pseudo-epigenesis: a program that simulates birth at the lowest possible level. There is also the possibility of doing this without an exact simulation, by finding some mapping directly to physical features, a sort of decision procedure for gene expression. (Hofstadter notes that this exists for the species Felis catus, the common cat. Its DNA is CATCATCATCAT…)

Hofstadter then draws a comparison between TNT and genetics, by mapping the terms to each other. Strands of DNA become strings of TNT, mRNA are statements of N, proteins are statements of meta-TNT (since they operate on strands), etc… You can do the same for record players, as was already done in the “record player head” metaphor for ribosomes. In the same way that there are strings of TNT that destroy the formal system, there are DNA strands that code for enzymes which destroy the cells - sort of like suicide viruses (a low-level version of the Tobacco Mosaic Virus mentioned in the dialogue).

T-phages, a specific type of virus, work by injecting their own DNA into the cell, which disrupts the cell’s own DNA production and hijacks the cell to produce more viruses. Then the cell bursts and releases them. To defend itself, the cell needs to recognise the virus’s DNA. It can do this, for example, by labelling its own DNA with methyl groups and destroying any DNA that lacks the label. (This TC, T-phage and Cell, game is quite like the Tortoise and the Crab’s game with record players.)

A virus’s DNA that gets reproduced in the cell has the interpretation “I Can be reproduced in Cells of Type X”, whereas the suicide virus’s DNA would be interpreted as “I Cannot be reproduced in Cells of Type X”. The suicide virus corresponds to the Gödel sentence, while the real virus’s DNA corresponds to a Henkin sentence: a sentence that asserts its own theoremhood, or producibility in a formal system. You can construct such a Henkin sentence by removing the negation from the formula that gets arithmoquined to produce $G$, and then arithmoquining the result; the new string says that it is a theorem.

While such a sentence tells you that it is derivable, it doesn’t tell you how. Explicit Henkin sentences, by contrast, also describe how to derive them. Similarly, some viruses, like the T-phage mentioned before, also tell the cell how to construct them, while others only make the cell reproduce their DNA and then assemble spontaneously.

Some viruses encode how to manufacture themselves, so as to instruct their host cell. But how does our DNA tell the cell what to do and how to assemble our complex body? This has to do with cellular differentiation (how do different types of cells, such as kidney, brain and skin cells, emerge?) and morphogenesis (how does intercellular communication result in complex substructures like the brain?).

Differentiation works by using feedforward and feedback loops. Cells can inhibit certain enzymes (disable them by plugging them up) or repress their production. Repression works by making the DNA “unreadable” for RNA polymerase. The cell can also induce production again by disabling the repressors. Through this process, a cell controls which enzymes are produced and thus what function it performs.
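Below is a toy model of induction with a feedback loop, loosely inspired by the well-known lac operon in bacteria. Everything here is a hypothetical simplification: any inducer present disables the repressor, the freed gene produces one enzyme per time step, and that enzyme consumes one unit of inducer, so production eventually switches itself off again.

```python
def simulate(inducer: int, steps: int) -> list[bool]:
    """Return, for each time step, whether the gene was being expressed."""
    history = []
    for _ in range(steps):
        # The repressor blocks RNA polymerase unless some inducer disables it.
        repressor_active = inducer == 0
        expressed = not repressor_active
        enzymes_made = 1 if expressed else 0
        # Feedback: the enzyme consumes the inducer that allowed it to be made.
        inducer = max(0, inducer - enzymes_made)
        history.append(expressed)
    return history


print(simulate(inducer=3, steps=5))  # [True, True, True, False, False]
```

The gene’s product removes the very signal that switched it on, which is the simplest kind of feedback loop a cell can use to regulate itself.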

The distinction between program and data in DNA is somewhat arbitrary, just as TNT and N statements mix. DNA is program, data, interpreter, processor and language fused into one. It contains all the information for constructing proteins (program translated into machine code), and these enzymes then manipulate the DNA itself as data. DNA also contains its own genetic code, by coding for the tRNA used in translation. (This is similar to how languages can be bootstrapped: a language’s compiler can be written in the language itself as soon as the language is powerful enough. If all copies of the compiled compiler were then lost, the language would be “dead”, just as DNA couldn’t replicate without cells.)

Proteins can be seen as machine language (they encode a program that modifies DNA), as processors (they perform the operations, acting as catalysts) and as interpreters (they carry out the program in the DNA). Ribosomes could be interpreters, carrying out the code in the mRNA with the help of the tRNA. You could also say that they are processors, while the tRNA is the interpreter.

So how did this process ever get started in the first place?

The Magnificrab, Indeed

Achilles and the Tortoise meet the Crab on a walk to a teahouse on top of a hill.

The Crab has received a letter from Mr. Najunamar (Ramanujan backwards) in which he asserts a number of fantastical mathematical results, for example a map of India coloured in 1729 distinct colours. (This is a satirical reference to the famous four colour theorem.) The Crab is sceptical of these results and concludes that it’s more likely that Najunamar is a charlatan of extraordinary ingenuity than a great mathematical genius. (The opposite of G. H. Hardy’s conclusion about Ramanujan: that he had to be a genius.)

It also turns out that the Crab can play any sentence of TNT as a musical piece on his flute. As Achilles finds out when he composes pieces of his own, the Crab perceives as beautiful only those pieces that represent truths in TNT.

The final test of the Crab’s abilities presents itself when Achilles “composes” the Goldbach conjecture. The Crab becomes very uncomfortable and refuses to play it. It seems as though his abilities have a limit after all.

Why is the Crab coming down from the teahouse at the beginning, having eaten and being really full, only to lie about it and walk back up with the Tortoise and Achilles?

XVII. Church, Turing, Tarski and Others

One of the main theses of this book is that all modes of thinking emerge from simple formal systems at a low level. Our complex thoughts are the result of neurons firing in the brain, not of some magical soul.

Neither the Crab nor Srinivasa Ramanujan performs any special operations in their brain that would allow them to violate Church’s Thesis, i.e. to tell theorems of TNT from non-theorems, or true statements from false ones.

Because our consciousness is thus the result of purely mechanical (or rather neuronal) computation, it can be replicated, skimmed off and run on other hardware, such as a computer. Hofstadter’s basis for this claim is the Church-Turing thesis. It states that any deterministic method which a sentient being follows in order to sort numbers into two classes within finite time can also be replicated as some FlooP program (general recursive function).
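The difference between a bounded (BlooP-style) decision and an unbounded (FlooP-style) search can be sketched in Python, using the Goldbach conjecture that the Crab refused to play. This is an illustration under assumptions: a `for` loop over a `range` stands in for BlooP’s bounded loops and a `while` loop for FlooP’s unbounded one, and the `limit` parameter exists only so the sketch can be tested.

```python
from math import isqrt
from typing import Optional


def is_prime(n: int) -> bool:
    """Bounded trial division: the loop length is known in advance."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def goldbach_holds(n: int) -> bool:
    """BlooP-style test: is the even number n a sum of two primes?"""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))


def find_goldbach_counterexample(limit: Optional[int] = None) -> Optional[int]:
    """FlooP-style search: with no limit it halts only if a counterexample exists."""
    n = 4
    while limit is None or n <= limit:
        if not goldbach_holds(n):
            return n
        n += 2
    return None


print(goldbach_holds(28))                  # True (5 + 23)
print(find_goldbach_counterexample(1000))  # None
```

Sorting a single even number into “Goldbach” or “not Goldbach” always terminates. Called with no limit, `find_goldbach_counterexample()` either returns a counterexample or runs forever, and no finite amount of waiting tells you which.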

In other words, there is no more powerful kind of computation than FlooP, no magical calculations. Some might be sceptical of that and say that our brains might have special powers. But psychological studies have shown that people who calculate exceptionally fast use the same methods as the rest of us; they just perform the operations faster. (The time taken increases when they are asked to operate on bigger numbers. You could probably estimate the big O of their algorithm by timing them.)

Now you might say that this seems plausible for mathematical formal systems, but that in the real world, things are much more complicated: “Changing a light bulb requires adapting to so many factors, there has to be some extra power in our brain!” But just because we can’t form a one-to-one isomorphism to the “real world” in those more complex situations doesn’t mean that there isn’t some lower-level computation going on that can be mirrored by a computer. (There are probably “symbols” at work that represent some concept that only makes sense in relation to the other symbols it triggers. Higher-level meaning is an optional feature of neural networks; in the brain, our own introspective abilities aren’t universal either, and might have developed due to environmental pressure.)

We can thus state a “microscopic” version of the CT-Thesis: the behaviour of the components of any living being can be simulated on a computer. (How finely we’ll need to copy the processes of the brain to achieve intelligence probably depends on the depth of intelligence we want. Checkers, chess and now, with LLMs, even language output have been mastered by machines, yet we’re not satisfied. With every AI advance, it seems like we rule out what isn’t actually considered intelligence by us humans, instead of defining what is.) The stomach is no more magical than the brain; we can simulate both. Reductionistically, all brain processes are derived from a computable substrate.

Some “soulist” people might say that there are things in a brain that a computer can’t replicate, due to their irrational nature. That is a misunderstanding of levels, however. Even though a computer can’t defy “logic” at the transistor level, it can still produce nonsensical output if it has been programmed to do so. In the same way, the brain’s neurons never violate “logic” either, even when the thoughts they support are irrational.

Thus the final form of the CT-Thesis is that any mental processes can be simulated by a program in a language not more powerful than FlooP. All intelligences are just variations on the same one.

Church’s and Tarski’s Theorem

The Magnificrab’s supposed ability to distinguish theorems from non-theorems is impossible, because of Church’s Theorem. It states that there can never be a decision procedure for theoremhood in TNT.

Suppose theoremhood were representable (not just expressible, as in Gödel’s string). Then we could construct the Epimenides paradox in TNT. Before, we proved TNT was incomplete using $G$, which expresses the notion that $G$ is not a theorem. Because this was only expressed, we could deny $G$ theoremhood and settle for incompleteness.

Now, under our assumption that theoremhood is representable, if $G$ is a theorem, then so is the statement that $G$ is a theorem, which is $\sim G$; TNT would be inconsistent. If $G$ were not a theorem, then by the representability of theoremhood, the formula asserting that it isn’t (which is $G$ itself) would be a theorem: a paradox! There is no escape this time, so we must conclude that our assumption, the representability of theoremhood, was erroneous.
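The diagonal structure of this argument can be mimicked with Python functions. This is only an analogy, not a proof about TNT: a zero-argument function stands in for a formula, “returns True” stands in for “is a theorem”, and `decider` is any hypothetical procedure that claims to predict the result. Whatever it predicts, the contrarian program built from it does the opposite.

```python
from typing import Callable

Program = Callable[[], bool]
Decider = Callable[[Program], bool]


def diagonalize(decider: Decider) -> Program:
    """Build a program that asks the decider about itself and then disagrees."""
    def contrarian() -> bool:
        return not decider(contrarian)
    return contrarian


def optimist(program: Program) -> bool:
    return True   # claims every program returns True


def pessimist(program: Program) -> bool:
    return False  # claims every program returns False


for decider in (optimist, pessimist):
    g = diagonalize(decider)
    print(decider.__name__, decider(g), g())  # prediction and reality differ
```

A smarter decider that tries to find out by actually running the program it is asked about never returns at all, because `contrarian` calls the decider again. Either way, no decider is right about its own contrarian.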

Tarski’s theorem goes a step further and states that number-theoretical truth is not even expressible. To prove this, assume $\text{TRUE}\{a\}$ exists, use it to build the analogue of the formula from which $G$ was made, now called $t$, and then arithmoquine it to get $T$. $T$ then says: “The arithmoquinification of $t$ is the Gödel number of a false statement.” Since the arithmoquinification of $t$ is $T$ itself, $T$ says that $T$ is false. Thus this new formula also reproduces the Epimenides paradox.
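Arithmoquining itself is a purely mechanical operation, and a short sketch makes it concrete. Assumptions: the codon table covers only a handful of TNT symbols, with values as remembered from the book’s Gödel-numbering chapter (e.g. `S0=0` becomes 123,666,111,666); the formula’s only free variable is `a`; and because the real numeral would contain billions of S’s, the sketch writes it in the abbreviated form `S^{n}0`.

```python
import re

# Codons for a few TNT symbols (assumed values, see above).
CODONS = {"0": "666", "S": "123", "=": "111", "+": "112", "a": "262", "'": "163"}


def godel_number(formula: str) -> int:
    """Concatenate the codon of every symbol in the formula."""
    return int("".join(CODONS[symbol] for symbol in formula))


def arithmoquine(formula: str) -> str:
    """Replace the free variable a by the numeral of the formula's own Gödel number."""
    numeral = f"S^{{{godel_number(formula)}}}0"  # abbreviates S...S0
    # a(?!') matches a but not the start of a', a'', ... (different variables)
    return re.sub(r"a(?!')", lambda _match: numeral, formula)


print(godel_number("S0=0"))  # 123666111666
print(arithmoquine("a=S0"))  # S^{262111123666}0=S0
```

The sketch only performs the substitution; it cannot build $t$ or $T$, since $\text{TRUE}\{a\}$ is precisely the formula whose existence leads to the contradiction.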

If such a formula really existed, there would be a statement about natural numbers that is both true and false at the same time. Thus, according to Church and Tarski, theoremhood is not representable and truth is not even expressible.

Hofstadter tries to relate the English version of Epimenides paradox to Tarski’s theorem, saying that the brain trying to compute the sentence is impossible, since it’s only a formal system. This seems quite far-fetched to me. See Tarski’s original statements.

SHRDLU, Toy of Man’s Designing

The new character, Eta Oin, interacts with the SHRDLU program written at MIT. (ETAOIN SHRDLU is the arrangement of keys on a Linotype machine keyboard, used to typeset newspapers. When a mistake was made in a line, typesetters filled up the line with nonsense, usually by pressing the keys in order, which resulted in ETAOIN SHRDLU. Sometimes this text was left in by mistake.) The entire dialogue is a narrated chat between the program and the character.

The SHRDLU program’s source code is available. It’s a program designed to operate in a toy world, where blocks and pyramids of different sizes and colours are stacked on a table. The user can ask the program to stack them or to answer queries about the current state.

XVIII. Artificial Intelligence: Retrospects

The Turing Test is built on the imitation game, which comprises three players: (A) a man, (B) a woman and (C) an interrogator. The interrogator must deduce which of the two players, X and Y, is the man and which the woman. A’s objective is to cause C to make a mistake, while B has to help the interrogator. Turing proposes that a machine able to take the role of A, trying to deceive, serves as a proxy for the question “Can machines think?”

Turing disarms a number of objections to the test, pointing out that the test allows “drawing a sharp line between physical and intellectual capacities of a man”. He expects us to be able to anthropomorphise a machine as “thinking” by the end of the century.

Hofstadter presents Tesler’s Theorem: “AI is whatever hasn’t been done yet.” As more and more tasks, like playing checkers or integrating functions, were solved by machines, our expectations grew.

A discussion on the origin of a program’s output follows: is the programmer or the machine responsible for the output? Samuel’s checkers program beats Samuel himself, the same way a program calculating $\pi$ beats the programmer in speed. And yet the algorithm was invented by the programmer.

Hofstadter proposes the following test. If you can trace the origin of an idea in the program’s output back to the code, the program was only revealing “hidden” ideas from the programmer’s mind. If, however, the ideas can’t be found hidden in the code, one should begin to separate programmer and program. (This is the case for neural networks like LLMs, for example. AI interpretability is standing before a black box.)

If programs one can identify with start to exist, Hofstadter would feel comfortable attributing to them the ownership of their output, for example of composed pieces of music. He wants the program to have internal “symbols”, the same way he thinks brains have them. (Anthropic was able to extract semantic features from their LLM, as detailed in this research paper. By training a sparse autoencoder on the activations of the LLM’s middle layers, they extracted activation patterns corresponding to specific concepts, across languages, like the “Golden Gate Feature”.)

Problem-Space Reduction

A dog trying to get to a bone behind a fence might just stand in front of it, barking, or it might notice the open gate a few meters away and get to the bone. What initially looks like steps in the wrong direction - away from the bone - is actually the only way to get to it.

Thinking in terms of spatial distance is the wrong problem space here; you have to transform the problem into another space, where the way to the gate is the shortest route. This is also true in AI, where the right problem set-up and organisation of knowledge can make things much easier.

An AI trying to generate text will struggle with character-level data, but provide it with words and their grammatical relations, as Hofstadter does, and it can generate some sensible sentences.
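A word-level generator driven by a small recursive grammar might look like the sketch below. The grammar and vocabulary are a hypothetical toy, not the one Hofstadter used; the point is that once the program works with words and their grammatical roles rather than characters, every output is at least grammatical.

```python
import random

# A hypothetical toy grammar: each symbol expands to one of several sequences.
GRAMMAR = {
    "SENTENCE": [["NOUN_PHRASE", "VERB_PHRASE"]],
    "NOUN_PHRASE": [["the", "NOUN"], ["the", "ADJECTIVE", "NOUN"],
                    ["the", "NOUN", "that", "VERB_PHRASE"]],
    "VERB_PHRASE": [["VERB"], ["VERB", "NOUN_PHRASE"]],
    "NOUN": [["tortoise"], ["crab"], ["record"], ["theorem"]],
    "ADJECTIVE": [["stubborn"], ["recursive"], ["fragile"]],
    "VERB": [["plays"], ["destroys"], ["proves"], ["quines"]],
}


def expand(symbol: str, rng: random.Random, depth: int = 0) -> list[str]:
    """Recursively expand a symbol into a list of words."""
    if symbol not in GRAMMAR:
        return [symbol]  # a plain word
    options = GRAMMAR[symbol]
    if depth > 3:
        options = options[:1]  # the first option never recurses, so this stops
    words = []
    for part in rng.choice(options):
        words.extend(expand(part, rng, depth + 1))
    return words


def generate(seed: int) -> str:
    """Generate one capitalised sentence; the same seed gives the same sentence."""
    words = expand("SENTENCE", random.Random(seed))
    return " ".join(words).capitalize() + "."


for seed in range(3):
    print(generate(seed))
```

The recursion in the grammar (a noun phrase can contain a verb phrase, which can contain another noun phrase) is what lets a handful of rules produce an open-ended variety of sentences.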

Not all knowledge can be represented as data. A computer, for example, doesn’t store explicit (declarative) knowledge of addition; it has it as procedural knowledge. (This might be similar to how AI with tool calls is more efficient: it offloads knowledge tasks to procedures.) Some things have to be wired in.
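The difference between declarative and procedural knowledge of addition can be sketched like this (the names are hypothetical). The declarative version stores each fact as data and knows nothing beyond its table; the procedural version stores no facts at all, just a rule that repeatedly applies the successor operation, and so it works for any non-negative numbers.

```python
def add(a: int, b: int) -> int:
    """Procedural knowledge: move one unit from b to a until b is empty (b >= 0)."""
    while b > 0:
        a, b = a + 1, b - 1
    return a


# Declarative knowledge: a table of facts, here covering single digits only.
ADDITION_FACTS = {(a, b): add(a, b) for a in range(10) for b in range(10)}

print(ADDITION_FACTS[(3, 4)])        # 7
print((123, 456) in ADDITION_FACTS)  # False: the table has no such fact
print(add(123, 456))                 # 579
```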

Does it even make sense to ask about originality in a non-legal context? We aren’t the “owner” of the experiences that shaped our mechanical thinking machines (brains) either, so how can we say that it was “our thought”?

Contrafactus

Achilles, the Tortoise, the Crab and the Sloth are watching football. The Sloth and the Tortoise are talking in counterfactuals and are not making much sense.

They are watching the game on the Crab’s new Subjunc-TV, which he won in a lottery. By tweaking the channel, you can see what would have happened under some other conditions, for example had the player not stepped outside the field. Achilles asks to see what the play would have looked like if addition were not commutative, but the Crab says that there isn’t a channel for such big changes.

At the end, it turns out the whole dialogue was itself only a hypothetical: what would have happened had the Crab won that TV.

XIX. Artificial Intelligence: Prospects

Just like the Subjunc-TV, we constantly manufacture such counterfactual scenarios to work out what might have happened. But there is a line between plausible hypotheticals (the player scored the goal, he slipped) and improbable ones that we would consider absurd (the pitch was on the moon, the players were aliens). Hofstadter proposes that our brain comes with frames, which have their expectations: fixed constants, parameters and variables.

Hofstadter goes on to describe by which mechanisms a machine might solve Bongard Problems, how machines could recognise similarities. This is a problem of finding and focussing on the relevant attributes and being able to describe scenes in terms of them. Humans seem to have flattened this sort of pattern recognition (for faces, shapes, paths, trails) into our unconscious.

Based on this analysis of pattern recognition, Hofstadter suggests that symbols are combinations of frames with actors, which can send and interpret messages. These can be fused and separated through learning. (Words like “doughnut” are interesting here, since they don’t share any characteristics of the words making them up, so they are not “joint symbols” at all; in fact, we don’t even activate the “dough” and “nut” symbols when hearing “doughnut”.) The same objects or words can be represented by different symbols and descriptions, depending on the angle of attack and the context.

At the end, Hofstadter presents a number of predictions about the future of AI, some of which came true and some of which turned out completely wrong.

He thought that a program able to beat anyone at chess would have to be a general intelligence, which isn’t the case at all for Stockfish.
At the same time, he said that these general intelligences would not be able to add like a calculator, making errors like we do, which is indeed the case for LLMs.
He said that there would be no way to reach in and tweak knobs to change the behaviour of these emergent systems. This turns out to be possible thanks to neural patterns like the “Golden Gate Feature” in Claude.

How exactly do humans recognise patterns? When looking at a Bongard problem, the relevant features just seem to jump out. I have no explanation for where they come from or how I picked them out.
Could there be Bongard Problems that are easy for a machine but impossible for us due to the way our eyes and recognition work?

Sloth Canon

This dialogue imitates what Hofstadter calls a "Sloth Canon", where a third voice repeats a melody upside down and twice as slowly. Here, the Sloth says the opposite of the Tortoise's lines (encouraging Achilles to play the piano instead of trying to prevent it) and lags a bit behind the Tortoise.

Why is there a discussion on the length of potato fries between Achilles and the Sloth at the end?

XX. Strange Loops Or Tangled Hierarchies

Samuel believes his checkers program is not choosing its moves by itself, but that everything is programmed in. He thinks that any machine with consciousness or a will of its own would fail, because the mechanical instantiation of will would require an infinite regress: a program with truly free will would have had to program itself. (Otherwise, the writer of the program would have pre-determined all its actions.)

Hofstadter takes issue with the assumption that a machine (or a brain) cannot do anything without a rule telling it to do so. It can, because of the hardware level: nothing tells transistors to behave as they do, and nothing tells neurons to behave as they do. It's just physics.

You didn't develop your own brain, and yet you still believe you have free will. In the same way, some day machines might spontaneously develop free will too.

Intuition for Consciousness

In this final chapter, Hofstadter wants to provide an overview of mental images that helped him gain insight into how Tangled Hierarchies and Strange Loops form consciousness.

Imagine a chess game where on your turn, you're allowed to change one rule. But you can't just change any rule, there have to be rules for how to change the rules - now there are two levels of rules. If you wanted to change the meta-rules (telling you how to change the rules), you'd need meta-meta-rules telling you how to change those. To do this, you might need a formal system - or you could use a chess board that - interpreted with certain conventions that both players agreed on - encoded the rules. If you wanted to change the rules of how to move on this "rule"-board, you'd have to add another board that specifies the rules for that, etc... This infinite regress continues, as the top-level of rules is always inviolate.

To carry this even further, you could map this whole stack of boards onto one single board. It then represents both the pieces to be moved and the rules by which they are allowed to move. (This is similar to a Typogenetics-like formal system.) But you haven't gotten rid of the top-level rules yet: there are still the rules governing how a chess board is interpreted as a rule set. Even though you've tangled all the other layers together, there's still this inviolate level that governs it all, and you can never get rid of it.

The next example is the authorship triangle again. T writes a novel in which Z writes a novel in which E writes a novel about T. This seems impossible, but it could happen - in a novel by author H. Even though T can do things to Z, Z to E and E to T in their novels, none of them can touch H. There's always a higher level that can't be modified from below.

The same thing happens in Escher's Drawing Hands. Even though the left and right hands draw each other, the unseen hand of M.C. Escher himself drew them both.

A painter painting a palette of colours creates the illusion of it existing inside the painting. And yet the colour splotches on the palette are literal splotches from the artist's palette. The paint paints itself.

Our brain's symbolic tangle is supported by the hardware tangle of the interconnected neurons themselves. But we can't interact with the neurons or feel them, and so we fall for the illusion that there is no inviolate level in our brain: we feel self-programmed.

Our brains may simply be too weak to understand themselves, just as an animal cannot understand self-consciousness. There is no reason to expect that we should be able to comprehend our own brains, even though they may be perfectly clear to more intelligent beings.

In the same way that someone working in M-Mode inside TNT could never derive $G$ as a theorem, even though $G$ is arithmetically true, we might never be able to explain the brain because we lack access to the necessary higher-level concepts.

The emergence of consciousness and free will is based on a Strange Loop that creates some resonance. In principle, the brain can be explained on a totally reductionist level, but such an explanation would be incomprehensible to us. We can only understand it on the higher level of symbols, by accepting causality there without any underlying explanation.

Free Will in a Deterministic World

So how do we reconcile the existence of free will with the existence of this inviolate hardware level? The path of a marble rolling down a hill can't be predicted in practice, but there's certainly no will in the marble; it's just complicated interactions. A robot in a maze that chooses which way to go based on the parity of the digits of $\pi$ works in a similarly deterministic way, and we wouldn't attribute any free will to it either.
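The $\pi$-robot can be made concrete with a short Python sketch. The mapping of even digits to "left" and odd digits to "right" is an assumption, and the digits come from Gibbons' unbounded spigot algorithm, a standard way to stream decimal digits of $\pi$ that the book does not mention. The point is that every turn is fixed in advance: running it again yields exactly the same path.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) using Gibbons' spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain, so emit it.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough precision yet: fold in another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def maze_choices(junctions: int) -> list[str]:
    """Hypothetical rule: turn left on an even digit of pi, right on an odd one."""
    return ["left" if d % 2 == 0 else "right" for d in islice(pi_digits(), junctions)]


print(maze_choices(8))
# ['right', 'right', 'left', 'right', 'right', 'right', 'left', 'left']
```

The robot's route looks erratic, yet it is as fully determined as the digits of $\pi$ themselves.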

But if you allow the robot's "self-symbol" to interact with the decisions it makes, we humans would be compelled to say it has free will. Hofstadter thinks that we would be able to identify with it. Even though, on a very local level, the program would look similar to the one making decisions based on $\pi$, on a higher level it would look more like our own thinking.

A sophisticated chess program that learns from playing might never make the same move twice - does it have free will, then? If we emptied its cache and re-seeded its random number generator with the same seed, it would behave in exactly the same way again. But is there any reason to suspect that we would behave differently if we relived exactly the same situation?
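The replay thought experiment can be sketched in Python. `LearningPlayer` is a hypothetical toy, not a real chess engine: it prefers moves it has not played before, and its only sources of variety are its memory (the "cache") and a seeded `random.Random`. Resetting both to the same starting state reproduces the same choices exactly.

```python
import random


class LearningPlayer:
    """A toy stand-in for a learning chess program (hypothetical)."""

    def __init__(self, seed: int):
        self.reset(seed)

    def reset(self, seed: int) -> None:
        self.memory: set[str] = set()       # the "cache" of past experience
        self.rng = random.Random(seed)      # the RNG, re-seeded on every reset

    def choose(self, legal_moves: list[str]) -> str:
        # Prefer moves not played before; fall back to any legal move.
        fresh = [m for m in legal_moves if m not in self.memory]
        move = self.rng.choice(fresh or legal_moves)
        self.memory.add(move)
        return move


moves = ["e4", "d4", "c4", "Nf3", "g3"]
player = LearningPlayer(seed=42)
first_life = [player.choose(moves) for _ in range(5)]
player.reset(seed=42)   # empty the cache and re-seed the RNG
second_life = [player.choose(moves) for _ in range(5)]
print(first_life == second_life)  # True
```

Within one "life" the player never repeats a move, which can look like spontaneity; across identical lives, it repeats itself perfectly.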

The balance between self-ignorance and self-knowledge - we have some intuitive understanding of the symbols in our brain, yet we can't inspect the neural level - is what makes free will emerge. We feel in control because we can't monitor the entire process.

Six-Part Ricercar

The book ends with this final dialogue, mirroring Bach's Six-Part Ricercar. The dialogue has six voices: Achilles, the Crab, the Tortoise, the Author, Charles Babbage and Turing. Each of them enters with the line "I can get along very well without such a program."

Babbage (introduced as Ba. Ch., i.e. Babbage, Charles, the celebrated improviser) performs the role of Bach, improvising some programs on the Crab's "smart-stupids" (computers; the name mirrors the construction of the word "piano-forte", the instrument on which Bach improvised for the King). He improvises on themes given by the Crab (a chess computer, and creating an intelligence, which is Turing), and at the end the Crab spells out the King's Theme, on which Babbage wants to create a piece.

The Author, who appears as one of the characters, discusses this final dialogue with the other characters in it. Achilles is especially distraught that he is but a figment of the imagination inside the Author's brain. And yet he believes that he has free will, just as people in dreams behave independently, as if they had a free will of their own.

The Crab recounts reading the final chapter of Giraffes, Elephants, Baboons: an Equatorial Grasslands Bestiary, in which the author quotes Minsky without attribution. He adds that, somewhere before that chapter, the author had hinted that he would do so at the very end. It turns out that the Crab's remark is itself that hint, as Hofstadter quotes Minsky without attribution in the dialogue.

How does the final sentence make the Ricercar repeat endlessly, as it seems to imply? I see no link between it and the first sentences of the dialogue.

Interesting Books from the Bibliography

Alan Ross Anderson, Minds and Machines, 1964. Contains the full text of J. R. Lucas’ Minds, Machines and Gödel.
Marvin Minsky, Matter, Mind and Models. A short but seminal paper.
