Thursday, April 18, 2019

Let's Watch: 10-on-10: The Chronicles of Evolution - Eörs Szathmáry

This post is supposed to be a prequel to my upcoming post on the major transitions in the evolution of life on Earth. Eörs Szathmáry is a good lecturer, and the lecture gives a very clean overview of the origin of self-replication, of evolvability, and of the relation between evolution and learning theory. I am reading about this because I think AI might cause another major transition in evolution, in a similar way.

This post gives annotations, references, and comments to read along with the lecture.

The "10-on-10" comes from Sydney Brenner's 10-on-10, of which this lecture is a part:
These seminars would begin $10^{10}$ years ago with the origin of the universe, then move on to the beginnings of biological life $10^9$ years ago, and so on, up to the development of modern human society in the present time, or $10^1$ years.


Autocatalysis (self-replication of chemicals)

The Formose cycle

The formose reaction, discovered by Aleksandr Butlerov in 1861, is of the form
$2\,\mathrm{H_2CO} \xrightarrow{\mathrm{HOCH_2CHO}} \mathrm{HOCH_2CHO}$
It's basically glycolaldehyde self-replicating by eating formaldehyde: glycolaldehyde starts at the bottom left, goes around the cycle consuming two formaldehyde molecules, and spits out a copy of itself at the bottom edge.

More abstractly, the formose cycle is an autocatalytic loop: one molecule of the catalyst goes around the cycle, picking up food molecules along the way, and comes out as two molecules of the catalyst.

There are more complicated versions of this cycle, in which the glycolaldehyde doesn't self-replicate, but instead catalyzes the formation of more complicated products.
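The qualitative behavior of such a cycle is easy to see in a toy simulation (all rate constants and amounts below are invented for illustration): while the formaldehyde food lasts, the glycolaldehyde count grows exponentially, because every copy produced joins in the catalysis; growth then stalls when the food runs out.

```python
# Toy simulation of an autocatalytic cycle: glycolaldehyde (G) consumes
# two formaldehydes (F) per turn of the cycle and emits a copy of itself.
# Rate constant and initial amounts are arbitrary illustration values.

def simulate(G=1.0, F=1000.0, k=1e-3, dt=0.01, steps=2000):
    history = []
    for _ in range(steps):
        dG = k * G * F * dt        # autocatalytic: rate grows with G itself
        G += dG
        F = max(F - 2 * dG, 0.0)   # each new G costs two formaldehydes
        history.append(G)
    return history

hist = simulate()
# Early on, growth is exponential (each copy makes more copies);
# it then saturates as the formaldehyde food supply runs out.
```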

Ribonucleotides from simpler molecules


Fig. 1 of Systems chemistry on early Earth (Jack W. Szostak, 2009)

This reaction was first reported in Synthesis of activated pyrimidine ribonucleotides in prebiotically plausible conditions (Matthew W. Powner, Béatrice Gerland & John D. Sutherland, 2009).
The origin of life on Earth required... the synthesis of a genetic polymer [DNA, RNA] from simple chemicals. The leading candidate for this role is RNA, but although 'activated' ribonucleotide molecules (the building blocks of RNA) can polymerize without enzymes, no plausible route had been found by which the ribonucleotides could have formed. Now a team from the University of Manchester has found such a route.
That is, they have found one possible step in the origin of RNA:
$\text{2-aminooxazole}\xrightarrow{\text{Sutherland's route}}\text{activated ribonucleotides}\xrightarrow{\text{polymerization}}\text{RNA}$

Think of this as answering a bootstrapping, chicken-and-egg problem: once the DNA-RNA machines are running, they construct more copies of themselves. But how do you build up the DNA-RNA machines from simpler molecules in the first place?

The reason we don't see life using this 2-aminooxazole reaction nowadays is that it's really inefficient: it was the best available back when the Earth was devoid of DNA-RNA machines, but once those machines were up and running, this method became obsolete.

Theories of the origin of DNA-RNA life are very speculative, because ancient chemical cycles don't leave fossil records. Perhaps they do in some indirect way (an overabundance of certain organic molecules in ancient rocks?), but no such record is known for now.


Eigen's Paradox

Eigen's paradox, first put forth by Manfred Eigen in his 1971 paper Selforganization of matter and the evolution of biological macromolecules, says:

  1. Without error-correction enzymes, the maximum size of a replicating molecule is ~100 base pairs (bp).
  2. For a replicating molecule to encode error-correction enzymes, it must be >> 100 bp.
  3. So it seems impossible for replicating molecules to grow gradually from ~100 bp to >> 100 bp. There's a gulf of chaos between the two islands of stability.
Consider evolution as a learning process: the information encoded in the replicators is the knowledge learned, mutation makes up new hypotheses (or forgets old knowledge), and selection picks out which hypotheses to keep. This is not mere analogy, and can be formalized with math. See How Can Evolution Learn? (Richard A. Watson, Eörs Szathmáry, 2016).

Restated in learning terms, Eigen's paradox is:
  1. Without error correction, molecular replicators can maintain only about 200 bits of information.
  2. To encode error-correction enzymes, >> 200 bits of information are needed.
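Behind point 1 is Eigen's error-threshold formula: a replicator of length $L$ with per-base copying fidelity $q$ and selective superiority $\sigma$ can be maintained against mutation only if roughly $L \lesssim \ln\sigma/(1-q)$. A back-of-the-envelope check, where the values of $\sigma$ and the error rates are illustrative assumptions:

```python
import math

def max_length(sigma, q):
    """Eigen error threshold: longest genome a replicator with selective
    superiority `sigma` and per-base copying fidelity `q` can maintain
    against mutation, L_max ~ ln(sigma) / (1 - q)."""
    return math.log(sigma) / (1.0 - q)

# Illustrative, assumed numbers:
print(max_length(3, 0.99))      # non-enzymatic copying, ~1% error/base -> ~110 bases
print(max_length(3, 0.999999))  # enzymatic proofreading, ~1e-6 error/base -> ~10^6 bases
```

With non-enzymatic error rates the bound lands on the order of 100 bases, which is where the "~100 bp" figure comes from; proofreading enzymes push it up by orders of magnitude.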

There have been a few proposed solutions. Eigen's own solution was the hypercycle.
Instead of each chemical cycle running by itself, consider a collection of chemical near-cycles that come together and help run each other: $I_1$ produces substance A, which $I_2$ relies on, and so on around the loop. A worry remains: the whole hypercycle would still contain some 500 bp of information, so shouldn't it still suffer from too much error? No: each member stays below the ~100 bp limit and replicates on its own, while the coupling stabilizes their coexistence, so the system as a whole maintains the sum of their information. Hypercycles can be modeled by differential equations, and this can be shown mathematically.


Hypercycle?
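A minimal sketch of those differential equations, assuming the standard form $\dot{x}_i = x_i(k_i x_{i-1} - \phi)$ on concentrations normalized to sum to 1 (rate constants invented):

```python
# Toy hypercycle dynamics: n replicators on the simplex, each catalyzed by
# its predecessor: dx_i/dt = x_i * (k_i * x_{i-1} - phi), where the mean
# flux phi keeps the total concentration at 1.

def hypercycle_step(x, k, dt=0.01):
    n = len(x)
    growth = [k[i] * x[i] * x[i - 1] for i in range(n)]  # i-1 wraps around
    phi = sum(growth)                                    # mean flux
    return [xi + dt * (g - xi * phi) for xi, g in zip(x, growth)]

x = [0.7, 0.15, 0.15]          # start far from balance
k = [1.0, 1.0, 1.0]
for _ in range(20000):
    x = hypercycle_step(x, k)
# With equal rate constants the members settle toward equal shares,
# so no single member crowds out the others.
```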

But there's the problem of parasites: a parasitic cycle could suck material from the hypercycle without giving anything back, and kill the hypercycle.

To be stable against parasites, make sure the parasites can't get access to too many altruists. To do that, imagine wrapping a dozen of these replicators into little bags (like protocells!), letting them grow, then opening the bags, mixing the contents, and doing it all over again. Bags with many altruists grow fast, while bags with many parasites can barely grow. The parasites end up inhibiting their own growth, and can't dominate.

Again, this stability can be mathematically and experimentally shown, as in Transient compartmentalization of RNA replicators prevents extinction due to parasites (Matsumura et al, 2016).
They analyzed viral replication in a droplet-based microfluidic system, which revealed that as long as there is selection for a functional replicator, the population is not overwhelmed by the faster-replicating parasite genomes.
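A toy simulation of the bag trick (all parameters invented) shows the effect: with transient compartments, altruists persist; in one well-mixed pot, the faster-replicating parasites steadily win.

```python
import random

def grow(bag, rounds=5, parasite_advantage=1.5):
    # Within one compartment: replication needs altruist catalysts, so total
    # growth scales with the altruist fraction, while parasites replicate
    # `parasite_advantage` times faster. All rates are invented.
    a, p = bag
    for _ in range(rounds):
        total = a + p
        alt_frac = a / total if total else 0.0
        a += a * alt_frac
        p += p * alt_frac * parasite_advantage
    return a, p

def generation(pool_a, pool_p, n_bags=200, bag_size=4):
    # Transient compartmentalization: randomly pack molecules from the
    # common pool into small bags, grow each bag, then pool everything.
    frac_a = pool_a / (pool_a + pool_p)
    new_a = new_p = 0.0
    for _ in range(n_bags):
        a = sum(random.random() < frac_a for _ in range(bag_size))
        ga, gp = grow((a, bag_size - a))
        new_a += ga
        new_p += gp
    return new_a, new_p

random.seed(0)
a, p = 500.0, 500.0
for _ in range(30):
    a, p = generation(a, p)
bagged_frac = a / (a + p)       # altruists persist

# Contrast: one big well-mixed pot, same number of replication rounds.
a, p = 500.0, 500.0
for _ in range(30):
    a, p = grow((a, p), rounds=1)
mixed_frac = a / (a + p)        # parasites steadily take over
```

The parasite-rich bags starve themselves, so pooling and re-bagging keeps selecting for altruist-rich compartments.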
Another way around Eigen's paradox is the stochastic corrector model, from Group selection of early replicators and the origin of life (Eörs Szathmáry, László Demeter, 1987).

In this model, we have many kinds of ~100 bp replicators, each replicating on its own. There is no hypercycle, and thus no parasitism, and hundreds of bp of information can be maintained across the different replicator types. There's a problem, though: the problem of monopoly. How do you prevent the fastest replicator from crowding everyone else out?

The idea is this. Imagine two replicators, green and red. Green replicates faster than red, but we want the system to keep a 1:1 ratio of green to red. To do that, pack them into protocells and let these grow and divide, as shown. Then constantly select for protocells with ratios near 1:1 (perhaps because such protocells were the most "balanced" in the soup). This keeps the ratio stable.
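Here's a toy version of this model (all parameters invented): protocells grow, split their molecules randomly between two daughters, and the soup keeps only the most balanced daughters.

```python
import random

def stochastic_corrector(generations=100, n_cells=100, cell_size=16):
    """Toy stochastic-corrector simulation, assumed parameters throughout.
    Each protocell holds green (g) and red (r) replicators; green copies
    twice as fast. Cells grow to cell_size molecules, split randomly into
    two daughters, and the most balanced daughters seed the next round."""
    random.seed(1)
    cells = [(cell_size // 4, cell_size // 4)] * n_cells
    for _ in range(generations):
        daughters = []
        for g, r in cells:
            while g + r < cell_size:            # growth: green is 2x faster
                if random.random() < 2 * g / (2 * g + r):
                    g += 1
                else:
                    r += 1
            g1 = sum(random.random() < 0.5 for _ in range(g))
            r1 = sum(random.random() < 0.5 for _ in range(r))
            for child in ((g1, r1), (g - g1, r - r1)):
                if child[0] + child[1] > 0:     # discard empty daughters
                    daughters.append(child)
        # group selection: keep the n_cells daughters closest to 1:1
        daughters.sort(key=lambda c: abs(c[0] - c[1]))
        cells = daughters[:n_cells]
    return cells

cells = stochastic_corrector()
greens = sum(g for g, _ in cells)
reds = sum(r for _, r in cells)
# Despite green's speed advantage, selection on balanced protocells
# keeps red from being crowded out.
```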

Evolvability

Evolvability: how well suited a system is for evolution.
Beneficial mutations are always rare, but if they are too rare, then adaptation cannot occur. Early failed efforts to evolve computer programs by random mutation and selection showed that evolvability is not a given, but depends on the representation of the program as a data structure, because this determines how changes in the program map to changes in its behavior. Analogously, the evolvability of organisms depends on their genotype–phenotype map. This means that genomes are structured in ways that make beneficial changes more likely. This has been taken as evidence that evolution has created not just fitter organisms, but populations of organisms that are better able to evolve.
The key is the genotype-phenotype map determined by the structure of the genome. If the map is bad, then changing the genetic code is very unlikely to produce a good mutation in the phenotype. As evolution proceeded, the genome evolved into a structure better suited for further evolution.

An analogy can be drawn with the no-free-lunch theorem (David Wolpert, William Macready, 1997). The structure of the genome encodes a learning template, and the template works well only on certain kinds of environments, not others. This template can itself be evolved, to be better optimized for whatever environment the replicators happen to live in.


In this view, the no free lunch theorem essentially says (after extreme handwaving):
In any particular environment, some genome structures allow faster evolution.

Modularity

Consider the beaks of Darwin's finches.
Fig. 1 of Wrinkly Spreaders, Wrinkleality and Fitness (Kimberley Mclaughlin, 2016)
How could they evolve so quickly? Modularity.

It turns out that many birds' beaks are made by piecing together conic curves. As such, it takes only a few numbers to specify the shape of a beak, and smoothly varying these numbers gives smooth variations in beak shape, and thus smooth variations in fitness. This makes beak shape evolvable.
Fig 3 of Shared developmental programme strongly constrains beak shape diversity in songbirds (Joerg A. Fritz et al, 2016)
It shows how the variations in the beaks of closely related species are generated by scaling and shearing, and how the upper arcs of the beaks are all parabolic arcs.
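To see why such a parameterization is so evolvable, here is a hypothetical two-parameter "beak": a single parabolic template varied only by scaling and shearing, so a small change of parameters is a small, smooth change of the whole outline. The shapes and parameters are invented for illustration.

```python
# A single parabolic arc as a template upper-beak profile; different
# "species" are derived from it by just two numbers, a scale and a shear.

def beak_profile(scale_x, shear, n=50):
    """Upper beak outline: the parabola y = 1 - x^2 on [0, 1],
    scaled along x and sheared."""
    points = []
    for i in range(n + 1):
        x = i / n
        y = 1.0 - x * x                      # template parabolic arc
        points.append((scale_x * x + shear * y, y))
    return points

# Two hypothetical "species": same template, different parameters.
species_a = beak_profile(scale_x=1.0, shear=0.0)
species_b = beak_profile(scale_x=1.6, shear=0.3)
# Smoothly varying (scale_x, shear) smoothly varies the whole outline:
# a two-number genotype with a well-behaved genotype->phenotype map.
```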

Computer Simulation

In Facilitated Variation: How Evolution Learns from Past Environments To Generalize to New Environments (Merav Parter et al, 2008), computer simulations demonstrated the evolution of evolvability. One example from the paper evolves RNA sequences toward a particular target shape. Afterwards, the authors selected for a different shape, and found that this second round of evolution proceeded much faster.

It turns out that a few locations on the RNA sequences evolved to become "genetic triggers": changing those few locations causes precise, large changes in the shape of the RNA.

The phrase "facilitated variation" is proposed in The theory of facilitated variation (John Gerhart,Marc Kirschner, 2007). It basically means what we already said: the structure of the genome is optimized for evolution, that is, it facilitates the generation of beneficial variations.

From an AI point of view, this is similar to transfer learning. The genome not only learns to solve a particular problem (how to survive in one environment), but also acquires a structure optimized for solving a whole family of problems (how to survive in general Earth-surface environments); this knowledge transfers, and allows quicker evolution when the problem changes.

Evolution and Learning Theory

Compare these two quotes:

The Design of Experiments, page 8 (Ronald A. Fisher, 1971, 10th edition):
Inductive inference is the only process known to us by which essentially new knowledge comes into the world... Men have always been capable of some mental processes of the kind we call 'learning by experience'. Doubtless this experience was often a very imperfect basis, and the reasoning processes used in interpreting it were very insecure; but there must have been in these processes a sort of embryology of knowledge, by which new knowledge was gradually produced.

The Blind Watchmaker, page 317 (Richard Dawkins, 1987):
The theory of evolution by cumulative natural selection is the only theory we know of that is in principle capable of explaining the existence of organized complexity. Even if the evidence did not favour it, it would still be the best theory available! In fact the evidence does favour it. But that is another story.
They suggest a formal analogy between inductive inference/learning and evolution.

Bayesian learning

The Replicator Equation as an Inference Dynamic (Marc Harper, 2010):
The replicator equation is interpreted as a continuous inference equation and a formal similarity between the discrete replicator equation and Bayesian inference is described. Further connections between inference and the replicator equation are given including a discussion of information divergences, evolutionary stability, and exponential families as solutions for the replicator dynamic, using Fisher information and information geometry.
The details are laid out clearly on pages 2 and 3, so I won't repeat them here.

(The paper goes on to study the continuous version of the replicator equation and its connection to information geometry, at which point I got lost. Perhaps next time...)
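The core identity is short enough to state anyway: the discrete replicator update is literally Bayes' rule, with population frequencies playing the role of the prior and fitnesses the role of likelihoods. A numerical sanity check (numbers made up):

```python
# Discrete replicator update vs. Bayes' rule -- the same formula:
#   replicator:  x_i' = x_i * f_i / sum_j x_j f_j     (fitness f_i)
#   Bayes:       P(h_i|D) = P(h_i) P(D|h_i) / sum_j P(h_j) P(D|h_j)

def replicator_step(x, f):
    avg_fitness = sum(xi * fi for xi, fi in zip(x, f))
    return [xi * fi / avg_fitness for xi, fi in zip(x, f)]

def bayes_update(prior, likelihood):
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return [p * l / evidence for p, l in zip(prior, likelihood)]

x = [0.5, 0.3, 0.2]        # frequencies / prior
f = [1.0, 2.0, 4.0]        # fitnesses / likelihoods
assert replicator_step(x, f) == bayes_update(x, f)
```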
