Wednesday, November 23, 2016

Organ-izing against biological chaos: an advantage of multicellularity

The complex nature of most biological functions is curious.  Why would a cell ever pal up with other cells rather than slog through life alone?  Why did we big, clumsy, slow-reproducing organisms evolve in the first place?  Being stuck with other cells means (1) being larger but having lower mobility when it comes to choosing how to go about your business, (2) having to get along with the other cells, which could be a drag on your own survival, and (3) being restricted in what you do, having less flexibility.  That these issues can in fact be detriments can be seen in the likelihood that bacteria will dance on all our multicellular graves.

We usually think of bacteria as being out there on their own.  But they, like other single-celled species, can aggregate and act as a single organism under some circumstances (bacterial biofilms, slime molds, sponges and others are examples).  Interestingly, the cells that make up the aggregate body are not necessarily those that are shed to form another organism, the way sperm or eggs are in mammals: even in a sponge, there can be separate 'body' and 'germ' cells.  The body cells reproduce within the body but, like worker bees, are evolutionarily subordinate to the queens--the few reproductively active cells.  Presumably aggregation can confer at least a collective advantage even if individual cells go their own way most of the time, and as far as I know nothing other than chance determines which of a sponge's founding cell lines will end up as the reproducing cells.


Dictyostelium: Wikiwand


Cells in 'true' multicellular organisms, like humans, don't have any option of going it alone.  We begin life as one cell, and develop into a differentiated organism with many types of specialized cells (organs in animals, roots and leaves in plants).  Most of these don't reproduce, but the cells that do reproduce are genetically very closely related, so other cells aren't total evolutionary dead-ends.  Even a super-organism like a bee or ant colony has only a subset of organisms that directly reproduce, creating representative descendants of the whole group.

Not only do we have specialized organs, but they are typically composed of a great number of cells of different types.  Bacterial species can specialize in many different ways, but the cells in multicellular organisms generally specialize in only one: they are intestinal lining producers, or muscle cells, or cells of the neocortex.  That's a kind of cooperation within an organ, analogous to the cooperation among organs that makes the organism.

But there is a danger.  Each cell division introduces mutations that will be carried by the cell and its daughter cells for the future life of the organism.  The organ in which a mutation occurs is stuck for life with the mutant cells.  The mutations are usually silent, or individually minor in their effects on the cell's behavior, but with millions or billions of cells in an organ that has to work for a long time, at least some mutations may well have an effect, and some of those will be harmful.
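
To put rough numbers on this, here is a minimal back-of-envelope sketch.  Every value in it is an illustrative assumption, not a measurement, but the arithmetic shows why a large, long-working organ can hardly avoid accumulating misbehaving cells:

```python
# Back-of-envelope: expected somatic mutations in an organ.
# Every value here is an illustrative assumption, not a measurement.

cells_in_organ = 1e9           # assumed cell count for a large organ
mutations_per_division = 3.0   # assumed new mutations per genome per division
divisions_per_cell = 30        # assumed divisions from zygote to a typical cell
fraction_functional = 1e-4     # assumed fraction of mutations that alter behavior

mutations_per_cell = mutations_per_division * divisions_per_cell
affected_cells = cells_in_organ * mutations_per_cell * fraction_functional

print(f"each cell carries ~{mutations_per_cell:.0f} somatic mutations")
print(f"~{affected_cells:,.0f} cells carry one that alters their behavior")
```

Even with these conservative guesses, millions of cells in the organ end up carrying some behavior-altering change.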

A combination of such somatic mutations (SoMu) occurring over time may lead to a single cell lineage within the organ that no longer behaves properly; in particular, if it divides without the restraint typical of cells in that organ's tissue context, the changes can overwhelm the organ, threatening not only the organ but the whole individual.  Cancer is the classic example in animals.

Given this, then why would organisms with mandatory multicellularity ever have evolved?  Why not get together only when needed, as do the bacteria and slime molds of the world?

Safety in Numbers:  Protection from mutational danger.
The cells in an organism do share a common genome, the one in the founding cell of the organism (fertilized egg, or seed).  So an organism of varying specialized cells is a gang of likes, a differentiated, cooperating society of cellular kin, which by aggregating can perhaps advance the cause of the group, its particular genotype, in a kind of Size Matters way: together the cells can do things, like exploiting resources just as bees and ants do, that an individual cell couldn't.  Specialization and size do make a difference.  But the cost is the risk of rogue members in the cellular society, whose SoMu numbers and sub-lineages increase with body size and age.  When one organ fails, the whole organism fails.

One aspect of the protection of multicellularity is that SoMus have effects ranging from none to organ failure and death.  Even if one cell lineage doesn't work efficiently, the organ itself is made of many other properly acting cells; and even if an SoMu kills the cell, this may have no effect on the organ or the individual, with their countless normally behaving cells.  A herd can withstand the bad behavior of a few of its members.

The risks that being a multicellular organism entails are offset by the average behavior of the aggregate of cells, and it usually takes time before any rogue sub-lineage becomes life-threatening to the organ or its organism, as for example cancer is.  Meanwhile, the organism can go about its business and take advantage of being a big, cooperative collective of organ functions, doing many things--traveling, browsing, hunting, mating, and reproducing--in ways a single-celled organism can't.

Mutations in parents, those arising in their genome and transmitted to their offspring, will either be selected against during development or will force the offspring to compete with its fellow organisms in the usual Darwinian way (see our series on the many other forms adaptation can take).

Single-celled species, or those that spend most of their time as independents, are clearly doing very well and have done so for nearly the entire history of life (fossils of bacterial biofilms around 4 billion years old have been found).  So multicellularity was never an overwhelming advantage, even if it opened different ways of life for some--and these relative exceptions are the most visible species.

The safety-in-numbers aspect of multicellular organisms seems to be a good way for being big to succeed even in the ever-present face of mutations, most of which, when they have any effect at all, are harmful.  Safety in numbers may have allowed multicellular organisms to evolve in the first place.

Monday, October 31, 2016

This is the forest primeval: each tree an evolution

This is the forest primeval. The murmuring pines and the hemlocks,
Bearded with moss, and in garments green, indistinct in the twilight,
Stand like Druids of eld, with voices sad and prophetic,
Stand like harpers hoar, with beards that rest on their bosoms.
Loud from its rocky caverns, the deep-voiced neighboring ocean
Speaks, and in accents disconsolate answers the wail of the forest.


These famous lines, from Longfellow's 1847 epic poem Evangeline, tell a sad human tale from the days of early European settlement in the New World.  The story was about people, but there is much to tell about the Druids of old themselves: the lives and evolution of trees can be quite surprising.

This post was motivated by a recent trade book, The Hidden Life of Trees, by Peter Wohlleben, a German forester, who describes what his life in the woods has taught him about trees, their nature, evolution, and biology.  It's written at a pop-sci level, and is often quite subjective and evocative, but it's laden with important facts when it comes to trying to understand the evolution of these terrestrial beasts.  And, in a sense, these facts generalize in many ways.



The author discusses all sorts of observations that have been made about the responses of different parts of trees (bark, vessels, wood, leaves, roots) to their environment (sunlight, presence of trees of their own species, or of other species, of insect, fungal and other parasites), even going so far as to describe the sociology of trees and their responses to being isolated vs being in a forest of their friends and relatives. Trees interact with their own detected relatives, connected via communication through the air and underground via fungal networks, to the point that they even assist each other, when in trouble, with nutrients. It is a remarkable picture of interactions between organisms in organized, positively coordinated ecosystems.

The book is very selectionist, in that every trait is described as an adaptation to this or that condition, but trees that seem very similar can be different in these respects, so there is the assumption (very hard to prove, if even possible) that each trait evolved 'for' its current function. This is a more deterministically selectionist or even determinist viewpoint than we think is justified by actual fact, even if the functional aspects are as described (which we have no reason to doubt). Indeed, many examples are given of ways trees respond differently to different environments, and hence are not rigidly programmed to live in one particular way.

In any case, our point here, aside from recommending an interesting and informative book, is to muse over some aspects of trees, their lives, and how they manage to survive and evolve that we think are rather widely missed.

While the author is a very strong selectionist when it comes to explaining who does what among trees or among woodsy species, I think he--and for all I know the vast majority of botanists--overlooks what is likely a very major aspect of arboreal evolution.

One major problem that seems to need to be more widely considered (maybe it is by botanists, but we haven't seen much that refers to this particular issue) relates to the implications of time scales (a matter that Wohlleben discusses in detail). Trees can live for decades, centuries, or even millennia.
Wohlleben very clearly and repeatedly stresses that trees live on such a different time scale from ours that it can be hard for us to fathom how their lives evolve--and evolve is the appropriate word.  If trees are, so to speak, rooted in their origins for hundreds or even thousands of years, while insects, fungi, and other plants and animals (not to mention microbes) have generations measured in years or even minutes, how can trees ever adapt or survive?  By the time a tree has reached a venerable age, hasn't it been out-evolved by almost every other species that lives in or is blown into its neighborhood?  By the time it dies, when any of its seeds germinate they must already be obsolete, ready to fight the last war--or the last war minus 10 or 100 or 1000.

One answer, in my view, is the largely overlooked fact of the evolution of tree--of each individual tree--during its lifetime.

The evolution of tree (not trees)
Unless my feeble knowledge of botany totally fails me, there is a lot going on, even at the normal pace of things, within an individual tree.  That is, each tree is a remarkable micro-example of evolution in itself.

Each tree starts life as a single fertilized egg cell, packaged in its seed.  During its life, that little cell divides into billions, probably trillions, of descendant cells.  These make up its roots and, important for us, its trunk, branches, leaves, and flowers.  While there are various forms of communication among these cells, they are essentially independent.

At each cell division along the way from the root tip to the branch tip (or 'meristem'), mutations will occur.  This happens in humans, too; such mutations are called somatic because they don't occur in the individual's germ line (that is, the cell lineage that leads to sperm or egg), and hence, while the mutation carried by the original cell and its descendants may affect the local tissue, the change isn't inherited by the next generation.  Only mutations in the germ line are, and indeed that's where the idea of 'mutation' historically arose.  Most somatic mutations will have no effect on the gene usage of the cell involved, but if one does, the effect may be negative and the cell will die, or it will just misbehave in a way that has no consequences because it's surrounded by countless healthy cells.  Sometimes, as with cancer, somatic mutations can be devastating.

Trees are different.  They have no separate somatic and germ lines.  Mutations occurring from the seed onward, in the roots and limbs, may lead to dead cells, or do nothing, or they may be screened for their 'fitness', their ability to generate the bark, vascular, leaf, or other tissues needed in their local time and place.  Harmful ones are, relative to other cells in the tree, removed by what we could call a version of natural selection.  Those mutations that survive will be passed down the line or, rather, up the line as the trunk, branches, and leaves grow.

Here is a photo of an oak tree and (metaphorically) its single starting genome:



At the ends of the countless stems in a tree, over its long lifetime, would be meristem cells, each carrying its own unique set of mutational differences from what was in the founding acorn.  At the meristem, in the appropriate time of year, cells differentiate into pollen and ovule cells.  These are many cell generations of selection away from their founding acorn, and on a given tree there must be a great variety of genotypes, whose sequences would form a tree (a phylogeny), much as we find when we compare DNA sequences from dog species, or from individual humans.

A single tree is a very large evolutionary 'experiment'.  Branches affected by harmful mutations simply aren't there, so to speak.  They and their genomic lineage are 'extinct'.  A single tree and its lifetime comprise such a large 'experiment' that they are comparable in numbers to whole species of shorter-lived, germ-line-dependent organisms.
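
To make the idea concrete, here is a toy simulation of this within-tree 'experiment'.  Every parameter is invented for illustration, not drawn from real botany: each branching event adds mutations to a lineage, and tips whose mutation load grows too heavy are pruned, a crude stand-in for somatic selection:

```python
import random

# Toy model of somatic evolution within a single tree (all parameters
# invented for illustration): each branching event copies the parent
# lineage's mutation load and adds new mutations; tips whose load gets
# too high are pruned, a crude stand-in for somatic selection.

random.seed(1)
LETHAL_LOAD = 25       # tips above this load die
GENERATIONS = 10       # branching generations (up to 2**10 tips)

tips = [0]             # mutation load of the founding cell (the acorn)
for _ in range(GENERATIONS):
    next_tips = []
    for load in tips:
        for _ in range(2):                                 # each tip forks in two
            child = load + random.choice([0, 1, 2, 3, 4])  # new mutations
            if child <= LETHAL_LOAD:                       # somatic 'selection'
                next_tips.append(child)
    tips = next_tips

print(len(tips), "surviving meristems;",
      "loads from", min(tips), "to", max(tips))
```

The surviving tips end up with a spread of different mutation loads--a population of genotypes within one 'individual', which is the point of this post.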

Here is a photo of a tree from our yard that may illustrate the point.  Why are the leaves on this one branch turning to fall colors so much earlier than those on the rest of the same tree?  There may be local environmental reasons, such as differences in sunlight or water supply or parasite effects, but this seems rather unlikely because other branches in similar positions, even on this same tree, are still green.




And now here is another photo, of a different tree in our front yard, that we think illustrates the points we're making.  This red oak loses its leaves in the usual way....except for the one major branch shown.  Its leaves do not fall until the following spring, while the leaves on the tree's remaining branches drop in fall, as is normal.  This happens every year and is not a fluke of some particular season.



A forester might have a local explanation, that there is some connection between the location of the roots supplying these particular branches and the underground water or soil conditions, but one possible explanation is somatic mutation.  That is, some mutational effect, arising when the branch was early in its formation, led to a difference in the abscission layers of the leaves produced by that branch, retaining those leaves through the winter.  If the explanation is local physical conditions, of course, that means the tree's behavior cannot be predicted from its founding acorn's sequence.  But it is rather difficult to believe that somatic mutation doesn't have at least the kinds of effects seen.  A good experiment would be to take an acorn from this part of the tree and plant it next to one from another part of the tree and see what happens.  Unfortunately, the answer wouldn't be available for many years....

Our point here is that among the countless cells in a tree's life, between its origin as a single cell and the also countless generations of its own acorns produced from its founding genome over its long life, there simply must have been countless somatic mutations, occurring all along the roots and trunk and branches, cell division by cell division.  Their descendants, down the root network, and up the trunk and into the branches, must have been screened for the viability of any phenotypic effects, which many must have had.  If insects or bacteria attack, or animal predators or the climate change, parts of a tree may be better able to survive than others.  Cells in the tree's future life will have the benefit of these changes.  They may be small, but they may accumulate over the decades.  The branches affected by less helpful changes would flower less, or die, or fall to predators, and so on--ones we never see later on, when we look at the tree.  Among the countless meristems in every generation will be a population of differing genotypes to be passed on to that season's thousands and thousands of seeds.

In this way, through meristems everywhere (above ground) on the tree, cells with new genotypes are screened for suitability in the environment at each time during the tree's life.  A tree is not a single organism, but a population of descendants of a founder.  The acorn was primeval perhaps, but not the forest.  It is this kind of within-life evolution that may, or perhaps must, explain how a single, immobile organism can survive for so long in the dynamics of local ecosystems.

That is, it's the tree itself, in its ever-renewing parts from root to twig, not just its evolving population of annual seeds, that must be evolving.  Decades, centuries, or millennia must often encompass changes in the biota around each primeval individual that would destroy it if it, too, were not evolving.  Otherwise, it would seem to be asking for doom to be fixed in a given location for hundreds or thousands of years, surrounded by junior, dynamically evolving predators and competitors.

The forest is always primeval: Each individual tree, in this view, is an evolving population, always adapting in its unchanging location to its locally changing conditions.

Thursday, October 27, 2016

Causal complexity in life

Evolution is the process that generates the relationships between genomes and traits in organisms.  Although we have written extensively and repeatedly about the issues raised by causal complexity,  we were led to write this post by a recent paper, in the 21 October 2016 issue of Science, which discusses molecular pathways to hemoglobin (Hb) gene function.  Although one might expect this to be rather simple and genomically direct, it is in fact complex and there are many different ways to achieve comparable function.

The authors, C. Natarajan et al., looked at the genetic basis of adaptation to habitats at different altitudes, focusing on genes coding for Hb molecules, which transport oxygen in the blood to provide the body's tissues with this vital resource.  As a basic aspect of our atmosphere, oxygen concentrations differ at different altitudes, being low in mountainous regions compared to lowlands.  Species must somehow adapt to their localities, and at least one way to do this is for oxygen transport efficiency to differ at different elevations.  Bird species have moved into and among these various environments on many independent occasions.

The affinity of Hb molecules for oxygen, that is, their ability to bind it, depends on their amino acid sequence, and the authors found that this varies by altitude.  The efficiency is similar among species at similar altitudes, even when due to independent population expansions.  But when they looked at the Hb coding sequences in different species, they found a variety of species-specific changes.  That is, there are multiple ways to achieve similar function, so that parallel evolution at the functional level, which is what Nature detects, is achieved by many different mutational pathways.  In that sense, while an adaptation can be predicted, a specific genetic reason cannot be.

The authors looked only at coding regions, but of course evolution also involves regulatory sequences (among other functional regions in DNA), so there is every reason to expect that there is even more complexity to the adaptive paths taken.

Important specific documentation....but not conceptually new, though unappreciated
The authors also looked at what they call 'resurrected ancestral' proteins, experimentally testing the efficacy of some specific Hb mutations, and they found that genomic background made a major difference in how, or whether, a specific change would affect oxygen binding.  This shows that evolution is contingent on local conditions, and that the effect of a given genomic change depends on the genomic background.  The ad hoc, locally contingent nature of evolution is (or should be) a central aspect of evolutionary world views, but there is a widespread tendency to think in classical Mendelian terms, of a gene for this and a gene for that, so that one would expect similar results in similar, if independent, areas or contexts.  This is a common, if often tacit, view underlying much of genome mapping to find genes 'for' some human trait, like important diseases.  But it is quite misleading, or, more accurately, very wrong.

In 2008 we wrote about this in Genetics, as we've done before and since here on MT and in other papers.  In the 2008 article we used the following image to suggest metaphorically the nature of this complex causation, with its alternative pathways and the like, where the 'trait' is the amount of water passing New Orleans on the Mississippi River.  The figure suggests how difficult it would be to determine 'the' causal source of the water, how many different ways there are to get the same river level.

Drainage complexity as a metaphor for genomic causal complexity.  Map by Richard Weiss and ArcInfo
One can go even further, and note that these are exactly the kinds of findings that are to be expected from, and documented by, the huge list of association studies done on human traits.  These typically find a great many genome regions whose variation contributes to the trait, usually each with a small individual effect, and mainly at low frequency in the population.  That means that individuals with similar trait values (say, diabetes, obesity, or tall or short stature) have different genotypes, which overlap in incomplete and individually unique ways.
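
A quick sketch shows how this incomplete overlap arises; the locus count, allele frequency, and trait model below are arbitrary choices for illustration, not estimates from any real study:

```python
import random

# Sketch: a trait affected by many loci of small effect (all parameters
# arbitrary).  Individuals with the same trait value typically carry
# overlapping but far from identical sets of risk alleles.

random.seed(42)
N_LOCI, FREQ = 200, 0.1        # 200 contributing loci, risk-allele frequency 0.1

def individual():
    # genotype = the set of loci at which this person carries a risk allele
    return {i for i in range(N_LOCI) if random.random() < FREQ}

people = [individual() for _ in range(10_000)]
matched = [g for g in people if len(g) == 20]   # same trait value: 20 risk loci

a, b = matched[0], matched[1]
print(f"risk loci shared by two trait-matched people: {len(a & b)} of 20")
```

Two 'individuals' with identical trait values typically share only a couple of their twenty risk loci--different genotypes, same phenotype.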

We have written about this aspect of life, in what we called evolution by phenotype, in various places.  Nature screens on traits directly, and on genes only very indirectly, in most situations in complex organisms.  This means that many genotypes yield the same phenotype; these are equivalent in the face of natural selection, and genetic drift occurs among them even under selection, again because selection screens the phenotype.  This is the process we called phenogenetic drift.  These papers were not 'discoveries' of ours but just statements of what is pretty obvious, even if inconvenient for those seeking simple genetic causation.

The Science paper on altitude adaptation shows this with representative sequences, one individual from each of a variety of species, rather than different individuals within each species--variation one can expect must also exist.  The point is that a priori prediction of how hemoglobin adaptation will occur is problematic, except that each species must have some adaptation to available oxygen.  Parallel phenotypic evolution need not be matched by parallel genotypic evolution, because selection 'sees' phenotypes and doesn't 'care' about how they are achieved.

The reason for this complexity is simple: this is how evolution, working via phenotypes rather than genotypes, molds the genetic aspects of causation.

Thursday, October 13, 2016

Genomic causation....or not

By Ken Weiss and Anne Buchanan

The Big Story in the latest Nature ("A radical revision of human genetics: Why many ‘deadly’ gene mutations are turning out to be harmless," by Erika Check Hayden) is that genes thought to be clearly causal of important diseases aren't always so (the link is to the People magazine-like cover article in that issue).  This is a follow-up on an August Nature paper describing the database from which the results discussed in this week's Nature are drawn.  The apparent mismatch between a gene variant and a trait can be, according to the paper, the result of technical error, a mis-call by a given piece of software, or of the assumption that identifying a given mutation in affected but not healthy individuals means the causal mutation has been found, without experimentally confirming the finding--which itself can be tricky, for reasons we'll discuss.  Insufficient documentation of 'normal' sequence variation has meant that the population frequency of so-called causal mutations hasn't been available for comparative purposes.  Again, we'll mention below what 'insufficient' might mean, if anything.

People in general and researchers in particular need to be more than dismissively aware of these issues, but the conclusion that we still need to focus on single genes as causal of most disease, that is, do MuchMoreOfTheSame, which is an implication of the discussion, is not so obviously justified.   We'll begin with our usual contrarian statement that the idea here is being overhyped as if it were new, but we know that except for its details it clearly is not, for reasons we'll also explain.  That is important because presenting it as a major finding, and still focusing on single genes as being truly causal vs mistakenly identified, ignores what we think the deeper message needs to be.

The data come from a mega-project known as ExAC, a consortium of researchers sharing DNA sequences to document genetic variation and further understand disease causation, now including data from approximately 60,000 individuals (in itself rather small compared to the need, for this purpose).  The data are primarily exome sequences, that is, from protein-coding regions of the human genome, not whole genome sequences--again, a major issue.  We have no reason at all to critique the original paper itself, which is large, sophisticated, and carefully analyzed as far as we can tell; but the claims about its novelty are, we think, very much hyperbolized, and that needs to be explained.

Some of the obvious complicating issues
We know that a gene generally does not act alone.  DNA in itself is basically inert.  We've been and continue to be misled by examples of gene causation in which context and interactions don't really matter much, but that leads us still to cling to these as though they are the rule.  This reinforces the yearning for causal simplicity and tractability.  Essentially even this ExAC story, or its public announcements, doesn't properly acknowledge causal context and complexity because it is critiquing some simplistic single-gene inferences, and assuming that the problems are methodological rather than conceptual.

There are many aspects of causal context that complicate the picture, which are not new and which we're not making up, but which the Bigger-than-Ever Data pleas don't address:
1.  Current data are from blood samples, which may not reflect the true constitutive genome because of early somatic mutation, and this will vary among study subjects,
2.  Life-long exposure to local somatic mutation is neither considered nor measured,
3.  Epigenetic changes, especially local tissue-specific ones, are not included,
4.  Environmental factors are not considered, and indeed would be hard to consider,
5.  Non-Europeans, and even many Europeans, are barely included, if at all, though this is beginning to be addressed,
6.  Regulatory variation, which GWAS has convincingly shown is much more important to most traits than coding variation, is not included.  Exome data have been treated naively by many investigators as if they were what is important, and exome-only data have been used as a major excuse for Great Big Grants that can't find what we know is probably far more important,
7.  Non-coding regions, including non-regulatory RNA regions, are not included in exome-only data,
8.  A mutation may be causal in one context but not in others, in one family or population and not others, rendering the determination that it's a false discovery difficult,
9.  Single-gene analysis is still the basis of the new 'revelations'; that is, the idea being hinted at is that the 'causal' gene isn't really causal....but one implicit notion is that it was misidentified, which is perhaps sometimes true but probably not always so,
10.  The new reports are presented in the news, at least, as if the gene is being exonerated of its putative ill effects.  But that may not be the case, because if the regulatory regions near the mutated gene have little or no activity, the 'bad' gene may simply not be expressed.  Its coding sequence could falsely be assumed to be harmless,
11.  Many aspects of this kind of work depend on statistical assumptions and subjective cutoff values, a problem recently being openly recognized,
12.  Bigger studies introduce all sorts of statistical 'noise', which can make something appear causal or can weaken an actual apparent cause.  Phenotypes can be measured in many ways, but we know very well that this can be changeable and subjective (and phenotypes are not very detailed in the initial ExAC database),
13.  Early reports of strong genetic findings have a well-known upward bias in effect size, the 'winner's curse', that later work fails to confirm.

Well, yes, we're always critical, but this new finding isn't really a surprise
To some readers we are too often critical, and at least some of us have to confess to a contrarian nature.  But here is why we say that these new findings, like so many that are by the grocery checkout in Nature, Science, and People magazines, while seemingly quite true, should not be treated as a surprise or a threat to what we've already known--nor a justification of just doing more, or much more of the same.

Gregor Mendel studied fully penetrant (deterministic) causation.  That is, in what we now know to be 'genes', the presence of the causal allele (in 2-allele systems) always caused the trait (green vs yellow peas, etc.; the same is true of recessive as well as dominant traits, given the appropriate genotype).  But this is generally wrong, save at best for exceptions such as those that Mendel himself knowingly and carefully chose to study.  And even this was not so clear!  Mendel has been accused of 'cheating' by ignoring inconsistent results.  This may have been data fudging, but it is at least as likely to have been a reaction to what we have known for a century as 'incomplete penetrance'.  (Ken wrote on this a number of years ago in one of his Evolutionary Anthropology columns.)  For whatever reason--and see below--the presence of a 'dominant' allele, or 'recessive' homozygosity at a 'causal' gene, doesn't always lead to the trait.

Throughout most of the 20th century, the probabilistic nature of real-world, as opposed to textbook, Mendelism was well known and accepted.  The reasons for incomplete penetrance were not known, and indeed we had no way to know them as a rule.  Various explanations were offered, but the statistical nature of the inferences (estimates of penetrance probability, for example) was common practice and a textbook standard.  Even the original authors acknowledged incomplete penetrance, and this essentially shows that what the ExAC consortium is reporting adds details but nothing fundamentally new or surprising.  Clinicians or investigators acting as if a variant were always causal should be blamed for gross oversimplification, and so should hyperbolic news media.

Recent advances such as genomewide association studies (GWAS) in various forms have used stringent statistical criteria to minimize false discovery.  This has meant that the mapped 'hits' satisfying those criteria account for only a fraction of estimated overall genomic causation.  That was legitimate in that it didn't leave us swamped with hundreds of very weak or very rare false positive genome locations.  But even the acceptable, statistically safest genome sites showed typically small individual effects, and risks far below 1.0.  They were not 'dominant' in the usual sense.  That means that people with the 'causal' allele don't always, and in fact do not usually, have the trait.  This has been the finding for quantitative traits like stature and qualitative ones like the presence of diabetes, heart attack-related events, psychiatric disorders, and essentially all traits studied by GWAS.  It is not exactly what the ExAC data were looking at, but it is highly relevant, and it is the basic biological principle at issue.

This does not necessarily mean that the target gene is not important for the disease trait, which seems to be one of the inferences headlined in the news splashes.  This is treated as a striking or even fundamentally new finding, but it is nothing of the sort.  Indeed, the genes in question may not be falsely identified, but may very well contribute to risk in some people, under some conditions, at some ages, and in some environments.  The ExAC results don't really address this, because (for example) to determine when a gene variant is a risk variant one would have to identify all the causes of 'incomplete penetrance' in every sample; but there are multiple explanations for incomplete penetrance, including items 1-13 above as well as methodological issues such as those pointed out by the ExAC project paper itself.

In addition, there may be 'protective' variants in other regions of the genome (that is, the trait may need the contribution of many different genome regions), and working that out would typically involve "hyper astronomical" combinations of effects using unachievable, not to mention uninterpretable, sample sizes--from which one would have to estimate the risk effects of almost uncountable numbers of sequence variants.  If there were, say, 100 other contributing genes, each with its own variant genotypes, including regulatory variants, the number of combinations of backgrounds one would have to sort through to see how they affected the 'falsely' identified gene is effectively uncountable.
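
The arithmetic behind "hyper astronomical" is easy to check.  Assuming just one biallelic site per contributing gene, and hence three diploid genotypes each (a deliberate underestimate that ignores regulatory and multi-site variation entirely):

```python
# Combinations of genotype backgrounds: 100 genes, each with one biallelic
# site and hence 3 diploid genotypes (AA, Aa, aa).  Regulatory and
# multi-site variation would only inflate this further.
backgrounds = 3 ** 100
print(f"{backgrounds:.2e} distinct backgrounds")   # ~5.15e+47
```

No sample size on Earth sorts through that.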

Even the most clearly causal genes, such as BRCA1 variants for breast cancer, have penetrance far less than 1.0 in recent data (here referring to lifetime risk; risk at earlier ages is very far from 1.0).  The risk, though clearly serious, depends on cohort, environmental, and other mainly unknown factors.  Nobody doubts the role of BRCA1, but it is not in itself causal.  For example, it appears to be a mutation repair gene, but if no (or not enough) cancer-related mutations arise in the breast cells of a woman carrying a high-risk BRCA1 allele, she will not get breast cancer as a result of that gene's malfunction.

There are many other examples of mapping that identified genes that, even if strongly and truly associated with a test trait, have very far from complete penetrance.  A mutation in HFE and hemochromatosis comes to mind: in studies of some Europeans, a particular mutation seemed always to be present in affected people, but when the gene itself was tested in a general database, rather than just in affected people, it had little or no causal effect.  This seems to be the sort of thing the ExAC report is finding.

The generic reason is again that genes, essentially all genes, work only in their context.  That context includes 'environment', which refers to all the other genes and cells in the body, to external or 'lifestyle' factors, and also to age and sex.  There is no obvious way to identify, evaluate, or measure the effects of all possibly relevant lifestyle factors, and since these change, retrospective evaluation has unknown bearing on future risk (the same can be said of genomic variants, for the same reason).  How could these even be sampled adequately?

Likewise, volumes of long-existing experimental and highly focused results tell the same tale.  Transgenic mice, for example, in which the same mutation is introduced into their 'same' gene as in humans, very often show little, no, or only strain-specific effects.  The same is true in other experimental organisms.  The lesson, and it is far from a new or recent one, is that genomic context is vitally important: it is the person-specific genomic background of a target gene that affects the latter's effect strength--and vice versa; the same is true for each of these other genes.  That is why we have so long noted the legerdemain being foisted on the research and public communities by the advocates of Big Data statistical testing.  Certainly methodological errors are also a problem, as the Nature piece describes, but they aren't the only problem.

So if someone reports cases of a trait that seem too often to involve a given gene, as the Nature piece generally describes, but searches of unaffected people also occasionally turn up the same mutations in such genes (especially when only exomes are considered), we are told that this is a surprise.  It is, to be sure, important to know, but it is just as important to know that essentially the same information has long been available to us in many forms.  It is not a surprise--even if it doesn't tell us where to go in search of genetic, much less genomic, causation.

Sorry, though it's important knowledge, it's not 'radical' nor dependent on these data!
The idea being suggested is that (surprise, surprise!) we need much more data to make this point or to find these surprisingly harmless mutations.  That is simply a misleading assertion, or attempted justification, though it has become the industry's standard closing argument.

It is of course very possible that we're missing some aspects of the studies and interpretations that are being touted, but we don't think that changes the basic points being made here.  They're consistent with the new findings, but show that, for many very good reasons, this is what we knew was generally the case: 'Mendelian' traits were the exception, one that led to a century of genetic discovery only because it focused attention on what was then doable (while, in parallel and little recognized by human geneticists, the agricultural genetics of polygenic traits showed what was more typical).

But now, if things are being recognized as contextual much more deeply than in Francis Collins' money-strategy-based Big Data dreams, or 'precision' promises, and our inferential (statistical) criteria are properly under siege, we'll repeat our oft-stated mantra: deeply different, reformed understanding is needed, and a turn to research investment focused on basic science rather than exhaustive surveys, and on those many traits whose causal basis really is strong enough that it doesn't require this deeper knowledge.  In a sense, if you need massive data to find an effect, then that effect is usually very rare and/or very weak.

And, by the way, the same must be true for normal traits, like stature, intelligence, and so on, for which we're besieged with genome-mapping assertions, and this must also apply to ideas about gene-specific responses to natural selection in evolution.  Responses to environment (diet etc.) manifestly have the same problem.  It is not just a strange finding of exome mapping studies for disease.  Likewise, the 'normal' study subjects now being asked for in huge numbers may get the target trait later in their lives, except for traits basically present early in life.  One can't doubt that misattributing the cause of such traits is an important problem, but we need to think of better solutions than Big Big Data, because not confirming a gene doesn't help, and finding that 'the' gene is only 'the' gene in some genomic or environmental backgrounds is the proverbial, and historically frustrating, needle-in-a-haystack search.  So the story's advocated huge samples of 'normals' (random individuals) cannot really address the causal issue definitively (except to show what we know, that there's a big problem to be solved).  Selected family data may--may--help identify a gene that really is causal, but even they have some of the same sorts of problems.  And the finding may apply only to that family.

The ExAC study is focused on severe diseases, which is somewhat like Mendel's selective approach, because it is quite obvious that complex diseases are complex.  It is plausible that severe, especially early-onset, diseases are genetically tractable, but it is not obvious that ever more data will answer the challenge.  And, ironically, the ExAC study has removed just such diseases from its consideration!  So it is intentionally showing what is well known: that we're in needle-in-a-haystack territory, even when someone has reported big needles.

Finally, we have to add that these points have been made by various authors for many years, often based on principles that did not require mega-studies to show.  Put another way, we had reason to expect what we're seeing, and years of studies supported that expectation.  This doesn't even consider the deep problems about statistical inference that are being widely noted, and the deeply entrenched nature of that approach's conceptual and even material vested interests (see this week's Aeon essay, e.g.).  It's time to change, but doing so would involve deeply revising how resources are used--of course one of our common themes here on the MT--and that is a matter almost entirely of political economy, not science.  That is, it's as much about feeding the science industry as it is about medicine and public health.  And that is why it's mainly about business as usual rather than real reform.

Friday, October 7, 2016

Science journals: Anything for a headline

Well, this week's sensational result is reported in the Oct 5 Nature, in a paper about limits to the human lifespan.  The unsensational nature of this paper shows yet again how Nature and the other 'science' journals will take any paper that they can use for a cheap headline.  This paper claims that the human life span cannot exceed 115 years (though the cover picture on a commentary in the same issue is of a woman--mentioned in the paper itself--who lived to be substantially older than that!).  The Nature issue has all the exciting details of this novel finding, which of course have been trumpeted by the story-hungry 'news' media.

In essence the authors argue that maximum longevity on a population basis has been increasing only very slowly or not at all over recent decades.  It is, one might say, approaching an asymptote of strong determination. They suggest that there is, as a result of many complex contributing factors-of-decline, essentially a limit to how long we can live, at least as a natural species without all sorts of genetic engineering.  In that sense, dreams of hugely extended life, even as a maximum (that is, if not for everyone), are just that: dreams.

This analysis raises several important issues, but largely ignores others.  First, however, it is important to note that virtually nothing in this paper, except some more recent data, is novel in any way.  The same issues were discussed at very great length long ago, as I know from my own experience.  I was involved in various aspects of the demography and genetics of aging as far back as the 1970s.  There was a very active research community looking at issues such as species-specific 'maximum lifespan potential', with causal or correlated factors ranging from basic metabolism to body or brain size.  Here's a figure from 1978 that I used in a 1989 paper.




There was experimental research on this, including life-extension studies (e.g., dietary restriction), as well as comparisons of data over time, much as in the new paper (for its time).  The idea that there was an effective limit to human lifespan (and likewise for any species) was completely standard at that time, and how much this could be changed by modern technologies and health care etc. was debated.  In 1975, for example (and that was over 40 years ago!), Richard Cutler argued in PNAS that various factors constrained maximum lifespan in a species-related way.  The idea, and one I also wrote a lot about in the long-ago past, is that longevity is related to surviving the plethora of biological decay processes, including mutation, and that this would lead to a statistical asymptote in lifespan.  That is, lifespan is largely a statistical result rather than a deterministically specified value.  The mortality results related to lifespan were not about 'lifespan' causation per se; they were just the array of diseases (diabetes, cancer, heart disease, etc.) that arose as a result of the various decays, whose risks increase with duration of exposure, wear and tear, and so on, and hence were correlated with age.  Survival to a given age was the probability of not succumbing to any of these causes by that age.

This paper of mine (mentioned above) was about the nature of arguments for a causally rather than statistically determined lifespan limit.  If such a limit existed, then all the known diseases, like heart disease, diabetes, cancer, and so on, would be irrelevant to our supposed built-in lifespan limit!  That makes no evolutionary sense, since evolution would not be able to work on such a limit (nobody is still reproducing anywhere near that old).  It would make no other kind of sense, either.  What would determine such a limit, and how could it have evolved?  On the other hand, if diseases--the real causes that end individual lives--were, together, responsible for the distribution of lifespan lengths, then a statistical rather than deterministic end is what's real.  The new paper doesn't deal with these questions, but by arguing that there is some sort of asymptotic limit, it implicitly invokes some sort of causal, evolutionarily determined value, and that seems implausible.

Indeed, evolutionary biologists have long argued that evolution would produce 'antagonistic pleiotropy', in which genomes confer greater survival at young ages, even if the result comes at the expense of greater mortality later on.  That way, the species' members could live to reproduce (at least, if they survived developmentally related infant mortality), and they were dispensable at older ages, so that there was no evolutionary pressure to live longer.  But that would leave old-age longevity to statistical decay processes, not some built-in limit.

Of course, with very large data sets and mortality a multicausal statistical process, rare outliers would be seen, so that more data meant longer maximum survival 'potential' (assuming everyone in a species somehow had that potential, clearly a fiction given genetic diseases and the like that affect individuals differently).  There were many problems with these views, and many have since tried to find single-cause lifespan-determining factors (like telomere decay in our chromosomes), an active area of research (more on that below).  We still hunger for the Fountain of Youth--the single cause or cure that will immortalize us!
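
This statistical picture is easy to illustrate.  In the toy model below, the hazard rates are invented and nothing imposes a maximum lifespan anywhere in the code; yet a soft ceiling emerges from the competing risks, and the observed maximum creeps upward with sample size, exactly as argued above:

```python
import random

# Toy competing-risks model (all rates invented): several independent
# 'causes' may kill each year, with per-cause hazard rising with age.
# No maximum lifespan is built in, yet a soft ceiling emerges, and the
# observed maximum grows slowly with sample size.

random.seed(0)

def lifespan():
    age = 0
    while True:
        hazard = 5 * 0.0001 * 1.09 ** age   # 5 causes, each rising ~9%/year
        if random.random() < hazard:
            return age
        age += 1

for n in (1_000, 100_000):
    draws = [lifespan() for _ in range(n)]
    print(f"n={n:>7}: oldest individual died at {max(draws)}")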

The point here is that the new paper is at most a capable but modest update of what was already known long ago.  It doesn't really address the more substantive issues, like those I mention above.  It is not a major finding, and its claims are also in a sense naive, since future improvements in health and lifestyles that we don't have now, if applied to our whole population, could extend life expectancy--the average age at death--and hence the maximum age to which anyone survives.  After all, when we had huge infectious disease loads, hardly anybody lived to 115, and in the old days of research, to which the authors seem oblivious, something like 90-100 was assumed to be our deadline.

The new paper has been criticized by a few investigators, as seen in the news media coverage.  But the paper's authors probably are right that nothing foreseeable will make a truly huge change in maximum survival, nor will many survive to such an extended age.  Nor--importantly--does this mean that those who do luck out are actually very lucky: the last few years or decades of decrepitude may not be worth it to most who last to the purported limit.  To think of this as more than a statistical result is a mistake.  Not everyone can live to any particular age, obviously.

The main fault of the paper, in my view, is its portrayal of the result as essentially a new finding, and its publication in a purportedly major journal, with the typical media ballyhoo suggesting as much.

On the other hand....
On the other hand, investigators who were interviewed about this study (to give it 'balance'!) denigrated it, saying that novel medical or other (genetic?) interventions could make major changes in human longevity.  This has of course happened in the past century or two.  More medical intervention, antibiotics and vaccines and so on have greatly increased average lifespan and, in so doing in large populations, increased the maximum survival that we observe.  This latter is a statistical result of the probabilistic nature of degenerative processes like accumulating wear and tear or mutations, as I mentioned earlier.  There is no automatic reason that major changes in life-extending technologies are in the offing, but of course it can't be denied as a possibility either. Similarly, if, say, antibiotic resistance becomes so widespread that infectious diseases are once again a major cause of death in rich countries, our 'maximum lifespan' will start to look younger.

Those who argue against this paper's assertion of a limit must be viewed just as critically as they judged the new paper.  The US National Institute on Aging, among other agencies, spends quite a lot of your money on aging, including decades of funding (I know because I had some of it) for work on lifespan determination.  If someone quoted as dissing the new 'finding' is heavily engaged in funding from NIA and elsewhere, one must ask whether s/he is defending a funding trough: if it's hopeless to think we'll make major longevity differences, why not close down their labs and instead spend the funding on something that's actually useful for society?

There are still many curious aspects of lifespan distributions, such as why rodents, whose small bodies should make them less vulnerable per year to cancer, telomere degradation, and other risks that relate to the number of at-risk cells, live only a few years.  Why hasn't evolution led us to be in prime health for decades longer than we are?  There are potential answers to such questions, but the mechanisms are not well understood, and the whole concept of a fixed lifespan (rather than a statistical one) is poorly constructed.

Still, everything suggests that, without major new interventions that probably will, at best, be for the rich only, there are rough limits to how long anyone can statistically avoid the range of independent risks our various organ systems face, not to mention surviving in a sea of decrepitude.

One thing that does seem to be getting rather old is the relentless hyperbole of the media, including pop-culture journals like Nature and Science, selling non-stories as revolutionary new findings.  If we want to make life better for everyone, not just researchers and journals, we could spend our resources more equitably on quality of life, and our research resources on devastating diseases that strike early in the lives we already are fortunate to have.

Thursday, September 22, 2016

Chain-ring genetics

If you're a bike rider, as I am, you know that there is a huge market out there trying to lure you into a really, really fancy bike.  Bike prices can easily get well into 4 digits, amazingly, and apparently there are enthusiasts who are willing to pay for them--maybe the thrill of the purchase is itself enough!
In a way, fancy bikes serve as an analogy for broader aspects of our society, as I'll try to illustrate.

I live in a pretty hilly area, and even though I stick to bike-path or street-and-sidewalk riding, it takes a pretty dramatic range of effort to navigate the changing ups and downs.  My bike, shown in this amateur photo, is a Trek Navigator hybrid, with a 3 x 7 gear setup: 3 chainrings in the front, and a 7-cog rear gear set.  That's 21 different gears, and I was happy to buy a bike with such a wide range of pedaling-efficiency options.



The next figure shows the gear ratio range schematically.  For each front chainring (Low, Middle, High), the corresponding line shows the relative gear ratio across the 7 rear cogs.  So, in the extreme, going up a steep hill you want a front-1/rear-1 choice (the easiest combination, with more pedal rotations per rear wheel rotation, making each rotation easier even if you go slower), and downhill you'd want 3-7, for the opposite effects.
High, Mid, Low gear ratio range for the 3-front, 7-rear cogs (schematic)

This plethora of gears was an attractive selling point when I bought this bike, which is a good one, but now that it's a few years old, I decided to shop around to see what's on offer these days.  I noticed some 3 x 8, 3 x 9, and 2 x 9 front/back cog counts.  The more expensive bikes tend to have more gears, though one had only 2 chainrings in front--and I wondered about that.  If the rear cogs have the right ratios, there is less weight in the front with only two chainrings, shifting is easier, and the shifting mechanism may act more quickly.  But the overall range is less, meaning it might not suit all riders as easily.  In any case, there's a lot of techie glitter and salesmanship going on to get you to pony up the $K's for the fancier bikes.  They weigh a few pounds less, too, and so on, as the price goes up.

The bike-tech web sites basically warn you to avoid cross-chaining, that is, pairing a front chainring on one side with a rear cog on the opposite side of its cluster.  Instead, common advice says, shift to front-rear combinations for which the connecting chain is as close to parallel with the frame as possible.
But if you read a bit more carefully, you can see that some of the cross-chaining evidence, for modern bikes, is not very well established: you may not damage the chain or cog teeth, or be detectably less efficient, after all.  And some of the combinations--where the ranges in the above figure overlap--would never really be used.

So I wondered why one would not just stick with the middle front chainring all the time.  If you do that, the full range of rear cogs can be used without cross-chaining issues.  You don't get all of the bike's range, but you do get most of it.  What would the same ride feel like using only the middle chainring?
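
Before trying the experiment, some illustrative arithmetic already hints at the answer.  The tooth counts below are guesses for a bike of this kind, not the Navigator's actual spec, but they show how the middle ring alone can span most of the full gear-ratio range:

```python
# Gear ratio = chainring teeth / rear cog teeth.  Tooth counts below are
# plausible guesses for a 3x7 hybrid, not this bike's actual spec.
chainrings = {"low": 28, "middle": 38, "high": 48}
cassette = [14, 16, 18, 20, 22, 24, 28]   # rear cogs, smallest to largest

all_ratios = [ring / cog for ring in chainrings.values() for cog in cassette]
mid_ratios = [chainrings["middle"] / cog for cog in cassette]

print(f"full range:   {min(all_ratios):.2f} - {max(all_ratios):.2f}")
print(f"middle only:  {min(mid_ratios):.2f} - {max(mid_ratios):.2f}")
```

With these assumed counts, the middle ring alone runs from about 1.36 to 2.71, within a full range of 1.00 to 3.43: you lose only the extremes.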

I've now tried that, by taking my ride today without using the high or low chainring (stupidly, I had never tried that before!).  Going up the very steepest hill, I knew it would be a tad easier to use the easiest front-rear combination.  Going downhill, I could have mustered a bit more speed with the opposite extreme combination.  But basically, the ride was the same.  It was also a bit simpler, and involved a lot less coordinated shifting.

I decided I don't need a fancy new bike, after all!

So what does this have to do with genetics?
I was led to write this brief reflection when I thought about how many not-especially-avid bikers have been led by cycle makers to get the most extensive, fanciest gearing (among other options), forking over very much more money for very little gain in the process.  Yes, performance is a bit better, but it doesn't really match up to the hype, especially not at the cost, unless you are a bike racer or off-road biker, or have a yen for the latest-and-greatest and lots of 1%er money to invest in ego toys.  The marginal gain per unit cost is minimal.

We're getting a lot of similar marketing for gearing up, so to speak, in our biomedical research and its application.  We're being told that having lots of chainrings and large rear cog-sets will yield miraculously better health than our old-fashioned ways have done so far.  It's called by flashy, impressive, or intimidating names like 'next-gen sequencing' or 'Big Data' or 'exome profiling' or 'precision genomic medicine', and that's the analog of Big Gearing (though a lot more costly).  Big Data is for the research community what carbon-fiber frames are for the bicycle industry.  Scientists and the general public alike are suckers for slogans promising unbelievably more from gearing up the health-research industry, much as we are for slogans promising unbelievably better biking.

The promotions are always shifting, so to speak, as the science rolls on.

But genetics can be important to our very lives!
The line we are fed by NIH and the research establishment always stresses the vital importance of our Big Data investment.  That is, after all, what 'precision genomic medicine' and wars on cancer and so on suggest they are promising.  It is true that under some circumstances, for some people, large-scale genomic database research may soon, or eventually, lead to more effective treatments of disease. There are already some examples, though how many really required massive genomewide association studies and the like is open to discussion.

As we've noted here many times, there are tons of more clearly genetic, or otherwise-caused, disorders for which the same monetary investment might yield much greater benefit.  Most advances still, generally, seem to come from focused research on known, substantial causes.  Lifestyle changes, if our epidemiological data are worth their own huge cost, could much more massively reduce or defer common adult-onset diseases.  And there are a large number of clearly genetic diseases, pediatric and otherwise, for which the actual gene or genes are known.  They often strike at birth or in childhood, and are life-long debilitating, or life-shortening, conditions.  They have, in my view, a much stronger and more legitimate claim on research resources.

Nobody wants a disease, genetic or otherwise, not even if it only strikes late in life.  But we should use the gears we have to get up those hills, rather than constantly being promised miracles if we only add another chainring, and then another, and then.....



Friday, August 26, 2016

Is life itself a simulation of life?

It often happens in science that our theory of some area of reality is very precise, but the reality is too complex to work out exactly, or analytically.  That is when we may decide to use computer simulation of that reality to get at least a close approximation to the truth.  When a phenomenon is determined by a precise process, and our simulation really is simulating the underlying reality, then the more computing power we apply, the closer we get to the truth--that is, our results approach that truth asymptotically.

For example, if you want to predict the rotation of galaxies in space relative to each other, and of the stars within the galaxies, the theories of physics will do the job, in principle.  But solving the equations directly, the way one does in algebra or calculus, is not possible with so many variables.  However, you can use a computer to simulate the motion and get a very good approximation (we've discussed this here, among other places).  Thus, at each time interval, you take the position and motion of each object you want to follow, along with those measures for nearby objects, and use Newton's law of gravity to predict the position of each object one time interval later.
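In skeletal code, one such update step might look like the following.  This is a minimal sketch only; real astronomical simulations use far more careful integrators, force softening, and tree or mesh approximations rather than this naive pairwise loop:

```python
import numpy as np

G = 6.674e-11  # gravitational constant (m^3 kg^-1 s^-2)

def step(pos, vel, mass, dt):
    """Advance every body one time interval under Newtonian gravity.

    pos, vel: (N, 3) arrays of positions (m) and velocities (m/s)
    mass:     length-N array of masses (kg)
    """
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i == j:
                continue
            r = pos[j] - pos[i]  # vector from body i to body j
            acc[i] += G * mass[j] * r / np.linalg.norm(r) ** 3
    # naive forward-Euler update: new positions from old motion,
    # new velocities from the just-computed accelerations
    return pos + vel * dt, vel + acc * dt
```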

If the motion you simulate doesn't match what you can observe, you suspect you've got something wrong with the theory you are using. In the case of cosmology, one such factor is known as 'dark matter'.  That can be built into models of galactic motion, to get better predictions.  In this way, simulation can tell you something you didn't already know, and because the equations can't be directly solved, simulation is an approach of choice.

In many situations, even if you think that the underlying causal process is deterministic, measurements are imperfect, and you may need to add a random 'noise' factor to each iteration of your simulation.  Each run will be slightly 'off' because of this, but if you run the same simulation thousands of times, the effect of the noise evens out, and the average result represents what you are trying to model.
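Here is a toy illustration of that idea, with made-up drift and noise values: any single run wanders off target, but the average over many runs recovers the deterministic answer:

```python
import numpy as np

rng = np.random.default_rng()

def noisy_run(n_steps, drift, noise_sd):
    """One run of a deterministic drift with a random 'noise'
    factor added at every iteration."""
    x = 0.0
    for _ in range(n_steps):
        x += drift + rng.normal(0.0, noise_sd)  # signal plus noise
    return x

# Each run is slightly 'off', but averaging thousands of runs
# approaches the underlying deterministic result (1000 * 0.1 = 100).
runs = [noisy_run(n_steps=1000, drift=0.1, noise_sd=1.0)
        for _ in range(5000)]
print(np.mean(runs))  # close to 100
```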

Is life a simulation of life?
Just like other processes that we attempt to simulate, life is a complex reality.  We try to explain it with the very general theory of evolution, and we use genetics to try to explain how complex traits evolve, but there are far too many variables to predict future directions and the like analytically.  This is about more than just biological complexity, however, because the fundamental processes of life seem, as far as we can tell, to be inherently probabilistic (not just a matter of measurement error).  This adds a twist that makes life itself seem to be a simulation of its underlying processes.

Life evolves by parents transmitting genes to offspring.  For those genes to be transmitted to the next generation, the offspring have to live long enough, must be able to acquire mates, and must be able to reproduce. Genes vary because mutations arise.  For simplicity's sake, let's say that successful mating requires not falling victim to natural selection before offspring are produced, that this depends on an organism's traits, and that genes are causally responsible for those traits.  In reality, there are other processes to be considered, but these will illustrate our point.

Mutation and surviving natural selection seem to be probabilistic processes.  If we want to simulate life, we have to specify the probability of a mutation along some simulated genome, and the probability that a bearer of the mutation survives and reproduces.  Populations contain thousands of individuals, genomes incur thousands of mutations each generation, and reproductive success involves those same individuals.  This is far too hard to write tractable equations for in most interesting situations, unless we make almost uselessly simplifying assumptions.  So we simulate these phenomena.

How, basically, do we do this?  Here, generically and in simplified form, but illustrating the issues, is the typical way (and the way taken by my own elaborate simulation program, ForSim, which is freely available):

For each individual in a simulated population, each generation, we draw a random number based on an assumed mutation rate, and add the resulting number of mutations, at their random locations, to the genotype of the individual.  Then for each resulting simulated genotype, we draw a random number against the probability that such a genotype reproduces, and either remove or keep the individual depending on the result.  We keep doing this for thousands of generations, and see what happens.  As an example, the box lists some of the parameter values one specifies for a program like ForSim.
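In skeletal form, such a loop might look like this.  It is a toy sketch only: the parameter values are arbitrary, and it is vastly simpler than what ForSim actually does:

```python
import numpy as np

rng = np.random.default_rng()

POP_SIZE = 1000
MUTATION_RATE = 1e-3   # expected new mutations per genome per generation
SELECTION_COST = 0.01  # fitness lost per mutation carried
GENERATIONS = 2000

# mutation count carried by each individual in the population
mutations = np.zeros(POP_SIZE)

for gen in range(GENERATIONS):
    # each individual draws its number of new mutations at random
    mutations += rng.poisson(MUTATION_RATE, POP_SIZE)
    # reproduction is a random draw weighted by genotype-dependent fitness
    fitness = np.clip(1.0 - SELECTION_COST * mutations, 0.0, 1.0)
    parents = rng.choice(POP_SIZE, size=POP_SIZE, p=fitness / fitness.sum())
    mutations = mutations[parents]

print("mean mutations per individual after", GENERATIONS,
      "generations:", mutations.mean())
```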



Sometimes, if the simulation is accurate enough, the probability and other values we assume look like what ecologists or geneticists believe is going on in their field sites or laboratories.  In the case of humans, however, we have little such data, so we make a guess at what we think might have been the case during our evolution.  Often these things are empirically estimated one at a time, even though their real values affect each other in many ways.  This is, of course, very far from the situation in physics, described above!  Still, we at least have a computer-based way to approximate our idea of evolutionary and genetic processes.

We run this for many generations, usually many thousands, and look at the trait and genomic causal pattern that results (we've blogged about some of these issues here, among other posts).  This is a simulation, since it seems to follow the principles we think are responsible for evolution and genetic function.  However, there is a major difference.

Unlike simulations in astronomy, life really does seem to involve random draws from probabilistic processes.  In that sense, life looks as if it is, itself, a simulation of those processes.  The random draws it makes are not just practical stand-ins for some underlying deterministic phenomenon, but manifestations of the actual probabilistic nature of the phenomenon.

This is important, because when we simulate a process, we know that its probabilistic component can lead to different results each time through.  And yet life itself is a one-time run of those processes.  In that sense, life is a simulation, but we can only guess at the underlying causal values (like mutation and survival rates) from a single set of data: what actually happened in its one time through.  Of course, we can test various examples, like looking at mutation rates in bacteria or in some samples of people, but these involve many problems and are at best general estimates from samples, often artificial or simplified ones.

But wait!  Is life a simulation after all?  If not, what is life?
I don't want us to get bogged down in pure semantics here, but I think the answer is that, in a very profound way, life is not a simulation in the sense we're discussing.  For the relevant variables, life is not based on an underlying theoretical process in the usual sense, one whose parameters we approximate with random numbers when we simulate.

For example, we evaluate biological data in terms of 'the' mutation rate in genomes from parent to offspring.  But in fact, we know there is no such thing as 'the' mutation rate, one that applies to each nucleotide as it is replicated from one generation to the next, and from which each actual mutation is a random draw.  The observed rate of mutation at a given location in a given sample of a given species' genomes depends, among other things, on sex, on the particular nucleotides surrounding the site in question (and hence on all sites along the DNA string), on the nature of the mutation-detection proteins coded by that individual's genome, and on mutagen levels in the environment.  In our theory, and in our simulations, we assume an average rate, and assume that the variation around that average will, so to speak, 'average out'.

But I think that is fundamentally wrong.  In life, every condition today is a branch-point for the future.  The functional implications of a mutation here and now depend on the local circumstances, and those circumstances are built into the production of the future local generations.  Averaging over the genome and over individuals does not in fact generate what life does; in a sense, it is the opposite.  Each event has its own local dynamics and contingencies, and those conditions affect the rates of events in the future.  Everywhere it's different, and we have no theory of how different, especially over evolutionary time.

Indeed, one might say that the most fundamental single characteristic of life is that the variation generated here today is screened here today, and not anyplace or anytime else.  In that sense, each mutation is not drawn from the same distribution.  The underlying causal properties vary everywhere and all the time.  Sometimes the difference may be slight, but we can't count on that being true and, importantly, we have no way of knowing when, and to what extent, it is.
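To make that concrete, here is a small contrast between draws from one fixed distribution and a process whose rate wanders with its own history; the numbers are arbitrary and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng()

GENS = 1000
BASE_RATE = 1.0  # starting rate: expected events per generation

def fixed_rate_counts():
    """Every generation is a draw from one and the same distribution."""
    return rng.poisson(BASE_RATE, GENS)

def contingent_counts():
    """The rate itself wanders: each generation's local circumstances
    nudge the rate that the next generation inherits."""
    rate, counts = BASE_RATE, []
    for _ in range(GENS):
        counts.append(rng.poisson(rate))
        rate = max(1e-3, rate * rng.lognormal(0.0, 0.05))  # local contingency
    return np.array(counts)

# The fixed-rate series behaves the same way every run; the contingent
# series has a realized mean and variance that differ from run to run,
# because its rate at any moment depends on its own unique history.
for label, counts in [("fixed", fixed_rate_counts()),
                      ("contingent", contingent_counts())]:
    print(f"{label:10s} mean {counts.mean():.2f}  variance {counts.var():.2f}")
```

The two series can start from the same average, but the contingent one has no single distribution behind it: its behavior at any moment depends on the unique path it took to get there.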

The same applies to foxes and rabbits.  Every time a fox chases a rabbit, the conditions (including the genotypes of the fox and the rabbit) differ.  The chance of a catch is not the same each time; the success 'rate' is not drawn from a single, fixed distribution.  In reality, each chase is unique.

After the fact, we can look back at net results, and it's all too tempting to think of what we see as a steady, deterministic process with a bit of random noise thrown in.  But that's not an accurate way to think, because we don't know how inaccurate it is when each event is to some (un-prespecified) extent unique.  Overall, life is not, in fact, drawing from an underlying distribution.  It is ad hoc by its very nature, and that's what makes life different from other physical phenomena.

Life, and we who partake of it, are unique. The fact of local, contingent uniqueness is an important reason that the study of life eludes much of what makes modern physical science work.  The latter's methods and concepts assume replicable law-like underlying regularity. That's the kind of thing we attempt to model, or simulate, by treating phenomena like mutation as if they are draws from some basic underlying causal distribution. But life's underlying regularity is its irregularity.

This means that one of the best ways we have of dealing with the complex phenomena of life, simulating them by computer, smooths over the very underlying process that we want to understand.  In that sense, strangely, life appears to be a simulation but is even more elusive than that.  To a great extent, except for some very broad generalities that are often too broad to be very useful, life isn't the way we simulate it, and doesn't even simulate itself in that way.

What would be a better approach to understanding life?  The next generation will have to discover that.