Every living cell runs on information. Not metaphorical information — actual, digital, encoded instructions written in the four-letter chemical alphabet of DNA. The human genome alone contains roughly 3.2 billion nucleotide base pairs, organized into genes, regulatory sequences, and layers of control systems that molecular biologists are still mapping.

Nobody disputes this. The question is: where did all that information come from?

This is not a peripheral question for evolutionary theory. It may be the most fundamental one. Evolution requires not just change over time, but the addition of new, functional biological information — the kind that builds new organs, new body plans, new cellular machinery. And whether natural processes can generate that kind of information from scratch is a question that cuts right to the heart of the origins debate.

What Do We Mean by “Biological Information”?

Before diving into the debate, it helps to be precise about what “information” means in this context. The word gets used loosely in everyday language, but in molecular biology it refers to something quite specific.

DNA encodes biological information in sequences of nucleotides — adenine, thymine, guanine, and cytosine — arranged in specific orders that determine the structure and function of proteins. This is often compared to a language or a code, and the comparison is not superficial. Just as the meaning of an English sentence depends on the specific arrangement of letters, the function of a protein depends on the precise sequence of amino acids specified by the DNA that codes for it.
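
The code analogy can be made concrete in a few lines of Python. This is a minimal sketch: the codon table below is a six-entry excerpt of the standard 64-codon genetic code, and the input sequence is invented for illustration.

```python
# Translate a DNA coding sequence into a protein sequence.
# CODON_TABLE is a small excerpt of the standard genetic code
# (64 codons in full); the single letters are amino acid codes.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GCT": "A",  # alanine
    "GAA": "E",  # glutamate
    "TGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "TAA": "*",  # stop
}

def translate(dna: str) -> str:
    """Read three letters at a time and look up each codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":   # stop codon: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCTGAATGGAAATAA"))  # -> MAEWK
```

Rearrange the letters and the output protein changes or vanishes entirely, which is the sense in which sequence order, not chemistry alone, carries the information.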

Claude Shannon, the father of information theory, defined information mathematically as a reduction in uncertainty. By that measure, a genome packed with functional sequences carries an enormous quantity of information. But Shannon’s measure only captures one dimension of the problem. A random string of letters has high Shannon information, but it doesn’t mean anything.
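
Shannon’s definition fits in one line: given symbol probabilities p_i, the information per symbol is H = -Σ p_i log2(p_i). A short sketch makes the limitation concrete. A zeroth-order Shannon measure sees only letter frequencies, so even a trivially repetitive string scores the maximum 2 bits per base, just like random DNA:

```python
import math
import random
from collections import Counter

def shannon_bits_per_symbol(s: str) -> float:
    """H = -sum(p * log2(p)) over the observed symbol frequencies."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

random.seed(0)
random_dna = "".join(random.choice("ACGT") for _ in range(1000))
repetitive = "ACGT" * 250   # same length and letters, trivially ordered

print(f"random:     {shannon_bits_per_symbol(random_dna):.3f} bits/base")  # ~2.0
print(f"repetitive: {shannon_bits_per_symbol(repetitive):.3f} bits/base")  # 2.000
```

Neither number says anything about whether the sequence does anything.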

What biological systems display is something richer: specified complexity. The sequences aren’t just complex — they’re functional. They code for machines that work, regulatory networks that respond to signals, and developmental programs that build organisms from a single cell. This distinction matters enormously when asking whether undirected processes can produce the information life requires.

The Scale of the Problem

Consider what evolutionary theory needs in informational terms. The transition from single-celled organisms to the complex animal body plans that appear in the Cambrian period required vast quantities of new genetic information: new genes, new regulatory networks, new protein families with no apparent precursors.

Stephen Meyer, a philosopher of science at the Discovery Institute, made this point extensively in his 2004 paper published in the Proceedings of the Biological Society of Washington. Meyer argued that no current materialistic theory of evolution can account for the origin of the information necessary to build novel animal forms. The paper generated significant controversy — partly because it appeared in a peer-reviewed journal affiliated with the Smithsonian Institution — but the core question it raised has not gone away.

The issue isn’t just about where one gene came from. It’s about the integrated systems that genes participate in. A single new protein might require hundreds of specifically arranged amino acids. But that protein also needs to fold correctly, interact with other molecules in the right way, and be expressed at the right time and place through regulatory sequences that themselves carry information. The whole system is interdependent.
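
The combinatorics behind that claim are easy to check. With 20 amino acids available at each position, a protein of length n sits in a sequence space of 20^n possibilities. A back-of-envelope sketch (note that the size of the total space says nothing by itself about the fraction that is functional, a contested quantity taken up below):

```python
import math

# Size of amino-acid sequence space for a protein of length n:
# 20 possible residues per position -> 20**n sequences in total.
for n in (100, 150, 300):
    digits = n * math.log10(20)
    print(f"length {n}: 20^{n} ≈ 10^{digits:.0f} possible sequences")

# length 100: 20^100 ≈ 10^130
# length 150: 20^150 ≈ 10^195
# length 300: 20^300 ≈ 10^390
```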

Can Mutations and Natural Selection Create New Information?

The standard evolutionary answer is yes: random mutations provide the raw material, and natural selection preserves whatever works. Over long periods, this process accumulates beneficial changes and builds increasingly complex structures. It’s an elegant idea, and it clearly works for certain kinds of change — fine-tuning existing systems, adjusting enzyme efficiency, shifting coloration in a population.

But does it work for generating genuinely new functional information? That’s where the picture gets considerably more complicated.

Mutations are real, observable, and frequent. Most of them, however, are either neutral (no measurable effect) or harmful. Beneficial mutations — the kind that natural selection can actually work with — are relatively rare. And even when beneficial mutations occur, they tend to involve the modification or loss of existing information rather than the creation of something novel. Bacteria that develop antibiotic resistance, for instance, often do so by breaking an existing molecular pump or losing regulatory control of a gene — not by building a new system from scratch.

Cornell geneticist John Sanford has explored this problem in depth. In his book Genetic Entropy and in a computational study with Chase Nelson published in the volume Biological Information: New Perspectives (World Scientific, 2013), Sanford presented simulation results showing that even under selection, digital organisms experienced a net loss of genetic information over time. The accumulation of nearly neutral harmful mutations — too subtle for selection to efficiently remove — gradually degraded the genome. Sanford calls this process “genetic entropy” and argues it points in precisely the opposite direction from the information gains evolution requires.
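
The claimed dynamic is easy to illustrate with a toy model. The sketch below is a deliberately crude stand-in for the simulations described above, not the authors’ code, and every parameter value is invented: each individual accumulates mutations whose fitness costs are individually far too small for selection to see.

```python
import numpy as np

rng = np.random.default_rng(1)

POP_SIZE = 200
GENERATIONS = 500
MUT_RATE = 1.0    # expected new deleterious mutations per individual per generation
EFFECT = 0.0005   # each mutation multiplies fitness by (1 - EFFECT): "nearly neutral"

# Track only each individual's deleterious-mutation count.
mutations = np.zeros(POP_SIZE, dtype=int)

for gen in range(1, GENERATIONS + 1):
    fitness = (1.0 - EFFECT) ** mutations
    # Fitness-proportional (Wright-Fisher style) sampling of parents.
    parents = rng.choice(POP_SIZE, size=POP_SIZE, p=fitness / fitness.sum())
    mutations = mutations[parents] + rng.poisson(MUT_RATE, POP_SIZE)
    if gen % 100 == 0:
        mean_fitness = ((1.0 - EFFECT) ** mutations).mean()
        print(f"gen {gen:3d}: mean mutations {mutations.mean():5.1f}, "
              f"mean fitness {mean_fitness:.4f}")
```

With these values each mutation is effectively invisible to selection in a population of 200, so mutation counts climb at roughly the mutation rate and mean fitness drifts steadily downward. Whether real genomes sit in that parameter regime is exactly what the pushback below disputes.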

Genetic entropy is a serious claim, and it has drawn serious pushback. Thomas Schneider at the National Institutes of Health, for example, published a computational model called “ev” (Nucleic Acids Research, 2000) that demonstrated information gain in simulated binding sites through mutation and selection, and he argued that it answered the creationist objection directly. But critics of Schneider’s model note that it measures Shannon information in a pre-specified search space: the simulation knows what it’s looking for. Whether that captures what’s needed to explain the origin of genuinely novel biological structures is a different question.
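
The pre-specified-target critique can itself be made concrete with an even simpler toy, in the style of Dawkins’ “weasel” program (not a reimplementation of ev). Mutation and selection reliably drive the match score to its maximum, but only because the fitness function already contains the target:

```python
import random

random.seed(0)

TARGET = "ACGTACGTACGTACGT"   # chosen in advance: the fitness function rewards it
ALPHABET = "ACGT"

def fitness(seq: str) -> int:
    # The "information" being gained already sits here, inside TARGET.
    return sum(a == b for a, b in zip(seq, TARGET))

def mutate(seq: str, rate: float = 0.05) -> str:
    return "".join(random.choice(ALPHABET) if random.random() < rate else base
                   for base in seq)

best = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
generation = 0
while fitness(best) < len(TARGET):
    generation += 1
    # Keep the fittest of the parent plus 100 mutated offspring.
    best = max([best] + [mutate(best) for _ in range(100)], key=fitness)

print(f"matched {TARGET} after {generation} generations")
```

At 2 bits per matched base the run “gains” 32 bits, yet all 32 were written into TARGET before the first generation. Whether ev’s more sophisticated setup escapes that objection is the disputed point.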

Gene Duplication: A Way Forward?

One of the most commonly cited mechanisms for generating new genetic information is gene duplication. When a gene is accidentally copied during DNA replication, the organism ends up with two copies. One copy can continue performing its original function while the other is free to accumulate mutations and potentially develop a new function — a process called neofunctionalization.

This mechanism is real and well-documented. The globin gene family, which includes the genes for hemoglobin and myoglobin, appears to have arisen through a series of duplications from an ancestral gene. So gene duplication can clearly produce variation within existing gene families.

The question is whether duplication and divergence can explain the origin of entirely new types of genes — genes with no detectable homology to anything that came before. The genome is full of what are called “orphan genes” or “taxonomically restricted genes” — sequences found only in one lineage with no apparent evolutionary precursors. A 2009 study in Trends in Genetics noted that every newly sequenced genome reveals a significant fraction of genes with no known homologs. Where these genes came from remains, in the words of one group of researchers, “a major puzzle in evolutionary biology.”

Gene duplication doesn’t solve this puzzle. Duplicating and modifying an existing hemoglobin gene might give you a new variant of hemoglobin, but it doesn’t explain the origin of hemoglobin in the first place — let alone the origin of the circulatory system it operates within.

The Deeper Issue: Code and Language

Perhaps the most striking feature of biological information is that it operates through a code. The genetic code — the mapping between three-nucleotide codons and amino acids — is not determined by chemistry. There is no chemical reason why the codon GCU must specify the amino acid alanine. The assignment is arbitrary, in the same way that there’s no physical reason the letters C-A-T must refer to a small furry animal.

This is what philosophers call “semiotic” information — meaning that depends on convention, not physics. And the origin of semiotic systems is notoriously difficult to explain through purely physical processes. As origin-of-life researcher Bernd-Olaf Küppers has acknowledged, “the problem of the origin of life is clearly basically equivalent to the problem of the origin of biological information.”

The genetic code also depends on an elaborate translation apparatus (ribosomes, transfer RNAs, aminoacyl-tRNA synthetases), every component of which is itself encoded in DNA. This creates a chicken-and-egg problem: the code requires the machinery to read it, but the machinery is specified by the code. How such a system could bootstrap itself into existence, when each half presupposes the other, is one of the deepest open questions in biology.

What Mainstream Science Says

It’s important to represent the mainstream response fairly. Most evolutionary biologists do not consider the information problem a crisis for their field. They point to documented examples of gene duplication, horizontal gene transfer, exon shuffling, and de novo gene origination as mechanisms that can produce new genetic sequences. They argue that given hundreds of millions of years and large populations, these mechanisms are sufficient to account for the diversity of life.

Some researchers have also pushed back on the way creationists and intelligent design proponents frame the concept of “information.” They argue that Shannon information and “specified complexity” are being conflated, or that the probability calculations used to argue against natural information generation rely on unrealistic assumptions about the size of functional sequence space.

These are legitimate scientific arguments, and they deserve engagement rather than dismissal. The question is whether they fully account for what we observe — particularly the origin of hierarchically organized, interdependent systems that appear abruptly in the fossil record.
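
To see why the sequence-space objection matters, reduce it to arithmetic. If a process samples some number of sequences and a fraction f of sequence space is functional, the expected number of functional hits is samples × f. The values below are placeholders chosen to span the range of the debate, not measurements:

```python
from math import log10

SAMPLES = 10 ** 40   # illustrative: a generous count of sequences "tried" in history

# Two assumed values for f, the functional fraction of sequence space.
# Both are placeholders for the sake of the arithmetic, not measurements.
for label, log_f in [("sparse functional space", -77),
                     ("dense functional space", -10)]:
    expected_hits = log10(SAMPLES) + log_f
    print(f"{label}: expected functional hits ≈ 10^{expected_hits:.0f}")

# sparse functional space: expected functional hits ≈ 10^-37  (effectively never)
# dense functional space:  expected functional hits ≈ 10^30   (effectively guaranteed)
```

Everything turns on f, which is precisely the quantity the two sides estimate differently.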

Challenges and Research Frontiers

An honest assessment of this topic requires acknowledging what creation science still needs to work on.

First, the concept of “information” in biology needs more rigorous definition within the creationist framework. Critics have a point when they say that terms like “specified complexity” can be slippery if not carefully defined and measured. Developing quantitative, testable information metrics that clearly distinguish created from evolved systems would significantly strengthen the case.

Second, creation scientists need positive models — not just critiques of evolutionary mechanisms. What processes did generate the information in genomes, and when? If the information was front-loaded at creation, what predictions does that make about genomic structure? Some researchers have begun exploring these questions, but much more work is needed.

Third, the relationship between information loss (genetic entropy) and the observed diversity within created kinds (baraminology) needs further development. If genomes are consistently degrading, how do we explain the remarkable diversification that has occurred within many animal and plant families since creation? Are there mechanisms that generate variation without requiring new specified information? These are active research questions in creation biology.

Finally, the mainstream mechanisms for information gain (gene duplication, horizontal transfer, de novo origination) need more detailed creationist analysis. Simply asserting that these processes “don’t create new information” isn’t sufficient. The field needs rigorous, case-by-case studies that examine specific claimed examples and evaluate exactly what kind of information change occurred.

Where This Leaves Us

The information problem in evolution is not a gotcha argument. It’s a genuine scientific question about the adequacy of known natural mechanisms to explain one of life’s most remarkable features — the digital code at the heart of every living cell.

Both sides of this debate have work to do. Evolutionary biology needs to move beyond hand-waving about gene duplication and demonstrate, in specific detail, how undirected processes can generate the hierarchical, integrated information systems that characterize living organisms. And creation science needs to develop rigorous, quantitative models of biological information that make testable predictions.

What’s clear is that the question is far from settled. The origin of biological information remains one of the great unsolved puzzles in science — and it’s exactly the kind of question that deserves serious, well-funded research from every perspective.

Support Creation Research

Questions about the origin of biological information are precisely the kind of deep, foundational problems that creation science needs to tackle with rigorous, peer-reviewed research. But that research requires funding — for computational modeling, laboratory studies, and the careful analytical work that moves the conversation forward.

If you believe these questions matter, consider supporting the researchers working on them.
