What an AI Trained Only on Fossils Taught Paleontologists About Bones They Had Already Classified

The naturalist who catalogues a specimen does not always see what the specimen contains — and it has taken a machine, unburdened by prior assumption, to demonstrate this with some embarrassment to the profession.

This is not a comfortable observation to make about paleontology, a discipline built on the painstaking accumulation of expert judgment across more than two centuries of fieldwork, excavation, and comparative anatomy. The fossil record is not merely a collection of bones; it is an interpreted archive, organized by human minds working within the theoretical frameworks available to them at the time of study. Those frameworks have been refined continuously, and the history of paleontology is, in part, a history of reclassification — of specimens moved between genera, species reassigned, entire lineages reorganized as new evidence and new methods have demanded. What is novel about the present moment is not that reclassification is occurring, but that the instrument of revision is a system that has never held a bone, never developed an intuition about fossil morphology, and carries no theoretical commitments whatsoever about what it expects to find.

The Problem of the Expert Eye

Darwin understood, from his own experience as a naturalist, that the trained observer is not a neutral instrument. The eye that has examined thousands of specimens develops expectations, and those expectations — efficient and largely reliable — nonetheless introduce a systematic bias: the observed tends to confirm what the observer already believes likely. When a paleontologist encounters a small, gracile fossil fish in a museum drawer, the prior classification on the label exerts a gravitational pull. Reclassification requires active effort against this pull; it requires, in effect, seeing the specimen fresh. Machines do not struggle with this. They have no prior classifications to overcome, no accumulated theoretical commitments, no professional investment in the existing taxonomy.

Researchers at the University of Bath and the Natural History Museum in London have put this difference to systematic use. Deploying convolutional neural networks trained on morphometric data (the quantified geometry of fossil forms), they re-examined museum collections of fossil fish, with results that in several instances corrected the existing record. Specimens that had been catalogued as distinct species were identified by the algorithm as juveniles of already-known species. The morphological differences that human experts had interpreted as taxonomically significant were reinterpreted by the network as developmental, the product of growth stage rather than phylogenetic separation. The species count, in these cases, was too high. Some of what taxonomists had taken for diversity was ontogeny.
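
One way to make the juvenile-versus-new-species question concrete is an allometric check. This technique is swapped in here for illustration and is not taken from the Bath study. The check fits how a shape measurement changes with log centroid size across a known species' growth series, then asks whether the contested specimens fall on that growth trajectory. The Python sketch below rests on stated assumptions. The `shape_score` is any single hypothetical shape variable (for example, a principal-component score). All measurements are invented toy values. A band of `k` residual standard deviations stands in for a proper prediction interval.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def on_growth_trajectory(known, candidate, k=2.0):
    """known, candidate: lists of (centroid_size, shape_score) pairs (hypothetical).
    True if every candidate lies within k residual SDs of the known species'
    allometric line, i.e. shape regressed on log(size)."""
    if len(known) < 3:
        raise ValueError("need at least three known specimens")
    xs = [math.log(size) for size, _ in known]
    ys = [score for _, score in known]
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return all(abs(score - (a + b * math.log(size))) <= k * sd
               for size, score in candidate)

# Invented growth series for a known species: shape changes steadily with size.
KNOWN = [(10, 1.15), (20, 1.52), (40, 1.85), (80, 2.17), (160, 2.56)]

print(on_growth_trajectory(KNOWN, [(5, 0.81), (7, 0.98)]))  # True: small but on-trajectory
print(on_growth_trajectory(KNOWN, [(6, 1.30)]))             # False: off the growth line
```

Passing this check is consistent with a specimen being a growth stage of the known species; it does not prove it. Failing it may still mean an unusual individual rather than a new species.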

What Morphometrics Measures That the Eye Misses

Geometric morphometrics — the field that provides the quantitative foundation for these analyses — proceeds by placing landmarks on biological forms and measuring the spatial relationships among them with a precision that manual description cannot approach. Where a Victorian naturalist would describe a fish skull as “deep,” “robust,” or “laterally compressed” — qualitative assessments that introduce the naturalist’s own perceptual standards as an unacknowledged variable — morphometric analysis produces a coordinate matrix: the exact position of each landmark in three-dimensional space, comparable across specimens and amenable to statistical treatment.
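
Landmark coordinates only become comparable across specimens once differences in position, size, and orientation are removed. This step is usually done by Procrustes superimposition. Below is a minimal sketch for two-dimensional landmarks, assuming each specimen has the same number of landmarks recorded in the same order. It uses complex arithmetic to find the optimal rotation and ignores reflections. It computes a partial Procrustes distance, with both shapes scaled to unit centroid size. Real analyses align whole samples with generalized Procrustes analysis in dedicated software; this only shows the idea for a pair.

```python
import math

def normalize(landmarks):
    """Center a 2D landmark configuration and scale it to unit centroid size.
    landmarks: list of (x, y) tuples. Returns a list of complex numbers."""
    z = [complex(x, y) for x, y in landmarks]
    center = sum(z) / len(z)
    z = [p - center for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))  # centroid size
    return [p / size for p in z]

def procrustes_distance(a, b):
    """Partial Procrustes distance: remove location and scale from both
    configurations, rotate a onto b optimally, return the residual distance."""
    za, zb = normalize(a), normalize(b)
    s = sum(p.conjugate() * q for p, q in zip(za, zb))
    rot = s / abs(s) if abs(s) > 0 else 1  # unit complex number = best rotation
    return math.sqrt(sum(abs(q - rot * p) ** 2 for p, q in zip(za, zb)))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(5, 5), (5, 8), (2, 8), (2, 5)]        # same square, rotated, scaled, shifted
trapezoid = [(0, 0), (2, 0), (1, 1), (0, 1)]    # genuinely different shape

print(procrustes_distance(square, moved))      # close to 0, up to rounding
print(procrustes_distance(square, trapezoid))  # roughly 0.37
```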

A convolutional neural network trained on such matrices learns the structure of morphological space without being told which features to attend to. Through the gradient of its own training loss, it discovers which configurations of landmarks are diagnostic of which categories. It finds which shapes cluster together in the high-dimensional geometry of morphological variation. It also finds which shapes occupy unexpected positions, falling into clusters that the existing classification had not anticipated. The network, in this respect, performs a kind of blind archaeology on the museum collection, excavating structure that was always present in the data but invisible to classification schemes built around different assumptions.
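
The phrase "through the gradient of its own training loss" can be made concrete with a far smaller model than a convolutional network. The sketch below swaps in plain logistic regression, trained by batch gradient descent in pure Python on invented two-feature data. It illustrates the principle; it is not the architecture used in the studies described here. The learned weights show which feature the model found diagnostic without being told in advance.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on mean cross-entropy.
    X: list of feature vectors (hypothetical shape scores); y: 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of cross-entropy with respect to the logit
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Step against the gradient of the training loss
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Invented data: feature 0 separates the two groups, feature 1 is mostly noise.
X = [(-1.2, 0.3), (-0.8, -0.2), (-1.0, 0.1), (0.9, 0.2), (1.1, -0.3), (1.0, 0.0)]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(w)  # the weight on feature 0 dominates the weight on feature 1
```

A convolutional network stacks many learned layers, but its update rule is the same idea: a step against the gradient of a loss computed on labelled examples.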

The Neanderthal Problem

In 2022, a study published in PLOS ONE applied AI morphometric analysis to a different and more sensitive domain: the attribution of fossil bones to Neanderthals in museum collections. The results were, as the researchers carefully phrased it, suggestive of certain attribution errors in specimens that had been accepted elements of the Neanderthal fossil record for decades.

The sensitivity of this domain is not merely professional. The Neanderthal fossil record is thin — a few hundred specimens, distributed across dozens of sites spanning the breadth of Eurasia, covering a temporal range of several hundred thousand years. Every specimen carries disproportionate evidential weight. If an attributed specimen is in fact a modern human individual of unusual morphology, a Denisovan whose remains have been exposed to diagenetic alteration, or a representative of a lineage not yet recognized in the record, the error compounds through every study that has cited it. The bibliography of Neanderthal paleobiology is, in this sense, a network of dependencies; an error at the foundation propagates upward through the literature with the efficiency of any inherited trait.
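
The "network of dependencies" can be taken literally. Treat each study as a node that relies on specimens or earlier studies. A suspect attribution then affects everything reachable from it by following citations forward. A minimal breadth-first-search sketch follows; the specimen and study identifiers are hypothetical.

```python
from collections import deque

def affected_studies(cites, specimen):
    """Return every study that depends, directly or transitively, on `specimen`.
    cites: dict mapping each study to the items (specimens or earlier studies)
    it relies on."""
    # Invert the graph: item -> studies that cite it
    cited_by = {}
    for study, sources in cites.items():
        for src in sources:
            cited_by.setdefault(src, []).append(study)
    seen = set()
    queue = deque([specimen])
    while queue:
        item = queue.popleft()
        for study in cited_by.get(item, []):
            if study not in seen:
                seen.add(study)
                queue.append(study)
    return seen

# Hypothetical literature: study_4 inherits specimen_A indirectly via study_2.
CITES = {
    "study_1": ["specimen_A"],
    "study_2": ["study_1"],
    "study_3": ["specimen_B"],
    "study_4": ["study_2", "study_3"],
}
print(sorted(affected_studies(CITES, "specimen_A")))  # ['study_1', 'study_2', 'study_4']
```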

What the AI analysis offers, in this context, is not a definitive verdict but a flag: a systematic, reproducible, and theoretically neutral signal that certain attributions warrant re-examination. This is, in itself, a contribution of the first order. The discipline has not previously had a tool capable of surveying entire museum collections and producing ranked lists of specimens whose morphometric position in taxonomic space is anomalous relative to their assigned classification. It has such a tool now.
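
A ranked list of this kind can be produced in several ways. One simple version is assumed here for illustration; it is not taken from the PLOS ONE study or any museum program. It scores each specimen by how far it sits from the centroid of its own assigned class, relative to the nearest other class. The feature vectors stand in for any morphometric representation, such as flattened Procrustes-aligned coordinates or principal-component scores. The labels and values are invented.

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def rank_anomalies(specimens):
    """specimens: list of (specimen_id, assigned_label, feature_vector).
    Score = distance to own class centroid (leave-one-out) divided by distance
    to the nearest other class centroid. Returns (id, label, score) tuples
    sorted from most to least anomalous."""
    by_label = {}
    for sid, label, vec in specimens:
        by_label.setdefault(label, []).append((sid, vec))
    if len(by_label) < 2:
        raise ValueError("need at least two classes to compare against")
    centroids = {lab: centroid([v for _, v in members])
                 for lab, members in by_label.items()}
    results = []
    for sid, label, vec in specimens:
        others = [v for s, v in by_label[label] if s != sid]
        if not others:
            continue  # a single-specimen class cannot be scored this way
        own = math.dist(vec, centroid(others))
        nearest = min(math.dist(vec, c) for lab, c in centroids.items() if lab != label)
        results.append((sid, label, own / nearest if nearest > 0 else math.inf))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Invented two-feature data: N4 is labelled "neanderthal" but sits among "modern".
DATA = [
    ("N1", "neanderthal", (0.0, 0.0)), ("N2", "neanderthal", (0.2, 0.1)),
    ("N3", "neanderthal", (-0.1, 0.2)), ("N4", "neanderthal", (2.9, 3.1)),
    ("M1", "modern", (3.0, 3.0)), ("M2", "modern", (3.2, 2.9)),
    ("M3", "modern", (2.8, 3.1)),
]
ranked = rank_anomalies(DATA)
print(ranked[0][0])  # N4
```

A score above 1 marks a specimen that sits closer to another class than to its own. That is a flag for re-examination, not a reassignment. The leave-one-out centroid stops an anomalous specimen from pulling its own class centre toward itself.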

A review of The Origin of Humankind by Richard Leakey offers useful context on how human paleontology has historically managed the politics of reclassification. The field has never been entirely insulated from the personalities and theoretical commitments of its leading practitioners.

Chicago’s Digital Re-Examination

The Field Museum in Chicago, one of the largest natural history collections in the world with approximately 40 million specimens in its holdings, has initiated an AI-assisted re-examination of legacy fossil collections that extends this logic to an institutional scale. The program is part of a broader digitization effort, but its implications for the taxonomy of the existing record are substantial. A collection of this size contains specimens accessioned over more than a century. They were classified under dozens of different theoretical frameworks by researchers whose methodological standards varied considerably. The AI passes over all of this without discrimination, applying exactly the same method to a specimen catalogued in 1902 and to one catalogued last year.

This consistency is, in itself, a kind of intellectual hygiene. Human science advances by paradigm shift — the old framework is replaced by a new one, and the new practitioners are trained in the new framework while the old specimens sit in their drawers, never systematically re-examined under the new assumptions. The AI has no framework to shift. It applies the same morphometric analysis to everything in the collection and reports what it finds.

What Darwin Would Have Made of This

Darwin spent years in taxonomic work — years, famously, on barnacles. He understood the discipline from the inside, understood its capacity for error and its resistance to revision, understood the social and intellectual inertia that attached to established classifications. He was, throughout this period, developing the theoretical framework that would eventually dissolve the fixity of species altogether, replacing the static Linnaean taxonomy with a dynamic picture in which every classification was provisional, every species boundary a snapshot of a process in motion.

The AI applications now being brought to bear on fossil collections are, in this sense, tools built for a Darwinian world: instruments calibrated for a reality in which the categories are not fixed, the specimens are not simple, and the previous classification is not the last word. They extend no particular theory; they carry no phylogenetic commitments. They observe, measure, and report. The conclusions — when the morphometric evidence is sufficiently anomalous — are the profession’s to draw.

For those interested in how machine learning is reshaping the life sciences more broadly, Understanding Emergent Properties in AI: When Machines Surprise Us examines the recurring pattern in which AI systems discover structures their designers did not anticipate.

You Might Also Like

Apes, Angels and Victorians by William Irvine: When Science Found Its Bulldog