In a study published in the peer-reviewed journal BMC Biology, a team of European researchers has sequenced the complete genome of a 1918 influenza A (H1N1) virus recovered from the preserved lung tissue of an 18-year-old man who died in Zurich, Switzerland, during the first wave of the pandemic. It is the first complete genome from a European first-wave case and provides a clearer timeline of how the virus adapted to its human host.
Led by Dr. Verena Schünemann, the study combines archaeogenetics, historical pathology, and modern virology to understand not just the virus’s genetic structure, but how it was already changing in the early days of one of the deadliest pandemics in history.
The Lung That Told a Century-Old Story
The lung tissue came from a young man who died on July 15, 1918, just weeks before the influenza pandemic escalated into its most devastating second wave. His preserved autopsy sample had been stored at the University of Zurich’s medical collection, largely untouched for over a century.
Because RNA degrades quickly and the tissue had been stored in formalin, sequencing viral material from it was long considered nearly impossible. Formalin, a preservative commonly used in pathology, causes crosslinking that damages RNA, which makes genome reconstruction a major technical challenge.
To bypass these limitations, Schünemann’s team developed a new ligation-based RNA sequencing protocol designed specifically for degraded or ancient viral RNA. The method avoids toxic phenol and chloroform and retains information about RNA strand orientation, and it enabled the team to recover more than 98% of the viral genome at high quality.

Mutations That Changed the Course of a Pandemic
One of the study’s key findings is that several mutations linked to human adaptation were already present in the virus as early as July 1918.
The virus had:
- An aspartic acid at position 222 of the HA (hemagglutinin) protein (D222), which improves the virus’s ability to bind to α2–6-linked sialic acid receptors in the human upper respiratory tract. These receptors are critical for airborne transmission.
- Two amino acid changes in the NP (nucleoprotein), G16D and L283P, which help the virus evade the human antiviral protein MxA, a defense that blocks influenza viruses originating in birds.
These mutations contradict previous assumptions that the virus only became more dangerous in the fall of 1918. “The virus already showed signs of human-specific adaptation well before the second wave hit,” the researchers write.
Rewriting the Timeline of the Spanish Flu
Until this study, most high-quality genome sequences from the 1918 flu pandemic came from autumn or winter 1918, mainly in North America. That led researchers to believe that the most dangerous viral changes happened later in the pandemic. But this genome, recovered from the sample designated ZH1502, pushes that timeline back.
Two earlier sequences from Berlin, collected in late June 1918, still showed “avian-like” markers (G16 and L283 in NP), while the Zurich genome collected just three weeks later already had the human-adapted versions. This suggests a rapid evolutionary shift was underway across Europe at that time.
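The Berlin and Zurich comparison comes down to which amino acid sits at two NP positions. A minimal Python sketch of that check follows. The function names and the dictionary layout are illustrative assumptions, not the study's pipeline, and the only data encoded are the residues reported above (G16 and L283 for Berlin, the G16D and L283P changes for Zurich).

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    ancestral: str   # residue before the change, e.g. "G"
    position: int    # 1-based position in the protein
    derived: str     # residue after the change, e.g. "D"


def parse_substitution(label: str) -> Substitution:
    """Parse standard notation such as 'G16D' into its three parts."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def classify_markers(residues: dict[int, str], markers: list[str]) -> dict[str, str]:
    """Label each marker as avian-like, human-adapted, or other."""
    calls = {}
    for label in markers:
        sub = parse_substitution(label)
        observed = residues.get(sub.position)
        if observed == sub.derived:
            calls[label] = "human-adapted"
        elif observed == sub.ancestral:
            calls[label] = "avian-like"
        else:
            calls[label] = "other/unknown"
    return calls


NP_MARKERS = ["G16D", "L283P"]

# Residues at NP positions 16 and 283, as described in the article.
berlin_np = {16: "G", 283: "L"}
zurich_np = {16: "D", 283: "P"}

for name, residues in [("Berlin", berlin_np), ("Zurich", zurich_np)]:
    print(name, classify_markers(residues, NP_MARKERS))
```

A real analysis would read these residues from aligned protein sequences rather than from a hand-written dictionary.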
Whether this represents co-circulating viral strains or a swift mutation sweep isn’t yet certain. But it underscores how fast and early the 1918 flu was adapting—something that has major implications for how we think about the early stages of pandemics.
1918 vs. 2009: How Fast Do Flu Viruses Diversify?
Using phylogenetic analysis, the team compared the 1918 genome diversity with that of the 2009 H1N1 pandemic—a more recent flu outbreak with a rich sequencing record.
The researchers found that 1918 strains displayed more genetic divergence and diversity in key segments like PB2, PA, and HA than their 2009 counterparts. These segments are central to how flu viruses replicate and spread in the body.
More specifically, PB2 had the highest number of amino acid-altering mutations, suggesting rapid viral adaptation during the early pandemic phase. In 2009, PB2 was among the more stable segments.
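An "amino acid-altering" mutation is one that changes the encoded protein, as opposed to a silent change in the RNA. The sketch below shows one simple way to tell the two apart by translating aligned codons with the standard genetic code. The sequences are toy strings made up for illustration, not PB2 data, and the function name is hypothetical. It counts changed codons rather than individual nucleotide changes.

```python
from itertools import product

# Standard genetic code, with codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(reference: str, variant: str) -> tuple[int, int]:
    """Return (amino_acid_altering, silent) counts of differing codons.

    Assumes both sequences are in frame, gap-free, and of equal length.
    Codons containing characters other than A, C, G, T are skipped.
    """
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must be aligned, in-frame coding sequences")
    altering = silent = 0
    for i in range(0, len(reference), 3):
        ref_codon = reference[i:i + 3]
        var_codon = variant[i:i + 3]
        if ref_codon == var_codon:
            continue
        if ref_codon not in CODON_TABLE or var_codon not in CODON_TABLE:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[var_codon]:
            silent += 1
        else:
            altering += 1
    return altering, silent


# Toy example (hypothetical sequences): Met-Gly-Leu-Lys vs Met-Asp-Pro-Lys.
toy_reference = "ATGGGTCTGAAA"
toy_variant = "ATGGATCCGAAG"
print(count_codon_changes(toy_reference, toy_variant))  # (2, 1)
```

Published comparisons of this kind typically rely on phylogenetic methods rather than raw pairwise counts, so this only illustrates the distinction itself.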
This higher diversity in 1918 may explain the virus’s sharp increase in pathogenicity between the first and second waves. The team notes that viral reassortment could also be playing a role, but current data are too limited to confirm this.
The Technology That Made This Possible
To retrieve usable RNA from a 107-year-old lung stored in formalin, the team created a ligation-based sequencing protocol that:
- Works without hazardous chemicals like phenol or chloroform.
- Recovers short and highly degraded RNA fragments.
- Retains strand orientation, allowing scientists to distinguish viral replication patterns.
- Minimizes data loss during rRNA depletion.
This protocol outperformed traditional random hexamer-based sequencing methods, especially in ancient or degraded samples, and could now unlock tens of thousands of medical samples stored in pathology museums around the world.
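Strand orientation matters because the influenza A genome is negative-sense RNA, while the mRNA and cRNA produced during infection are positive-sense. The sketch below illustrates the idea with exact string matching. It assumes each read is reported in the orientation of the original RNA molecule, uses a made-up toy reference, and stands in for a real read aligner, so it should not be read as the study's method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_strand(read: str, positive_reference: str) -> str:
    """Guess which strand a read came from.

    positive_reference is the coding (mRNA-sense) sequence. A read that
    matches it directly is positive-sense; a read that matches only after
    reverse-complementing is negative-sense, like genomic vRNA.
    """
    if read in positive_reference:
        return "positive-sense (mRNA/cRNA-like)"
    if reverse_complement(read) in positive_reference:
        return "negative-sense (vRNA-like)"
    return "unassigned"


# Toy reference and reads, invented for illustration only.
toy_reference = "ATGGCATTCGGATACCGTAA"
toy_reads = ["CATTCGGATA", "CGGTATCC", "TTTTTTTT"]

for read in toy_reads:
    print(read, "->", assign_strand(read, toy_reference))
```

Without strand information, the first two reads would look equivalent, and the ratio of genomic to replication-associated RNA could not be estimated.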
Could Other Pandemics Be Hiding in Formalin Jars?
The study opens the door for wider use of historical tissue archives to study other pathogens. The researchers emphasize that formalin-fixed medical collections—long thought to be scientifically unusable—might now offer genomic insights into diseases like tuberculosis, HIV, or even pre-antibiotic bacterial infections.
Larger collections such as Vienna’s Narrenturm (with over 35,000 wet specimens) or pathology institutes in Paris, London, and Berlin could hold forgotten viral genomes that help scientists trace how pathogens jumped species, evolved, or gained resistance to treatments.
This research also shows that cross-disciplinary work—combining medical history, molecular biology, and advanced sequencing—is essential to piece together the early steps of pandemics.