Discovery of antimicrobial peptides in the global microbiome with machine learning
Today, we are sharing a significant research article published in the journal Cell. The study used a machine learning-based approach to predict nearly one million novel antimicrobial peptides (AMPs) from the global microbiome, drawing on 63,410 metagenomes and 87,920 high-quality prokaryotic genomes. The result is a comprehensive, open-access resource named AMPSphere, containing 863,498 non-redundant candidate AMPs (c_AMPs). The team synthesized 100 predicted peptides for experimental validation and found that 79 exhibited antibacterial activity in vitro, with 63 active against ESKAPEE pathogens. Further work using structural biology, biochemistry, and murine infection models indicated that these peptides act by disrupting bacterial membranes. Notably, some lead compounds showed in vivo anti-infective efficacy comparable to clinical antibiotics. This work reveals the vast diversity of AMPs within the microbiome, provides a novel molecular library for combating antibiotic resistance, and demonstrates the potential of artificial intelligence to accelerate antimicrobial drug discovery.

01 Research Background
Antibiotic resistance has become a global health crisis, directly contributing to approximately 1.27 million deaths in 2019. The problem is particularly severe for Gram-negative bacteria. Antimicrobial Peptides (AMPs), a class of short peptides widely present in nature, are considered strong candidates for next-generation antibiotics due to their unique membrane-targeting mechanisms and low propensity for inducing resistance. However, traditional AMP discovery methods, such as culture-based screening or rational design, are inefficient. Known AMPs are primarily derived from model organisms, representing only a small fraction of microbial diversity. Although recent studies have utilized metagenomics to mine AMPs from specific niches like the human gut, their scope remains limited. Crucially, because small open reading frames (smORFs) are often overlooked in genome annotation, the vast majority of AMP resources within microbiomes remain untapped. This study aims to systematically explore the hidden treasure trove of AMPs in the global microbiome (covering environmental, plant, animal, and human-associated habitats) by integrating machine learning with large-scale metagenomic data, to discover structurally novel and highly active antimicrobial lead compounds.
02 Innovative Highlights
Methodological Innovation: First large-scale application of machine learning for AMP prediction across the global microbiome.
The research team used Macrel, a machine learning tool they developed themselves (a random forest-based algorithm that favors precision over recall), to screen over 4.5 billion predicted smORFs. Unlike previous studies focused on specific habitats (e.g., the human gut), this analysis covered samples from 72 different habitats (including soil, ocean, and the human body), vastly expanding the scope of AMP discovery.
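To make the precision-over-recall trade-off concrete, here is a minimal sketch in Python. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper or from Macrel's benchmarks.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical classifier settings evaluated on the same 200 true AMPs.
# A strict threshold calls fewer peptides, but most calls are correct.
strict = precision_recall(tp=80, fp=5, fn=120)
# A lenient threshold recovers more true AMPs at the cost of many false calls.
lenient = precision_recall(tp=160, fp=90, fn=40)

print(f"strict:  precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"lenient: precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

When billions of candidate smORFs are screened and only a small number can be synthesized, a strict setting of this kind keeps the share of false calls among the predictions low, even though many true AMPs are missed.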
Resource Innovation: Construction of the most comprehensive AMP resource library to date: AMPSphere.
AMPSphere contains 863,498 non-redundant c_AMPs, and 91.5% of these sequences have no homologs in existing databases (e.g., DRAMP, APD), revealing immense novelty. The resource provides not only peptide sequences but also their source habitat, genomic context, physicochemical properties (e.g., net charge, hydrophobicity), and predicted secondary structures. It is openly accessible to researchers worldwide via a user-friendly website.
Validation Strategy Innovation: High-throughput synthesis and multi-dimensional functional validation.
To assess prediction reliability, the team synthesized 100 representative c_AMPs (including 50 high-confidence predictions and 50 randomly selected ones) and conducted systematic in vitro, ex vivo, and in vivo experiments. Validation showed that 79% of the synthesized peptides possessed antibacterial activity, far more than would be expected from screening random sequences, supporting the precision of the machine learning model.
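As a rough way to gauge the uncertainty around a 79-out-of-100 hit rate, the sketch below computes a 95% Wilson score interval in Python. This is an illustration added here, not an analysis from the paper, and it treats the 100 peptides as a simple random sample, which is an assumption given that half were chosen as high-confidence predictions.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 79 of 100 synthesized peptides were active in vitro (figures from the article).
low, high = wilson_interval(79, 100)
print(f"hit rate 0.79, 95% Wilson interval: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval spans roughly 70% to 86%, so the conclusion that most predicted peptides are active does not hinge on the exact count.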
Mechanistic Insight Innovation: Revealing a new association between AMP density and bacterial transmissibility.
By analyzing c_AMP density (ρAMP) within microbial species, the researchers observed an intriguing pattern: in human gut and oral microbiomes, bacterial species with lower ρAMP were more easily transmitted between mother and infant. This suggests that AMP production may influence colonization resistance and competition between microbial populations, offering new insights for microbial ecology.
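The idea of a per-species AMP density can be sketched as follows. The definition used here, c_AMP genes per megabase of assembled sequence, is an assumption made for illustration, and the species names and counts are hypothetical placeholders rather than data from the study.

```python
def amp_density(n_camps: int, assembled_bp: int) -> float:
    """c_AMP genes per megabase pair of assembled sequence (assumed definition)."""
    if assembled_bp <= 0:
        raise ValueError("assembled_bp must be positive")
    return n_camps / (assembled_bp / 1_000_000)


# Hypothetical species: (number of c_AMPs, assembled base pairs).
species = {
    "species_A": (12, 4_000_000),
    "species_B": (3, 6_000_000),
    "species_C": (20, 5_000_000),
}

densities = {name: amp_density(n, bp) for name, (n, bp) in species.items()}

# Rank from lowest to highest density; per the article, lower-density species
# were the ones more readily transmitted between mother and infant.
for name, rho in sorted(densities.items(), key=lambda item: item[1]):
    print(f"{name}: {rho:.2f} c_AMPs per Mbp")
```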
03 Results and Discussion
3.1 Construction of AMPSphere and Basic Characteristics of c_AMPs
Through smORF prediction and Macrel screening of massive metagenomic and genomic datasets, the study ultimately obtained 863,498 non-redundant c_AMPs. These c_AMPs have an average length of 37 amino acids and exhibit typical AMP physicochemical features: positively charged (average +4.7), high isoelectric point (pI ~10.9), and amphipathicity. Sequence cluster analysis revealed that nearly half of the c_AMPs are "singletons," and most are accessory genomic elements in microbial populations, indicating high habitat specificity and strain specificity.
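The physicochemical features mentioned above can be approximated with a few lines of Python. This is a simplified sketch: the net charge counts only Lys/Arg as +1 and Asp/Glu as -1 (ignoring histidine, the termini, and pH effects), the hydrophobic residue set is one common choice among several, and the example sequence is a hypothetical peptide, not one from AMPSphere.

```python
# Simplified residue classes (assumptions, see the note above).
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AILMFVWY")


def peptide_features(sequence: str) -> dict[str, float]:
    """Length, approximate net charge near neutral pH, and hydrophobic fraction."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return {
        "length": len(seq),
        "net_charge": charge,
        "hydrophobic_fraction": hydrophobic_fraction,
    }


# Hypothetical cationic, amphipathic-style peptide used only as an example.
features = peptide_features("KKLLKKLLKKLLKK")
print(features)
```

A sequence like this one, rich in lysine and leucine, scores as strongly cationic with a substantial hydrophobic fraction, which is the general profile the article describes for c_AMPs.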
3.2 Evolutionary Origin and Genomic Context of c_AMPs
Analysis indicated that approximately 7% of c_AMPs are homologous to full-length proteins in the GMGCv1 database, with 27% sharing start codons, suggesting some c_AMPs may originate from premature termination or gene truncation. Genomic synteny analysis found that c_AMP genes are frequently flanked by conserved functional units like ribosomal protein genes, supporting their generation and retention as independent functional units during evolution. Compared to known encrypted peptides (EPs), c_AMPs in AMPSphere show distinct differences in amino acid composition and structural preferences, indicating they represent a unique family of AMPs.

Figure 1. The genome context of c_AMPs shows a preference for neighborhoods containing ribosome assembly proteins.
3.3 Experimental Validation: Highly Active In Vitro Antibacterial Effects
Testing of 100 synthetic peptides showed that 79 c_AMPs effectively inhibited bacterial growth in vitro. Among these, 63 were active against ESKAPEE pathogens (e.g., Acinetobacter baumannii, drug-resistant Staphylococcus aureus), with minimum inhibitory concentrations (MIC) as low as 1-4 µM, comparable to known highly effective AMPs. Importantly, these active peptides showed no toxicity to human cells (e.g., HEK293) and were non-hemolytic, demonstrating good selectivity.

Figure 2. Amino acid composition, structure, antimicrobial activity, and mechanism of action of c_AMPs.
3.4 Mechanism of Action: Membrane Targeting and Structural Properties
Mechanistic studies showed that active c_AMPs act primarily by disrupting bacterial membrane structure. Outer membrane permeabilization assays (NPN uptake) and cytoplasmic membrane depolarization assays (DiSC3(5) fluorescence) revealed that these peptides increase outer membrane permeability and rapidly depolarize the cytoplasmic membrane. Circular dichroism spectroscopy further indicated that many active peptides tend to form α-helical or β-sheet structures in membrane-mimetic environments, consistent with their amphipathic character, which facilitates interaction with membrane lipids.
3.5 In Vivo Efficacy Validation: Protective Effects in a Murine Infection Model
Ten lead c_AMPs were selected for testing in a mouse skin abscess infection model. Several peptides (e.g., lachnospirin-1, enterococcin-1) significantly reduced bacterial load at the infection site (by 3-4 log units) with a single dose, achieving effects comparable to polymyxin B. No weight loss or significant toxicity was observed in the mice during the experiment, supporting the in vivo anti-infective efficacy and safety of these peptides.
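For readers less used to log units, the short Python sketch below converts bacterial counts into a log10 reduction and the corresponding fold change. The CFU values are hypothetical round numbers chosen to illustrate a 3-log reduction; they are not measurements from the study.

```python
import math


def log_reduction(cfu_control: float, cfu_treated: float) -> float:
    """log10 reduction in bacterial load relative to an untreated control."""
    if cfu_control <= 0 or cfu_treated <= 0:
        raise ValueError("CFU counts must be positive")
    return math.log10(cfu_control / cfu_treated)


# Hypothetical counts: untreated abscess vs. peptide-treated abscess.
reduction = log_reduction(cfu_control=1e7, cfu_treated=1e4)
fold_change = 10 ** reduction

print(f"log10 reduction: {reduction:.1f}, fold reduction: {fold_change:.0f}x")
```

A reduction of 3 to 4 log units therefore corresponds to a 1,000- to 10,000-fold drop in viable bacteria.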

Figure 3. Anti-infective activity of AMPs in preclinical animal model.
04 Conclusion and Future Perspectives
This study successfully utilized a machine learning approach to mine a near-million-scale antimicrobial peptide resource library, AMPSphere, from the global microbiome, and confirmed its immense development potential through rigorous experimental validation. This work not only provides a "digital gold mine" freely available to researchers worldwide, injecting new vitality into antimicrobial drug development, but, more importantly, establishes a new paradigm of "data-driven natural product discovery." Looking ahead, this research can be further advanced in several directions: first, functional screening and structural optimization of other high-potential c_AMPs within AMPSphere; second, employing synthetic biology to achieve efficient expression in heterologous hosts, addressing large-scale production challenges; finally, in-depth exploration of the ecological functions of AMPs in microbial population dynamics and host-microbe interactions. In conclusion, this study powerfully demonstrates the immense value of combining artificial intelligence and metagenomics in addressing the global challenge of antibiotic resistance, providing a key to unlocking the door to next-generation antimicrobial therapies.
Original Article:
Santos-Júnior CD, Torres MDT, Duan Y, Rodríguez Del Río Á, Schmidt TSB, Chong H, Fullam A, Kuhn M, Zhu C, Houseman A, Somborski J, Vines A, Zhao XM, Bork P, Huerta-Cepas J, de la Fuente-Nunez C, Coelho LP. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell. 2024 Jul 11;187(14):3761-3778.e16.
https://doi.org/10.1016/j.cell.2024.05.013















