AI Unlocks the Code for Peptide Discovery
Peptides, the "small molecules of life" composed of amino acid chains, have emerged as a critical tool for addressing intractable medical challenges, including metabolic diseases, cancer, and drug-resistant infections, thanks to their distinctive strengths: high activity, high target selectivity, low toxicity, and strong druggability. From the global market boom in GLP-1 peptide drugs to antimicrobial peptides combating the "superbug" crisis, peptide therapeutics have become one of the most promising growth segments in biomedicine.
Yet peptide discovery was long constrained by a formidable "code lock": the near-infinite theoretical sequence space and the inefficient trial-and-error paradigm of traditional research made finding druggable peptide molecules like "looking for a needle in a haystack." Today, the rise of artificial intelligence (AI) acts as a master key, unlocking the core codes of peptide discovery across target analysis, sequence generation, structural optimization, and druggability validation. It drives a paradigm shift from "empirical blind trial" to "rational intelligent manufacturing," accelerating the journey of once-distant novel peptide drugs from lab to clinic.
The Code Dilemma: Twofold Locked Challenges in Traditional Peptide Discovery
The therapeutic value of peptides resides in the arrangement of their amino acid sequences. A 30-mer peptide composed of the 20 natural amino acids has 20³⁰ (roughly 10³⁹) possible sequences, yet only a tiny fraction combine high activity, stability, and low toxicity. For decades, traditional peptide discovery has been trapped in two core dilemmas, like two unbreakable code locks:
First Lock: The Spatial Confinement of Sequence Exploration
Traditional discovery relies on natural peptide extraction, homologous sequence modification, or random library screening. It is confined to the "similarity neighborhood" of known peptides and cannot break free from natural sequence frameworks. Many evolutionarily distant, structurally novel, and functionally superior peptides remain undiscovered, especially those aimed at "undruggable" targets: many disease-related proteins lack stable structures, so traditional structure-based screening cannot reach them, which severely narrows the scope of exploration.
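To put the size of this sequence space in perspective, the short Python sketch below computes the number of possible sequences for a given length and estimates how long exhaustive screening would take. The screening rate passed to `years_to_screen` is a purely hypothetical figure chosen for illustration, not a number from any real platform.

```python
# Size of the theoretical sequence space for a linear peptide built
# from the 20 canonical amino acids: alphabet_size ** length.
ALPHABET_SIZE = 20


def sequence_space(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


def years_to_screen(length: int, peptides_per_day: float) -> float:
    """Years needed to test every sequence at a fixed (hypothetical) daily rate."""
    return sequence_space(length) / peptides_per_day / 365.0


if __name__ == "__main__":
    print(f"{sequence_space(30):.3e}")  # prints 1.074e+39
    # Hypothetical throughput of one million peptides per day.
    print(f"{years_to_screen(30, 1e6):.1e} years")
```

Even at an unrealistically high throughput, exhaustive enumeration is out of reach, which is why the search has to be guided rather than brute-forced.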
Second Lock: The Balancing Act of Druggability Optimization
Peptides are prone to protease degradation and tend to have short half-lives and poor membrane permeability. Traditional methods struggle to balance activity, stability, pharmacokinetics, and safety, and many highly active peptides fail in development because of insufficient druggability. In particular, peptides with a molecular weight above 2000 Da face persistent transdermal absorption challenges, a major barrier to innovative peptide formulation.
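As a rough illustration of the 2000 Da threshold mentioned above, the following sketch estimates the molecular weight of an unmodified linear peptide from its sequence. It assumes standard average residue masses for the 20 canonical amino acids and ignores modifications, cyclization, and counter-ions. The function names are illustrative, and the check only flags size; it does not predict permeability.

```python
# Approximate average residue masses (Da): each amino acid minus the
# water lost on peptide-bond formation. Canonical residues only.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # added once for the free N- and C-termini
TRANSDERMAL_LIMIT_DA = 2000.0  # threshold quoted in the text


def peptide_mass(sequence: str) -> float:
    """Estimated average molecular weight of a linear, unmodified peptide."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if not sequence or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def exceeds_transdermal_limit(sequence: str) -> bool:
    """True if the estimated mass is above the 2000 Da threshold."""
    return peptide_mass(sequence) > TRANSDERMAL_LIMIT_DA


if __name__ == "__main__":
    for seq in ("GG", "A" * 10, "A" * 30):
        print(len(seq), round(peptide_mass(seq), 2), exceeds_transdermal_limit(seq))
```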
These two locks made peptide discovery arduous, until AI arrived to break the deadlock and open a new era.
The AI Master Key: Four Core Technologies Unlocking the Full Peptide Discovery Pipeline
Leveraging deep learning, protein language models, generative algorithms, and molecular dynamics simulations, AI treats peptide sequences as a "biological language" and translates structure–function relationships into computable mathematical models. Like a key precisely matching a lock, AI overcomes traditional limitations and supports a full-chain intelligent platform covering target identification → sequence generation → structural optimization → druggability prediction → experimental validation, making peptide discovery more efficient, accurate, and reproducible.
Key 1: Protein Language Models Decode Peptide "Biological Grammar"
Amino acid sequences follow a biological grammar of their own, and protein language models (e.g., ProtT5, ProteoGPT, PepMLM) are the tools that learn this grammar. Built on the Transformer architecture, these models learn coevolutionary rules and structure–function relationships from massive protein and peptide datasets. Without relying on homologous sequences, they can predict activity, stability, and toxicity, breaking the limits of traditional similarity-based screening and enabling the discovery of distant, novel, high-potency peptides.
Unlike conventional structure-dependent methods, models such as PepMLM design peptide binders directly from amino acid sequences, which makes targeting "undruggable" proteins feasible. Many cancer-related and neurodegenerative disease-related proteins lack stable folds, yet protein language models can generate peptides that bind them and disrupt toxic assemblies. For example, PepMLM designed peptides that targeted toxic proteins involved in cancer and Huntington’s disease in laboratory tests.
The HMD-AMP model developed by The Chinese University of Hong Kong and Shenzhen Institute of Advanced Technology uses protein language models to identify distant antimicrobial peptides from millions of sequences. These peptides show stronger activity against drug-resistant bacteria than clinical antibiotics, with low toxicity and a low propensity to induce resistance.
Key 2: Generative AI Creates Ideal Peptide Sequences from Scratch
If protein language models decode existing codes, generative AI (diffusion models, GANs, reinforcement learning) creates entirely new ones. It changes peptide design from "screening existing sequences" to "generating customized sequences on demand." Constraints such as high activity, stability, low immunogenicity, and synthetic feasibility are set up front, activity and druggability serve as reward functions, and sequences are optimized iteratively toward clinically desirable peptides, which drastically improves efficiency.
In cyclic and macrocyclic peptide design, invariant diffusion models (e.g., CycleRFdiffusion) generate peptide backbones that fit target binding pockets with atomic precision, boosting cyclic peptide design efficiency by over 100-fold and overcoming the poor stability of linear peptides.
Westlake University’s TransSAFP model uses generative AI to design self-assembling antimicrobial peptides that kill drug-resistant bacteria and eradicate biofilms without inducing resistance. ProteinQure’s platform, combining physical modeling and generative machine learning, developed PQ203, a peptide–drug conjugate targeting SORT1 in advanced solid tumors, with a binding affinity of 0.14 nM. It is now in Phase I clinical trials, making it one of the first AI-designed peptide conjugates to reach clinical development.
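The idea of iteratively optimizing a sequence against a reward function can be sketched with a deliberately simple stand-in. The example below uses greedy hill climbing over random point mutations, which is not a diffusion model, GAN, or reinforcement-learning agent, and `toy_reward` is a hypothetical scorer with arbitrary targets (net charge +4, 40% hydrophobic residues) rather than a trained activity or druggability model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")


def toy_reward(seq: str) -> float:
    """Hypothetical stand-in for a learned activity/druggability scorer."""
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    hydrophobic_fraction = sum(a in HYDROPHOBIC for a in seq) / len(seq)
    # Highest (zero) at net charge +4 and 40% hydrophobic residues.
    return -abs(charge - 4) - 10 * abs(hydrophobic_fraction - 0.4)


def optimize(seed_seq: str, reward, steps: int = 2000, rng=None):
    """Greedy hill climbing: keep a point mutation if the reward does not drop."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    best, best_score = seed_seq, reward(seed_seq)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        score = reward(candidate)
        if score >= best_score:  # accept ties to keep moving across plateaus
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    start = "G" * 20
    best, score = optimize(start, toy_reward)
    print(start, toy_reward(start))
    print(best, score)
```

In a real pipeline the reward would come from trained predictors and the proposal step from a generative model; only the propose, score, and keep-or-discard loop carries over.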
Key 3: High-Precision Structure Prediction Unlocks Peptide–Target Interaction Codes
Peptide activity depends on specific 3D conformations, and the binding mode between peptide and target is the core code of therapeutic efficacy. AI structure prediction tools including AlphaFold 3, RoseTTAFold, and PEP-FOLD3 can achieve near-experimental accuracy (RMSD < 1 Å) in predicting peptide and peptide–target complex structures. They reveal binding interfaces, key residues, and interaction mechanisms, guiding structural optimization with atomic precision and cutting much of the time cost of traditional experimental structure determination.
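For readers unfamiliar with the RMSD figure quoted above, the sketch below shows how the metric is computed between a predicted and a reference structure. It is a minimal version that assumes the two coordinate lists are already superimposed and list the same atoms in the same order. Real comparisons usually perform an optimal alignment first, which is omitted here, and the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation (same units as the input, e.g. angstroms).

    Assumes both structures are pre-aligned and atoms are matched by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Made-up coordinates: every predicted atom is displaced by 0.5 in y.
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    predicted = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0), (3.0, 0.5, 0.0)]
    value = rmsd(predicted, reference)
    print(f"RMSD = {value:.2f}, below 1 A threshold: {value < 1.0}")
```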
In the post-AlphaFold era, structure prediction has become foundational to peptide drug discovery. MoleculeMind used AlphaFold-derived technology to resolve the scorpion toxin LqhαIT–sodium channel complex and applied its proprietary ComplexDDG algorithm to design 101 candidates in hours. One optimized peptide showed doubled efficacy and low mammalian toxicity, closing the full loop of "mechanism analysis → AI design → experimental validation."
In GLP-1 optimization, Shanghai Jiao Tong University used AI structure prediction to engineer disulfide bonds and side-chain modifications, developing an ultra-long-lasting agonist with a half-life three times that of semaglutide, which greatly improves patient compliance.
Key 4: Multi-Property Druggability Prediction Avoids "Code Errors" in Development
Peptide druggability involves pharmacokinetics, toxicology, and physicochemical properties. Traditional methods require extensive experimental testing and often lead to late-stage failures, with highly active peptides abandoned because of instability or toxicity. AI multi-property prediction models simultaneously assess half-life, metabolic stability, cytotoxicity, immunogenicity, solubility, and membrane permeability, eliminating low-quality molecules early and significantly de-risking development, like adding a built-in code validation step.
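The early-elimination step described above can be pictured as a simple triage over predicted properties. In the sketch below, the `PredictedProfile` fields, the cut-off values, and the candidate numbers are all hypothetical placeholders. A real platform would obtain the predictions from trained models and set thresholds per program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedProfile:
    """Model-predicted properties for one candidate (hypothetical fields)."""
    name: str
    half_life_h: float        # predicted plasma half-life, hours
    toxicity_prob: float      # predicted probability of cytotoxicity, 0..1
    solubility_mg_ml: float   # predicted aqueous solubility
    permeability: float       # predicted membrane permeability score, 0..1


# Illustrative cut-offs only, not values from the article.
MIN_HALF_LIFE_H = 2.0
MAX_TOXICITY_PROB = 0.2
MIN_SOLUBILITY_MG_ML = 1.0
MIN_PERMEABILITY = 0.5


def failed_criteria(p: PredictedProfile) -> list:
    """Names of the criteria this candidate fails (empty list means it passes)."""
    failures = []
    if p.half_life_h < MIN_HALF_LIFE_H:
        failures.append("half-life")
    if p.toxicity_prob > MAX_TOXICITY_PROB:
        failures.append("toxicity")
    if p.solubility_mg_ml < MIN_SOLUBILITY_MG_ML:
        failures.append("solubility")
    if p.permeability < MIN_PERMEABILITY:
        failures.append("permeability")
    return failures


def triage(profiles):
    """Split candidates into passing names and a {name: failures} map."""
    passed, rejected = [], {}
    for p in profiles:
        failures = failed_criteria(p)
        if failures:
            rejected[p.name] = failures
        else:
            passed.append(p.name)
    return passed, rejected


if __name__ == "__main__":
    candidates = [
        PredictedProfile("pep-A", 6.0, 0.05, 5.0, 0.7),
        PredictedProfile("pep-B", 0.5, 0.05, 5.0, 0.7),
        PredictedProfile("pep-C", 8.0, 0.60, 0.2, 0.9),
    ]
    print(triage(candidates))
```

Recording which criteria failed, rather than only a pass or fail flag, keeps the rejected candidates useful as starting points for re-optimization.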
The peptide large-model platform jointly built by Sunno Health and Huawei Cloud draws on the computing power of the Pangu model, enabling peptide–target binding prediction with a correlation of r > 0.6 and expanding virtual libraries to millions of entries. It reduces synthetic and experimental costs while improving preclinical candidate (PCC) success rates.
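The source does not say how the r value is defined. Assuming it is a Pearson correlation between predicted and experimentally measured binding values, it could be computed as in the sketch below. The data points are invented solely to exercise the function.

```python
import math


def pearson_r(predicted, measured) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_m)


if __name__ == "__main__":
    # Invented example values (arbitrary units).
    predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
    measured = [2.0, 1.0, 4.0, 3.0, 5.0]
    r = pearson_r(predicted, measured)
    print(f"r = {r:.2f}, above 0.6: {r > 0.6}")
```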
Notably, AI has also begun to break through long-standing barriers in transdermal, oral, and blood–brain barrier delivery. Kanion Pharmaceutical, using an AI transdermal prediction model, identified highly permeable peptide candidates and developed KYS2301 gel, overcoming the transdermal challenge for peptides above 2000 Da. It has become one of the world’s first AI-designed peptide drugs to enter clinical trials.
Furthermore, when combined with molecular dynamics simulations, AI enables in-depth analysis of the thermodynamic and kinetic characteristics of peptide–target binding, providing more precise guidance for druggability optimization and compensating for the limitations of pure AI methods in dissecting complex molecular mechanisms.
AI unlocking the code of peptide discovery is only the beginning. With continuous iteration of AI algorithms, deep integration of multimodal data, and synergistic application with physical modeling, peptide drug discovery will enter a new era of “on-demand design”, unlocking more undiscovered peptide codes and driving major breakthroughs in the biomedical industry.
In the future, multimodal large models will become a central focus, integrating peptide sequences, structures, functions, literature, and clinical data for joint training to enable end-to-end intelligent development from target identification to clinical candidate molecules, with more accurate predictions and more efficient generation. Non-natural amino acids will be deeply integrated with AI, allowing models to draw on over 2,000 non-natural building blocks to design “super peptides” that combine high stability, high potency, and oral or transdermal deliverability, breaking through the constraints of natural peptides. Cyclic and macrocyclic peptides will enter their golden age, with AI diffusion models enabling efficient design of diverse cyclic molecules that exhibit antibody-like affinity and superior druggability, making them a major focus in oncology and anti-infective research. AI-powered automated “lights-out factories” will gradually be deployed, enabling an unmanned, closed-loop workflow of “AI design → robotic synthesis → automated screening → AI re-optimization.” Such systems could screen thousands of peptides daily and advance multiple pipelines each month, further shortening development cycles and reducing R&D costs.
Meanwhile, AI-driven peptide development still faces emerging challenges, including the scarcity of high-quality labeled data and difficulties in accommodating modified amino acids and non-canonical cyclization. These issues must be addressed step by step through industry–academia–research collaborative innovation. Going forward, the synergistic fusion of AI and physical modeling will be key to overcoming these hurdles, further improving the accuracy and efficiency of peptide discovery.
Lijinjie, Email: lijinjie@dilunbio.com