“Discovering highly potent antimicrobial peptides with deep generative model HydrAMP”
Today, we share a research article led by Ewa Szczurek's team, published in Nature Communications. This study developed a deep generative model named HydrAMP, a conditional variational autoencoder (cVAE) that generates novel antimicrobial peptides (AMPs) with high antimicrobial activity by learning continuous low-dimensional representations of peptides. This work is the first to achieve the controllable generation of analogues of known active/inactive peptides, with the exceptional antibacterial effects of the generated peptides validated through wet-lab experiments. It provides a powerful computational design tool to address the global antimicrobial resistance crisis.
Fig.1. HydrAMP architecture and data traversal overview.
01 Research Background
Microbial infections, particularly those caused by drug-resistant bacteria, have become a major global health threat and are projected to be the leading cause of death by 2050. Antimicrobial peptides (AMPs) are promising alternatives to conventional antibiotics because they are less prone to inducing resistance. However, traditional AMP discovery relies on trial-and-error modification of known peptide sequences, a process that is tedious, time-consuming, and confined to existing sequence frameworks. Existing computational methods, such as classifiers, QSAR models, autoregressive models, and genetic algorithms, either cannot directly generate new sequences or struggle to produce truly diverse and highly effective AMPs because the high-dimensional discrete sequence space is sparse. Consequently, there is an urgent need for novel generative models that operate in a continuous, low-dimensional space and are directly optimized for multiple tasks (e.g., analogue generation, de novo generation).
02 Innovative Highlights
Multi-task optimized deep generative model: HydrAMP is the first cVAE model specifically optimized for the direct generation of peptide analogues (including from both active and inactive prototypes) and for unconstrained generation tasks. Its innovative training objectives (reconstruction, analogue, unconstrained) and regularization terms (e.g., Jacobian disentanglement regularization) ensure exceptional performance across multiple tasks.
Parametrically controllable "creativity": Introduces a temperature parameter (τ) that precisely controls the degree of divergence between generated analogues and the prototype peptide (τ=1 for conservative, τ=5 for exploratory), enabling flexible control over the generation process and balancing similarity with innovation.
End-to-end platform with high experimental validation rate: Beyond releasing the model code, the authors established an integrated pipeline that combines auxiliary classifier consensus screening with molecular dynamics (MD) simulation pre-screening, accompanied by a user-friendly web service (https://hydramp.mimuw.edu.pl/). The experimental validation success rate is significantly higher than that of existing methods; for example, against E. coli ATCC 25922, the validation rate for highly active peptides (MIC ≤ 32 μg/mL) reached 38%.
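The temperature-controlled "creativity" highlighted above can be illustrated with a minimal sketch. It assumes that τ scales the standard deviation used when sampling latent vectors around the prototype's encoding; whether HydrAMP applies τ in exactly this way is an assumption, and the vectors mu and sigma below are hypothetical values, not outputs of the real encoder.

```python
import math
import random


def sample_around(mu, sigma, tau, rng):
    """Draw one latent vector z = mu + tau * sigma * eps, with eps ~ N(0, 1).

    Assumption: the temperature tau simply widens the sampling spread.
    """
    return [m + tau * s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def mean_distance(mu, sigma, tau, n=2000, seed=0):
    """Average Euclidean distance between sampled latents and the prototype mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = sample_around(mu, sigma, tau, rng)
        total += math.dist(z, mu)
    return total / n


# Hypothetical 4-dimensional encoding of a prototype peptide.
mu = [0.3, -1.2, 0.8, 0.0]
sigma = [0.1, 0.2, 0.1, 0.3]

conservative = mean_distance(mu, sigma, tau=1.0)  # samples stay near the prototype
exploratory = mean_distance(mu, sigma, tau=5.0)   # samples spread five times farther
print(conservative, exploratory)
```

Because both calls use the same random seed, the τ=5 samples lie exactly five times farther from the prototype than the τ=1 samples, which is the sense in which a larger τ trades similarity for novelty in this sketch.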
03 Results and Discussion
3.1 Model Architecture and Training Strategy
The core of the HydrAMP model is a conditional variational autoencoder (cVAE) that learns low-dimensional continuous latent representations (z) of peptide sequences. The conditions (c) specify whether a peptide is an AMP (c_AMP) and whether it is highly active, i.e. has a low MIC (c_MIC). The model comprises an Encoder, a Decoder, and a pre-trained Classifier. Training proceeds in three modes: reconstruction mode (learning peptide structure), analogue generation mode (learning conditional generation based on a prototype), and unconstrained generation mode (learning conditional de novo generation), which together give the model its multi-task capability (Fig.1).
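Two generic cVAE building blocks behind this description can be sketched as follows: the closed-form KL term that keeps the latent code close to a standard normal prior, and the concatenation of the latent vector with the condition flags before decoding. This is a textbook cVAE sketch, not HydrAMP's actual code; the function names, the 3-dimensional latent vector, and the plain concatenation are assumptions made for illustration.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def decoder_input(z, c_amp, c_mic):
    """Concatenate the latent vector with the two binary condition flags.

    Assumption: conditions are appended to z; the real model may inject them differently.
    """
    return list(z) + [float(c_amp), float(c_mic)]


# Hypothetical latent vector; request an AMP (c_amp=1) with low MIC (c_mic=1).
z = [0.5, -0.3, 1.1]
x = decoder_input(z, c_amp=1, c_mic=1)
print(x)
print(kl_to_standard_normal([1.0], [0.0]))
```

Setting c_amp and c_mic to the desired values at generation time is what makes the generation conditional: the same z can be decoded under different requested properties.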
3.2 Analogue Generation Performance Surpasses Existing Methods
In the analogue generation task, HydrAMP significantly outperformed the comparison models (PepCVAE, Basic, Joker). Whether starting from active peptides (positive prototypes) or inactive peptides (negative prototypes), HydrAMP satisfied the "baseline discovery" criterion (generating a highly active AMP) and the "improved discovery" criterion (generating a peptide superior to the prototype) for the largest proportion of prototypes. For instance, when generating from negative prototypes, HydrAMP (τ=5) found baseline discovery analogues for 399 prototypes, far exceeding Joker (108) and Basic (41). By adjusting τ, HydrAMP effectively controlled the sequence difference (Levenshtein distance) between generated analogues and prototypes (e.g., Pexiganan, CAMEL), achieving parametric control over creativity (Fig.2).

Fig. 2 | Analogue generation performance in terms of number of generated analogues of HydrAMP (red), in comparison to PepCVAE (dark blue), Basic (light blue), and Joker (dark green).
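The Levenshtein distance used above to quantify how far an analogue drifts from its prototype is the minimum number of single-residue insertions, deletions, and substitutions needed to turn one sequence into the other. A minimal sketch of the standard dynamic-programming computation follows; the two peptide strings are made-up toy sequences, not peptides from the study.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Toy sequences (hypothetical): one substitution (F -> W) plus one appended residue.
prototype = "GIGKFLKK"
analogue = "GIGKWLKKA"
print(levenshtein(prototype, analogue))  # 2
```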
3.3 Unconstrained Generation Produces Peptides Matching Desired Conditions
In the unconstrained generation task, peptides generated by HydrAMP conformed better to the requested antimicrobial (c_AMP) and high-activity (c_MIC) conditions. For its generated positive peptides, the medians of the Classifier-predicted antimicrobial probability (PMAMP) and low-MIC probability (PMMIC) were both close to 1, significantly outperforming PepCVAE, Basic, and other generative models such as AMP-LM and Dean-VAE. Among 50,000 generated candidate peptides, HydrAMP produced the highest number of peptides simultaneously meeting the criteria of high PMAMP (>0.8) and high PMMIC (>0.5), nearly 44% more than AMP-GAN (Fig. 3).

Fig. 3. Unconstrained generative performance of HydrAMP (red), in comparison to PepCVAE (dark blue), Basic (light blue), AMP-LM (green), Dean-VAE (pink), Muller-LSTM (aquamarine), and AMP-GAN (orange) (methods indicated on the x-axis).
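The joint criterion described above (predicted antimicrobial probability above 0.8 and predicted low-MIC probability above 0.5) amounts to a simple filter over classifier outputs. The sketch below uses the two thresholds from the text; the function name and the list of (probability, probability) pairs are hypothetical toy values, not predictions from the study.

```python
def count_passing(predictions, amp_threshold=0.8, mic_threshold=0.5):
    """Count candidates whose two predicted probabilities both exceed their thresholds."""
    return sum(
        1
        for p_amp, p_mic in predictions
        if p_amp > amp_threshold and p_mic > mic_threshold
    )


# Toy (p_amp, p_mic) pairs for five hypothetical generated peptides.
toy_predictions = [
    (0.95, 0.90),  # passes both thresholds
    (0.85, 0.40),  # fails the low-MIC threshold
    (0.70, 0.95),  # fails the AMP threshold
    (0.99, 0.51),  # passes both thresholds
    (0.80, 0.60),  # fails: 0.80 is not strictly greater than 0.8
]
print(count_passing(toy_predictions))  # 2
```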
3.4 Generated Peptides Exhibit Desirable Physicochemical Properties
Analysis showed that analogues generated by HydrAMP shifted key physicochemical properties (e.g., isoelectric point, charge, hydrophobic ratio, aromaticity) significantly from those of inactive prototypes towards those of active peptides, and could further enhance the properties of active prototypes. The other compared models (e.g., PepCVAE, Basic) failed to achieve these shifts. This demonstrates HydrAMP's ability to capture crucial structure-activity relationships of AMPs (Fig. 4).

Fig. 4. Physicochemical properties of analogues generated by HydrAMP and compared methods in analogue generation for non-AMP or AMP prototypes in comparison with real and random data.
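Three of the properties named above can be approximated directly from a sequence. The sketch below uses deliberately simple definitions that are assumptions, not necessarily those used in the study: net charge as the count of K and R minus the count of D and E (ignoring histidine, termini, and pH), hydrophobic ratio as the fraction of residues in A, C, F, I, L, M, V, and aromaticity as the fraction of F, W, Y. The sequences are toy examples.

```python
def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E (assumed simplification)."""
    return sum(seq.count(r) for r in "KR") - sum(seq.count(r) for r in "DE")


def hydrophobic_ratio(seq):
    """Fraction of residues in the (assumed) hydrophobic set A, C, F, I, L, M, V."""
    return sum(1 for r in seq if r in "ACFILMV") / len(seq)


def aromaticity(seq):
    """Fraction of aromatic residues F, W, Y."""
    return sum(1 for r in seq if r in "FWY") / len(seq)


# Toy sequences (hypothetical): a cationic, hydrophobic one and an anionic one.
cationic = "KKLLFW"
anionic = "DEGS"
print(net_charge(cationic), hydrophobic_ratio(cationic), aromaticity(cationic))
print(net_charge(anionic), hydrophobic_ratio(anionic), aromaticity(anionic))
```

A shift "towards the characteristics of active peptides" would show up in such descriptors as, for example, a higher positive charge in the analogue than in its inactive prototype.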
3.5 Wet-Lab Validation Confirms Highly Active Novel AMPs
The study conducted two rounds of wet-lab validation. In the first round, without additional pre-screening, 7 analogues generated from prototypes Pexiganan and Temporin-A were tested, successfully validating 2 highly active peptides (e.g., Hydraganan-1, MIC = 2 μg/mL against E. coli), yielding a positive validation rate of 29% (Table 1). Molecular dynamics simulations indicated that the active peptide Hydraganan-1 maintained a more stable alpha-helical structure and penetrated deeper into the membrane core (higher S parameter), whereas the inactive analogue Pex-P1-4 did not.
Table 1. Pexiganan and Temporin-A analogues obtained in the analogue generation process without additional preselection.

The second round of experiments incorporated a pre-screening pipeline based on auxiliary classifier consensus and MD simulations, generating 24 candidate peptides from clinically relevant peptide prototypes (OP-145, Omiganan, Syphaxin) and the inactive peptide GQ20. Results showed that under stricter activity thresholds (e.g., MIC ≤ 8 μg/mL), HydrAMP's AMP success rate was significantly higher than that of models such as Joker and CLaSS (Fig. 5). Ultimately, several novel AMPs with high activity (MIC as low as 2-4 μg/mL) against multiple drug-resistant strains (including MRSA and Acinetobacter baumannii) and low hemolytic toxicity were obtained, such as Sophieganan and Ratigan generated from the inactive peptide GQ20. Notably, all 6 analogues of GQ20 were validated as active AMPs, demonstrating HydrAMP's powerful ability to create novel active peptides from an inactive starting point.

Fig. 5. AMP success rates depending on activity threshold.
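A threshold-dependent success rate of this kind is simply the fraction of tested peptides whose measured MIC falls at or below the chosen cutoff. The sketch below shows the arithmetic; the function name and the MIC values are hypothetical toy data, while the final line reproduces the first-round figure quoted earlier (2 active peptides out of 7 tested, about 29%).

```python
def success_rate(mics, threshold):
    """Fraction of tested peptides with MIC (in ug/mL) at or below the threshold."""
    return sum(1 for mic in mics if mic <= threshold) / len(mics)


# Toy MIC values in ug/mL for eight hypothetical peptides (not data from the study).
toy_mics = [2, 4, 8, 32, 64, 128, 256, 512]

strict = success_rate(toy_mics, threshold=8)    # 3 of 8 peptides
lenient = success_rate(toy_mics, threshold=32)  # 4 of 8 peptides
print(strict, lenient)

# First-round validation reported in the text: 2 of 7 peptides were highly active.
first_round_percent = round(100 * 2 / 7)
print(first_round_percent)  # 29
```

Lowering the threshold can only keep the rate the same or reduce it, which is why a comparison at strict thresholds is the more demanding test.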
04 Conclusion and Future Perspectives
This study successfully developed HydrAMP, a highly functional deep generative model for antimicrobial peptides. Its core advantages lie in: 1) achieving controllable, diverse, and efficient AMP generation through multi-task optimization and learning in a continuous latent space; 2) introducing parametric creativity control and innovative regularization methods; and 3) establishing a complete pipeline encompassing computational pre-screening and experimental validation, accompanied by an accessible web service. Experimental results demonstrate that HydrAMP can generate novel AMPs with high activity against various clinically relevant drug-resistant bacteria and low toxicity, achieving a validation rate significantly higher than existing methods. This work provides a powerful computational tool for accelerating the discovery of new antibiotics and represents a significant step forward in addressing the antimicrobial resistance crisis. Future work could further extend the model, for instance, by integrating toxicity optimization objectives, focusing on other bacterial targets (e.g., Gram-positive bacteria), and exploring more complex sequence-dependent modeling (e.g., incorporating attention mechanisms) to continuously enhance the drug development potential of the generated peptides.
Original Article:
Szymczak P, Możejko M, Grzegorzek T, Jurczak R, Bauer M, Neubauer D, Sikora K, Michalski M, Sroka J, Setny P, Kamysz W, Szczurek E. Discovering highly potent antimicrobial peptides with deep generative model HydrAMP. Nat Commun. 2023 Mar 15;14(1):1453.