A language model for enhanced lasso peptide property prediction

In the hunt for new therapeutics for cancer and infectious diseases, lasso peptides prove to be a catch. Their knot-like structures give these molecules high stability and diverse biological activities, making them promising drug candidates. To better unlock their clinical potential, a research team developed LassoESM, a new large language model for predicting lasso peptide properties.

The collaborative study was recently published in Nature Communications.

Lasso peptides are natural products made by bacteria. To produce these peptides, bacteria use ribosomes to build chains of amino acids that are then folded by biosynthetic enzymes into a unique slip knot-like structure. Through this process, thousands of different lasso peptides are generated, many of which have demonstrated antibacterial, antiviral, and anticancer properties.

“There are striking opportunities to use lasso peptides in drug discovery, from targeting receptors to developing stable oral therapeutics,” said a co-leader of the study. “By building a dedicated language model for these molecules, we’ve created a tool that helps us unlock these possibilities far more efficiently.”

Machine learning models have become essential tools for researchers, particularly for recognizing patterns in large data sets. This enables scientists to find new connections, while also saving months of time and effort. Protein prediction especially benefits from this technology, helping to uncover new insights into complex protein interactions and accelerate the discovery of new therapeutics. But commonly used AI platforms for protein prediction, such as AlphaFold, fall short when tasked with lasso peptides. 

“Because of the unique structure of the lasso peptide, none of the current AI programs actually work in terms of doing a structure prediction,” said a project co-leader. 

Similar to the large language models powering AI chatbots, protein language models are trained to learn and apply the language of proteins: their amino acid sequences, three-dimensional structures, and interactions with surrounding environments. But without lasso peptide-specific training data, these algorithms lack specificity for these molecules.

“Predicting lasso peptide properties has been challenging due to the scarcity of experimentally labeled data and the complexity of enzyme–peptide substrate interactions,” said the first author. “We developed LassoESM, a lasso peptide-tailored protein language model, to capture peptide-specific features that are often missed by generic protein language models.”

The group first used bioinformatics methods to find thousands of lasso peptide sequences that different microorganisms produce. To improve the quality of the data, the team also manually validated any new lasso peptide sequences they discovered. 

“Then, we learned the language of those lasso peptides using masked language modeling, which is where you hide part of the peptide, and then you try to predict the other half,” the senior author said. “Once you have learned the language of how the lasso structure is formed in nature, then you can train efficient property prediction models based on these language model parameters.”
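The masking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the LassoESM training code: the function names, the `<mask>` token, the 15% masking rate, and the placeholder sequence are all assumptions made for the example. The idea is that some residues are hidden, and a model is then scored on how well it recovers them.

```python
import random

MASK = "<mask>"  # assumed mask token, chosen for illustration


def mask_peptide(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues.

    Returns (masked_tokens, targets), where targets maps each hidden
    position to the residue a model would be asked to recover.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    n_masked = max(1, round(len(tokens) * mask_fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    targets = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK
    return tokens, targets


def masked_accuracy(predictions, targets):
    """Fraction of hidden residues that were guessed correctly."""
    correct = sum(predictions.get(pos) == aa for pos, aa in targets.items())
    return correct / len(targets)


# Placeholder sequence (the 20 standard amino acids in order),
# not a real lasso peptide.
peptide = "ACDEFGHIKLMNPQRSTVWY"
masked, targets = mask_peptide(peptide, mask_fraction=0.15, seed=0)
print("".join(t if t != MASK else "_" for t in masked))
print(targets)
```

During training, a language model would see only the masked tokens and be updated to raise its accuracy on the hidden positions; repeating this over thousands of sequences is what lets it pick up the patterns of the peptide family.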

By combining machine learning expertise with their own experimental data, the team applied LassoESM to several prediction tasks. One area of focus is the identification of compatible lasso peptide and lasso cyclase pairs to expand the clinical potential of these molecules. Lasso cyclases are the enzymes responsible for the knot-forming step of lasso peptide biosynthesis. Just as different locks require unique keys, different peptides require specific lasso cyclases to tie the characteristic knot.

“We built the models to predict which lasso cyclase could actually form a lasso peptide using only the sequence of amino acids in a peptide. If we can understand the substrate scope or we can engineer lasso cyclases, then we can potentially make any peptide into a lasso,” the author said. Without LassoESM, these enzyme-substrate interactions are difficult to predict, highlighting the utility of this artificial intelligence tool.
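As a rough sketch of what predicting from sequence alone can look like, the Python below turns each peptide into a numeric vector and trains a very small classifier on a handful of labeled examples. Everything here is an assumption for illustration: the `embed` function uses plain amino acid frequencies as a stand-in for language model embeddings, the nearest-centroid classifier is a deliberately simple substitute for whatever models the study used, and the sequences and labels are invented toy data with no biological meaning.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Stand-in embedding: amino acid frequencies.

    A real pipeline would use vectors from a protein language model here.
    """
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fit(labeled_peptides):
    """Build one centroid per label from (sequence, label) pairs."""
    by_label = {}
    for sequence, label in labeled_peptides:
        by_label.setdefault(label, []).append(embed(sequence))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, sequence):
    """Return the label whose centroid is closest to the sequence."""
    vector = embed(sequence)
    return min(model, key=lambda label: math.dist(vector, model[label]))


# Invented toy data: True means "processed by a hypothetical cyclase".
training = [
    ("GGAGGVG", True),
    ("GAGGGAG", True),
    ("KKLKKRK", False),
    ("KRKKLKK", False),
]
model = fit(training)
print(predict(model, "GGGAGGA"))
```

The design point is the split between the two stages: the embedding carries what was learned from many unlabeled sequences, so the classifier on top can stay small and be trained on only a few experimentally labeled peptides.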

The author said, “We demonstrated that LassoESM enables accurate prediction of various lasso peptide properties, even with limited training data. This work provides a powerful AI-driven tool to accelerate the rational design of functional lasso peptides for biomedical and industrial applications.”

Moving forward, the team also aims to expand their model to accommodate new prediction capabilities, such as building tailor-made language models for other peptide natural products and engineering lasso peptides to target specific proteins.

https://www.nature.com/articles/s41467-025-63412-3

https://sciencemission.com/LassoESM