Exceptional Performance by Our Team at CASP16

The team led by professors Dima Kozakov (Stony Brook University) and Sandor Vajda (Boston University) made remarkable progress in improving the prediction of protein multimers and protein-ligand complexes.

Background of CASP

CASP (Critical Assessment of Structure Prediction) is a worldwide experiment for protein structure prediction that has taken place every two years since 1994. In 2018, CASP was integrated with CAPRI (Critical Assessment of Predicted Interactions), a similar contest for predicting molecular interactions. In both events, participants are expected to predict the structures of proteins or protein complexes that, at the time of the competition, are not yet released. Independent assessors compare the models to experimental structures.

The Google-owned company DeepMind participated in CASP-14 in 2020 with its machine-learning-based protein structure prediction program AlphaFold-2. For many of the protein targets, the program predicted structures that were indistinguishable from the structures determined by costly X-ray crystallography or nuclear magnetic resonance experiments. The AlphaFold-2 program, released in 2021, has been adopted and is now widely used by the biomedical research community. Most participants of the CASP-15 competition in 2022 used modified versions of AlphaFold-2, achieving moderate improvements in the accuracy of the models. The leaders of the AlphaFold-2 project, Demis Hassabis and John Jumper, shared the 2024 Nobel Prize in Chemistry.

During the last two years, Google DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have partnered to create AlphaFold DB to make the predictions freely available to the scientific community. The latest database release contains over 200 million entries, providing broad coverage of proteins and protein complexes. More recently, DeepMind and its subsidiary, Isomorphic Labs, released the program AlphaFold-3. The new program further increases the accuracy of predicting the structures of proteins and protein complexes, and also enables determining interactions between proteins and small druglike molecules, which was not possible using AlphaFold-2.

The most recent CASP contest was conducted during the summer months of 2024, and the results were presented at the CASP16 conference held in Punta Cana, Dominican Republic, on December 1-4, 2024. Based on the evaluators’ report, the team led by professors Dima Kozakov (Stony Brook University) and Sandor Vajda (Boston University) made remarkable progress in improving the prediction of protein multimers and protein-ligand complexes.

Protein Multimer

The group submitted models that exceeded the accuracy reached by other participants by a large margin. Figure 1 shows the performance of the participating CASP+CAPRI groups on CAPRI targets, with G274 being the identification number of the Kozakov/Vajda group. As shown, the group substantially outperformed all other teams, even though all groups had access to the latest versions of the AlphaFold-2 and AlphaFold-3 programs, and the methods used by most groups incorporated these tools in some form.

Figure 1. Performance of the CASP+CAPRI groups in predicting the structures of multimeric CAPRI targets. The Kozakov/Vajda group is G274, represented by the first bar on the left.

We note that this increase in accuracy was possible because the targets of CASP-16 included several antibody-antigen complexes, and both AlphaFold-2 and AlphaFold-3 are known to perform relatively poorly when predicting the structures of such multimers. In fact, as shown in Figure 2 released by the CASP assessors, the Kozakov/Vajda team obtained very good results for such complexes. It may be interesting to note that their results are much better than the ones produced by AlphaFold-3, which already made improvements relative to the AlphaFold-2 models of antibody-antigen complexes. The results displayed in Figure 2 were presented by the assessors at the CASP-16 meeting. ClusPro is the prediction server of the Kozakov/Vajda group. They participated both as a human predictor group and as a server, which is allowed by the CASP/CAPRI rules, and both teams obtained very good results, with ClusPro being the second-best predictor.

Figure 2. Best and first model average ICS values by the CASP participating groups for the antibody-antigen targets.

Protein-Ligand

As with the protein multimer targets, the models submitted by our team attained the highest accuracy among all participants, as presented in Figure 3. This achievement stems from our efficient methods, which allowed rapid and economical exploration of the target conformational space and the subsequent identification of the most favorable poses for submission.

Figure 3. Performance of the CASP16 groups in predicting the structures of protein-ligand complexes. The ClusPro and kozakovvajda groups submitted 233 structures and are represented by the first and second bars on the left, respectively.

Conclusion

The relatively high prediction accuracy achieved by the Kozakov/Vajda team is due to the unique protocol developed by the two labs during the last three years. The central idea of the approach is integrating the physics of protein interactions and the geometry of the conformational space into the machine learning models. In current machine learning models, the sampling schedule tends to be biased by the training data. Thus, when such models are required to predict novel interactions not encountered in training, sampling becomes essentially random, which makes the method very inefficient because of the vast conformational space. In contrast, the Kozakov/Vajda team employs an ML method that systematically samples regions of interest, allowing the identification of correct structures in a rational and efficient manner. This systematic sampling is enabled by fast Fourier transform (FFT)-based methods of evaluating the energies of docked structures. The approach is being implemented in the widely used ClusPro server for predicting the structure of macromolecular interactions, which currently has nearly 40,000 users. While the method demonstrated success at CASP in specific tasks, the core principle of combining machine learning with physics-based sampling is expected to enhance performance in a variety of applications, particularly when the available data are insufficient for effective training.
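To illustrate why FFT-based evaluation makes systematic sampling affordable, here is a minimal sketch of the generic grid-correlation idea. It is not the team's actual protocol or energy function: the grids are random toy data, the function names are hypothetical, and only cyclic translations are scored (a real docking run would also sample rotations and use physically meaningful energy terms).

```python
import numpy as np

def fft_translation_scores(receptor_grid, ligand_grid):
    """Score every cyclic translation t of the ligand grid at once.

    score[t] = sum over x of receptor_grid[x] * ligand_grid[x - t],
    computed with the correlation theorem instead of an explicit loop.
    """
    fr = np.fft.fftn(receptor_grid)
    fl = np.fft.fftn(ligand_grid)
    return np.fft.ifftn(fr * np.conj(fl)).real

def brute_force_scores(receptor_grid, ligand_grid):
    """Same scores by explicit enumeration; used only to check the FFT result."""
    scores = np.zeros(receptor_grid.shape)
    for t in np.ndindex(*receptor_grid.shape):
        shifted = np.roll(ligand_grid, shift=t, axis=(0, 1, 2))
        scores[t] = np.sum(receptor_grid * shifted)
    return scores

# Toy data (assumption): a random "energy" grid and a small cubic ligand mask.
rng = np.random.default_rng(0)
receptor = rng.normal(size=(6, 6, 6))
ligand = np.zeros((6, 6, 6))
ligand[:2, :2, :2] = 1.0

fast = fft_translation_scores(receptor, ligand)
slow = brute_force_scores(receptor, ligand)

# Lowest-scoring translation, treating the score as an energy.
best_shift = np.unravel_index(np.argmin(fast), fast.shape)
print("best translation:", best_shift)
```

For a grid with N points, the explicit loop costs on the order of N² operations, while the FFT route costs on the order of N log N, which is what makes exhaustive translational scans practical.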

MHC-Fine: Enhancing AlphaFold for Precise MHC-Peptide Interaction Prediction

The precise prediction of major histocompatibility complex (MHC)-peptide complex structures is pivotal for understanding cellular immune responses and advancing vaccine design. In our latest study, published in Biophysical Journal, we have enhanced AlphaFold’s capabilities by fine-tuning it with a specialized dataset consisting exclusively of high-resolution class I MHC-peptide crystal structures.

AlphaFold, while broadly effective, lacked the granularity necessary for the high-precision demands of class I MHC-peptide interaction prediction. Our tailored approach addresses this by providing a more detailed and accurate model. A comparative analysis was conducted against the homology-modeling-based method Pandora, as well as the AlphaFold multimer model. Our fine-tuned model demonstrates superior performance, with a median root-mean-square deviation (RMSD) for Cα atoms in peptides of 0.66 Å and improved predicted local distance difference test scores.

Moreover, our additional comparisons with AlphaFold3 on new MHC-I structures from the Protein Data Bank (PDB) published after January 1, 2023, show that our model has 15% more samples under 1 Å deviation, highlighting its enhanced precision.
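The metrics quoted above (Cα RMSD, its median, and the share of predictions under 1 Å) can be sketched as follows. This is only an illustration under stated assumptions: the coordinates and per-complex values are made up, the function names are hypothetical, and the structures are assumed to be already superimposed, so no fitting step is included.

```python
import math
import statistics

def ca_rmsd(predicted, reference):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumes the two structures are already superimposed.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))

def fraction_below(values, cutoff=1.0):
    """Share of values strictly below the cutoff (in the same units, here angstroms)."""
    return sum(v < cutoff for v in values) / len(values)

# Made-up peptide C-alpha coordinates: every atom is displaced by 0.5 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
example_rmsd = ca_rmsd(predicted, reference)

# Made-up per-complex RMSD values for a hypothetical test set.
per_complex_rmsd = [0.4, 0.7, 0.9, 1.3, 2.1]
median_rmsd = statistics.median(per_complex_rmsd)
share_sub_angstrom = fraction_below(per_complex_rmsd)

print(example_rmsd, median_rmsd, share_sub_angstrom)
```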

These advances have substantial implications for computational immunology, potentially accelerating the development of novel therapeutics and vaccines by providing a more precise computational lens through which to view MHC-peptide interactions.

ClusPro AbEMap Server: predicting antibody epitopes

We developed a novel approach for modeling antibodies in complex with their corresponding antigens, and incorporated it as an Advanced function of the ClusPro Server. The Antibody-Epitope Mapping (AbEMap) Server allows the user to predict antibody-antigen interactions with three types of inputs: (i) X-ray structures, (ii) computationally predicted structures, and (iii) simply amino acid sequences. The details of processing these three input types and differences in efficiencies are discussed in this publication in Nature Protocols.

High Accuracy Prediction of PROTAC complex structures

A novel method to aid in the design of PROTACs was developed by our group and published in JACS!

A PROTAC (PROteolysis TArgeting Chimera) is a heterobifunctional drug-like molecule that hijacks the Ubiquitin-Proteasome System (UPS) in mammalian cells and catalytically drives the ubiquitination of a protein of interest. The ubiquitinated proteins are then recognized and degraded by the native proteasome system of the cell.

In this work, we present a computational modeling approach that drastically reduces the cost of novel PROTAC design, which matters all the more because synthesizing PROTAC molecules is often a challenge. In our publication, we show that our method successfully reproduces the benchmark datasets based on calculated Weighted Sum Potentials, and is especially precise in deriving preferred linker lengths and linker attachment points.

A novel structural systems biology approach

In collaboration with Boston University, we developed a new, faster approach to investigating the interactome using mass spectrometry and applied it to reveal and understand the mechanisms that drive the formation of the malignant cell phenotype. This work resulted in two publications in Nature Communications.

In our first publication, we introduced a new multiplex Co-fractionation/Mass Spectrometry (mCF/MS) platform that is more technically efficient, cost-effective and faster than previously reported Co-fractionation/Mass Spectrometry (CF/MS) methods. The mCF/MS approach was applied to compare the global protein interactome of mammary epithelial cells to the Protein Interaction Network (PIN) of two breast cancer cell lines, where many multimolecular complexes that drive malignant cell formation were described and investigated.

In the second publication based on our work, we introduced PAMAF, a Parallelized multidimensional analytic framework that examines 12 modalities: protein abundance in whole cells, nuclei, exosomes, secretomes and membranes; N-glycosylation, phosphorylation; metabolites; mRNA, miRNA; and, in parallel, single-cell transcriptomes. Using this method, we explored the key proteins in the process of Epithelial to Mesenchymal Transition.

SARS-CoV-2 paper published

Dr. Kozakov and Dr. Padhorny, in collaboration with researchers from Boston University and Boston National Emerging Infectious Diseases Laboratories (NEIDL), have analyzed the difference in phosphorylation patterns between SARS-CoV-2-infected and healthy alveolar lung cells (AT2). This survey revealed 4,688 differential phosphosites mapping to 1,166 unique proteins, which were grouped into distinct clusters based on temporal enrichment and associated with protein domains and cellular processes linked to infection, such as viral messenger RNA synthesis and export of viral ribonucleoproteins, as an immediate response to SARS-CoV-2 entry. Our group performed in silico structural modeling of experimentally observed viral-host protein-protein interactions, using award-winning computational tools developed in our lab. This enabled us to structurally characterize the interactions that were detected in MS experiments and independently corroborate the experimental observations. Our modeling identified several key types of proteins that dominated these interactions, including kinases of the GSK3, MAPK, and CK1 families, and a number of other targets. We hypothesized that modulating those targets might have an antiviral effect. We identified and modeled interactions of several clinically safe compounds from the BROAD database targeting those families using our award-winning LigTBM molecular modeling approach. Six of the selected compounds inhibited viral replication by more than 90% in the AT2 lung cells. The paper has appeared in Molecular Cell.

COVID-19 efforts

During the current crisis, it is everyone’s responsibility to make their best efforts. Our group is now directing its research toward the search for new compounds targeting SARS-CoV-2 proteins. We collaborate closely with experimental and computational groups on these projects. Besides doing our part in helping with the current epidemic, we hope that the methods and tools developed will allow a faster response to future viral threats. More information is available on a dedicated page: https://abcgroup.cluspro.org/research/covid-19/

Our team has been awarded an IACS Seed Grant

On April 28th, the results of the IACS Seed Grant competition were announced. The joint project of our group and Dr. Rezaul Chowdhury, “Speeding up flexible peptide-protein docking using convolutional neural networks,” was chosen as one of the two winners. We’re grateful for this support and are happy that the grant committee shares our belief in the idea of applying state-of-the-art deep learning methodology to structure prediction of protein-peptide complexes, an important scientific and medical problem that is notorious for its complexity.

ClusPro ranks first in the latest CAPRI evaluation round

The CAPRI (Critical Assessment of Predicted Interactions) experiment is a community-wide effort dedicated to evaluating the current state of methods for predicting protein complex structures.

The evaluation of results for the last three years was recently published in Proteins.

The automated protein docking server ClusPro, developed by our group, was ranked first in the server category for all targets. The summary of the results is shown below. For each predictor group, the table shows the number of acceptable or better predictions and, among those, the number of high quality models, indicated by three stars, as well as the number of medium quality solutions, indicated by two stars.

Server rankings

Server          Top 5 Predictions
ClusPro         10/6**
HDOCK           8/1***/5**
HADDOCK         8/2***/2**
LZERD           8/1***/4**
MDOCKPP         9/1***/3**
GalaxyPPDock    6/4**
Swarmdock       6/1***/1
PYDOCKWEB       3/1**

In addition, our human group was the top performer in the protein-protein docking category. The results for the 10 best-performing groups are provided below for comparison.

Human predictor rankings

Group               Predictions
Kozakov/Vajda       6/1***/6**
Venclovas           5/2***/3**
Seok                5/1***/4**
Pierce              5/2***/2**
Andreani/Guerois    5/1***/3**
Zou                 4/1***/3**
Zacharias           5/1***/2**
Kihara              5/1***/2**
Gray                5/1***/2**
Shen                4/1***/2**

Our LigTBM ligand docking approach is the top performer in the D3R Grand Challenge 4 blind docking competition

The D3R (Drug Design Data Resource) Grand Challenge is a blinded prediction challenge for the computational chemistry community, with components addressing pose prediction, affinity ranking, and free energy calculations. Its fourth installment, D3R GC4, was held from October to December 2018.

Our group participated in the pose prediction challenge for macrocyclic inhibitors of beta-secretase 1 (BACE1). This protein is involved in the generation of beta-amyloid peptides and is an important target for developing drugs for Alzheimer’s disease. In stage 1a of the challenge, the organizers presented participants with the apo structure of the receptor and SMILES strings describing 20 ligands.

According to the official rankings, the template-based method developed in our group scored best in terms of mean RMSD out of 74 submitted entries. Our group achieved sub-angstrom mean and median RMSD for this challenge.

Our method relies on finding structures of distant homologs of the target protein that bind similar ligands, and using them at all stages of the protocol: initial pose generation, structure refinement, and the final scoring. More details are available in our paper, published in the Journal of Computer-Aided Molecular Design.

We refined and automated the approach used in this challenge. It is now available for free academic and non-commercial use as a user-friendly web server, ClusPro LigTBM. This version of the protocol is described in our Journal of Molecular Biology paper.