xiaogli – ABC Group

Exceptional Performance by Our Team at CASP16

February 12, 2025 | June 26, 2025 | xiaogli

The team led by Professors Dima Kozakov (Stony Brook University) and Sandor Vajda (Boston University) made remarkable progress in improving the prediction of protein multimers and protein-ligand complexes.

Background of CASP

CASP (Critical Assessment of Structure Prediction) is a worldwide experiment in protein structure prediction that has taken place every two years since 1994. In 2018 CASP was integrated with CAPRI (Critical Assessment of Predicted Interactions), a similar contest for predicting molecular interactions. In both events participants are asked to predict the structures of proteins or protein complexes that have not yet been released at the time of the competition. Independent assessors then compare the models to the experimental structures.

The Google-owned company DeepMind participated in CASP-14 in 2020 with its machine-learning-based protein structure prediction program AlphaFold-2. For many of the protein targets the program predicted structures that were indistinguishable from structures determined by costly X-ray crystallography or nuclear magnetic resonance experiments. The AlphaFold-2 program, released in 2021, has been adopted and is now widely used by the biomedical research community. Most participants in the CASP-15 competition in 2022 used modified versions of AlphaFold-2, achieving moderate improvements in the accuracy of the models. The leaders of the AlphaFold-2 project, Demis Hassabis and John Jumper, shared the 2024 Nobel Prize in Chemistry with David Baker.

Google DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI) have also partnered to create AlphaFold DB, which makes the predictions freely available to the scientific community. The latest database release contains over 200 million entries, providing broad coverage of proteins and protein complexes. More recently, DeepMind and its subsidiary, Isomorphic Labs, released the program AlphaFold-3. The new program further increases the accuracy of predicted structures of proteins and protein complexes, and also enables the prediction of interactions between proteins and small drug-like molecules, which was not possible using AlphaFold-2.

The most recent CASP contest was conducted during the summer months of 2024, and the results were presented at the CASP16 conference held in Punta Cana, Dominican Republic, on December 1-4, 2024. Based on the assessors' report, the team led by Professors Dima Kozakov (Stony Brook University) and Sandor Vajda (Boston University) made remarkable progress in improving the prediction of protein multimers and protein-ligand complexes.

Protein Multimer

The group submitted models that exceeded the accuracy reached by other participants by a large margin. Figure 1 shows the performance of the participating CASP+CAPRI groups on CAPRI targets, with G274 being the identification number of the Kozakov/Vajda group. As shown, the group substantially outperformed all other teams, even though all groups had access to the latest versions of the AlphaFold-2 and AlphaFold-3 programs and most groups incorporated these tools into their methods in some form.

**Figure 1.** Performance of the CASP+CAPRI groups in predicting the structures of multimeric CAPRI targets. The Kozakov/Vajda group is G274, represented by the first bar on the left.

We note that this increase in accuracy was possible because the targets of CASP-16 included several antibody-antigen complexes, and both AlphaFold-2 and AlphaFold-3 are known to perform relatively poorly when predicting the structures of such multimers. In fact, as shown in Figure 2, which was presented by the assessors at the CASP-16 meeting, the Kozakov/Vajda team obtained very good results for such complexes. Their results are much better than the ones produced by AlphaFold-3, which had already improved on the AlphaFold-2 models of antibody-antigen complexes. ClusPro is the prediction server of the Kozakov/Vajda group. The group participated both as a human predictor group and as a server, which is allowed by the CASP/CAPRI rules, and both entries obtained very good results, with ClusPro being the second-best predictor.

**Figure 2.** Best and first model average ICS values by the CASP participating groups for the antibody-antigen targets.
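
ICS in Figure 2 stands for interface contact score. As a rough guide to how such a score behaves, the sketch below treats it as the F1 score (the harmonic mean of precision and recall) of predicted versus experimental interface contacts, which is how CASP assessments usually describe it. The residue labels, the toy contact sets, and the function name are illustrative assumptions, not the assessors' code or data.

```python
def interface_contact_score(predicted, reference):
    """F1 score of predicted vs. reference interface contacts (0.0 to 1.0)."""
    predicted, reference = set(predicted), set(reference)
    shared = len(predicted & reference)
    if shared == 0:
        return 0.0  # also covers empty contact sets
    precision = shared / len(predicted)  # fraction of predicted contacts that are real
    recall = shared / len(reference)     # fraction of real contacts that were predicted
    return 2 * precision * recall / (precision + recall)


# Hypothetical contacts as (antibody residue, antigen residue) pairs.
reference = {("H:52", "A:101"), ("H:54", "A:103"), ("L:91", "A:110"), ("L:93", "A:112")}
predicted = {("H:52", "A:101"), ("H:54", "A:103"), ("L:91", "A:111")}

ics = interface_contact_score(predicted, reference)
print(f"ICS = {ics:.3f}")
```

A model that places the antibody on the wrong face of the antigen shares no contacts with the experimental structure and scores 0, while a model that reproduces exactly the experimental contact set scores 1.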

Protein-Ligand

As in the protein multimer category, the models submitted by our team attained the highest accuracy among all participants, as presented in Figure 3. This achievement stems from our efficient methods, which allowed rapid and economical exploration of the target conformational space and the subsequent identification of the most favorable poses for submission.

Conclusion

The relatively high prediction accuracy by the Kozakov/Vajda team is due to the unique protocol developed by the two labs during the last three years. The central idea of the approach is integrating the physics of protein interactions and the geometry of the conformational space into the machine learning models. In the current machine learning models, the sampling schedule tends to be biased by the training data. Thus, when required to predict novel interactions not encountered in the training, sampling becomes essentially random, which makes the method very inefficient due to the vast conformational space. In contrast, the Kozakov/Vajda team employs an ML method that systematically samples regions of interest, allowing the identification of correct structures in a rational and efficient manner. This systematic sampling is enabled by fast Fourier transform (FFT)-based methods of evaluating the energies of docked structures. The approach is being implemented in the widely used ClusPro server for predicting the structure of macromolecular interactions, which currently has nearly 40,000 users. While the method demonstrated success at CASP in specific tasks, the core principle of combining machine learning with physics-based sampling is expected to enhance performance in a variety of applications, particularly when the available data are insufficient for effective training.
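
To illustrate why FFT-based evaluation makes systematic sampling affordable, here is a minimal one-dimensional sketch of the general idea: the score of a ligand at every translation on a grid is a correlation, and a correlation over all shifts can be obtained from a handful of Fourier transforms instead of one explicit sum per shift. This is a toy under stated assumptions (a cyclic 1D grid, a single made-up energy term, translations only) and is not the ClusPro implementation; all names and values are hypothetical.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def scores_direct(receptor, ligand):
    """Score of the ligand at every cyclic shift t, one explicit sum per shift."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x - t) % n] for x in range(n)) for t in range(n)]


def scores_fft(receptor, ligand):
    """The same scores from the correlation theorem: IFFT(FFT(R) * conj(FFT(L)))."""
    n = len(receptor)
    r_hat, l_hat = fft(receptor), fft(ligand)
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [v.real / n for v in fft(product, inverse=True)]


# Hypothetical receptor energy grid (negative = favorable) and a two-cell ligand.
receptor = [0.0, 2.0, -1.0, -1.5, 2.0, 0.0, 0.0, 0.0]
ligand = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

direct_scores = scores_direct(receptor, ligand)
fft_scores = scores_fft(receptor, ligand)
best_shift = min(range(len(fft_scores)), key=lambda t: fft_scores[t])
print("lowest-energy shift:", best_shift)
```

The direct sum costs on the order of N² operations for N grid points, while the FFT route costs on the order of N log N, which is what makes exhaustive scanning of very large three-dimensional grids practical.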

MHC-Fine: Enhancing AlphaFold for Precise MHC-Peptide Interaction Prediction

June 17, 2024 | June 17, 2024 | xiaogli

The precise prediction of major histocompatibility complex (MHC)-peptide complex structures is pivotal for understanding cellular immune responses and advancing vaccine design. In our latest study, published in Biophysical Journal, we have enhanced AlphaFold’s capabilities by fine-tuning it with a specialized dataset consisting exclusively of high-resolution class I MHC-peptide crystal structures.

AlphaFold, while broadly effective, lacked the granularity necessary for the high-precision demands of class I MHC-peptide interaction prediction. Our tailored approach addresses this by providing a more detailed and accurate model. A comparative analysis was conducted against the homology-modeling-based method Pandora, as well as the AlphaFold multimer model. Our fine-tuned model demonstrates superior performance, with a median root-mean-square deviation (RMSD) for Cα atoms in peptides of 0.66 Å and improved predicted local distance difference test scores.

Moreover, our additional comparisons with AlphaFold3 on new MHC-I structures from the Protein Data Bank (PDB) published after January 1, 2023, show that our model has 15% more samples under 1 Å deviation, highlighting its enhanced precision.
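
The accuracy measures quoted above (peptide Cα RMSD, its median over a test set, and the share of predictions under 1 Å) can be sketched as follows. This is a minimal illustration that assumes each predicted peptide has already been superimposed on its reference structure; the coordinates, the per-complex RMSD values, and the function names are made up for the example and are not data from the study.

```python
import math
from statistics import median


def ca_rmsd(predicted, reference):
    """RMSD in Å between matched Cα coordinates, assumed already superimposed."""
    if len(predicted) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


def fraction_below(values, cutoff=1.0):
    """Fraction of RMSD values strictly below the cutoff (in Å)."""
    return sum(v < cutoff for v in values) / len(values)


# Hypothetical two-residue peptide: each Cα is displaced by 1 Å along z.
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (3.8, 0.0, -1.0)]
example_rmsd = ca_rmsd(predicted, reference)

# Hypothetical per-complex peptide RMSDs (Å) for a small test set.
rmsds = [0.4, 0.7, 0.9, 1.3, 2.1]
median_rmsd = median(rmsds)
share_under_1a = fraction_below(rmsds)
print(example_rmsd, median_rmsd, share_under_1a)
```

The median is less sensitive than the mean to a few badly mispredicted peptides, which is one reason it is a common summary for this kind of benchmark.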

These advances have substantial implications for computational immunology, potentially accelerating the development of novel therapeutics and vaccines by providing a more precise computational lens through which to view MHC-peptide interactions.

ClusPro AbEMap Server: predicting antibody epitopes

May 15, 2023 | September 25, 2023 | xiaogli

We developed a novel approach for modeling antibodies in complex with their corresponding antigens, and incorporated it as an Advanced function of the ClusPro Server. The Antibody-Epitope Mapping (AbEMap) Server allows the user to predict antibody-antigen interactions from three types of input: (i) X-ray structures, (ii) computationally predicted structures, and (iii) amino acid sequences alone. The details of processing these three input types, and the differences in their efficiency, are discussed in our publication in Nature Protocols.

High Accuracy Prediction of PROTAC complex structures

March 24, 2023 | September 25, 2023 | xiaogli

A novel method to aid in the design of PROTACs was developed by our group and published in JACS!

A PROTAC (PROteolysis TArgeting Chimera) is a heterobifunctional drug-like molecule that hijacks the Ubiquitin-Proteasome System (UPS) in mammalian cells and catalytically drives the ubiquitination of a protein of interest. The ubiquitinated proteins are then recognized and degraded by the native proteasome system of the cell.

In this work, we present a computational modeling approach that drastically reduces the cost of novel PROTAC design, which matters all the more because synthesizing PROTAC molecules is often a challenge. In our publication, we show that the method successfully predicts the structures in the benchmark datasets based on calculated Weighted Sum Potentials, and is especially precise in deriving preferred linker lengths and linker attachment points.

A novel structural systems biology approach

February 8, 2023 | September 25, 2023 | xiaogli

In collaboration with Boston University, we developed a new, faster approach to investigating the interactome using mass spectrometry and applied it to reveal and understand the mechanisms that drive the formation of the malignant cell phenotype. This work resulted in two publications in Nature Communications.

In our first publication, we introduced a new multiplex Co-fractionation/Mass Spectrometry (mCF/MS) platform that is more technically efficient, cost-effective and faster than previously reported Co-fractionation/Mass Spectrometry (CF/MS) methods. The mCF/MS approach was applied to compare the global protein interactome of mammary epithelial cells to the Protein Interaction Network (PIN) of two breast cancer cell lines, where many multimolecular complexes that drive malignant cell formation were described and investigated.

In the second publication based on our work, we introduced PAMAF, a parallelized multidimensional analytic framework that examines 12 modalities: protein abundance in whole cells, nuclei, exosomes, secretomes, and membranes; N-glycosylation and phosphorylation; metabolites; mRNA and miRNA; and, in parallel, single-cell transcriptomes. Using this method, we explored the key proteins in the process of epithelial-to-mesenchymal transition.