

About DEMO2 Pipeline



What is DEMO2?

    DEMO2 is a new version of DEMO for multidomain protein structure assembly. It integrates structurally analogous templates with inter-domain spatial restraints predicted by deep learning.

How does DEMO2 assemble multidomain protein structures?

    When a user submits the individual domain models, the server first identifies global and local analogous templates by structurally threading the domains through a nonredundant multidomain structural library using TM-align. Meanwhile, the inter-domain restraints are predicted by DeepPotential, a deep convolutional neural-network predictor.

    In the second step, an L-BFGS simulation assembles the domain structures under the guidance of the structurally analogous templates, the inter-domain spatial restraints predicted by DeepPotential, and knowledge-based inter-domain potentials.

    In the last step, the model with the lowest energy is selected for linker reconstruction and is further refined with fragment-guided molecular dynamics (FG-MD) simulations.


    Figure 1. Pipeline of DEMO2 for multidomain protein structure assembly.

How does the DEMO2 server perform compared with other methods?

    DEMO2 was tested on a benchmark set of 356 multidomain proteins, including 275 cases consisting of continuous domains and 81 cases containing discontinuous domains. Here, a discontinuous domain is defined as one containing 2 or more segments from separate regions of the query sequence. The individual domain models were generated by D-I-TASSER, with all homologous templates sharing a sequence identity >30% with the query excluded. To isolate the effect of domain assembly and rule out the negative impact of incorrect domain models, the data focus only on cases in which all domain folds were correctly predicted by I-TASSER with a TM-score >0.5. As shown in Fig. 2a, the average TM-score of the full-length models assembled by DEMO2 is 0.70, which is 11% and 30% higher than that of its predecessor DEMO (0.63) and AIDA (0.54), respectively.
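
    The continuous/discontinuous distinction used in the benchmark can be illustrated with a minimal Python sketch. It assumes a domain is given as a list of residue numbers from the query sequence; the function names and the example domains are hypothetical and are not part of the DEMO2 package.

```python
def split_into_segments(residues):
    """Group residue numbers into runs of consecutive positions.

    Returns a list of (start, end) tuples, one per sequence segment.
    """
    segments = []
    for r in sorted(set(residues)):
        if segments and r == segments[-1][1] + 1:
            segments[-1][1] = r          # extend the current segment
        else:
            segments.append([r, r])      # start a new segment
    return [tuple(s) for s in segments]


def is_discontinuous(residues):
    """A domain is discontinuous if it has 2 or more separate segments."""
    return len(split_into_segments(residues)) >= 2


# Hypothetical example domains (residue numbers are illustrative only).
domain_a = list(range(1, 101))                             # residues 1-100
domain_b = list(range(101, 151)) + list(range(201, 251))   # 101-150 and 201-250

for name, dom in (("domain_a", domain_a), ("domain_b", domain_b)):
    kind = "discontinuous" if is_discontinuous(dom) else "continuous"
    print(name, split_into_segments(dom), kind)
```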

    We further compared the full-length models assembled by DEMO2, using domain models independently generated by D-I-TASSER, with the full-length models created directly by trRosetta. The DEMO2 models have an average TM-score of 0.70, and the global fold is correct (TM-score >0.5) in 83% of cases. This compares favorably with the full-length models built directly by trRosetta, which have an average TM-score of 0.64, with only 70% of cases having a TM-score >0.5 (Fig. 2b).

    CASP (Critical Assessment of Techniques for Protein Structure Prediction) is a community-wide experiment for testing the state of the art of protein structure prediction, which has taken place every two years since 1994. The experiment (often referred to as a competition) is strictly blind because the structures of the test proteins are unknown to the predictors. We used DEMO2 (as ‘Zhang-Server’) to assemble all multidomain targets in the latest CASP14. Fig. 2c compares DEMO2 with the other top 4 servers for multidomain protein structure prediction in CASP14, where the servers are sorted according to the average GDT score of the full-length models for all multidomain proteins with ≥ 1 template-free modeling (FM) or template-free modeling/template-based modeling (FM/TBM) domain. As shown in the figure, the performance of DEMO2 on the full-length models of multidomain proteins is better than that of the other servers.


    Figure 2. Performance of DEMO2 on the 356 benchmark proteins and CASP14 targets. (a) Comparison of DEMO2 with DEMO and AIDA on the performance of full-length models assembled using D-I-TASSER predicted domain models. (b) TM-scores of models assembled by DEMO2 vs. models directly generated by whole-chain trRosetta prediction. (c) Comparison between DEMO2 (Zhang-Server) and the other top 4 servers in CASP14 on the full-length multidomain models in terms of the global distance test (GDT) score, where the servers are sorted according to the GDT score of the full-length models for multidomain proteins with ≥ 1 FM or FM/TBM domain.

What are the inputs of the DEMO2 server?

   Mandatory:

  • At least 2 domain models in PDB format
    Users can click the "Add domain" button to add text boxes for inputting more domain models. The server can currently assemble up to 5 domains. Users with >5 domain models can assemble the model interactively or download the standalone package to run DEMO2 locally.
   Optional:
  • Full-chain sequence in the standard FASTA format
  • Email address for receiving the results
  • Name of the query protein
  • Templates in PDB format to guide the domain assembly
  • Selection for removing templates sharing >30% sequence identity with target
  • Experimental data including cross-linking and cryo-EM density map to guide the assembly
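
    Before submitting, it can help to sanity-check the mandatory inputs locally. The sketch below is a hypothetical pre-submission check, not part of the DEMO2 server or its standalone package. It only verifies the two constraints stated above (at least 2 domain models, at most 5 on the server) and that each model contains C-alpha ATOM records at the standard PDB fixed-column positions.

```python
MAX_SERVER_DOMAINS = 5  # server limit stated in the input description
MIN_DOMAINS = 2


def count_ca_atoms(pdb_text):
    """Count C-alpha ATOM records in PDB-format text."""
    count = 0
    for line in pdb_text.splitlines():
        # PDB fixed columns: record name in 1-6, atom name in 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            count += 1
    return count


def check_domain_inputs(domain_texts):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if len(domain_texts) < MIN_DOMAINS:
        problems.append("at least 2 domain models are required")
    if len(domain_texts) > MAX_SERVER_DOMAINS:
        problems.append("more than 5 domain models: server limit exceeded")
    for i, text in enumerate(domain_texts, start=1):
        if count_ca_atoms(text) == 0:
            problems.append(f"domain {i} contains no CA ATOM records")
    return problems


# Tiny illustrative inputs (coordinates are made up).
dom1 = (
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00\n"
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00\n"
)
dom2 = "ATOM      1  CA  GLY A   1       1.000   2.000   3.000  1.00  0.00\n"

print(check_domain_inputs([dom1, dom2]))   # no problems expected
print(check_domain_inputs([dom1]))         # too few domains
```
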
What are the outputs of the DEMO2 server?

    The outputs of the DEMO2 server include:
  • Up to five full-length atomic models (ranked based on the energy)
  • Estimated accuracy of the predicted models (including a confidence score of all models, and predicted TM-score and RMSD for the first model)
  • User provided domain models
  • Top 10 full-length templates for domain assembly
  • Predicted distance and interface maps for domain assembly

    An illustrative example of the DEMO2 output can be seen from here.

How to interpret the output data generated by the DEMO2 server?

    The DEMO2 results are summarized on a webpage, the link to which is sent to the user by email (if provided) after the assembly is completed (see an example of DEMO2 output). Below, we answer several of the most frequently asked questions about interpreting the DEMO2 results.

    • What is the 'top 5 models assembled by DEMO2'?

      For each target, DEMO2 reports up to five full-length models ranked by total energy. Because the ranking is based on energy, a lower-ranked model may have a higher C-score. Although the first model has a higher C-score and better quality in most cases, it is not unusual for a lower-ranked model to have better quality than a higher-ranked one.

    • What are the "top 10 full-length templates for domain assembly"?

      DEMO2 identifies the analogous full-length templates from a non-redundant multidomain protein library using TM-align structural alignments. All domain models are aligned to each template of the library by TM-align, and the harmonic mean TM-score of all domains is defined as the score (TplScore) of a template. The top 10 templates with the highest score are selected to generate the initial full-length model and deduce the inter-domain distance restraints to guide the domain assembly.
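
      The TplScore described above is a harmonic mean, which penalizes templates that match only some of the domains well. A minimal Python sketch follows; the template names and per-domain TM-scores are invented for illustration, and the function name `tpl_score` is hypothetical rather than taken from the DEMO2 code.

```python
from statistics import harmonic_mean, mean


def tpl_score(domain_tm_scores):
    """Harmonic mean of the per-domain TM-scores for one template."""
    return harmonic_mean(domain_tm_scores)


# Hypothetical per-domain TM-scores (domain 1, domain 2) for two templates.
templates = {
    "tplA": [0.90, 0.20],   # matches one domain well, the other poorly
    "tplB": [0.55, 0.55],   # matches both domains moderately
}

# Rank templates from highest to lowest TplScore.
ranked = sorted(templates, key=lambda name: tpl_score(templates[name]), reverse=True)

for name in ranked:
    scores = templates[name]
    print(name, "arithmetic mean:", round(mean(scores), 3),
          "TplScore:", round(tpl_score(scores), 3))
```

      Both hypothetical templates have the same arithmetic mean, but the balanced template ranks first under the harmonic mean, which is the behavior one would want when a template must cover all domains.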

    • What is C-score?

      C-score is a confidence score for estimating the quality of the models predicted by DEMO2. It is calculated from the convergence parameters of the domain assembly simulations, the quality of the full-length templates for domain assembly, the degree to which the inter-domain distances are satisfied, and the estimated accuracy of the individual domain models. C-score is typically in the range of [-5,2], where a higher C-score signifies a model with higher confidence, and vice versa.

    • What is TM-score?

      TM-score is a metric for measuring the structural similarity between two structures (see Zhang and Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710). TM-score was proposed to solve the problem that RMSD is sensitive to local errors. Because RMSD is a root-mean-square average of the distances over all residue pairs in two structures, a local error (e.g. a misorientation of the tail) will give rise to a large RMSD value even though the global topology is correct. In TM-score, however, small distances are weighted more strongly than large distances, which makes the score insensitive to local modeling errors. A TM-score >0.5 indicates a model of correct topology and a TM-score <0.17 means a random similarity. These cutoffs do not depend on the protein length.

      Here, the 'Estimated TM-score' is a TM-score value estimated from the correlation between TM-score and C-score observed on a nonredundant training set.
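
      To make the contrast between RMSD and TM-score concrete, the following Python sketch evaluates both for a given superposition, using the TM-score formula from Zhang and Skolnick (2004): TM-score = (1/L) Σ 1/(1 + (d_i/d0)^2), with d0 = 1.24 (L - 15)^(1/3) - 1.8. It is a simplified illustration only: the real TM-score program also searches for the superposition that maximizes the score, and this sketch does not handle very short chains. The per-residue distances are invented.

```python
import math


def tm_d0(length):
    """Length-dependent distance scale d0 (in Angstrom)."""
    if length < 22:
        # Short chains need special handling that this sketch omits.
        raise ValueError("sketch only supports target lengths >= 22")
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for a fixed superposition, normalized by target length."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 200-residue model: 190 residues within 1 A of the native
# structure, plus a misoriented 10-residue tail that is 30 A away.
distances = [1.0] * 190 + [30.0] * 10

print("RMSD     :", round(rmsd(distances), 2))
print("TM-score :", round(tm_score(distances, 200), 3))
```

      In this made-up case the misplaced tail dominates the RMSD, while the TM-score stays well above the 0.5 cutoff because each large distance contributes almost nothing to the sum.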

    • What are distance and interface maps?

      The distance map shows the probability that an inter-residue distance falls within each of 36 equal-width bins spanning [2, 20] Å, plus two additional bins for distances <2 Å and >20 Å. The domain-domain interface map is extracted from the predicted distances by summing the cumulative probability of distances <18 Å. In the distance map, the first and second columns are the residue indexes, which start from 1. Starting from the third column, each value is the probability that the distance falls in the bins [0, 2], [2, 2.5], [2.5, 3], ..., [20, ∞], respectively. As in the distance map, the first and second columns of the interface map are the residue indexes, and the third column is the probability that the distance is <18 Å.
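
      The column layout described above can be read with a few lines of Python. The sketch below assumes whitespace-separated columns (the delimiter is not specified here) and 38 probability columns per line (1 + 36 + 1 bins). The function names are hypothetical, and the interface probability shown is simply the sum over bins that end at or below 18 Å for one residue pair, which is one reading of the description above.

```python
BIN_WIDTH = 0.5   # (20 - 2) / 36 equal-width bins
N_BINS = 38       # [0, 2], 36 bins over [2, 20], and [20, inf]


def bin_upper_edges():
    """Upper edge of each bin: 2.0, 2.5, ..., 20.0, inf."""
    edges = [2.0] + [2.0 + BIN_WIDTH * k for k in range(1, 37)]
    return edges + [float("inf")]


def parse_distance_line(line):
    """Split one distance-map line into (i, j, probabilities)."""
    fields = line.split()
    i, j = int(fields[0]), int(fields[1])      # residue indexes start from 1
    probs = [float(x) for x in fields[2:]]
    if len(probs) != N_BINS:
        raise ValueError(f"expected {N_BINS} probabilities, got {len(probs)}")
    return i, j, probs


def interface_probability(probs, cutoff=18.0):
    """Cumulative probability that the distance is below the cutoff."""
    return sum(p for p, upper in zip(probs, bin_upper_edges()) if upper <= cutoff)


# Made-up line: all probability mass in the [17.5, 18] bin (bin index 32).
example_probs = [0.0] * N_BINS
example_probs[32] = 1.0
example_line = "3 120 " + " ".join(str(p) for p in example_probs)

i, j, probs = parse_distance_line(example_line)
print(i, j, interface_probability(probs))
```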

How to use known information (e.g. full-length templates, experimental data) to improve DEMO2 assembly?

    If users have some information or experimental data about the structure of the query protein, the information can be conveniently uploaded to the DEMO2 server. The information can significantly improve the quality of the assembly.

    The DEMO2 server currently accepts the following information:

    • Up to 20 full-length templates in PDB format
    • Experimental cross-linking data
    • Inter-domain contact/interface restraints
    • Cryo-EM density map
How long does it take for DEMO2 to generate the final models for your protein?

    It usually takes 2-15 hours from submitting the domain models to receiving the assembly results. If many jobs have accumulated in the queue, the procedure may take longer. The time also depends on protein size: a smaller protein takes less time than a larger one.

How to cite DEMO2

    You are requested to cite the following articles when you use the DEMO2 server:

    • Xiaogen Zhou, Chunxiang Peng, Wei Zheng, Yang Li, Guijun Zhang, Yang Zhang. DEMO2: Multi-domain protein structures assembly by coupling quaternary structural alignment with deep-learning inter-domain restraint prediction, to be submitted.
    • Xiaogen Zhou, Jun Hu, Chengxin Zhang, Guijun Zhang, Yang Zhang. Assembling multidomain protein structures through analogous global structural alignments. Proceedings of the National Academy of Sciences, 116: 15930-15938 (2019).

Funding support

    The development of the DEMO2 server is supported by the National Institute of General Medical Sciences (GM136422 and S10OD026825), the National Institute of Allergy and Infectious Diseases (AI134678), and the National Science Foundation (IIS1901191 and DBI2030790). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (ACI1548562).

Contact information

    The DEMO2 server is in active development with the goal of providing the most accurate multidomain protein structure assembly. Please help us achieve this goal by sending your questions, feedback, and comments to yangzhanglab@umich.edu.

yangzhanglab@umich.edu | (734) 647-1549 | 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218