INSTALLATION AND IMPLEMENTATION OF D-I-TASSER SUITE
           (Copyright 2025 by University of Michigan, All rights reserved)
                    (Version 3.00, 2025/01/01)

1. What is D-I-TASSER Suite?
   
   The D-I-TASSER Suite is a composite package of programs for protein
   structure prediction and function annotations. The Suite includes the following programs:

   a) D-I-TASSER: A hierarchical program for protein structure prediction
   b) DeepMSA/DeepMSA2: A program for multiple sequence alignment generation
   c) MUSTER: A threading program for protein template identification
   d) CEthreader: A contact-based threading program for protein template identification
   e) LOMETS3: A meta-server approach consisting of multiple threading programs
   f) AttentionPotential: An attention network based deep-learning algorithm for residue-residue contact/distance prediction
   g) DeepPotential: A residual convolutional network based deep-learning algorithm for residue-residue contact/distance/hydrogen bond prediction
   h) ResTriplet, TripletRes, ResPRE, ResPLM and DeepPLM: Deep-learning-based programs for residue-residue contact prediction
   i) DeepFold: A protein ab initio structure prediction program based on AttentionPotential or DeepPotential predicted restraints
   j) PotentialFold: A protein ab initio structure prediction program based on AttentionPotential or DeepPotential predicted restraints
   k) SPICKER: A clustering program for structure decoy selection
   l) HAAD: A program for quickly adding hydrogen atoms to protein heavy-atom structures
   m) EDTSurf: A program for constructing triangulated surfaces of protein molecules
   n) ModRefiner: A program for constructing and refining atomic models from C-alpha traces
   o) NWalign: Protein sequence alignments by Needleman-Wunsch algorithm
   p) PSSpred: A program for Protein Secondary Structure PREDiction
   q) ResQ: An algorithm to estimate B-factor and residue-level error of models
   r) COACH: A function annotation program based on COFACTOR, TM-SITE and S-SITE
   s) COFACTOR: A program for ligand-binding site, EC number & GO term prediction
   t) TM-SITE: A structure-based approach for ligand-binding site prediction
   u) S-SITE: A sequence-based approach for ligand-binding site prediction
   v) AlphaFold2: A third-party protein structure prediction software developed by DeepMind used in D-I-TASSER-AF2 pipeline
   w) Modeller: A third-party homology modeling program for constructing protein 3D structures based on alignments with known structures.
   x) FUpred: A contact map-based domain prediction method utilizing recursion to detect domain boundaries from predicted contact maps and secondary structure information.
   y) DEMO: A composite package of programs for multi-domain protein structure assembly.
   z) ThreaDomEx: A unified package combining ThreaDom and DomEx for accurate protein domain boundary prediction, including discontinuous domains.

2. How to install the D-I-TASSER Suite?

   a) Download the D-I-TASSER Suite 'D-I-TASSER-3.0.tar.bz2' from
      https://zhanggroup.org/D-I-TASSER/download_standalone.html
      and unpack 'D-I-TASSER-3.0.tar.bz2' with
      > tar -xvf D-I-TASSER-3.0.tar.bz2
      The root path of this package is called $pkgdir, e.g. 
      /home/yourname/D-I-TASSER-3.0. All the programs should be under this 
      directory. You can install the package at any location on your Linux computer.

   b) Install the environment required by the D-I-TASSER Suite.
      Go to your D-I-TASSER Suite package directory, $pkgdir, for example 
      /home/yourname/D-I-TASSER-3.0. 
      A script 'Install_DIT_env.sh' is provided in the package for automated
      environment installation.
      The D-I-TASSER Suite requires Modeller for automatic domain partition and assembly.
      First, go to the Sali Lab (https://salilab.org/modeller/registration.html) to register 
      and obtain a Modeller license key. 
      Then run the command './Install_DIT_env.sh' and enter the Modeller key when asked; it will create 
      a sub-folder 'DIT_anaconda3' in your D-I-TASSER Suite $pkgdir.
   
   c) Download the D-I-TASSER and COACH library files from
      https://zhanggroup.org/D-I-TASSER/download_standalone.html 
      or use the provided script 'download_lib.py' (recommended) to download them.
      Go to your D-I-TASSER Suite package directory, $pkgdir, for example 
      /home/yourname/D-I-TASSER-3.0. 
      The script 'download_lib.py' is provided in the package for automated
      download and update of the libraries.
      We recommend putting the library files under the path /home/$yourname/ITLIB.
      Run the command './DIT_anaconda3/envs/DIT/bin/python download_lib.py -h' for help.
            
      Usage:
      ./DIT_anaconda3/envs/DIT/bin/python ./download_lib.py -libdir /home/$yourname/ITLIB -P true -B true -N true -MSA DeepMSA2-IMG -ITmode DIT-AF2

      When you provide the -libdir parameter, 'download_lib.py' automatically changes the databasesrootpath variable
      (databasesrootpath = os.path.join(pkgdir, 'ITLIB')) in program/DeepMSA2/config.py. If you later move your ITLIB to another path, please update databasesrootpath in program/DeepMSA2/config.py again.
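      If ITLIB is moved later, the databasesrootpath line has to be edited again.
      A minimal Python sketch of that edit follows. It assumes the variable is
      assigned on a single line of the form "databasesrootpath = ..." in
      program/DeepMSA2/config.py; the helper name set_databases_root is
      hypothetical and not part of the Suite. Keep a backup copy of config.py first.

```python
import re
from pathlib import Path

# Assumed format: a single-line assignment such as
#   databasesrootpath = os.path.join(pkgdir, 'ITLIB')
ASSIGNMENT = re.compile(r"^([ \t]*databasesrootpath[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_databases_root(config_path, new_root):
    """Point databasesrootpath in DeepMSA2's config.py at new_root; return the new text."""
    path = Path(config_path)
    text = path.read_text()
    # Keep the left-hand side, replace the right-hand side with a quoted path literal.
    new_text, count = ASSIGNMENT.subn(lambda m: m.group(1) + repr(str(new_root)), text)
    if count != 1:
        raise ValueError(f"expected one databasesrootpath assignment, found {count}")
    path.write_text(new_text)
    return new_text
```

      Usage (paths are illustrative): set_databases_root("program/DeepMSA2/config.py", "/data/ITLIB").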

      -libdir template_library_directory (full path for saving the library files, such as /home/zhang/ITLIB)
      ====================
      Optional arguments:
      ====================
      -P       [true or false], whether to download PDB template files (default: true)
               You can set it as false if you already have modeled structure and just need function predictions
      -B       [true or false], whether to download BioLiP function library files (default: true)
               You can set it as false if you do not need function predictions
      -N       [true or false], whether to download the non-redundant sequence database nr (default: true)
               You can set it as false if you want to use your own nr database. (In this case, you will have to go to
               the D-I-TASSER library directory, and make a soft link with the command: ln -s location_of_your_nr nr)
      -MSA     [DeepMSA2 or DeepMSA2-IMG], which MSA pipeline you will use in D-I-TASSER (default: DeepMSA2-IMG)
               DeepMSA2 requires the uniclust30, uniref, metaclust, mgnify and bfd databases, which take around 2TB of additional disk space.
               DeepMSA2-IMG additionally requires the IMG/JGI databases, which take around 1~2TB of additional disk space.
      -ITmode  [IT, CIT, DIT or DIT-AF2], which protein structure prediction pipeline will be used: IT means I-TASSER,
               CIT means C-I-TASSER, DIT means D-I-TASSER, and DIT-AF2 means D-I-TASSER using AlphaFold2 distances (default: DIT-AF2).
               IT, CIT and DIT use the same library; DIT-AF2 requires additional AlphaFold2 libraries of around 200GB.
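      When scripting library downloads, it can help to validate the option values
      above before starting a multi-terabyte transfer. The Python sketch below only
      assembles the documented command line (same flags and allowed values as the
      usage line above); download_lib_command and as_shell are hypothetical helper
      names, and absolute paths under $pkgdir replace the './' form.

```python
import shlex

MSA_CHOICES = {"DeepMSA2", "DeepMSA2-IMG"}
ITMODE_CHOICES = {"IT", "CIT", "DIT", "DIT-AF2"}


def download_lib_command(pkgdir, libdir, msa="DeepMSA2-IMG", itmode="DIT-AF2",
                         pdb=True, biolip=True, nr=True):
    """Return an argv list for download_lib.py using the documented flags."""
    if not libdir.startswith("/"):
        raise ValueError("-libdir must be a full path, e.g. /home/zhang/ITLIB")
    if msa not in MSA_CHOICES:
        raise ValueError(f"unknown -MSA value: {msa}")
    if itmode not in ITMODE_CHOICES:
        raise ValueError(f"unknown -ITmode value: {itmode}")
    python = f"{pkgdir}/DIT_anaconda3/envs/DIT/bin/python"
    argv = [python, f"{pkgdir}/download_lib.py", "-libdir", libdir]
    # -P, -B and -N take the literal strings "true"/"false".
    for flag, wanted in (("-P", pdb), ("-B", biolip), ("-N", nr)):
        argv += [flag, "true" if wanted else "false"]
    argv += ["-MSA", msa, "-ITmode", itmode]
    return argv


def as_shell(argv):
    """Quote an argv list as a single shell command string."""
    return shlex.join(argv)
```

      For example, as_shell(download_lib_command("/home/yourname/D-I-TASSER-3.0", "/home/yourname/ITLIB", nr=False))
      prints a command you can paste into a terminal.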

   d) Third-party software installation:

      While the majority of programs in the package 'D-I-TASSER-3.0.tar.bz2' were
      developed in the Zhang Lab, for which permission of use is granted herein,
      some programs and databases (including alphafold2, blast, nr, GOparser, Modeller,
      uniclust30, uniref90, bfd, mgnify and metaclust) were developed by third-party groups. 
      Default versions of alphafold2 (modified by our group), blast, Modeller and nr are included in the package. 
      It is the user's obligation to obtain license permission (and a license key, for Modeller) from the developers of all the third-party software 
      before using them. In addition, your system needs to have Java installed.


   e) Updates:

      (i)   A new MSA construction pipeline, DeepMSA2 (DeepMSA2-IMG), is included in version 3.0.
      
      (ii)  A new protein folding pipeline D-I-TASSER-AF2 has been included in version 3.0.
      
            The D-I-TASSER-AF2 pipeline combines D-I-TASSER with AlphaFold2 in two ways: 
            (1) the top AlphaFold2 models, which are ranked by the default quality assessment ranking pipeline 
            included in AlphaFold2 pipeline, are added to D-I-TASSER as additional templates, together with 220 
            templates generated by the 11 component servers of LOMETS3, where each server generates 20 top templates 
            that are sorted by their Z-scores for each threading algorithm. The top 10 templates are finally selected 
            from the 240 templates based on the scoring function. 
            (2) AlphaFold2-predicted contact and distance maps are combined with the DeepPotential 
            and AttentionPotential-predicted contact and distance maps, and final contacts and distances are selected from them using scoring functions, respectively. 

      (iii) A new multi-domain handling module, based on FUpred, ThreaDom and DEMO2, has been added to 
            perform domain partition and assembly for multi-domain proteins.

      (iv)  Several bugs in MUSTER, DeepMSA, and LOMETS3 have been fixed in version 3.0.
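      The template pooling in update (ii) is simple arithmetic: 11 LOMETS3 servers
      x 20 templates = 220 threading templates, and adding the AlphaFold2 models
      gives the 240-template pool (which implies 20 AlphaFold2 models), from which
      the top 10 are kept. The Python sketch below only illustrates that pooling. It
      assumes per-server Z-score normalization as a stand-in for D-I-TASSER's
      scoring function, which is not described here, and assumes the AlphaFold2
      model scores are already on the same scale; pool_templates is hypothetical.

```python
import heapq
from statistics import mean, pstdev


def zscores(scores):
    """Standardize one threading program's raw scores: Z = (s - mean) / sd."""
    mu, sd = mean(scores), pstdev(scores)
    return [0.0 if sd == 0 else (s - mu) / sd for s in scores]


def pool_templates(per_server, af2_models, top_k=10):
    """Pool per-server templates with AlphaFold2 models and keep the top_k.

    per_server: {server_name: [(template_id, raw_score), ...]}
    af2_models: [(model_id, score), ...], score assumed comparable to a Z-score.
    """
    pool = []
    for server, hits in per_server.items():
        ids, raw = zip(*hits)
        # Normalize within each server so different scoring scales can be mixed.
        pool += [(f"{server}:{tid}", z) for tid, z in zip(ids, zscores(list(raw)))]
    pool += list(af2_models)
    return heapq.nlargest(top_k, pool, key=lambda item: item[1])
```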

3. Bug report:

   Please report and post bugs and suggestions at D-I-TASSER message board: 
   https://zhanggroup.org/forum


   #######################################################
   #                                                     #
   #  4. Installation and implementation of D-I-TASSER   #
   #                                                     #
   #######################################################
   
4.1. Introduction of D-I-TASSER
   
   D-I-TASSER (Deep learning-based Iterative Threading ASSEmbly Refinement) is a new method 
   extended from I-TASSER for high-accuracy protein structure and function predictions. 
   Starting from a query sequence, D-I-TASSER first creates a multiple sequence alignment (MSA)
   with DeepMSA2, which iteratively searches genomic and metagenomic sequence
   databases, and then generates inter-residue distance/contact/hydrogen-bond maps 
   using multiple deep neural-network predictors, including AttentionPotential, DeepPotential, ResTriplet, ResPLM, DeepPLM, ResPRE, TripletRes, and AlphaFold2 (optional). 
   It then identifies structural templates from the PDB with the meta-threading approach LOMETS3 
   and assembles full-length atomic models by replica-exchange Monte Carlo simulations guided by the contact/distance/hydrogen-bond maps.  
   A multi-domain handling module, based on FUpred, ThreaDom and DEMO2, performs 
   domain partition and assembly for multi-domain proteins.
   Large-scale benchmark tests showed that D-I-TASSER generates significantly more 
   accurate models than I-TASSER and AlphaFold2, especially for sequences that do not have homologous templates in the PDB.

   For function annotation, the D-I-TASSER structure model is matched through 
   the function library (BioLiP) to identify functional templates. The biological 
   insights (including ligand-binding, enzyme classification, and gene ontology) 
   are inferred from the functional templates by COACH based on the consensus
   of predictions from COFACTOR, TM-SITE and S-SITE.

4.2. How to run D-I-TASSER?
   
   a) The main script for running D-I-TASSER is $pkgdir/I-TASSERmod/runI-TASSER.pl. 
      Running it without arguments prints the help information.

   b) The following arguments must be set (mandatory arguments). One example is: 

      "$pkgdir/I-TASSERmod/runI-TASSER.pl -libdir /home/yourname/ITLIB -seqname example -datadir /home/yourname/D-I-TASSER-3.0/example"

      -libdir  means the path of the template libraries
      -seqname means the unique name of your query sequence
      -datadir means the directory which contains your sequence 
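      For batch submissions, the command line can be assembled programmatically.
      The Python sketch below builds the runI-TASSER.pl argument list from the three
      mandatory flags plus any optional flags passed as keyword arguments (flag name
      without the leading dash, e.g. itmode="DIT-AF2"). The helper name
      run_itasser_command is hypothetical; it does not check flag names or values.

```python
from pathlib import Path


def run_itasser_command(pkgdir, libdir, seqname, datadir, **optional):
    """Return an argv list for runI-TASSER.pl.

    Mandatory flags come first; optional flags follow in the order given.
    Python booleans are written as "true"/"false", as the flags expect.
    """
    script = Path(pkgdir) / "I-TASSERmod" / "runI-TASSER.pl"
    argv = [str(script), "-libdir", libdir, "-seqname", seqname, "-datadir", datadir]
    for name, value in optional.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += [f"-{name}", str(value)]
    return argv
```

      The resulting list can be run with subprocess.run(argv, check=True) or printed with shlex.join(argv).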

   c) Other arguments are optional and have default values.
      Users can reset one or more of them, as listed in the Optional arguments section below.

    ==================
    Notice:
    ==================
    The default D-I-TASSER pipeline is set to "DIT" with "DeepMSA2" (without IMG search), which runs only DeepPotential and AttentionPotential 
    to predict distances, without the IMG/JGI metagenome databases. It is less accurate than the "DIT-AF2" pipeline with "DeepMSA2-IMG", but it runs faster and 
    uses fewer resources. For more accurate models, set the -itmode flag to "DIT-AF2" and the -msapipe flag to "DeepMSA2-IMG".
    See details in the Optional arguments section.

    ==================
    Optional arguments:
    ==================
    -runstyle      default value is "serial", which means running D-I-TASSER simulations sequentially.
                   "localparallel" means running D-I-TASSER contact, threading, DeepMSA2, and simulations in parallel, distributed on multiple cores of one computer, using the built-in JobManager.
                   "slurm" means running D-I-TASSER contact, threading, DeepMSA2, domain-level modeling, and simulations in parallel, distributed
                   across cluster nodes using the Slurm job scheduling system.

                   For "localparallel" and "slurm" modes, you can modify the default account and CPU number in
                   ./I-TASSERmod/JobManager.py, class JobConfig.
                   The default CPU number, memory, and number of models for AlphaFold2 can be found in
                   ./program/DeepMSA2/config.py

    -homoflag      [real, benchmark], "real" will use all templates, "benchmark" will exclude homologous templates
    -idcut         sequence identity cutoff for "benchmark" runs, default value is 0.3, range is in [0,1]    
    -ntemp         number of top templates output for each threading program, default is 20, range is in [1,50]    
    -nmodel        number of final models output by D-I-TASSER, default value is 5, range is in [1,10]
    -LBS           [true or false], whether to predict ligand-binding site (default: false)
    -EC            [true or false], whether to predict EC number (default: false)
    -GO            [true or false], whether to predict GO terms (default: false)
    -traj          [true or false], whether to deposit the trajectory files (default: true)
    -light         [true or false], whether to run quick simulations (default: false)
    -hours         maximum hours of simulations (default: 5 when -light=true)
    -outdir        where the final results should be saved (default: datadir)
    -itmode        what kind of simulation is used, 
                   "IT" for I-TASSER, 
                   "CIT" for C-I-TASSER, 
                   "DIT" for D-I-TASSER (default), 
                   "DIT-AF2" for D-I-TASSER-AF2 (more accurate but slower; recommended if you have the resources).
    -msapipe       what kind of MSA pipeline will be used, 
                   "DeepMSA2" for DeepMSA2 pipeline without IMG database searching (default), 
                   "DeepMSA2-IMG" for DeepMSA2 pipeline with IMG database searching (requires downloading or building the IMG/JGI database, and takes a very long time to run on a single CPU).
    -Nmsa          How many sequences will be used in MSA for MSA transformer and attention [1-1024], default=128
    -msasele       Methods for selecting best MSA for modeling,
                   "deeppotential" for selecting MSA based on DeepPotential score (default),
                   "alphafold2" for selecting MSA based on AlphaFold2 pLDDT score (set automatically with -itmode DIT-AF2, and used only in the DIT-AF2 pipeline).
    -af2_version   AlphaFold2 version, 
                   "20" for AlphaFold2.0,
                   "21" for AlphaFold2.1,
                   "22" for AlphaFold2.2,
                   "23" for AlphaFold2.3 (default),
                   We recommend using version "23" for the D-I-TASSER-AF2 pipeline. 
                   Versions lower than "23" only support CUDA 11.3.1 for GPU usage. 
                   If your GPU requires a CUDA version higher than 11.3.1, please set use_gpu to false.
    -multi_domain  [true or false], (default: false) this option enables domain partition if the query sequence is predicted to be a multi-domain protein.
                   If you want to use your own domain boundaries, use the -domain_str option.
    -domain_str    Domain string specifying domain boundaries, with domains separated by ":" and the segments of a discontinuous domain separated by "," (default: null).
                   This option uses the user-specified domain boundaries for domain partition and assembly, for example: 1-194,288-340:195-287
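     A malformed -domain_str is cheap to catch before a long run. The Python sketch
     below assumes, from the example 1-194,288-340:195-287, that ranges are
     inclusive and 1-based, ":" separates domains and "," separates the segments of
     one discontinuous domain. parse_domain_str and check_coverage are hypothetical
     helpers, not part of the Suite.

```python
def parse_domain_str(domain_str):
    """Parse "1-194,288-340:195-287" into [[(1, 194), (288, 340)], [(195, 287)]]."""
    domains = []
    for domain in domain_str.split(":"):
        segments = []
        for seg in domain.split(","):
            start, end = (int(v) for v in seg.split("-"))
            if not 1 <= start <= end:
                raise ValueError(f"bad segment {seg!r}")
            segments.append((start, end))
        domains.append(segments)
    return domains


def check_coverage(domains, seq_len):
    """Return (unassigned, doubly_assigned) 1-based residue numbers."""
    counts = [0] * (seq_len + 1)
    for segments in domains:
        for start, end in segments:
            for i in range(start, min(end, seq_len) + 1):
                counts[i] += 1
    unassigned = [i for i in range(1, seq_len + 1) if counts[i] == 0]
    overlapping = [i for i in range(1, seq_len + 1) if counts[i] > 1]
    return unassigned, overlapping
```

     For the example string and a 340-residue query, check_coverage reports no gaps and no overlaps.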
    ======================
    Tips for path setting:
    ======================
    -pkgdir:    directory of the D-I-TASSER Suite. Go to the I-TASSERmod folder and enter the command "pwd";
                you may get a message like /home/myname/D-I-TASSER-3.0/I-TASSERmod,
                in which case the path is /home/myname/D-I-TASSER-3.0
    -libdir:    directory of the D-I-TASSER library. Go to the MTX folder and enter the command "pwd";
                you may get a message like /home/myname/ITLIB/MTX,
                in which case the path is /home/myname/ITLIB
    -java_home: enter the command "which java"; if you get a path like /usr/bin/java, the path is /usr
    -python2:   path to Python 2, for example /DIT_anaconda3/envs/py2/bin/python
    -python3:   path to Python 3 for contact/distance/hb prediction, which must support PyTorch >= 1.7.0, for example /DIT_anaconda3/envs/py2/bin/python
    -seqname:   this name must be different for different targets so that you can run multiple jobs at the same time.
    -datadir:   this is the directory where your input sequence "seq.fasta" is located.
                When you run multiple jobs, different targets need to be put under different folders
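     The path rules above are each "go up one or two directory levels". A small
     Python sketch of them follows; the function names are hypothetical. Note that
     the -java_home rule simply strips /bin/java from whatever "which java" returns;
     if that is a symlink (common with alternatives systems), the result is still
     /usr, exactly as described above.

```python
import shutil
from pathlib import Path


def pkgdir_from_itassermod(itassermod_dir):
    """-pkgdir is the parent of the I-TASSERmod folder."""
    return Path(itassermod_dir).parent


def libdir_from_mtx(mtx_dir):
    """-libdir is the parent of the MTX folder inside the library."""
    return Path(mtx_dir).parent


def java_home_from_java(java_path=None):
    """-java_home is two levels above the java binary: /usr/bin/java -> /usr."""
    java_path = java_path or shutil.which("java")
    if java_path is None:
        raise FileNotFoundError("java not found on PATH")
    return Path(java_path).parent.parent
```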
    
     We suggest testing your installation first with a short sequence (e.g., about 50 residues) before running production jobs for your proteins.
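     An example command for running D-I-TASSER using a sequence "seq.fasta" under the folder /home/myname/data/example is:

     $pkgdir/I-TASSERmod/runI-TASSER.pl -libdir /home/myname/ITLIB -seqname example -datadir /home/myname/data/example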
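     Before the first run it is worth checking that seq.fasta parses and has the
     intended length (e.g. about 50 residues for an installation test). The Python
     sketch below assumes seq.fasta holds one query record, which matches its role
     as the only input file; read_single_fasta is a hypothetical helper, and the
     pipeline's own standardization step (seq.fasta to seq.txt) may be more lenient.

```python
from pathlib import Path

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(path):
    """Return (header, sequence, unusual_letters) for a one-record FASTA file."""
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("seq.fasta should contain a single sequence")
            header = line[1:].strip()
        else:
            chunks.append(line.upper())
    if header is None:
        raise ValueError("missing '>' header line")
    seq = "".join(chunks)
    # Letters outside the 20 standard amino acids, reported rather than rejected.
    unusual = sorted(set(seq) - STANDARD_AA)
    return header, seq, unusual
```

     len(read_single_fasta("/home/myname/data/example/seq.fasta")[1]) gives the sequence length.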


   NOTE:
   a) Outline of steps for running D-I-TASSER by 'runI-TASSER.pl':
      a1) standardize 'seq.fasta' to 'seq.txt' and get the sequence length
      a2) run 'deepmsa2' to generate deep multiple sequence alignment
          run 'psiblast' to generate 'chk', 'out', 'pssm', 'mtx' files
          run 'PSSpred' to get 'seq.dat', 'seq.dat.ss'
          run 'solve' to get 'exp.dat'
          run 'pairmod' to get 'pair1.dat' and 'pair3.dat'
      a3) run 'alphafold2' (optional), 'attentionpotential', 'deeppotential', 'restriplet', 'tripletres', 'respre', 'resplm' and 'deepplm' to predict contact/distance/HB maps
      a4) run 'LOMETS3' threading programs sequentially
          run 'mkinit.pl' to generate restraints, run 'prepare.pl' to get additional energy potentials
      a5) run 'domain_TASSER.py' to do domain partition and assembly; it calls a1-a4 again for domain-level sequences.
      a6) run D-I-TASSER simulation
      a7) run SPICKER clustering program
          run 'get_cscore.pl' to get confidence score
          run 'EMrefinement.pl' to get full-atomic models
          run 'get_rsq_bfp.pl' to get local accuracy and B-factor estimations
      a8) run 'runCOACH.pl' to generate ligand-binding sites, EC number and 
          GO terms predictions.
   b) 'seq.fasta' is the query sequence file in FASTA format, which is the
      only needed input file for running D-I-TASSER. This file should be
      put in $datadir before running this job.
   c) D-I-TASSER structure assembly simulations consist of multiple independent 
      runs, whose number is decided by the protein type. This number can be increased if the user wants
      to run more simulations, especially for large proteins without good templates.
   d) If working on a cluster with multiple nodes, it is recommended to set 
      $runstyle="parallel". You need to have a PBS server installed in your system. 
      Parallel jobs run faster since jobs are distributed among different 
      nodes. The default setting $runstyle="serial" runs all the jobs on a 
      single computer.
   e) If the job was executed partially and encountered an error, you can 
      rerun the main script without modification. It will check the existing 
      files and restart from the correct step.
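      The restart behavior in e) is a "first missing output" check. The Python
      sketch below shows the idea only: the checkpoint table is hypothetical, uses
      just the file names from the outline above, and is not the actual logic of
      runI-TASSER.pl, which checks its own set of files.

```python
from pathlib import Path

# Hypothetical checkpoint table: (step, output files that show the step finished).
CHECKPOINTS = [
    ("a1 standardize sequence", ["seq.txt"]),
    ("a2 PSSpred", ["seq.dat", "seq.dat.ss"]),
    ("a2 solve", ["exp.dat"]),
    ("a2 pairmod", ["pair1.dat", "pair3.dat"]),
]


def first_unfinished_step(datadir, checkpoints=CHECKPOINTS):
    """Return the first step whose outputs are not all present, or None."""
    for step, outputs in checkpoints:
        if not all((Path(datadir) / name).is_file() for name in outputs):
            return step
    return None
```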

4.3 System requirement:

   a) x86_64 machine, Linux OS, more than 60 GB of free disk space.
   b) Perl and Java should be installed. GO::Parser should be installed 
      if you want to predict GO terms.
   c) Basic compression and decompression tools should be installed to support 
      tar and bunzip2.
   d) If you are using computer clusters, the PBS server job management software should 
      support 'qsub' and 'qstat'. If you use other job management software, such as 
      SGE or SLURM, some changes are needed.
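   The requirements above can be checked quickly before installation. The Python
   sketch below reports the machine type, which of the listed tools are missing
   from PATH, and whether free space on a given filesystem reaches a threshold
   (measured in GiB here). check_system is a hypothetical helper; pass
   tools=("qsub", "qstat") to check a PBS setup.

```python
import platform
import shutil


def check_system(path=".", min_free_gb=60, tools=("perl", "java", "tar", "bunzip2")):
    """Report machine type, missing tools, and whether free disk is at least min_free_gb."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "machine": platform.machine(),  # expected: x86_64
        "missing_tools": missing,
        "free_gb": round(free_gb, 1),
        "enough_disk": free_gb >= min_free_gb,
    }
```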

4.4. How to cite D-I-TASSER and D-I-TASSER Suite?

   1. Wei Zheng, Qiqige Wuyun, Yang Li, Quancheng Liu, Xiaogen Zhou, Yiheng Zhu, P. Lydia Freddolino, Yang Zhang. 
      Integrating deep learning potentials with I-TASSER for single- and multi-domain protein structure prediction. Submitted. (2023).
   2. Wei Zheng, Qiqige Wuyun, Peter L Freddolino, Yang Zhang. 
      Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins, (2023).
   3. Wei Zheng, Yang Li, Chengxin Zhang, Xiaogen Zhou, Robin Pearce, Eric W. Bell, Xiaoqiang Huang, Yang Zhang.
      Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins, (2021). 
   4. Wei Zheng, Chengxin Zhang, Yang Li, Robin Pearce, Eric W. Bell, Yang Zhang. 
      Folding non-homology proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Reports Methods, 1: 100014 (2021).
   5. Wei Zheng, Yang Li, Chengxin Zhang, Robin Pearce, S. M. Mortuza, Yang Zhang. 
      Deep-learning contact-map guided protein structure prediction in CASP13. Proteins, 87: 1149-1164 (2019).
   6. Y Zhang. 
      I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9: 40 (2008).
   7. A Roy, A Kucukural, Y Zhang. 
      I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, 5: 725-738 (2010).
   8. J Yang, R Yan, A Roy, D Xu, J Poisson, Y Zhang. 
      The I-TASSER Suite: Protein structure and function prediction. Nature Methods, 12: 7-8 (2015)

   ###############################################################################
   #                                                                             #
   #  5. Installation and implementation of contact/distance/HB predictors       #
   #                                                                             #
   ###############################################################################
   
5.1. Introduction of DeepPotential, AttentionPotential, ResTriplet, TripletRes, ResPRE, ResPLM and DeepPLM
   
   DeepPotential is a deep-learning-based contact/distance/hydrogen-bond predictor 
   using three co-evolutionary features: the covariance matrix (COV) 
   proposed by DeepCov; the precision matrix (PRE) formulated by ResPRE; 
   and the coupling parameters of the inverse Potts model obtained through 
   pseudolikelihood maximization (PLM).
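   The first two features can be sketched from a one-hot encoded MSA. The NumPy
   sketch below computes an empirical covariance matrix (COV) and a
   ridge-regularized inverse as a precision matrix (PRE), both reshaped to
   (L, L, 21, 21). It is an illustration under stated assumptions: the 21-letter
   alphabet, the fixed ridge value and the absence of sequence weighting are
   simplifications that may differ from DeepCov/ResPRE, and the PLM couplings
   (which need a pseudolikelihood optimizer such as CCMpred) are omitted.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus gap; unknown letters map to gap.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"


def one_hot(msa):
    """Encode an aligned MSA (equal-length strings) as an (N, L*21) matrix."""
    index = {a: k for k, a in enumerate(ALPHABET)}
    n, length, q = len(msa), len(msa[0]), len(ALPHABET)
    x = np.zeros((n, length, q))
    for s, seq in enumerate(msa):
        for j, aa in enumerate(seq.upper()):
            x[s, j, index.get(aa, q - 1)] = 1.0
    return x.reshape(n, length * q)


def cov_and_pre(msa, ridge=0.1):
    """Return COV and a ridge-regularized PRE, each shaped (L, L, 21, 21)."""
    x = one_hot(msa)
    centred = x - x.mean(axis=0)
    cov = centred.T @ centred / x.shape[0]                   # empirical covariance
    pre = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))  # shrunk inverse
    q = len(ALPHABET)
    length = cov.shape[0] // q

    def to_pairs(m):
        # (L*q, L*q) -> (L, L, q, q): one q x q block per residue pair (i, j).
        return m.reshape(length, q, length, q).transpose(0, 2, 1, 3)

    return to_pairs(cov), to_pairs(pre)
```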

   AttentionPotential is an improved model that can predict various inter-residue 
   geometry potentials. In AttentionPotential model, the coevolutionary information 
   is directly extracted using the attention mechanism that can model the 
   interactions between residues, instead of the precomputed evolutionary 
   coefficients in DeepPotential. 
   
   TripletRes and ResTriplet are deep-learning based contact predictors 
   using three co-evolutionary features: the covariance matrix (COV) 
   proposed by DeepCov; the precision matrix (PRE) formulated by ResPRE; 
   and the coupling parameters of the inverse Potts model obtained through 
   pseudolikelihood maximization (PLM).

   ResPRE is our in-house contact-map predictor, which consists of two 
   consecutive steps of precision matrix-based feature generation and 
   deep residual neural network-based contact inference.

   ResPLM is also an in-house contact-map predictor similar to ResPRE. 
   The only difference is that ResPLM was trained using the PLM feature. 
   
   DeepPLM is our in-house contact-map prediction approach that has the 
   same deep-learning architecture as ResPRE, except it uses different 
   features that are generated by CCMpred.

5.2. How to install those programs?

   When you unpack the D-I-TASSER Suite, AttentionPotential, DeepPotential, ResTriplet, TripletRes, ResPRE, ResPLM and DeepPLM programs are already installed.


5.3. How to cite the contact predictors?

   If you are using the TripletRes program, you can cite:

   Yang Li, Chengxin Zhang, Eric W Bell, Wei Zheng, Dongjun Yu, Yang Zhang. 
   Deducing high-accuracy protein contact-maps from a triplet of 
   coevolutionary matrices through deep residual convolutional networks. 
   PLOS Computational Biology, (2021).

   If you are using the ResTriplet program, you can cite:

   Yang Li, Chengxin Zhang, Eric W. Bell, Dongjun Yu, Yang Zhang.
   Ensembling multiple raw coevolutionary features with deep residual 
   neural networks for contact-map prediction in CASP13.
   Proteins: Structure, Function, and Bioinformatics, 87: 1082-1091 (2019).

   If you are using the ResPRE program, you can cite:

   Yang Li, Jun Hu, Chengxin Zhang, Dong-Jun Yu, and Yang Zhang. 
   ResPRE: high-accuracy protein contact prediction by coupling precision 
   matrix with deep residual neural networks. Bioinformatics, 35: 4647-4655 (2019). 

   If you are using the ResPLM and DeepPLM programs, you can cite:

   Wei Zheng, Yang Li, Chengxin Zhang, Robin Pearce, S. M. Mortuza, Yang Zhang.
   Deep-learning contact-map guided protein structure prediction in CASP13.
   Proteins: Structure, Function, and Bioinformatics, 87: 1149-1164 (2019).


   #######################################################################
   #                                                                     #
   #  6. Installation and implementation of DeepFold and PotentialFold   #
   #                                                                     #
   #######################################################################

6.1. Introduction of DeepFold and PotentialFold
   
   DeepFold is a deep-learning based method for ab initio protein 
   structure prediction. Starting from a query sequence, it first 
   collects multiple sequence alignments (MSAs) from whole- and 
   meta-genome sequence libraries. Spatial restraints (contact/distance 
   maps and inter-residue orientations) are then predicted by DeepPotential. 
   Finally, full-length structural models are constructed using 
   an L-BFGS folding algorithm.
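   The final L-BFGS step can be illustrated with a toy problem: move point
   coordinates until their pairwise distances match restraint targets. The sketch
   below is not DeepFold's energy (DeepFold uses DeepPotential-derived
   distance/orientation potentials on a protein model); it assumes a plain
   harmonic distance restraint on free 3D points and uses SciPy's L-BFGS-B
   optimizer with an analytic gradient.

```python
import numpy as np
from scipy.optimize import minimize


def restraint_energy(flat, target, mask):
    """Energy 0.5 * sum((mask * (d - target))**2) and its gradient (mask is 0/1, symmetric)."""
    x = flat.reshape(-1, 3)
    diff = x[:, None, :] - x[None, :, :]           # (N, N, 3) vectors x_i - x_j
    d = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # pairwise distances (eps avoids 0/0)
    r = mask * (d - target)                        # residuals on restrained pairs
    energy = 0.5 * (r ** 2).sum()
    grad = 2.0 * ((r / d)[:, :, None] * diff).sum(axis=1)
    return energy, grad.ravel()


def fold(target, mask, init, max_iter=500):
    """Minimize the restraint energy from starting coordinates with L-BFGS-B."""
    result = minimize(restraint_energy, init.ravel(), args=(target, mask),
                      jac=True, method="L-BFGS-B", options={"maxiter": max_iter})
    return result.x.reshape(-1, 3), result.fun
```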

   PotentialFold is a program for protein structure prediction 
   based on predicted inter-residue geometry, and is similar to DeepFold.

6.2. How to install DeepFold and PotentialFold program?

   When you unpack the D-I-TASSER Suite, the DeepFold and PotentialFold programs are already installed.

6.3. How to cite DeepFold and PotentialFold?

   If you are using the DeepFold program, you can cite:

   Robin Pearce, Yang Li, Gilbert S. Omenn, Yang Zhang. 
   Fast and Accurate Ab Initio Protein Structure Prediction 
   Using Deep Learning Potentials. Submitted, 2021.

   If you are using the PotentialFold program, you can cite:

   Yang Li, Chengxin Zhang, Dong-Jun Yu, Yang Zhang. 
   Deep learning geometrical potential for high-accuracy 
   ab initio protein structure prediction. iScience, (2022).


   #######################################################
   #                                                     #
   #  7. Installation and implementation of MUSTER       #
   #                                                     #
   #######################################################
   
7.1. Introduction of MUSTER
   
   MUSTER (MUlti-Sources ThreadER) is a protein threading algorithm to 
   identify the template structures from the PDB library. It generates 
   sequence-template alignments by combining sequence profile-profile 
   alignment with multiple structural information.

7.2. How to install MUSTER program?

   When you unpack the D-I-TASSER Suite, MUSTER program is already installed.


7.3. How to cite MUSTER?

   If you are using the MUSTER program, you can cite:

   S Wu, Y Zhang. MUSTER: Improving protein sequence profile-profile 
   alignments by using multiple sources of structure information. 
   Proteins, 72: 547-556 (2008).

   #######################################################
   #                                                     #
   #  8. Installation and implementation of CEthreader   #
   #                                                     #
   #######################################################
   
8.1. Introduction of CEthreader

   CEthreader is a novel threading algorithm, which first predicts 
   residue-residue contacts by coupling evolutionary precision matrices with
   deep residual convolutional neural-networks. The predicted contact maps 
   are then integrated with sequence profile alignments to recognize 
   structural templates from the PDB. 

8.2. How to install CEthreader program?

   When you unpack the D-I-TASSER Suite, CEthreader program is already installed.

8.3. How to cite CEthreader?

   If you are using the CEthreader program, you can cite:

   W Zheng, Q Wuyun, Y Li, SM Mortuza, C Zhang, R Pearce, J Ruan, Y Zhang. 
   Detecting distant-homology protein structures by aligning deep 
   neural-network based contact maps. PLOS Computational Biology, 15: 
   e1007411 (2019).


   #######################################################
   #                                                     #
   #  9. Installation and implementation of LOMETS3      #
   #                                                     #
   #######################################################
   
9.1. Introduction of LOMETS3
   
   LOMETS3 (Local Meta-Threading-Server) is a meta-server approach to protein
   fold recognition. It consists of 15 individual threading programs: DeepFold2 (DeepFold+AttentionPotential), 
   PotentialFold2 (PotentialFold+AttentionPotential), 
   DeepFold (DeepFold+DeepPotential), PotentialFold (PotentialFold+DeepPotential),
   CEthreader, mCEthreader, eCEthreader, MUSTER, PPA, dPPA, dPPA2, sPPA, wPPA, wdPPA, wMUSTER.
   mCEthreader and eCEthreader are variants of CEthreader with 
   different scoring functions. The last 7 programs are variants of MUSTER 
   with different optimized energy terms.

9.2. How to install LOMETS3 program?

   When you unpack the D-I-TASSER Suite, LOMETS3 programs are already installed.

9.3. How to cite LOMETS3?

   If you are using the LOMETS3 program, you can cite:

   Wei Zheng, Qiqige Wuyun, Xiaogen Zhou, Yang Li, Peter Freddolino, Yang Zhang. 
   LOMETS3: Integrating deep-learning and profile-alignment for advanced protein template recognition and function annotation. 
   Nucleic Acids Research, 50: W454-W464 (2022).  
   
   Wei Zheng, Chengxin Zhang, Qiqige Wuyun, Robin Pearce, Yang Li, Yang Zhang. 
   LOMETS2: improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins. 
   Nucleic Acids Research, 47: W429-W436 (2019).

   S Wu, Y Zhang. 
   LOMETS: A local meta-threading-server for protein structure prediction. 
   Nucleic Acids Research, 35: 3375-3382 (2007).

   #################################################################
   #                                                               #
   #  10. Installation and implementation of DeepMSA/DeepMSA2      #
   #                                                               #
   #################################################################
   
10.1. Introduction of DeepMSA/DeepMSA2
   
   DeepMSA is an open-source method for sensitive MSA construction, 
   which collects homologous sequences and creates alignments from multiple sources 
   of whole-genome and metagenome databases through complementary hidden 
   Markov model algorithms. 

10.2. How to install DeepMSA program?

   When you unpack the D-I-TASSER Suite, the DeepMSA program is already installed.

10.3. How to run DeepMSA program?

   The DeepMSA main script is $pkgdir/program/DeepMSA/scripts/build_MSA.py. Its running 
   options are similar to those of runI-TASSER.pl. Running the program without 
   arguments prints all the running options.

10.4. How to cite DeepMSA?

   If you are using the DeepMSA program, you can cite:

   Wei Zheng, Qiqige Wuyun, Yang Li, Chengxin Zhang, P Lydia Freddolino, Yang Zhang. 
   Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. 
   Nature Methods, (2024).
   
   C Zhang, W Zheng, S M Mortuza, Y Li, Y Zhang. 
   DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. 
   Bioinformatics 36: 2105-2112 (2020). 
   
   #######################################################
   #                                                     #
   #  11. Installation and implementation of SPICKER     #
   #                                                     #
   #######################################################
   
11.1. Introduction of SPICKER
   
   SPICKER is a clustering algorithm to identify the near-native models 
   from a pool of protein structure decoys.

11.2. How to install SPICKER program?

   When you unpack the D-I-TASSER Suite, the SPICKER program is already installed
   at $pkgdir/I-TASSERmod/spicker50

11.3. How to run SPICKER program?

   To run SPICKER, you need to prepare the following input files:
       'rmsinp'---Mandatory, length of protein & piece for RMSD calculation;
       'seq.dat'--Mandatory, sequence file, for output of PDB models;
       'tra.in'---Mandatory, list of trajectory names used for clustering.
                  The first line of 'tra.in' contains 3 parameters:
                  par1: number of decoy files
                  par2: 1, default cutoff, best for decoys from template-based 
                           modeling; 
                       -1, cutoff based on variation, best for decoys from 
                           ab initio modeling.
                  par3: 1, closc (structure closest to centroid) selected from
                           all decoys; -1, closc selected from the clustered
                           decoys only.
                  From the second line on, each line is the name of a file
                  containing the coordinates of 3D structure decoys. All these
                  files are mandatory. See the attached 'rep1.tra1' for the
                  format of decoys.
       'CA'-------Optional, native structure, for comparison to native.

   Output files of SPICKER include:
       'str.txt'-----list of structures in each cluster;
       'combo*.pdb'--PDB format of cluster centroids;
       'closc*.pdb'--PDB format of structures closest to centroids;
       'rst.dat'-----summary of clustering results.

   A detailed readme file can be found at
   https://zhanggroup.org/SPICKER/readme
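   The layout of 'tra.in' described above can be generated with a short script.
   This is a minimal sketch, assuming the three first-line parameters are
   separated by single spaces and that each following line holds one trajectory
   file name; compare with the bundled example before relying on it. The helper
   names format_tra_in and write_tra_in are hypothetical.

```python
from pathlib import Path


def format_tra_in(decoy_files, template_based=True, closc_from_all=True):
    """Return the text of a SPICKER 'tra.in' file.

    Line 1 holds par1 (number of decoy files), par2 (1: default cutoff for
    template-based decoys; -1: variation-based cutoff for ab initio decoys)
    and par3 (1: closc from all decoys; -1: closc from clustered decoys).
    Each following line names one decoy trajectory file.
    Separating the parameters by single spaces is an assumption.
    """
    if not decoy_files:
        raise ValueError("at least one decoy file is required")
    par2 = 1 if template_based else -1
    par3 = 1 if closc_from_all else -1
    lines = [f"{len(decoy_files)} {par2} {par3}"]
    lines.extend(str(name) for name in decoy_files)
    return "\n".join(lines) + "\n"


def write_tra_in(decoy_files, path="tra.in", **options):
    """Write tra.in into the SPICKER working directory (hypothetical helper)."""
    Path(path).write_text(format_tra_in(decoy_files, **options))
```

   For ab initio decoys, for example, call
   write_tra_in(["rep1.tra1", "rep2.tra1"], template_based=False) in the
   directory where SPICKER will run.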

11.4. How to cite SPICKER?

   If you are using the SPICKER program, you can cite:

   Y Zhang, J Skolnick, SPICKER: Approach to clustering protein structures 
   for near-native model selection, Journal of Computational Chemistry, 
   25: 865-871 (2004).


   #######################################################
   #                                                     #
   #  12. Installation and implementation of HAAD        #
   #                                                     #
   #######################################################
   
12.1. Introduction of HAAD
   
   HAAD is a computer algorithm for constructing hydrogen atoms from 
   protein heavy-atom structures. Hydrogen atoms are added by minimizing 
   atomic overlap and encouraging hydrogen bonding. 

12.2. How to install HAAD program?

   When you unpack the D-I-TASSER Suite, the HAAD program is already installed
   at $pkgdir/program/abs/mybin/HAAD

12.3. How to run HAAD program?

   To add hydrogen atoms to a PDB file (xx.pdb), run "./HAAD xx.pdb"; 
   the output is written to "xx.pdb.h".

   In "xx.pdb.h", column 57 holds a confidence label for each atom added 
   by HAAD. A label value less than 2 indicates higher confidence in the 
   position of the added atom.
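   To list the added atoms whose positions are less certain, the label can be
   read directly from the output file. This is a minimal sketch, assuming the
   label is a single digit in column 57 (1-based, index 56 in Python) of ATOM
   and HETATM records; records without a digit there are skipped. The function
   name low_confidence_atoms is hypothetical.

```python
def low_confidence_atoms(lines, threshold=2):
    """Return (line_number, atom_name, label) for records whose column-57
    label is >= threshold, i.e. added atoms with lower positional confidence.

    Assumes a single-digit HAAD label at column 57 (1-based).
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 57:
            continue
        label = line[56]
        if not label.isdigit():
            continue  # no HAAD label on this record
        if int(label) >= threshold:
            atom_name = line[12:16].strip()  # PDB atom-name field, columns 13-16
            flagged.append((number, atom_name, int(label)))
    return flagged
```

   Usage: with open("xx.pdb.h") as fh: print(low_confidence_atoms(fh))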

12.4. How to cite HAAD?

   If you are using the HAAD program, you can cite:

   Y Li, A Roy, Y Zhang, HAAD: A Quick Algorithm for Accurate Prediction 
   of Hydrogen Atoms in Protein Structures, PLoS One, 4: e6701 (2009).


   #######################################################
   #                                                     #
   #  13. Installation and implementation of EDTSurf     #
   #                                                     #
   #######################################################
   
13.1. Introduction of EDTSurf
   
   EDTSurf is a program to construct triangulated surfaces for macromolecules. 
   It generates three major macromolecular surfaces: the van der Waals surface, 
   the solvent-accessible surface and the molecular surface (solvent-excluded 
   surface). EDTSurf also identifies cavities inside macromolecules. 

13.2. How to install EDTSurf program?

   When you unpack the D-I-TASSER Suite, the EDTSurf program is already installed
   at $pkgdir/bin/EDTSurf

13.3. How to use EDTSurf program?

   EDTSurf -i inputfile ...
   Specific options:
         -o prefix of output files (default is the prefix of inputfile)
         -t triangulation type, 1-MC 2-VCMC (default is 2)
         -s surface type, 1-VWS 2-SAS 3-MS (default is 3)
         -c color mode, 1-pure 2-atom 3-chain (default is 2)
         -p probe radius, floating point in [0,2.0] (default is 1.4)
         -h inner or outer surface for output, 1-inner and outer 2-outer 
            3-inner (default is 1)
         -f scale factor, floating point in (0,20.0] (default is 4.0)

      The molecule is scaled by this factor to fit in a bounding box. A larger 
      scale factor gives better results but increases memory use. EDTSurf 
      first enlarges the molecule and checks whether it exceeds the maximum 
      bounding box; if it does, the scale factor is reset to a value that 
      fits the molecule in the maximum bounding box.

   Running EDTSurf without arguments prints a brief description of how
   to use the program. A detailed description of EDTSurf is available at
   https://zhanggroup.org/EDTSurf/
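   When EDTSurf is called from a pipeline, checking the documented option
   ranges before launching avoids silent misuse. This is a minimal sketch that
   assembles an argument list using only the flags listed above; the function
   name build_edtsurf_cmd is hypothetical, and the executable path must be
   supplied by the caller (e.g. $pkgdir/bin/EDTSurf).

```python
def build_edtsurf_cmd(inputfile, exe="EDTSurf", prefix=None, triangulation=2,
                      surface=3, color=2, probe=1.4, side=1, scale=4.0):
    """Return an EDTSurf argument list after checking the documented ranges."""
    if triangulation not in (1, 2):      # 1-MC, 2-VCMC
        raise ValueError("-t must be 1 (MC) or 2 (VCMC)")
    if surface not in (1, 2, 3):         # 1-VWS, 2-SAS, 3-MS
        raise ValueError("-s must be 1, 2 or 3")
    if color not in (1, 2, 3):           # 1-pure, 2-atom, 3-chain
        raise ValueError("-c must be 1, 2 or 3")
    if not 0.0 <= probe <= 2.0:          # probe radius in [0, 2.0]
        raise ValueError("-p must lie in [0, 2.0]")
    if side not in (1, 2, 3):            # 1-inner and outer, 2-outer, 3-inner
        raise ValueError("-h must be 1, 2 or 3")
    if not 0.0 < scale <= 20.0:          # scale factor in (0, 20.0]
        raise ValueError("-f must lie in (0, 20.0]")
    cmd = [exe, "-i", inputfile]
    if prefix is not None:               # default prefix is that of inputfile
        cmd += ["-o", prefix]
    cmd += ["-t", str(triangulation), "-s", str(surface), "-c", str(color),
            "-p", str(probe), "-h", str(side), "-f", str(scale)]
    return cmd
```

   The list can be passed to subprocess.run(..., check=True); for example,
   build_edtsurf_cmd("model1.pdb", surface=2) requests the solvent-accessible
   surface.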

13.4. How to cite EDTSurf?

   If you are using the EDTSurf program, you can cite:

   D Xu, Y Zhang, Generating Triangulated Macromolecular Surfaces by Euclidean 
   Distance Transform. PLoS ONE 4: e8140 (2009).


   #######################################################
   #                                                     #
   #  14. Installation and implementation of ModRefiner  #
   #                                                     #
   #######################################################
   
14.1. Introduction of ModRefiner
   
   ModRefiner is a standalone program for atomic-level protein structure 
   construction and refinement. It includes two steps: (1) constructing
   main-chain models from C-alpha traces; (2) building side-chain models
   and performing atomic-level structure refinement.

14.2. How to install ModRefiner program?

   When you unpack the D-I-TASSER Suite, the ModRefiner program is already installed
   at $pkgdir/I-TASSERmod/ModRefiner.pl

14.3. How to use ModRefiner program?

   ModRefiner supports the following four modes:
   
   a) add side-chain heavy atoms to main-chain model without refinement
      > ModRefiner.pl 1 ID MD IM ON

   b) build main-chain model from C-alpha trace model
      > ModRefiner.pl 2 ID MD IM RM ON

   c) build full-atomic model from main-chain model
      > ModRefiner.pl 3 ID MD IM RM ON

   d) build full-atomic model from C-alpha trace model
      > ModRefiner.pl 4 ID MD IM RM ON

   ID: the path of the D-I-TASSER package, e.g. '/home/yourname/D-I-TASSER-3.0'
   MD: directory that contains the initial model, e.g. '/home/yourname/D-I-TASSER-3.0/example'
   IM: the initial model to be refined, e.g. 'model1.pdb'
   RM: reference model toward which the refined model is driven, e.g. 'combo1.pdb'.
       Only the CA trace is needed, and the reference may cover only part of the
       chain; regions missing from it are refined more flexibly. If you don't
       have a reference model, use the name of IM instead.
   ON: the output name of the refined model, e.g. 'model1_ref.pdb'

   Running the program without arguments prints a brief description
   of how to use the program.
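   The four invocations above differ only in whether RM is passed (mode 1 has
   no reference model). This is a minimal sketch that builds the argument list
   in the documented order ID MD IM [RM] ON and falls back to IM as the
   reference when none is given, as recommended above. The function name
   modrefiner_cmd is hypothetical; the script path should point to
   $pkgdir/I-TASSERmod/ModRefiner.pl.

```python
def modrefiner_cmd(mode, id_dir, model_dir, init_model, out_name,
                   ref_model=None, script="ModRefiner.pl"):
    """Build a ModRefiner.pl argument list.

    mode 1: add side-chain heavy atoms without refinement (no RM argument)
    mode 2: main-chain model from C-alpha trace
    mode 3: full-atomic model from main-chain model
    mode 4: full-atomic model from C-alpha trace
    """
    if mode not in (1, 2, 3, 4):
        raise ValueError("mode must be 1, 2, 3 or 4")
    cmd = [script, str(mode), id_dir, model_dir, init_model]
    if mode != 1:
        # Without a reference model, the name of the initial model is used.
        cmd.append(ref_model if ref_model is not None else init_model)
    cmd.append(out_name)
    return cmd
```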
   
14.4. How to cite ModRefiner?

   If you are using the ModRefiner program, you can cite:

   D Xu, Y Zhang. Improving the Physical Realism and Structural Accuracy of 
   Protein Models by a Two-step Atomic-level Energy Minimization. 
   Biophysical Journal, 101: 2525-2534 (2011)


   #######################################################
   #                                                     #
   #  15. Installation and implementation of NWalign     #
   #                                                     #
   #######################################################
   
15.1. Introduction of NWalign
   
   NW-align is a simple and robust program for protein sequence-to-sequence 
   alignment based on the standard Needleman-Wunsch dynamic programming 
   algorithm. The substitution matrix is BLOSUM62, with a gap opening 
   penalty of -11 and a gap extension penalty of -1. 
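   The dynamic programming behind the bundled 'align' binary can be
   illustrated in a few lines. This Python sketch is not NW-align's code: it
   uses Gotoh's three-matrix formulation for affine gaps, assumes a gap of
   length k costs gap_open + (k-1)*gap_extend, returns only the optimal score
   (no traceback), and replaces BLOSUM62 with a toy identity scorer to stay
   short. Substitute a real BLOSUM62 table to approach NW-align's scores.

```python
NEG_INF = float("-inf")


def nw_affine_score(seq1, seq2, score, gap_open=-11, gap_extend=-1):
    """Global (Needleman-Wunsch) alignment score with affine gaps (Gotoh).

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    score(a, b) returns the substitution score for residues a and b.
    """
    n, m = len(seq1), len(seq2)
    # M: seq1[i-1] aligned to seq2[j-1]; X: seq1[i-1] against a gap;
    # Y: seq2[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(seq1[i - 1], seq2[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def toy_score(a, b):
    # Placeholder for BLOSUM62 (assumption): +5 for identity, -4 otherwise.
    return 5 if a == b else -4
```

   With the toy scorer, aligning "AAAA" with "AA" gives two matches (+10)
   plus one gap of length 2 (-11 - 1 = -12), a total of -2; splitting the
   gap into two openings would cost -22 instead.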

15.2. How to install NWalign program?

   When you unpack the D-I-TASSER Suite, the NWalign program is already installed
   at $pkgdir/bin/align.

15.3. How to use NWalign program?
   
   > align F1.fasta F2.fasta (align two sequences in fasta file)
   > align F1.pdb F2.pdb 1   (align two sequences in PDB file)
   > align F1.fasta F2.pdb 2 (align Sequence 1 in fasta and 2 in pdb)
   > align GKDGL EVADELVSE 3 (align sequences typed by keyboard)
   > align GKDGL F.fasta 4   (align Seq-1 by keyboard and 2 in fasta)
   > align GKDGL F.pdb 5     (align Seq-1 by keyboard and 2 in pdb)

   Running the program without arguments prints its usage options.

15.4. How to cite NWalign?

   There is no published paper associated with this program. If you are using
   the NWalign program, you can cite it as 

   Y Zhang, https://zhanggroup.org/NW-align


   #######################################################
   #                                                     #
   #  16. Installation and implementation of PSSpred     #
   #                                                     #
   #######################################################
   
16.1 Introduction of PSSpred

   PSSpred (Protein Secondary Structure PREDiction) is a simple neural network 
   training algorithm for accurate protein secondary structure prediction. It first 
   collects multiple sequence alignments using PSI-BLAST. Amino-acid frequency and 
   log-odds data with Henikoff weights are then used, separately, to train neural 
   network secondary structure predictors with the Rumelhart error back-propagation 
   method. The final secondary structure prediction is a combination of 7 neural 
   network predictors built from different profile data and parameters.
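   The text does not specify how the 7 predictors are combined. One common
   way to combine an ensemble is to average the per-residue probabilities and
   take the most likely state; the sketch below shows that scheme only as an
   assumption, not as PSSpred's documented rule. The function name and the
   (H, E, C) probability layout are hypothetical.

```python
def combine_ss_predictions(predictions):
    """Average per-residue (H, E, C) probabilities from several predictors
    and return the consensus secondary-structure string.

    predictions: list of predictors, each a list of (pH, pE, pC) tuples,
    one per residue. Simple averaging is an assumed combination rule.
    """
    if not predictions:
        raise ValueError("need at least one predictor")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictors must cover the same residues")
    states = "HEC"
    consensus = []
    for i in range(length):
        avg = [sum(p[i][k] for p in predictions) / len(predictions)
               for k in range(3)]
        consensus.append(states[avg.index(max(avg))])  # most probable state
    return "".join(consensus)
```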

16.2 How to install PSSpred program?

   When you unpack the D-I-TASSER Suite, the PSSpred program is already installed
   at $pkgdir/program/PSSpred
  
16.3 How to use PSSpred program?   

   $pkgdir/program/PSSpred/mPSSpred.pl seq.txt $pkgdir $libdir

   Please note that 'seq.txt' should be in the current directory; the script will
   generate two files, 'seq.dat' and 'seq.dat.ss', in the current folder. Here, 
   $pkgdir is the root path of the D-I-TASSER package and $libdir is the library
   directory.
 
16.4 How to cite PSSpred?

   If you are using the PSSpred program, you can cite:
   https://zhanggroup.org/PSSpred


   #######################################################
   #                                                     #
   #  17. Installation and implementation of COFACTOR    #
   #                                                     #
   #######################################################
   
17.1 Introduction of COFACTOR

  COFACTOR is a structure-based method for biological function annotation of 
  protein molecules. COFACTOR threads the structure through three comprehensive 
  function libraries by local and global structure matches to identify functional 
  sites and homologs. Functional insights, including ligand-binding sites, 
  gene ontology terms and enzyme classification, are derived from the best
  functional homology template. The COFACTOR algorithm was ranked as the best 
  method for function prediction in the community-wide CASP9 experiment.

17.2 How to install COFACTOR program?

   When you unpack the D-I-TASSER Suite, the COFACTOR program is already installed
   at $pkgdir/program/COFACTOR
   
17.3 How to use COFACTOR program?   

   $pkgdir/I-TASSERmod/runCOFACTOR.pl

17.4 How to interpret the results

   If your input data is at $datadir/model1.pdb, the output of COFACTOR will be at
   $datadir/model1/cofactor:
     (1) List of similar structures in PDB: similarpdb_model1.lst. The columns are
         (PDB_ID, TM-score, RMSD, Cov, Seq_id)
     (2) Ligand-binding sites: BSITE_model1/Bsites_model1.dat. The columns are
         (Rank, C-score, PDB_ID, TM-score, RMSD, Seq_id, Cov, Lig_name, SITE_num, 
         BS-score, LTM, BS_ID, BS_cov, BS_err, BS_ID1, BS_ID2, Binding residues)
     (3) EC number: ECsearchresult_model1.dat. The columns are
         (PDB_ID, TM-score, RMSD, Seq_ID, Cov, EC-score, EC number, 
         Active site residues)
     (4) GO terms: GOsearchresult_model1.dat. The columns are
         (PDB_ID, TM-score, RMSD, Seq_ID, Cov, GO-score, GO terms)
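   The column lists above make the output files easy to post-process. This is
   a minimal parsing sketch for similarpdb_model1.lst, assuming
   whitespace-separated columns in the listed order; lines that do not parse
   (for example a header, if present) are skipped. The names
   SimilarStructure and read_similar_pdb are hypothetical.

```python
from typing import NamedTuple


class SimilarStructure(NamedTuple):
    pdb_id: str
    tm_score: float
    rmsd: float
    coverage: float
    seq_id: float


def read_similar_pdb(lines):
    """Parse rows (PDB_ID, TM-score, RMSD, Cov, Seq_id) and return them
    sorted by TM-score, best first. Unparseable lines are skipped."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            hits.append(SimilarStructure(fields[0], *map(float, fields[1:5])))
        except ValueError:
            continue  # header or malformed line
    return sorted(hits, key=lambda hit: hit.tm_score, reverse=True)
```

   Usage: with open("model1/cofactor/similarpdb_model1.lst") as fh:
   top = read_similar_pdb(fh)[:5]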

17.5 How to cite COFACTOR?

   If you are using the COFACTOR program, you can cite:

   1. A Roy, J Yang, Y Zhang. COFACTOR: An accurate comparative algorithm for
      structure-based protein function annotation. 
      Nucleic Acids Research, 40:W471-W477 (2012).
   2. J Yang, A Roy, Y Zhang. BioLiP: a semi-manually curated database for 
      biologically relevant ligand-protein interactions. 
      Nucleic Acids Research, 41: D1096-D1103 (2013).


   #######################################################
   #                                                     #
   #  18. Installation and implementation of COACH       #
   #                                                     #
   #######################################################
      
18.1 Introduction of COACH
  
  COACH is a meta-server approach to protein function annotation.
  Starting from the given structure of a target protein, COACH generates 
  complementary ligand-binding site predictions using two comparative methods,
  TM-SITE and S-SITE, which recognize ligand-binding templates from 
  the BioLiP protein function database by binding-specific substructure and 
  sequence profile comparisons. These predictions are combined with results
  from COFACTOR to generate multiple function annotations, including 
  ligand-binding sites, enzyme commission numbers and gene ontology terms. 

18.2 How to install COACH program?

   When you unpack the D-I-TASSER Suite, the COACH program is already installed
   at $pkgdir/program/COACH
   
18.3 How to use COACH program?

   $pkgdir/I-TASSERmod/runCOACH.pl

18.4 How to interpret the results

   If your input data is at $datadir/model1.pdb, the output of COACH will be at
   $datadir/model1/coach:  

     (1) Ligand-binding sites: Bsites.dat. The columns are
         (C-score, cluster_density, product_of_top_templates_zscore, 
         Binding residues)  
     (2) Detailed clustering information: Bsites.inf, Bsites.clr, which list 
         the templates used in the cluster that generates the prediction in (1).
     (3) Ligand-protein complex structures, named CH_complex*.pdb
     (4) Predictions from COFACTOR, TM-SITE, and S-SITE are at, respectively:
         $datadir/model1/cofactor
         $datadir/model1/tmsite
         $datadir/ssite
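   To keep only the more confident binding sites from Bsites.dat, the rows
   can be filtered on C-score. This is a minimal sketch, assuming
   whitespace-separated C-score, cluster_density and
   product_of_top_templates_zscore fields followed by the binding residues;
   since the residue notation is not specified here, that field is returned
   as raw text. The function name read_coach_bsites is hypothetical.

```python
def read_coach_bsites(lines, min_cscore=0.0):
    """Parse COACH Bsites.dat rows into dicts, keeping sites whose C-score
    is at least min_cscore. The binding-residue field is kept as raw text."""
    sites = []
    for line in lines:
        fields = line.split(maxsplit=3)  # last field keeps the residue list
        if len(fields) < 4:
            continue
        try:
            cscore, density, zscore = [float(x) for x in fields[:3]]
        except ValueError:
            continue  # header or malformed row
        if cscore >= min_cscore:
            sites.append({
                "c_score": cscore,
                "cluster_density": density,
                "template_zscore_product": zscore,
                "residues": fields[3].strip(),
            })
    return sites
```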

18.5 How to cite COACH?

   If you are using the COACH program, you can cite:   

   1. J Yang, A Roy, Y Zhang. Protein-ligand binding site recognition using 
      complementary binding-specific substructure comparison and sequence profile 
      alignment. Bioinformatics, 29:2588-2595 (2013).
   2. J Yang, A Roy, Y Zhang. BioLiP: a semi-manually curated database for 
      biologically relevant ligand-protein interactions.
      Nucleic Acids Research, 41: D1096-D1103 (2013).


   #######################################################
   #                                                     #
   #  19. Installation and implementation of TM-SITE     #
   #                                                     #
   #######################################################
   
19.1 Introduction of TM-SITE
  
  TM-SITE is a structure-based approach to protein-ligand binding site prediction.
  Structure alignment between query and BioLiP templates is performed on 
  binding-specific substructure using TM-align. The final ligand-binding sites 
  are collected based on the clustering of multiple templates. 

19.2 How to install TM-SITE program?

   When you unpack the D-I-TASSER Suite, the TM-SITE program is already installed
   at $pkgdir/program/COACH
  

19.3 How to interpret the results

   If your input data is at $datadir/model1.pdb, the output of TM-SITE will be at
   $datadir/model1/tmsite:  
     (1) Ligand-binding sites: Bsites.dat. The columns are
         (C-score, top_templates_zscore, JSD_score, cluster_density, 
         Binding residues)  
     (2) Detailed clustering information: Bsites.inf, Bsites_lig.clr, which list 
         the templates used in the cluster that generates the prediction in (1).
     (3) Ligand-protein complex structures, named complex*.pdb

19.4 How to cite TM-SITE?

   If you are using the TM-SITE program, you can cite:   

   1. J Yang, A Roy, Y Zhang. Protein-ligand binding site recognition using 
      complementary binding-specific substructure comparison and sequence profile 
      alignment. Bioinformatics, 29:2588-2595 (2013).
   2. J Yang, A Roy, Y Zhang. BioLiP: a semi-manually curated database for 
      biologically relevant ligand-protein interactions.
      Nucleic Acids Research, 41: D1096-D1103 (2013).


   #######################################################
   #                                                     #
   #  20. Installation and implementation of S-SITE      #
   #                                                     #
   #######################################################
   
20.1 Introduction of S-SITE
  
  S-SITE is a sequence-based approach to protein-ligand binding site prediction. 
  Binding-specific sequence profile-profile alignment is used to recognize 
  homologous templates in BioLiP. The ligand-binding site predictions are 
  collected from the clustering of multiple homologous templates. 

20.2 How to install S-SITE program?

   When you unpack the D-I-TASSER Suite, the S-SITE program is already installed
   at $pkgdir/program/COACH
   

20.3 How to interpret the results

   If your input data is at $datadir/seq.fasta, then the output of S-SITE will 
   be at $datadir/ssite:

     (1) Ligand-binding sites: Bsites_fpt.dat. The columns are
         (C-score, top_templates_zscore, cluster_density, cluster_density1, 
         JSD_score, Binding residues)  
     (2) Detailed clustering information: Bsites_fpt.clr, which lists the templates
         used in the cluster that generates the prediction in (1).

20.4 How to cite S-SITE?

   If you are using the S-SITE program, you can cite:

   1. J Yang, A Roy, Y Zhang. Protein-ligand binding site recognition using
      complementary binding-specific substructure comparison and sequence profile
      alignment. Bioinformatics, 29:2588-2595 (2013).
   2. J Yang, A Roy, Y Zhang. BioLiP: a semi-manually curated database for 
      biologically relevant ligand-protein interactions.
      Nucleic Acids Research, 41: D1096-D1103 (2013).

   #######################################################
   #                                                     #
   #  21. Installation and implementation of ResQ        #
   #                                                     #
   #######################################################
   
21.1 Introduction of ResQ
  
   ResQ is a method for estimating B-factor and residue-level quality in protein
   structure prediction, based on local variations of modelling simulations and 
   the uncertainty of homologous alignments. Given a protein structure model, 
   ResQ first identifies a set of homologous and/or analogous templates from 
   the PDB by threading and structure alignment techniques. The residue-level 
   modeling errors are then derived by support vector regression that was 
   trained on the local structural and alignment variations of the templates, 
   with the B-factor of each residue deduced from the experimental records of 
   the top homologous proteins. 

21.2 How to install ResQ program?

   When you unpack the D-I-TASSER Suite, the ResQ program is already installed at 
   $pkgdir/program/ResQ.
   
21.3 How to use ResQ program?

   There are two ways to run ResQ, depending on how your models were generated.
      1) If your models were generated by D-I-TASSER, run the script 
         $pkgdir/program/ResQ/runResQ_IT.pl to predict B-factors and local structure errors. 
         The only required argument is the directory of the D-I-TASSER decoys. See 
         the header of this script for more information about its input.

      2) If your models were not generated by D-I-TASSER, run the script 
         $pkgdir/program/ResQ/runResQ.pl to predict B-factors and local structure errors. 
         It automatically runs LOMETS2 to generate the threading alignment 
         file 'init.dat'. LOMETS2 is included in this package.

21.4 What is the output of ResQ?

     For D-I-TASSER models, the output of ResQ is: 
         rsq_bfp_new.dat

     For other models, the output of ResQ is:
         1) global.txt for global accuracy estimation
         2) local.txt for local error and B-factor estimation

21.5 How to cite ResQ?

   If you are using the ResQ program, you can cite:

   1. J Yang, Y Wang, Y Zhang. ResQ: Approach to unified estimation of B-factor 
      and residue-specific error in protein structure prediction, 
      Journal of Molecular Biology, 428: 693-701 (2016).

   #######################################################
   #                                                     #
   #  22. Installation and implementation of FUpred      #
   #                                                     #
   #######################################################

22.1 Introduction of FUpred

   FUpred is a contact map-based domain prediction method that uses a recursion strategy 
   to detect domain boundaries from predicted contact maps and secondary structure information. 
   Large-scale benchmark analysis shows that FUpred predicts domain boundaries significantly 
   better than threading-based and machine learning-based methods. In particular, it performs 
   markedly better than current methods at detecting discontinuous domain boundaries.

22.2 How to install FUpred program?

   When you unpack the D-I-TASSER Suite, the FUpred program is already installed
   at $pkgdir/program/FUpred

22.3 How to use FUpred program?

   The main script is run-FUpred.pl. Given a protein sequence file in FASTA format,
   you can run  

   ./run-FUpred.pl sequence.fasta

   where sequence.fasta is your input file.
   For example, you can try

   ./run-FUpred.pl ./example/6paxa.fasta

   The prediction is printed to the screen as follows:

   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   predicting domain and domain boundary...
   domain boundary is:1-65;66-133;
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
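   The boundary line printed above lists one residue range per domain,
   separated by semicolons. This is a minimal sketch that turns it into
   (start, end) pairs for downstream scripts; it handles only the start-end
   form shown in the example, so check real output (for example for
   discontinuous domains) before relying on it. The function name is
   hypothetical.

```python
def parse_domain_boundaries(line):
    """Convert 'domain boundary is:1-65;66-133;' into [(1, 65), (66, 133)]."""
    _, _, ranges = line.partition(":")
    domains = []
    for item in ranges.strip().split(";"):
        if not item:
            continue  # trailing semicolon
        start, end = item.split("-")
        domains.append((int(start), int(end)))
    return domains
```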

   If you want to customize FUpred with different parameters, read (2.1)
   and (2.2) below; otherwise you can skip them.

   -------------------------------------------------------------------------------------------------------------------
   (2.1). CEdecomposition

   CEdecomposition performs contact map eigendecomposition and was designed for the FUpred and CEthreader programs.

   Usage: CEdecomposition -i inputtype[q: query t: template] -f fasta [-n native pdb] -s psipred.horiz [-d dssp] -c metapsicov_contact_format[if -n no predicted_contact need] -o outfile -m [linear num or exp num or top num] -mtx psiblastmtx

   options:
            -i: input type q: query                  [designed for CEthreading]
                           t: template               [designed for CEthreading]
                           qnm: query without mtx    [designed for FUpred]
                           tnm: template without mtx [designed for FUpred]
   for query (q and qnm):
            -f: input fasta for query
            -s: psipred horiz output file
            -c: metapsicov format contact file
            -m: choose one of the following three cutoffs (default exp)
                linear: linear model, top num*L contacts (default num=2)
                exp:    exponential model, top L^num contacts (default num=1.2)
                top:    top num contacts (fixed number)
   for template (t and tnm):
            -n: native structure for template
            -d: dssp file
   common to q and t:
            -mtx: psiblast mtx file
            -o: output file
   common to qnm and tnm:
            -o: output file
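   The three -m modes above determine how many top-ranked predicted contacts
   are kept for a query of length L. This sketch reproduces that arithmetic;
   rounding down to an integer is an assumption, since the usage text does
   not say how fractional counts are handled. The function name is
   hypothetical.

```python
import math


def contact_cutoff(length, mode="exp", num=None):
    """Number of top-ranked contacts kept by CEdecomposition -m.

    linear: num * L      (default num = 2)
    exp:    L ** num     (default num = 1.2)
    top:    num contacts (fixed, must be given)
    Truncation to an integer is an assumption.
    """
    if mode == "linear":
        return math.floor((2 if num is None else num) * length)
    if mode == "exp":
        return math.floor(length ** (1.2 if num is None else num))
    if mode == "top":
        if num is None:
            raise ValueError("top mode needs an explicit number of contacts")
        return int(num)
    raise ValueError("mode must be 'linear', 'exp' or 'top'")
```

   For a 100-residue query, the defaults keep about 200 contacts (linear)
   or about 251 contacts (exp, since 100^1.2 is roughly 251.2).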


   How to build a ce file from your own selected contacts

   For example, an original contact map in CASP format:

   1 19 0 8 0.991
   1 18 0 8 0.71
   91 103 0 8 0.700
   ...........
   32 54 0 8 0.001

   (total 3000 contacts)

   You can select any contacts you want, for example only two contacts (the confidence scores are ignored):

   91 103 0 8 0.700
   32 54 0 8 0.001

   Write these contacts to a file (mycontact.con), then run CEdecomposition to perform
   the eigendecomposition:

   CEdecomposition -i qnm  -f fastafile  -s psipred.horiz  -c mycontact.con -o outfile -m  top 2 

   This produces an input file based on a contact map that contains only the two selected contacts. 
   The "-m top" option is useful when you build your own contact map; you don't need to change any source code.
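   Picking contacts by hand works for a handful of pairs; for longer lists a
   filter script is easier. This is a minimal sketch that reads CASP-format
   rows (i, j, d_min, d_max, probability, as in the example above) and keeps
   those above a probability and sequence-separation threshold. The
   thresholds and function names are illustrative choices, not FUpred
   defaults.

```python
from pathlib import Path


def select_contacts(lines, min_prob=0.5, min_separation=6):
    """Keep contact rows 'i j dmin dmax prob' with prob >= min_prob and
    sequence separation |i - j| >= min_separation."""
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip header or other non-contact lines
        try:
            i, j = int(fields[0]), int(fields[1])
            prob = float(fields[4])
        except ValueError:
            continue
        if prob >= min_prob and abs(i - j) >= min_separation:
            selected.append(line.strip())
    return selected


def write_contacts(rows, path="mycontact.con"):
    """Write the selected rows, one contact per line."""
    Path(path).write_text("\n".join(rows) + "\n")
```

   After writing the file, pass the number of selected rows to "-m top" so
   that CEdecomposition keeps all of them.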

   -------------------------------------------------------------------------------------------------------------------
   (2.2). FUpred

   This is the contact map-based domain partition program that applies the recursion strategy; it takes a ce format file as input.

   Usage:  -i inputfile [xxxx.ce format ] -2c [two continuous domain cutoff] -2d [two discontinuous domain cutoff] -chip [chip length] -label3c [use 3c or not]

   explanation:
         -chip 
                        When the sequence is split into domains by the recursion strategy, the protein
                        is split into small fragments. If a fragment is shorter than this chip length,
                        it is merged into the domain fragment from the previous stage to avoid too many
                        small fragments in the final result.
         -label3c
                        Whether FUpred uses a 3-domain continuous domain partition scoring function
                        to refine the domain boundary: 0 means do not use it; 1 means use it.

22.4 How to cite FUpred?

   If you are using the FUpred program, you can cite:

   1. Wei Zheng, Xiaogen Zhou, Qiqige Wuyun, Robin Pearce, Yang Li and Yang Zhang. 
      FUpred: Detecting protein domains through deep-learning based contact map prediction. 
      Bioinformatics, 36: 3749–3757 (2020).

   #######################################################
   #                                                     #
   #  23. Installation and implementation of DEMO        #
   #                                                     #
   #######################################################

23.1 Introduction of DEMO
  
   DEMO2 (Domain Enhanced MOdeling, version 2.0) is an improved version of DEMO for automated 
   assembly of full-length structural models of multi-domain proteins by integrating deep-learning 
   predicted inter-domain spatial restraints. Starting from individual domain structures, 
   quaternary structure templates that have similar component domains are identified by 
   domain-level structural alignments using TM-align. Meanwhile, inter-domain spatial restraints 
   are predicted by the deep residual neural-network-based predictor DeepPotential. Full-length 
   models are then created by a fast quasi-Newton optimization for rigid-body domain structure 
   assembly, which are guided by the DeepPotential predicted inter-domain restraints, 
   inter-domain distance profiles collected from the top-ranked quaternary templates, and 
   physics-based steric potentials. The final models are selected from the low-energy 
   conformations and further refined with fragment-guided molecular dynamics simulations. 
   Large-scale benchmark tests showed that DEMO2 significantly outperforms its 
   predecessor.

23.2 How to install the DEMO Suite?

   When you unpack the D-I-TASSER Suite, the DEMO programs are already installed
   at $pkgdir/program/DEMO2 and $pkgdir/program/DEMO_super4

23.3 How to run DEMO2
   
   a) The main script for running DEMO2 is $pkgdir/program/DEMO2/run_DEMO2.py, where
      "$pkgdir" is the root path of the D-I-TASSER package.
      Running it without arguments prints the help information.

   b) The following arguments must be set (mandatory arguments). For example: 

      "$pkgdir/program/DEMO2/run_DEMO2.py protein_name input_dir sequence [Options]"
    
      'protein_name' is the name of the folder containing the protein sequence and domain models
      'input_dir'    is the directory that contains the query folder
      'sequence'     is the full-chain sequence in FASTA format

   c) The other arguments are optional and have default values; you can reset
      one or more of them. For example: 

      "$pkgdir/program/DEMO2/run_DEMO2.py protein_name input_dir sequence -template XXX.pdb"

      -template   Provide a template structure to guide the domain assembly. The template
                  should be in PDB format.
      -deepdist   [no or yes], flag for using distances predicted by DomainDist to guide
                  the assembly. The default value is "yes". 
      -EMmap      The cryo-EM density map in MRC or CCP4 format.
      -reso       The resolution of the density map.
      -CLink      The cross-link data (follow the format provided on the web server).
      -run        [real, benchmark], "real" will use all templates, "benchmark"
                  will exclude homologous templates
                  
   d) Where are the final predicted results?
         The following results are included in "input_dir/protein_name":

      "fmodel*.pdb"  the final model assembled by DEMO2
      "cscore"       the confidence score, estimated TM-score, and estimated RMSD 
                     of the final model

   NOTE:
   a) Outline of steps for running DEMO2 by 'run_DEMO2.py':
      a1) Prase user provided information
      a2) run 'DeepPotential' to predict inter-residue spatial restraints of the full-chain
      a3) run 'DEMO' to assemble all domain models into a full-length model
   b) The domain pdb file should be named as dom1.pdb, dom2.pdb, dom3.pdb... in order.
      They be put in "./input_dir/protein_name" before running this job.
   c) 'seq.fasta' is the query sequence file in FASTA format. This file should be put 
      in "./input_dir/protein_name" before running this job.
   d) If you are working on a cluster with multiple nodes, it is recommended to set
      $runstyle="parallel". You need to have a PBS server installed on your system.
      Parallel jobs run faster because they are distributed among different
      nodes. The default setting $runstyle="serial" runs all the jobs on a
      single computer.
   e) If the job was executed partially and encountered an error, you can
      rerun the main script without modification. It will check the existing
      files and resume from the correct position.
   f) If you want to provide cryo-EM density data to guide the assembly, please use
      the options "-EMmap" and "-reso" and follow the explanation and example at
      https://zhanggroup.org/DEMO2/explanation_EM.html
   g) If you want to provide cross-link data or contact/distance restraints to guide the
      assembly, please use the option "-CLink" and follow the explanation and example at
      https://zhanggroup.org/DEMO2/explanation_CL.html
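   Notes b) and c) fix the input layout. A minimal Python sketch of staging it, assuming
   the domain models are supplied in the intended order and that any FASTA header is
   acceptable (prepare_demo_input and the ">query" header are hypothetical):

```python
import shutil
from pathlib import Path
from typing import List


def prepare_demo_input(input_dir: str, protein_name: str, sequence: str,
                       domain_pdbs: List[str], header: str = "query") -> Path:
    """Stage DEMO2 inputs in input_dir/protein_name (hypothetical helper)."""
    target = Path(input_dir) / protein_name
    target.mkdir(parents=True, exist_ok=True)
    # Copy domain models in the order given: dom1.pdb, dom2.pdb, ...
    for index, src in enumerate(domain_pdbs, start=1):
        shutil.copyfile(src, target / f"dom{index}.pdb")
    # Write the full-chain sequence to seq.fasta, 60 residues per line.
    # The header text is an assumption; the notes only require FASTA format.
    wrapped = [sequence[i:i + 60] for i in range(0, len(sequence), 60)]
    (target / "seq.fasta").write_text(f">{header}\n" + "\n".join(wrapped) + "\n")
    return target
```

   Because the numbering follows the list order, the caller is responsible for passing
   the domain files in the order expected by DEMO2.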

23.4 How to cite DEMO2?

   If you are using DEMO2, you can cite:

   1. Xiaogen Zhou, Chunxiang Peng, Wei Zheng, Yang Li, Guijun Zhang, and Yang Zhang.
      DEMO2: Assemble multi-domain protein structures by coupling analogous template
      alignments with deep-learning inter-domain restraint prediction.
      Nucleic Acids Research (2022).
      
   2. Xiaogen Zhou, Jun Hu, Chengxin Zhang, Guijun Zhang, and Yang Zhang. Assembling multidomain
      protein structures through analogous global structural alignments. Proceedings of the
      National Academy of Sciences, 116: 15930-15938 (2019).

   #######################################################
   #                                                     #
   #  24. Installation and implementation of ThreaDomEx  #
   #                                                     #
   #######################################################

24.1 Introduction of ThreaDomEx

   Protein domains are subunits that can fold and evolve independently. The identification
   of protein domains is essential for protein structure determination and functional
   annotation. ThreaDom [2] and DomEx [3] are two recently developed methods for protein
   domain boundary recognition and, especially, discontinuous domain prediction. ThreaDomEx
   combines ThreaDom and DomEx into a unified online server system for more accurate and
   user-friendly domain prediction on sequences with both continuous and discontinuous
   domain structures. ThreaDomEx takes the amino acid sequence of the query protein as
   input. It first creates multiple threading alignments to recognize homologous and
   analogous template structures, from which a domain conservation score is calculated to
   deduce the domain boundaries. Next, a boundary clustering method is used to optimize
   the domain model selection. For discontinuous domain structures, a symmetric alignment
   algorithm is applied to further integrate and refine the domain assignments. The output
   of the server consists of: (a) the predicted domain boundaries and discontinuous
   domains; (b) the visualized distribution of the domain conservation score, predicted
   secondary structure, and solvent accessibility; and (c) the threading templates used by
   ThreaDomEx. The server allows users to interactively edit, save, or re-detect the domain
   models of the proteins.
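   To illustrate how a per-residue score profile can be turned into boundary positions,
   the Python sketch below scans for local minima under a cutoff and keeps only minima
   at least min_gap residues apart. This is a toy stand-in, not the ThreaDomEx scoring
   or boundary clustering: it assumes lower scores mark likely linker regions, and the
   function name, cutoff, and min_gap are hypothetical.

```python
from typing import List, Sequence


def candidate_boundaries(score: Sequence[float], cutoff: float,
                         min_gap: int = 30) -> List[int]:
    """Return 1-based positions of local minima of `score` that fall below `cutoff`.

    Toy illustration only: lower values are assumed to mark linker regions, and a
    minimum closer than `min_gap` residues to the previously accepted one is skipped.
    """
    picks: List[int] = []
    for i in range(1, len(score) - 1):
        is_local_min = score[i] <= score[i - 1] and score[i] <= score[i + 1]
        if is_local_min and score[i] < cutoff:
            position = i + 1  # convert 0-based index to 1-based residue number
            if not picks or position - picks[-1] >= min_gap:
                picks.append(position)
    return picks
```

   The greedy left-to-right scan keeps the first minimum in a crowded region; a real
   method would weigh competing minima, which is what a clustering step is for.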

24.2 How to install ThreaDomEx?

   When you unpack the D-I-TASSER Suite, the ThreaDomEx program is already installed
   at $pkgdir/program/ThreaDomEx.

24.3 How to run ThreaDomEx?

   DomEx USAGE:
   Mandatory arguments:
   ./DomEx.pl -seqname sequence_name 
   Optional arguments:
         -workdir workdir: the work directory. Default: './workspace'
         -b b_value: the cutoff of the parameter b (0.1<=b<=0.9). Default: 0.1.
   Users can change these default values in lines 23 to 31 of DomEx.pl.
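   The usage above maps onto an argument list as follows. This Python sketch is a
   hypothetical wrapper (domex_command is not part of the package); it only enforces the
   documented range 0.1<=b<=0.9 and omits options left unset, so DomEx.pl applies its
   own defaults.

```python
from typing import List, Optional


def domex_command(seqname: str, workdir: Optional[str] = None,
                  b: Optional[float] = None, script: str = "./DomEx.pl") -> List[str]:
    """Build a DomEx.pl argument list (hypothetical wrapper, not part of the package)."""
    argv = [script, "-seqname", seqname]
    if workdir is not None:
        argv += ["-workdir", workdir]  # unset: DomEx.pl uses './workspace'
    if b is not None:
        if not 0.1 <= b <= 0.9:
            raise ValueError("b must satisfy 0.1 <= b <= 0.9")
        argv += ["-b", str(b)]  # unset: DomEx.pl uses 0.1
    return argv
```

   For instance, subprocess.run(domex_command("Targetname", b=0.3), check=True), run from
   the DomEx directory, corresponds to the example below.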

   For example:
      ./DomEx.pl -seqname Targetname -b 0.3
   
      Notice:
      (1) Targetname.fasta and Targetname.sd should be placed in the directory
          workdir/Targetname. See the example in the workspace directory.
      (2) Targetname.sd is the file of domain segments predicted by any domain predictor.
          Its format is: targetname\tsequencelength\tsegmentnum\tsegments\n. To combine
          DomEx with ThreaDom, install ThreaDom and copy its segment prediction result
          to Targetname.sd.
      (3) The program is designed to run on a PBS-based cluster. Users have to modify
          the programs to run them on a standalone workstation.
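   A sketch of staging the two input files described in notes (1) and (2), in Python.
   Assumptions: the FASTA header is simply the target name, and the segments field is
   passed through verbatim because its internal notation depends on the domain predictor
   that produced it; stage_domex_target and parse_sd_line are hypothetical helpers.

```python
from pathlib import Path
from typing import Tuple


def stage_domex_target(workdir: str, target: str, sequence: str,
                       segment_count: int, segments: str) -> Path:
    """Write <target>.fasta and <target>.sd into workdir/<target> (hypothetical helper)."""
    tdir = Path(workdir) / target
    tdir.mkdir(parents=True, exist_ok=True)
    # Assumed header: the target name itself.
    (tdir / f"{target}.fasta").write_text(f">{target}\n{sequence}\n")
    # One line: targetname, sequence length, segment count, segments (tab-separated).
    line = f"{target}\t{len(sequence)}\t{segment_count}\t{segments}\n"
    (tdir / f"{target}.sd").write_text(line)
    return tdir


def parse_sd_line(line: str) -> Tuple[str, int, int, str]:
    """Split one .sd line into its four fields; the segments field is kept verbatim."""
    name, length, count, segments = line.rstrip("\n").split("\t", 3)
    return name, int(length), int(count), segments
```

   Parsing the file back with parse_sd_line is a cheap sanity check that the sequence
   length recorded in the .sd file matches the FASTA sequence.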


   For testing:
      There is an example in ./output/S50. Users can add a crontab task that calls
      DomEx.pl repeatedly, or execute "./DomEx.pl -seqname S50" several times manually.
      Users can also check the check.txt file for the running state.
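   As an alternative to crontab, the repeated calls can be scripted. The Python sketch
   below (run_repeatedly is hypothetical) invokes a command a fixed number of times with
   a pause between calls and stops at the first non-zero exit status. Whether DomEx.pl
   signals failure through its exit status is an assumption, so check check.txt as well.

```python
import subprocess
import time
from typing import List


def run_repeatedly(argv: List[str], times: int, delay_s: float = 60.0,
                   cwd: str = ".") -> List[int]:
    """Run `argv` up to `times` times, sleeping `delay_s` seconds between calls.

    Returns the exit status of each call; stops early at the first non-zero status.
    """
    statuses: List[int] = []
    for attempt in range(times):
        result = subprocess.run(argv, cwd=cwd)
        statuses.append(result.returncode)
        if result.returncode != 0:
            break
        if attempt < times - 1:
            time.sleep(delay_s)
    return statuses
```

   For the bundled example, one would call run_repeatedly(["./DomEx.pl", "-seqname",
   "S50"], times=5, delay_s=600) with cwd set to the ThreaDomEx directory; the number
   of calls and the delay are illustrative values, not recommendations from the package.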

   For users who want to predict domain boundaries with ThreaDom:
      (1) Download the ThreaDom package from https://zhanggroup.org/ThreaDom/, or submit
          your sequence online.
      (2) Copy the prediction result to Targetname.sd (for example, T50.sd).
      (3) Execute ./DomEx.pl -seqname Targetname

24.4 How to cite ThreaDomEx?

   If you are using ThreaDomEx, you can cite:

   1. Yan Wang, Jian Wang, Qiang Shi, Ruiming Li, Zhidong Xue, and Yang Zhang. ThreaDomEx:
      A unified platform for predicting continuous and discontinuous protein
      domains by multiple-threading and segment assembly.
      Nucleic Acids Research, doi:10.1093/nar/gkx410 (2017).

   2. Z Xue, D Xu, Y Wang, Y Zhang. ThreaDom: Extracting Protein Domain Boundary Information
      from Multiple Threading Alignments. Bioinformatics, 29: i247-i256 (2013).

   3. Zhidong Xue, Richard Jang, Brandon Govindarajoo, Yichu Huang, Yan Wang. Extending 
      Protein Domain Boundary Predictors to Detect Discontinuous Domains. 
      PLoS ONE 10(10): e0141541. doi:10.1371/journal.pone.0141541 (2015).