DeepMSA2 (DeepMSA version 2) is a hierarchical approach for creating high-quality multiple sequence alignments (MSAs)
for monomer and multimer proteins.
The method is built on iterative sequence database searching followed by fold-based
MSA ranking and selection.
For protein monomers, MSAs are produced by three iterative MSA searching pipelines (dMSA, qMSA and mMSA)
that search whole-genome (Uniclust30 and UniRef90) and
metagenome (Metaclust, BFD, MGnify, TaraDB, MetaSourceDB and JGIclust) sequence databases.
For protein multimers, a number of hybrid MSAs are created by pairing the sequences from
monomer MSAs of the component chains, with the optimal multimer MSAs selected based on a combined score of
MSA depth and folding score of the monomer chains.
Large-scale benchmark data show a significant advantage of DeepMSA2 in generating accurate MSAs
with balanced depth and alignment coverage, which are most suitable for deep learning-based
protein and protein complex structure and function prediction.
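
The multimer MSA selection described above can be illustrated with a minimal sketch. It enumerates every hybrid combination of per-chain monomer MSAs and ranks them by a combined score. Everything specific here is an assumption made for illustration: the function name `rank_hybrid_msas`, the `depth_weight` parameter, the linear weighting, and the normalization of depth by the deepest hybrid are hypothetical and are not taken from DeepMSA2, whose actual scoring formula is not given in this description.

```python
from itertools import product


def rank_hybrid_msas(chain_candidates, paired_depth, depth_weight=0.5):
    """Rank hybrid multimer MSAs by a hypothetical combined score.

    chain_candidates: one dict per chain, mapping a monomer MSA name to
        its folding score (assumed here to lie in [0, 1]).
    paired_depth: dict mapping a tuple of MSA names (one per chain) to
        the number of paired sequences in that hybrid MSA.
    depth_weight: illustrative weight of the depth term (an assumption).
    """
    # Normalize depth by the deepest hybrid; guard against all-zero depths.
    max_depth = max(paired_depth.values()) or 1
    ranked = []
    # Every combination of one monomer MSA per chain is one hybrid MSA.
    for combo in product(*(c.items() for c in chain_candidates)):
        names = tuple(name for name, _ in combo)
        mean_fold = sum(score for _, score in combo) / len(combo)
        depth_term = paired_depth[names] / max_depth
        score = depth_weight * depth_term + (1 - depth_weight) * mean_fold
        ranked.append((score, names))
    # Highest combined score first.
    ranked.sort(reverse=True)
    return ranked


# Toy data for a two-chain complex (made-up numbers).
chain_a = {"dMSA": 0.80, "qMSA": 0.90}
chain_b = {"dMSA": 0.70, "mMSA": 0.85}
depths = {
    ("dMSA", "dMSA"): 100,
    ("dMSA", "mMSA"): 400,
    ("qMSA", "dMSA"): 50,
    ("qMSA", "mMSA"): 200,
}
ranking = rank_hybrid_msas([chain_a, chain_b], depths)
best_score, best_names = ranking[0]
print(best_names)
```

With these toy numbers the deepest hybrid wins even though another combination has a slightly higher mean folding score, which shows how a combined score trades depth against monomer fold quality.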
[Example output for monomer]
[Example output for multimer]
- Wei Zheng, Qiqige Wuyun, Yang Li, Chengxin Zhang, P Lydia Freddolino, Yang Zhang.
Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Submitted (2023).
- Chengxin Zhang, Wei Zheng, S M Mortuza, Yang Li, Yang Zhang.
DeepMSA: constructing deep multiple sequence alignment to improve
contact prediction and fold-recognition for distant-homology proteins. Bioinformatics,
36: 2105-2112 (2020).