【2.1.1】Protein Secondary Structure: Dali

December 03, 2019 protein_design

1. Introduction

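Most newly determined protein sequences can be classified into families by sequence homology. However, protein families are known to retain the shape of their fold even when the sequences have diverged to the point of showing almost no similarity. Such similarities can be detected by structure comparison, which merges protein families of known 3-D structure into structural classes. An important tool for comparing protein structures is the distance matrix. It is a 2-D representation of a 3-D structure, because it contains all pairwise distances between atoms, in this case the Cα atoms. The 3-D structures themselves are obtained by X-ray crystallography and nuclear magnetic resonance (NMR).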
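To make the idea of a distance matrix concrete, here is a minimal Python sketch that computes all pairwise Cα-Cα distances for a short list of coordinates. The function name `distance_matrix` is illustrative and not part of Dali; the three coordinates are the first three residues of the 1pptA example data shown later in this post.

```python
import math

def distance_matrix(ca_coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(ca_coords)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            dm[i][j] = dm[j][i] = d  # the matrix is symmetric
    return dm

# First three C-alpha positions of 1pptA (from the .dat example in section 3.1)
coords = [(1.5, -9.0, 17.3), (-1.1, -10.6, 15.0), (-0.6, -14.2, 14.1)]
dm = distance_matrix(coords)
print(round(dm[0][1], 2))  # 3.82, the usual spacing of consecutive C-alpha atoms
```

The matrix does not depend on how the molecule is placed in space, which is why two structures can be compared through their distance matrices without superimposing them first.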

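DALI, whose name comes from Distance-matrix ALIgnment, is a network service for comparing 3-D protein structures. In some cases, comparing 3-D structures reveals biologically interesting similarities that cannot be detected by comparing sequences.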

The paper "Protein structure comparison by alignment of distance matrices" by Liisa Holm & Chris Sander was first published in 1993.
The program is based on Fortran.

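The Dali server has been running at various places for more than 20 years and is routinely used by crystallographers on newly solved structures. The latest update of the server provides enhanced analytics for the study of sequence and structure conservation. The server performs three types of structure comparison: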

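Protein Data Bank (PDB) search compares one query structure against the structures in the PDB and returns a list of similar structures;
pairwise comparison compares one query structure against a user-specified list of structures;
all-against-all structure comparison returns a structural similarity matrix, a dendrogram and a multidimensional scaling projection for a user-specified set of structures.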

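Structural superimpositions are visualized with PV, a Java-free WebGL viewer. The structural alignment view is enhanced by sequence similarity searches against UniProt. The combined structure-sequence alignment information is compressed into a stack of aligned sequence logos. In the stack, each structure is structurally aligned to the query protein and represented by a sequence logo.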

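Over the past 8 years, the Dali server has processed more than 175 000 PDB searches.
The Dali program optimizes a structural alignment, i.e. a sequential set of one-to-one correspondences between C-alpha atoms.
A wide variety of scoring functions have been proposed. The most important classes of scoring functions are (i) those based on the root mean square deviation of rigid-body superimposition and (ii) those allowing flexible superimposition or plastic deformations. Early work based on visual analysis of folds emphasized the importance of plastic deformations in the evolution of protein structures. Dali's scoring function belongs to the latter class, and it has been shown to produce structural dendrograms that agree with expert classifications (13,16-18).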

The aligned length apparently has to be at least 30 residues (unconfirmed).

2. Installation

2.1 System requirements:

Linux OS
openmpi
Fortran-90 and C compilers
Perl
Blast
Internet connection

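openmpi is optional; the software can run serially. When using the MPI version, all nodes must have read/write access to a common disk (we run on a multi-core server). All nodes read from the internal structure data directories (DALIDATDIR_1 and DALIDATDIR_2) and write intermediate results to the current working directory, which must be readable by the master process. Dali generates structural alignments using C-alpha coordinates only. The database search option uses sequence comparison (Blast) for a soft clustering of the PDB. The clustering serves as a filter that selects candidates for explicit structural alignment.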

2.2 Installation

cd /data/software/dali
wget http://ekhidna2.biocenter.helsinki.fi/dali/DaliLite.v5.tar.gz
tar -zxvf DaliLite.v5.tar.gz

# compile
cd ./DaliLite.v5/bin
make clean
make # ignore Warnings

# parallel build
# if using openmpi (check OPENMPI_PATH in Makefile)
# make parallel

Testing

cd /data/software/dali/DaliLite.v5
# the script assumes that blastp and makeblastdb are in your PATH
# if not, get them from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
./test.csh
# compare output to ./test_output

2.3 Errors during installation

Error 1

make: gfortran: Command not found

Solution:

yum install gcc-gfortran  # install gfortran on CentOS 7

Error 2

make: /usr/lib64/openmpi/bin/mpif90: Command not found
make: *** [mpicompare] Error 127

Solution:

Install openmpi:

yum install openmpi openmpi-devel

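After installation, the binaries are located under /usr/lib64/openmpi/bin and the shared libraries under /usr/lib64/openmpi/lib, so some extra configuration is needed before they can be used. Add the following lines to /etc/profile (e.g. with vim /etc/profile):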

export PATH=/usr/lib64/openmpi/bin/:$PATH
module load mpi/openmpi-x86_64

PS: the module command requires the environment-modules package to be installed first.

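3. Usage

import.pl - converts PDB files to Dali's internal data format and mirrors the PDB database
dali.pl - performs pairwise structural alignment, all-against-all comparison or structure database search

3.1 Importing structure data (import.pl)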

* Import single PDB entry to ./DAT:
        import.pl --pdbfile <filename> --pdbid <xxxx> [ --dat <path> ]

* Import list of PDB entries:
        bin/import.pl --pdblist <filename> [ --dat <path> ]

* Automated PDB mirroring:
        import.pl --rsync [ --pdbmirrordir <path> ] [ --dat <path> ]

* Options:
        --dat <path>               directory to store imported data [default: ./DAT]
        --pdbfile <filename>       PDB formatted file, may be compressed (.gz)
        --pdbid <xxxx>             four-letter PDB identifier
        --pdblist <filename>       list of PDB entries, file names of the form pdbXXXX.ent
        --rsync                    automated PDB mirroring
        --pdbmirrordir <path>      PDB mirror directory [default: /data/pdb]
        --clean                    remove temporary files
        --verbose                  verbose

This is an example of an imported data file (1pptA.dat). Comments have been inserted as lines starting with "#":

# The header line gives the structure identifier, number of residues, total number of secondary structure elements (SSEs), number of helices, number of strands, sequence of SSEs 
>>>> 1pptA   36    1    1    0  H
# For each SSE, list its sequential number, start and end position, modified start and end position, length check code (0 = ok, >0 = short)
         1        14        31        14        31         0
# C-alpha coordinates: (x,y,z) triples for each residue sequentially 
     1.5    -9.0    17.3    -1.1   -10.6    15.0    -0.6   -14.2    14.1     0.5
   -14.9    10.5    -2.4   -15.0     8.1    -3.5   -18.2     6.3    -3.1   -18.0
     2.5    -6.6   -18.6     1.0    -5.4   -20.7    -2.0    -4.5   -19.9    -5.6
    -8.1   -20.4    -6.5    -9.6   -18.1    -4.0   -12.0   -15.4    -5.5   -10.3
   -11.9    -5.9   -12.1   -10.6    -2.9   -10.5   -13.2    -0.7    -7.0   -12.4
    -2.1    -7.7    -8.7    -1.4    -8.6    -9.6     2.3    -5.4   -11.8     2.5
    -3.4    -8.9     1.0    -4.7    -6.5     3.7    -4.0    -8.9     6.5    -0.5
    -9.7     5.2     0.1    -5.9     5.1    -0.9    -5.6     8.8     1.4    -8.5
     9.7     4.3    -7.3     7.8     4.1    -3.7     9.3     4.1    -5.4    12.8
     7.0    -7.7    12.2     9.1    -4.9    10.7     8.0    -2.5    13.7     6.8
     0.0    11.1     3.1     0.6    11.9     2.8     3.4     9.3
# Unfolding units in terms of SSEs
>>>> 1pptA    1
# node identifier, status, parent node, two child nodes, SSEs in this node
# node status codes: + / above domain level, * / selected domain, - / below domain level, = / small domain
   1 =    0   0   1   1
# Unfolding units in terms of residues
>>>> 1pptA    1
   1 =    0   0  36   1   1  36
# secondary structure states per residue
-dssp     "LLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHLLLLL
# amino acid sequence
-sequence "GPSQPTYPGDDAPVEDLIRFYDNLQQYLNVVTRHRY
# COMPND record from PDB entry
-compnd   " MOLECULE: AVIAN PANCREATIC POLYPEPTIDE;
# copied from DSSP output: sequential residue number, chain identifier, PDB residue number, accessibility, C-alpha coordinates                         "
-acc    1  A   1   101       1.500      -9.000      17.300
# lines for residues 2-35 removed
-acc   36  A  36   224       2.800       3.400       9.300
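The annotated layout above can be read back with a few lines of Python. This is only a sketch under the assumption that the fields are separated by whitespace, as they are in the example; the real files are written with fixed-width Fortran formats, so very large values could run together. The function names are hypothetical and not part of DaliLite.

```python
def parse_dat_header(line):
    """Parse the first '>>>>' line of a Dali .dat file."""
    fields = line.split()
    if not fields or fields[0] != ">>>>":
        raise ValueError("not a header line: %r" % line)
    return {
        "id": fields[1],             # five-character identifier, e.g. 1pptA
        "nres": int(fields[2]),      # number of residues
        "nsse": int(fields[3]),      # number of secondary structure elements
        "nhelix": int(fields[4]),
        "nstrand": int(fields[5]),
        "sse_types": fields[6] if len(fields) > 6 else "",
    }

def parse_acc_line(line):
    """Parse one '-acc' line: residue number, chain, PDB number, accessibility, C-alpha."""
    fields = line.split()
    if fields[0] != "-acc":
        raise ValueError("not an -acc line: %r" % line)
    return {
        "seqno": int(fields[1]),
        "chain": fields[2],
        "pdb_resno": fields[3],      # kept as text; PDB numbers may carry insertion codes
        "accessibility": int(fields[4]),
        "ca": tuple(float(v) for v in fields[5:8]),
    }

header = parse_dat_header(">>>> 1pptA   36    1    1    0  H")
first = parse_acc_line("-acc    1  A   1   101       1.500      -9.000      17.300")
print(header["nres"], first["ca"])
```

Note that the coordinate block holds 3 x nres numbers (108 for the 36 residues of 1pptA), which matches the ten full lines plus one line of eight values in the example.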

Import your own private structure, writing the output to /data/private/DAT:

    /data/software/dali/DaliLite.v5/bin/import.pl --pdbfile mymodel.pdb --pdbid mine --dat /data/private/DAT --clean

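import.pl accepts both uncompressed and compressed files (extension .gz).
Every structure has a four-letter identifier. The identifier must be exactly 4 characters long; this is hard-coded.
The chain identifier is appended automatically.
The resulting five-letter identifier is used in Dali's internal database; the example creates the file /data/private/DAT/mineA.dat for chain 'A'.
Structure comparison requires all query structures to be in one directory (DALIDATDIR_1).
Likewise, all target structures must be in one directory (DALIDATDIR_2). DALIDATDIR_1 and DALIDATDIR_2 may be the same, but typically DALIDATDIR_2 holds public structures downloaded from the Protein Data Bank (PDB) and DALIDATDIR_1 holds private structures.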

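Importing public structures from the PDB

Database searches need a local copy of the PDB structures. The --rsync option maintains a copy of the PDB and imports new structures automatically. You can run the following command from crontab once a week, which is the update frequency of the PDB. In the command below, PDB entries are stored under /mnt/nfs/data/database/pdb and the structure data in Dali's internal format under /data/database/dali_db: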

/data/software/dali/DaliLite.v5/bin/import.pl --rsync --pdbmirrordir /mnt/nfs/data/database/pdb --dat /data/database/dali_db --clean

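The --rsync option logs the downloaded PDB entries to pdb_update.log. If anything goes wrong in the Dali import step, you can extract the list of new PDB files and run the import step again (adjust the /data/pdb prefix and the --dat path to your own mirror and data directories):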

grep '^..\/pdb' pdb_update.log | perl -pe 's/^/\/data\/pdb\//' > pdb_new.list # prepend path /data/pdb to PDB entries
bin/import.pl --pdblist pdb_new.list --dat /data/DAT --clean

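3.2 Preparing a Blast database for structure database searches

Structure database searches use sequence comparison as a means of clustering structures that are generally similar.

Install the BLAST executables blastp and makeblastdb from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/. If the blastp program is not in $PATH (the Linux environment variable), you can specify it with the --BLASTP_EXE option of dali.pl.

The following commands extract the sequences of the imported structures into a FASTA file: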

# create pdb.fasta
ls /data/database/dali_db | perl -pe 's/\.dat//' > pdb.list

/data/software/dali/DaliLite.v5/bin/dat2fasta.pl /data/database/dali_db < pdb.list | awk -v RS=">" -v FS="\n" -v ORS="" ' { if ($2) print ">"$0 } ' > pdb.fasta # awk removes empty sequences
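The awk filter at the end of the pipeline treats ">" as the record separator and a newline as the field separator, and prints a record only if its second field (the first sequence line) is non-empty. For readers less familiar with awk, the same logic can be sketched in Python; `drop_empty_records` is an illustrative name, not part of DaliLite.

```python
def drop_empty_records(fasta_text):
    """Keep only FASTA records whose first sequence line is non-empty."""
    kept = []
    for record in fasta_text.split(">"):   # like RS=">"
        lines = record.split("\n")         # like FS="\n"
        # lines[0] is the header, lines[1] the first sequence line
        if len(lines) > 1 and lines[1] != "":
            kept.append(">" + record)
    return "".join(kept)                   # like ORS=""

example = ">1pptA\nGPSQPTYPGDDAPVEDLIRFYDNLQQYLNVVTRHRY\n>xxxxA\n\n>1bbaA\nAPLEPEYPGDNATPEQMAQYAAELRRYINMLTRPRY\n"
cleaned = drop_empty_records(example)
print(cleaned.count(">"))  # 2, the record without a sequence is gone
```

The identifier xxxxA in the example input is made up to stand for an entry without a sequence.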

Build a local PDB Blast database:

makeblastdb -in pdb.fasta -out /home/you/pdb.blast -dbtype prot

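The location of the database can be specified with the --BLAST_DB option of dali.pl.

The PDB is highly redundant. A database search is faster if a non-redundant subset is searched systematically and the homologs of dissimilar structures are eliminated without explicit alignment. CD-HIT (from https://github.com/weizhongli/cdhit) can be used to generate a non-redundant subset of the PDB. We use an all-against-all comparison with BLAST and the PDB_SELECT algorithm to generate PDB25, a non-redundant set at 25% sequence identity. Protein sequences with more than 25% sequence identity in a global alignment are usually very similar in structure.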

3.3 dali.pl: Structure comparison

3.3.1 Command line

dali.pl performs pairwise structural alignments and database searches. The full list of options and syntax is as follows:

USAGE: bin/dali.pl [ BASIC-OPTIONS] [MPI-OPTIONS] \
        ( --cd1 <xxxxX> |  --pdbfile1 <first.pdb> [ --pdbid1 <mol1> ] | --query <query.list> ) \
        ( --matrix | --cd2 <yyyyY> | --pdbfile2 <second.pdb> [ --pdbid2 <mol2> ] | --db <target.list> \
          [ ( --hierarchical | --walk [WALK-OPTIONS] ) --repset <pdb25.list> [BLAST-OPTIONS] ] ) 

        --cd1 <xxxxX>             query structure identifier
        --pdbfile1 <filename>     query structure in PDB format
        --pdbid1 <xxxx>           four-letter query structure identifier (chain identifier will be appended automatically) [default: mol1]
        --query <filename>        list of query structure identifiers
        --matrix                  all-against-all comparison. Generates additional outputs called 'ordered' (similarity matrix) and 'newick_unrooted' (dendrogram).
        --cd2 <yyyyY>             target structure identifier
        --pdbfile2 <filename>     target structure in PDB format
        --pdbid2 <yyyy>           four-letter target structure identifier (chain identifier will be appended automatically) [default: mol2]
        --db <filename>           list of target structure identifiers
        --hierarchical            hierarchical structure database search
        --walk                    knowledge-based structure database search
        --repset <filename>       list of structure identifiers of non-redundant subset of PDB

        BASIC-OPTIONS:
        --dat1 <path>             path to directory containing query data [default: ./DAT/]
        --dat2 <path>             path to directory containing target data [default: ./DAT/]
        --oneway                  asymmetric structure comparison (A,B) only [default: symmetric (A,B) and (B,A)]
        --title <string>          written to output [default: test]
        --outfmt <string>         result blocks to output: summary,alignments,equivalences,transrot [default: summary]
        --clean                   remove temporary files

        MPI-OPTIONS:
        --np <integer>            number of processes if using openmpi (between 1 and 99) [default: 1]
        --MPIRUN_EXE <string>     location of mpirun executable [default: /usr/lib64/openmpi/bin/mpirun ]

        BLAST-OPTIONS:
        --HMAX <integer>          number of top scoring representatives to send to final BLAST [default: 200]
        --KMAX <integer>          number of final BLAST hits to align structurally [default: 2000]
        --BLAST_DB <string>       location of Blast database [default: pdb.blast]
        --BLASTP_EXE <string>     location of Blast executable [default: blastp]
        --BLAST_NUM_THREADS <integer>   number of threads when running Blast [default: 32]

        WALK-OPTIONS:
        --targetset <pdb25.list>  used with H to limit the radius of the walk [default: same as --repset]
        --H <integer>             walk radius is Z-score of Hth hit in the target set [default: 100]
        --MAX_HITS <integer>      number of hits returned from walk [default: 10000]
        --MAX_DALICON <integer>   max number of comparisons performed during walk [default: 10000]

Pairwise comparison:

bin/dali.pl ( --cd1 <xxxxX> | --query <query.list> ) ( --cd2 <yyyyY> | --db <target.list> ) [BASIC-OPTIONS]
bin/dali.pl --pdbfile1 first.pdb [ --pdbid1 mol1 ] --pdbfile2 second.pdb [ --pdbid2 mol2 ] [BASIC-OPTIONS]

All-against-all comparison of a list of structures:

bin/dali.pl --matrix --query <query.list> [BASIC-OPTIONS]

Database search:

bin/dali.pl --hierarchical --repset <pdb25.list> ( --cd1 <xxxxX> | --query <query.list> ) --db <pdb.list> [BASIC-OPTIONS] [BLAST-OPTIONS]

bin/dali.pl --walk --repset <pdb25.list> ( --cd1 <xxxxX> | --query <query.list> ) --db <pdb.list> [BASIC-OPTIONS] [BLAST-OPTIONS] [WALK-OPTIONS]

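Notes:

All structures must be imported beforehand with import.pl (unless you use the --pdbfile1 and --pdbfile2 options).
A single query (--cd1) or a list of query structures (--query) can be compared with a single target (--cd2) or a list of target structures (--db).
Query and target structures are specified by five-letter identifiers xxxxX, where the first four letters xxxx denote the PDB entry and the fifth letter is the chain identifier.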
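As a small illustration of the identifier convention, the following Python sketch splits a five-character identifier into PDB entry and chain, and assembles the argument list for a pairwise run from the options listed above. Both helper names are hypothetical, and the sketch only builds the command; it does not execute dali.pl.

```python
def split_identifier(cd):
    """Split a five-character Dali identifier into (PDB entry, chain)."""
    if len(cd) != 5:
        raise ValueError("expected 5 characters (xxxxX), got %r" % cd)
    return cd[:4], cd[4]

def pairwise_command(query, target, dat1="./DAT", dat2="./DAT"):
    """Build the argument list for a pairwise dali.pl comparison."""
    for cd in (query, target):
        split_identifier(cd)  # raises ValueError on malformed identifiers
    return ["bin/dali.pl", "--cd1", query, "--cd2", target,
            "--dat1", dat1, "--dat2", dat2, "--clean"]

print(split_identifier("1pptA"))
print(" ".join(pairwise_command("1pptA", "1bbaA")))
```

The resulting list could be handed to `subprocess.run` from the DaliLite installation directory, provided both structures have already been imported.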

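3.3.2 Output format

All functions produce output in a similar format. Structural alignments with a Z-score above 2 are reported. The output for each query structure xxxxX is written to xxxxX.txt. We use the following example: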

# import two PDB entries. They will be split into chains.

/home/you/DaliLite.v5/bin/import.pl --pdbfile ./toy_PDB/pdb1ppt.ent.gz --pdbid 1ppt --dat ./DAT > /dev/null
/home/you/DaliLite.v5/bin/import.pl --pdbfile ./toy_PDB/pdb1bba.ent.gz --pdbid 1bba --dat ./DAT > /dev/null
# pairwise alignment of two structures
/home/you/DaliLite.v5/bin/dali.pl --cd1 1pptA --cd2 1bbaA --dat1 ./DAT --dat2 ./DAT --title "output options" --outfmt "summary,alignments,equivalences,transrot" --clean 2> err

The output file 1pptA.txt:

# Job: output options
# Query: 1pptA
# No:  Chain   Z    rmsd lali nres  %id PDB  Description
   1:  1bba-A  3.6  1.8   33    36   39   MOLECULE: BOVINE PANCREATIC POLYPEPTIDE;

# Pairwise alignments

No 1: Query=1pptA Sbjct=1bbaA Z-score=3.6

DSSP  LLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHLLlll
Query GPSQPTYPGDDAPVEDLIRFYDNLQQYLNVVTRhry   36
ident  |  | |||| |  |        |  | |  ||
Sbjct APLEPEYPGDNATPEQMAQYAAELRRYINMLTRpry   36
DSSP  LLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHLLlll


# Structural equivalences
   1: 1ppt-A 1bba-A     1 -  33 <=>    1 -  33   (GLY    1  - ARG   33  <=> ALA    1  - ARG   33 )

# Translation-rotation matrices
-matrix  "1ppt-A 1bba-A  U(1,.)   0.631906 -0.761372 -0.144939           -0.890845"
-matrix  "1ppt-A 1bba-A  U(2,.)   0.512616  0.550832 -0.658642          -10.882093"
-matrix  "1ppt-A 1bba-A  U(3,.)   0.581308  0.341902  0.738366            4.946664"

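You can choose what to output with the --outfmt option. By default, only the summary is output. Blocks 1-4 of the example are, in order, summary, alignments, equivalences and transrot. The first line of the output file is the job description, which can be set with the --title option.

The summary comes first. Here we compared 1pptA with a single structure, so there is only one hit. It contains:

Chain: structure and chain identifier of the hit (the matched protein)
Z: the summary list is sorted by Z-score. Only hits with a Z-score above 2 are reported. Hits with a higher Z-score are more similar to the query.
rmsd: root mean square deviation of the structurally equivalent C-alpha atoms in the 3-D superimposition
lali: number of structurally equivalent C-alpha atoms
nres: number of residues in the target structure
%id: percentage of identical amino acids among the structurally equivalent residues
Description: COMPND record of the PDB file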
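A hit line of the summary block can be turned into a record with a simple whitespace split. The sketch below assumes the layout of the example above, where the PDB column is empty and the description follows directly after %id; `parse_summary_line` is an illustrative name, not part of DaliLite.

```python
def parse_summary_line(line):
    """Parse one hit line of the summary block into a dict."""
    fields = line.split(None, 7)  # at most 8 fields; the last one is the description
    return {
        "no": int(fields[0].rstrip(":")),
        "chain": fields[1],
        "z": float(fields[2]),
        "rmsd": float(fields[3]),
        "lali": int(fields[4]),
        "nres": int(fields[5]),
        "pct_id": int(fields[6]),
        "description": fields[7].strip() if len(fields) > 7 else "",
    }

hit = parse_summary_line(
    "   1:  1bba-A  3.6  1.8   33    36   39   MOLECULE: BOVINE PANCREATIC POLYPEPTIDE;"
)
print(hit["chain"], hit["z"], hit["lali"])
```

With such records it is straightforward to apply a stricter Z-score cutoff than the built-in threshold of 2, for example `[h for h in hits if h["z"] >= 5]`.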

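alignments. The alignment block shows the pairwise alignment of the query with each hit. Uppercase characters are structurally equivalent residues; lowercase means the residues are not aligned, but they are printed in the same column to save space. The secondary structure states (H = helix, E = strand, L = loop) are shown above/below the amino acid sequences. Identical amino acids are marked with a vertical bar.
equivalences. The structural equivalences block lists the structurally aligned segments by sequential residue number (PDB residue numbers in parentheses).
Translation-rotation matrices. The translation-rotation matrix can be used to superimpose the target structure onto the coordinate frame of the query structure. The rotation matrix U is the 3-by-3 matrix given by the first three numeric columns. The translation vector T is given in the last column. Let X be the 3-by-nres matrix of the (x, y, z) coordinates of the target structure. UX + T yields the least-squares superimposition of X onto the query structure.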
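The transformation UX + T can be written out in a few lines of Python. The sketch below reads U and T from the three -matrix lines of the example output and applies them to a single point; it assumes the four numbers on each line are separated by whitespace, as in the example. The function names are illustrative; for whole PDB files the applymatrix.pl script shipped with DaliLite does this job.

```python
def parse_transrot(lines):
    """Return (U, T) from the three '-matrix' lines of a transrot block."""
    U, T = [], []
    for line in lines:
        # the last four numbers on each line are one row of U followed by T
        numbers = line.strip().rstrip('"').split()[-4:]
        row = [float(v) for v in numbers]
        U.append(row[:3])
        T.append(row[3])
    return U, T

def transform(point, U, T):
    """Apply x' = U x + T to a single (x, y, z) point."""
    return tuple(
        sum(U[i][k] * point[k] for k in range(3)) + T[i] for i in range(3)
    )

block = [
    '-matrix  "1ppt-A 1bba-A  U(1,.)   0.631906 -0.761372 -0.144939           -0.890845"',
    '-matrix  "1ppt-A 1bba-A  U(2,.)   0.512616  0.550832 -0.658642          -10.882093"',
    '-matrix  "1ppt-A 1bba-A  U(3,.)   0.581308  0.341902  0.738366            4.946664"',
]
U, T = parse_transrot(block)
print(transform((0.0, 0.0, 0.0), U, T))  # the origin is mapped onto T
```

Applying `transform` to every C-alpha of the target (1bba here) places it in the coordinate frame of the query (1ppt).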

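3.3.3 Pairwise comparison examples

Pairwise comparison uses structure data only (no sequence data or external programs). All structure data must have been imported beforehand into the data directory /home/you/DAT with import.pl. You can compare two structures with each other: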

/home/you/DaliLite.v5/bin/dali.pl --cd1 1pptA --cd2 1bbaA --dat1 /home/you/DAT --dat2 /home/you/DAT --clean 2> err

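You can use lists of structures as queries and targets. Every query is compared with all targets. The following is equivalent to the example above: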

echo 1pptA > query.list
echo 1bbaA > target.list
/home/you/DaliLite.v5/bin/dali.pl --query query.list --db target.list --dat1 /home/you/DAT --dat2 /home/you/DAT --clean 2> err

To compare a set of queries systematically against a non-redundant subset of the PDB (prepared beforehand), you can run:

/home/you/DaliLite.v5/bin/dali.pl --query query.list --db pdb25.list --dat1 /home/you/DAT --dat2 /data/DAT --clean 2> err

applymatrix.pl

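The utility script applymatrix.pl superimposes the coordinates of the target structure onto the coordinate frame of the query structure. In this example, we first generate the pairwise structural alignment of 1ppt and 1bba and then generate a new PDB file, sup.pdb, which contains the transformed coordinates of 1bba.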

# import PDB structures
/home/you/DaliLite.v5/bin/import.pl --pdbfile /home/you/DaliLite.v5/toy_PDB/pdb1ppt.ent.gz --pdbid 1ppt
/home/you/DaliLite.v5/bin/import.pl --pdbfile /home/you/DaliLite.v5/toy_PDB/pdb1bba.ent.gz --pdbid 1bba
# structural alignment, output translation-rotation matrices to 1pptA.txt
/home/you/DaliLite.v5/bin/dali.pl --cd1 1pptA --cd2 1bbaA --dat1 /home/you/DaliLite.v5/DAT --dat2 /home/you/DaliLite.v5/DAT --outfmt "summary,transrot" --clean
# transform the coordinates of the original target PDB file
/home/you/DaliLite.v5/bin/applymatrix.pl /home/you/DaliLite.v5/toy_PDB/pdb1bba.ent.gz < 1pptA.txt > sup.pdb
# we know that 1pptA:1-33 and 1bbaA:1-33 are structurally equivalent segments
# peek at the transformed coordinates of 1bba
grep ^ATOM sup.pdb | grep ' CA ' | head
# compare to 1ppt
zcat /home/you/DaliLite.v5/toy_PDB/pdb1ppt.ent.gz | grep ^ATOM | grep ' CA ' | head

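3.3.4 All-against-all comparison examples

The --matrix option works like a pairwise comparison in which the query list and the target list are the same. Create a list of query structures (file "query.list"):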

101mA   MYOGLOBIN
1a00A   HEMOGLOBIN (ALPHA CHAIN)
1a87A   COLICIN N
1allA   ALLOPHYCOCYANIN
1binA   LEGHEMOGLOBIN A

The structures above have already been imported into the DAT directory. Run the command:

/home/you/DaliLite.v5/bin/dali.pl --matrix --query query.list --dat1 /home/you/DaliLite.v5/DAT --clean 2> /dev/null

In addition to the five xxxxX.txt files, this similarity matrix is generated (file 'ordered'):

5
1a87A   48.7    7.4     3.1     6.4     5.6
1allA   7.4     29.7    8.1     9.0     8.7
1binA   3.1     8.1     30.8    15.2    13.6
101mA   6.4     9.0     15.2    31.7    20.6
1a00A   5.6     8.7     13.6    20.6    30.5
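The 'ordered' file is easy to post-process. As a sketch, assuming the layout shown above (a count on the first line, then one whitespace-separated row per structure), the following Python code loads the matrix and reports the most similar pair of distinct structures. The function names are illustrative, not part of DaliLite.

```python
def parse_ordered(text):
    """Read the 'ordered' similarity matrix into (ids, matrix)."""
    lines = [l for l in text.splitlines() if l.strip()]
    n = int(lines[0])                      # first line: number of structures
    ids, matrix = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        ids.append(fields[0])
        matrix.append([float(v) for v in fields[1:]])
    return ids, matrix

def most_similar_pair(ids, matrix):
    """Return (score, id_a, id_b) for the largest off-diagonal entry."""
    n = len(ids)
    return max(
        (matrix[i][j], ids[i], ids[j]) for i in range(n) for j in range(i + 1, n)
    )

ordered = """5
1a87A   48.7    7.4     3.1     6.4     5.6
1allA   7.4     29.7    8.1     9.0     8.7
1binA   3.1     8.1     30.8    15.2    13.6
101mA   6.4     9.0     15.2    31.7    20.6
1a00A   5.6     8.7     13.6    20.6    30.5
"""
ids, matrix = parse_ordered(ordered)
print(most_similar_pair(ids, matrix))  # (20.6, '101mA', '1a00A')
```

Here myoglobin (101mA) and the hemoglobin alpha chain (1a00A) form the closest pair, while colicin N (1a87A) is only weakly similar to the other four structures.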

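A structural dendrogram is generated from the similarity matrix by average linkage clustering. Branch lengths are converted to ad hoc distances, where a distance is a difference of similarities. The dendrogram is output in Newick format (files 'newick' and 'newick_unrooted'):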

((((1a00A_HEMOGLOBIN_ALPHA_CHAIN:9.9,101mA_MYOGLOBIN:11.1):6.2,1binA_LEGHEMOGLOBIN_A:16.4):5.8,1allA_ALLOPHYCOCYANIN:21.1):2.975,1a87A_COLICIN_N:43.075);

Many phylogenetic tree drawing programs accept the Newick format. For example, you can paste the Newick string into phylo.io.

3.3.5 Database search examples

Preparation

Before running a structure database search, you must mirror the PDB database and prepare a local PDB Blast database.

# mirror PDB
/home/you/DaliLite.v5/bin/import.pl --rsync --pdbmirrordir /data/pdb --dat /data/DAT --clean
# extract PDB sequences
ls /data/DAT/ | perl -pe 's/\.dat//' > pdb.list
/home/you/DaliLite.v5/bin/dat2fasta.pl /data/DAT < pdb.list | awk -v RS=">" -v FS="\n" -v ORS="" ' { if ($2) print ">"$0 } ' > pdb.fasta # awk removes empty sequences
# create PDB-Blast database
makeblastdb -in pdb.fasta -out /home/you/pdb.blast -dbtype prot
# create PDB70 non-redundant subset of PDB
cd-hit -i pdb.fasta -c 0.7 -o pdb70.fasta
grep '^>' pdb70.fasta | perl -pe 's/^>//' > pdb70.list

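PDB70 is conveniently generated with cd-hit. PDB25 can be used instead of PDB70 without loss of performance. However, PDB25 requires an all-against-all sequence comparison with Blast.

If you have not done so yet, remember to import your private PDB structures: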

/home/you/DaliLite.v5/bin/import.pl --pdbfile mymodel.pdb --pdbid mine --dat /data/private/DAT --clean

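Hierarchical search

The hierarchical search performs a systematic comparison of the query structure against a non-redundant subset of the PDB. It then uses Blast to identify the sequence neighbours of the top-scoring hits and adds their structural alignments to the results.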

/home/you/DaliLite.v5/bin/dali.pl --hierarchical --repset pdb70.list --cd1 mineA --db pdb.list --dat1 /data/private/DAT --dat2 /data/DAT --np 40 --clean

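--np npara sets the number of parallel processes. The default is npara = 1, which runs the serial version of the software and does not require openmpi. The output is written to nxxxA.txt, where nxxxA is the query identifier. Targets with a Z-score above 2 are reported. Z-scores depend on size: a low threshold is needed to capture fold-level similarities of small domains, but for larger structures the output may contain thousands of hits.

Knowledge-based search

The knowledge-based search uses a fast approximate structure comparison method to find an entry point into a sparse network of precomputed structural similarities. It then iteratively "walks" to the nearest structures. The knowledge base is accessed remotely over the Internet, so you must have an active Internet connection.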

/home/you/DaliLite.v5/bin/dali.pl --walk --repset pdb70.list --cd1 mineA --db pdb.list --dat1 /data/private/DAT --dat2 /data/DAT --np 40 --H 100 --targetset pdb70.list --clean

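The output is written to nxxxA.txt, where nxxxA is the query identifier. The knowledge-based search adjusts the Z-score threshold of the output dynamically. It aims at complete coverage of the hits whose Z-score is above that of the H-th hit (--H 100) belonging to the target set (--targetset pdb70.list). The purpose is to limit the amount of output while still reaching interesting fold-level similarities. If the query structure contains several domains, it is advisable to search with each domain separately; otherwise the hits may concentrate on one domain and leave the other domains uncovered.

Note:

DaliLite writes many intermediate results to the current working directory (CWD) and locks it with a file named dali.lock. The lock file is removed automatically if the job completes successfully. If a file named dali.lock exists, you get the following error message and cannot start another DaliLite job in the same directory: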

Directory is locked by dali.lock
      there may be another DALI process running in this work directory
       or, the previous run crashed: remove the dali.lock file
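When DaliLite is driven from a script, it can be convenient to check for the lock file before launching a job instead of parsing the error message afterwards. A minimal Python sketch follows; the helper name `workdir_is_locked` is hypothetical, and only the file name dali.lock is taken from the message above.

```python
from pathlib import Path

def workdir_is_locked(workdir="."):
    """Return True if a dali.lock file is present in the work directory."""
    return (Path(workdir) / "dali.lock").exists()

if workdir_is_locked("."):
    print("dali.lock found: another job is running here or a previous run crashed")
else:
    print("work directory is free")
```

Remove a stale dali.lock only after making sure that no other DaliLite process is still running in that directory.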

4. Errors

5. Background and principles

6. Discussion

References

http://ekhidna2.biocenter.helsinki.fi/dali/
Liisa Holm; Laura M. Laakso (2016) Dali server update. Nucleic acids research 44 (W1), W351-W355.
http://ekhidna2.biocenter.helsinki.fi/dali/README.v5.html
https://pdfs.semanticscholar.org/9c21/b7300178db8b18e2e289db810284f1575c3b.pdf
