[5.4.3.1] DNA Chisel, a versatile sequence optimizer

January 02, 2023 rna

Web tool: https://cuba.genomefoundry.org/sculpt_a_sequence
Open-source Python library: https://github.com/Edinburgh-Genome-Foundry/DNAChisel
Documentation: https://edinburgh-genome-foundry.github.io/DnaChisel/

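Software solutions have been proposed for a variety of scenarios, including host-specific codon optimization or harmonization (Claassens et al., 2017; Richardson et al., 2012), enhancement of gene expression through CpG island enrichment (Raab et al., 2010), design of biologically neutral sequences (Casini et al., 2014), and removal of DNA patterns that impede synthesis (Oberortner et al., 2017). However, these projects focus on specific objectives and predetermined sequence locations (such as coding regions), and they are hard to integrate into a single workflow because their optimizations may undo one another. The D-Tailor framework (Guimaraes et al., 2014) proposed a programmatic solution that lets users freely define and combine specifications through Python scripts, with a focus on exploring multi-objective problems.

In DNA Chisel, an optimization problem is defined by a starting sequence, linear or circular, and a list of global or local specifications against which that sequence is optimized. A specification can be either a hard constraint, which must be satisfied in the final sequence, or an optimization objective, whose score must be maximized. For example, the specification AvoidChanges can be used as a constraint that forbids modifying the sequence in a given region, or as an objective that merely penalizes changes in that region. When there are several optimization objectives (each of which can be given a relative weight), DNA Chisel seeks to maximize the total weighted score using the heuristics described in the next section (an example of multi-objective gene optimization is given in Supplementary Section S1B).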
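The difference between a hard constraint and a weighted objective can be illustrated with a small pure-Python sketch. It does not use DNA Chisel itself: the `Objective` class, the helper functions, and the toy scoring rules are all made up for illustration, assuming only that constraints are pass/fail checks and that objective scores are combined as a weighted sum.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Objective:
    """A soft specification (hypothetical class): a score to maximize, with a weight."""
    score: Callable[[str], float]
    weight: float = 1.0


def all_constraints_pass(sequence: str, constraints: List[Callable[[str], bool]]) -> bool:
    # Hard constraints are pass/fail: every one of them must hold.
    return all(check(sequence) for check in constraints)


def total_weighted_score(sequence: str, objectives: List[Objective]) -> float:
    # Objectives are combined into one number through their relative weights.
    return sum(obj.weight * obj.score(sequence) for obj in objectives)


original = "ATGCATGC"


def changes_penalty(sequence: str) -> float:
    # Toy "avoid changes" objective: minus the number of modified positions.
    return -sum(a != b for a, b in zip(sequence, original))


def gc_count(sequence: str) -> float:
    # Toy objective: number of G or C nucleotides.
    return sequence.count("G") + sequence.count("C")


constraints = [lambda seq: "GGG" not in seq]  # toy hard constraint
objectives = [Objective(changes_penalty, weight=2.0), Objective(gc_count, weight=1.0)]

candidate = "ATGCGTGC"  # differs from `original` at one position
print(all_constraints_pass(candidate, constraints))
print(total_weighted_score(candidate, objectives))
```

A candidate that fails any constraint is rejected regardless of its score, whereas objectives only shift the weighted total up or down.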

1. Software overview

1.1 What the software can do

https://edinburgh-genome-foundry.github.io/DnaChisel/ref/builtin_specifications.html

1.2 How the software works

Figure S2: Restriction of the mutation space by different specification classes. In this example the problem consists of a 21-nucleotide sequence and four constraints. The mutation space, represented in red, consists of contiguous sub-segments, each associated with a set of sequence choices. For instance, the first nucleotide is unconstrained and can take any of the four possible values. The next nucleotides are constrained by @cds which enforces synonymous codons mutations, and these restrictions are combined with a @keep constraint which keeps the affected sequence segment in its original state.
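To make the caption more concrete, here is a minimal pure-Python sketch of how a mutation space could be represented as contiguous segments, each with its own list of choices. It is not DNA Chisel's implementation: `mutation_space` and `synonymous_codons` are hypothetical names, the `cds` and `keep` arguments only loosely mirror the @cds and @keep labels of the figure, and the sketch simplifies by freezing a whole codon whenever the kept region overlaps it.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(codon):
    """All codons that encode the same amino acid as `codon`, sorted."""
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def mutation_space(sequence, cds=None, keep=None):
    """Return a list of (start, end, choices) segments covering the sequence.

    cds:  (start, end) region whose translation must be preserved
    keep: (start, end) region that must stay identical to the original
    """
    if cds and (cds[1] - cds[0]) % 3:
        raise ValueError("the coding region length must be a multiple of 3")
    segments = []
    position = 0
    while position < len(sequence):
        if cds and cds[0] <= position < cds[1]:
            end = position + 3  # inside the CDS, codons mutate as a unit
            choices = synonymous_codons(sequence[position:end])
        else:
            end = position + 1  # elsewhere, each nucleotide is free
            choices = list("ACGT")
        if keep and position < keep[1] and end > keep[0]:
            choices = [sequence[position:end]]  # frozen: a single choice
        segments.append((position, end, choices))
        position = end
    return segments


segments = mutation_space("TATGAAGGCGATCTT", cds=(1, 10), keep=(4, 7))
for start, end, choices in segments:
    print(start, end, choices)
```

Multiplying the number of choices across segments gives the size of the search space that the solver would have to explore for this region.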

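Optimization algorithm

DNA Chisel's algorithm first makes sure that all constraints are satisfied, then optimizes the objectives subject to those constraints. The solver proceeds as follows:

* For each constraint:
    * Evaluate the constraint on the problem to find the locations of all violations.
    * For each violating location, from left to right: define a local problem and resolve the violation by local search, taking care not to break constraints that have already been validated locally.
* For each optimization objective:
    * Evaluate the objective on the problem to find the locations of all suboptimal regions.
    * For each suboptimal region, from left to right: define a local problem and use it to optimize the local region so that the overall objective score improves, while making sure all constraints remain satisfied.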
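The two-phase procedure above can be sketched on a toy problem with one constraint (a forbidden motif) and one objective (maximize the number of G and C nucleotides). This is only an illustration under simplifying assumptions, not DNA Chisel's solver: all function names are hypothetical, the local search only tries single-nucleotide substitutions, and the objective is improved greedily one position at a time.

```python
def violations(sequence, motif):
    """Start indices of every occurrence of the forbidden motif."""
    return [i for i in range(len(sequence) - len(motif) + 1)
            if sequence.startswith(motif, i)]


def fix_first_violation(sequence, motif, found):
    """Local search: try single-base changes inside the leftmost violation."""
    start = found[0]
    for position in range(start, start + len(motif)):
        for base in "ACGT":
            variant = sequence[:position] + base + sequence[position + 1:]
            # Accept only if a violation disappears and no new one is created.
            if set(violations(variant, motif)) < set(found):
                return variant
    raise ValueError("no single-base fix found in the local window")


def resolve_constraint(sequence, motif):
    """Phase 1: resolve violations one by one, from left to right."""
    found = violations(sequence, motif)
    while found:
        sequence = fix_first_violation(sequence, motif, found)
        found = violations(sequence, motif)
    return sequence


def gc_count(sequence):
    return sequence.count("G") + sequence.count("C")


def optimize_objective(sequence, motif):
    """Phase 2: improve the objective locally while keeping the constraint valid."""
    for position in range(len(sequence)):
        if sequence[position] in "GC":
            continue  # already optimal at this position
        for base in "GC":
            variant = sequence[:position] + base + sequence[position + 1:]
            if not violations(variant, motif):
                sequence = variant
                break
    return sequence


start = "ATGGTCTCAT"   # contains the BsaI recognition site GGTCTC
motif = "GGTCTC"
fixed = resolve_constraint(start, motif)
final = optimize_objective(fixed, motif)
print(fixed, final, gc_count(final))
```

The key property is the ordering: the objective phase never accepts a change that would reintroduce a constraint violation.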

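Although some other frameworks also use local optimization, DNA Chisel introduces new techniques (described in the next paragraphs) to simplify local problems and speed up their resolution.

Defining a local problem

In DNA Chisel, a local problem is a version of the problem in which only a small segment of the sequence (denoted [start, end] in this section) is allowed to mutate, either to resolve a specific constraint violation locally or to increase the fitness of the sequence with respect to a local objective.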

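Local sequence optimization

* Choose a search method to explore the mutation space.
* Use this method to find a sequence variant that satisfies all constraints (when resolving a constraint) or maximizes the objective score (when optimizing an objective).
* Replace the problem's sequence with the successful variant, then move on to the next location to optimize.

If the sequence admits only a small number of variants, the solver performs an exhaustive search. If that number exceeds a threshold (user-configurable, 10,000 by default), a guided random search is used instead. Finally, in the (uncommon) case where the specification being resolved (or optimized) implements its own resolution method, that custom method is used.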
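The choice between exhaustive and random search can be sketched as follows. This is a simplified illustration rather than DNA Chisel's code: the function and parameter names (`search_variants`, `exhaustive_limit`, `random_attempts`) are hypothetical, and the random branch samples uniformly, whereas the text above describes a guided random search.

```python
import itertools
import math
import random


def search_variants(choices_per_segment, is_valid,
                    exhaustive_limit=10_000, random_attempts=1_000, seed=0):
    """Return (variant, method) for the first valid variant found, or (None, method)."""
    n_variants = math.prod(len(choices) for choices in choices_per_segment)
    if n_variants <= exhaustive_limit:
        # Small space: enumerate every combination of choices.
        for combination in itertools.product(*choices_per_segment):
            variant = "".join(combination)
            if is_valid(variant):
                return variant, "exhaustive"
        return None, "exhaustive"
    # Large space: sample random variants (uniform sampling, a simplification).
    rng = random.Random(seed)
    for _ in range(random_attempts):
        variant = "".join(rng.choice(choices) for choices in choices_per_segment)
        if is_valid(variant):
            return variant, "random"
    return None, "random"


def no_gtgg(sequence):
    return "GTGG" not in sequence


small_space = [list("ACGT")] * 6   # 4**6 = 4,096 variants
large_space = [list("ACGT")] * 8   # 4**8 = 65,536 variants

small_variant, small_method = search_variants(small_space, no_gtgg)
large_variant, large_method = search_variants(large_space, no_gtgg)
print(small_method, large_method)
```

Exhaustive search guarantees that a solution is found if one exists, which is why it is preferable whenever the local mutation space is small enough.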

2. Running the software locally

Documentation: https://edinburgh-genome-foundry.github.io/DnaChisel/

2.1 Installation

pip install dnachisel     # <= minimal install without reports support
pip install 'dnachisel[reports]' # <= full install with all dependencies

2.2 Usage

Goal: generate a random sequence and optimize it so that:

* It will be rid of BsaI sites (on both strands).
* GC content will be between 30% and 70% on every 50bp window.
* The reading frame at position 500-1400 will be codon-optimized for E. coli.

Code:

from dnachisel import *

# DEFINE THE OPTIMIZATION PROBLEM

problem = DnaOptimizationProblem(
    sequence=random_dna_sequence(10000),
    constraints=[
        AvoidPattern("BsaI_site"),
        EnforceGCContent(mini=0.3, maxi=0.7, window=50),
        EnforceTranslation(location=(500, 1400))
    ],
    objectives=[CodonOptimize(species='e_coli', location=(500, 1400))]
)

# SOLVE THE CONSTRAINTS, OPTIMIZE WITH RESPECT TO THE OBJECTIVE

problem.resolve_constraints()
problem.optimize()

# PRINT SUMMARIES TO CHECK THAT CONSTRAINTS PASS

print(problem.constraints_text_summary())
print(problem.objectives_text_summary())

# GET THE FINAL SEQUENCE (AS STRING OR ANNOTATED BIOPYTHON RECORDS)

final_sequence = problem.sequence  # string
final_record = problem.to_record(with_sequence_edits=True)
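As an independent sanity check of the GC-content goal listed above, the final sequence can also be scanned window by window in plain Python. This sketch does not rely on DNA Chisel: `gc_windows_out_of_range` is a hypothetical helper, it assumes an uppercase linear sequence, and its default parameters simply mirror the values used in the example (30% to 70% over 50 bp windows).

```python
def gc_fraction(sequence):
    """Fraction of G and C nucleotides in a non-empty uppercase sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def gc_windows_out_of_range(sequence, mini=0.3, maxi=0.7, window=50):
    """Start indices of every sliding window whose GC fraction is outside [mini, maxi]."""
    return [
        start
        for start in range(len(sequence) - window + 1)
        if not mini <= gc_fraction(sequence[start:start + window]) <= maxi
    ]


balanced = "ACGT" * 25   # about 50% GC in every window
at_rich = "AT" * 50      # 0% GC everywhere

print(gc_windows_out_of_range(balanced))
print(len(gc_windows_out_of_range(at_rich)))
```

Calling `gc_windows_out_of_range(final_sequence)` on the optimized sequence should return an empty list if the constraint was resolved.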

Writing the results to a report (here starting from a GenBank record):

from dnachisel import DnaOptimizationProblem
problem = DnaOptimizationProblem.from_record("my_record.gb")
problem.optimize_with_report(target="report.zip")

Analyzing a problem defined in a script:

problem = DnaOptimizationProblem(...)
problem.optimize_with_report(target="report.zip")

3. My use case

3.1 Replacing a motif

from dnachisel import *
# from dnachisel.biotypes import Protein


one_seq = 'ATGAAGGCGATCATCGTCCTGCTCATGGTGGTGACGAGCAACGCGGATCGGATCTGCACCGGGATCACCTCCAGCAATTCACCTCACGTGGTG'

problem = DnaOptimizationProblem(
    sequence=one_seq,
    
    constraints=[
        AvoidPattern("GTGG",strand=0),
        EnforceTranslation()
    ],
    
)  # objectives=[CodonOptimize(species='h_sapiens')]  sequence_type=Protein,

# SOLVE THE CONSTRAINTS, OPTIMIZE WITH RESPECT TO THE OBJECTIVE


problem.resolve_constraints()
problem.optimize()


# PRINT SUMMARIES TO CHECK THAT CONSTRAINTS PASS

print(problem.constraints_text_summary())
print(problem.objectives_text_summary())

# GET THE FINAL SEQUENCE (AS STRING OR ANNOTATED BIOPYTHON RECORDS)

final_sequence = problem.sequence  # string
final_record = problem.to_record(with_sequence_edits=True)

print(final_sequence)

print(final_record)

EnforceTranslation is essential here: it guarantees that the amino-acid sequence encoded by the DNA stays unchanged.
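To double-check a result independently of the library, the two properties that matter in this use case (same protein, motif gone) can be verified with a few lines of plain Python. The helper names below are hypothetical, the "optimized" sequence is a hand-written example rather than actual DNA Chisel output, and only the forward strand is searched for the motif.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an uppercase DNA string, ignoring any incomplete trailing codon."""
    usable_length = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable_length, 3))


def check_optimized(original, optimized, motif):
    """Report whether the protein is unchanged and the motif is absent (forward strand)."""
    return {
        "same_protein": translate(original) == translate(optimized),
        "motif_absent": motif not in optimized,
    }


original = "ATGGTGGTGACG"    # encodes M V V T and contains GTGG
optimized = "ATGGTTGTCACG"   # hand-made synonymous variant without GTGG

print(translate(original), translate(optimized))
print(check_optimized(original, optimized, "GTGG"))
```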

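If the motifs are given as a list, the constraints can be built in a loop:

# motif_list: the list of motif strings to avoid (defined by the caller)
avoid_patterns = [EnforceTranslation()]
for one_mo in motif_list:
    avoid_patterns.append(AvoidPattern(one_mo, strand=0))

problem = DnaOptimizationProblem(
    sequence=one_seq,
    constraints=avoid_patterns,
)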

References

DNA Chisel, a versatile sequence optimizer. https://academic.oup.com/bioinformatics/article/36/16/4508/5869515?login=false
