NGS QC Toolkit

1. Installation

yum install gd-devel

cpan 
cpan[1]>    install String::Approx


wget -c https://src.fedoraproject.org/repo/pkgs/perl-GD/GD-2.46.tar.gz/ea86a94eb45330eae27ecbfd5c2f43bb/GD-2.46.tar.gz
tar zxvf GD-2.46.tar.gz
cd GD-2.46
perl Makefile.PL
make
make install


git clone https://github.com/mjain-lab/NGSQCToolkit.git

Note:

I did not do this installation inside a conda environment.

2. Usage

2.1 QC

IlluQC.pl

Tool for quality control of sequencing data generated using Illumina platform (FASTQ format). For detailed usage, run perl IlluQC.pl in a terminal.

It mainly removes adapters and low-quality bases, and reports statistics on the results.

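It produces the following results (with plots and explanations):

  1. Average base quality at each position
  2. Number of reads for each GC content value
  3. Number of reads for each quality value
  4. Base counts at each position
  5. Base counts at each position for the input reads and for the reads after QC
  6. A summary of quality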
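To make the first two statistics concrete, here is a minimal Python sketch of how per-position average quality and a per-read GC histogram could be computed. It is not how IlluQC.pl is implemented; the function names are hypothetical, and it assumes 4-line FASTQ records with Phred+33 quality encoding.

```python
from collections import Counter


def parse_fastq(lines):
    """Yield (sequence, quality_string) pairs from 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i + 1], lines[i + 3]


def mean_quality_per_position(records, offset=33):
    """Average Phred score at each read position (offset=33 is an assumption)."""
    totals, counts = [], []
    for _seq, qual in records:
        for pos, ch in enumerate(qual):
            if pos == len(totals):
                # First time we see a read this long: grow the accumulators.
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def gc_percent_histogram(records):
    """Count reads per rounded GC percentage."""
    hist = Counter()
    for seq, _qual in records:
        if seq:
            gc = sum(1 for base in seq.upper() if base in "GC")
            hist[round(100 * gc / len(seq))] += 1
    return hist
```

For real data you would pass an open file handle to parse_fastq and feed the resulting records to either function.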

IlluQC_PRLL.pl

This tool has the same functionality as IlluQC.pl. However, it provides an additional option to use multiple CPUs to speed up the analysis

454QC.pl

Tool for quality control of sequencing data generated using 454 platform (read and quality in FASTA format)

454QC_PRLL.pl

Tool performs same quality control analysis as 454QC.pl and helps to analyze data using multiple CPUs

454QC_PE.pl

Tool for quality control of paired-end sequencing data generated using 454 platform (read and quality in FASTA format)

2.2 Format-converter

These scripts convert FASTQ files produced by one sequencing platform into the FASTQ variant (or other format) used by another platform.

  • SangerFastqToIlluFastq.pl: To convert fastq-sanger variant to fastq-illumina variant of FASTQ format
  • SolexaFastqToIlluFastq.pl: To convert fastq-solexa variant to fastq-illumina variant of FASTQ format
  • FastqTo454.pl: To convert FASTQ format (any variant) to 454 format (two files in FASTA format: one for reads/sequences (.fna) and another for quality (.qual))
  • FastqToFasta.pl: To convert FASTQ format file to FASTA format file for reads/sequences
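The conversions above mostly come down to re-encoding the quality line or dropping it. As a rough illustration (not the toolkit's own code), the sketch below re-encodes a fastq-sanger quality string (Phred+33) as fastq-illumina (Phred+64) and turns FASTQ records into FASTA. The function names are hypothetical, and capping scores at 62 so they stay printable is my own assumption.

```python
def sanger_to_illumina_quality(qual):
    """Re-encode a Phred+33 quality string as Phred+64.

    Scores above 62 are capped (an assumption of this sketch) so the
    result stays within printable ASCII.
    """
    return "".join(chr(min(ord(ch) - 33, 62) + 64) for ch in qual)


def fastq_to_fasta(lines):
    """Convert 4-line FASTQ records to FASTA lines (header and sequence only)."""
    lines = [ln.rstrip("\n") for ln in lines]
    out = []
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        out.append(">" + header[1:])  # replace the leading '@' with '>'
        out.append(seq)
    return out
```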

2.3 Trimming

TrimmingReads.pl: Tool for trimming reads from 5’ and/or 3’ end of the read (FASTQ or FASTA format)

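You can set a quality threshold to remove low-quality bases, specify how many bases to cut from the left or right end, discard reads shorter than a given length, and so on.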
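The trimming options described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions (Phred+33 encoding, all options applied in one pass, hypothetical function and parameter names), not the actual logic of TrimmingReads.pl.

```python
def trim_read(seq, qual, left=0, right=0, qual_cutoff=None, min_len=0, offset=33):
    """Trim one read; return (seq, qual), or None if the read is discarded.

    left / right : number of bases to remove from the 5' / 3' end
    qual_cutoff  : drop trailing 3' bases whose Phred score is below this value
    min_len      : discard the read if it ends up shorter than this
    """
    end = max(len(seq) - right, 0)
    seq, qual = seq[left:end], qual[left:end]

    if qual_cutoff is not None:
        # Peel low-quality bases off the 3' end one at a time.
        while qual and ord(qual[-1]) - offset < qual_cutoff:
            seq, qual = seq[:-1], qual[:-1]

    if len(seq) < min_len:
        return None
    return seq, qual
```

For example, trim_read(seq, qual, left=15, min_len=25) removes 15 bases from the left end and drops the read if fewer than 25 bases remain.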

HomoPolymerTrimming.pl: Tool for trimming 3’ end of the reads from the first base of homopolymer of given length

AmbiguityFiltering.pl: Tool for filtering reads containing ambiguous bases or trimming flanking ambiguous bases
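For intuition, homopolymer trimming and flanking-N trimming can be sketched as below. These are my own minimal versions with hypothetical function names; the real scripts have more options and may handle edge cases differently.

```python
import re


def trim_at_homopolymer(seq, min_run):
    """Cut the read at the first base of the first homopolymer run of at
    least min_run bases, removing the run and everything 3' of it."""
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    pattern = re.compile(r"([ACGTN])\1{%d,}" % (min_run - 1), re.IGNORECASE)
    match = pattern.search(seq)
    return seq if match is None else seq[: match.start()]


def strip_flanking_n(seq, qual):
    """Remove ambiguous bases (N) from both ends, keeping qual in sync."""
    start = len(seq) - len(seq.lstrip("Nn"))
    end = len(seq.rstrip("Nn"))
    return seq[start:end], qual[start:end]
```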

2.4 Statistics

AvgQuality.pl: Tool to calculate average quality score for each read and overall quality score for the given FASTA quality file

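N50Stat.pl: Tool to generate statistics for read/sequence data given in FASTA format. This one is really handy: you can use it to summarize the contigs you get after clustering (assembly), and it reports their basic statistics.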
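If you have not met N50 before: sort the sequence lengths from longest to shortest, add them up, and N50 is the length at which the running sum first reaches half of the total. A minimal Python sketch follows (hypothetical function names; N50Stat.pl itself reports more fields than this):

```python
def fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new sequence
        elif line and lengths:
            lengths[-1] += len(line)   # sequences may span several lines
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the bases."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def contig_stats(lengths):
    """Basic summary: count, total bases, min, max, mean length, and N50."""
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }
```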

Download: http://www.nipgr.res.in/ngsqctoolkit.html

3. Example

Trim 15 bp from the left end of the reads, using Trimming/TrimmingReads.pl from NGSQCToolkit. The command is:

nohup perl /path/NGSQCToolkit/Trimming/TrimmingReads.pl -i Sample_gen_20160524_GTGAAA_L001_R1.fastq  -l 15 -n 25  &

More usage:

See https://www.jianshu.com/p/936fb7789e62/
