[4.6.1] Predicting microRNA binding sites: miRanda

1. Installation

# Original download link (left commented out; the mirror below is used instead)
#wget -c http://cbio.mskcc.org/microrna_data/miRanda-aug2010.tar.gz
wget -c http://ftp.genek.cn:8888/Share/linux_software/miRanda-aug2010.tar.gz
tar -xvf miRanda-aug2010.tar.gz
cd miRanda-3.3a/
./configure --prefix=/data4/software/microRNA/miRanda-3.3a-install
make && make install
cd /data4/software/microRNA/miRanda-3.3a-install/bin
./miranda --help
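
Later commands in this post call `miranda` without its full path, which only works if the install directory is on PATH. A minimal sketch of that setup, assuming the install prefix from the ./configure step above; the variable name MIRANDA_HOME is hypothetical and chosen only for illustration:

```shell
# Install prefix used in the ./configure step above
# (MIRANDA_HOME is a hypothetical variable name, not something miRanda reads)
MIRANDA_HOME=/data4/software/microRNA/miRanda-3.3a-install

# Prepend its bin directory to PATH for the current shell session
export PATH="$MIRANDA_HOME/bin:$PATH"

# Print where the binary resolves from, or warn if it cannot be found
command -v miranda || echo "miranda not found under $MIRANDA_HOME/bin" >&2
```

To make this permanent, the export line could be appended to the shell startup file (for example ~/.bashrc).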

2. Testing

cd /data4/software/microRNA/miRanda-3.3a/examples

[sam@c01 examples]$ ll
total 16
-rw-r--r-- 1 sam sam   95 Mar 27  2010 bantam_stRNA.fasta
-rw-r--r-- 1 sam sam 4086 Mar 27  2010 hid_UTR.fasta

[sam@c01 examples]$ head bantam_stRNA.fasta
>gi|29565487|emb|AJ550546.1| Drosophila melanogaster microRNA miR-bantam
GTGAGATCATTTTGAAAGCTG

[sam@c01 examples]$ head hid_UTR.fasta
>gi|945100|gb|U31226.1|DMU31226 Drosophila melanogaster head involution defective protein (hid) mRNA, complete cds (3'UTR only)
TGACAAAAAATAAAAAACGAAATCCATCGTGAACAGTTTTGTGTTTTTAAATCAGTTCTAAACACGAAAA
GGGTTGATGAAAAACGCAGAAGAATCCGAAAAACTAACTAACCGAGCAAAAACTTGACTTGAGTGTTGTT
TGACAAATCAGGAAAGATAAAAAACAAATCATAAGAAAAAACTGCACGAAAAATGAAAAAGTTTCTAATA
TTCAAAATCTTGCACAAGAAATACAAAATCAATTAAAGTGAACTCTAACCAAAAGTTGTACACAAAATAA
AAAGCAAAACAAAGCAGCGAAGAACAATCACAAGAAGAGCAAAGTGCCAACAAAGTGCAGGAAGGAAGGA
AGCGGATAAGGACAAAAAGGAAGCCAGCACACACACACACACCCACACAATGGCCGTGCCCTTTTATTTG
CCCGAGGGCGGCGCCGATGACGTAGCGTCGAGTTCATCGGGAGCCTCGGGCAACTCCTCCCCCCACAACC
ACCCACTTCCCTCGAGCGCATCCTCGTCCGTCTCCTCCTCGGGCGTGTCCTCGGCCTCCGCCTCCTCGGC
CTCATCTTCGTCATCCGCATCGTCGGACGGCGCCAGCAGCGCCGCCTCGCAATCGCCGAACACCACCACC

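# Run with the default thresholds; the report is printed to stdout
/data4/software/microRNA/miRanda-3.3a-install/bin/miranda bantam_stRNA.fasta hid_UTR.fasta

# Same run with explicit thresholds, saved to a file
# (-sc: minimum alignment score; -en: maximum free energy in kcal/mol)
/data4/software/microRNA/miRanda-3.3a-install/bin/miranda bantam_stRNA.fasta hid_UTR.fasta -sc 140 -en -1 > test.out.txt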
[sam@c01 examples]$ cat test.out.txt | grep ">"
>gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  167.00  -24.54  2 20    3340 3360      18       83.33%  94.44%
>gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  156.00  -20.03  2 17    2505 2525      15       86.67%  93.33%
>gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  155.00  -14.57  2 16    2852 2872      14       78.57%  85.71%
>gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  152.00  -14.18  2 18    3820 3841      17       76.47%  76.47%
>>gi|29565487|emb|AJ550546.1|   gi|945100|gb|U31226.1|DMU31226  630.00  -73.32  167.00  -24.54  1      21       3902     3340 2505 2852 3820

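Interpreting the results:

1) Each line beginning with ">" describes one site at which the miRNA (gi|29565487|emb|AJ550546.1|) is predicted to bind the gene (gi|945100|gb|U31226.1|DMU31226).

2) The fields on a ">" line are, in order: miRNA id, gene id, score, free energy, miRNA start, miRNA end, gene start, gene end, alignment length, and two percentages. Both percentages are relative to the alignment length rather than the full miRNA length (for the first hit, 15/18 = 83.33% and 17/18 = 94.44%). They appear to correspond to the identity and similarity of the alignment, so the second is always greater than or equal to the first.

3) The line beginning with ">>" is the summary. Its fields are, in order: miRNA id, gene id, total score, total energy, max score, max energy, strand, miRNA length, gene length, and the positions of the sites on the gene (one or more; four here).

4) In short, grepping the ">>" lines is enough to list the targeted genes. Any other information can be pulled from the ">" lines as needed.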
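
The field layout described above can be sketched with awk. This is only an illustration under the assumption that the output looks like the test.out.txt lines shown earlier; the file name hits_demo.txt is hypothetical, and the sample lines are copied from that run.

```shell
# Two per-site lines and the summary line, copied from test.out.txt above
cat > hits_demo.txt <<'EOF'
>gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  167.00  -24.54  2 20    3340 3360      18       83.33%  94.44%
>gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  156.00  -20.03  2 17    2505 2525      15       86.67%  93.33%
>>gi|29565487|emb|AJ550546.1|   gi|945100|gb|U31226.1|DMU31226  630.00  -73.32  167.00  -24.54  1      21       3902     3340 2505 2852 3820
EOF

# Per-site lines start with exactly one ">";
# print gene id, gene start, gene end, score, free energy as tab-separated columns
sites=$(awk -v OFS='\t' '/^>[^>]/ { print $2, $7, $8, $3, $4 }' hits_demo.txt)
echo "$sites"

# The summary line starts with ">>"; fields 10 onward are the site positions,
# so the number of sites is the field count minus the 9 fixed fields
n_sites=$(awk '/^>>/ { print NF - 9 }' hits_demo.txt)
echo "sites listed in summary: $n_sites"
```

Matching on `^>[^>]` matters because a plain `grep ">"` also returns the ">>" summary line, as the listing above shows.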

3. Custom analysis

3.1 Download the mature sequences from miRBase and extract the human (and mouse) miRNAs as an example

cd /data4/software/microRNA
wget https://www.mirbase.org/download/mature.fa

Browse the microRNAs of the species of interest:

https://www.mirbase.org/browse/results/?organism=mmu
https://www.mirbase.org/browse/results/?organism=hsa

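# Keep each matching header plus the sequence line that follows it
grep -A 1 'Homo sapiens' mature.fa > hsa-mir.fa
grep -A 1 'Mus musculus' mature.fa > mmu-mir.fa
cat mmu-mir.fa hsa-mir.fa > hsa-mmu-mir.fa
# Drop the "--" group separators that grep -A inserts between non-adjacent matches
grep -v '^--$' hsa-mmu-mir.fa > hsa-mmu-mir-2.fa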
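
The grep -A 1 approach above assumes every record has exactly one sequence line and needs a second pass to remove the "--" separators. A possible single-pass alternative with awk is sketched below. The input file mature_demo.fa and its records are invented toy data that only mimic the header layout; they are not real miRBase entries.

```shell
# Toy stand-in for mature.fa; these records are invented for illustration only
cat > mature_demo.fa <<'EOF'
>hsa-toy-1 TOY0000001 Homo sapiens toy-1
UGAGGUAGUAGGUUGUAUAGUU
>cel-toy-1 TOY0000002 Caenorhabditis elegans toy-1
UCCCUGAGACCUCAAGUGUGA
>mmu-toy-1 TOY0000003 Mus musculus toy-1
UGAGGUAGUAGGUUGUAUGGUU
EOF

# On each header line, decide whether the record belongs to a wanted species;
# the bare "keep" pattern then prints that header and every line up to the next header
awk '/^>/ { keep = ($0 ~ /Homo sapiens|Mus musculus/) } keep' mature_demo.fa > hsa-mmu-demo.fa

cat hsa-mmu-demo.fa
```

Because the decision is made per header, this also works for FASTA files whose sequences wrap over several lines, and no separator lines are produced.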


/data4/software/microRNA/miRanda-3.3a-install/bin/miranda /data4/software/microRNA/hsa-mmu-mir-2.fa input/154-no5UTR.fa -sc 140 -en -1 | grep ">>" > result_1.txt


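Example of the ">>" summary lines such a run produces (shown here from a file named hsa_mature10_targets.txt):

$ less -SN hsa_mature10_targets.txt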
      1 >>hsa-let-7a-5p ENST00000000412.7       146.00  -11.97  146.00  -11.97  2       22      2756     960
      2 >>hsa-let-7a-5p ENST00000001008.5       143.00  -17.75  143.00  -17.75  4       22      3732     2611
      3 >>hsa-let-7a-5p ENST00000002125.8       156.00  -21.39  156.00  -21.39  6       22      2176     1248
      4 >>hsa-let-7a-5p ENST00000002829.7       146.00  -16.38  146.00  -16.38  10      22      3802     3480
      5 >>hsa-let-7a-5p ENST00000003084.10      299.00  -39.65  152.00  -20.76  11      22      6132     3077 5087
      6 >>hsa-let-7a-5p ENST00000003100.12      143.00  -18.40  143.00  -18.40  12      22      3210     2586
      7 >>hsa-let-7a-5p ENST00000003302.8       438.00  -47.15  152.00  -16.09  13      22      4669     1132 2011 4340
      8 >>hsa-let-7a-5p ENST00000004531.14      163.00  -20.40  163.00  -20.40  17      22      7560     1101
      9 >>hsa-let-7a-3p ENST00000002125.8       161.00  -12.52  161.00  -12.52  23      21      2176     1476
     10 >>hsa-let-7a-3p ENST00000002165.10      158.00  -14.96  158.00  -14.96  24      21      2356     1993
     11 >>hsa-let-7a-3p ENST00000003084.10      148.00  -6.47   148.00  -6.47   28      21      6132     5758
     12 >>hsa-let-7a-3p ENST00000003302.8       149.00  -16.95  149.00  -16.95  30      21      4669     3959
     13 >>hsa-let-7a-3p ENST00000003912.7       146.00  -7.39   146.00  -7.39   32      21      5481     3640
     14 >>hsa-let-7a-3p ENST00000004531.14      292.00  -16.76  146.00  -13.42  34      21      7560     3954 6657

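Extracting the relevant results

# Collect the ">>" summary lines from all *out files, sort by Max Score (column 5, descending),
# keep the first 49,999 lines, strip the ">>" prefix and prepend a tab-separated header.
# The <(...) process substitution requires bash.
cat *out | grep ">>" | sort -k5,5nr | awk 'NR<50000' | sed 's/^>>//' | cat <(echo "Seq1,Seq2,Tot Score,Tot Energy,Max Score,Max Energy,Strand,Len1,Len2,Positions" | tr "," "\t") - > MirandaOutput.tab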
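
Once the ">>" summary lines are collected, a per-miRNA overview is often the next step. The sketch below counts target transcripts and predicted sites for each miRNA. It assumes the summary layout shown in the listing above; summary_demo.txt is a hypothetical file name holding four lines copied from that listing.

```shell
# Summary lines copied from the listing above (less -N line numbers removed)
cat > summary_demo.txt <<'EOF'
>>hsa-let-7a-5p ENST00000000412.7       146.00  -11.97  146.00  -11.97  2       22      2756     960
>>hsa-let-7a-5p ENST00000003084.10      299.00  -39.65  152.00  -20.76  11      22      6132     3077 5087
>>hsa-let-7a-3p ENST00000002125.8       161.00  -12.52  161.00  -12.52  23      21      2176     1476
>>hsa-let-7a-3p ENST00000004531.14      292.00  -16.76  146.00  -13.42  34      21      7560     3954 6657
EOF

# For each miRNA, count target transcripts and sum the predicted sites
# (fields 10..NF of a ">>" line are site positions, so sites = NF - 9)
counts=$(awk '/^>>/ {
    mirna = substr($1, 3)          # drop the leading ">>"
    targets[mirna]++
    sites[mirna] += NF - 9
}
END {
    for (m in targets) printf "%s\t%d\t%d\n", m, targets[m], sites[m]
}' summary_demo.txt | sort)

printf 'miRNA\ttargets\tsites\n'
echo "$counts"
```

The same awk program can be pointed at a real results file instead of the demo file.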

4. Discussion

An online tool is also available as an alternative.
