BED: chromStart is 0-based (the first base of a chromosome is numbered 0)
GFF3: the start coordinate is 1-based
https://cloud.tencent.com/developer/article/1078324
https://www.plob.org/article/3748.html
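A small sketch of what the coordinate difference means in practice. GFF3 uses a 1-based start with an inclusive end, while BED uses a 0-based start with an exclusive end, so converting between them only shifts the start by one. The function names are illustrative and not part of any tool.

```python
def gff3_to_bed(start, end):
    """Convert 1-based inclusive GFF3 coordinates to 0-based half-open BED."""
    return start - 1, end


def bed_to_gff3(chrom_start, chrom_end):
    """Convert 0-based half-open BED coordinates to 1-based inclusive GFF3."""
    return chrom_start + 1, chrom_end


# A feature covering the first 100 bases of a sequence
print(gff3_to_bed(1, 100))   # (0, 100)
print(bed_to_gff3(0, 100))   # (1, 100)
```

The feature length is `end - start` in BED but `end - start + 1` in GFF3, which is an easy way to check which convention a file follows.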
A.bed
chr1 69200 70000 TCGA-3C-AAAU-10A-01D-A41E-01 53225 0.0055
B.bed
chr1 69091 70008 OR4F5
bedtools intersect -a A.bed -b B.bed -wa -wb | bedtools groupby -i - -g 1-4 -c 10 -o collapse
chr1 69200 70000 TCGA-3C-AAAU-10A-01D-A41E-01 OR4F5
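To make the pipeline above easier to follow, here is a rough pure-Python sketch of what the two steps do to the A.bed and B.bed records. It is only an illustration, not how bedtools is implemented. The function names are made up, and unlike `bedtools groupby`, which expects input already sorted on the grouping columns, this sketch groups all rows with the same key regardless of order.

```python
def intersect_wa_wb(a_rows, b_rows):
    """Pair each A record with every overlapping B record (like -wa -wb)."""
    out = []
    for a in a_rows:
        for b in b_rows:
            same_chrom = a[0] == b[0]
            # Half-open BED intervals overlap when each starts before the other ends
            if same_chrom and int(a[1]) < int(b[2]) and int(b[1]) < int(a[2]):
                out.append(a + b)  # A columns first, then B columns
    return out


def groupby_collapse(rows, group_cols, value_col):
    """Group on 1-based column numbers and join value_col with commas."""
    groups = {}
    for row in rows:
        key = tuple(row[c - 1] for c in group_cols)
        groups.setdefault(key, []).append(row[value_col - 1])
    return [list(key) + [",".join(vals)] for key, vals in groups.items()]


a_bed = [["chr1", "69200", "70000", "TCGA-3C-AAAU-10A-01D-A41E-01", "53225", "0.0055"]]
b_bed = [["chr1", "69091", "70008", "OR4F5"]]

joined = intersect_wa_wb(a_bed, b_bed)       # 6 + 4 = 10 columns per row
annotated = groupby_collapse(joined, group_cols=[1, 2, 3, 4], value_col=10)
for row in annotated:
    print("\t".join(row))
```

Column 10 of the joined row is the fourth column of B.bed (the gene name), which is why the command uses `-c 10`.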
A file like this is needed
chr1 69091 70008 exon
bedtools intersect -a CpG.hyper.bed -b csi.bed -wa -wb | bedtools groupby -i - -g 1-4 -c 10 -o collapse > result.bed
Practice
- Use the Galaxy tool gff-to-bed to convert the GFF3 file to BED, then extract the first four columns
[qizhengyang@node1 bedtest]$ cat Galaxy3-\[GFF-to-BED_on_data_1\].bed | cut -f1,2,3,4 > csi.bed
- Process CpG_regions_myDiff25p.hyper.txt
[qizhengyang@node1 bedtest]$ cat CpG_regions_myDiff25p.hyper.txt | cut -f1,2,3
- Annotate with bedtools
[qizhengyang@node1 bedtest]$ bedtools intersect -a CpG.hyper.bed -b csi.bed -wa -wb | bedtools groupby -i - -g 1-3 -c 7 -o collapse > result1.bed
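The `-c` value changes between the commands in these notes because `-wa -wb` writes all columns of the A record followed by all columns of the B record. A tiny sketch of that arithmetic (the helper name `wb_column` is hypothetical):

```python
def wb_column(a_columns, b_field):
    """Column number, in `intersect -wa -wb` output, of field b_field of file B.

    a_columns: number of columns in file A
    b_field:   1-based field number within file B
    """
    return a_columns + b_field


# A.bed has 6 columns, the label is field 4 of B.bed
print(wb_column(6, 4))  # 10
# CpG.hyper.bed has 3 columns, the label is field 4 of csi.bed
print(wb_column(3, 4))  # 7
```

So with a three-column CpG.hyper.bed the grouping columns are 1-3 and the label to collapse is column 7.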
bedtools merge
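Case 3: with -d, two separate regions are merged into one when the distance between them is less than or equal to the given value; -o collapse shows which labels were merged
chr1 36001 36100
bedtools merge -i test.merge -d 5 -c 1 -o count,collapse
chr1 36001 36108 2 chr1,chr1
Only the first three columns are processed; the first line is not processed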
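A minimal Python sketch of the merging rule described above, assuming the input is already sorted by chromosome and start. Only one record of test.merge is shown in these notes, so the second record below is hypothetical and chosen only to reproduce the displayed result. The gap test (`start - previous_end <= max_gap`) is my reading of `-d`, not code taken from bedtools.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge sorted (chrom, start, end) records whose gap is <= max_gap.

    Returns (chrom, start, end, count, collapsed_column_1) tuples,
    mimicking `-c 1 -o count,collapse`.
    """
    merged = []
    for chrom, start, end in intervals:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            last[2] = max(last[2], end)   # extend the current region
            last[3].append(chrom)         # remember the column-1 value
        else:
            merged.append([chrom, start, end, [chrom]])
    return [(c, s, e, len(vals), ",".join(vals)) for c, s, e, vals in merged]


test_merge = [
    ("chr1", 36001, 36100),
    ("chr1", 36103, 36108),  # hypothetical second record, 3 bp after the first
]
print(merge_intervals(test_merge, max_gap=5))
# [('chr1', 36001, 36108, 2, 'chr1,chr1')]
```

With `max_gap=0` the same two records would stay separate, because they are 3 bp apart.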
CpG_regions_myDiff25p.hyper.txt
chr start end strand pvalue qvalue meth.diff
Galaxy3-[GFF-to-BED_on_data_1].bed
chr4 17943 22804 gene 0 +
Workflow:
bedtools merge -i CpG_regions_myDiff25p.hyper.txt -d 5 -c 1 -o count,collapse > CpG.merge.hyper.txt
# The merged file CpG.merge.hyper.txt can be used for the next step, with -c 9
bedtools intersect -a CpG.merge.hyper.txt -b Galaxy4-\[GFF-to-BED_on_data_3\].bed -wa -wb | bedtools groupby -i - -g 1-3 -c 9 -o collapse | less
dos2unix csi.promoter.txt
Then write a Python script to clean up the information
with open('DMR_annotation.txt','w') as a:
chr1 36001 36100 TE
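The cleanup script itself is not shown in full here, so the following is only one possible cleanup step and an assumption on my part: after `-o collapse`, the last column can contain the same label many times (for example `TE,TE,TE`), and this sketch reduces it to unique labels in their original order. The function name and the tab-separated layout are assumptions.

```python
def clean_annotation(line):
    """Collapse repeated labels in the last tab-separated column of a line."""
    fields = line.rstrip("\r\n").split("\t")   # also drops a stray Windows \r
    labels = fields[-1].split(",")
    unique = list(dict.fromkeys(labels))       # de-duplicate, keep first-seen order
    return "\t".join(fields[:-1] + [",".join(unique)])


print(clean_annotation("chr1\t36001\t36100\tTE,TE,TE\n"))
```

Each cleaned line could then be written to an output file such as DMR_annotation.txt.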