Rewrapping a FASTA file to a specified number of bases per line in the Linux shell
2022/2/3 7:12:27
This post shows how to use standard Linux shell tools (grep, sed, awk, paste) to rewrite a FASTA file so that each sequence line holds a specified number of bases.
1. Test data
root@PC1:/home/test# ls
record.txt  test.fa
root@PC1:/home/test# cat test.fa
>OR4F29_ENSG00000284733_ENST00000426406_20_955_995
AGCCCAGTTGGCTGGACCAATGGAT
GGAGAGAATCACTCAGTGGTATCTGAG
TTTTTGTTTCTGGGACTC
>OR4F16_ENSG00000284662_ENST00000332831_20_955_995
AGCCCAGTTGGCTGGACCAATGGATGGAG
AGAATCACTCAGTGGTATCTGAGTTTTTGTTTCTGGGACTCAC
>OR4F29_ENSG00000284733_ENST00000426406_20_955_995
AGCCCAGTTGGCTGGA
CCAATGGATGGAGAGAATCACTCAGTGGTATCTGAGTTTTT
GTTTCTGGGACTCACT
>OR4F16_ENSG00000284662_ENST00000332831_20_955_995
AGCCCAGTTGGCTGGACCAATGGATGGAGAGA
ATCACTCAGTGGTATCTGAGTTTTTGTTTCTGG
>OR4F29_ENSG00000284733_ENST00000426406_20_955_995
AGCCCAGTTGGCTGGA
CCAATGGATGGAGAGAATCACTCAGTGGTATCTGAGTTTTT
GTTTCTGGGACTCACT
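Knowing how many bases each record holds makes it easier to check a rewrapped file later. The following is a minimal sketch, not part of the original script: the function name fasta_lengths is hypothetical, and it assumes every record starts with a ">" header line.

```shell
#!/usr/bin/env bash
# fasta_lengths FILE: print "<header><TAB><number of bases>" for each record.
# Hypothetical helper name; it is not part of the original record.txt script.
fasta_lengths() {
    awk '
        # Emit the previous record (if any) once its sequence is complete.
        function report() { if (hdr != "") printf("%s\t%d\n", hdr, n) }
        /^>/ { report(); hdr = $0; n = 0; next }   # a header starts a new record
        { gsub(/[ \t\r]/, ""); n += length($0) }   # count bases, ignoring whitespace
        END  { report() }
    ' "$1"
}
```

Called as `fasta_lengths test.fa`, this should report 70, 72, 73, 65 and 73 bases for the five records above.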
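2. Script
root@PC1:/home/test# ls
record.txt  test.fa
root@PC1:/home/test# cat record.txt
#step1: for every record except the last, write the line numbers of its first and last sequence line
grep -n "^>" test.fa | cut -d ":" -f 1 | paste -d " " -s |
    awk '{for(i = 1; i < NF; i++) printf("%d %d ", $i+1,$(i+1)-1); printf("\n")}' |
    awk '{for(i = 1; i <= NF; i++) if(i % 2 == 0) {print $i} else {printf("%s ", $i)}}' > topindex.txt

#step2: write the same pair of line numbers for the last record
sed -n "/^>/=" test.fa | awk 'END{print $0 + 1}' | paste - -d " " <(sed -n "$=" test.fa ) > endindex.txt

#step3: join the sequence lines of each record, then split them again
# the 6 here sets the number of bases per line; it can be changed to another value
# note: END {print} prints the last input line again, so the last source line of each record appears twice in the output; END {printf("\n")} would only end the line
# note: printf("%s ", $i) writes a space after every base that does not end a line
cat topindex.txt endindex.txt | while read {i,j}; do
    awk -v a=$i -v b=$j "NR == a, NR == b" test.fa |
        awk '{printf("%s", $0)} END {print}' |
        awk -v c=$i -F "" '{printf("tag%d",c); for(i = 1; i <= NF; i++) if(i % 6 == 0) {print $i} else {printf("%s ", $i)}; printf("\n"); idx = idx + 1} ' >> tempresult
done

#step4: insert each header before the tagged first line of its record
grep "^>" test.fa | paste - -d " " <(grep "^tag" tempresult ) | while read {i,j}; do
    sed -i "/$j/i $i" tempresult
done

#step5: strip the tags and remove the temporary files
sed 's/^tag[0-9]*//g' tempresult > result.fa
rm endindex.txt topindex.txt tempresult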
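The same rewrapping can also be done in a single awk pass with no index or temporary files. The sketch below is an alternative, not the record.txt script itself: the function name rewrap_fasta is hypothetical, the width is assumed to be a positive integer, and unlike the script above it writes the bases without spaces between them.

```shell
#!/usr/bin/env bash
# rewrap_fasta WIDTH FILE: print FILE with every sequence wrapped to WIDTH bases.
# Hypothetical helper; assumes WIDTH is a positive integer.
# Example: rewrap_fasta 6 test.fa > result.fa
rewrap_fasta() {
    awk -v w="$1" '
        # Print the buffered sequence in chunks of w characters, then clear it.
        function emit(   i, n) {
            n = length(seq)
            for (i = 1; i <= n; i += w) print substr(seq, i, w)
            seq = ""
        }
        /^>/ { emit(); print; next }            # header: write the pending sequence first
        { gsub(/[ \t\r]/, ""); seq = seq $0 }   # sequence line: append it without whitespace
        END  { emit() }                         # write the sequence of the last record
    ' "$2"
}
```

Buffering one record at a time keeps the logic short. For very long sequences, such as whole chromosomes, a streaming variant that carries only the leftover bases from line to line would use less memory.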
3. Test
root@PC1:/home/test# ls
record.txt  test.fa
root@PC1:/home/test# bash record.txt
root@PC1:/home/test# ls
record.txt  result.fa  test.fa
root@PC1:/home/test# cat result.fa    ## view the result
>OR4F29_ENSG00000284733_ENST00000426406_20_955_995
A G C C C A
G T T G G C
T G G A C C
A A T G G A
T G G A G A
G A A T C A
C T C A G T
G G T A T C
T G A G T T
T T T G T T
T C T G G G
A C T C T T
T T T G T T
T C T G G G
A C T C
>OR4F16_ENSG00000284662_ENST00000332831_20_955_995
A G C C C A
G T T G G C
T G G A C C
A A T G G A
T G G A G A
G A A T C A
C T C A G T
G G T A T C
T G A G T T
T T T G T T
T C T G G G
A C T C A C
A G A A T C
A C T C A G
T G G T A T
C T G A G T
T T T T G T
T T C T G G
G A C T C A
C
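A rewrapped file should hold exactly the same bases as its input. The result shown above does not: the last source line of each record appears twice (for the first record, TTTTTGTTTCTGGGACTC is repeated). A check along the following lines would catch this. It is a minimal sketch with hypothetical helper names (flatten_fasta, same_sequences), and it ignores all whitespace so that space-separated bases still compare equal.

```shell
#!/usr/bin/env bash
# flatten_fasta FILE: print each header followed by its whole sequence on one
# line, with all whitespace removed. Hypothetical helper for checking results.
flatten_fasta() {
    awk '
        /^>/ { if (NR > 1) printf("\n"); print; next }   # end the previous sequence line
        { gsub(/[ \t\r]/, ""); printf("%s", $0) }        # append bases without whitespace
        END  { if (NR > 0) printf("\n") }
    ' "$1"
}

# same_sequences A B: succeed only when both files hold the same headers and bases.
same_sequences() {
    [ "$(flatten_fasta "$1")" = "$(flatten_fasta "$2")" ]
}
```

For example, `same_sequences test.fa result.fa || echo "sequence content changed"` prints a warning when the two files differ in anything other than line wrapping.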
4. 20 bases per line
root@PC1:/home/test# ls
record.txt  test.fa
root@PC1:/home/test# cat record.txt
#step1
grep -n "^>" test.fa | cut -d ":" -f 1 | paste -d " " -s |
    awk '{for(i = 1; i < NF; i++) printf("%d %d ", $i+1,$(i+1)-1); printf("\n")}' |
    awk '{for(i = 1; i <= NF; i++) if(i % 2 == 0) {print $i} else {printf("%s ", $i)}}' > topindex.txt

#step2
sed -n "/^>/=" test.fa | awk 'END{print $0 + 1}' | paste - -d " " <(sed -n "$=" test.fa ) > endindex.txt

#step3: changed to 20 here
# note: END {print} still prints the last input line of each record a second time
cat topindex.txt endindex.txt | while read {i,j}; do
    awk -v a=$i -v b=$j "NR == a, NR == b" test.fa |
        awk '{printf("%s", $0)} END {print}' |
        awk -v c=$i -F "" '{printf("tag%d",c); for(i = 1; i <= NF; i++) if(i % 20 == 0) {print $i} else {printf("%s ", $i)}; printf("\n"); idx = idx + 1} ' >> tempresult
done

#step4
grep "^>" test.fa | paste - -d " " <(grep "^tag" tempresult ) | while read {i,j}; do
    sed -i "/$j/i $i" tempresult
done

#step5
sed 's/^tag[0-9]*//g' tempresult > result.fa
rm endindex.txt topindex.txt tempresult
root@PC1:/home/test# bash record.txt
root@PC1:/home/test# ls
record.txt  result.fa  test.fa
root@PC1:/home/test# cat result.fa    ## view the result
>OR4F29_ENSG00000284733_ENST00000426406_20_955_995
A G C C C A G T T G G C T G G A C C A A
T G G A T G G A G A G A A T C A C T C A
G T G G T A T C T G A G T T T T T G T T
T C T G G G A C T C T T T T T G T T T C
T G G G A C T C
>OR4F16_ENSG00000284662_ENST00000332831_20_955_995
A G C C C A G T T G G C T G G A C C A A
T G G A T G G A G A G A A T C A C T C A
G T G G T A T C T G A G T T T T T G T T
T C T G G G A C T C A C A G A A T C A C
T C A G T G G T A T C T G A G T T T T T
G T T T C T G G G A C T C A C
>OR4F29_ENSG00000284733_ENST00000426406_20_955_995
A G C C C A G T T G G C T G G A C C A A
T G G A T G G A G A G A A T C A C T C A
G T G G T A T C T G A G T T T T T G T T
T C T G G G A C T C A C T G T T T C T G
G G A C T C A C T
>OR4F16_ENSG00000284662_ENST00000332831_20_955_995
A G C C C A G T T G G C T G G A C C A A
T G G A T G G A G A G A A T C A C T C A
G T G G T A T C T G A G T T T T T G T T
T C T G G A T C A C T C A G T G G T A T
C T G A G T T T T T G T T T C T G G
>OR4F29_ENSG00000284733_ENST00000426406_20_955_995
A G C C C A G T T G G C T G G A C C A A
T G G A T G G A G A G A A T C A C T C A
G T G G T A T C T G A G T T T T T G T T
T C T G G G A C T C A C T G T T T C T G
G G A C T C A C T
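After changing the width, it is worth confirming that every sequence line really holds the requested number of bases. The sketch below is one way to do that, under stated assumptions: check_width is a hypothetical name, whitespace and blank lines are ignored, and only the last line of a record may be shorter than the width.

```shell
#!/usr/bin/env bash
# check_width WIDTH FILE: succeed when every sequence line in FILE holds exactly
# WIDTH bases, except the last line of each record, which may hold 1..WIDTH.
# Hypothetical helper; whitespace inside lines and blank lines are ignored.
check_width() {
    awk -v w="$1" '
        # A line is only known to be the last of its record when the next header
        # (or the end of the file) arrives, so each length is checked one line late.
        function close_record() {
            if (have && (prev < 1 || prev > w)) bad = 1
            have = 0
        }
        /^>/ { close_record(); next }
        {
            line = $0; gsub(/[ \t\r]/, "", line)
            if (line == "") next                  # ignore blank lines
            if (have && prev != w) bad = 1        # a non-final line must be full
            prev = length(line); have = 1
        }
        END { close_record(); exit bad + 0 }
    ' "$2"
}
```

For example, `check_width 20 result.fa && echo "wrapped at 20"` prints the message only when the wrapping is consistent. The check says nothing about whether the bases themselves are unchanged.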