task6b - Oh, stop dreaming - Paired plot of TP53 expression in TCGA liver cancer patients with matched tumor and normal transcriptome samples
2021/10/16 6:17:18
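Assignment link
0. Assignment
- Download the expression matrix (count values) for a cancer of interest, for example liver cancer, from the UCSC Xena browser
- Use the sample names to pick out the few dozen patients who have both tumor and normal control data (some tumor samples have no matched control)
- Extract the expression of a gene of interest (for example TP53)
- Finally, apply the plotting code above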
1. Data download
Download site
Find LIHC in the list of cohorts,
then click through and download the file.
2. Data extraction and basic statistics
Extract the TP53 expression data
# The Ensembl ID of TP53 is ENSG00000141510
# Keep the header line (Ensembl_ID) and the TP53 row
zcat TCGA-LIHC.htseq_counts.tsv.gz | grep -E 'Ensembl_ID|ENSG00000141510' > TP53_tcga_expression.txt
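library(dplyr)
# Read the single-row table; check.names = FALSE keeps the hyphens in the sample barcodes
tp53_tcga = read.table('TP53_tcga_expression.txt', header = TRUE, check.names = FALSE)
rownames(tp53_tcga) = 'TP53'
# Drop the first column (the Ensembl ID), leaving only sample columns
tp53_tcga = tp53_tcga[,-1]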
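The same extraction can also be done entirely in R, without `zcat` and `grep`. The following is a minimal sketch on a toy gzip-compressed file written to a temporary path; the version suffixes on the Ensembl IDs, the second gene row, and the variable names are assumptions made for illustration, not the contents of the real download.

```r
# Write a tiny gzip-compressed table that mimics the layout of the Xena file
# (toy data: the version suffixes and the second gene row are made up)
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, "w")
writeLines(c("Ensembl_ID\tTCGA-BC-A10Q-01A\tTCGA-BC-A10Q-11A",
             "ENSG00000141510.15\t11.514714\t9.843921",
             "ENSG00000000003.13\t10.1\t10.3"), con)
close(con)

# Read the whole matrix, using the Ensembl ID column as row names
counts <- read.table(gzfile(tmp), header = TRUE, sep = "\t",
                     check.names = FALSE, row.names = 1)

# Match on the ID prefix so that any version suffix is ignored
tp53 <- counts[grepl("^ENSG00000141510", rownames(counts)), , drop = FALSE]
rownames(tp53) <- "TP53"
print(tp53)
```

For the real file this reads the full matrix into memory, which is slower than the `grep` approach but keeps the whole workflow in one language.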
Count the normal and tumor samples
> table(colnames(tp53_tcga) %>% sub('TCGA-\\w+-\\w+-','',.) )

01A 01B 02A 02B 11A
369   2   2   1  50
# tumor samples: 369+2+2+1 = 374
# normal samples: 50
# 424 samples in total
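Sample-type codes 01-09 denote tumor samples and 10-19 denote normal samples (20-29 are control samples).
Only 01A and 11A, the two most common sample types, are kept here.
01 stands for Primary Solid Tumor and 11 for Solid Tissue Normal; for details see https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/sample-type-codes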
# Keep only the 01A and 11A samples; the pattern is anchored to the end of the barcode
keep <- grepl('-[01]1A$', colnames(tp53_tcga))
tp53_tcga <- tp53_tcga[, keep]
# Count the samples that were kept
> table(colnames(tp53_tcga) %>% sub('TCGA-\\w+-\\w+-','',.) )

01A 11A
369  50
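The sample-type rule described above can be written as a small helper that labels each barcode by its two-digit code. This is only a sketch: `classify_sample` is a hypothetical name, and it assumes barcodes of the four-field form used in this post (`TCGA-XX-XXXX-NNV`, where `NN` is the sample type and `V` the vial letter).

```r
# Hypothetical helper: label TCGA barcodes by their two-digit sample-type code.
# Assumes four-field barcodes such as TCGA-BC-A10Q-01A.
classify_sample <- function(barcodes) {
  # Capture the two digits in the fourth field and convert them to an integer
  code <- as.integer(sub("^TCGA-\\w+-\\w+-([0-9]{2})[A-Z]$", "\\1", barcodes))
  ifelse(code <= 9, "tumor",
         ifelse(code <= 19, "normal", "control"))
}

# Barcodes that appear in the sample names of this post
barcodes <- c("TCGA-BC-A10Q-01A", "TCGA-BC-A10Q-11A", "TCGA-DD-A11A-01A")
sample_class <- classify_sample(barcodes)
print(data.frame(barcode = barcodes, class = sample_class))
```

Classifying on the numeric code rather than on the literal strings `01A` and `11A` makes it easy to see which samples (for example 01B or 02A) are dropped by the filter.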
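Extract the IDs of patients who have both a normal and a tumor sample
As shown below, the shared A10Q means that both samples come from the same donor.
01A marks a sample taken from the tumor site, and 11A marks a sample taken from normal tissue.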
TCGA-BC-A10Q-01A 11.514714054138487
TCGA-BC-A10Q-11A 9.843921051289035
Extract the IDs of patients who have such a pair, 50 in total
> names(which((colnames(tp53_tcga) %>% sub('-[01]1A','',.) %>% table(.))==2))
 [1] "TCGA-BC-A10Q" "TCGA-BC-A10R" "TCGA-BC-A10T" "TCGA-BC-A10U" "TCGA-BC-A10W" "TCGA-BC-A10X" "TCGA-BC-A10Y" "TCGA-BC-A10Z" "TCGA-BC-A110"
[10] "TCGA-BC-A216" "TCGA-BD-A2L6" "TCGA-BD-A3EP" "TCGA-DD-A113" "TCGA-DD-A114" "TCGA-DD-A116" "TCGA-DD-A118" "TCGA-DD-A119" "TCGA-DD-A11A"
[19] "TCGA-DD-A11B" "TCGA-DD-A11C" "TCGA-DD-A11D" "TCGA-DD-A1EB" "TCGA-DD-A1EC" "TCGA-DD-A1EE" "TCGA-DD-A1EG" "TCGA-DD-A1EH" "TCGA-DD-A1EI"
[28] "TCGA-DD-A1EJ" "TCGA-DD-A1EL" "TCGA-DD-A39V" "TCGA-DD-A39W" "TCGA-DD-A39X" "TCGA-DD-A39Z" "TCGA-DD-A3A1" "TCGA-DD-A3A2" "TCGA-DD-A3A3"
[37] "TCGA-DD-A3A4" "TCGA-DD-A3A5" "TCGA-DD-A3A6" "TCGA-DD-A3A8" "TCGA-EP-A12J" "TCGA-EP-A26S" "TCGA-EP-A3RK" "TCGA-ES-A2HT" "TCGA-FV-A23B"
[46] "TCGA-FV-A2QR" "TCGA-FV-A3I0" "TCGA-FV-A3I1" "TCGA-FV-A3R2" "TCGA-G3-A3CH"
Use the patient IDs extracted above to pull out the corresponding tumor and normal expression values
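# Patient IDs that occur exactly twice, i.e. once as 01A and once as 11A
tumor_and_normal = names(which((colnames(tp53_tcga) %>% sub('-[01]1A','',.) %>% table(.))==2))
# Select both tables in the same patient order so that the columns stay paired
normal = tp53_tcga[,paste0(tumor_and_normal,"-11A")]
tumor = tp53_tcga[,paste0(tumor_and_normal,"-01A")]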
> normal
     TCGA-BC-A10Q-11A TCGA-BC-A10R-11A TCGA-BC-A10T-11A TCGA-BC-A10U-11A TCGA-BC-A10W-11A TCGA-BC-A10X-11A TCGA-BC-A10Y-11A
TP53         9.843921         10.06474         10.13955         9.896332          9.79279         10.36304         10.56986
     TCGA-BC-A10Z-11A TCGA-BC-A110-11A TCGA-BC-A216-11A TCGA-BD-A2L6-11A TCGA-BD-A3EP-11A TCGA-DD-A113-11A TCGA-DD-A114-11A
TP53         10.71167         9.810572         9.575539         10.80574         9.930737         9.623881         10.87498
     TCGA-DD-A116-11A TCGA-DD-A118-11A TCGA-DD-A119-11A TCGA-DD-A11A-11A TCGA-DD-A11B-11A TCGA-DD-A11C-11A TCGA-DD-A11D-11A
TP53         9.847057         8.839204         10.01262         10.59246         9.259743         8.668885         9.971544
     TCGA-DD-A1EB-11A TCGA-DD-A1EC-11A TCGA-DD-A1EE-11A TCGA-DD-A1EG-11A TCGA-DD-A1EH-11A TCGA-DD-A1EI-11A TCGA-DD-A1EJ-11A
TP53         10.42731         10.11894         9.609179         10.03067         10.28309         10.29806         10.68212
     TCGA-DD-A1EL-11A TCGA-DD-A39V-11A TCGA-DD-A39W-11A TCGA-DD-A39X-11A TCGA-DD-A39Z-11A TCGA-DD-A3A1-11A TCGA-DD-A3A2-11A
TP53         10.13699         9.544964         10.26796         10.41574         9.854868         8.897845         10.16993
     TCGA-DD-A3A3-11A TCGA-DD-A3A4-11A TCGA-DD-A3A5-11A TCGA-DD-A3A6-11A TCGA-DD-A3A8-11A TCGA-EP-A12J-11A TCGA-EP-A26S-11A
TP53         9.346514         11.37829         9.529431         9.278449         9.544964         9.764872         9.949827
     TCGA-EP-A3RK-11A TCGA-ES-A2HT-11A TCGA-FV-A23B-11A TCGA-FV-A2QR-11A TCGA-FV-A3I0-11A TCGA-FV-A3I1-11A TCGA-FV-A3R2-11A
TP53          10.0348         10.89709         10.39553         10.47675         9.642052         10.02375          10.0348
     TCGA-G3-A3CH-11A
TP53         9.368506
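Because the paired plot matches values by position, it is worth checking that column i of the tumor table and column i of the normal table belong to the same patient. The sketch below does the pairing in base R on a toy one-row table; the first two values come from this post, while the `TCGA-XX-0001` patient and its value are invented to stand in for a tumor-only case. It uses `intersect()` instead of counting with `table()`, which is a substitute for the approach above rather than a copy of it.

```r
# Toy one-row expression table (the TCGA-XX-0001 column is hypothetical)
expr <- data.frame(
  "TCGA-BC-A10Q-01A" = 11.514714,
  "TCGA-BC-A10Q-11A" = 9.843921,
  "TCGA-XX-0001-01A" = 10.2,
  check.names = FALSE, row.names = "TP53"
)

# Patient ID = barcode without the trailing sample field
patient <- sub("-[01]1A$", "", colnames(expr))

# Patients that have a 01A column and patients that have an 11A column
has_tumor  <- patient[grepl("-01A$", colnames(expr))]
has_normal <- patient[grepl("-11A$", colnames(expr))]
paired <- intersect(has_tumor, has_normal)

# Build both tables from the same ID vector so the column order matches
tumor  <- expr[, paste0(paired, "-01A"), drop = FALSE]
normal <- expr[, paste0(paired, "-11A"), drop = FALSE]

# Sanity check: the patient IDs line up column by column
stopifnot(identical(sub("-01A$", "", colnames(tumor)),
                    sub("-11A$", "", colnames(normal))))
print(paired)
```

`drop = FALSE` keeps the result a data frame even when only one patient is paired, which a plain `[, cols]` selection on a one-row table would not.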
3. Visualization
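# One row per patient: normal and tumor values at the same position belong to the same donor
input <- data.frame(normal = as.numeric(normal), tumor = as.numeric(tumor))
library(ggpubr)
# Paired plot: each line connects one patient's normal and tumor values
ggpaired(input, cond1 = "normal", cond2 = "tumor", fill = "condition", palette = "jco")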
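The lines in a paired plot show the per-patient change, and the same information can be summarised numerically. The following is a sketch on toy data, not a result from this analysis: the normal values are the first five from the output above, and only the first tumor value comes from the post, while the other four are invented placeholders. A paired Wilcoxon signed-rank test from base R is used here as one possible choice of test.

```r
# Toy paired input: tumor values 2 to 5 are invented placeholders
input <- data.frame(
  normal = c(9.843921, 10.06474, 10.13955, 9.896332, 9.79279),
  tumor  = c(11.514714, 10.9, 10.5, 9.7, 10.8)
)

# Per-patient change; a positive value means higher expression in the tumor
delta <- input$tumor - input$normal
n_up <- sum(delta > 0)

# Paired Wilcoxon signed-rank test on the matched values
res <- wilcox.test(input$tumor, input$normal, paired = TRUE)

cat("patients with higher tumor expression:", n_up, "of", nrow(input), "\n")
cat("paired Wilcoxon p-value:", res$p.value, "\n")
```

With the real `input` data frame built above, the same three steps apply unchanged to all 50 patients.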
References
http://rpkgs.datanovia.com/ggpubr/reference/ggpaired.html