1. chr22.vcf.gz
   Source: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr22.filtered.SNV_INDEL_SV_phased_panel.vcf.gz
   Actual number of samples: 3690; sample information is in sample_summary_3690.txt.
2. GSA-24v3_chr_pos.list
   Source: https://support.illumina.com/content/dam/illumina-support/documents/downloads/productfiles/global-screening-array-24/v3-0/GSA-24v3-0-A2-manifest-file-csv.zip
   Only the chromosome and position fields were extracted.
3. samples_population_3202.txt
   Source: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/20130606_g1k_3202_samples_ped_population.txt
   Superpopulation => number of samples => description:
   AMR => 490 => Ad Mixed American
   AFR => 893 => African
   EUR => 633 => European
   EAS => 585 => East Asian
   SAS => 601 => South Asian
   Reference panels:
   EAS+SAS form PanelA [1166 samples after removing 20 test samples, PanelA_sample.list]
   AMR+AFR+EUR form PanelB [1986 samples after removing 30 test samples, PanelB_sample.list]
   PanelA+PanelB form PanelO [3152 samples in total, PanelO_sample.list]
4. sample_summary_3690.txt
   Source: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20140502_sample_summary_info/20140502_complete_sample_summary.txt
5. test_sample
   The 50 test samples and their imputation outputs, using sample SAS_NA20872 as an example:
   SAS_NA20872_chip.meta_chr22.imputed.metaDose.vcf.gz  # MetaMinimac2 output [chip_subset]
   SAS_NA20872_chip.PanelA_chr22.imputed.dose.vcf.gz  # Minimac4 PanelA output [chip_subset]
   SAS_NA20872_chip.PanelA_chr22.imputed.empiricalDose.vcf.gz
   SAS_NA20872_chip.PanelA_chr22.imputed.info
   SAS_NA20872_chip.PanelB_chr22.imputed.dose.vcf.gz  # Minimac4 PanelB output [chip_subset]
   SAS_NA20872_chip.PanelB_chr22.imputed.empiricalDose.vcf.gz
   SAS_NA20872_chip.PanelB_chr22.imputed.info
   SAS_NA20872_chip.PanelO_chr22.imputed.dose.vcf.gz  # Minimac4 PanelO output [chip_subset]
   SAS_NA20872_chip.PanelO_chr22.imputed.empiricalDose.vcf.gz
   SAS_NA20872_chip.PanelO_chr22.imputed.info
   SAS_NA20872_chip.vcf  # the GSA chip has 9183 sites on chromosome 22, of which 8516 are present in the VCF and retained
   SAS_NA20872.meta_chr22.imputed.metaDose.vcf.gz
   SAS_NA20872.PanelA_chr22.imputed.dose.vcf.gz
   SAS_NA20872.PanelA_chr22.imputed.empiricalDose.vcf.gz
   SAS_NA20872.PanelA_chr22.imputed.info
   SAS_NA20872.PanelB_chr22.imputed.dose.vcf.gz
   SAS_NA20872.PanelB_chr22.imputed.empiricalDose.vcf.gz
   SAS_NA20872.PanelB_chr22.imputed.info
   SAS_NA20872.PanelO_chr22.imputed.dose.vcf.gz
   SAS_NA20872.PanelO_chr22.imputed.empiricalDose.vcf.gz
   SAS_NA20872.PanelO_chr22.imputed.info
   SAS_NA20872.vcf  # original VCF of the sample, 1066557 sites in total
   SAS_NA20872_chip_imputed_compare.txt  # per-site concordance between the original sample and the two imputation methods
6. Imputed_compare.summary
   Final summary of the comparison, broken down by genotype. The source field is the number of sites in the original VCF; the mac4 field is the fraction of those sites where the Minimac4 imputation agrees with the original file; the meta field is the same fraction for MetaMinimac2. The last two columns give, for each of the two methods, the fraction of concordant sites across all genotypes.
7. Imputed_compare.log
   Log of the comparison between the two methods.
8. PanelO_chr22_group_alt_af.txt
   Alternate allele frequency (AF) of each site in each population, computed over the 3152 PanelO samples.
9. Imputed_r2_test.summary
   Pearson correlation test results, reported per sample, per population and per AF bin.
10. Imputed_r2_test.log
   Pearson correlation test log, including the selected sites and the test dataset of genotypes at those sites.
11. Imputed_r2_test_page.json
   Average r2 (Avg r2) from the Pearson correlation test, with AF values log10-transformed, loaded by the HTML page for display.
12. Imputed_r2_test_page.html
   Visualization of Avg r2.
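GSA-24v3_chr_pos.list keeps only chromosome and position from the GSA manifest. A minimal sketch of that extraction follows, assuming the manifest CSV has the usual Illumina layout: an `[Assay]` section whose first row is a column header containing `Chr` and `MapInfo`, ended by the next bracketed section such as `[Controls]`. The function name, the treatment of position 0 as unmapped, and the de-duplication are assumptions for illustration, not the exact procedure used to build the list.

```python
import csv
from typing import Iterable, List, Optional, Tuple


def extract_chr_pos(lines: Iterable[str], chrom: Optional[str] = None) -> List[Tuple[str, int]]:
    """Return unique (Chr, MapInfo) pairs from the [Assay] section of an Illumina manifest CSV.

    Assumption: the [Assay] header row contains 'Chr' and 'MapInfo' columns.
    """
    in_assay = False
    i_chr = i_pos = -1
    seen = {}  # dict keeps insertion order and drops duplicate sites
    for row in csv.reader(lines):
        if not row:
            continue
        tag = row[0].strip()
        if tag == "[Assay]":
            in_assay = True
            continue
        if not in_assay:
            continue  # skip the [Heading] block
        if tag.startswith("["):
            break  # the next section (e.g. [Controls]) ends the assay table
        if i_chr < 0:
            # first row of [Assay] is the column header
            i_chr, i_pos = row.index("Chr"), row.index("MapInfo")
            continue
        c, p = row[i_chr].strip(), row[i_pos].strip()
        if chrom is not None and c != chrom:
            continue
        if not p.isdigit() or int(p) == 0:
            continue  # assumption: position 0 marks an unmapped probe
        seen[(c, int(p))] = None
    return list(seen)
```

Usage would be along the lines of `with open(manifest_csv) as fh: sites = extract_chr_pos(fh, "22")`, where `manifest_csv` is a placeholder for the CSV inside the downloaded zip.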
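PanelA, PanelB and PanelO are defined by superpopulation, with the test samples held out. The sketch below reproduces that grouping, assuming samples_population_3202.txt is whitespace-separated with a header that includes `SampleID` and `Superpopulation` columns, and that test-sample labels follow the `SAS_NA20872` pattern (superpopulation, underscore, sample ID). All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Superpopulation groupings described for PanelA and PanelB
PANEL_SUPERPOPS = {"PanelA": {"EAS", "SAS"}, "PanelB": {"AMR", "AFR", "EUR"}}


def parse_population_file(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (SampleID, Superpopulation) from a whitespace-separated file with a header row."""
    it = iter(lines)
    header = next(it).split()
    i_id, i_sp = header.index("SampleID"), header.index("Superpopulation")
    for line in it:
        fields = line.split()
        if fields:
            yield fields[i_id], fields[i_sp]


def ids_from_test_labels(labels: Iterable[str]) -> Set[str]:
    """Turn labels such as 'SAS_NA20872' into bare sample IDs such as 'NA20872'."""
    return {label.split("_", 1)[1] for label in labels}


def build_panels(samples: Iterable[Tuple[str, str]], test_ids: Set[str]) -> Dict[str, List[str]]:
    """Assign non-test samples to PanelA/PanelB by superpopulation; PanelO is their union."""
    panels: Dict[str, List[str]] = {name: [] for name in PANEL_SUPERPOPS}
    for sample_id, superpop in samples:
        if sample_id in test_ids:
            continue  # held-out test samples must not appear in any reference panel
        for name, superpops in PANEL_SUPERPOPS.items():
            if superpop in superpops:
                panels[name].append(sample_id)
    panels["PanelO"] = panels["PanelA"] + panels["PanelB"]
    return panels
```

With the full file and all 50 test labels, the panel sizes should come out as 1166, 1986 and 3152 if the held-out samples are split 20/30 as described.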
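Each `*_chip.vcf` keeps only the sample's sites that are also on the GSA chip (8516 of 9183 chromosome 22 sites). A plain-Python sketch of that filter is below, assuming the chip list uses bare chromosome names (`22`) while the VCF may use a `chr` prefix; the function names are hypothetical. On real compressed VCFs a tool such as bcftools (its `view -T` targets-file option) is the more usual route.

```python
from typing import Iterable, List, Set, Tuple


def normalize_chrom(chrom: str) -> str:
    # Assumption: the VCF may say 'chr22' where the chip list says '22'
    return chrom[3:] if chrom.startswith("chr") else chrom


def subset_vcf_to_sites(vcf_lines: Iterable[str], sites: Set[Tuple[str, int]]) -> Tuple[List[str], int]:
    """Keep all header lines plus data lines whose (CHROM, POS) is in `sites`.

    Returns the kept lines and the number of retained data lines.
    """
    wanted = {(normalize_chrom(c), p) for c, p in sites}
    kept: List[str] = []
    n_sites = 0
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and #CHROM header lines
            continue
        chrom, pos, _ = line.split("\t", 2)
        if (normalize_chrom(chrom), int(pos)) in wanted:
            kept.append(line)
            n_sites += 1
    return kept, n_sites
```

The returned count corresponds to the "present in the VCF and retained" figure; chip sites absent from the sample VCF are simply never matched.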
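Imputed_compare.summary reports, per genotype, the number of source sites and the fraction that each method reproduces. A minimal sketch of that per-genotype concordance follows, assuming genotypes are compared as unphased hard calls (so `1|0` equals `0/1`) and that a site missing from the imputed output counts as discordant; both choices and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

Site = Tuple[str, int]


def normalize_gt(gt: str) -> str:
    """Make '1|0', '0|1' and '0/1' compare equal by ignoring phase and allele order."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def concordance_by_genotype(
    original: Dict[Site, str], imputed: Dict[Site, str]
) -> Tuple[Dict[str, Tuple[int, float]], float]:
    """Per original genotype: (number of source sites, fraction matched by the imputation).

    Also returns the concordance fraction over all genotypes.
    Assumption: a site absent from the imputed output counts as discordant.
    """
    total: Dict[str, int] = defaultdict(int)
    agree: Dict[str, int] = defaultdict(int)
    for site, gt in original.items():
        g = normalize_gt(gt)
        total[g] += 1
        other = imputed.get(site)
        if other is not None and normalize_gt(other) == g:
            agree[g] += 1
    per_gt = {g: (n, agree[g] / n) for g, n in total.items()}
    n_all = sum(total.values())
    overall = agree_all = sum(agree.values()) / n_all if n_all else 0.0
    return per_gt, overall
```

Calling it once with Minimac4 calls and once with MetaMinimac2 calls would give the mac4 and meta columns respectively, with the overall fractions as the last two columns.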
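PanelO_chr22_group_alt_af.txt holds the alternate allele frequency of each site in each population. For a single site, that frequency is the number of non-reference alleles divided by the number of called alleles in the group. The sketch below assumes GT strings as input, counts any allele other than `0` as alternate, and excludes missing alleles (`.`) from the denominator; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict


def group_alt_af(genotypes: Dict[str, str], sample_group: Dict[str, str]) -> Dict[str, float]:
    """Alternate-allele frequency per population at one site.

    genotypes: sample ID -> GT string such as '0|1'; sample_group: sample ID -> population.
    Any allele other than '0' counts as alternate; '.' is excluded from the denominator.
    """
    alt: Dict[str, int] = defaultdict(int)
    called: Dict[str, int] = defaultdict(int)
    for sample_id, gt in genotypes.items():
        group = sample_group.get(sample_id)
        if group is None:
            continue  # sample not in PanelO
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called[group] += 1
            if allele != "0":
                alt[group] += 1
    return {g: alt[g] / called[g] for g in called}
```

For multiallelic sites this lumps all alternate alleles together; splitting them into biallelic records first would give one AF per alternate allele.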
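Imputed_r2_test.summary reports Pearson correlation per sample, per population and per AF bin. One common way to compute this is the squared Pearson correlation between the true genotype dosage (0, 1 or 2 alternate alleles from the original VCF) and the imputed dosage (for example the DS field of a dose VCF), taken across the sites that fall in each AF bin. The sketch below follows that approach for one sample; the bin edges, the record layout and the function names are assumptions, and the actual test may aggregate differently.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation; None when undefined (n < 2 or zero variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def bin_index(af: float, edges: Sequence[float]) -> Optional[int]:
    """Index k with edges[k] <= af < edges[k+1]; the top edge belongs to the last bin."""
    if af == edges[-1]:
        return len(edges) - 2
    k = bisect_right(edges, af) - 1
    return k if 0 <= k < len(edges) - 1 else None


def r2_by_af_bin(
    records: Iterable[Tuple[float, float, float]], edges: Sequence[float]
) -> Dict[Tuple[float, float], Tuple[Optional[float], int]]:
    """records: (AF, true dosage, imputed dosage) per site for one sample.

    Returns {(low, high): (r2 or None, number of sites in the bin)}.
    """
    grouped: Dict[int, Tuple[List[float], List[float]]] = {}
    for af, truth, dose in records:
        k = bin_index(af, edges)
        if k is None:
            continue  # AF outside the binned range
        xs, ys = grouped.setdefault(k, ([], []))
        xs.append(truth)
        ys.append(dose)
    return {
        (edges[k], edges[k + 1]): (pearson_r2(xs, ys), len(xs))
        for k, (xs, ys) in sorted(grouped.items())
    }
```

Using the population-specific AF from PanelO_chr22_group_alt_af.txt as the AF value would give the per-population breakdown; bins where every true dosage is identical have no defined correlation and are reported as None.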
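Imputed_r2_test_page.json carries the Avg r2 values with AF on a log10 scale for the HTML plot. A minimal sketch of that step follows: average r2 across samples for each AF value (skipping undefined entries), then emit points with `log10(AF)`. The JSON key names `log10_af` and `avg_r2` and the function names are hypothetical, since the actual file layout is not described here.

```python
import json
import math
from typing import Dict, List, Optional


def average_r2(per_sample: List[Dict[float, Optional[float]]]) -> Dict[float, float]:
    """Average r2 per AF value across samples, skipping undefined (None) entries."""
    sums: Dict[float, float] = {}
    counts: Dict[float, int] = {}
    for result in per_sample:
        for af, r2 in result.items():
            if r2 is None:
                continue
            sums[af] = sums.get(af, 0.0) + r2
            counts[af] = counts.get(af, 0) + 1
    return {af: sums[af] / counts[af] for af in sums}


def to_page_json(avg: Dict[float, float]) -> str:
    """Serialize (log10 AF, Avg r2) points sorted by AF; AF <= 0 has no logarithm and is dropped."""
    points = [
        {"log10_af": round(math.log10(af), 6), "avg_r2": round(r2, 6)}
        for af, r2 in sorted(avg.items())
        if af > 0
    ]
    return json.dumps(points)
```

The log10 scale spreads out the rare-variant bins, which would otherwise be squeezed against the left edge of a linear AF axis.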