Rice Genotype Search and Summary
Locus and SNP/InDel: Annotation Venn:
Chr    Length (bp) a         MBK V1 Loci b   SNP Num. c          InDel Num. c
       Nip        R498       Nip     R498    Nip       R498      Nip      R498
Chr1   43270923   44361539   16699   14457   1450225   1263980   247624   196771
Chr2   35937250   37764328   13554   11760   1171449   1080588   198183   160908
Chr3   36413819   39691490   13828   12092   1053768   1040986   182558   152516
Chr4   35502694   35849732   13168   11095   1509485   1208688   211999   155398
Chr5   29958434   31237231   10872   9884    1014356   877490    153959   119933
Chr6   31248787   32465040   11608   9738    1264776   1039007   190419   146262
Chr7   29697621   30277827   11003   8603    1224875   1009180   182227   137476
Chr8   28443022   29952003   10413   8305    1279867   1041900   182879   144750
Chr9   23012720   24760661   8570    6969    971971    853010    142944   112031
Chr10  23207287   25582588   8525    6609    1084044   976084    149655   123509
Chr11  29021106   31778392   10821   7794    1494008   1332080   217630   174971
Chr12  27531856   26601357   10150   7286    1332107   1069036   190727   143704
Total  373245519  390322188  139211  114592  14850931  12792029  2250804  1768229
a Oryza sativa reference genomes:
(1) Nip (Nipponbare), Japonica subspecies, Os-Nipponbare-Reference-IRGSP-1.0
(2) R498 (ShuHui498), Indica subspecies, Os-R498-1.0
b Both the Nip and R498 genomes were annotated using an evidence-based gene prediction method. The resulting annotation is named MBK V1 (version 1) in MBKBase; for details, see doi:10.1101/gr.088997.108.
c A total of 4460 WGS samples mapped to the Nip genome and 1257 WGS samples mapped to the R498 genome were used to call SNPs and InDels. Only SNPs with AF >= 0.01 and InDels with AF >= 0.005 were counted.
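The counting rule in footnote c can be sketched in a few lines of Python. This is only an illustration: the thresholds come from the footnote, but the `(type, allele_frequency)` record layout, the function name `count_retained`, and the example values are assumptions, not the database's actual pipeline.

```python
# Thresholds quoted in footnote c; everything else here is hypothetical.
MIN_AF = {"SNP": 0.01, "InDel": 0.005}


def count_retained(variants):
    """Count variants per type whose allele frequency meets the threshold.

    `variants` is an iterable of (variant_type, allele_frequency) pairs.
    """
    counts = {"SNP": 0, "InDel": 0}
    for kind, af in variants:
        if af >= MIN_AF[kind]:
            counts[kind] += 1
    return counts


# Made-up example records, not real data.
example = [("SNP", 0.02), ("SNP", 0.004), ("InDel", 0.005), ("InDel", 0.001)]
print(count_retained(example))  # {'SNP': 1, 'InDel': 1}
```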
For the Nip reference genome, there are three sources of gene annotation: MSU Release 7, RAP-DB V1, and MBK V1.
The Venn diagram shows the overlap between the gene loci sets of the three annotation sources. 89.3% of MSU loci and 89.7% of RAP loci coincide with the MBK V1 annotation.
The numbers of loci unique to MSU, RAP, and MBK are 14517, 6220, and 84946, respectively.
Nip and R498 were each annotated using the evidence-based gene prediction method. The homologous locus sequences of the two varieties were aligned and classified into unified loci. About 58202 homologous unified loci were identified in both genomes.
Locus Model for Genotyping: Locus Statistics:

One locus may have more than one gene model, and loci from different sources (Source ID) may map to different regions of the reference genome. To obtain a uniform standard for genotyping, overlapping loci were integrated and assigned a unified locus ID (Locus ID) that maps to a unique region of the reference genome. The homologous genome sequences of different varieties were aligned and classified into one unified locus.
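The integration of overlapping loci can be pictured as interval merging. The following Python sketch is a simplified illustration, not the database's actual procedure: it assumes 1-based inclusive coordinates, ignores strand and gene-model structure, and the `SourceLocus` type, the source IDs, and the `UL000001`-style unified IDs are all hypothetical.

```python
from typing import NamedTuple


class SourceLocus(NamedTuple):
    source_id: str  # hypothetical ID from one annotation source
    chrom: str
    start: int      # 1-based, inclusive (assumed)
    end: int        # 1-based, inclusive (assumed)


def unify_loci(loci):
    """Merge overlapping source loci per chromosome into unified loci."""
    unified = []
    for locus in sorted(loci, key=lambda l: (l.chrom, l.start, l.end)):
        last = unified[-1] if unified else None
        if last and last["chrom"] == locus.chrom and locus.start <= last["end"]:
            # Overlaps the current unified region: extend it.
            last["end"] = max(last["end"], locus.end)
            last["sources"].append(locus.source_id)
        else:
            # No overlap: open a new unified region.
            unified.append({"chrom": locus.chrom, "start": locus.start,
                            "end": locus.end, "sources": [locus.source_id]})
    for i, region in enumerate(unified, 1):
        region["locus_id"] = f"UL{i:06d}"  # made-up ID format
    return unified


# Made-up coordinates and IDs for illustration only.
loci = [
    SourceLocus("srcA_1", "Chr1", 100, 500),
    SourceLocus("srcB_1", "Chr1", 450, 900),
    SourceLocus("srcC_1", "Chr1", 2000, 2500),
    SourceLocus("srcA_2", "Chr2", 100, 200),
]
result = unify_loci(loci)
for region in result:
    print(region["locus_id"], region["chrom"], region["start"],
          region["end"], region["sources"])
```

In this example the first two source loci overlap and collapse into a single region on Chr1 spanning 100 to 900, so each unified ID maps to exactly one reference region.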

For Nip genome unified loci, 61% have a sequence length of less than 1000 bp, and 47% have a length of less than 500 bp.

Locus Genotyping and Show: Genotype Statistics:

In our rice genotype database, a genotype is an absolute measure of the base composition of a group of WGS samples (germplasms) in a unified locus region. A genotype (GT) is either identical to the reference genomic sequence or differs from it subtly; the differences include SNPs and InDels, but not larger structural variations. An uppercase base means the variation at that position is homozygous, a lowercase base means it is heterozygous, and '-' means missing reads. Because most cultivated rice germplasms are homozygous, a rice genotype in our database is equivalent to the concepts of haplotype and allele.
For each potentially variant site, SNPs and InDels with allele frequency > 0.15% were retained for building genotypes. In the Ref row, the character 'N' represents multiple consecutive bases, meaning that the variation at this position in the corresponding sample is a deletion; in a genotype row, 'N' represents an insertion of bases.
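The case convention for genotype strings can be read programmatically. The Python sketch below is illustrative only: the function name `classify_sites` and the example string are assumptions, and the sketch does not give 'N' any special treatment, so an 'N' is labeled purely by its case.

```python
def classify_sites(genotype):
    """Label each character of a genotype string by its state.

    Uppercase -> homozygous, lowercase -> heterozygous, '-' -> missing reads.
    """
    labels = []
    for ch in genotype:
        if ch == "-":
            labels.append("missing")
        elif ch.isupper():
            labels.append("homozygous")
        elif ch.islower():
            labels.append("heterozygous")
        else:
            labels.append("unknown")
    return labels


# Made-up genotype string for illustration.
print(classify_sites("AcG-T"))
# ['homozygous', 'heterozygous', 'homozygous', 'missing', 'homozygous']
```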

Based on the Nip unified locus regions, a population of 4460 WGS rice samples was used to call genotypes, and genotypes with a frequency greater than 0.22% (sample number >= 10) were retained for counting. For each unified locus, its genotype number represents the number of alleles of the corresponding gene model, and the allele number reflects the locus's degree of conservation and variation during evolution and domestication.
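The retention rule for one locus can be sketched as a simple tally. This Python example is a minimal illustration under stated assumptions: the mapping from sample ID to genotype string, the function name `frequent_genotypes`, and the example data are hypothetical; only the threshold of at least 10 samples out of 4460 comes from the text.

```python
from collections import Counter


def frequent_genotypes(sample_genotypes, min_samples=10):
    """Return genotypes carried by at least `min_samples` samples.

    `sample_genotypes` maps a sample ID to its genotype string at one locus.
    The result maps each retained genotype to (sample count, frequency).
    """
    counts = Counter(sample_genotypes.values())
    total = len(sample_genotypes)
    return {gt: (n, n / total) for gt, n in counts.items() if n >= min_samples}


# Made-up population of 4460 samples at a single locus.
genotypes = ["ACG"] * 4000 + ["ATG"] * 451 + ["AcG"] * 9
samples = {f"S{i}": gt for i, gt in enumerate(genotypes)}

kept = frequent_genotypes(samples)
print(len(kept))  # 2 genotypes retained; "AcG" (9 samples) is dropped
```

With 4460 samples, 10 samples correspond to 10 / 4460, or about 0.22%, which is why the two forms of the threshold agree. The number of retained genotypes (here 2) is the value counted as the locus's allele number.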