Rice Genotype Search and Summary
Locus and SNP/InDel: Annotation Venn:
Chr               Length (a) MBK V3 Loci (b)        SNP Num. (c)    InDel Num. (c)
              Nip       R498     Nip    R498       Nip      R498      Nip     R498
Chr1     43270923   44361539    6512    6821   1450225   1829472   247624   334698
Chr2     35937250   37764328    5248    5573   1171449   1075614   198183   209809
Chr3     36413819   39691490    5426    5755   1053768   1014419   182558   197559
Chr4     35502694   35849732    4583    4728   1509485   1197724   211999   205300
Chr5     29958434   31237231    3882    4200   1014356    887154   153959   160098
Chr6     31248787   32465040    4109    4360   1264776   1066204   190419   194435
Chr7     29697621   30277827    3831    3902   1224875   1012588   182227   181995
Chr8     28443022   29952003    3594    3645   1279867   1087538   182879   191435
Chr9     23012720   24760661    2894    2985    971971    846582   142944   147656
Chr10    23207287   25582588    2958    3089   1084044    924897   149655   154320
Chr11    29021106   31778392    3518    3653   1494008   1306939   217630   228054
Chr12    27531856   26601357    3277    3176   1332107   1028976   190727   182179
Total   373245519  390322188   49832   51887  14850931  13278107  2250804  2387538
a Oryza sativa reference genome:
(1) Nip (Nipponbare) Japonica subspecies, Os-Nipponbare-Reference-IRGSP-1.0
(2) R498 (ShuHui498) Indica subspecies, Os-R498-1.0
b Both the Nip and R498 genomes were annotated with an evidence-based gene prediction method; the resulting annotation is named MBK V3 (version 3) in our MBKBase. For details, see doi:10.1101/gr.088997.108.
c A total of 4460 WGS samples were mapped to the Nip and R498 reference genomes separately to call SNPs and InDels. Only SNPs with allele frequency (AF) >= 0.01 and InDels with AF >= 0.005 were counted in the statistics.
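The AF cutoffs in footnote c can be illustrated with a minimal Python sketch. The `Variant` record, its field names, and the rule that any non single-base change counts as an InDel are assumptions made for illustration; they are not taken from the MBKBase pipeline.

```python
from dataclasses import dataclass

# Thresholds quoted in footnote c.
SNP_MIN_AF = 0.01
INDEL_MIN_AF = 0.005


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    chrom: str
    pos: int
    ref: str
    alt: str
    af: float  # frequency of the ALT allele across the sample panel

    @property
    def is_snp(self) -> bool:
        # Assumption: one base replaced by one base is a SNP,
        # every other REF/ALT length combination is treated as an InDel.
        return len(self.ref) == 1 and len(self.alt) == 1


def passes_af_filter(v: Variant) -> bool:
    """Apply the SNP or InDel cutoff depending on the variant type."""
    threshold = SNP_MIN_AF if v.is_snp else INDEL_MIN_AF
    return v.af >= threshold


def count_by_type(variants):
    """Count the variants that survive the AF filter, split by type."""
    counts = {"SNP": 0, "InDel": 0}
    for v in variants:
        if passes_af_filter(v):
            counts["SNP" if v.is_snp else "InDel"] += 1
    return counts


# Made-up records, not real MBKBase data.
example_variants = [
    Variant("Chr1", 1000, "A", "G", 0.20),    # SNP, kept
    Variant("Chr1", 2000, "C", "T", 0.004),   # SNP, below 0.01
    Variant("Chr1", 3000, "AT", "A", 0.006),  # InDel, kept
    Variant("Chr1", 4000, "G", "GCC", 0.001), # InDel, below 0.005
]
print(count_by_type(example_variants))
```

Keeping the two cutoffs as named constants makes it easy to see that InDels are retained at half the frequency required for SNPs.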
For the Nip reference genome, there are three sources of gene annotation: MSU Release 7, RAP-DB V1, and MBK V3.
The Venn diagram shows the overlap between the gene locus sets of the three annotation sources. 89.3% of MSU loci and 89.7% of RAP loci coincide with the MBK V3 annotation.
The numbers of loci unique to MSU, RAP, and MBK are 19170, 5518, and 12682, respectively.
Nip and R498 were each annotated with the evidence-based gene prediction method. Homologous locus sequences from the two varieties were aligned and classified into one unified Locus. About 29192 homologous unified loci were identified in both genomes.
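The set relations behind such a Venn diagram can be sketched in Python as follows. This is a simplified illustration that assumes loci from different sources have already been matched to shared keys; how MBKBase decides that two loci coincide (for example by coordinate overlap) is not stated here, and the function names and toy keys are hypothetical.

```python
def unique_counts(sources):
    """sources: dict mapping source name -> set of locus keys.

    Returns, per source, how many loci appear in no other source.
    """
    result = {}
    for name, loci in sources.items():
        others = set().union(*(s for n, s in sources.items() if n != name))
        result[name] = len(loci - others)
    return result


def percent_shared(a, b):
    """Percentage of loci in set a that are also present in set b."""
    return 100.0 * len(a & b) / len(a) if a else 0.0


# Toy locus keys, not real annotation data.
toy_sources = {
    "MSU": {"L1", "L2", "L3", "L4"},
    "RAP": {"L2", "L3", "L5"},
    "MBK": {"L1", "L2", "L3", "L6", "L7"},
}
print(unique_counts(toy_sources))
print(percent_shared(toy_sources["MSU"], toy_sources["MBK"]))
```

In this toy case `unique_counts` reports 1, 1, and 2 unique loci for MSU, RAP, and MBK, and 75% of the MSU keys are shared with MBK.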
Locus Model for Genotyping: Locus Statistics:

One Locus may have more than one gene model, and different annotation sources can also place gene boundaries differently. To obtain a uniform standard for genotyping, overlapping annotations were integrated and assigned a single ID (Locus) that maps to a unique region in each genome.
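One way to picture the integration of overlapping annotations is a standard interval merge, sketched below in Python. It is an illustration under stated assumptions, not the MBKBase procedure: coordinates are taken as 1-based inclusive, strand is ignored, and the `LOC000001`-style IDs are invented for the example.

```python
from collections import defaultdict


def merge_into_loci(models):
    """Merge overlapping gene models into unified loci.

    models: iterable of (chrom, start, end, source) tuples.
    Returns a list of (locus_id, chrom, start, end, sources).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, source in models:
        by_chrom[chrom].append((start, end, source))

    merged = []
    # Chromosome names are sorted as plain strings in this sketch.
    for chrom in sorted(by_chrom):
        current = None  # [start, end, set of sources] of the open locus
        for start, end, source in sorted(by_chrom[chrom]):
            if current is not None and start <= current[1]:
                # Overlaps the open locus: extend it and record the source.
                current[1] = max(current[1], end)
                current[2].add(source)
            else:
                if current is not None:
                    merged.append((chrom, *current))
                current = [start, end, {source}]
        if current is not None:
            merged.append((chrom, *current))

    # Hypothetical ID scheme, one ID per merged region.
    return [
        (f"LOC{i:06d}", chrom, start, end, sorted(sources))
        for i, (chrom, start, end, sources) in enumerate(merged, 1)
    ]


# Toy gene models, not real annotation records.
toy_models = [
    ("Chr1", 100, 500, "MSU"),
    ("Chr1", 450, 900, "RAP"),
    ("Chr1", 1200, 1500, "MBK"),
    ("Chr2", 50, 80, "MSU"),
]
for locus in merge_into_loci(toy_models):
    print(locus)
```

Here the MSU and RAP models on Chr1 overlap and collapse into one locus spanning 100 to 900, while the other two models each become their own locus.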

There are 95325 loci in total across the Nip and R498 genomes, with an average length of 3 kbp. 97% of locus sequences are shorter than 10 kbp, and 28% are shorter than 1 kbp.

Locus Genotyping and Show: Genotype Statistics:

In our genotype database, a genotype (GT) either matches or differs from the reference sequence (REF). Differences include both SNPs and InDels (ALT), but not larger structural variations (SV). A Locus GT is the group of ALTs located within the region of that locus. An uppercase base means the variation at that position is homozygous, a lowercase base means it is heterozygous, and '-' means missing reads. Because most cultivated germplasms are homozygous, a genome genotype in our database is equivalent to the concepts of haplotype and allele.
For each position, ALTs with frequency > 0.15% were retained for building genotypes. In the REF row, the character 'N' represents multiple contiguous bases, meaning that the variation at this position in the corresponding sample is a deletion; in the GT row, 'N' represents an insertion of bases.
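The display convention described above can be sketched as a small Python encoder. The input layout (a list of REF alleles paired with two-allele calls), the function names, and the handling of heterozygous insertions as lowercase 'n' are assumptions for illustration; the database's actual encoding code is not shown in this page.

```python
def ref_symbol(ref):
    """REF row: a single base is shown as is, multiple bases collapse to 'N'."""
    return ref.upper() if len(ref) == 1 else "N"


def gt_symbol(ref, call):
    """GT row symbol for one variable position.

    call is a pair of alleles, or None when the sample has no reads.
    """
    if call is None:
        return "-"  # missing reads
    first, second = call
    if first == second:
        allele, homozygous = first, True
    else:
        # Assumption: show the non-reference allele of a heterozygous call
        # (the first allele if neither matches REF).
        allele = second if first == ref else first
        homozygous = False
    # Multi-base alleles (insertions) collapse to 'N'.
    symbol = allele if len(allele) == 1 else "N"
    return symbol.upper() if homozygous else symbol.lower()


def locus_rows(sites):
    """sites: list of (ref_allele, call). Returns (REF row, GT row)."""
    ref_row = "".join(ref_symbol(ref) for ref, _ in sites)
    gt_row = "".join(gt_symbol(ref, call) for ref, call in sites)
    return ref_row, gt_row


# Toy locus with four variable positions, not real data.
toy_sites = [
    ("A", ("G", "G")),       # homozygous SNP
    ("GTT", ("G", "G")),     # homozygous deletion: REF row shows 'N'
    ("C", ("C", "CAAT")),    # heterozygous insertion
    ("T", None),             # missing reads
]
print(locus_rows(toy_sites))
```

For the toy locus the REF row is `ANCT` and the GT row is `GGn-`.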

Based on the locus model, 5280 WGS samples were used to call locus genotypes, and genotypes with GT% > 0.22% (sample number >= 10) were retained for statistics. For each locus, the number of genotypes represents the number of alleles in the population.
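The retention rule and the per-locus allele count can be sketched in Python as below. Only the sample-count cutoff of 10 comes from the text; the input mapping from sample name to genotype string and the function name are hypothetical.

```python
from collections import Counter

MIN_SAMPLES = 10  # cutoff stated in the text (sample number >= 10)


def genotype_table(sample_genotypes, min_samples=MIN_SAMPLES):
    """Summarise the genotypes observed at one locus.

    sample_genotypes: dict mapping sample name -> genotype string.
    Returns (genotype, n_samples, percent) for each retained genotype,
    most frequent first.
    """
    counts = Counter(sample_genotypes.values())
    total = len(sample_genotypes)
    return [
        (gt, n, 100.0 * n / total)
        for gt, n in counts.most_common()
        if n >= min_samples
    ]


# Toy panel of 30 samples at one locus, not real data.
toy_panel = {}
toy_panel.update({f"S{i:02d}": "GGC" for i in range(18)})
toy_panel.update({f"S{i:02d}": "AGC" for i in range(18, 29)})
toy_panel["S29"] = "AGT"  # seen in one sample only, dropped by the cutoff

table = genotype_table(toy_panel)
print(table)
print("alleles at this locus:", len(table))
```

In this toy panel two genotypes pass the cutoff, so the locus is reported with two alleles; the genotype carried by a single sample is excluded from the statistics.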