Channel: Felix's Thought Logs

The true IBS noise range

Last year, I wrote a blog post, Noise threshold on atDNA Matches, in which I calculated mathematically that an IBS noise segment cannot extend beyond 150 consecutive SNPs. However, that calculation assumes that every possible genotype occurs in the world's population at every SNP. In reality, populations do not have such diversity. One reason I want to investigate this is that I am eager to find IBS compound segments between myself and ancient DNA, but I am unsure which thresholds to use to eliminate noise.

For example, I said:
Every genotype, say AG, will match (A will match AA,AG,AT,AC -or- G will match AG,GG,GC,GT) - taking the union, AG will match AA,AG,AT,AC,GG,GC,GT (7 genotypes out of a possible 10).

But in reality, what if only 2 genotypes, say AA and AG, are found in populations at a given SNP, so that they always match each other? Then the probability of a match at that SNP is not 0.7 but 1. So the probability of a matching segment of consecutive SNPs varies drastically and depends purely on the genotypes actually found in populations.
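The two probabilities above can be checked with a minimal Python sketch. It is only an illustration of the arithmetic, not the code from the original post: the function names and the 60/40 split between AA and AG are hypothetical, and "match" is assumed to mean that two genotypes share at least one allele.

```python
from itertools import combinations_with_replacement

ALLELES = "ACGT"

def all_genotypes():
    """Return the 10 unordered genotypes: AA, AC, AG, AT, CC, ..."""
    return ["".join(pair) for pair in combinations_with_replacement(ALLELES, 2)]

def half_match(g1, g2):
    """Assumed matching rule: the genotypes share at least one allele."""
    return bool(set(g1) & set(g2))

def match_probability(genotype, population):
    """Chance that `genotype` matches a genotype drawn from `population`.

    `population` maps genotype -> count (or any non-negative weight).
    """
    total = sum(population.values())
    matching = sum(w for g, w in population.items() if half_match(genotype, g))
    return matching / total

# Every genotype equally likely: 7 of the 10 genotypes share an allele with AG.
uniform = {g: 1 for g in all_genotypes()}
print(match_probability("AG", uniform))

# Hypothetical SNP where only AA and AG occur: AG matches everyone.
restricted = {"AA": 60, "AG": 40}
print(match_probability("AG", restricted))
```

With the uniform population the result is 0.7; with the restricted one it is 1.0, which is why a per-SNP probability has to come from observed genotype frequencies.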

Solution

To solve this problem, I took the genotype frequencies from OpenSNP and created a random file which has exactly the same SNPs as my autosomal file, except that each genotype is randomized based on "what is found in populations". I created multiple random files and compared each with my autosomal file to see how well it matches. This helps us figure out the actual noise threshold.
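The randomization step might look roughly like the sketch below. This is not the code published on GitHub: the tuple layout, the `frequencies` dictionary, the SNP identifiers, and the use of "--" for SNPs without frequency data are all assumptions made for illustration.

```python
import random

def randomize_genotypes(snps, frequencies, seed=None):
    """Build a random genotype for every SNP in `snps`.

    snps        : list of (snp_id, chromosome, position) from a real file
    frequencies : dict snp_id -> {genotype: observed count}  (assumed layout)
    Returns a list of (snp_id, chromosome, position, genotype).
    """
    rng = random.Random(seed)
    randomized = []
    for snp_id, chrom, pos in snps:
        counts = frequencies.get(snp_id)
        if not counts:
            # No frequency data: emit a no-call (assumption of this sketch).
            randomized.append((snp_id, chrom, pos, "--"))
            continue
        genotypes = list(counts)
        weights = [counts[g] for g in genotypes]
        # Draw one genotype, weighted by how often it was observed.
        genotype = rng.choices(genotypes, weights=weights, k=1)[0]
        randomized.append((snp_id, chrom, pos, genotype))
    return randomized

# Hypothetical input: identifiers and counts are made up.
snps = [("snp_a", "1", 100), ("snp_b", "1", 200), ("snp_c", "1", 300)]
frequencies = {"snp_a": {"AA": 60, "AG": 40}, "snp_b": {"CC": 1}}
for row in randomize_genotypes(snps, frequencies, seed=1):
    print(row)
```

Running it with different seeds gives different random files over the same SNP set, which is what comparing against "multiple random files" requires.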

I started with a threshold of 150 SNPs / 1 Mb and didn't get any matches. So I lowered the SNP threshold to 100; below are the segments that matched each random file.

Autosomal match with random file #1

Chr  Start Position  End Position  Len(Mb)  SNPs
4    92845437        93847904      1.00247  108
8    78394149        79510525      1.11638  145
17   22034501        25874456      3.83995  103

Largest Segment: 3.83995 Mb
Total Shared: 5.9588 Mb

Autosomal match with random file #2

Chr  Start Position  End Position  Len(Mb)  SNPs
4    48960863        53399616      4.43875  106
18   14550375        19351344      4.80097  132

Largest Segment: 4.80097 Mb
Total Shared: 9.23972 Mb

Autosomal match with random file #3

Chr  Start Position  End Position  Len(Mb)  SNPs
3    162511139       163554331     1.04319  108
6    58047654        62510310      4.46266  105

Largest Segment: 4.46266 Mb
Total Shared: 5.50585 Mb
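For readers who want to reproduce tables like the ones above, here is a minimal sketch of a segment scan. It is not the tool used for this post. It assumes that both files list the same SNPs in the same order, that two genotypes match when they share at least one allele, and that a segment must meet both the SNP count and the Mb length threshold. No-calls and any mismatch tolerance are ignored.

```python
def half_match(g1, g2):
    """Assumed matching rule: the genotypes share at least one allele."""
    return bool(set(g1) & set(g2))

def find_segments(mine, other, min_snps=100, min_mb=1.0):
    """Find runs of consecutive matching SNPs.

    mine, other : lists of (chromosome, position, genotype), same SNPs,
                  same order, sorted by chromosome and position.
    Returns a list of (chromosome, start, end, length_mb, snp_count).
    """
    segments = []
    run = []  # (chromosome, position) of each SNP in the current run

    def flush():
        # Keep the current run only if it passes both thresholds.
        if len(run) >= min_snps:
            length_mb = (run[-1][1] - run[0][1]) / 1e6
            if length_mb >= min_mb:
                segments.append(
                    (run[0][0], run[0][1], run[-1][1], length_mb, len(run))
                )
        run.clear()

    for (chrom, pos, g1), (_, _, g2) in zip(mine, other):
        if run and run[-1][0] != chrom:
            flush()  # a run never crosses a chromosome boundary
        if half_match(g1, g2):
            run.append((chrom, pos))
        else:
            flush()  # a single mismatch ends the run
    flush()
    return segments

def summarize(segments):
    """Return (largest segment in Mb, total shared Mb)."""
    lengths = [seg[3] for seg in segments]
    return (max(lengths), sum(lengths)) if lengths else (0.0, 0.0)
```

Calling `find_segments(my_file, random_file, min_snps=100, min_mb=1.0)` on two parsed files (hypothetical variable names) would yield rows in the same shape as the tables, and `summarize` gives the "Largest Segment" and "Total Shared" figures.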

The source code used to generate random autosomal files using genotypes found among populations and OpenSNP genotype frequencies can be downloaded from GitHub.

Conclusion

A true IBS noise segment cannot reach the threshold of 150 consecutive SNPs over 1 Mb: across the three random files above, the longest matching run was 145 SNPs. Anything above the 150 SNPs / 1 Mb threshold must be an IBS compound segment shared among populations. Please note that 1 Mb corresponds only roughly to 1 cM. This result, however, is based on OpenSNP genotype frequencies for each SNP. It is basically an IBS-noise vs. IBS-compound segment test. While it seems that 150 consecutive SNPs cannot match someone purely at random, I am still not sure how far back in time a compound segment of, say, 200 SNPs / 2 cM would go to a common ancestor. Can it go back to the very founder population? I don't know.
