- Personal genotypes obtained through SNP chip arrays contain the information required for assessing genetic risks, matching genetic backgrounds between cases and controls in medical research, and detecting duplicate individuals or close relatives for medical, legal, or historical reasons. Research purposes served by genotyping include classifying individuals by population, reconstructing human history, performing quality control prior to whole-genome sequencing, computing kinship matrices to support genome-wide association studies (GWAS), and combining data sets for meta-analysis.
- Many of these applications involve comparison of two or more genotypes. It is challenging to scale such comparisons from pairs to the many millions of individual genotypes we will soon wish to compare in order to provide improved, personalized medical care.
- We developed an ultra-fast method for comparing personal genotypes. The method is akin to locality-sensitive hashing and is a modification of our previously published method for computing genome fingerprints. We transform the standard genotype representation (lists of rsids and genotypes) into 'genotype fingerprints' that can be readily compared across array designs and reference versions. Because the fingerprints are much smaller than the genotypes they summarize, computation on them is fast and requires little memory. This enables scaling up a variety of important analyses, including quantifying relatedness, recognizing duplicate sequenced genomes in a set, and population reconstruction, among many others. The original genotype cannot be reconstructed from its fingerprint; the method thus has significant implications for privacy-preserving genome analytics.
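The general idea of folding a genotype into a small, comparable vector can be illustrated with a minimal sketch. This is not the published algorithm: the text above does not specify the hashing scheme, the fingerprint size, or the similarity measure, so the use of MD5, the 64-bin vector, the Pearson correlation, and the names `fingerprint` and `similarity` are all assumptions made for illustration. The rsids in the example are made up.

```python
import hashlib
import math


def fingerprint(genotype, size=64):
    """Fold a {rsid: call} mapping into a fixed-length vector of bin counts.

    Hypothetical scheme, not the published method: each (rsid, call) pair
    is hashed to one of `size` bins and that bin's count is incremented.
    """
    bins = [0] * size
    for rsid, call in genotype.items():
        # Sort alleles so that 'AG' and 'GA' count as the same call.
        key = f"{rsid}:{''.join(sorted(call))}"
        # MD5 serves only as a stable hash here (an assumption).
        digest = hashlib.md5(key.encode("utf-8")).digest()
        bins[int.from_bytes(digest[:8], "big") % size] += 1
    return bins


def similarity(fp_a, fp_b):
    """Pearson correlation between two fingerprints of equal length."""
    n = len(fp_a)
    mean_a, mean_b = sum(fp_a) / n, sum(fp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(fp_a, fp_b))
    var_a = sum((a - mean_a) ** 2 for a in fp_a)
    var_b = sum((b - mean_b) ** 2 for b in fp_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


# Made-up genotypes for illustration only.
person_a = {"rs0001": "AG", "rs0002": "CC", "rs0003": "TT", "rs0004": "AC"}
person_b = {"rs0001": "GA", "rs0002": "CT", "rs0003": "TT", "rs0004": "AA"}

print(similarity(fingerprint(person_a), fingerprint(person_b)))
```

Because the comparison operates on two short vectors rather than on full genotype lists, its cost depends on the fingerprint size and not on the number of SNPs on the array. When far more SNPs are genotyped than there are bins, many SNPs contribute to each bin, which is the intuition behind a fingerprint not revealing individual calls in a scheme of this kind.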