- Personal genome sequences contain the information required for assessing genetic risks, matching genetic backgrounds between cases and controls in medical research, and detecting duplicate individuals or close relatives for medical, legal, or historical reasons. Research purposes served by personal genome sequencing include classifying individuals by population, reconstructing human history, assessing and controlling the quality of the sequence information itself, computing kinship matrices to support genome-wide association studies (GWAS), and combining data sets for meta-analysis.
- Many of these applications involve comparison of two or more personal genomes. However, the size, complexity, and diversity of the representations in which personal genomes are stored make comparing them in their existing forms error-prone and slow, and therefore hard to scale from pairs to the hundreds, thousands, or millions of individuals we will soon wish to compare in order to provide improved, personalized medical care.
- We developed an ultra-fast method for comparing personal genomes; our method is akin to locality-sensitive hashing. We transform the standard genome representation (lists of variants relative to a reference) into 'genome fingerprints' that can be readily compared across sequencing technologies and reference versions. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. This enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicate sequenced genomes in a set, reconstructing populations, and many others. The original genome representation cannot be reconstructed from its fingerprint; the method thus has significant implications for privacy-preserving genome analytics.
- Questions, comments, bug reports, and suggestions for improvements or additional data sets are most welcome!
- Would you like to receive notification of updates to the Genome Fingerprints method? Follow via Twitter, or send me a note.
- If you find Genome Fingerprints useful for your work, please cite:
Glusman G, Mauldin DE, Hood L and Robinson M. Ultrafast comparison of personal genomes via precomputed genome fingerprints. Front. Genet. 2017 8:136.
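
To make the fingerprinting idea described above more concrete, here is a minimal Python sketch of one hypothetical way to reduce a variant list to a small, fixed-size summary and compare two such summaries. It is a toy illustration, not the published algorithm: the encoding (counting consecutive single-nucleotide variant pairs, binned by substitution types and by distance modulo `L`), the default `L=20`, and the names `fingerprint` and `correlation` are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt

BASES = "ACGT"
# The 12 possible single-nucleotide substitutions, e.g. "AG" means A>G.
SUBS = [ref + alt for ref in BASES for alt in BASES if ref != alt]
# Every ordered pair of substitutions: 144 keys.
KEYS = [first + second for first in SUBS for second in SUBS]


def fingerprint(variants, L=20):
    """Toy fingerprint (an assumption for illustration, not the published method).

    variants: iterable of (chrom, pos, ref, alt) tuples, sorted by chrom and pos.
    Returns a flat list of 144 * L counts, whatever the number of variants.
    """
    counts = Counter()
    prev = None
    for chrom, pos, ref, alt in variants:
        is_snv = len(ref) == 1 and len(alt) == 1 and ref in BASES and alt in BASES
        if not is_snv or ref == alt:
            continue  # ignore indels and anything that is not a simple substitution
        if prev is not None and prev[0] == chrom:
            key = prev[2] + prev[3] + ref + alt
            counts[(key, (pos - prev[1]) % L)] += 1
        prev = (chrom, pos, ref, alt)
    return [counts[(key, d)] for key in KEYS for d in range(L)]


def correlation(a, b):
    """Pearson correlation of two equal-length fingerprints (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Two tiny made-up variant lists, purely for demonstration.
genome_1 = [("chr1", 100, "A", "G"), ("chr1", 105, "C", "T"), ("chr1", 130, "G", "A")]
genome_2 = [("chr1", 100, "A", "G"), ("chr1", 107, "C", "T")]

fp_1 = fingerprint(genome_1)
fp_2 = fingerprint(genome_2)
print(len(fp_1), correlation(fp_1, fp_1), correlation(fp_1, fp_2))
```

In this toy scheme each fingerprint has the same length no matter how many variants went in, so comparing two genomes costs the same regardless of their size. Many different variant lists map to the same counts, which is the intuition behind not being able to recover the original variants from the summary.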