RCP
Reference Coverage Profiles (RCPs) are pre-computed multi-genome profiles of depth of coverage, derived from a large data set of high-quality whole-genome assemblies.
Importance
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons.
  • The raw data for these analyses amount to tens to hundreds of gigabytes per genome. Transmitting, storing and analyzing files this large is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a highly efficient representation of depth of coverage (150-1000x compression) that makes such analyses practical.
  • Current methods for analyzing variants in whole-genome sequencing data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method that identifies CNVs in individual genomes by comparing each genome to joint profiles pre-computed from a large set of genomes.
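The actual RCP storage format is described in the publication. The sketch below only illustrates, under our own assumptions, why depth of coverage compresses so well: per-base depth is averaged into bins, quantized, and run-length encoded, so long stretches of steady coverage collapse into a single (level, run length) pair. The function names, bin size and quantization step are hypothetical and are not taken from the RCP method.

```python
from itertools import groupby

def bin_depth(depth, bin_size):
    """Average per-base depth over non-overlapping bins (the last bin may be shorter)."""
    return [sum(depth[i:i + bin_size]) / len(depth[i:i + bin_size])
            for i in range(0, len(depth), bin_size)]

def quantize(values, step):
    """Round each binned depth to the nearest multiple of `step` (lossy); store the multiplier."""
    return [int(round(v / step)) for v in values]

def run_length_encode(levels):
    """Collapse consecutive identical levels into (level, run_length) pairs."""
    return [(level, sum(1 for _ in run)) for level, run in groupby(levels)]

def run_length_decode(runs):
    """Expand (level, run_length) pairs back into one level per bin."""
    return [level for level, n in runs for _ in range(n)]

# Hypothetical example: 2,500 bases at 40x with a 500-base region at 20x.
depth = [40] * 1000 + [20] * 500 + [40] * 1000
runs = run_length_encode(quantize(bin_depth(depth, bin_size=100), step=5))
print(runs)  # [(8, 10), (4, 5), (8, 10)]
```

Here 2,500 per-base values shrink to three pairs; real coverage is noisier, so the achievable ratio depends heavily on the chosen bin size and quantization step.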
Information

  • We analyzed depth of coverage in more than 6,000 high-quality (>40x) genomes. Depth of coverage shows strong sequence-specific fluctuations that are only partially explained by global parameters such as %GC.
  • To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used.
  • Normalizing an individual genome's scaled coverage to the RCP, followed by hidden Markov model (HMM) segmentation, enables efficient detection of CNVs and large deletions.
  • For more detail, see the publication in Frontiers in Genetics.
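The normalization and segmentation steps can be sketched as follows. This is a simplified stand-in under our own assumptions, not the published model: each position's ratio is observed depth divided by the RCP's expected diploid depth (so about 1.0 means two copies), hidden states are integer copy numbers 0 to 4, emissions are Gaussian with mean CN/2 and a fixed sigma, and a "sticky" transition probability favors long segments. All names and parameter values are hypothetical.

```python
import math
from itertools import groupby

def normalize(sample, rcp, min_expected=1e-6):
    """Observed / expected diploid depth per position; ~1.0 means two copies."""
    return [s / max(r, min_expected) for s, r in zip(sample, rcp)]

def viterbi_copy_number(ratios, states=(0, 1, 2, 3, 4), sigma=0.15, p_stay=0.99):
    """Most likely copy-number path under a simple Gaussian-emission HMM (log space)."""
    n = len(states)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (n - 1))  # switching mass split evenly
    log_norm = math.log(sigma * math.sqrt(2 * math.pi))

    def log_emit(x, cn):
        mu = cn / 2.0  # expected ratio for `cn` copies relative to diploid
        return -0.5 * ((x - mu) / sigma) ** 2 - log_norm

    def trans(i, j):
        return log_stay if i == j else log_switch

    score = [math.log(1.0 / n) + log_emit(ratios[0], cn) for cn in states]
    back = []  # back[t][j] = best previous state index for state j at step t+1
    for x in ratios[1:]:
        new_score, ptr = [], []
        for j, cn in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j) + log_emit(x, cn))
            ptr.append(best_i)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[k] for k in path]

def segments(path):
    """Collapse a per-position path into (start, end_exclusive, copy_number) segments."""
    out, start = [], 0
    for cn, run in groupby(path):
        length = sum(1 for _ in run)
        out.append((start, start + length, cn))
        start += length
    return out

# Hypothetical example: expected depth varies by position; positions 20-39 are hemizygous.
rcp = [30] * 20 + [50] * 20 + [30] * 20
sample = [30] * 20 + [25] * 20 + [30] * 20
print(segments(viterbi_copy_number(normalize(sample, rcp))))
# [(0, 20, 2), (20, 40, 1), (40, 60, 2)]
```

Dividing by a position-specific expected depth, rather than a genome-wide average, is what lets the sketch call the deletion even though the raw depth there (25x) is lower than elsewhere only relative to its own expected value (50x).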
Downloadables
Communication

  • Questions, comments, bug reports, and suggestions for improvements or additional data sets are most welcome!
  • Would you like to be notified of updates to the coverage analysis methods? Follow via Twitter, or send me a note.
  • If you find the RCP method useful for your work, please cite:
    Glusman G, Severson A, Dhankani V, Robinson M, Farrah T, Mauldin DE, Stittrich AB, Ament SA, Roach JC, Brunkow ME, Bodian DL, Vockley JG, Shmulevich I, Niederhuber JE, Hood L. Identification of copy number variants in whole-genome data using Reference Coverage Profiles. Frontiers in Genetics 2015; 6:45. doi:10.3389/fgene.2015.00045.