This README file provides detailed instructions on how to use the Reference Coverage Profile normalization method. Many details will of course depend on your specific machine and environment setup; how long each step takes will depend on the speed of your machine. Memory usage is very low for all scripts.

Code and resources can be downloaded from:

    http://db.systemsbiology.net/gestalt/coverage/

Download the resources and install into a directory called "coverage"; put the scripts in coverage/bin, the RCPs in coverage/RCPs; other files in the coverage directory itself.

Suppose you want to analyze coverage in a CGI genome assembly (let's call it CGIASM) and an Illumina genome assembly (let's call it ILMASM). For the CGI genome you'll need to locate the ASM directory - let's call that ASMDIR. For Illumina, you need to locate the BAM file - let's call it BAMFILE.

Make sure the resources are available:

    % ls coverage
    bin  GCbuckets.hg19.gz  GCprofile.hg19.gz  GCprofile.hg19.gz.tbi  RCPs  README

Make sure HMMSeg is installed:

    % ls tools/HMMSeg

If you have it installed elsewhere, you'll need to edit line 18 of coverage/bin/segmentCoverage.pl, or just create a softlink from tools/HMMSeg to the directory where you have HMMSeg installed. If you don't have HMMSeg, you'll have to obtain it from http://noble.gs.washington.edu/proj/hmmseg/

Create a temporary directory for storing results:

    % mkdir tmp

Compute the condensed coverage for the sample:

    % coverage/bin/condenseCoverage.pl ASMDIR/REF tmp/CGIASM
    % coverage/bin/condenseCoverage.pl BAMFILE tmp/ILMASM

These commands will each take a couple of hours to run, depending on the speed of your machine. On our development machine, they take slightly less than two hours each.
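Because the condense step runs for hours, it may be worth confirming the prerequisites listed above before launching it. The following is a minimal sketch, not part of the distributed scripts: check_resources is a hypothetical helper name, and the file list is simply copied from the "ls coverage" listing in this README.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the coverage scripts).
# Reports every expected resource missing from the given directory.
# Returns 0 if everything is present, 1 otherwise.
check_resources() {
    dir=$1
    rc=0
    for f in bin GCbuckets.hg19.gz GCprofile.hg19.gz GCprofile.hg19.gz.tbi RCPs README; do
        if [ ! -e "$dir/$f" ]; then
            # Name the missing entry on stderr and remember the failure.
            echo "missing: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

One possible use is "check_resources coverage && ls tools/HMMSeg", which stops at the first prerequisite that is not in place.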
Compute summary statistics (needed for the normalization step):

    % coverage/bin/totalCoverageByGC.pl tmp/CGIASM coverage/GCbuckets.hg19.gz
    % coverage/bin/totalCoverageByGC.pl tmp/ILMASM coverage/GCbuckets.hg19.gz

Each of these commands takes slightly over two minutes on our system. The output files are written to the same directory as the input (tmp in this case). To write them elsewhere, give an alternative output directory as a third parameter.

Normalize the coverage to the RCP:

    % coverage/bin/normalizeCoverage.pl tmp/CGIASM coverage/RCPs/CGI-10.rcp.gz
    % coverage/bin/normalizeCoverage.pl tmp/ILMASM coverage/RCPs/Illumina.rcp.gz

Each of these commands takes slightly under two minutes on our system. The output files are written to the same directory as the input (tmp in this case). To write them elsewhere, give an alternative output directory as a third parameter.

Segment the coverage using a hidden Markov model:

    % coverage/bin/segmentCoverage.pl tmp/CGIASM
    % coverage/bin/segmentCoverage.pl tmp/ILMASM

Each of these commands takes half a minute on our system. The output files are written to the same directory as the input (tmp in this case). To write them elsewhere, give an alternative output directory as a second parameter.
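The four steps above are the same for every sample apart from the input and the RCP, so they can be generated from three values. The sketch below only prints the commands; pipeline_commands is a hypothetical helper, while the script names, argument order, and resource paths are copied from this README. It assumes the paths contain no spaces and that output goes to the tmp directory created earlier.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the coverage scripts).
# Prints, in order, the four pipeline commands for one sample.
pipeline_commands() {
    name=$1     # sample label, e.g. CGIASM
    input=$2    # ASMDIR/REF for CGI, or the BAM file for Illumina
    rcp=$3      # matching RCP, e.g. coverage/RCPs/CGI-10.rcp.gz
    out=tmp/$name
    echo "coverage/bin/condenseCoverage.pl $input $out"
    echo "coverage/bin/totalCoverageByGC.pl $out coverage/GCbuckets.hg19.gz"
    echo "coverage/bin/normalizeCoverage.pl $out $rcp"
    echo "coverage/bin/segmentCoverage.pl $out"
}
```

For example, "pipeline_commands ILMASM BAMFILE coverage/RCPs/Illumina.rcp.gz" lists the Illumina commands for review; piping that output to "sh -e" would run them and stop at the first failure.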