mergeReps

mergeReps.README
Institute for Systems Biology
(c) Trey Ideker, October 2000


mergeReps [OPTIONS] <fileTable> <mergedOutput>

This script merges data from replicate microarray experiments, contained in separate files produced by the 'preprocess' script, into a single <mergedOutput> file. Data files are listed by name in the <fileTable>, along with other required information, according to the format described below.

FILETABLE FORMAT: Each row contains three columns: <filename>, <labeling_direction>, and <slide_ID>. Each <filename> pertains to a different replicate data set output by the 'preprocess' script. <Labeling_direction> may be f (forward) or r (reverse) and is used to group files according to which of the two dyes (X or Y) was used to represent condition i vs. ii. For reverse-labeled data, the program will reassign intensity measurements for each gene so that x->y and y->x. <Slide_ID> is an alphanumeric identifier assigned to each distinct microarray slide; files that come from the same slide share the same identifier. Comments may be included in the filetable if they are preceded by a number sign '#'. For example, the following filetable lists four files containing intensity data from four replicate microarray experiments that compare conditions i and ii:

#######################################
# fname            dir             id #
#######################################
processed file 1     f               1
processed file 2     f               1
processed file 3     r               2
processed file 4     r               2
#######################################

In the first two files, dye X represents condition i and dye Y represents condition ii, while in the last two files this mapping is reversed. The first two files (and likewise the last two files) contain data from replicate microarrays printed next to each other on the same slide, so they have identical slide IDs. The first three lines and the last line of the filetable are comments.
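The filetable rules above can be sketched in Python as follows. This is only an illustration, not the script's actual parser: the function names parse_filetable and orient are hypothetical, and the sketch assumes that the last two whitespace-separated fields on each row are the labeling direction and slide ID, with everything before them taken as the filename.

```python
def parse_filetable(text):
    """Return a list of (filename, direction, slide_id) tuples.

    Assumption: the last two whitespace-separated fields on a row are the
    labeling direction and the slide ID; the rest is the filename.
    """
    rows = []
    for line in text.splitlines():
        # Everything after a number sign is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        filename, direction, slide_id = line.rsplit(None, 2)
        if direction not in ("f", "r"):
            raise ValueError("labeling direction must be 'f' or 'r': %r" % direction)
        rows.append((filename, direction, slide_id))
    return rows


def orient(x, y, direction):
    """Swap x and y for reverse-labeled data so that x always refers to
    the same condition across all replicates."""
    return (y, x) if direction == "r" else (x, y)
```

With the example filetable, parse_filetable would yield four rows, and orient(x, y, "r") would be applied to every intensity pair read from the two reverse-labeled files.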

OUTLIER REJECTION: By default, replicate measurements for each gene are filtered to reject outliers according to Dixon's test. Outlier rejection is performed separately for the x replicates and the y replicates, and only if 3 or more replicates are available. Intensity pairs in which either x or y is an outlier are flagged with the symbol 'O' in the output file. Outlier rejection may be disabled with '-filter off' (see below).
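For readers unfamiliar with Dixon's test, the following Python sketch shows one common form of it (the Q test on the gap between the most extreme value and its nearest neighbor). It is an assumption-laden illustration: this README does not say which variant of the test, which confidence level, or which scale (raw or log intensity) mergeReps uses. The table Q95 holds commonly tabulated two-sided 95% critical values, and the names dixon_outlier and flag_pairs are hypothetical.

```python
# Commonly tabulated two-sided critical values of Dixon's Q at 95% confidence.
# Assumption: the confidence level used by mergeReps is not stated in the README.
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
       7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_outlier(values, table=Q95):
    """Return the index of the most extreme value if it fails the Q test,
    otherwise None. Nothing is rejected with fewer than 3 replicates."""
    n = len(values)
    if n < 3 or n not in table:
        return None
    order = sorted(range(n), key=lambda i: values[i])
    lo, hi = values[order[0]], values[order[-1]]
    spread = hi - lo
    if spread == 0:
        return None
    q_low = (values[order[1]] - lo) / spread    # gap below the second-smallest value
    q_high = (hi - values[order[-2]]) / spread  # gap above the second-largest value
    q, idx = (q_low, order[0]) if q_low > q_high else (q_high, order[-1])
    return idx if q > table[n] else None


def flag_pairs(xs, ys):
    """Test x and y replicates separately; flag a pair 'O' if either
    member of the pair is an outlier, '-' otherwise."""
    bad = {dixon_outlier(xs), dixon_outlier(ys)}
    return ["O" if i in bad else "-" for i in range(len(xs))]
```

Note that with only three replicates the critical value is very high (0.970 in this table), so a value is rejected only when the other two replicates nearly coincide.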

COMMAND LINE OPTIONS:

  -opt <num>    Produce output for error model optimization using VERA. Only 
                returns those genes that are represented by at least <num>
                replicate measurements in the merged data set and which are not
                associated with any saturated intensity measurements (S flag).

  -filter {on,off}  Filter replicate measurements for each gene by performing 
                a statistical test to reject outliers (see above description).
                The default value is 'on'.

  -exclude <gene file>  Do NOT output genes listed in <gene file>. Genes can be 
                specified using either the gene name or description, one gene 
                per row in <gene file>. This option is useful for eliminating
                spots on the microarray that are no longer used or which
                represent deprecated genes.
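The selection rules behind -opt and -exclude can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the script's code: the helper names are hypothetical, each gene is assumed to carry a list of per-replicate flag strings, and the gene file is assumed to hold exactly one name or description per row.

```python
def load_excluded(text):
    """Read the contents of a gene file: one gene name or description per row."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_excluded(gene, descript, excluded):
    """A gene is dropped if either its name or its description is listed."""
    return gene in excluded or descript in excluded


def keep_for_opt(flags, min_reps):
    """-opt <num>: keep a gene only if it has at least <num> replicates and
    none of them carries the saturation flag 'S'."""
    return len(flags) >= min_reps and not any("S" in f for f in flags)
```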

OUTPUT FORMAT EXAMPLE: Each row summarizes the replicate information for a particular gene. Column 'N' lists the number of available replicates, while column 'S' lists the total number of slides these replicates were taken from (N does not necessarily equal S, since one slide may carry several replicates). 'RATIO' reports the average log ratio of these replicates, and 'STD' reports the standard deviation of the log ratio. The remaining columns list each (x,y) replicate along with that replicate's flags (columns 'F'). In the example, three replicate measurements per gene were analyzed.

   GENE DESCRIPT | N S RATIO STD |   X0   Y0  F0    X1   Y1  F1    X2   Y2  F2
------- -------- | - - ----- --- | ---- ---- ---  ---- ---- ---  ---- ---- ---
YCL052C     PBN1   4 2 -0.34 0.6    161 2396   -  2931 5322   - 14721 11890  -
YGR148C   RPL24B   3 2 -0.36 0.5    161 1254   -  3631 2464   - 10829 17113  O 

...

YIR011C     STS1   3 2 -0.18 0.2     55  204  YX   685 1797   -  6571  8651  -
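The summary columns can be illustrated with the Python sketch below. Several details are assumptions, because this README does not specify them: the logarithm base (base 10 here), the orientation of the ratio (x/y here), the use of the sample standard deviation, and whether flagged replicates or any normalization enter the average. The sketch is therefore not expected to reproduce the RATIO and STD values in the example above; the function name summarize is hypothetical.

```python
import math
import statistics


def summarize(replicates):
    """Summarize one gene.

    replicates: list of (x, y, slide_id) tuples with positive intensities.
    Returns (N, S, RATIO, STD), where STD is None for a single replicate.
    """
    n = len(replicates)                            # N: number of replicates
    s = len({slide for _, _, slide in replicates})  # S: number of distinct slides
    # Assumption: base-10 log of x/y for each replicate.
    log_ratios = [math.log10(x / y) for x, y, _ in replicates]
    ratio = statistics.mean(log_ratios)
    std = statistics.stdev(log_ratios) if n > 1 else None
    return n, s, ratio, std
```

For instance, three replicates drawn from two slides give N = 3 and S = 2, which is why the two columns can differ.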