VERAandSAM.README
Institute for Systems Biology
thorsson@systemsbiology.org
February, 2001

OVERVIEW

VERA and SAM are a pair of programs that use microarray data to determine whether a given gene is expressed at different levels in two cell populations.

This README includes a brief description of VERA and SAM followed by notes on usage.

VERA takes data from replicate microarray experiments and describes the overall variability in the data with five parameters, called error-model parameters. VERA fits these parameters to the data by starting from an initial guess and refining them iteratively until they converge.

The parameters are:
sigma_epsilon_x : Standard deviation of multiplicative error in X (the 1st dye)
sigma_epsilon_y : Standard deviation of multiplicative error in Y (the 2nd dye)
rho_epsilon     : Correlation between multiplicative errors
sigma_delta_x   : Standard deviation of additive error in X (the 1st dye)
sigma_delta_y   : Standard deviation of additive error in Y (the 2nd dye)
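
To make the roles of the five parameters concrete, here is a minimal Python sketch that simulates one replicate measurement for a gene. It assumes the additive-plus-multiplicative model of the reference publication, x = mu_x*(1 + eps_x) + delta_x and y = mu_y*(1 + eps_y) + delta_y, where (eps_x, eps_y) are bivariate normal with correlation rho_epsilon and delta_x, delta_y are independent zero-mean normals. The function simulate_spot and the dictionary EXAMPLE_PARAMS are hypothetical, not part of VERA. The example values come from the ErrorModel example later in this README, assuming its numbers appear in the order listed above.

```python
import math
import random

# Hypothetical parameter set; values from the ErrorModel example, in the listed order.
EXAMPLE_PARAMS = {
    "sigma_epsilon_x": 0.3, "sigma_epsilon_y": 0.4, "rho_epsilon": 0.8,
    "sigma_delta_x": 200.0, "sigma_delta_y": 100.0,
}


def simulate_spot(mu_x, mu_y, params, rng):
    """Draw one simulated (X, Y) pair for a gene with true levels mu_x, mu_y.

    Assumed model: x = mu_x*(1 + eps_x) + delta_x, y = mu_y*(1 + eps_y) + delta_y,
    with (eps_x, eps_y) bivariate normal (correlation rho_epsilon) and
    delta_x, delta_y independent zero-mean normals.
    """
    sx = params["sigma_epsilon_x"]
    sy = params["sigma_epsilon_y"]
    rho = params["rho_epsilon"]
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    eps_x = sx * z1
    # Mixing z1 into eps_y gives corr(eps_x, eps_y) = rho.
    eps_y = sy * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    x = mu_x * (1.0 + eps_x) + rng.gauss(0.0, params["sigma_delta_x"])
    y = mu_y * (1.0 + eps_y) + rng.gauss(0.0, params["sigma_delta_y"])
    return x, y


# Example: three replicates of a gene whose true X level is 10 times its Y level.
rng = random.Random(0)
replicates = [simulate_spot(5000.0, 500.0, EXAMPLE_PARAMS, rng) for _ in range(3)]
```

The multiplicative terms dominate for bright spots, while the additive terms (sigma_delta_x, sigma_delta_y) dominate near background, which is why both kinds of error appear in the model.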

SAM computes a value, lambda, for each gene on an array, which indicates how likely it is that the gene is expressed differently in the two cell populations. A large value of lambda means that the gene is almost certainly expressed differentially, while a value close to 0 indicates that there is no evidence for differential expression. A threshold for differential expression, lambda_c, should be determined from control experiments. In the reference publication below, lambda_c was set to the value exceeded by 0.1% of the genes in a control experiment with identical conditions for both cell populations (lambda_c = 23.8).
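
The threshold rule from the reference publication amounts to picking the lambda value that only 0.1% of control genes exceed. The Python sketch below assumes you already have one lambda per gene from a SAM run on a control experiment; the function names are hypothetical and not part of SAM.

```python
def threshold_from_control(control_lambdas, fraction=0.001):
    """Return lambda_c such that at most `fraction` of control genes have lambda > lambda_c."""
    if not control_lambdas:
        raise ValueError("need at least one control lambda value")
    ranked = sorted(control_lambdas, reverse=True)
    allowed = int(fraction * len(ranked))  # number of control genes permitted above lambda_c
    return ranked[allowed]


def differentially_expressed(lambdas, lambda_c):
    """Indices of genes whose lambda exceeds the threshold lambda_c."""
    return [i for i, lam in enumerate(lambdas) if lam > lambda_c]
```

With 1000 control genes and fraction = 0.001, exactly one control gene is allowed above lambda_c, so lambda_c is the second-largest control value.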

Reference:
T. E. Ideker, V. Thorsson, A. F. Siegel, and L. E. Hood. Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. Journal of Computational Biology 7, 805-817 (2000).


RUNNING VERA

VERA [OPTIONS] <mergedFile> <ErrorModel>
Input file format: File providing the (X,Y) expression levels for each gene and replicate experiment, e.g. as produced by the mergeReps script. Please see the file-format specification for details.

Output file format: The ErrorModel output file lists the five error-model parameters, for example:
beta
0.3 0.4 0.8 200 100
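
A small Python sketch for reading and writing this file, for example to prepare a starting point for the -init option. It assumes the layout shown above (a "beta" header line followed by one line of five whitespace-separated numbers), that the numbers appear in the order in which the parameters are listed in the OVERVIEW, and that -init accepts the same format. None of these details is confirmed beyond the example, and the function names are hypothetical.

```python
# Assumed order of the five numbers, matching the parameter list in the OVERVIEW.
PARAM_NAMES = ("sigma_epsilon_x", "sigma_epsilon_y", "rho_epsilon",
               "sigma_delta_x", "sigma_delta_y")


def parse_error_model(text):
    """Parse ErrorModel text into a dict keyed by parameter name."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 or lines[0] != "beta":
        raise ValueError("expected a 'beta' header line followed by five numbers")
    values = [float(tok) for tok in lines[1].split()]
    if len(values) != len(PARAM_NAMES):
        raise ValueError(f"expected {len(PARAM_NAMES)} parameters, got {len(values)}")
    return dict(zip(PARAM_NAMES, values))


def format_error_model(params):
    """Write a parameter dict back out in the same two-line layout."""
    return "beta\n" + " ".join(f"{params[name]:g}" for name in PARAM_NAMES) + "\n"
```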

Command line options:
  -evol               Generate a file showing how the parameters converge.
  -init <ErrorModel>  Specify your own initial values for the parameter
                      optimization.
  -crit <number>      Stop the optimization when all parameter changes in an
                      iteration step are smaller than <number>.
  -iter               Display details of the optimization (use for
                      debugging only).
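
The -crit rule can be pictured as a generic iterate-until-stable loop. The Python sketch below is not VERA's optimizer: update is a hypothetical stand-in for one optimization step, and the toy example simply moves each parameter halfway toward a fixed target. Following the wording of -crit, it treats the run as converged when every parameter changed by less than crit; it assumes absolute changes are compared. The optional history list loosely mirrors what -evol records.

```python
def has_converged(old, new, crit):
    """True when every parameter changed by less than crit in absolute value."""
    return all(abs(n - o) < crit for o, n in zip(old, new))


def optimize(initial, update, crit=1e-4, max_iter=1000, history=None):
    """Apply update() repeatedly until has_converged() holds.

    update is a hypothetical stand-in for one optimization step: it takes the
    current parameter list and returns a new one. If history is a list, each
    iterate is appended to it.
    """
    params = list(initial)
    for step in range(1, max_iter + 1):
        new_params = list(update(params))
        if history is not None:
            history.append(new_params)
        if has_converged(params, new_params, crit):
            return new_params, step
        params = new_params
    raise RuntimeError(f"no convergence within {max_iter} iterations")


# Toy update: move each parameter halfway toward a fixed target.
target = [0.3, 0.4, 0.8, 200.0, 100.0]
halfway = lambda p: [pi + (ti - pi) / 2 for pi, ti in zip(p, target)]
fitted, steps = optimize([0.1, 0.1, 0.0, 10.0, 10.0], halfway, crit=1e-6)
```

If the criterion is applied to absolute changes as assumed here, note that the parameters span very different scales (fractions for the multiplicative terms, intensity units for the additive ones), so a single <number> constrains them unevenly.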

RUNNING SAM

SAM [OPTIONS] <mergedFile> <ErrorModel> <mergedFileSignificance>
Input file format: File providing the (X,Y) expression levels for each gene and replicate experiment, e.g. as produced by the mergeReps script. Please see the file-format specification for details.

Output file format: <mergedFileSignificance> contains all of the information in <mergedFile>, but with five additional columns appended:
<mu_X> <mu_Y> <lambda> <muRatio> <T>

<mu_X>          mean intensity for first dye
<mu_Y>          mean intensity for second dye
<lambda>        likelihood of differential expression, i.e. that <mu_X> differs from <mu_Y>
<muRatio>       log_10( mu_X / mu_Y )   (unless ratio was tempered, see below)
<T>             'T' if <muRatio> was tempered, '-' if not

If the mean intensity for either dye falls below a threshold determined by the background, the <muRatio> column shows a "tempered" value instead of the plain ratio.
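
The Python sketch below reads rows of <mergedFileSignificance> and keeps genes whose lambda exceeds a threshold. It assumes whitespace-separated columns, no header line, and a gene identifier in the first column; these are illustrative assumptions, so check them against the file-format specification. The function names are hypothetical. The default lambda_c = 23.8 is the value from the reference publication and should be replaced by one derived from your own control experiments.

```python
import math


def parse_significance_row(line):
    """Split a row into its original fields and the five columns appended by SAM:
    mu_X, mu_Y, lambda, muRatio, T."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("row has fewer than the five columns appended by SAM")
    mu_x, mu_y, lam, mu_ratio = (float(f) for f in fields[-5:-1])
    tempered = fields[-1] == "T"
    return fields[:-5], mu_x, mu_y, lam, mu_ratio, tempered


def select_genes(lines, lambda_c=23.8):
    """Return (gene_id, lambda, muRatio, tempered) for rows with lambda > lambda_c."""
    hits = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        original, _, _, lam, mu_ratio, tempered = parse_significance_row(line)
        if lam > lambda_c:
            gene_id = original[0] if original else None
            hits.append((gene_id, lam, mu_ratio, tempered))
    return hits


def untempered_ratio(mu_x, mu_y):
    """log_10(mu_X / mu_Y), the value muRatio holds when T is '-'."""
    return math.log10(mu_x / mu_y)
```

For rows marked 'T', the muRatio value is the tempered alternative, so it should not be expected to match untempered_ratio.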

Command line options:
  -iter               Display details of the optimization (use for
                      debugging only).