VERAandSAM.README
Institute for Systems Biology
mjohnson@systemsbiology.org
October, 2001
OVERVIEW
VERA and SAM are a pair of programs that provide
a method to determine whether any given gene is expressed at a different level
in one cell population than in another according to microarray data.
This README includes a brief description of
VERA and SAM followed by notes on usage.
VERA takes data from replicate microarray
experiments, and describes the overall variability in the data in terms of five
parameters, called error-model parameters. Error-model parameters are fitted to
the data by starting from an initial guess, and optimizing them in iterated
steps until they have converged.
The parameters are:
sigma_epsilon_x : Standard deviation of multiplicative error in X (the 1st dye)
sigma_epsilon_y : Standard deviation of multiplicative error in Y (the 2nd dye)
rho_epsilon : Correlation between multiplicative errors
sigma_delta_x : Standard deviation of additive error in X (the 1st dye)
sigma_delta_y : Standard deviation of additive error in Y (the 2nd dye)
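To make the roles of the five parameters concrete, the Python sketch below
simulates replicate measurements for a single gene. It assumes the error model
has the form x = mu_x + mu_x * eps_x + delta_x (and likewise for y), with
eps_x and eps_y correlated by rho_epsilon and the additive errors drawn
independently. That form is inferred from the parameter descriptions above.
The function name simulate_replicates is hypothetical and is not part of VERA.

```python
import math
import random


def simulate_replicates(mu_x, mu_y, beta, n, seed=0):
    """Draw n simulated (x, y) replicate pairs for one gene.

    beta = (sigma_epsilon_x, sigma_epsilon_y, rho_epsilon,
            sigma_delta_x, sigma_delta_y)

    Assumed model (an illustration, not taken from the VERA source):
        x = mu_x + mu_x * eps_x + delta_x
        y = mu_y + mu_y * eps_y + delta_y
    """
    s_ex, s_ey, rho, s_dx, s_dy = beta
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Build two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        eps_x = s_ex * z1
        eps_y = s_ey * z2
        # Additive errors are independent of each other in this sketch.
        delta_x = rng.gauss(0.0, s_dx)
        delta_y = rng.gauss(0.0, s_dy)
        pairs.append((mu_x + mu_x * eps_x + delta_x,
                      mu_y + mu_y * eps_y + delta_y))
    return pairs
```

With all five parameters set to zero, every simulated pair equals
(mu_x, mu_y) exactly. Larger sigma_epsilon values spread bright spots more
widely, while the sigma_delta values dominate the spread of faint spots.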
SAM gives a value, lambda, for each gene on
an array, which describes how likely it is that the gene is expressed
differently in the two cell populations. A large value of lambda means that the
gene is almost certainly expressed differentially, while a value close to 0
indicates that there is no evidence for differential expression. A threshold
value for differential expression, lambda_c, should be determined from control
experiments. In the reference publication below, lambda_c was fixed at the
value at which 0.1% of the genes were called differentially expressed in a
control experiment with identical conditions for both cell populations
(lambda_c = 23.8).
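One way to derive such a threshold from a control experiment is sketched
below in Python. It takes the lambda values that SAM reported for a control
(identical conditions in both channels) and returns a cutoff that at most a
given fraction of control genes exceed. The function name lambda_threshold
and the exact handling of ties are assumptions for illustration. This is not
necessarily the procedure used in the reference publication.

```python
def lambda_threshold(control_lambdas, fraction=0.001):
    """Return a cutoff that at most `fraction` of control lambdas exceed."""
    if not control_lambdas:
        raise ValueError("need at least one control lambda value")
    ordered = sorted(control_lambdas)
    # Number of control genes allowed to lie strictly above the cutoff.
    # The small offset guards against floating-point round-down.
    n_above = int(fraction * len(ordered) + 1e-9)
    # The cutoff is the largest value that is not allowed to exceed it.
    return ordered[len(ordered) - n_above - 1]
```

For example, with 1000 control genes and fraction=0.001, the cutoff is the
second-largest control lambda, so exactly one control gene (0.1%) lies above
it when there are no ties.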
Reference:
T. E. Ideker, V. Thorsson, A. F. Siegel, and L. E. Hood. Testing for
differentially-expressed genes by maximum-likelihood analysis of microarray
data. Journal of Computational Biology 7, 805-817 (2001).
RUNNING VERA
Chain of Commands:
Input file format: File providing the (X,Y) expression levels for each
gene and replicate experiment, e.g. as produced by the mergeReps script. Please
see the file-format specification for details.
Output file format: The ErrorModel output file lists the five error-model
parameters, for example:
beta
0.3 0.4 0.8 200 100
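For downstream scripting, an ErrorModel file like the example above can be
read with a few lines of Python. The sketch assumes that the five numbers
appear in the same order as the parameter list in the OVERVIEW section
(sigma_epsilon_x, sigma_epsilon_y, rho_epsilon, sigma_delta_x,
sigma_delta_y). That ordering is an assumption and should be checked against
the file-format specification. The function name parse_error_model is
hypothetical.

```python
# Assumed column order; verify against the file-format specification.
PARAM_NAMES = (
    "sigma_epsilon_x",
    "sigma_epsilon_y",
    "rho_epsilon",
    "sigma_delta_x",
    "sigma_delta_y",
)


def _as_floats(line):
    """Return the line's tokens as floats, or None if any token is not numeric."""
    try:
        return [float(token) for token in line.split()]
    except ValueError:
        return None


def parse_error_model(text):
    """Parse ErrorModel file contents into a {parameter_name: value} dict."""
    for line in text.splitlines():
        values = _as_floats(line)
        # Skip blank lines and non-numeric header lines such as "beta".
        if not values:
            continue
        if len(values) != len(PARAM_NAMES):
            raise ValueError("expected 5 parameters, found %d" % len(values))
        return dict(zip(PARAM_NAMES, values))
    raise ValueError("no numeric parameter line found")
```

Calling parse_error_model("beta\n0.3 0.4 0.8 200 100\n") returns a dict in
which rho_epsilon is 0.8 and sigma_delta_x is 200.0.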
VERA Options:
VERA options can be accessed by
clicking "Options..." on the main dialog
Option #1- Use if you would like to generate a file
showing how the parameters converge. The
file name and path are also displayed.
Option #2- Optimization ceases when all changes
after an iteration step are less than <number>.
Option #3- Use if you would like to specify your own
initial choices for parameter optimization.
Upon checking the box, the file name will
be requested.
Option #4- Display details of optimization (Use for
debugging only). ".VERAdebug" will be the file
extension.
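The stopping rule in Option #2 and the convergence record in Option #1 can be
illustrated with a short Python sketch. It treats "changes" as absolute
differences between successive parameter values, which is an assumption. VERA
may measure change differently. The names has_converged and
iterate_until_converged, and the step callback standing in for one
optimization step, are hypothetical.

```python
def has_converged(previous, current, tolerance):
    """True when every parameter changed by less than `tolerance`."""
    return all(abs(c - p) < tolerance for p, c in zip(previous, current))


def iterate_until_converged(step, initial, tolerance, max_iterations=1000):
    """Apply `step` repeatedly until all parameter changes are below tolerance.

    Returns the final parameters and the full history of parameter
    vectors, which plays the role of the convergence file.
    """
    params = list(initial)
    history = [tuple(params)]
    for _ in range(max_iterations):
        new_params = list(step(params))
        history.append(tuple(new_params))
        done = has_converged(params, new_params, tolerance)
        params = new_params
        if done:
            break
    return params, history
```

A smaller tolerance gives more precise parameters at the cost of more
iterations. The max_iterations cap is a safeguard added in this sketch so
that a non-converging step function cannot loop forever.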
RUNNING SAM
Chain of Commands:
Input file format: File providing the (X,Y) expression levels for each
gene and replicate experiment, e.g. as produced by the mergeReps script. Please
see the file-format specification for details.
Output file format: <mergedFileSignificance> contains all of the
information in <mergedFile>, but with five additional columns appended:
<mu_X> <mu_Y> <lambda> <muRatio> <T>
<mu_X> mean intensity for first dye
<mu_Y> mean intensity for second dye
<lambda> likelihood of differential expression, i.e. that <mu_X> differs from <mu_Y>
<muRatio> log_10( mu_X / mu_Y ) (unless ratio was tempered, see below)
<T> 'T' if <muRatio> was tempered, '-' if not
The column <muRatio>
displays a "tempered" alternative to the ratio if the mean intensity
of either dye falls below a threshold given by the background.
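Once a threshold lambda_c has been chosen, the significance file can be
filtered for differentially expressed genes. The Python sketch below relies
only on the fact stated above, that the five columns are appended at the end
of each row, so <lambda> is the third column from the right. It assumes the
file is whitespace-delimited, which should be checked against the file-format
specification. The function name differentially_expressed is hypothetical.

```python
def differentially_expressed(lines, lambda_c):
    """Return (original_columns, lambda) for rows with lambda > lambda_c.

    Each row is expected to end with: mu_X mu_Y lambda muRatio T
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or truncated line
        try:
            lam = float(fields[-3])
        except ValueError:
            continue  # header line or non-numeric lambda
        if lam > lambda_c:
            # Keep the columns that came from the merged input file.
            hits.append((fields[:-5], lam))
    return hits
```

A typical call would pass an open file object as lines and the threshold
from the control experiment as lambda_c, for example 23.8 as in the
reference publication.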
SAM options:
SAM options can be accessed by
clicking "Options..." on the main dialog
Option #1- Select if the SAM input is different from
the VERA output.
Option #2- Display details of optimization (Use for
debugging only).