
VERA and SAM
Finding significant expression differences in DNA microarray data

VERA - Variability and ERror Assessment
Estimates error model parameters from replicated, preprocessed experiments.
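For orientation, the publication models each measurement with both a multiplicative and an additive error term. The block below sketches a model of that general form in our own notation, which may differ from the paper's exact parameterization: gene j, replicate i, two samples x and y, true expression levels μ, and zero-mean normal error terms ε (multiplicative) and δ (additive). It assumes the multiplicative and additive terms are independent of each other. Under that assumption, the last line gives the variance of a single measurement.

```latex
\begin{aligned}
x_{ij} &= \mu_{xj} + \mu_{xj}\,\varepsilon_{xij} + \delta_{xij}, \\
y_{ij} &= \mu_{yj} + \mu_{yj}\,\varepsilon_{yij} + \delta_{yij}, \\
\operatorname{Var}(\varepsilon_{xij}) &= \operatorname{Var}(\varepsilon_{yij}) = \sigma_\varepsilon^2, \quad
\operatorname{Corr}(\varepsilon_{xij}, \varepsilon_{yij}) = \rho_\varepsilon, \\
\operatorname{Var}(\delta_{xij}) &= \operatorname{Var}(\delta_{yij}) = \sigma_\delta^2, \quad
\operatorname{Corr}(\delta_{xij}, \delta_{yij}) = \rho_\delta, \\
\operatorname{Var}(x_{ij}) &= \mu_{xj}^2\,\sigma_\varepsilon^2 + \sigma_\delta^2 .
\end{aligned}
```

Under this form, additive error dominates for weak spots and multiplicative error dominates for strong ones. Estimating parameters such as σ_ε, σ_δ, ρ_ε, and ρ_δ from replicated data is the kind of task VERA performs.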

SAM - Significance of Array Measurement
Uses the error model to improve the accuracy of each expression ratio and assigns each gene a value, lambda, indicating how likely it is that the gene is differentially expressed.
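One way to picture lambda is as a likelihood-ratio statistic. It compares the best fit when the two samples may have different true levels with the best fit when they are forced to share a single level. The Python sketch below does this for one gene under several assumptions of ours. First, the error parameters (the hypothetical names `sigma_eps` and `sigma_delta`) are already known, for example from VERA. Second, the two channels' errors are uncorrelated. Third, each measurement is Gaussian with variance mu²·sigma_eps² + sigma_delta². All function names are hypothetical. This illustrates the idea only and is not SAM's implementation.

```python
import math


def variance(mu, sigma_eps, sigma_delta):
    """Per-measurement variance under x = mu + mu*eps + delta (independent normal errors)."""
    return (mu * sigma_eps) ** 2 + sigma_delta ** 2


def log_lik(mu, data, sigma_eps, sigma_delta):
    """Gaussian log-likelihood of replicate measurements `data` given true level `mu`."""
    v = variance(mu, sigma_eps, sigma_delta)
    return sum(-0.5 * (math.log(2 * math.pi * v) + (d - mu) ** 2 / v) for d in data)


def maximize(f, lo, hi, tol=1e-9):
    """Golden-section search; returns the max of f on [lo, hi], assuming f is unimodal there."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol * max(1.0, abs(a) + abs(b)):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return f((a + b) / 2)


def sam_lambda(x, y, sigma_eps, sigma_delta):
    """2 * (max log L with separate levels - max log L with one shared level)."""
    lo, hi = min(x + y), max(x + y)
    pad = (hi - lo) + 1.0  # widen the search bracket beyond the observed data
    lo, hi = lo - pad, hi + pad

    def fx(m):
        return log_lik(m, x, sigma_eps, sigma_delta)

    def fy(m):
        return log_lik(m, y, sigma_eps, sigma_delta)

    separate = maximize(fx, lo, hi) + maximize(fy, lo, hi)
    shared = maximize(lambda m: fx(m) + fy(m), lo, hi)
    return 2.0 * (separate - shared)


if __name__ == "__main__":
    x = [1000.0, 1010.0, 990.0]
    print(sam_lambda(x, [2000.0, 2020.0, 1980.0], 0.05, 10.0))
```

With identical replicates in both channels the statistic is zero. A clear two-fold shift at high intensity gives a large value. Golden-section search keeps the sketch dependency-free, but it assumes the log-likelihood is unimodal within the search bracket.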

The need for appropriate statistical methods for coping with error in DNA microarray measurements is widely recognized. A gene is often called "differentially expressed" if the ratio of its expression level in one population to its expression level in a second population exceeds a fixed threshold. This thresholding scheme discards useful information about the error structure of the measurements. For example, a ratio computed from weak, near-background signals is far less reliable than the same ratio computed from strong signals.
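For contrast, the fold-change rule described above takes only a few lines. In the sketch below, the gene names, intensities, and two-fold cutoff are illustrative assumptions. It flags a strong spot pair and a weak, near-background pair identically, because the ratio alone carries no information about how noisy each measurement is.

```python
def fold_change_calls(pairs, threshold=2.0):
    """Call a gene 'differentially expressed' if y/x >= threshold or y/x <= 1/threshold.

    `pairs` maps a gene identifier to (x, y) mean intensities, both assumed positive.
    """
    return {gene: (y / x >= threshold or y / x <= 1.0 / threshold)
            for gene, (x, y) in pairs.items()}


if __name__ == "__main__":
    pairs = {
        "geneA": (1000.0, 2000.0),  # strong signal, two-fold change
        "geneB": (10.0, 20.0),      # near background, also two-fold
        "geneC": (1000.0, 1100.0),  # strong signal, 1.1-fold change
    }
    print(fold_change_calls(pairs))
    # {'geneA': True, 'geneB': True, 'geneC': False}
```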

To provide a better statistical test for identifying differentially expressed genes, we developed VERA and SAM. Using the method of maximum likelihood, VERA estimates the parameters of a statistical model that describes the multiplicative and additive errors affecting an array experiment. SAM then assigns each gene on the array a value, lambda, which describes how likely it is that the gene is expressed differently between the two cell populations. A large value of lambda means that the gene is almost certainly differentially expressed, while a small value (close to 0) indicates that there is no evidence for differential expression.
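The maximum-likelihood step can be illustrated with a deliberately simplified Python sketch. The assumptions are ours, not VERA's. It uses a single channel with per-measurement variance m²·sigma_eps² + sigma_delta². Each gene's true level is fixed at its replicate mean rather than estimated jointly. A coarse grid search stands in for a proper optimizer. All function names, grids, and the simulated data are hypothetical.

```python
import math
import random


def summarize(genes):
    """Reduce each gene's replicates to (n, replicate mean, sum of squared deviations)."""
    stats = []
    for reps in genes:
        n = len(reps)
        m = sum(reps) / n
        stats.append((n, m, sum((r - m) ** 2 for r in reps)))
    return stats


def neg_log_lik(stats, sigma_eps, sigma_delta):
    """Gaussian negative log-likelihood with Var = (m*sigma_eps)^2 + sigma_delta^2.

    Simplification: each gene's true level is fixed at its replicate mean m.
    """
    total = 0.0
    for n, m, ss in stats:
        v = (m * sigma_eps) ** 2 + sigma_delta ** 2
        total += 0.5 * (n * math.log(2 * math.pi * v) + ss / v)
    return total


def estimate_error_model(stats, eps_grid, delta_grid):
    """Return the grid point (sigma_eps, sigma_delta) with the highest likelihood."""
    return min(((e, d) for e in eps_grid for d in delta_grid),
               key=lambda p: neg_log_lik(stats, p[0], p[1]))


def simulate(mus, n_reps, sigma_eps, sigma_delta, seed=0):
    """Draw replicates x = mu + mu*eps + delta (synthetic data for demonstration)."""
    rng = random.Random(seed)
    return [[mu + mu * rng.gauss(0.0, sigma_eps) + rng.gauss(0.0, sigma_delta)
             for _ in range(n_reps)] for mu in mus]


if __name__ == "__main__":
    mus = [10.0 * 1.02 ** j for j in range(300)]  # spans weak and strong spots
    genes = simulate(mus, n_reps=4, sigma_eps=0.1, sigma_delta=20.0)
    eps_grid = [0.01 * k for k in range(1, 31)]
    delta_grid = [2.0 * k for k in range(1, 31)]
    print(estimate_error_model(summarize(genes), eps_grid, delta_grid))
```

Because each gene's level is replaced by its replicate mean, the fitted variances tend to come out low when there are few replicates, roughly by a factor of (n-1)/n for n replicates. A practical tool would also use a numerical optimizer rather than a grid.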

VERA and SAM are maintained as part of an ongoing collaboration between the Institute for Systems Biology and the University of California, San Diego. For correspondence regarding VERA and SAM, please contact Trey Ideker or Vesteinn Thorsson (thorssonATSYMBOLsystemsbiologyDOTorg).

The methods implemented by VERA and SAM are described in detail in the following publication:

T. Ideker, V. Thorsson, A. F. Siegel, and L. Hood. Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. Journal of Computational Biology 7(6):805-817 (2000).

This publication may be obtained from the publisher, Mary Ann Liebert, through the online archive of Journal of Computational Biology issues. The microarray data supporting this publication are also available.

Documentation and Download

These tools are currently available for the Windows and UNIX (Linux) platforms. Please read the Windows or UNIX documentation carefully before use. Before running VERA and SAM on your own expression data sets, become familiar with the software and its file formats by trying our sample input files.

The source code is also available. To compile it, you must first obtain the source code for the optimization routines used by VERA and SAM from Numerical Recipes Software. Please refer to our guide to compilation.

Before you download either the executables or the source, please note that these tools are distributed with ABSOLUTELY NO WARRANTY. However, we hope that they are useful, and you are welcome to augment the source code yourself.

Data-processing pipeline

Several data-processing steps must take place after a microarray experiment and before VERA and SAM can be run. For example, spots in the microarray image must be quantitated and processed (e.g., background-subtracted and normalized), and each spot must be matched to its corresponding gene identifier. A variety of software packages are available for these purposes. The programs that we use for these tasks at the Institute for Systems Biology are also available; please consult our website describing the microarray data-processing pipeline.
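Two of the processing steps named above, background subtraction and normalization, can be sketched in Python for a two-channel array. The sketch rests on our own assumptions. It uses a per-spot local background. It clamps values at a small positive floor so that ratios stay defined. It applies global median normalization. The function names and gene identifiers are hypothetical, and the ISB pipeline and other packages may use different methods.

```python
from statistics import median


def background_subtract(foreground, background, floor=1.0):
    """Subtract each spot's local background, clamping results at a small positive floor."""
    return [max(f - b, floor) for f, b in zip(foreground, background)]


def median_normalize(ch1, ch2):
    """Scale channel 2 so that its median intensity matches channel 1's."""
    scale = median(ch1) / median(ch2)
    return list(ch1), [v * scale for v in ch2]


if __name__ == "__main__":
    gene_ids = ["gene1", "gene2", "gene3"]  # hypothetical spot-to-gene mapping
    ch1 = background_subtract([110.0, 210.0, 310.0], [10.0, 10.0, 10.0])
    ch2 = background_subtract([60.0, 110.0, 160.0], [10.0, 10.0, 10.0])
    ch1, ch2 = median_normalize(ch1, ch2)
    print(dict(zip(gene_ids, zip(ch1, ch2))))
```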