FILE FORMAT SPECIFICATION

FILE-FORMAT SPEC FOR VERA AND SAM DATA INPUT
(INTENSITY PAIRS)

The first line gives the column headings. Each subsequent line contains the (X,Y) intensity pairs for a particular gene, except that lines after the first which begin with a '#' are treated as comments and skipped.
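
The line handling described above could be sketched as follows in Python. This is only an illustration, not the reader used by VERA and SAM: the function name split_header_and_rows is hypothetical, and skipping blank lines is an assumption that the specification does not state.

```python
def split_header_and_rows(text):
    """Return (header_fields, rows), where each row is a list of fields.

    The first non-blank line is taken as the heading line. Later lines
    that begin with '#' are skipped as comments. Blank lines are also
    skipped (an assumption; the specification does not mention them).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()          # split on any run of spaces or tabs
    rows = [ln.split() for ln in lines[1:] if not ln.startswith("#")]
    return header, rows
```

Calling str.split() with no argument splits on any run of spaces or tabs, so files that mix the two separators are handled the same way.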

An example data file containing 2 genes and up to 3 replicates:

unique  other     X0   Y0   X1   Y1  X2   Y2   N  F0 F1 F2      
#
# optional comments appear here...
#
b01    actin      12   30   5    6   24   60   3  -  -  X       
c35    unknown    100  21   10   3   -     -   2  -  X  - 



|-----------------required------------------||----optional---|
|-order fixed--||--------------------order arbitrary---------|

Required columns:

The first two columns contain text descriptors for each gene: the first must uniquely identify the gene (e.g. a gene name or ORF code), while the second column is used to store additional information about the gene and is allowed to be nonunique. The corresponding column headings "unique" and "other" may be chosen at will.

Also required are the replicate (X,Y) intensity pairs for each gene. The headings for these columns must follow the pattern shown above (X or Y followed by the replicate number, starting from replicate 0), although the column order does not matter.

Optional columns:

The N column specifies the number of replicates to be used (replicates 0 through N-1). The F (flag) columns specify which pairs are to be excluded (see "Data handling" below). The headings for these columns must follow the pattern shown above (N, or F followed by the replicate number).
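
As an illustration of the heading rules for required and optional columns, the Python sketch below maps each heading to its column position. The function name classify_columns and the returned dictionary layout are hypothetical, and rejecting unrecognized headings is an assumption; the specification does not say how unknown headings are treated.

```python
import re

def classify_columns(header):
    """Map column roles to their positions in the heading line.

    The first two columns are the 'unique' and 'other' descriptors,
    whatever their headings. The remaining columns may come in any order.
    """
    layout = {"unique": 0, "other": 1, "X": {}, "Y": {}, "F": {}, "N": None}
    for idx, name in enumerate(header[2:], start=2):
        match = re.fullmatch(r"([XYF])(\d+)", name)
        if match:
            # e.g. "X1" -> layout["X"][1] = idx
            layout[match.group(1)][int(match.group(2))] = idx
        elif name == "N":
            layout["N"] = idx
        else:
            # Assumption: anything else is an error.
            raise ValueError(f"unrecognized column heading: {name!r}")
    if set(layout["X"]) != set(layout["Y"]):
        raise ValueError("every X column needs a matching Y column")
    return layout
```

For the example heading line, this gives X0 at position 2, Y0 at position 3, N at position 8, and F0 through F2 at positions 9 through 11 (counting from 0).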

Data handling:

N	F	Handling
not given	not given	Pairs are excluded only if a '-' placeholder is found in either X or Y.
not given	given	Pairs are excluded if an 'X' is found in the flag entry, or if a '-' placeholder is found in either X or Y. In the above example, only the (X0,Y0) and (X1,Y1) pairs will be used for actin.
given	not given	The first N pairs are used. For the unknown gene above, only the pairs (X0,Y0) and (X1,Y1) are used.
given	given	The F flags take precedence over the number of replicates N.
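
The four cases in the table can be sketched in Python as follows. This is one possible reading, not the actual implementation: the function name usable_pairs is hypothetical, a '-' placeholder is assumed to exclude a pair in every case, and when both N and F are given N is assumed to be ignored entirely.

```python
def usable_pairs(xs, ys, n=None, flags=None):
    """Return the replicate indices whose (X,Y) pairs are kept.

    xs, ys: lists of field strings, one per replicate ('-' = missing).
    n:      value of the N column, or None if the column is absent.
    flags:  list of flag strings ('X' = exclude), or None if the
            F columns are absent.
    """
    keep = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        if x == "-" or y == "-":
            continue  # assumption: a placeholder always excludes the pair
        if flags is not None:
            # Assumption: with F present, the flags alone decide and N is ignored.
            if flags[i] == "X":
                continue
        elif n is not None and i >= n:
            continue  # no flags: only the first N pairs are used
        keep.append(i)
    return keep
```

For the actin line, usable_pairs(["12", "5", "24"], ["30", "6", "60"], flags=["-", "-", "X"]) keeps replicates 0 and 1, in agreement with the table.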

Other notes:

Maximum 50,000 genes (set by MAXGENES in read_data.c).
X and Y intensities are expected to be floating point (e.g. 13456.22) or integer.
Maximum length for text strings: 30 characters.
Columns are separated by spaces or tabs. Within a column, entries must not contain spaces or tabs.
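
A data file could be checked against these notes before it is submitted. The Python sketch below is a hypothetical helper, not part of read_data.c: the names MAX_TEXT_LEN, parse_intensity and check_text_field are made up for illustration, and applying the 30-character limit to the two descriptor columns is an assumption.

```python
MAX_TEXT_LEN = 30  # limit taken from the notes above

def parse_intensity(field):
    """Convert an X or Y field to a float, or None for a '-' placeholder.

    Raises ValueError if the field is neither a number nor '-'.
    """
    if field == "-":
        return None
    return float(field)  # accepts both "12" and "13456.22"

def check_text_field(field):
    """Return the descriptor unchanged, or raise ValueError if it is too long."""
    if len(field) > MAX_TEXT_LEN:
        raise ValueError(f"text field longer than {MAX_TEXT_LEN} characters: {field!r}")
    return field
```

Because columns are split on spaces and tabs, a descriptor such as "heat shock" would be read as two columns. Replacing the space with an underscore is one way to avoid this.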