DAPPLE
A Program for Analyzing cDNA Microarrays

by Jeremy Buhler
jbuhler@cs.washington.edu
Copyright (C) 1998-2000. All rights reserved.

Version 0.831
11/23/2000

* PURPOSE

Dapple is a tool for segmentation, spot-finding, and quantitation of
cDNA microarrays. It implements various algorithms designed to make
analysis of microarrays fast, accurate, and convenient for the user.

* AVAILABILITY

Dapple is available for download from

http://www.cs.washington.edu/homes/jbuhler/dapple/

Its methods are described in UW Computer Science Technical Report
UW-TR 2000-08-05, which is available at the same location.

* LICENSE

Dapple is distributed under the terms of the GNU General Public
License; see the file COPYING for terms. I have to use the GPL
because Dapple depends on the FFTW library, which is itself GPL'd.

I am willing to relicense Dapple under other terms if the recipient
undertakes to replace the FFTW library with another FFT library (see
the interface in fft.cc/h for what must be changed). A more liberal
"free for non-commercial use license" should be possible but requires
interaction with the University of Washington's Office of Technology
Transfer.

Please note that Dapple is distributed with ABSOLUTELY NO WARRANTY --
I hope it's useful, but if it breaks, you get to keep both pieces.

* SUMMARY OF COMMAND-LINE SYNTAX

dapple [-s <settings file>] [-S <state file>] [-hv] [<arrayspec>]

-s <settings file> -- restore settings from the specified file
-S <state file> -- restore settings and load images from specified state

-h -- print a brief help text
-v -- print current version information

<arrayspec> may be a list of channel image files to load or
'-i <index file>' to load images indirectly from an index file.
Using an index file overrides an explicit command-line specification;
using a state file overrides any other array specification.
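
The precedence rules above can be sketched as follows. This is only an
illustration of the stated ordering (state file, then index file, then
explicit channel files); the function name and return values are
hypothetical and are not part of Dapple itself.

```python
from typing import List, Optional, Tuple


def resolve_array_source(
    state_file: Optional[str],
    index_file: Optional[str],
    channel_files: List[str],
) -> Tuple[str, List[str]]:
    """Return (kind, paths) for the array specification that takes effect."""
    # A state file overrides any other array specification.
    if state_file is not None:
        return ("state", [state_file])
    # An index file overrides channel files named directly.
    if index_file is not None:
        return ("index", [index_file])
    # Otherwise use the channel files listed on the command line, if any.
    if channel_files:
        return ("channels", list(channel_files))
    return ("none", [])
```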

For more details on the various input options, see the following sections
on input and saving/restoring settings.

***************************************************************************

* INPUT

Dapple operates on one- or two-color microarray images represented as
one or two image files. Each file represents one channel of the array
image; two-channel images are conventionally displayed as red plus
green, while single-channel images are grayscale with black == zero
intensity. By default, the first channel of a two-channel image (in
order of specification) will be displayed in green; the second
channel, in red. To reverse the colors of a two-channel image or
display a one-channel image with zero == white, toggle the "Reverse
Colors" switch in the Display Dialog, accessible from 'Options ->
Display Options'.

Dapple supports the following image types:

* Molecular Dynamics GEL files (scaling may be linear or square-root)

* 16-bit TIFF files (linearly scaled)

Array images may be specified to Dapple in two ways: directly, by
selecting the channel image file(s) to load, or indirectly using a
single "index file" that points to the image(s). Direct selection may
be more convenient if your array images are stored in separate
directories with mnemonic names (a la MD ImageQuant), while selection
by index file may be better for other naming schemes.

To open channel images directly, select 'File -> Open Array -> by
Channels'. Dapple will pop up a file selection dialog for each
channel image. The number of channels to load can be set in
the Geometry Dialog, accessible from 'Options -> Array Geometry'.

An index file is a one- or two-line text file with the paths to the
channel images, one per line, e.g.

~jbuhler/array/SEP1.GEL
~jbuhler/array/SEP2.GEL

If you do not specify full paths (i.e. paths beginning with '/', '~', or
'~user') to the images, the paths are assumed to be relative to the
directory where the index file is located.
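
The path rule above can be sketched in Python as follows. This is a
minimal illustration, not Dapple's own code: the function name is
hypothetical, skipping blank lines is an assumption, and POSIX-style
paths are assumed throughout.

```python
import os
from typing import List


def resolve_index_paths(index_path: str, lines: List[str]) -> List[str]:
    """Resolve the channel image paths listed in an index file."""
    index_dir = os.path.dirname(os.path.abspath(index_path))
    resolved = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are ignored
        if entry.startswith("/") or entry.startswith("~"):
            # Full path: '/', '~', or '~user'.
            resolved.append(os.path.expanduser(entry))
        else:
            # Relative path: taken relative to the index file's directory.
            resolved.append(os.path.normpath(os.path.join(index_dir, entry)))
    if not 1 <= len(resolved) <= 2:
        raise ValueError("an index file names one or two channel images")
    return resolved
```

For example, an index file at /data/run1/array.idx that lists "SEP1.GEL"
would resolve that entry to /data/run1/SEP1.GEL.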

To open an array by index file, select 'File -> Open Array -> by
Index'. Dapple will pop up a file selection dialog for the index
file.

You can also open an array from the command line by typing 'dapple -i
<index-file>' for an index file or 'dapple <channel-one-file> [
<channel-two-file> ]' for one or two channel images.


* INSPECTING AND MOVING AROUND THE IMAGE

You can scroll around an image larger than the Dapple window, either
with the scroll bars and mouse or with the standard movement keys:
left/right/up/down arrows for small steps, Shift-Tab/Tab/PgUp/PgDown
for window-sized steps, and Home/End/Ctrl-PgUp/Ctrl-PgDown to scroll
to the boundaries of the image.

Once grids have been placed, you can jump to a particular spot by
pressing 'j' in the image window and entering the spot to jump to
in the resulting dialog.

To inspect any grid square (or any other part of the image) up close,
simply click-and-hold over it with the middle mouse button. Dapple
pops up a small false-color image of the square in each channel; for
multichannel images, the first channel is on the left. If grids have
not been placed or the cursor is not near a grid square, the popup
image is centered on the mouse position; otherwise, it is centered on
the square.

The popup image is false-colored for improved visibility. "Cool" colors
(blue, cyan) indicate low intensities, while progressively hotter colors
(green, yellow, orange, red, dark red) indicate higher intensities. The
popup image is shown filtered if filtering has been performed for spot
finding or training; otherwise, it is shown unfiltered.

If spots have been found, the border of each channel's popup image is
colored according to the quality of the spot being shown, or neutral
gray if the popup shows an area outside any grid.


* CHANGING THE DISPLAY

Array images generally have a wide dynamic range. To help make all the
spots on an image visible, Dapple provides options for scaling
image intensities for display. These options are accessible through
the Display Options dialog. To open this dialog, select
'Options -> Display Options'.

Dapple prepares array images for display as follows:

1. Each of the channel images is read from its file. The unmodified
images are used without further alteration for spot-finding and
quantitation; further transformations described below are ONLY
used for display purposes and to aid in automated grid placement.

2. Each image is clamped so that its pixels all lie within a fixed
intensity range, possibly smaller than the original dynamic
range of the image. Pixels whose intensities lie outside the
upper and lower clamping thresholds are set to the nearest
threshold value.

The clamping thresholds (in device-normalized intensity units)
can be set from the Display Options dialog.

3. Each image is optionally scaled nonlinearly, to reduce the
effective dynamic range for display. By default, the square
root of each pixel intensity is taken. Logarithmic and
squaring transformations are also possible, as is
no transformation at all (linear scaling). The transformations
for the main image window and for popups and dialogs
can be set independently from the Display Options dialog.

4. The dynamic range of each transformed image is mapped onto an
eight-bit scale, from 0 to 255, producing an eight-bit grayscale
image. This mapping is controlled by the Image Gamma option, settable
from the Display Dialog. Larger values of gamma make the
image brighter; set this option as appropriate for your display.

5. For two-channel images, the two channels are combined into a
single eight-bit color image, with four bits allocated to each color.
By default, the first channel is colored green, while the second is
colored red. To reverse the channel colors, select 'Reverse Colors'
from the Display Options dialog.
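
Steps 2 through 4 can be sketched for a single channel as follows. This
is an illustration only, not Dapple's implementation: the function name
is hypothetical, the default square-root scaling is the only transform
shown, and the gamma convention (raising the normalized value to the
power 1/gamma, so that larger gamma brightens the image) is an
assumption chosen to match the behavior described in step 4.

```python
import math
from typing import List


def to_display_levels(
    pixels: List[float], lo: float, hi: float, gamma: float = 1.0
) -> List[int]:
    """Clamp, square-root scale, and map intensities onto 0..255."""
    if lo < 0 or hi <= lo or gamma <= 0:
        raise ValueError("need 0 <= lo < hi and gamma > 0")
    t_lo, t_hi = math.sqrt(lo), math.sqrt(hi)
    levels = []
    for p in pixels:
        clamped = min(max(p, lo), hi)       # step 2: clamp to thresholds
        t = math.sqrt(clamped)              # step 3: square-root scaling
        frac = (t - t_lo) / (t_hi - t_lo)   # normalize to the range 0..1
        frac = frac ** (1.0 / gamma)        # step 4: assumed gamma convention
        levels.append(int(round(frac * 255)))
    return levels
```

With thresholds 0 and 100, a pixel of intensity 500 is clamped to 100
and displayed at the brightest level, 255.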

Note that, because array images are quite large, it may take several
seconds to redraw the image after changing display options.


* SPECIFYING GEOMETRY

To place the spots accurately on the array, Dapple needs to know how
many spots there are and how far apart they are. This information
is specified using the geometry dialog, available from the GUI via
'Options -> Array Geometry'.

Dapple's geometry model assumes that the array is laid out as a
rectangular array of grids, each of which is itself a rectangular
array of spots. The example below shows an array of 2x3 grids (x by y),
each of which is a 4x3 array of spots (x by y). A spot is indexed by
four zero-based coordinates: <grid x> <grid y> <spot x> <spot y>.

                 X
           0           1

        0 1 2 3     0 1 2 3

     0  * * * *     * * * *
  0  1  * * * *     * * * *
     2  * * * *     * * * *

     0  * * * *     * * * *
Y 1  1  * * * *     * * * *
     2  * * * *     * * * *

     0  * * * *     * * * *
  2  1  * * * *     * * * *
     2  * * * *     * * * *

The geometry dialog allows you to set the following parameters:

- Number of grids (x by y)

- "Grid delta", the distance in pixels between *corresponding*
spots in *adjacent* grids. Independent deltas may be specified for
the x and y directions. For example, the distance between
spot (0,0) of grid (0,0) and spot (0,0) of grid (1,2) is
(1 * <x grid delta>, 2 * <y grid delta>).

Note that the grid deltas correspond to the x and y distance
between adjacent pins on the arrayer's spotting head.

- Grid size (x by y) - number of spots per grid

- "Spot delta", the distance in pixels between *adjacent* spots
on the *same* grid. Independent deltas may be specified for the
x and y directions. For example, the distance between spot
(0,0) of grid (0,0) and spot (2,3) of grid (0,0) is
(2 * <x spot delta>, 3 * <y spot delta>).

Note that the spot deltas correspond to the x and y distance
by which one pin on the arrayer's spotting head is offset
between deposition of adjacent spots.

Deltas can and often should be fractional. It is important to get
them as correct as possible; otherwise, the resulting grid will place
its spots off-center, confusing Dapple's spot quality estimation algorithm.
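
The two worked examples above generalize to any spot. The sketch below
adds the grid and spot contributions to get the nominal pixel offset of
a spot from spot (0,0,0,0); the function name is hypothetical, and the
result is the ideal position implied by the geometry, before any grid
adjustment.

```python
from typing import Tuple


def spot_offset(
    grid: Tuple[int, int],
    spot: Tuple[int, int],
    grid_delta: Tuple[float, float],
    spot_delta: Tuple[float, float],
) -> Tuple[float, float]:
    """Pixel offset of spot (grid x, grid y, spot x, spot y) from (0,0,0,0)."""
    gx, gy = grid
    sx, sy = spot
    return (
        gx * grid_delta[0] + sx * spot_delta[0],
        gy * grid_delta[1] + sy * spot_delta[1],
    )
```

Because the deltas are multiplied by the indices, a small error in a
fractional delta grows with distance from the origin, which is why the
deltas should be as accurate as possible.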

The default geometry values (in particular, the deltas) are correct for
our current arrayer setup. You can easily change the numbers of grids and
spots to quantify only part of an array.

Note that Dapple will reject attempts to change the geometry that would
cause the areas of the grids to exceed the bounds of the image.

A two-color array's geometry includes the relative registration of its two
image channels. You can change the registration interactively using the
arrow keys prior to placing grids on the image, or by setting the pixel
offsets in the Geometry Dialog.


* PLACING GRIDS ON THE IMAGE

Dapple uses its knowledge of the array geometry to perform
mostly-automated grid placement. In addition to the geometry, the
user must left-click with the mouse on the (approximate) center of a
single spot on the array ("the origin") to indicate the array's
overall offset within the image. Clicking creates a small icon at the
mouse position which indicates the selected center. Once the origin
has been chosen, select 'Analyze -> Place Grids' to perform the placement
operation for all grids. This may take some time, during which a
progress bar will appear.

By default, Dapple assumes that the origin will be the top-left spot
of the top-left grid in the image (i.e., spot (0,0,0,0)). However,
this spot may be extremely difficult to see or even entirely absent.
To work around this problem, Dapple allows the user to choose an
*arbitrary* spot as the origin. The four coordinates of this spot
(grid x/y, spot x/y) must be specified ahead of time using the
placement options dialog, available via 'Options -> Placement Options'.
Remember that the coordinates are all zero-based!

Note that Dapple will not allow the user to place the origin anywhere
that would cause the grid areas to extend past the boundaries of the image.
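
The relationship between an arbitrary origin spot and the image bounds
can be sketched as follows. Both function names are hypothetical, and
the definition of a grid's area (spot centers plus a margin of half a
spot delta on each side) is an assumption made for illustration;
Dapple's actual bounds test may differ.

```python
from typing import Tuple

Pair = Tuple[float, float]


def top_left_spot_center(
    click: Pair, origin: Tuple[int, int, int, int],
    grid_delta: Pair, spot_delta: Pair,
) -> Pair:
    """Nominal center of spot (0,0,0,0), given a click on the origin spot."""
    gx, gy, sx, sy = origin  # zero-based grid x/y, spot x/y
    return (
        click[0] - (gx * grid_delta[0] + sx * spot_delta[0]),
        click[1] - (gy * grid_delta[1] + sy * spot_delta[1]),
    )


def grids_fit_in_image(
    top_left: Pair, n_grids: Tuple[int, int], grid_size: Tuple[int, int],
    grid_delta: Pair, spot_delta: Pair, image_size: Tuple[int, int],
) -> bool:
    """Check that all grid areas lie inside the image (assumed margin)."""
    for axis in (0, 1):
        first = top_left[axis]
        last = (first + (n_grids[axis] - 1) * grid_delta[axis]
                + (grid_size[axis] - 1) * spot_delta[axis])
        margin = spot_delta[axis] / 2.0  # assumption: half a grid square
        if first - margin < 0 or last + margin > image_size[axis]:
            return False
    return True
```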

If you find that Dapple has placed a grid incorrectly, you can
left-drag with the mouse anywhere inside that grid's area to move it
around. Dapple cares deeply that the grid be properly centered over
the spots; rather than guessing at the best possible center, the user
can auto-center a moved grid precisely with respect to its spots by
right-clicking on it. Very small adjustments to a grid's position are
generally superfluous, since auto-centering will just move it
back to its original position.

Dapple uses the display's dynamic range as a hint to help it place
grids even when the spots are very dim. In particular, it throws out
any pixels with value greater than the maximum specified in the
Display Dialog. If you find that the grids are being placed
incorrectly on your image, try reducing the dynamic range to make the
spots more visible both on the display and for the grid placer.

Note that at any time, you can return to the origin placement mode
(discarding any existing grids, spots, &c) by selecting
'Analyze -> Place Origin'. This is the default mode of operation
when an array is first loaded.

If you find that Dapple is moving the grids far from their correct
locations, you can try reducing the range of placements it tries
by lowering the "jitter" size in the grid placement dialog. The
jitter is a value between 0 and 1 that specifies what fraction of
a grid square in each direction the grid is allowed to move
after Dapple computes its initial (rough) placement.


* FINDING SPOTS

Once the grids have been satisfactorily placed and adjusted, the user
invokes the spot-finder by selecting 'Analyze -> Find Spots'. This
process may take several minutes for a large array, so please
be patient and, as they say, "relaxen und watchen das Blinkenlichten."

Dapple attempts to find one spot in each grid square and evaluate
whether its choice of spot is accurate. Each spot is circled on the
image and possibly annotated with a quality estimate immediately above
and to its left. The quality estimates are pairs of colored
triangles; for two-channel images, the top-left triangle labels the
first channel, while the bottom-right one labels the second channel of
the spot. For monochrome images, both triangles are always the same
color. Each label is green, yellow, or red, respectively indicating
Dapple's belief that the spot is good, marginal, or bad (absent).
If you prefer not to see labels for spots whose channels are all good
(which should be the majority of spots), you can remove them by
changing the option "Label Accepted Spots" in the Display options
menu.

Note that any channel which is labeled 'bad' (red) will not produce
a foreground intensity value. *Both* channels must be non-bad for
the spot to produce a ratio.

Several options are available to control the spot finder's behavior
through 'Options -> Spot Finder Options'. These are

- the number of times Dapple will attempt to find a spot
in a grid square before deciding that no such spot exists
(default: 3)

- the minimum and maximum radii allowed for a spot
(default: 6 to 15 pixels)

- the assumed difference between the inner and outer radii of a
spot (default: 2 pixels).

Dapple's spot-finding algorithms are sensitive to the *inner* radius
of a spot, that is, the radius at which the brightness levels off to a
roughly constant level above the background. The specified difference
is added to the inner radius of each found spot to estimate its *outer*
radius, that is, the radius at which its brightness begins to rise above
background.

Note: Dapple does not find the outer radius directly because the
inner radius is usually much brighter than the background and
therefore easier to find reliably.

- The size of the median filter applied to the image for spot-finding
purposes (default: 7, i.e. a 7x7 square filter). The filter size
must be an odd integer. The filtered image is used *only* for
spot-finding; quantitation uses the original, unfiltered image.

Median filtering can also be completely disabled for spot finding
(not recommended).
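
For readers unfamiliar with median filtering, the following is a
minimal pure-Python sketch of a square median filter. It is not
Dapple's implementation: the function name is hypothetical, and the
handling of image borders (using only the part of the window that lies
inside the image) is an assumption.

```python
from statistics import median
from typing import List


def median_filter(image: List[List[float]], size: int = 7) -> List[List[float]]:
    """Replace each pixel by the median of the size x size window around it."""
    if size < 1 or size % 2 == 0:
        raise ValueError("filter size must be an odd positive integer")
    height = len(image)
    width = len(image[0]) if height else 0
    r = size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Assumption: windows are truncated at the image border.
            window = [
                image[yy][xx]
                for yy in range(max(0, y - r), min(height, y + r + 1))
                for xx in range(max(0, x - r), min(width, x + r + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out
```

A median filter suppresses isolated bright or dark pixels while
preserving edges better than averaging would, which helps when locating
spot boundaries.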


* INSPECTING AND EDITING SPOTS

Dapple provides a convenient mechanism for visually inspecting and
manually editing its choices of spots. A magnifying glass cursor
indicates inspection mode. Any spot can be inspected and edited,
but Dapple directs the user's attention to the most suspicious spots
by labeling them as marginal. Spots labeled 'marginal/marginal' or
'marginal/bad' should receive the most attention.

As always, a transient inspection window can be produced with the
middle mouse button. If a quick inspection shows that a spot should
trivially be accepted or rejected, pressing 'a' or 'r' while the mouse
cursor is over the spot accepts or rejects it, respectively.

To edit a spot, right-click over it. This brings up an editor window
with a larger view of the spot, as above, plus some editing controls.
The spot can be moved by left-dragging inside it, or resized by
right-dragging on its border. For two-color spots, editing either
channel changes both channels, since a grid square can have only one
spot. If no spot exists in a grid square, left-dragging will create
one.

After editing and inspection, the user can choose to accept the new
spot, reject it, or cancel and keep the existing spot. Rejection
deletes the spot entirely and labels it 'bad', while acceptance
labels the spot 'good'.


* QUANTITATING SPOTS

Once spots have been found and edited to the user's satisfaction, they
can be quantitated by selecting 'Analyze -> Quantify' from the GUI.
Dapple will prompt for a file name in which to save the quantitation
data, then quantify the available spots on the array into this file.

The first line of the output file gives the number of spots in the
output and the number of channels quantitated per spot. Each
subsequent line represents a single spot; it has the following values,
separated by single spaces:

* the coordinates of the spot <grid x> <grid y> <spot x> <spot y>

* for the first channel,
- the rated quality: 0 = REJECT, 1 = MARGINAL, 2 = ACCEPT
- the foreground intensity (mean or median)
- the local background intensity (median)
- the standard deviation of the background pixel population

* If the image has two channels, the same four fields for the second channel

* the number of pixels in the foreground and background samples for this spot

* If the image has two channels, the ratio of first to second channel
intensity, according to the formula

foreground_1 - background_1
---------------------------
foreground_2 - background_2

NOTE: for any grid square with no spot present, the foreground intensity in
each channel is 0, while the background is the median intensity of *all*
the pixels in the grid square. The ratio is 0 for grid squares with no spot
or for which (foreground - background) in either channel would be <= 0.
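
The ratio rule and the line layout described above can be sketched as
follows for a two-channel output file. The function and key names are
hypothetical. The parser assumes that the pixel counts are written as
two integers (foreground, then background) and that each line therefore
has 15 fields; check this against your own output files before relying
on it.

```python
from typing import Dict, List


def channel_ratio(fg1: float, bg1: float, fg2: float, bg2: float) -> float:
    """Ratio of background-subtracted intensities, or 0 if undefined."""
    s1 = fg1 - bg1
    s2 = fg2 - bg2
    # Also covers grid squares with no spot, whose foreground is 0.
    if s1 <= 0 or s2 <= 0:
        return 0.0
    return s1 / s2


def _channel(fields: List[str]) -> Dict[str, float]:
    return {
        "quality": int(fields[0]),  # 0 = REJECT, 1 = MARGINAL, 2 = ACCEPT
        "foreground": float(fields[1]),
        "background": float(fields[2]),
        "background_sd": float(fields[3]),
    }


def parse_two_channel_line(line: str) -> Dict[str, object]:
    """Split one spot line of a two-channel file into named fields."""
    f = line.split()
    if len(f) != 15:  # assumed field count for two channels
        raise ValueError("expected 15 fields, got %d" % len(f))
    return {
        "coords": tuple(int(v) for v in f[0:4]),
        "channel_1": _channel(f[4:8]),
        "channel_2": _channel(f[8:12]),
        "n_foreground": int(f[12]),
        "n_background": int(f[13]),
        "ratio": float(f[14]),
    }
```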

The quantifier's behavior may be controlled from
'Options -> Quantitation Options':

- the foreground intensity may be set to the mean or median
intensity inside the spot area

- the ratio of background-subtracted intensities may be normalized by the
ratio of total intensities over all spots in each channel. Choosing this
option alters the ratio but does *not* normalize the reported
foreground and background intensities themselves.
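
One plausible reading of the normalization option is sketched below:
the spot's ratio is divided by the ratio of the two channels' total
intensities. The function name is hypothetical, and exactly which
intensities Dapple sums to form each total is not specified here, so
the totals are simply passed in by the caller.

```python
def normalized_ratio(ratio: float, total_1: float, total_2: float) -> float:
    """Divide a spot's ratio by the ratio of per-channel totals (assumed form)."""
    if total_1 <= 0 or total_2 <= 0:
        raise ValueError("channel totals must be positive")
    return ratio / (total_1 / total_2)
```

Under this reading, if the first channel is twice as bright as the
second overall, a spot with a raw ratio of 2.0 has a normalized ratio
of 1.0.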


* RETRAINING THE QUALITY ESTIMATOR

Dapple decides whether a given spot should be accepted, rejected, or flagged
as marginal based on previous experience with similar array data which has
been hand-scored by a human. The decision function is encoded as parameters
to a classifier algorithm built into the program. These parameters can be
changed by the user by retraining Dapple on a new set of manually-scored
spots.

To train on spots from a given image, first load the image and place
its grids as usual. Change any spot finder parameters as desired in
the Finder options menu. Then, instead of selecting "Find Spots", go
to the Train menu and select "Train->Gather Training Data". Each spot
will be overwritten with a question mark, indicating that it has not
yet been used for training.

To train on a single spot, left-click over its grid square. Dapple will
pop up a training window which shows a series of false-color images of
the grid square (one channel at a time) with proposed spots circled.
If the spot is correct, press "Accept"; otherwise, press "Reject".
Each user decision provides another data point for training Dapple's
spot classifier. To end training at any time, press "Quit".

To train on many spots at once, right-click anywhere in the image
window. If you click on a spot, Dapple will present all the spots
below and to the right of that spot in the image sequentially for
training in the same window. If you click outside any grid, the
trainer starts with spot (0,0,0,0). You may quit and resume at any
time.

Once you have trained Dapple on a collection of spots, save the training
information to a file by choosing "Train->Save Training Data". This
information can be reloaded into Dapple at any time in the future by
choosing "Train->Load Training Data". Loads of training data are
cumulative; that is, loading five training files into memory loads
all entries from all the files. To clear all training data from memory,
choose "Train->Clear Training Data".

Every save of training data also saves the current array state for later
reference (see below). If training data is saved in the file "foo", then
the associated state is saved in the file "foo.state".

To retrain Dapple's classifier, load any training information you wish to
use into memory, then select "Train->Train Classifier". You will first
be prompted for a file containing a *LOSS MATRIX* which weights the
costs of different classification errors. A loss matrix file is a
3x2 matrix of whitespace-delimited numbers, e.g.

0 5
1.1 2
5 0

The numbers in the first column, from top to bottom, are the penalties
for marking a spot as "Accept", "Show" (marginal), and "Reject", given
that the *correct* classification as determined by the user was
"Accept". The second column provides similar penalties for spots
which the user has classified as "Reject". Larger penalties for a
given type of error bias the classifier more strongly against making
that error.

The top-left and bottom-right entries of the loss matrix should always
be zero, since correct classifications should not be penalized. The
relative sizes of the other entries should be chosen based on the
relative desirability of marking a spot for inspection versus making
an incorrect classification. If the cost of marking a spot for
inspection is very small relative to the cost of a mistake, then
Dapple will be extremely conservative in accepting or rejecting spots
outright, at the cost of requiring many more manual spot inspections
after spot finding. Conversely, if the cost of marking a spot for
inspection is large relative to the cost of a mistake, then Dapple
will make almost all quality decisions itself -- but it may make more
errors in marginal cases.
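
To see how the loss matrix trades inspections against mistakes, the
sketch below applies a standard minimum-expected-loss rule to the
example matrix. This illustrates the general idea only; it is not
necessarily how Dapple's classifier uses the matrix internally. The
function names and the input p_accept (an estimated probability that
the correct label is "Accept") are hypothetical.

```python
from typing import List

DECISIONS = ["Accept", "Show", "Reject"]  # row order of the loss matrix


def parse_loss_matrix(text: str) -> List[List[float]]:
    """Parse a 3x2 matrix of whitespace-delimited numbers."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) != 3 or any(len(row) != 2 for row in rows):
        raise ValueError("loss matrix must have 3 rows and 2 columns")
    return rows


def min_expected_loss_decision(loss: List[List[float]], p_accept: float) -> str:
    """Pick the decision with the smallest expected penalty."""
    # Column 0: the correct label is Accept; column 1: it is Reject.
    expected = [row[0] * p_accept + row[1] * (1.0 - p_accept) for row in loss]
    return DECISIONS[expected.index(min(expected))]
```

With the example matrix, a spot that is equally likely to be good or
bad (p_accept = 0.5) has expected losses of 2.5 for "Accept", 1.55 for
"Show", and 2.5 for "Reject", so it would be shown for inspection.
Raising the "Show" penalties narrows the range of probabilities for
which inspection is the cheapest choice.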

Once the loss matrix file is supplied, training should be almost
instantaneous. The resulting classifier may be used immediately
for spot finding and may also be saved as part of the application
settings and reloaded later before performing spot finding.


* SAVING AND RELOADING SETTINGS AND APPLICATION STATE

Dapple has numerous user-settable parameters which affect its operation.
All of these settings can be selectively stored to a file and restored
for later use.

To save the current application settings, select 'File -> Save'. You
may choose to save All Settings or only Selected Settings (in which case
Dapple presents a dialog asking which settings to save). In either case,
you will be prompted for a file name in which to store the settings.

To reload saved settings, select 'File -> Load -> Settings' and supply the
settings file name. Settings may also be restored at runtime from the
command line using the option '-s <settingsFile>'.

In addition to user settings, Dapple can save the current application
state, including the current array files, settings, placed grids, and
found spots, for later inspection and editing. To save the state,
select 'File -> Save -> Complete State'. You will be prompted for a
state file name.

To reload the saved state, select 'File -> Load -> Complete State'.
The state may also be restored at runtime from the command line using
the option '-S <stateFile>'. Because the state file includes saved array
file names, specifying '-S <stateFile>' on the command line overrides any
other array specification.

Note that using 'File -> Load -> Settings' or '-s' on a saved state file
will load only the settings portion of the file.