Affymetrix QC

Technical documentation

arrayanalysis.org - affyAnalysisQC workflow - document version: 1.0.0

Table of Contents

[Introduction]
[Use the on-line affyAnalysisQC module]
[Install R and the required libraries for local usage]
[Parameter settings and use affyAnalysisQC as a R function]
[Scripts and functions description]

Introduction

Overview of the documentation

This technical documentation has two main objectives:
- to guide you in the installation and/or use of the affyAnalysisQC module (available on-line, as a GenePattern module or as an R function)
- to describe in detail all steps and all functions executed by the affyAnalysisQC module (for developers or curious users)

All source code has been written in R and is open-source, available under the Apache License version 2.0. It is available on our Download page.

affyAnalysisQC can be run:
  - on-line via the arrayanalysis.org webportal (follow "Get started"),
  - locally as a GenePattern module, or
  - locally as an automated R workflow consisting of an R function.

The first four sections of this documentation guide you through each type of use.

The main functions of affyAnalysisQC are:
   - to compute array quality information;
   - to plot images that allow identifying any aberrations present in the dataset;
   - to return pre-processed data and QC reports.

The last section of this document describes the functions used by the module, whichever option you choose. An interpretative description of the plots and statistics produced is available on the arrayanalysis.org webportal (follow "Module description").

How to use the documentation

As shown in the Table of Contents, you will find the following sections:
• Using the on-line affyAnalysisQC module
• Installing R and the required libraries for local use
• Installing and using the GenePattern module
• Using the user-defined settings script affyAnalysisQC.R
• Description of the executing script run_affyAnalysisQC.R and of all its functions

If you want to use the affyAnalysisQC workflow on-line, through the webportal, you only need to read the first section.

If you want to run the workflow on a GenePattern server, read the second and third sections.

If you want to run it locally as an R function, read the second and fourth sections.

The fourth section describes in detail how to set up the basic information that the scripts need: you only need to make sure you have a working version of R and all required Bioconductor libraries installed, then complete the affyAnalysisQC.R script file with your preferences and run it in R. It will automatically call the run_affyAnalysisQC.R script, which is the core script of the module.

The last section gives more details on the internal functions used by affyAnalysisQC and describes the default calls to these functions made by the run_affyAnalysisQC.R script. It also gives the exact parameters to be passed to and retrieved from each function. This section serves as a reference for users who wish to better understand how the output is built or to modify the output produced by the workflow.

Bug tracking system

If you encounter an issue while using the code, you can report it at any time by e-mail or, once you have your own account, through our internal tracking system. You can also use this system to post comments or feature suggestions.

Example datasets

Note that three example datasets have been made available on our Download page. Each includes:
• the raw CEL files of the dataset,
• the description file,
• the affyAnalysisQC output files:
   - execution log file,
   - report file (pdf),
   - zip archive with images and tables, and
   - normalized data (text file).

Use the on-line affyAnalysisQC module

You can access the on-line module on the arrayanalysis.org webportal: follow "Get started".
JavaScript has to be enabled (activated) in your web browser. You will be warned if this is not the case. You can activate it at any time in the browser options (see activatejavascript.org if needed).
You don't need to log in; you only need to prepare a zipped file containing all your Affymetrix .CEL files and, optionally, a file describing your dataset, called the description file. The description file is presented in the fourth section, subsection "Parameters description".

The on-line module contains three steps before the launch of the analysis:
- Step 1: First you load the archive of .CEL files.
- Step 2: Then you complete the description of the dataset.
- Step 3: Finally you choose the plots to be computed and their parameters.
Then:
- Execution: The module is executed with the settings you chose.
- Results: You get the results after the execution step, or by e-mail.

First step: load the CEL files

The following picture shows the screen for the first step:

(Screenshot: step 1)

The question mark button provides contextual help. Note that this feature is available only when JavaScript is activated and is not yet supported by the Google Chrome and Safari browsers.
Loading the zip file may take a while, as CEL files are large; don't click any button after clicking on the "Next" button, otherwise the loading of the file may be compromised. When the file is loaded without error, you are automatically directed to the next step. Otherwise you get a message indicating the error encountered:

(Screenshot: step 1, error message)

Second step: describe the dataset

The following picture shows the screen obtained after completing the first step:

(Screenshot: step 2)

The question mark buttons provide contextual help.
Your dataset has been read and the following information is presented in a three-column table:
- Column "ArrayDataFile" contains the .CEL file names of the N arrays found in the input zip file. You cannot edit this column.
- Column "SourceName" is filled with Array1 .. ArrayN. These names will be used for the analyses. Feel free to modify them, provided that each name is unique.
- Column "FactorValue" is always set to "Group1". If you want your array groups to be represented in the analyses and plots, rename the factor groups.

You may prefer to enter this information directly from a file you have prepared. In that case, browse to your description file in the second section of the form. If you provide such a file, the information contained in the table above is ignored. You will find a presentation of the description file in the fourth section of this documentation: "Parameters description".

The last section of the second step form offers to reorder the arrays per group, which is done by default. All the arrays representing the same factor will then be grouped together on the plots. If you untick the checkbox, arrays will be ordered as they were in the zip file.

Clicking on the "Next" button takes you to the last step if no error has been detected.

Third step: define your analysis

In this step, contextual help is no longer provided by question mark buttons: help messages pop up as soon as you activate a field (for example, if you click in a text field or tick a checkbox).

This last input form is divided into three main parts: the first part allows a quick launch, the second part defines in detail the analysis parameters applied to the raw data, and the last part is dedicated to the pre-processing (parameters for the normalization and re-annotation) and its evaluation (definition of the analysis parameters applied to the normalized data).

First part of the input form
The following picture presents the first part; it briefly summarizes what your dataset contains and asks you to enter an e-mail address. This is optional: if you don't enter your e-mail address, you will need to keep the browser open and the page loaded until the calculation ends. If you do enter your e-mail address, which is recommended, you can close the window as soon as the next page appears, and you will be informed by e-mail when the analysis has finished. You then only have to follow the links to the result files given in the e-mail.

(Screenshot: step 3, first section)

You may launch the analysis with the "Run" button right after this first section. In this case default parameters will be used.
Note that if the species was not deduced in the previous step, you will need to fill in this field first, or untick the "Custom annotation" checkbox.

Second part of the input form
This part contains four frames representing the four families of analysis applied to your raw data: 1) Sample quality, 2) Hybridization and overall signal quality, 3) Signal comparability and bias diagnostic and 4) Array correlation.
Most of the parameters are checkboxes that you tick or untick to indicate whether a certain plot or table has to be computed. The analyses and plots are described in the module description page, which can also be reached from the left vertical menu (we recommend opening these pages in a new tab so that you do not lose the information already entered in the input form).
Some analyses or plots, such as the MA-plot and the hierarchical clustering, need particular parameters. You may modify the default values.
The following picture presents this part of the input form, which defines the graphs built from the raw data:

(Screenshot: step 3, second part)

Note that not all plots are selected by default; you may select all of them with the first checkbox: [toggle select all].
Note also that some plots cannot be selected, such as the "Sample prep controls", the "Background intensity" or the "Scale factors". This is because the dataset used for this example (public dataset available on ArrayExpress: E-GEOD-13278) was built with PM-only arrays, and these particular graphs rely on the MAS5 algorithm, which cannot be applied to PM-only arrays.
Be aware that generating the 2D PLM-based images for spatial biases is highly time-consuming; the complete set of images (4 different images representing the raw data, the PLM weights, residuals and residual signs) is not computed by default. See examples of these images on the description page or on the Bolstad PLM page.

Third part of the input form
The following picture presents the part of the input form concerning the pre-processing step and its evaluation:

(Screenshot: step 3, third section)

Use the "Normalization method" drop-down menu to define the pre-processing step. You may choose "none" and keep the raw data. In this case, the remaining parameters are skipped. By default, GC-RMA is applied to arrays containing both PM and MM probes and RMA is applied to PM-only arrays.
If the species could be deduced from the CEL files in the previous steps, the "Species" field is already filled in, as shown in this example. Otherwise, you need to fill in this field yourself or untick the "Custom annotation" checkbox.
The reason is that the probesets are re-annotated by default, using one of the gene annotation databases (see the "Annotation type" drop-down menu), and the species is required for the re-annotation.
After defining the pre-processing, you choose the analyses you want to apply to the normalized data. Only six graphs are offered (the other graphs are not meaningful on normalized data), and the parameters entered for the MA-plot and hierarchical clustering applied to the raw data are also used for the normalized data.
Once the input form is completely filled in, you can launch the analysis with the "Run" button. Don't click any button after clicking on the "Run" button and before being automatically redirected to the execution page, otherwise you may compromise your analysis.

Execution step

After the third step, affyAnalysisQC has all it needs to launch the analysis. The page becomes grey, with a message telling you that the analysis is running. If you entered your e-mail address in the previous step, you can now close the window.
This page also shows a summary of the choices you made for this analysis: which files were loaded or created, which plots you decided to create for raw and normalized data, and how you set up the pre-processing step.
The following picture shows the screen for the execution step:

(Screenshot: execution step)

Getting the results

If you entered your e-mail address during the third step, you will receive an e-mail such as the one presented on the following picture:

(Screenshot: result step, e-mail)

The e-mail contains direct links to the log file, the PDF report, the ZIP file containing the resulting files (png images, usable for your presentations, and result files such as the PMA table) and the normalized dataset (presented as a tab-delimited file). If you closed the browser once your analysis was launched, you can only reach these result files through the links given in the e-mail. You can no longer access your results from the arrayanalysis.org portal.

If, on the other hand, you did not close the browser, the result page presented in the following pictures shows up when the calculation has ended.

A first section gives you the same links to the result files as the e-mail. We recommend that you save either these links or the result files themselves: if you did not enter your e-mail address, you will not be able to reach them again once you close this result page.
You can download your result files for one week from the links given in the e-mail or on the result page. Make sure you download the files before this period elapses.
This section ends with a frame in which the PDF report is opened. You can view the document and save it from this frame.

(Screenshot: result step, report)

A second section of the result page shows the log file content. This information is important if you encounter a bug during the execution: you can report the bug in our internal tracking system or by e-mail. If you do so, please send us the log information by:
- either saving the log file on your computer (see the links above) and attaching it to the ticket or e-mail,
- or copying and pasting the text from the screen.

(Screenshot: result step, report)

Install R and the required libraries for local use of affyAnalysisQC

Installing R

The R software can be downloaded from http://www.r-project.org. From this website, follow the link to a local CRAN mirror in order to download the program. affyAnalysisQC is compatible with R version 2.12.0 and higher.

Installing R libraries

Several libraries (packages) need to be installed before affyAnalysisQC can be executed. For your convenience, we prepared a script that installs almost all of the libraries needed. You can remove from its list the libraries that are already included in your R installation.
The following line can be entered in R to execute the script (installing the libraries will take a while):
source("http://svn.bigcat.unimaas.nl/arrayanalysis/tags/version_1.1.0/src/install_libraries.R")

Note that this script does not install some of the annotation packages needed. These are chiptype-specific, and installing all of them would take a large amount of disk space. Depending on your system, R may automatically install these when running the workflow. If not, you will need to install these libraries manually (see the instructions that follow).

Installing required R libraries under the R GUI for Windows

In the Windows GUI, open the "Packages" menu and choose "Select repositories...". In the dialog window that pops up, select 'BioC software'. Then go back to the "Packages" menu and choose "Install package(s)...". Select the required packages from the list and click OK.

Installing required R libraries using the R terminal

In the R terminal, you can do exactly the same as above, by using the following procedure. Use:
setRepositories()
and make sure that at least 'BioC software' is selected. Next, use:
install.packages("libraryname")
to install a package or:
install.packages(c("libraryname1", "libraryname2", "libraryname3"))
to install - for example - three packages named libraryname1, libraryname2 and libraryname3 respectively.

Alternatively, the required packages can be installed through Bioconductor, using the following commands:
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("libraryname1", "libraryname2", "libraryname3"))
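
Before installing anything, it can be useful to check which packages are actually missing from your R installation. The following is a minimal sketch in base R; the helper name missingPackages is hypothetical (it is not part of the affyAnalysisQC scripts) and the package list is only an illustrative subset of the libraries named in this documentation:

```r
# Hypothetical helper (not part of affyAnalysisQC): return the names in
# 'pkgs' that are not installed in any library known to this R session.
missingPackages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Illustrative subset of the libraries mentioned in this documentation
wanted <- c("affy", "affyPLM", "simpleaffy", "gcrma")
toInstall <- missingPackages(wanted)

if (length(toInstall) > 0) {
  message("Still to be installed: ", paste(toInstall, collapse = ", "))
} else {
  message("All listed packages are already installed")
}
```

The resulting vector toInstall could then be passed to one of the installation commands shown above.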

Parameter settings for a usage on R (affyAnalysisQC.R script)

This section contains a description of the settings to be provided in the affyAnalysisQC.R script. Note that comparable settings and parameters are requested and used by the webportal and the GenePattern module.

Directories description

First of all, three directories need to be defined:

- a directory containing the CEL files (DATA.DIR)
- a directory containing the affyAnalysisQC.R helper scripts (SCRIPT.DIR)
- a directory to which the output tables and images should be written (WORK.DIR)

The SCRIPT.DIR is set to http://svn.bigcat.unimaas.nl/arrayanalysis/tags/version_1.1.0/src/ by default, so that the up-to-date script files (mainly functions_processingQC.R and functions_imagesQC.R) are collected from the repository. If you would like to make changes to these functions, you can download them to your local machine and set SCRIPT.DIR to the corresponding location.

Parameters description

Description file
affyAnalysisQC can use a description file containing information about the arrays (samples) in the dataset. The location of this file is given by the arrayGroup parameter, which requires the full path to the file. If arrayGroup is not set, the CEL file names are used as sample names, and no distinctive group colours are used in the images produced.
The description file is a tab-delimited text file containing three columns with the following layout:

ArrayDataFile   SourceName   FactorValue
Array1.CEL      patient1     patient
Array14.CEL     control1     control
Aray23.CEL      patient2     patient
Array7.CEL      patient3     patient
...             ...          ...

The first column contains the names of the CEL files (or any type that can be read by the ReadAffy (affy) function, e.g. CEL.gz) that are in the DATA.DIR. The second column contains the names to be used for each array in the plots and tables produced. The third column contains the names of the groups the samples belong to.
The column headers should be present, but may be named otherwise (as long as the order is the same, and no spaces are used in the names). If there are more than three columns, all further ones are ignored.
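
As an illustration of this layout, the following sketch writes a small description file from R and reads it back. It uses base R only; the file name description.txt, the temporary directory and the sample names are assumptions made for the example:

```r
# Build a small description table (file and sample names are made up)
description <- data.frame(
  ArrayDataFile = c("Array1.CEL", "Array14.CEL", "Array7.CEL"),
  SourceName    = c("patient1", "control1", "patient3"),
  FactorValue   = c("patient", "control", "patient"),
  stringsAsFactors = FALSE
)

# Write it as a tab-delimited text file with a header line
descFile <- file.path(tempdir(), "description.txt")
write.table(description, file = descFile, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back, keeping only the first three columns
# (any further columns are ignored by the workflow)
check <- read.delim(descFile, stringsAsFactors = FALSE)[, 1:3]

# Basic sanity checks: unique sample names, no empty group
stopifnot(!any(duplicated(check[, 2])))
stopifnot(!any(is.na(check[, 3]) | check[, 3] == ""))
```

The full path held in descFile is the kind of value that would be assigned to the arrayGroup parameter.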

"reorder" parameter
The next parameter, reorder, indicates whether for the images and tables produced, the arrays have to be reordered by experimental group first, as this may ease interpretation.
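
The effect of such a reordering can be sketched in base R as follows. This is only an illustration of the idea, assuming the groups are sorted alphabetically; the actual code in run_affyAnalysisQC.R may differ:

```r
# Example description table, in the order the arrays were read
description <- data.frame(
  ArrayDataFile = c("Array1.CEL", "Array14.CEL", "Array23.CEL", "Array7.CEL"),
  SourceName    = c("patient1", "control1", "patient2", "patient3"),
  FactorValue   = c("patient", "control", "patient", "patient"),
  stringsAsFactors = FALSE
)

# Sort the rows by experimental group; arrays within a group keep
# their original relative order
reordered <- description[order(description$FactorValue), ]

reordered$SourceName
# "control1" "patient1" "patient2" "patient3"
```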

Choice of the plots to be computed
Most of the remaining parameters are Booleans that indicate whether a certain plot or table has to be computed. Information on these plots can be found in the comment lines of the affyAnalysisQC.R file itself, and on the arrayanalysis.org website (see: "Module description").

Options required for some plots
A few other parameters provide options for the plots:
MAOption1 and normOption1 indicate whether MA plots and normalization should be computed for the whole dataset (“dataset”) or per experimental group (“group”).
The clusterOption1 and clusterOption2 parameters give settings for clustering, and normMeth gives the normalization method (see the help given in the script itself).

Array re-annotation
Finally, customCDF indicates whether, before normalization, the array annotation has to be updated (as is advisable) with a custom cdf environment from the BrainArray lab. This is done by default.
CDFtype and species are two settings needed when an updated cdf is requested: the first indicates the database for which the updated cdf should be chosen (when selecting “ENSG”, the common gene name and description will also be added to the normalized data table), the second indicates the species (if not given, the script will try to deduce it from the chiptype).

The last line of the affyAnalysisQC.R script, starting with source, loads the run_affyAnalysisQC.R script, which creates all the images and output tables.
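
To make the settings above more concrete, here is a sketch of what a filled-in settings block might look like. The parameter names are those described in this section; the two local paths are placeholders, and the exact form of the final source line in the distributed script is an assumption, which is why it is left commented out. Parameters whose accepted values are only documented in the script itself (clusterOption1, clusterOption2, normMeth, species) are deliberately not set here:

```r
# Directories (the two local paths are placeholders to adapt)
DATA.DIR   <- "/path/to/CEL/files"
WORK.DIR   <- "/path/to/output"
SCRIPT.DIR <- "http://svn.bigcat.unimaas.nl/arrayanalysis/tags/version_1.1.0/src/"

# Description file: the full path is required
arrayGroup <- file.path(DATA.DIR, "description.txt")

# Group the arrays by experimental group in images and tables
reorder <- TRUE

# Compute MA plots and normalization on the whole dataset
# (the alternative described above is "group")
MAOption1   <- "dataset"
normOption1 <- "dataset"

# Re-annotate with a custom cdf, using the "ENSG" database
customCDF <- TRUE
CDFtype   <- "ENSG"

# Assumed form of the last line, which launches the core script:
# source(paste(SCRIPT.DIR, "run_affyAnalysisQC.R", sep = ""))
```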

How to run the workflow after adjusting the settings?

After opening R (by either running the R GUI or typing R in a command shell), affyAnalysisQC can be initiated by entering:
> source("affyAnalysisQC.R")

Scripts and functions description

This section contains a description of the run_affyAnalysisQC.R script, which is the core script of the module. This script is used by all three available versions (on-line, GenePattern module and R function).

Description of the execution steps

After loading the required Bioconductor libraries, a reload function is defined and called to load all the scripts needed. These scripts are in the functions_processingQC.R and functions_imagesQC.R files.
After that step, the dataset is loaded using the ReadAffy (affy) function. Note that throughout the script some checks are performed on parameters that may already have been set (e.g. before loading the data, the script checks whether the data object already exists). These checks may seem superfluous, but the same script is also run in automated calls from the arrayanalysis.org web server form and from the GenePattern module input form, in which case some objects are already set or defined.

After loading the data, a cdf annotation is loaded for the data, in case this has not already been done when reading the data (see the addStandardCDFenv function). Then the type of array is determined: perfect match and mismatch probes, or perfect match probes only (see the getArrayType function).

Next the description file (as given in the arrayGroup parameter) is loaded and the relevant variables are set. In case no file is given, default names (the CEL file names) are used, and no grouping is assigned. A vector of colours is also created for the arrays, and another for the groups (see the colorsByFactor function).

After setting the data and relevant variables, a cover sheet png file is created. This can be used as an ‘opening image’ when all images are shown in a presentation. One or more images representing the description file, with 35 array descriptions per image, are also provided for reference (see the coverAndKeyPlot function).

Then, when any image requiring these data objects is requested, a qc object (created with simpleaffy) and some variables based on a yaqc object (created with yaqcaffy) are computed. This is done in the main script because these computations are relatively intensive: it is better to pass the objects to the several functions that need them than to compute them anew within each function (although the functions are prepared to do so when the objects are not passed). Then a table of some QC statistics is produced and saved to a png file (see the plotQCtable function).

Thereafter the script starts plotting the QC images that have been requested by the user (see the functions described below).
All images generated by affyAnalysisQC are in png format and their dimensions are close to the A4 sheet format. These dimensions are set by two variables, WIDTH and HEIGHT, used by all functions returning images, and set by default to 1000 and 1414 respectively. The variable POINTSIZE is set by default to 24 and is adapted to the 1000x1414 format.
Another variable, MAXARRAY, is used to optimise the picture according to the number of arrays in the dataset. It is set by default to 41 and is used in most of the functions.
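
As an illustration of these image settings, the sketch below opens a png device with the default dimensions and point size quoted above, using the base R png function. The output file name and the plotted content are made up for the example, and the device is only opened if the R installation supports png output:

```r
# Default image settings as described above
WIDTH     <- 1000
HEIGHT    <- 1414
POINTSIZE <- 24

# 1414 / 1000 is very close to sqrt(2), the height/width ratio of an A4 sheet
ratio <- HEIGHT / WIDTH

if (capabilities("png")) {
  # Illustrative output file, written to the session's temporary directory
  outFile <- file.path(tempdir(), "example_plot.png")
  png(outFile, width = WIDTH, height = HEIGHT, pointsize = POINTSIZE)
  plot(1:10, main = "Example image")
  dev.off()
}
```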

After plotting all raw data QC images, the data object is normalized using the method indicated by the user (or, if this is not suitable for the array type, using a similar applicable method, in which case a warning is given). Normalization can be done for the whole dataset (generally applicable) or per experimental group (in specific cases, e.g. when overall differences are expected between the groups). Normalization is run using a data object annotated with either the standard affy cdf file, or an updated custom cdf file by BrainArray (preferred and used by default; see the normalizeData function). Afterwards, QC images of the normalized data are plotted.

The final step is saving the normalized data table. First a table suited for saving is created, to which some annotation is added in specific cases (see the createNormDataTable function). This table is then saved to a tab-delimited text file, e.g. for viewing in Excel or as input for further computational or evaluative tools.
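
The saving step can be sketched in base R as follows. The matrix below is a random stand-in for normalized expression values, and the column name ProbesetID and the output file name are assumptions made for the example; the table actually produced by createNormDataTable may be laid out differently:

```r
# Stand-in for a normalized expression matrix (5 probesets x 3 arrays)
set.seed(1)
normData <- matrix(round(rnorm(15, mean = 8), 3), nrow = 5,
                   dimnames = list(paste("probeset", 1:5, sep = ""),
                                   c("patient1", "control1", "patient2")))

# Turn the row names into a regular column so that they get a header
normTable <- data.frame(ProbesetID = rownames(normData), normData,
                        stringsAsFactors = FALSE)

# Save as a tab-delimited text file, readable by Excel or other tools
outFile <- file.path(tempdir(), "normData.txt")
write.table(normTable, file = outFile, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to check its shape
reread <- read.delim(outFile, stringsAsFactors = FALSE)
dim(reread)
```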

Description of each function

Each of the functions in the functions_processingQC.R and functions_imagesQC.R scripts is described below in a structured format. First comes a brief description of what the function does, with any relevant considerations, e.g. a link to the description of the module for explanations on the plots and their interpretation.
Next, the default call to the function by the run_affyAnalysisQC.R script is given.
Finally, a table lists all input parameters (name, type, required or not, description, default), followed by the output values or images.

The functions are divided into six sub-sections. The following tables present the functions of each category and give the main Bioconductor libraries and function calls used by these functions:

1) Preparation of the data
FUNCTION            LIBRARIES              NOTED CALLS
addStandardCDFenv   affy                   getCdfInfo
getArrayType        affy                   mm
colorsByFactor      /                      /
coverAndKeyPlot     /                      /
plotQCtable         simpleaffy, yaqcaffy   qc, detection.p.val, yaqc

2) Control of the Sample Quality
FUNCTION         LIBRARIES              NOTED CALLS
samplePrepPlot   simpleaffy, yaqcaffy   detection.p.val, yaqc
ratioPlot        simpleaffy             qc
RNAdegPlot       affy                   AffyRNAdeg, plotAffyRNAdeg

3) Hybridization and overall signal quality
FUNCTION          LIBRARIES          NOTED CALLS
hybridPlot        simpleaffy         qc
backgroundPlot    simpleaffy         qc
percPresPlot      simpleaffy         qc
computePMAtable   simpleaffy         detection.p.val
PNdistrPlot       affyQCReport       borderQC1
controlPlots      affy, ArrayTools

4) Signal comparability and biases diagnostic
FUNCTION               LIBRARIES      NOTED CALLS
scaleFactPlot          simpleaffy     qc
boxplotFun             affy           boxplot
densityFun             affy           hist
densityFunUnsmoothed   affy           exprs
maFun                  affy           MAplot
plotArrayLayout        affy           exprs
PNposPlot              affyQCReport   borderQC2
spatialImages          affyPLM        fitPLM, image
array.image            affy           exprs
nuseFun                affyPLM        fitPLM, NUSE
rleFun                 affyPLM        fitPLM, Mbox

5) Correlation between arrays
FUNCTION     LIBRARIES             NOTED CALLS
correlFun    affyQCReport          correlationPlot
pcaFun       affy, base            exprs, prcomp
clusterFun   affy, base, bioDist   exprs, hclust, cor.dist, spearman.dist, euc

6) Re-annotation and normalization
FUNCTION              LIBRARIES            NOTED CALLS
deduceSpecies         /                    /
normalizeData         affy, gcrma, plier   mas5, rma, gcrma, justPlier
addUpdatedCDFenv      affy                 getCdfInfo
createNormDataTable   affy, biomaRt        exprs, useMart, getBM

Sub-sections 2) to 5) are also present in the module description; they describe the functions returning plots and quality control indicators. For these sub-sections, you will find links between the technical documentation and the module description.

References for the packages and databases used by affyAnalysisQC:

Bioconductor
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, and Zhang J. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5(10):R80.

[affy] package contains functions for exploratory oligonucleotide array analysis.
Authors: Rafael A. Irizarry, Laurent Gautier, Benjamin Milo Bolstad and Crispin Miller

[affycomp] package contains functions to compare expression measures for Affymetrix arrays.
Authors: Rafael A. Irizarry and Zhijin Wu with contributions from Simon Cawley

[affypdnn] package contains functions to perform the PDNN method described by Li Zhang et al.
Authors: H. Bjorn Nielsen and Laurent Gautier.

[affyPLM] package extends the base affy package, mainly by implementing methods for fitting probe-level models.
Author: Ben Bolstad

[affyQCReport] package creates a QC report for an AffyBatch.
Authors: Craig Parman, Conrad Halling, Robert Gentleman

[simpleaffy] package provides high level functions for reading CEL files, phenotypic data, and then computing simple things with it.
Author: Crispin J Miller

[yaqcaffy] package performs quality control of Affymetrix GeneChip expression data using the MAQC reference datasets.
Author: Laurent Gatto

[ArrayTools] package provides solutions for quality assessment and detection of differentially expressed genes for Affymetrix arrays.
Authors: Xiwei Wu, Arthur Li

[bioDist] package offers a collection of software tools for calculating distance measures.
Authors: B. Ding, R. Gentleman and Vincent Carey

[biomaRt] package enables easy access to biological databases implementing the BioMart software suite.
Authors: Steffen Durinck, Wolfgang Huber

Brainarray
Manhong Dai, Pinglang Wang, Andrew D. Boyd, Georgi Kostov, Brian Athey, Edward G. Jones, William E. Bunney, Richard M. Myers, Terry P. Speed, Huda Akil, Stanley J. Watson and Fan Meng. (2005) Evolving Gene/Transcript Definitions Significantly Alter the Interpretation of GeneChip Data. Nucleic Acids Research 33(20):e175.

BioMart
Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A. (2009) BioMart Central Portal-unified access to biological data. Nucleic Acids Res. 37(Web Server issue):W23-7.


Preparation of the data

The addStandardCDFenv function

DESCRIPTION
This function (from functions_processingQC.R) makes sure that a cdf environment is loaded for the current chiptype. In some cases a cdf environment is already available after reading the data with the ReadAffy function; the function detects this and returns the object as is (unless overwrite is set to TRUE). In case no cdf environment has been assigned, it searches for a suitable one and adds it if found. If no suitable cdf can be found, a warning is generated and the object is returned as is.

USAGE
By default, the script will call:

rawData <- addStandardCDFenv(rawData)

INPUT PARAMETERS
NAME        TYPE        STATUS     DESCRIPTION                             DEFAULT
Data        AffyBatch   required   The raw data object
overwrite   logical     optional   Should the cdfName be overwritten if    FALSE
                                   there is already a value assigned to
                                   the object passed to the function

OUTPUT VALUE
TYPE        DESCRIPTION
AffyBatch   The object with a cdf annotation assigned if found

The getArrayType function

DESCRIPTION
This function (from functions_processingQC.R) detects whether the chip at hand is a classic chiptype with perfect match and mismatch probes, or one with perfect match probes only.

USAGE
By default, the script will call:

aType <- getArrayType(rawData)

INPUT PARAMETERS
NAME   TYPE        STATUS     DESCRIPTION
Data   AffyBatch   required   The raw data object

OUTPUT VALUE
TYPE        DESCRIPTION
character   String indicating the type of the current chip, either
            “PMMM” for chips with perfect match and mismatch probes,
            or “PMonly” for chips with perfect match probes only
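
One way such a detection could work is sketched below. This is not the actual getArrayType code: the function name sketchArrayType is hypothetical, and the sketch assumes that the mismatch intensities have already been extracted into a matrix (for instance with the mm function of affy, listed above as a noted call) and that a PM-only chip yields no usable mismatch values (an empty or all-NA matrix):

```r
# Hypothetical sketch (not the real getArrayType): classify a chip from a
# matrix of mismatch intensities, one column per array.
sketchArrayType <- function(mmIntensities) {
  # Assumption: PM-only chips provide no usable MM values
  if (length(mmIntensities) == 0 || all(is.na(mmIntensities))) {
    "PMonly"
  } else {
    "PMMM"
  }
}

# Made-up inputs standing in for extracted mismatch intensities
withMM    <- matrix(c(120, 95, 230, 310), nrow = 2)
withoutMM <- matrix(NA_real_, nrow = 2, ncol = 2)

sketchArrayType(withMM)     # "PMMM"
sketchArrayType(withoutMM)  # "PMonly"
```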

The colorsByFactor function

DESCRIPTION
This function (from functions_processingQC.R) creates a list with two elements: a vector of colors, one for each array, and a vector of one representative color for each group within the experiment, for use in legends. The colors are based on the groups present in the dataset (as provided by the user in experimentFactor). If there is only one group, colors are chosen randomly over the rainbow palette. Otherwise arrays belonging to the same group get different shades of a similar color.

USAGE
By default, the script will call:

colList <- colorsByFactor(experimentFactor)
plotColors <- colList$plotColors
legendColors <- colList$legendColors
rm(colList)

INPUT PARAMETERS
NAME               TYPE     STATUS     DESCRIPTION
experimentFactor   factor   required   The factor of groups

OUTPUT VALUE
TYPE   DESCRIPTION
list   A list with two fields: plotColors contains a vector
       of colors, one for each array; legendColors contains
       a representative color for each group, for use in legends
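
The colouring scheme described above can be approximated in base R as follows. This is a sketch under assumptions, not the code of colorsByFactor: the function name sketchColorsByFactor, the use of hsv, the range of shades and the choice of legend colour for a single group are all illustrative choices:

```r
# Hypothetical re-implementation sketch of the colouring idea
sketchColorsByFactor <- function(experimentFactor) {
  experimentFactor <- as.factor(experimentFactor)
  groups <- levels(experimentFactor)
  n <- length(experimentFactor)

  if (length(groups) == 1) {
    # Single group: colours drawn in random order from the rainbow palette
    plotColors <- sample(rainbow(n))
    legendColors <- plotColors[1]  # assumption: any one colour for the legend
  } else {
    # One hue per group, evenly spaced around the colour wheel
    hues <- seq(0, 1, length.out = length(groups) + 1)[seq_along(groups)]
    plotColors <- character(n)
    for (g in seq_along(groups)) {
      idx <- which(experimentFactor == groups[g])
      # Arrays of the same group get different shades of the same hue
      shades <- seq(1, 0.5, length.out = length(idx))
      plotColors[idx] <- hsv(h = hues[g], s = 1, v = shades)
    }
    legendColors <- hsv(h = hues, s = 1, v = 0.8)
  }
  list(plotColors = plotColors, legendColors = legendColors)
}

colList <- sketchColorsByFactor(factor(c("patient", "control", "patient")))
length(colList$plotColors)    # 3, one colour per array
length(colList$legendColors)  # 2, one colour per group
```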

The coverAndKeyPlot function

DESCRIPTION
This function (from functions_imagesQC.R) plots a cover sheet and one or more key sheets. The key sheets indicate the links between the CEL file names, the array names used in the plots, and the experimental groups the arrays belong to. This information is based on the description file provided by the user (arrayGroup parameter), which has been loaded into the description variable earlier in the main script and is passed as an argument. One key sheet is created for every 35 arrays in the experiment.

USAGE
By default, the script will call:

coverAndKeyPlot(description)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
description | data.frame | required | The data.frame containing the description file information (column 1: file names; column 2: names to be used in the plots; column 3: experimental groups the samples belong to) |
refName | character | optional | Dataset name. It is deduced from the name of the zip file containing the CEL files when used from arrayanalysis.org or as a GenePattern module. | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414

OUTPUT IMAGES
TYPE | DESCRIPTION
png file | File called ‘Cover_1’ that can be used as an opening image
png file(s) | File(s) called ‘Description’, if needed followed by an index number, that represent(s) the information as given in the input parameter.
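The rule of one key sheet per 35 arrays can be sketched as follows. The helper name keySheetPages and the column names of the example data.frame are hypothetical; only the three-column layout and the page size of 35 come from the description above.

```r
# Hypothetical helper: split the description table into chunks of at most
# 'perSheet' rows, one chunk per key sheet.
keySheetPages <- function(description, perSheet = 35) {
  pageIndex <- ceiling(seq_len(nrow(description)) / perSheet)
  split(description, pageIndex)
}

# Example description table with 80 arrays (made-up names)
description <- data.frame(
  fileName  = sprintf("sample%02d.CEL", 1:80),
  arrayName = sprintf("S%02d", 1:80),
  group     = rep(c("control", "treated"), each = 40)
)
pages <- keySheetPages(description)
length(pages)   # number of key sheets needed for this example
```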

The plotQCtable function

DESCRIPTION
This function (from functions_imagesQC.R) computes and plots a table of QC statistics based on the qc (simpleaffy Bioconductor package) and yaqc (yaqcaffy Bioconductor package) functions, which generally work only for chiptypes with perfect match and mismatch probes, and not even for all of those. For this reason, when these statistics are not provided as parameters, the function computes them internally inside try calls. Values for which the try fails are not computed; the script gives a warning and continues.

USAGE
By default, the script will call:

computeQCtable(rawData, quality, sprep, lys, samplePrep = samplePrep, ratio = ratio, hybrid = hybrid, percPres = percPres, bgPlot = bgPlot, scaleFact = scaleFact)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
sprep | matrix | optional | A matrix of 3’ probe intensities for dap, thr, lys, and phe, taken from an object of class YAQCStats (yaqc function, yaqcaffy). When not provided, it is computed within the function. | NULL
lys | matrix | optional | Matrix of A, M, P calls for the 3’ probeset of Lys on each array, based on results from the detection.p.val function (simpleaffy). When not provided, it is computed within the function. | NULL
samplePrep | logical | optional | Does the table have to contain sample preparation QC statistics? | TRUE
ratio | logical | optional | Does the table have to contain 3’/5’ ratio statistics? | TRUE
hybrid | logical | optional | Does the table have to contain hybridisation QC statistics? | TRUE
percPres | logical | optional | Does the table have to contain percentage present QC statistics? | TRUE
bgPlot | logical | optional | Does the table have to contain background signal intensity QC statistics? | TRUE
scaleFact | logical | optional | Does the table have to contain scale factor QC statistics? | TRUE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE
TYPE | DESCRIPTION
png file | File called ‘QCtable’ with the requested statistics
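The try-based pattern described above (compute a statistic if possible, otherwise warn and carry on) can be sketched in base R as follows. The helper safeCompute is hypothetical and is not part of the module; the failing expression is simulated with stop() so that the example does not depend on simpleaffy or yaqcaffy being installed.

```r
# Hypothetical helper illustrating the "try, warn, continue" pattern.
# 'expr' is evaluated lazily inside try(), so an error does not stop the script.
safeCompute <- function(label, expr) {
  result <- try(expr, silent = TRUE)
  if (inherits(result, "try-error")) {
    warning(label, " could not be computed and will be skipped", call. = FALSE)
    return(NULL)
  }
  result
}

# A computation that succeeds is returned unchanged
ok <- safeCompute("toy statistic", mean(c(1, 2, 3)))

# A failing computation (simulated here) yields NULL and a warning
quality <- safeCompute("QC statistics", stop("chip type not supported"))

# Downstream code can then skip the parts that depend on the missing object
if (is.null(quality)) message("Skipping the QC-based columns of the table")
```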

Control of the Sample Quality

The samplePrepPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of sample preparation controls based on the yaqc function (yaqcaffy Bioconductor package), which generally works only for chiptypes with perfect match and mismatch probes, and not even for all of those. For this reason, when these statistics are not provided as parameters, the function computes them internally inside try calls. Values for which the try fails are not computed; the script gives a warning and continues.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

samplePrepPlot(rawData,sprep,lys,plotColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
sprep | matrix | optional | A matrix of 3’ probe intensities for dap, thr, lys, and phe, taken from an object of class YAQCStats (yaqc function, yaqcaffy). When not provided, it is computed within the function. | NULL
lys | matrix | optional | Matrix of A, M, P calls for the 3’ probeset of Lys on each array, based on results from the detection.p.val function (simpleaffy). When not provided, it is computed within the function. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of sample preparation controls, called ‘RawDataSamplePrepControl’

The ratioPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of beta-actin and GAPDH 3'/5' ratios based on the qc function (simpleaffy Bioconductor package), which generally works only for chiptypes with perfect match and mismatch probes, and not even for all of those. For this reason, when these statistics are not provided as parameters, the function computes them internally inside try calls. Values for which the try fails are not computed; the script gives a warning and continues.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

ratioPlot(rawData, quality=quality, experimentFactor, plotColors, legendColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGES
TYPE | DESCRIPTION
png image | An image of 3’/5’ ratios for beta-actin, called ‘RawData53ratioPlot_beta-actin’
png image | An image of 3’/5’ ratios for GAPDH, called ‘RawData53ratioPlot_GAPDH’

The RNAdegPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of overall RNA degradation for all arrays. It calls the function AffyRNAdeg (affy Bioconductor package).
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

RNAdegPlot(rawData,plotColors=plotColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
Data.rnadeg | list | optional | List as obtained by calling the AffyRNAdeg function (affy). When not provided, it is computed internally within this function. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGES
TYPE | DESCRIPTION
png image | An image of overall RNA degradation, called ‘RawDataRNAdegradationPlot’

Hybridization and overall signal quality

The hybridPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of hybridisation controls based on the qc function (simpleaffy Bioconductor package), which generally works only for chiptypes with perfect match and mismatch probes, and not even for all of those. For this reason, when these statistics are not provided as parameters, the function computes them internally inside try calls. Values for which the try fails are not computed; the script gives a warning and continues.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

hybridPlot(rawData,quality=quality,plotColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of hybridisation controls, called ‘RawDataSpike-inPlot’

The backgroundPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of the background intensities and deviations for each array based on the qc function (simpleaffy Bioconductor package), which generally works only for chiptypes with perfect match and mismatch probes, and not even for all of those. For this reason, when these statistics are not provided as parameters, the function computes them internally inside try calls. Values for which the try fails are not computed; the script gives a warning and continues.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

backgroundPlot(rawData, quality=quality, experimentFactor, plotColors,legendColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of background intensities, called ‘RawDataBackgroundPlot’

The percPresPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of the percentage present values of each array based on the qc function (simpleaffy Bioconductor package), which generally works only for chiptypes with perfect match and mismatch probes, and not even for all of those. For this reason, when these statistics are not provided as parameters, the function computes them internally inside try calls. Values for which the try fails are not computed; the script gives a warning and continues.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

percPresPlot(rawData, quality=quality, experimentFactor, plotColors, legendColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of percent present values, called ‘RawDataPercentPresentPlot’

The computePMAtable function

DESCRIPTION
This function (from functions_processing) computes a table of Absent (A), Marginal (M), Present (P) calls based on the MAS5 function (affy package). It only works for chiptypes that have mismatch probes and for which the detection.p.val function (simpleaffy Bioconductor package) that it calls works. A try construction is used, and if no table can be created a warning is given.
Note that this function always uses the MAS5 algorithm, regardless of the normalization method used in the normalizeData function. If customCDF is TRUE, the annotation is updated using BrainArray custom cdf environments before proceeding with the normalization (and summarisation of probe expressions into probeset expressions). To update the annotation, a sub call is made to the addUpdatedCDFenv function.
[Description of the output image and its interpretation]

USAGE
By default, the script will call, if customCDF is TRUE:

PMAtable <- computePMAtable(rawData,customCDF,species,CDFtype)

or, if customCDF is FALSE:

PMAtable <- computePMAtable(rawData,customCDF)

After this call, the main script will save the result to a tab-delimited text file, called PMAtable.txt

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
customCDF | logical | optional | Should annotation of the chip be updated before computing the calls (and building the probesets out of the separate probes)? If requested, this is done using BrainArray updated cdf environments, c.f. addUpdatedCDFenv. | TRUE
species | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The species associated with the chip type. | NULL
CDFtype | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The type of custom cdf requested. | NULL

OUTPUT VALUE
TYPE | DESCRIPTION
data.frame | Table called ‘PMAtable’ containing the PMA values for each probeset and array
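As an illustration of what such a table contains and how the main script's saving step might look, here is a minimal base R sketch. It starts from a made-up matrix of detection p-values rather than from real CEL data. The helper name callsFromPvalues is hypothetical, and the thresholds 0.04 and 0.06 are assumed to be the usual MAS5 defaults; the module may use other settings.

```r
# Hypothetical helper: turn detection p-values into P/M/A calls.
# alpha1 and alpha2 are assumed MAS5-style thresholds, not values taken
# from the affyAnalysisQC source.
callsFromPvalues <- function(pvals, alpha1 = 0.04, alpha2 = 0.06) {
  ifelse(pvals < alpha1, "P", ifelse(pvals < alpha2, "M", "A"))
}

# Made-up p-values for two probesets on two arrays
pvals <- matrix(c(0.01, 0.05, 0.50, 0.04), nrow = 2,
                dimnames = list(c("ps1", "ps2"), c("array1", "array2")))
calls <- callsFromPvalues(pvals)

# Save as a tab-delimited text file, as the main script does for PMAtable.txt
# (written to a temporary directory here)
outFile <- file.path(tempdir(), "PMAtable.txt")
write.table(data.frame(ProbesetID = rownames(calls), calls, check.names = FALSE),
            file = outFile, sep = "\t", quote = FALSE, row.names = FALSE)
```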

The PNdistrPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image with boxplots of positive and negative control intensities for each array. It calls the borderQC1 function (affyQCReport Bioconductor package).
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

PNdistrPlot(rawData)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of boxplots of positive and negative control intensities, called ‘RawDataPosNegDistribPlot’

The controlPlots function

DESCRIPTION
This function (from functions_imagesQC.R) creates images of the expression values of the affx controls, if present. One image shows the expression profiles over all samples, for each control separately. The other image shows the boxplots of all controls (together), for each sample.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

controlPlots(rawData,plotColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
plotColors | character | required | Vector of colors assigned to each array. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
affxplots | logical | optional | Do the AFFX expression profiles have to be plotted? | TRUE
boxplots | logical | optional | Do the boxplots of the AFFX and other controls have to be plotted? | TRUE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGES
TYPE | DESCRIPTION
png image | An image of the expression profiles of the affx controls, called ‘Profiles_affx_controls’
png image | An image of the boxplots of the affx controls, called ‘Boxplots_affx_controls’

Signal comparability and biases diagnostic

The scaleFactPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of the scale factors of the arrays based on the qc function (simpleaffy Bioconductor package), which generally works only for chiptypes with perfect match and mismatch probes, and not even for all of those. For this reason, when these statistics are not provided as parameters, the function computes them internally inside try calls. Values for which the try fails are not computed; the script gives a warning and continues.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

scaleFactPlot(rawData, quality=quality, experimentFactor, plotColors,legendColors)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of scale factors, called ‘RawDataScaleFactorsPlot’

The boxplotFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image with intensity boxplots for all arrays in the raw or normalized dataset (depending on the object passed). It calls the boxplot function applied to an AffyBatch object.
[Description of the output image and its interpretation]

USAGE
By default, before the normalization the script will call:

boxplotFun(Data=rawData, experimentFactor, plotColors, legendColors)

and after normalization:

boxplotFun(Data=normData, experimentFactor, plotColors, legendColors, normMeth=normMeth)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | “”
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of the raw or normalized boxplots of the arrays, called ‘DataBoxplot’

The densityFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image with a density curve of the intensities for all arrays in the raw or normalized dataset (depending on the object passed).
[Description of the output image and its interpretation]

USAGE
By default, before the normalization the script will call:

densityFun(Data=rawData, plotColors)

and after normalization:

densityFun(Data=normData, plotColors, normMeth=normMeth)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
plotColors | character | required | Vector of colors assigned to each array. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | “”
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of the raw or normalized density plots of the arrays, called ‘DensityHistogram’

The densityFunUnsmoothed function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image with an unsmoothed density curve of the intensities for all arrays in the raw or normalized dataset (depending on the object passed).
[Description of the output image and its interpretation]

USAGE
By default, the script does not call this function. It can be called as follows before normalization:

densityFunUnsmoothed(Data=rawData, plotColors)

and after normalization:

densityFunUnsmoothed(Data=normData, plotColors, normMeth=normMeth)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
plotColors | character | required | Vector of colors assigned to each array. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | “”
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of the raw or normalized unsmoothed density plots of the arrays, called ‘DensityHistogramUnsmoothed’
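To make the difference between a smoothed and an unsmoothed density curve concrete, the sketch below computes both for simulated log2 intensities using base R only. It is an illustration under assumptions: the simulated matrix stands in for the intensities of an AffyBatch or ExpressionSet, the bin width of 0.25 is arbitrary, and the module may derive its unsmoothed curves differently.

```r
# Simulated log2 intensities for three arrays (stand-in for real data)
set.seed(42)
logInt <- matrix(rnorm(3000, mean = 8, sd = 1.5), ncol = 3,
                 dimnames = list(NULL, c("array1", "array2", "array3")))

# Smoothed curves: kernel density estimate per array
smoothed <- lapply(colnames(logInt), function(a) density(logInt[, a]))

# Unsmoothed curves: binned densities per array on a common set of breaks
breaks <- seq(floor(min(logInt)), ceiling(max(logInt)), by = 0.25)
unsmoothed <- lapply(colnames(logInt),
                     function(a) hist(logInt[, a], breaks = breaks, plot = FALSE))

names(smoothed) <- names(unsmoothed) <- colnames(logInt)

# The curves could then be drawn with, for example,
#   plot(smoothed$array1); lines(unsmoothed$array1$mids, unsmoothed$array1$density)
```

The unsmoothed version keeps small irregularities (for example spikes caused by saturated probes) that kernel smoothing tends to hide.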

The maFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates MA plots for each array versus the median array for the raw or normalized dataset. The median array is computed for the whole data set (if perGroup is FALSE) or per experimental group (if perGroup is TRUE). In the script, this depends on the setting of the MAOption1 parameter, which can have the values “dataset” or “group”.
[Description of the output image and its interpretation]

USAGE
By default, before the normalization the script will call:

maFun(Data=rawData, experimentFactor, perGroup=(MAOption1=="group"), aType=aType)

and after normalization:

maFun(Data=normData, experimentFactor, perGroup=(MAOption1=="group"), normMeth=normMeth)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
experimentFactor | factor | required when perGroup is TRUE | The factor of groups. | NULL
perGroup | logical | optional | Are MA plots to be made for each experimental group separately or not? | FALSE
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | “”
aType | character | optional | String indicating the type of the current chip, either “PMMM” for chips with perfect match and mismatch probes, or “PMonly” for chips with perfect match probes only. Required when Data is a raw data object. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE(S)
TYPE | DESCRIPTION
png image(s) | Images of the MA plots of each array versus the median array; each file contains MA plots for six arrays. The file names contain the string ‘MAplot’ and a number if more than one file is needed; in case of groupwise computation, the name of the group is also included in the file name.
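The quantities shown in an MA plot versus the median array can be sketched in base R as below. This uses the standard definitions (M is the log intensity minus the median array, A is the mean of the two) on a plain matrix of log2 values; the helper names are hypothetical and the sketch is not taken from the maFun source.

```r
# M and A values of each array versus the probe-wise median array.
# 'logExpr' is a matrix of log2 intensities (rows: probes, columns: arrays).
maVersusMedian <- function(logExpr) {
  medianArray <- apply(logExpr, 1, median)
  list(M = logExpr - medianArray,          # difference to the median array
       A = (logExpr + medianArray) / 2)    # average with the median array
}

# Group-wise variant: the median array is computed within each group
maPerGroup <- function(logExpr, experimentFactor) {
  lapply(split(seq_len(ncol(logExpr)), experimentFactor),
         function(cols) maVersusMedian(logExpr[, cols, drop = FALSE]))
}

# Number of image files needed when each file holds six MA plots
nMAfiles <- function(nArrays, perFile = 6) ceiling(nArrays / perFile)

logExpr <- cbind(a = c(1, 2), b = c(2, 3), c = c(3, 4))
ma      <- maVersusMedian(logExpr)
byGroup <- maPerGroup(logExpr, factor(c("g1", "g1", "g2")))
```

In this picture, perGroup=(MAOption1=="group") in the default call corresponds to choosing between the whole-dataset and the group-wise variant.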

The plotArrayLayout function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of the layout of the current chiptype. Thus, this plot does not plot any data, but shows where control and regular perfect match (PM) and (if applicable) mismatch (MM) probes are present on the array. Note: due to resolution issues, banding may seem different from the real situation, e.g. normally on classical chiptypes, PM and MM are present in alternate lines, but patterns may appear due to image resolution. This function tries to load annotation libraries, depending on the chiptype.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

plotArrayLayout(rawData,aType)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
aType | character | required | String indicating the type of the current chip, either “PMMM” for chips with perfect match and mismatch probes, or “PMonly” for chips with perfect match probes only. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of the layout of the current chiptype, called ‘Array_layout_plot’

The PNposPlot function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image showing the centres of intensity of the positive and negative border elements for each array. It calls the borderQC2 function (affyQCReport Bioconductor package).
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

PNposPlot(rawData)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of centres of intensity of positive and negative border elements for each array, called ‘RawDataPosNegPositionPlot’

The spatialImages function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image per array, containing one to four spatial images. For any but the raw images, an object obtained by calling fitPLM (affyPLM Bioconductor package) is used. If this object is not provided, it will be computed internally within this function.
[Description of the output image and its interpretation]

USAGE
There are two default calls by the script:

1/ Compute only the images showing the residuals of the PLM (if spatialImage parameter is TRUE):
spatialImages(rawData, Data.pset=rawData.pset, Resid=TRUE, ResSign=FALSE, Raw=FALSE, Weight=FALSE)

2/ Compute the four images for all arrays (if PLMimage parameter is TRUE):
spatialImages(rawData, Data.pset=rawData.pset)

where rawData.pset has been constructed by calling:

rawData.pset <- fitPLM(rawData)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
Data.pset | PLMset | optional | An object obtained by calling fitPLM (affyPLM), used for each but the raw plot. When not provided, it is computed within the function. | NULL
Resid | logical | optional | Should a residual plot be made? | TRUE
ResSign | logical | optional | Should a residual sign plot be made? | TRUE
Raw | logical | optional | Should a raw plot be made? | TRUE
Weight | logical | optional | Should a weight plot be made? | TRUE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE(S)
TYPE | DESCRIPTION
png image(s) | Images containing each of the requested plots, one file per array. Naming is ‘virtual_image’ followed by the (sample) name of the array.

The array.image function

DESCRIPTION
This function (from functions_imagesQC.R) creates several types of spatial images. It does not make use of a PLMset object and is called when the PLM images cannot be computed due to time or memory constraints. It also offers more flexibility. By default, it creates a relative intensity plot versus the median array on a blue to red colour scale. Note that the median array is always computed from the complete data set, even when plots for a subset of arrays are requested. If the median array has to be computed on subsets (e.g. per experimental group), a data object with only those arrays can be provided to the function (without setting the arrays parameter). The color ranges saturate at pcut percentage(s) of the data range and can be modified by tuning col.mod. By default, for the relative plots, the arrays are first balanced for their overall intensity and a symmetrical color range is used. When there are fewer than 6 arrays, the median array is not used. In that case, intensities are plotted using a virtual symmetric color scale, from blue to red.
[Description of the output image and its interpretation]

USAGE
There are two default calls by the script:

1/ relative intensity plot versus the median array on a blue to red colour scale:
array.image(rawData)

2/ absolute intensity plot when there are fewer than 6 arrays in the dataset:
array.image(rawData,relative=FALSE,col.mod=4,symm=TRUE)

Other calls could be made, such as:

# spatial intensity plot on a virtual colour scale (red to yellow)
array.image(rawData,relative=FALSE)

# signs of the relative intensities versus the median array (e.g. lower or higher) with blue as lower and red as higher.
array.image(rawData,quantitative=FALSE)

# similar to the default plot, but not balanced within arrays
array.image(rawData,balance=FALSE)

# or: per experimental group
for (i in levels(experimentFactor)) {
  array.image(rawData[, experimentFactor == i], postfix = paste0("_", i), balance=FALSE)
}

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
pcut | numeric | optional | Either a numeric value or a vector of two numeric values, both in the interval [0, 0.5], used as color saturation limits as a percentage of the data range. If one value is provided, it is used as both the lower and the upper saturation limit. | NULL. This means that for relative plots a value of 0.001 will be used, and for other plots values of c(0.01, 0.05).
relative | logical | optional | Is the plot to be created a plot of each array relative to the median array, or not (i.e. an absolute plot)? | TRUE
symm | logical | optional | Should a symmetric color scale be used? | TRUE if "relative" is TRUE, FALSE otherwise.
balance | logical | optional | Should the arrays first be balanced for their average intensities? | TRUE if "relative" is TRUE, FALSE otherwise.
quantitative | logical | optional | Should the plot be quantitative or qualitative (i.e. only indicate the sign of the value)? | TRUE if "relative" is TRUE, FALSE otherwise. Setting "quantitative" to TRUE has no effect if "relative" is FALSE; a warning will be produced.
col.mod | numeric | optional | A numeric value used as a modifier for the color range. A value of 1 means no modification (linear), smaller leads to faster saturation, larger to slower saturation. | 1
postfix | character | optional | String to be attached to the file names produced. | ""
arrays | numeric | optional | Which arrays are to be plotted. NOTE: for relative plot types, the median array is still computed using all arrays in the dataset. | NULL (in which case all arrays are plotted)
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE(S)
TYPE | DESCRIPTION
png image(s) | Images containing each of the requested plots, one file per six arrays. Naming is ‘virtual_array_plots’ followed by the postfix given by the user.
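The ideas of balancing, taking values relative to the median array, and saturating the colour range can be sketched on a plain matrix as follows. Both helpers are hypothetical, and two points are assumptions rather than facts about array.image: balancing is done here by centring each array on its mean, and pcut is read as a fraction of the data range cut off at each end.

```r
# Values of each array relative to the probe-wise median array.
# With balance = TRUE each array is first centred on its own mean (assumption).
relativeToMedian <- function(logInt, balance = TRUE) {
  if (balance) logInt <- sweep(logInt, 2, colMeans(logInt))
  logInt - apply(logInt, 1, median)
}

# Clip values so that the colour scale saturates; 'pcut' is one value or a
# pair (lower, upper), each a fraction of the data range (assumed reading).
saturate <- function(x, pcut = 0.001) {
  pcut <- rep(pcut, length.out = 2)
  stopifnot(all(pcut >= 0), all(pcut <= 0.5))
  rng   <- range(x, na.rm = TRUE)
  lower <- rng[1] + pcut[1] * diff(rng)
  upper <- rng[2] - pcut[2] * diff(rng)
  pmin(pmax(x, lower), upper)
}

logInt  <- cbind(a = c(1, 2, 3), b = c(2, 3, 4), c = c(4, 5, 6))
rel     <- relativeToMedian(logInt)          # arrays differ only by an offset
clipped <- saturate(0:100, pcut = c(0.1, 0.2))
```

Because the three example arrays differ only in overall intensity, balancing removes all differences and the relative values are zero everywhere; with balance = FALSE the offsets would remain visible.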

The nuseFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image with boxplots of Normalized Unscaled Standard Errors (NUSE) for each array. It calls the NUSE function (affyPLM Bioconductor package).
An object obtained by calling fitPLM (affyPLM) is needed. If this object is not provided, it is computed internally within this function.
[Description of the output image and its interpretation]

USAGE
By default, the script will call:

nuseFun(rawData, Data.pset=rawData.pset, experimentFactor, plotColors, legendColors)

where rawData.pset has been constructed by calling:

rawData.pset <- fitPLM(rawData)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
Data.pset | PLMset | optional | An object obtained by calling fitPLM (affyPLM). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image containing boxplots of the NUSE values per array, called ‘RawDataNUSEplot’
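For readers unfamiliar with NUSE, the sketch below shows the usual definition on a plain matrix: each probeset's standard error is divided by the median standard error of that probeset across arrays, so that values centre around 1. The matrix of standard errors is made up here; in the workflow these estimates come from the fitted PLM object. The helper name nuseValues is hypothetical.

```r
# NUSE: standard error per probeset and array, scaled by the probeset's
# median standard error across arrays.
# 'se' is a matrix (rows: probesets, columns: arrays).
nuseValues <- function(se) {
  se / apply(se, 1, median)
}

se   <- cbind(array1 = c(1, 2), array2 = c(2, 4), array3 = c(3, 6))
nuse <- nuseValues(se)

# Per-array summary as it would be shown in a boxplot
nuseMedians <- apply(nuse, 2, median)
```

An array whose NUSE values lie clearly above 1 has larger standard errors than the other arrays for the same probesets.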

The rleFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image with boxplots of Relative Log Expression (RLE) values for each array. It calls the RLE function (affyPLM Bioconductor package).
An object obtained by calling fitPLM (affyPLM) is needed. If this object is not provided, it is computed internally within this function.
[Description of the output image and its interpretation]
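To make the RLE quantity concrete, the sketch below derives RLE values in base R from a simulated matrix of log-scale expression values. The data and object names are illustrative assumptions; in the workflow the values are obtained from the fitPLM object through the RLE function of affyPLM.

```r
# Simulated log2 expression values (illustrative only): probesets x arrays
set.seed(3)
log_expr <- matrix(rnorm(300 * 5, mean = 8, sd = 1), nrow = 300,
                   dimnames = list(NULL, paste0("array", 1:5)))
log_expr[, 2] <- log_expr[, 2] + 0.8  # pretend the second array is shifted upwards

# RLE: each probeset's log expression minus its median across arrays
rle <- sweep(log_expr, 1, apply(log_expr, 1, median), "-")

# Per-array centre and spread; boxes are expected to be centred near 0
rle_medians <- apply(rle, 2, median)
rle_iqr <- apply(rle, 2, IQR)
print(round(rbind(median = rle_medians, IQR = rle_iqr), 2))
```

A box that is not centred near zero, or that is much wider than the others, points to an array that behaves differently from the rest of the dataset.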

USAGE
By default, the script will call:

rleFun(rawData, Data.pset=rawData.pset, experimentFactor, plotColors, legendColors)

where rawData.pset has been constructed by calling:

rawData.pset <- fitPLM(rawData)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object | -
Data.pset | PLMset | optional | An object obtained by calling fitPLM (affyPLM). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image containing boxplots of the RLE values per array, called ‘RawDataRLEplot’


Correlation between arrays

The correlFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates an image of the intensity correlation values of the arrays in the raw or normalized dataset (depending on the object passed). It calls correlationPlot (affyQCReport Bioconductor package).
[Description of the output image and its interpretation]
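The kind of array-by-array correlation matrix that such a plot displays can be sketched in base R as follows. This is only an illustration on simulated intensities; the use of Pearson correlation and all object names are assumptions, and the actual image is produced by correlationPlot.

```r
# Simulated log intensities (illustrative only): three similar arrays and one outlier
set.seed(4)
base_signal <- rnorm(500, mean = 8, sd = 2)
intens <- sapply(1:4, function(i) base_signal + rnorm(500, sd = 0.3))
colnames(intens) <- paste0("array", 1:4)
intens[, 4] <- rnorm(500, mean = 8, sd = 2)  # unrelated signal: an outlying array

# Pairwise correlation between arrays (columns)
cor_mat <- cor(intens, method = "pearson")
print(round(cor_mat, 2))

# Mean correlation of each array with all other arrays (diagonal excluded)
mean_cor <- (rowSums(cor_mat) - 1) / (ncol(cor_mat) - 1)
print(round(mean_cor, 2))
```

An array with a markedly lower mean correlation than the others, such as the simulated fourth array, is a candidate for closer inspection.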

USAGE
By default, before the normalization the script will call:

correlFun(Data=rawData)

and after normalization:

correlFun(Data=normData, normMeth=normMeth)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object | -
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData for more information on the possible values). | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image of the intensity correlation values of the arrays, called ‘DataArrayCorrelationPlot’

The pcaFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates a Principal Component Analysis (PCA) plot of the arrays in the raw or normalized dataset (depending on the object passed). When the dataset consists of fewer than three arrays, no PCA plot is generated and a warning is given. Before the PCA is computed, each probeset’s expression values are centred on zero. If scaled_pca is TRUE, they are also rescaled to unit variance. When no array (sample) name is longer than ten characters and there are no more than 16 samples, the array (sample) names are put within the plot; otherwise they are put in the legend.
Since computing a PCA (using the prcomp function) can be memory intensive, the call is wrapped in try. Furthermore, when scaling is not possible because variation is absent, a second attempt is made without scaling (if scaled_pca had been set to TRUE), and a warning is given. When no PCA can be computed, the image is not created and a warning is given.
[Description of the output image and its interpretation]
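The try-and-fall-back behaviour described above can be sketched in base R as follows. The helper safe_pca and the simulated data are hypothetical and only illustrate the pattern; they are not the code of pcaFun.

```r
# Hypothetical helper: PCA with a fallback to unscaled PCA when scaling fails
safe_pca <- function(x, scaled_pca = TRUE) {
  # x has probesets in rows and arrays in columns; prcomp expects the
  # observations (arrays) in rows, hence the transpose
  pca <- try(prcomp(t(x), center = TRUE, scale. = scaled_pca), silent = TRUE)
  if (inherits(pca, "try-error") && scaled_pca) {
    warning("Scaling to unit variance failed; retrying without scaling")
    pca <- try(prcomp(t(x), center = TRUE, scale. = FALSE), silent = TRUE)
  }
  if (inherits(pca, "try-error")) {
    warning("PCA could not be computed")
    return(NULL)
  }
  pca
}

# Simulated data (illustrative only): 50 probesets x 5 arrays
set.seed(2)
x <- matrix(rnorm(50 * 5), nrow = 50)
x[1, ] <- 7  # a probeset without any variation makes scaling impossible

pca <- suppressWarnings(safe_pca(x))
print(summary(pca)$importance[, 1:2])
```

In this example the first attempt fails because prcomp cannot rescale a constant variable, so the result comes from the second, unscaled attempt.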

USAGE
By default, before the normalization the script will call:

pcaFun(Data=rawData, experimentFactor=experimentFactor,
plotColors=plotColors, legendColors=legendColors,
namesInPlot=((max(nchar(sampleNames(rawData)))<=10)&&
(length(sampleNames(rawData))<=16)))


and after normalization:

pcaFun(Data=normData, experimentFactor=experimentFactor,
normMeth=normMeth, plotColors=plotColors,
legendColors=legendColors,
namesInPlot=((max(nchar(sampleNames(rawData)))<=10)&&
(length(sampleNames(rawData))<=16)))

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object | -
experimentFactor | factor | required | The factor of groups. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData for more information on the possible values). | ""
scaled_pca | logical | optional | Should each probeset’s expression be scaled to unit variance before proceeding? Note that the expression is centred on zero in any case. | TRUE
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
namesInPlot | logical | optional | Should the array (sample) names be put within the plot, or in the legend? | FALSE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image with a PCA plot of the arrays. Naming is either ‘Raw’ or the normalization method, followed by ‘DataPCAanalysis’

The clusterFun function

DESCRIPTION
This function (from functions_imagesQC.R) creates a hierarchical clustering dendrogram of the arrays in the raw or normalized dataset (depending on the object passed). When the dataset consists of fewer than three arrays, no dendrogram is generated and a warning is given.
[Description of the output image and its interpretation]
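A minimal base R sketch of clustering arrays with a correlation-based distance is given below. It assumes that the distance is taken as one minus the correlation between arrays, which is a common choice but is not confirmed by this documentation for clusterFun; the data and object names are illustrative.

```r
# Simulated expression values (illustrative only): two groups of three arrays
set.seed(5)
groupA <- rnorm(400, mean = 8, sd = 2)
groupB <- rnorm(400, mean = 8, sd = 2)
expr <- cbind(sapply(1:3, function(i) groupA + rnorm(400, sd = 0.3)),
              sapply(1:3, function(i) groupB + rnorm(400, sd = 0.3)))
colnames(expr) <- c(paste0("A", 1:3), paste0("B", 1:3))

# Assumed distance: 1 - Pearson correlation between arrays
d <- as.dist(1 - cor(expr, method = "pearson"))

# Ward clustering; recent R versions name this method "ward.D" in hclust
hc <- hclust(d, method = "ward.D")

# Cut the tree into two clusters to check which arrays group together
groups <- cutree(hc, k = 2)
print(groups)

# plot(hc) would draw the dendrogram
```

Swapping method = "pearson" for "spearman", or replacing the distance by dist(t(expr)) for a Euclidean distance, mirrors the other documented values of clusterOption1.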

USAGE
By default, before the normalization the script will call:

clusterFun(Data=rawData, clusterOption1=clusterOption1, clusterOption2=clusterOption2)

and after normalization:

clusterFun(Data=normData, clusterOption1=clusterOption1, clusterOption2=clusterOption2, normMeth=normMeth)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object | -
clusterOption1 | character | optional | String indicating the distance function to be used. Possible values are “pearson”, “spearman”, or “euclidean”. | “pearson”
clusterOption2 | character | optional | String indicating the hierarchical clustering function to be used. Possible values are "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". | “ward”
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData for more information on the possible values). | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE
TYPE | DESCRIPTION
png image | An image with the clustering dendrogram of the arrays. Naming is ‘DataCluster’ followed by the name of the distance function used


Re-annotation and normalization

The deduceSpecies function

DESCRIPTION
This function (from functions_processing) tries to determine the species related to the current chiptype if the species has not been provided by the user (and as such is set to ""). If the descr parameter is not provided or is empty, an empty string is returned as species. Otherwise, the function tries to load an annotation library depending on the chiptype to find the species. If this is not successful, the species is set by hand for some predefined chiptypes. If this also fails, the empty string is returned.
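The fallback part of this logic can be pictured with the following base R sketch. The function guess_species and the lookup table known_chips are hypothetical and contain only a few example chip types; the sketch also skips the first step (loading an annotation library) and is not the code of deduceSpecies.

```r
# Hypothetical fallback table; these entries are examples only
known_chips <- c(hgu133a   = "Homo sapiens",
                 mouse4302 = "Mus musculus",
                 rat2302   = "Rattus norvegicus")

# Hypothetical helper: return the species for a chip type, or "" when unknown
guess_species <- function(descr = NULL, lookup = known_chips) {
  # No chip type given: nothing can be deduced
  if (is.null(descr) || !nzchar(descr)) return("")
  # Normalise the chip type: drop non-alphanumeric characters, lower case
  key <- tolower(gsub("[^[:alnum:]]", "", descr))
  if (key %in% names(lookup)) lookup[[key]] else ""
}

print(guess_species("HG-U133A"))
print(guess_species("some_unknown_chip"))
```

Returning an empty string rather than raising an error lets the calling script decide how to proceed when the species cannot be determined.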

USAGE
By default, the script will call (if customCDF is TRUE and species is ""):

species <- deduceSpecies(rawData@annotation)

INPUT PARAMETER
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
descr | character | required | A string indicating the chiptype, which can be obtained by getting the @annotation slot from an AffyBatch object. | NULL

OUTPUT VALUE
TYPE | DESCRIPTION
character | The species associated with the current chiptype, or "" if detection was unsuccessful

The normalizeData function

DESCRIPTION
This function (from functions_processing) normalizes the data in the AffyBatch object provided. Currently, GCRMA, RMA, and PLIER normalization are supported. For GCRMA, fast normalization is not used, as this gives unreliable results. For PLIER, justPlier (plier Bioconductor package) is used, with the "together" option for arrays having perfect match and mismatch probes, and the "PMonly" option for arrays with perfect match probes only.
When normalization per experimental group is selected, the function still returns a single normalized data object that includes all arrays. If customCDF is TRUE, the annotation is updated using BrainArray custom cdf environments before proceeding with the normalization (and summarisation of probe expressions into probeset expressions). To update the annotation, a call is made to the addUpdatedCDFenv function.
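The idea of normalizing per group while still returning one object with all arrays can be sketched in base R as follows. The helper normalize_per_group and the stand-in normalization align_medians are hypothetical; the real function applies RMA, GCRMA or PLIER to AffyBatch objects rather than to plain matrices.

```r
# Hypothetical helper: apply a normalization per group, then recombine
normalize_per_group <- function(x, groups, norm_fun) {
  stopifnot(ncol(x) == length(groups), !is.null(colnames(x)))
  # Normalize the arrays (columns) of each experimental group separately
  pieces <- lapply(split(seq_len(ncol(x)), groups), function(idx) {
    norm_fun(x[, idx, drop = FALSE])
  })
  # Recombine into one matrix and restore the original array order
  combined <- do.call(cbind, pieces)
  combined[, colnames(x), drop = FALSE]
}

# Stand-in normalization: give all arrays of a group the same median
align_medians <- function(m) {
  med <- apply(m, 2, median)
  sweep(m, 2, med - mean(med), "-")
}

# Simulated data (illustrative only): 100 probesets x 4 arrays
set.seed(6)
x <- matrix(rnorm(100 * 4, mean = 8), nrow = 100,
            dimnames = list(NULL, c("s1", "s2", "s3", "s4")))
experimentFactor <- factor(c("ctrl", "treated", "ctrl", "treated"))

normed <- normalize_per_group(x, experimentFactor, align_medians)
print(round(apply(normed, 2, median), 3))
```

Arrays of the same group end up with identical medians, whereas differences between groups are left untouched, which is the point of the per-group option.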

USAGE
By default, the script will call, if customCDF is TRUE:

normData <- normalizeData(rawData, normMeth,
perGroup=(normOption1=="group"), experimentFactor, customCDF,
species, CDFtype)


or, if customCDF is FALSE:

normData <- normalizeData(rawData, normMeth,
perGroup=(normOption1=="group"), experimentFactor, customCDF)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object | -
normMeth | character | required | String indicating the normalization method used. Possible values: RMA, GCRMA, PLIER or none. | ""
perGroup | logical | optional | Should normalization be performed per experimental group (e.g. when global differences are expected between groups) or for the dataset as a whole? | FALSE
experimentFactor | factor | required if perGroup is TRUE | The factor of groups. | NULL
customCDF | logical | optional | Should annotation of the chip be updated before normalizing the data (and building the probesets out of the separate probes)? If requested, this is done using BrainArray updated cdf environments, c.f. addUpdatedCDFenv. | TRUE
species | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The species associated with the chip type. | NULL
CDFtype | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The type of custom cdf requested. | NULL
aType | character | optional | String indicating the type of the current chip, either “PMMM” for chips with perfect match and mismatch probes, or “PMonly” for chips with perfect match probes only. Required if normMeth is “PLIER”. | NULL

OUTPUT IMAGE AND VALUE
TYPE | DESCRIPTION
png file | A reference sheet indicating the cdf annotation (cdfName) used, the normalization method used and, where applicable, stating that normalization has been performed per experimental group
ExpressionSet | normalized data object

The addUpdatedCDFenv function

DESCRIPTION
This function (from functions_processing) tries to find and load an updated cdf environment from BrainArray for the current chiptype. If this is not successful, a warning is generated and the raw data object is returned as is, i.e. with the cdf annotation that it already had, if any. Note that the IDs given to the reporters are artificially created by adding “_at” as a postfix to the ID from the entry in the database that the probeset corresponds to (c.f. also createNormDataTable).
Note also that reannotation takes place at the probe level, not at the probeset level. This means that completely new probesets are constructed, not necessarily equal in size.
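For orientation, the sketch below builds the name that a BrainArray custom cdf package typically carries, from the chip type, the species abbreviation and the cdf type. The helper custom_cdf_name is hypothetical and the naming pattern is stated here as an assumption; this documentation does not show how addUpdatedCDFenv itself looks up the environment.

```r
# Hypothetical helper: compose a BrainArray-style custom cdf name
custom_cdf_name <- function(chiptype, species, type = "ENSG") {
  # Clean the chip type: drop non-alphanumeric characters, lower case
  chip <- tolower(gsub("[^[:alnum:]]", "", chiptype))
  # Assumed pattern: <chip><species abbreviation><type>cdf, all lower case
  paste0(chip, tolower(species), tolower(type), "cdf")
}

cdf_name <- custom_cdf_name("HG-U133_Plus_2", species = "Hs", type = "ENSG")
print(cdf_name)

# Reporter IDs in such an environment carry the artificial "_at" postfix
print(paste0("ENSG00000139618", "_at"))
```

The species argument uses the abbreviated form listed under the first footnote below, and type is one of the values listed under the second.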

USAGE
By default, the script will call, if customCDF is TRUE (from within the normalizeData and computePMAtable functions, leaving the original Data intact and using Data.copy to further process):

Data.copy <- Data
Data.copy <- addUpdatedCDFenv(Data.copy, species, CDFtype)

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object | -
species | character | required | The species associated with the chip type * | NULL
type | character | optional | The type of custom cdf requested **. This parameter indicates the database based on which an updated cdf environment should be selected. | “ENSG”

OUTPUT VALUE
TYPE | DESCRIPTION
variable of class AffyBatch | The object with an updated cdf annotation assigned, if found

* Possible values are:
- Abbreviated: "Ag", "At", "Bt", "Ce", "Cf", "Dr", "Dm", "Gg", "Hs", "MAmu", "Mm", "Os", "Rn", "Sc", "Sp", "Ss"
- Full names: "Anopheles gambiae", "Arabidopsis thaliana", "Bos taurus", "Caenorhabditis elegans",
"Canis familiaris", "Danio rerio", "Drosophila melanogaster", "Gallus gallus", "Homo sapiens",
"Macaca mulatta", "Mus musculus", "Oryza sativa", "Rattus norvegicus", "Saccharomyces
cerevisiae", "Schizosaccharomyces pombe", "Sus scrofa"

** Possible values are:
"ENTREZG", "REFSEQ", "ENSG", "ENSE", "ENST", "VEGAG", "VEGAE", "VEGAT", "TAIRG", "TAIRT",
"UG", "MIRBASEF", "MIRBASEG"

The createNormDataTable function

DESCRIPTION
This function (from functions_processing) prepares a table suitable for visualisation and saving to disk from a normalized data object. When an updated cdf annotation is used (customCDF is TRUE), it removes the artificial “_at” postfixes from all probeset IDs, apart from the AFFX controls, in order to get the real IDs from the database used to create the update. Also, when an updated cdf environment based on Ensembl Gene ID (“ENSG”) has been used, the function tries to connect to BioMart (using the biomaRt Bioconductor package) in order to add two extra columns of information to the table: the common gene name and a gene description. This is (for now) not done for other updated cdf types, as there is no one-to-one mapping between BioMart entries and these IDs or it has not been sufficiently tested. Note that the BioMart connection may sometimes not be established (e.g. if the service is down or busy); in such cases it may be worthwhile to try again.
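The ID clean-up and the tab-delimited export can be sketched in base R as follows. The helper strip_at_postfix, the toy expression values and the assumption that control IDs start with "AFFX" are illustrative; the BioMart lookup is left out because it needs a network connection.

```r
# Hypothetical helper: remove the artificial "_at" postfix, except for controls
strip_at_postfix <- function(ids) {
  is_control <- grepl("^AFFX", ids, ignore.case = TRUE)
  ids[!is_control] <- sub("_at$", "", ids[!is_control])
  ids
}

# Toy normalized data (values are made up): probesets x arrays
normData <- matrix(c(7.1, 6.8, 9.3, 9.0, 11.2, 11.5), nrow = 3, byrow = TRUE,
                   dimnames = list(c("ENSG00000139618_at",
                                     "ENSG00000141510_at",
                                     "AFFX-BioB-5_at"),
                                   c("array1", "array2")))

# Table with the cleaned IDs in the first column
normDataTable <- data.frame(ENSG_ID = strip_at_postfix(rownames(normData)),
                            normData, row.names = NULL,
                            stringsAsFactors = FALSE)
print(normDataTable)

# Save as a tab-delimited text file (a temporary file in this sketch)
out_file <- tempfile(fileext = ".txt")
write.table(normDataTable, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Keeping the postfix on the control probesets means their IDs still match the names used by Affymetrix.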

USAGE
By default, the script will call:
normDataTable <- createNormDataTable(normData, customCDF, species, CDFtype)

After this function has been called, the normDataTable object is saved to a tab-delimited text file by the main script.

INPUT PARAMETERS
NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
normData | ExpressionSet | required | The normalized data object | -
customCDF | logical | required | Has the normalized data object been created after updating the cdf annotation using BrainArray updated cdf environments (c.f. normalizeData above)? | NULL
species | character | required when customCDF is TRUE | The species associated with the chip type. | NULL
CDFtype | character | required when customCDF is TRUE | The type of custom cdf requested. | NULL

OUTPUT VALUE
TYPE | DESCRIPTION
data.frame | Table with normalized data, and possible extra annotation

The following figure shows an example of the table generated for normalized data. It was obtained with a GC-RMA normalization applied to the example_dataset1 (see Download page), using the Ensembl annotation (the CDF was customized). As you can see, the first column (ENSG_ID) contains Ensembl IDs, which were mapped using the biomaRt package; two extra columns (external_gene_id and description) were added to give details on the genes.

[Figure: NormDataTable]
