Authors
- Morten Hansen
- Thomas Alexander Gerds
- Ole Haagen Nielsen
- Jakob Benedict Seidelin
- Jesper Thorvald Troelsen
- Jørgen Olsen
Abstract
Analyzing data obtained from genome-wide gene expression experiments is challenging due to the number of variables, the need for multivariate analyses, and the demands of managing large amounts of data. Here we present the R package pcaGoPromoter, which facilitates the interpretation of genome-wide expression data and overcomes the aforementioned problems. In the first step, principal component analysis (PCA) is applied to survey any differences between experiments and possible groupings. The next step is the interpretation of the principal components with respect to both biological function and regulation by predicted transcription factor binding sites. The robustness of the results is evaluated using cross-validation, and illustrative plots of PCA scores and gene ontology terms are available. pcaGoPromoter works with any platform that uses gene symbols or Entrez IDs as probe identifiers. In addition, support for several popular Affymetrix GeneChip platforms is provided. To illustrate the features of the pcaGoPromoter package, a serum stimulation experiment was performed and the genome-wide gene expression in the resulting samples was profiled using the Affymetrix Human Genome U133 Plus 2.0 chip. Array data were analyzed using pcaGoPromoter package tools, resulting in a clear separation of the experiments into three groups: controls, serum only, and serum with inhibitor. Functional annotation of the axes in the PCA score plot showed the expected serum-promoted biological processes, e.g., cell cycle progression, and the predicted involvement of expected transcription factors, including E2F. In addition, unexpected results, e.g., cholesterol synthesis in serum-depleted cells and NF-κB activation in inhibitor-treated cells, were noted. In summary, the pcaGoPromoter R package provides a collection of tools for analyzing gene expression data. These tools give an overview of the input data via PCA, functional interpretation by gene ontology terms (biological processes), and an indication of the involvement of possible transcription factors.
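The first two steps described in the abstract (PCA on the samples, then selecting the genes that drive a component for functional annotation) can be illustrated with a minimal Python sketch. This is not the pcaGoPromoter implementation or its API: the function names, the toy expression values, and the gene labels are all hypothetical, and power iteration is used here only as a simple standard-library stand-in for a full PCA.

```python
import math
import random


def first_principal_component(samples, iterations=200, seed=0):
    """Return (loadings, scores) for the first principal component.

    samples: list of rows, one row per sample, one column per gene.
    Illustrative stand-in only; not the pcaGoPromoter implementation.
    """
    n_samples = len(samples)
    n_genes = len(samples[0])
    # Center each gene (column) on its mean across samples.
    means = [sum(row[j] for row in samples) / n_samples for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in samples]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
    for _ in range(iterations):
        # Apply X^T X to v without building the gene-by-gene matrix.
        proj = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(proj[i] * centered[i][j] for i in range(n_samples))
             for j in range(n_genes)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    # Scores: projection of each centered sample onto the component.
    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    return v, scores


def rank_genes_by_loading(gene_ids, loadings, top=2):
    """Return the genes with the most negative and most positive loadings."""
    order = sorted(range(len(gene_ids)), key=lambda j: loadings[j])
    negative = [gene_ids[j] for j in order[:top]]
    positive = [gene_ids[j] for j in order[::-1][:top]]
    return negative, positive


# Hypothetical toy data: two control and two serum-stimulated samples.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
expression = [
    [1.0, 1.2, 5.0, 3.0],  # control 1
    [1.1, 1.0, 5.2, 3.1],  # control 2
    [4.0, 3.9, 2.0, 3.0],  # serum 1
    [4.2, 4.1, 1.9, 3.1],  # serum 2
]

loadings, scores = first_principal_component(expression)
negative_genes, positive_genes = rank_genes_by_loading(genes, loadings)
print("PC1 scores:", [round(s, 2) for s in scores])
print("Genes at the negative end of PC1:", negative_genes)
print("Genes at the positive end of PC1:", positive_genes)
```

In this toy setting the controls and the serum samples fall on opposite sides of the first component, and the gene lists at either end of the loadings are what would then be passed on to gene ontology or transcription factor binding site analysis. The sign of a principal component is arbitrary, so which group lands on the positive side carries no meaning by itself.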
Suggested Citation
Morten Hansen & Thomas Alexander Gerds & Ole Haagen Nielsen & Jakob Benedict Seidelin & Jesper Thorvald Troelsen & Jørgen Olsen, 2012.
"pcaGoPromoter - An R Package for Biological and Regulatory Interpretation of Principal Components in Genome-Wide Gene Expression Data,"
PLOS ONE, Public Library of Science, vol. 7(2), pages 1-13, February.
Handle:
RePEc:plo:pone00:0032394
DOI: 10.1371/journal.pone.0032394