Title: | Bayesian Analysis of qRT-PCR Data |
---|---|
Description: | Quantitative RT-PCR data are analyzed using generalized linear mixed models based on lognormal-Poisson error distribution, fitted using MCMC. Control genes are not required but can be incorporated as Bayesian priors or, when template abundances correlate with conditions, as trackers of global effects (common to all genes). The package also implements a lognormal model for higher-abundance data and a "classic" model involving multi-gene normalization on a by-sample basis. Several plotting functions are included to extract and visualize results. The detailed tutorial is available here: <https://matzlab.weebly.com/uploads/7/6/2/2/76229469/mcmc.qpcr.tutorial.v1.2.4.pdf>. |
Authors: | Mikhail V. Matz |
Maintainer: | Mikhail V. Matz <[email protected]> |
License: | GPL-3 |
Version: | 1.2.4 |
Built: | 2025-03-30 02:54:37 UTC |
Source: | https://github.com/cran/MCMC.qpcr |
This package implements generalized linear mixed model analysis of qRT-PCR data so that the increase of variance towards higher Ct values is properly dealt with, and the lack of amplification is informative (function mcmc.qpcr). Sample-loading effects, gene-specific variances, and responses of all genes to each factor combination are all jointly estimated within a single model. The control genes can be specified as priors, with adjustable degree of expected stability. The analysis also works well without any control gene specifications.
For higher-abundance datasets, a lognormal model is implemented that does not require Cq to counts conversion (function mcmc.qpcr.lognormal).
For higher-abundance datasets datasets in which the quality and/or quantity of RNA samples varies systematically (rather than randomly) across conditions, the analysis based on multigene normalization is implemented (function mcmc.qpcr.classic).
The package includes several functions for plotting the results and calculating statistical significance (HPDplot, HPDplotBygene, HPDplotBygeneBygroup).
The detailed step-by-step tutorial is here: http://www.bio.utexas.edu/research/matz_lab/matzlab/Methods_files/mcmc.qpcr.tutorial.pdf.
Package: MCMC.qpcr |
Type: Package |
Version: 1.2.3 |
Date: 2016-11-07 |
License: GPL-3 |
Mikhail V. Matz, University of Texas at Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
data(beckham.data) data(beckham.eff) # analysing the first 5 genes # (to try it with all 10 genes, change the line below to gcol=4:13) gcol=4:8 ccol=1:3 # columns containing experimental conditions # recalculating into molecule counts, reformatting qs=cq2counts(data=beckham.data,genecols=gcol, condcols=ccol,effic=beckham.eff,Cq1=37) # creating a single factor, 'treatment.time', out of 'tr' and 'time' qs$treatment.time=as.factor(paste(qs$tr,qs$time,sep=".")) # fitting a naive model naive=mcmc.qpcr( fixed="treatment.time", data=qs, nitt=3000,burnin=2000 # remove this line in actual analysis! ) #summary plot of inferred abundances #s1=HPDsummary(model=naive,data=qs) #summary plot of fold-changes relative to the global control s0=HPDsummary(model=naive,data=qs,relative=TRUE) #correcting p-values for multiple comparisons s0.adj=padj.hpdsummary(s0,controls=c("gapdh")) # pairwise differences and their significances for each gene: s0.adj$geneWise
data(beckham.data) data(beckham.eff) # analysing the first 5 genes # (to try it with all 10 genes, change the line below to gcol=4:13) gcol=4:8 ccol=1:3 # columns containing experimental conditions # recalculating into molecule counts, reformatting qs=cq2counts(data=beckham.data,genecols=gcol, condcols=ccol,effic=beckham.eff,Cq1=37) # creating a single factor, 'treatment.time', out of 'tr' and 'time' qs$treatment.time=as.factor(paste(qs$tr,qs$time,sep=".")) # fitting a naive model naive=mcmc.qpcr( fixed="treatment.time", data=qs, nitt=3000,burnin=2000 # remove this line in actual analysis! ) #summary plot of inferred abundances #s1=HPDsummary(model=naive,data=qs) #summary plot of fold-changes relative to the global control s0=HPDsummary(model=naive,data=qs,relative=TRUE) #correcting p-values for multiple comparisons s0.adj=padj.hpdsummary(s0,controls=c("gapdh")) # pairwise differences and their significances for each gene: s0.adj$geneWise
Fold-amplification within each PCR cycle for each qRT-PCR assay. Determined by qPCR of serial dilutions of the template (see Pfaffl 2001, Nucleic Acids Res 29:00)
data(amp.eff)
data(amp.eff)
A data frame with 17 observations on the following 3 variables.
gene
a factor with levels actin
adk
c3
chrom
clect
eif3h
g3pdh
gsp2
hsp16
hsp60
hsp90
nd5
r18s
rpl11
spon2
tgoln
ubl3
efficiency
a numeric vector
Cq1
a numeric vector
data(amp.eff)
data(amp.eff)
A typical qRT-PCR dataset where a series of treatments are compared to the global control.
data(beckham.data)
data(beckham.data)
A data frame with 45 observations on the following 13 variables.
sample
cDNA sample (biological replicate): a factor with levels a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
tr
treatment: a factor with levels ctl
:control hs
:heat stress pshs
:pre-stress followed by heat stress
time
hours of heat stress exposure: a numeric vector
gapdh
a numeric vector
hsp110
a numeric vector
hspb
a numeric vector
egr
a numeric vector
gadd
a numeric vector
dnajb1
a numeric vector
dnajb4
a numeric vector
atf
a numeric vector
dnaja4
a numeric vector
fos
a numeric vector
Includes global control (ctl), heat stressed cells (hs), and cells pre-stressed by mild heating prior to heat stress (pshs). Two time points: one hour and three hours.
data for Figure 6 in Beckham et al. Microarray analysis of cellular thermotolerance. Lasers Surg Med. 2010 Dec;42(10):752-65. doi: 10.1002/lsm.20983
data(beckham.data)
data(beckham.data)
amplification efficiencies for beckham.data
data(beckham.eff)
data(beckham.eff)
A data frame with 11 observations on the following 2 variables.
gene
a factor with levels atf
dnaja4
dnajb1
dnajb4
egr
fos
gadd
gapdh
hsp110
hspb
r18s
eficiency
a numeric vector
Beckham et al. Microarray analysis of cellular thermotolerance. Lasers Surg Med. 2010 Dec;42(10):752-65. doi: 10.1002/lsm.20983
data(beckham.eff)
data(beckham.eff)
Timepoint one is one day of heat-light stress; timepoint two is recovery from it the next day. Controls are the fragments never exposed to stress. Individual denotes a coral colony that yielded several fragments for different treatments.
data(coral.stress)
data(coral.stress)
A data frame with 64 observations on the following 19 variables.
sample
a numeric vector
individual
a factor with levels s1
s12
s13
s15
s2
s4
s5
s8
condition
a factor with levels control
heat
timepoint
a factor with levels one
two
hsp16
a numeric vector
actin
a numeric vector
adk
a numeric vector
c3
a numeric vector
chrom
a numeric vector
clect
a numeric vector
eif3h
a numeric vector
g3pdh
a numeric vector
gsp2
a numeric vector
hsp60
a numeric vector
hsp90
a numeric vector
nd5
a numeric vector
rpl11
a numeric vector
spon2
a numeric vector
ubl3
a numeric vector
data(coral.stress)
data(coral.stress)
Converts Cq values into molecule counts, and stacks the dataset
cq2counts(data, genecols, condcols, effic, Cq1 = NULL)
cq2counts(data, genecols, condcols, effic, Cq1 = NULL)
data |
Raw qRT-PCR dataset, one Cq column per gene, plus columns containing factors. The Cq columns, in addition to the proper Cq values, may contain NA (missing data) and -1, which means no amplification observed (i.e., zero target molecules at the start of qPCR reaction). Column headers are either gene names or factor names. Any number of fixed factors is allowed; any number of random factors that are gene-specific scalars (such as effect of genotype, or block) Must have a column called "sample", denoting individual cDNA preps. Technical replicates should not be averaged, they should be represented as independent rows with the same sample ID. |
genecols |
columns that contain Cq data |
condcols |
columns corresponding to factors, including "sample" factor |
effic |
The PCR efficiency data for each of the analyzed genes. This is data frame with two columns: gene name (must exactly match the headers of gene columns in Cq data table!) and efficiency (fold- amplification per PCR cycle, determined from qPCR of serial dilutions; see PrimEff() function ) |
Cq1 |
The Cq of a single molecule. If left unspecified, it will be calculated from the efficiency (E) using approximate formula Cq1=51.6-7.56*E, derived empirically for Roche's LightCycler 480. Cq1 does not seem to have much effect on relative quantification results unless it is wildly off (by 2-3 cycles). For an unknown qPCR instrument a single Cq1=37 could be assumed for all genes. Note: If all experimental Cq values are less than 30, Cq1 variation (within a reasonable range, 35-39) will not have any effect on the results whatsoever, so just go for Cq1=37. |
Returns a dataframe with a single response variable column ("count"), gene column ("gene") and several columns containing factors.
Note: The purpose of Cq to counts conversion is to enable generalized linear model analysis, which would take care of the heteroscedasticity and occasional 'empty' amplification trials for low-abundant targets. Although this works well, the absolute values of molecule counts returned by cq2counts are still approximate, so they should not be relied upon for true single-molecule analysis.
Mikhail V. Matz, UT Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
data(coral.stress) data(amp.eff) genecolumns=c(5:19) # specifying where the Ct data are in the data table conditions=c(1:4) # specifying data table columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=coral.stress,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37)
data(coral.stress) data(amp.eff) genecolumns=c(5:19) # specifying where the Ct data are in the data table conditions=c(1:4) # specifying data table columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=coral.stress,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37)
Does similar procedures as cq2log, but in the end converts the data into relative expression values, leaves it unstacked, and removes factor columns.
cq2genorm(data, genes, effic, noamp=38)
cq2genorm(data, genes, effic, noamp=38)
data |
raw qRT-PCR data; see help for cq2counts for details on formatting |
genes |
Vector of names of the potential control genes. Make sure you select only the potential control genes for this analysis, otherwise geNorm might prefer actual responsive genes if they co-vary and have very low variance. |
effic |
PCR efficiency data for each of the analyzed genes; see help for cq2counts for details on formatting. |
noamp |
what to do about Ct values '-1', denoting no amplification. By default, these will be assigned an arbitrarily low expression value corresponding to slightly less than one molecule. Specify 'noamp=NA' if you want to skip samples containing any of these. |
See cq2counts help page for details.
A dataset to be fed into the function selectHKgenes (package SLqPCR).
Mikhail V. Matz, UT Austin
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Log-transforms and efficiency-corrects Cq values, converting them to natural logs fo relative abundances, and stacks the dataset
cq2log(data, genecols, condcols, effic, noamp = 38, stacked=TRUE)
cq2log(data, genecols, condcols, effic, noamp = 38, stacked=TRUE)
data |
Raw qRT-PCR dataset, one Cq column per gene, plus columns containing factors. The Cq columns, in addition to the proper Cq values, may contain NA (missing data) and -1, which means no amplification observed (i.e., zero target molecules at the start of qPCR reaction). Column headers are either gene names or factor names. Any number of fixed factors is allowed; any number of random factors that are gene-specific scalars (such as effect of genotype, or block) Must have a column called "sample", denoting individual cDNA preps. Technical replicates should not be averaged, they should be represented as independent rows with the same sample ID. |
genecols |
columns containing Cq data |
condcols |
columns corresponding to factors, including "sample" factor |
effic |
The PCR efficiency data for each of the analyzed genes. This is data frame with two columns: gene name (must exactly match the headers of gene columns in Cq data table!) and efficiency (fold- amplification per PCR cycle, determined from qPCR of serial dilutions; see PrimEff() function ) |
noamp |
Value to replace instances of no amplification with. These instances would be coded by -1 in the data table. Specify 'noamp=NA' if you want to disregard them, but by default they will be converted into an arbitrarily low value, 38 |
stacked |
Logical: whether to return stacked data for mcmc.qpcr modeling, or the originally-formatted table. |
The models that process cq2log output are expected to work well for datasets that don't have too many Cq values above 30 and don't have instances of no amplification. For examples, see cq2counts function.
Returns a dataframe with a single response variable column ("count", even though it is actually a log-transformed relative abundance value), gene column ("gene") and several columns containing factors.
Mikhail V. Matz, UT Austin
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Predicted vs observed plot tests for linearity, Scale-location plot tests for homoscedasticity, and Normal QQ plot tests for normality of the residuals.
diagnostic.mcmc(model, ...)
diagnostic.mcmc(model, ...)
model |
MCMCglmm object (a model fitted by mcmc.qpcr or mcmc.qpcr.gauss), obtained with additional options, 'pl=T, pr=T' |
... |
Various plot() options to modify color, shape and size of the plotteed points. |
A plot with three panels.
Mikhail V. Matz, UT Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), pr=TRUE,pl=TRUE, # these flags are necessary for diagnostics nitt=4000 # remove this line when analyzing real data! ) diagnostic.mcmc(mm)
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), pr=TRUE,pl=TRUE, # these flags are necessary for diagnostics nitt=4000 # remove this line when analyzing real data! ) diagnostic.mcmc(mm)
Cq data for a series of four-fold dilutions for two targets; the input for PrimEff()
data(dilutions)
data(dilutions)
A data frame with 76 observations on the following 3 variables.
dna
a numeric vector
cq
a numeric vector
gene
a factor with levels chrom
eif3h
data(dilutions) PrimEff(dilutions)
data(dilutions) PrimEff(dilutions)
Generates a table of model-derived log2-transformed transcript abundances without global sample effects (i.e., corresponding to efficiency-corrected and normalized qPCR data)
getNormalizedData(model, data, controls=NULL)
getNormalizedData(model, data, controls=NULL)
model |
qPCR model: the output of mcmc.qpcr or mcmc.qpcr.lognormal function fitted with two additional options: random="sample", pr=TRUE . These options do not change the inferences of main effects but make it possible to retain among-sample variation of expression for each gene while still subtracting the global sample effects (i.e., perform "normalization") |
data |
The dataset that was analysed to generate the model (output of cq2counts or cq2log functions) |
controls |
List of control genes; required if the mcmc.qpcr model was fit with the option normalize=TRUE |
The function returns a list of two data frames. The first one, normData, is the model-predicted log2-transformed transcript abundances table. It has one column per gene and one row per sample. The second data frame, conditions, is a table of experimental conditions corresponding to the normData table.
Mikhail V. Matz, University of Texas at Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
library(MCMC.qpcr) # loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=coral.stress,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model (must include random="sample", pr=TRUE options) mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=4000, pr=TRUE, random="sample" ) # extracting model predictions pp=getNormalizedData(mm,dd) # here is the normalized data: pp$normData # and here are the corresponding conditions: pp$conditions # putting them together for plotting: ppcombo=cbind(stack(pp$normData),rep(pp$conditions)) names(ppcombo)[1:2]=c("expression","gene") # plotting boxplots of normalized data: ggplot(ppcombo,aes(condition,expression,colour=timepoint))+ geom_boxplot()+ facet_wrap(~gene,scales="free")+ theme_bw()
library(MCMC.qpcr) # loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=coral.stress,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model (must include random="sample", pr=TRUE options) mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=4000, pr=TRUE, random="sample" ) # extracting model predictions pp=getNormalizedData(mm,dd) # here is the normalized data: pp$normData # and here are the corresponding conditions: pp$conditions # putting them together for plotting: ppcombo=cbind(stack(pp$normData),rep(pp$conditions)) names(ppcombo)[1:2]=c("expression","gene") # plotting boxplots of normalized data: ggplot(ppcombo,aes(condition,expression,colour=timepoint))+ geom_boxplot()+ facet_wrap(~gene,scales="free")+ theme_bw()
Calculates and plots posterior means with 95% credible intervals for specified fixed effects (or their combination) for all genes
HPDplot(model, factors, factors2 = NULL, ylimits = NULL, hpdtype = "w", inverse = FALSE, jitter = 0, plot = TRUE, grid = TRUE, zero = TRUE, ...)
HPDplot(model, factors, factors2 = NULL, ylimits = NULL, hpdtype = "w", inverse = FALSE, jitter = 0, plot = TRUE, grid = TRUE, zero = TRUE, ...)
model |
The output of mcmc.qpcr function. |
factors |
A vector of names of fixed effects of interest; see details. |
factors2 |
A second vector of fixed effect names to be subtracted from the first; see details. |
ylimits |
Y-limits for the plot such as c(-3,6); autoscale by default. |
hpdtype |
Specify hpdtype="l" to plot the upper and lower 95% credible limits as a continuous dashed line across all genes. This is useful to compare credible intervals among several models on the same plot. By default (hpdtype="w") the limits are plotted as whiskers around each point. |
inverse |
Plot the inverse of the result. |
jitter |
For hpdtype="w", shifts the plotted values and whiskers by the specified distance along the x axis (reasonable jitter values are 0.15 or -0.15, for example). This helps plot several results (different models or factor combinations) on the same plot (use HPDpoints to add to existing plot) |
plot |
if plot = FALSE the function returns a table of calculated posterior modes, means, upper and lower 95% credible limits (all on log(2) scale), and two types of p-values: derived from Bayesian z-scores, and derived directly from MCMC sample. All such outputs for a given experiment should be concatenated with rbind and processed by padj.qpcr() function to adjust the p-values for multiple comparisons (disregarding the entries corresponding to control genes) |
grid |
Whether to draw vertical grid lines to separate genes. |
zero |
Whether to draw a horizontal line at 0. |
... |
Various plot() options; such as col (color of lines and symbols), pch (type of symbol), main (plot title) etc. |
Use summary(MCMCglmm object) first to see what fixed effect names are actually used in the output. For example, if summary shows:
gene1:conditionheat
gene2:conditionheat
....
gene1:timepointtwo
gene2:timepointtwo
....
gene1:conditionheat:timepointtwo
gene2:conditionheat:timepointtwo
, it is possible to specify factors="conditionheat" to plot only the effects of the heat.
If a vector of several fixed effect names is given, for example: factors=c("timepointtwo","treatmentheat:timepointtwo") the function will plot the posterior mean and credible interval for the sum of these effects.
If a second vector is also given, for example,
factors=c("f1","f2"), factors2=c("f3","f4")
the function will plot the difference between the sums of these two groups of factors.
This is useful for pairwise analysis of differences in complicated designs.
A plot or a table (plot = F).
Use the function HPDpoints() if you need to add graphs to already existing plot.
Mikhail V. Matz, UT Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=3000,burnin=2000 # remove this line when analyzing real data! ) # plotting log2(fold change) in response to heat stress for all genes HPDplot(model=mm,factors="conditionheat",main="response to heat stress")
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=3000,burnin=2000 # remove this line when analyzing real data! ) # plotting log2(fold change) in response to heat stress for all genes HPDplot(model=mm,factors="conditionheat",main="response to heat stress")
For a particular gene, plots model-predicted values (and credible intervals) for a series of specified fixed effect combinations ("conditions").
HPDplotBygene(model, gene, conditions, pval = "mcmc", newplot = T, ylimits = NULL, inverse = F, jitter = 0, plot = T, yscale = "log2", interval = "ci", grid = F, zero = F, ...)
HPDplotBygene(model, gene, conditions, pval = "mcmc", newplot = T, ylimits = NULL, inverse = F, jitter = 0, plot = T, yscale = "log2", interval = "ci", grid = F, zero = F, ...)
model |
model object produced by MCMC.qpcr() |
gene |
name of the gene to plot |
conditions |
A list naming the conditions to plot and defining them as combination of fixed effects (See example below). '0' denotes gene-specific intercept. |
pval |
Type of p-value to report: 'mcmc' - MCMC-based (default), 'z' - based on Bayesian z-score. Use 'z' to approximate p-values lower than 2/[MCMC sample size] (with default MCMC.qpcr settings, this comes to 0.002) |
newplot |
When TRUE, a new plot should be created. When FALSE, or a graph will be added to an existing plot. |
ylimits |
Y-limits for the plot such as c(-3,6); autoscale by default. |
inverse |
When TRUE, the inverse of the data will be plotted. |
jitter |
Shifts the plotted values and whiskers by the specified distance along the x axis (reasonable jitter values are 0.15 or -0.15, for example). This helps plot several graphs on the same plot without overlapping. |
plot |
When FALSE, no plot will be generated; the function will just list mean pairwise differences and p-values. |
yscale |
Scale on which to represent the data. In all mcmc.qpcr models the model scale is natural logarithm, which I prefer to translate into log2 or log10 (if the differences are orders of magnitude) for better human readability. The default is 'log2'; other options are 'log10' and 'native' (no rescaling of the model data). There is also a beta-option 'proportion', which is not useful for qPCR. It was added to cannibalize HPDplotBygene function for plotting results of the model operating with arcsin-square root transfromed proportions. With yscale="proportions", the plot will be on the original proportion scale but the tukey-like differences will still be reported on the asin(sqrt()) transformed scale. |
interval |
'ci' (default) will plot 95% credible limits of the posterior distribution, 'sd' will plot the mean plus/minus one standard deviation of the posterior. |
grid |
When TRUE, a vertical grid separating conditions will be added |
zero |
When TRUE, a y=0 line will be added. |
... |
Various plot() options. |
Generates a point-whiskers plot, lists pairwise mean differenes between all conditions, calculates and lists pairwise p-values (not corrected for multiple testing).
Mikhal V. Matz, UT Austin, [email protected]
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=3000,burnin=2000 # remove this line when analyzing real data! ) # plotting abundances of individual genes across all conditions # step 1: defining conditions cds=list( control=list(factors=0), # gene-specific intercept stress=list(factors=c(0,"conditionheat")) # multiple effects will be summed up ) # step 2: plotting gene after gene on the same panel HPDplotBygene(model=mm,gene="actin",conditions=cds,col="cyan3", pch=17,jitter=-0.1,ylim=c(-3.5,15),pval="z") HPDplotBygene(model=mm,gene="hsp16",conditions=cds, newplot=FALSE,col="coral",pch=19,jitter=0.1,pval="z") # step 3: adding legend legend(0.5,10,"actin",lty=1,col="cyan3",pch=17,bty="n") legend(0.5,7,"hsp16",lty=1,col="coral",pch=19,bty="n")
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=3000,burnin=2000 # remove this line when analyzing real data! ) # plotting abundances of individual genes across all conditions # step 1: defining conditions cds=list( control=list(factors=0), # gene-specific intercept stress=list(factors=c(0,"conditionheat")) # multiple effects will be summed up ) # step 2: plotting gene after gene on the same panel HPDplotBygene(model=mm,gene="actin",conditions=cds,col="cyan3", pch=17,jitter=-0.1,ylim=c(-3.5,15),pval="z") HPDplotBygene(model=mm,gene="hsp16",conditions=cds, newplot=FALSE,col="coral",pch=19,jitter=0.1,pval="z") # step 3: adding legend legend(0.5,10,"actin",lty=1,col="cyan3",pch=17,bty="n") legend(0.5,7,"hsp16",lty=1,col="coral",pch=19,bty="n")
For a specified gene, makes overlayed plots such as produced by HPDplotBygene()
HPDplotBygeneBygroup(model, gene, group1, group2, group3 = NULL, interval = "ci", colors = c("coral", "cyan3", "grey50"), symbols = c(19, 17, 15), jitter = 0.16, yscale = "log2", ...)
HPDplotBygeneBygroup(model, gene, group1, group2, group3 = NULL, interval = "ci", colors = c("coral", "cyan3", "grey50"), symbols = c(19, 17, 15), jitter = 0.16, yscale = "log2", ...)
model |
model object produced by mcmc.qpcr() |
gene |
name of the gene to plot |
group1 |
Combination of factors defining the first group (see HPDplotBygene() for details). |
group2 |
Combination of factors defining the second group. |
group3 |
(optional) Combination of factors defining the third group. |
interval |
'ci' (default) will plot 95% credible limits of the posterior distribution, 'sd' will plot the mean plus/minus one standard deviation of the posterior. |
colors |
Colors to use for different groups (see ?par -> col). |
symbols |
Symbols to use for different groups (see ?par -> pch). |
jitter |
Jitter distance between groups. |
yscale |
Scale on which to represent the data. In all mcmc.qpcr models the model scale is natural logarithm, which I prefer to translate into log2 or log10 (if the differences are orders of magnitude) for better human readability. The default is 'log2'; other options are 'log10' and 'native' (no rescaling of the model data). There is also a beta-option 'proportion', which is not useful for qPCR. It was added to cannibalize HPDplotBygene function for plotting results of the model operating with arcsin-square root transfromed proportions. With yscale="proportions", the plot will be on the original proportion scale but the tukey-like differences will still be reported on the asin(sqrt()) transformed scale. |
... |
additional parameters for HPDplotBygene() function, such as pval (see HPDplotBygene() help) |
Generates a point-whiskers plot, lists pairwise mean differenes between all conditions, calculates and lists pairwise p-values (not corrected for multiple testing).
Mikhal V. Matz, UT Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Calculates and plots posterior means with 95% credible intervals for specified fixed effects (or their combination) for all genes. HPDpoints only adds graphs to an existing plot.
HPDpoints(model, factors, factors2 = NULL, ylimits = NULL, hpdtype = "w", inverse = F, jitter = 0, ...)
HPDpoints(model, factors, factors2 = NULL, ylimits = NULL, hpdtype = "w", inverse = F, jitter = 0, ...)
model |
The output of mcmc.qpcr function. |
factors |
A vector of names of fixed effects of interest; see details. |
factors2 |
A second vector of fixed effect names to be subtracted from the first; see details. |
ylimits |
Y-limits for the plot such as c(-3,6); autoscale by default. |
hpdtype |
Specify hpdtype="l" to plot the upper and lower 95% credible limits as a continuous dashed line across all genes. By default (hpdtype="w") the limits are plotted as whiskers around each point. |
inverse |
Plot the inverse of the result. |
jitter |
For hpdtype="w", shifts the plotted values and whiskers by the specified distance along the x axis (reasonable jitter values are 0.15 or -0.15, for example). |
... |
Various plot() options; such as col (color of lines and symbols), pch (type of symbol), main (plot title) etc. |
See details in HPDplot()
A graph added to a plot.
Mikhail V. Matz, UT Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=4000 # remove this line when analyzing real data! ) # plotting log2(fold change) in response to heat stress for all genes HPDplot(model=mm,factors="conditionheat",main="response to heat stress")
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=4000 # remove this line when analyzing real data! ) # plotting log2(fold change) in response to heat stress for all genes HPDplot(model=mm,factors="conditionheat",main="response to heat stress")
Calculates abundances of each gene across factor combinations; calculates pairwise differences between all factor combinations and their significances for each gene; plots results as bar or line graphs with credible intervals (ggplot2) NOTE: only works for experiments involving a single multi-level fixed factor or two fully crossed multi-level fixed factors.
HPDsummary(model, data, xgroup=NULL,genes = NA, relative = FALSE, log.base = 2, summ.plot = TRUE, ptype="z", ...)
HPDsummary(model, data, xgroup=NULL,genes = NA, relative = FALSE, log.base = 2, summ.plot = TRUE, ptype="z", ...)
model |
Model generated by mcmc.qpcr(),mcmc.qpcr.lognormal() or mcmc.qpcr.classic() |
data |
Dataset used to build the model (returned by cq2counts() or cq2log()) |
xgroup |
The factor to form the x-axis of the plot. By default the first factor in the model will be used. |
genes |
A vector of gene names to summarize and plot. If left unspecified, all genes will be summarized. |
relative |
Whether to plot absolute transcript abundances (relative = FALSE) or fold- changes relative to the sample that is considered to be "global control" (relative = TRUE). The "global control" is the combination of factors that served as a reference during model fitting, either because it is alphanumerically first (that happens by default) or because it has been explicitly designated as such using relevel() function (see tutorial). |
log.base |
Base of the logarithm to use. |
summ.plot |
By default, the function generates a summary plot, which is a line-points-95% credible intervals plot of log(absolute abundances) with 'relative=FALSE' and a more typical bar graph of log(fold change relative to the control), again with 95% credible intervals, with 'relative=TRUE'. Specify 'summ.plot=FALSE' if you don't want the summary plot. |
ptype |
Which type of p-values to use. By default p-values based on the Bayesian z-score are used. Specify 'ptype="mcmc"' to output more conventional p-values based on MCMC sampling (these will be limited on the lower end by the size of MCMC sample). |
... |
Additional options for summaryPlot() function. Among those, 'x.order' can be a vector specifying the order of factor levels on the x-axis. |
A list of three items:
summary |
Summary table containing calculated abundances, their SD and 95% credible limits |
geneWise |
A series of matrices listing pairwise differences between factor combinations (upper triangle) and corresponding p-values (lower triangle) |
ggPlot |
the ggplot2 object for plotting. See http://docs.ggplot2.org/0.9.2.1/theme.html for ways to modify it, such as add text, rotate labels, change fonts, etc. |
Mikhail V. Matz, University of Texas at Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
See function summaryPlot() for plotting the summary table in other ways.
data(beckham.data) data(beckham.eff) # analysing the first 5 genes # (to try it with all 10 genes, change the line below to gcol=4:13) gcol=4:8 ccol=1:3 # columns containing experimental conditions # recalculating into molecule counts, reformatting qs=cq2counts(data=beckham.data,genecols=gcol, condcols=ccol,effic=beckham.eff,Cq1=37) # creating a single factor, 'treatment.time', out of 'tr' and 'time' qs$treatment.time=as.factor(paste(qs$tr,qs$time,sep=".")) # fitting a naive model naive=mcmc.qpcr( fixed="treatment.time", data=qs, nitt=3000,burnin=2000 # remove this line in actual analysis! ) #summary plot of inferred abundances # s1=HPDsummary(model=naive,data=qs) #summary plot of fold-changes relative to the global control s0=HPDsummary(model=naive,data=qs,relative=TRUE) # pairwise differences and their significances for each gene: s0$geneWise
data(beckham.data) data(beckham.eff) # analysing the first 5 genes # (to try it with all 10 genes, change the line below to gcol=4:13) gcol=4:8 ccol=1:3 # columns containing experimental conditions # recalculating into molecule counts, reformatting qs=cq2counts(data=beckham.data,genecols=gcol, condcols=ccol,effic=beckham.eff,Cq1=37) # creating a single factor, 'treatment.time', out of 'tr' and 'time' qs$treatment.time=as.factor(paste(qs$tr,qs$time,sep=".")) # fitting a naive model naive=mcmc.qpcr( fixed="treatment.time", data=qs, nitt=3000,burnin=2000 # remove this line in actual analysis! ) #summary plot of inferred abundances # s1=HPDsummary(model=naive,data=qs) #summary plot of fold-changes relative to the global control s0=HPDsummary(model=naive,data=qs,relative=TRUE) # pairwise differences and their significances for each gene: s0$geneWise
A wrapper function for plot.MCMCglmm to plot diagnostic convergence plots for selected fixed effects
mcmc.converge.check(model, factors, ...)
mcmc.converge.check(model, factors, ...)
model |
output of mcmc.qpcr (or any MCMCglmm class object) |
factors |
A vector of names of fixed effects of interest; see details in HPDplot help page. |
... |
other options to pass to plot.MCMCglmm |
A series of plots for each gene-specific fixed effect.
The MCMC trace plot is on the left, to see if there is convergence (lack of systematic trend) and no autocorrelation (no low-frequency waves). If lack of convergence is suspected, try increasing number of iterations and burnin by specifying, for example, nitt=50000, burnin=5000, as additional options for mcmc.qpcr. If autocorrelation is present, increase thinning interval by specifying thin=20 in mcmc.qpcr (you might wish to increase the number of iterations, nitt, to keep the size of MCMC sample the same)
The right plot is posterior density distribution.
Mikhail V. Matz, UT Austin
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Accessory function to HPD... function series
mcmc.pval(dat, testlim = 0, sided = 2, ptype="z")
mcmc.pval(dat, testlim = 0, sided = 2, ptype="z")
dat |
a table of MCMC samples |
testlim |
positive value to test whether the posterior crosses it (for variance components) |
sided |
sidedness of the test |
ptype |
'z' for z-score based p-values, 'mcmc' for standard sampling based p-values |
A vector of calculated Bayesian p-values. For standard 'mcmc' p=values, the minimal possible value is 1/(MCMC sample size). z-score based p-values are useful to approximate very low p-values with limited MCMC sample sizes.
Mikhail Matz, UT Austin <[email protected]>
Wrapper function for MCMCglmm by Jarrod Hadfield, designed for qRT-PCR data analysis.
mcmc.qpcr(fixed=NULL, globalFixed=NULL, random = NULL, globalRandom=NULL, data, controls = NULL, normalize=FALSE, include = NULL, m.fix = 1.2, v.fix = NULL, geneSpecRes=TRUE, Covar=FALSE, vprior="uninf", ...)
mcmc.qpcr(fixed=NULL, globalFixed=NULL, random = NULL, globalRandom=NULL, data, controls = NULL, normalize=FALSE, include = NULL, m.fix = 1.2, v.fix = NULL, geneSpecRes=TRUE, Covar=FALSE, vprior="uninf", ...)
fixed |
desired combination of fixed effects, as a text string. Do not use "*" symbol, list it fully, such as: 'factor1+factor2+factor1:factor2'. |
globalFixed |
Vector of fixed covariates (categorical or continuous) that are expected to affect all genes in the sample in the same way. These would be typically related to quality and/or quantity of RNA, such as RIN value. |
random |
A vector of names for gene-specific scalar random effects, such as 'c("effect1","effect2")'. |
globalRandom |
Random covariates (categorical only) affecting all genes, similar to globalFixed. |
data |
output of the cq2counts() function |
controls |
Vector of control gene names. These will be pushed to the back of the gene list during model fitting, in the reverse order. Controls are NOT NECESSARY for the analysis. |
normalize |
If controls are specified, requiring 'normalize=TRUE' will perform "soft normalization": the geometric mean of control genes will be used to infer global effects (common to all genes) across conditions; these will be subtracted when summarizing the data with HPDsummary(). Use this option when template abundances are correlated with conditions. Note that this is different from normalizing data within each sample, as per Vandesompele et al 2002; this would be "hard normalization" (use mcmc.qpcr.classic() if you want it). |
include |
How many of the control genes ('controls') to actually incorporate as priors during model fitting. If left unspecified, all the 'controls' will be used. If 'include=0', the model will be fitted without using any of the control genes as priors. If 'include' equals some number less than the number of 'controls', only the first 'include' of them will be used as priors. In all these cases, the 'controls' will appear in the same order in the output, in the end of the gene list rather than according to their alphabetical position among other genes. This is useful when visually comparing the results of models fitted with different number of control genes, using HPDplot and HPDpoints functions. |
m.fix |
Allowed average fold-change of the control genes in response to any fixed factor combination. |
v.fix |
Allowed residual fold-change deviation of the control genes. Applies to the residual variation term. |
geneSpecRes |
Whether the model should include gene-specific residuals. Keep it TRUE unless the model fails to converge. |
Covar |
Whether the random effects should be fitted with covariances. This option might require setting vprior="iw" or vprior="iw01" (see below) |
vprior |
The default prior is an uninformative inverse Wishart with assumed variance (V) at 1 and the degree of belief parameter (nu) at 0. With 'prior="iw"' and 'prior="iw01"' nu is equal [number of genes]-0.998, resulting in a weakly informative prior that is commonly used in this type of inference. vprior="iw" will assume large prior variance (1), vprior="iw01" will assume small prior variance (0.1). |
... |
other options for MCMCglmm function, such as nitt (number of iterations), thin (tinning interval), and burnin (number of initial iterations to disregard). For a more precise inference specify 'nitt=45000, thin=20, burnin=5000'. See MCMCglmm documentation for more details. |
This function constructs priors and runs an MCMC chain to fit a Poisson-lognormal generalized linear mixed model to the counts data.
The fixed effects for the model by default assume a gene-specific intercept, and gene-specific effect for each of the listed fixed factors.
The user-specified random effects are all assumed to be gene-specific with no covariances.
There is also a universal random factor: the scalar random effect of sample, which accounts for the unequal template loading.
Residual variances are assumed to be gene-specific with no covariances, with weakly informative inverse Wishart prior (variance=1, nu=[number of genes]-0.998).
The priors for fixed effects are diffuse gaussians with a mean at 0 and very large variances (1e+8), except for control genes, for which the prior variances are defined by the m.fix parameter. For the gene-specific random effects and residual variation, non-informative priors are used to achieve results equivalent to ML estimation. For control genes, when v.fix parameter is specified, it will be used to restrict residial variance.
Both m.fix and v.fix parameters should be specified as allowed average fold-change, so the lowest they can go is 1 (no variation).
All control genes share the same m.fix and v.fix parameters.
An MCMCglmm object. HPDsummary() function within this package summaizes these data, calculates all gene-wise credible intervals and p-values, and plots the results either as line-point-whiskers graph or a bar-whiskers graph using ggplot2 functions.
HPDsummary() only works for experiments with a single multilevel factor or two fully crossed multilevel factors. Use finctions HPDplot(), HPDplotBygene() and HPDplotBygeneBygroup() to summarize and plot more complicated designs.
For more useful operations on MCMCglmm objects, such as posterior.mode(), HPDinterval(), and plot(), see documentation for MCMCglmm package.
Mikhail V. Matz, University of Texas at Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
data(beckham.data) data(beckham.eff) # analysing the first 5 genes # (to try it with all 10 genes, change the line below to gcol=4:13) gcol=4:8 ccol=1:3 # columns containing experimental conditions # recalculating into molecule counts, reformatting qs=cq2counts(data=beckham.data,genecols=gcol, condcols=ccol,effic=beckham.eff,Cq1=37) # creating a single factor, 'treatment.time', out of 'tr' and 'time' qs$treatment.time=as.factor(paste(qs$tr,qs$time,sep=".")) # fitting a naive model naive=mcmc.qpcr( fixed="treatment.time", data=qs, nitt=3000,burnin=2000 # remove this line in actual analysis! ) #summary plot of inferred abundances #s1=HPDsummary(model=naive,data=qs) #summary plot of fold-changes relative to the global control s0=HPDsummary(model=naive,data=qs,relative=TRUE) # pairwise differences and their significances for each gene: s0$geneWise
data(beckham.data) data(beckham.eff) # analysing the first 5 genes # (to try it with all 10 genes, change the line below to gcol=4:13) gcol=4:8 ccol=1:3 # columns containing experimental conditions # recalculating into molecule counts, reformatting qs=cq2counts(data=beckham.data,genecols=gcol, condcols=ccol,effic=beckham.eff,Cq1=37) # creating a single factor, 'treatment.time', out of 'tr' and 'time' qs$treatment.time=as.factor(paste(qs$tr,qs$time,sep=".")) # fitting a naive model naive=mcmc.qpcr( fixed="treatment.time", data=qs, nitt=3000,burnin=2000 # remove this line in actual analysis! ) #summary plot of inferred abundances #s1=HPDsummary(model=naive,data=qs) #summary plot of fold-changes relative to the global control s0=HPDsummary(model=naive,data=qs,relative=TRUE) # pairwise differences and their significances for each gene: s0$geneWise
Normalizes the data using specified control genes, fits a single model to estimate changes at all genes. Use for datasets with not too many Cq values above 30.
mcmc.qpcr.classic(fixed = NULL, globalFixed = NULL, random = NULL, globalRandom = NULL, data, controls, genebysample = TRUE, geneSpecRes=FALSE, center = TRUE, ...)
mcmc.qpcr.classic(fixed = NULL, globalFixed = NULL, random = NULL, globalRandom = NULL, data, controls, genebysample = TRUE, geneSpecRes=FALSE, center = TRUE, ...)
fixed |
desired combination of fixed effects, as a text string. Do not use "*" symbol, list it fully, such as: 'factor1+factor2+factor1:factor2'. |
globalFixed |
Vector of fixed covariates (categorical or continuous) that are expected to affect all genes in the sample in the same way. These would be typically related to quality and/or quantity of RNA, such as RIN value. |
random |
A vector of names for gene-specific scalar random effects, such as 'c("effect1","effect2")'. |
globalRandom |
Random covariates (categorical only) affecting all genes, similar to globalFixed. |
data |
output of the cq2log() function |
controls |
Vector of control gene names. These will be pushed to the back of the gene list during model fitting, in the reverse order. |
genebysample |
Whether random gene by sample interactions should be modeled as an additional random effect. If the model fails to converge, specify 'genebysample=F'. |
geneSpecRes |
Whether the model should include gene-specific residuals. This was the default in MCMC.qpcr v.1.0; now it is switched off since it does not have any visible effect on the results (as long as genebysample=TRUE), and only makes the model converge slower or fail to converge. |
center |
Whether to center the normalized log-transformed Cq values arond the mean for each gene. Centering does not affect the inference; it only makes the plots of the results more comprehensible (in my opinion). |
... |
other options for MCMCglmm function, such as nitt (number of iterations), thin (tinning interval), and burnin (number of initial iterations to disregard). For a more precise inference (but longer runs) specify 'nitt=45000, thin=20, burnin=5000'. See MCMCglmm documentation for more details. |
This function takes an as input the log-transformed relative abundance values, performs multigene normalization as per Vandesompele et al 2002, and runs an MCMC chain to fit a lognormal linear mixed model to estimate gene expression changes jointly at all genes. It is very powerful, as long as the average stability of control genes can be trusted.
This function requires data prepared by cq2log function and must have the control genes specified; otherwise the arguments and syntax are similar to the mcmc.qpcr function.
An MCMCglmm object. See mcmc.qpcr function for details and examples.
Mikhail V. Matz, University of Texas at Austin
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Use in combination with cq2log(), on datasets without too many Cq values above 30.
mcmc.qpcr.lognormal(fixed=NULL, globalFixed=NULL, random = NULL, globalRandom=NULL, data, controls = NULL, normalize=FALSE, include = NULL, m.fix = 1.2, v.fix = NULL, geneSpecRes=TRUE, Covar=FALSE, vprior="uninf", ...)
mcmc.qpcr.lognormal(fixed=NULL, globalFixed=NULL, random = NULL, globalRandom=NULL, data, controls = NULL, normalize=FALSE, include = NULL, m.fix = 1.2, v.fix = NULL, geneSpecRes=TRUE, Covar=FALSE, vprior="uninf", ...)
fixed |
desired combination of fixed effects, as a text string. Do not use "*" symbol, list it fully, such as: 'factor1+factor2+factor1:factor2'. |
globalFixed |
Vector of fixed covariates (categorical or continuous) that are expected to affect all genes in the sample in the same way. These would be typically related to quality and/or quantity of RNA, such as RIN value. |
random |
A vector of names for gene-specific scalar random effects, such as 'c("effect1","effect2")'. |
globalRandom |
Random covariates (categorical only) affecting all genes, similar to globalFixed. |
data |
output of the cq2counts() function |
controls |
Vector of control gene names. These will be pushed to the back of the gene list during model fitting, in the reverse order. Controls are NOT NECESSARY for the analysis. |
normalize |
If controls are specified, requiring 'normalize=TRUE' will perform "soft normalization": the geometric mean of control genes will be used to infer global effects (common to all genes) across conditions; these will be subtracted when summarizing the data with HPDsummary(). Use this option when template abundances are correlated with conditions. Note that this is different from normalizing data within each sample, as per Vandesompele et al 2002; this would be "hard normalization" (use mcmc.qpcr.classic() if you want it). |
include |
How many of the control genes ('controls') to actually incorporate as priors during model fitting. If left unspecified, all the 'controls' will be used. If 'include=0', the model will be fitted without using any of the control genes as priors. If 'include' equals some number less than the number of 'controls', only the first 'include' of them will be used as priors. In all these cases, the 'controls' will appear in the same order in the output, in the end of the gene list rather than according to their alphabetical position among other genes. This is useful when visually comparing the results of models fitted with different number of control genes, using HPDplot and HPDpoints functions. |
m.fix |
Allowed average fold-change of the control genes in response to any fixed factor combination. |
v.fix |
Allowed residual fold-change deviation of the control genes. Applies to the residual variation term. |
geneSpecRes |
Whether the model should include gene-specific residuals. Keep it TRUE unless the model fails to converge. |
Covar |
Whether the random effects should be fitted with covariances. This option might require setting vprior="iw" or vprior="iw01" (see below) |
vprior |
The default prior is an uninformative inverse Wishart with assumed variance (V) at 1 and the degree of belief parameter (nu) at 0. With 'prior="iw"' and 'prior="iw01"' nu is equal [number of genes]-0.998, resulting in a weakly informative prior that is commonly used in this type of inference. vprior="iw" will assume large prior variance (1), vprior="iw01" will assume small prior variance (0.1). |
... |
other options for MCMCglmm function, such as nitt (number of iterations), thin (tinning interval), and burnin (number of initial iterations to disregard). For a more precise inference specify 'nitt=45000, thin=20, burnin=5000'. See MCMCglmm documentation for more details. |
This function constructs priors and runs an MCMC chain to fit a lognormal linear mixed model to the log-transformed relative abundances data. The data for this function must be prepared by cq2log instead of cq2counts function; otherwise, the arguments and syntax are identical to mcmc.qpcr function.
An MCMCglmm object. See mcmc.qpcr function for details and examples.
Mikhail V. Matz, University of Texas at Austin
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Performs multigene normalization as per Vandesompele et al 2002, centers the resulting gene expression values around the gene's means
normalize.qpcr(data, controls, center = T)
normalize.qpcr(data, controls, center = T)
data |
The output of cq2log() |
controls |
Vector of control genes, such as 'controls=c("eif3h","nd5","rpl11")' |
center |
Whether to perform centering around mean or not. |
normalized dataset
Mikhail Matz, UT Austin
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
reports samples that have too little starting material relative to others (by default, less by two standard deviations)
outlierSamples(model, data, z.cutoff = -2)
outlierSamples(model, data, z.cutoff = -2)
model |
qPCR model: the output of mcmc.qpcr or mcmc.qpcr.lognormal function fitted with pr=TRUE option |
data |
The dataset that was analysed to generate the model (output of cq2counts or cq2log functions) |
z.cutoff |
z-score cutoff to report an outlier sample. |
A vector containing outlier sample names.
Mikhail V. Matz, University of Texas at Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=4000, # remove this line when analyzing real data! pr=TRUE ) # detecting outliers outliers=outlierSamples(mm,dd) # removing outliers dd=dd[!(dd$sample %in% outliers),]
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) # extracting a subset of data cs.short=subset(coral.stress, timepoint=="one") genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=cs.short,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the model mm=mcmc.qpcr( fixed="condition", data=dd, controls=c("nd5","rpl11"), nitt=4000, # remove this line when analyzing real data! pr=TRUE ) # detecting outliers outliers=outlierSamples(mm,dd) # removing outliers dd=dd[!(dd$sample %in% outliers),]
Replaces raw p-values in an object returned by HPDsummary() by adjusted p-values corrected for multiple comparisons. Disregards the entries corresponding to control genes.
padj.hpdsummary(hpdsumm, controls = NULL, method = "BH")
padj.hpdsummary(hpdsumm, controls = NULL, method = "BH")
hpdsumm |
Output of HPDsummary() |
controls |
A vector of control gene names |
method |
p-value correction method (see function p.adjust), default is Benjamini-Hochberg |
HPDsummary object with original p-values replaced by the corrected ones.
Mikhail V. Matz, [email protected]
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Takes the output of HPDplot(...,plot=FALSE), adds two columns of adjusted p-values (z and mcmc) Disregards the entries corresponding to control genes
padj.qpcr(data, controls = NULL, method = "BH")
padj.qpcr(data, controls = NULL, method = "BH")
data |
Output of HPDplot(...,plot=FALSE); may be several several such outputs concatenated with rbind |
controls |
A vector of control gene names |
method |
p-value correction method (see function p.adjust), default is Benjamini-Hochberg |
The dataframe derived from the input, with added columns "padj.z" and "padj.mcmc". See tutorial for examples.
Mikhail V. Matz, [email protected]
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
Runs linear regression on Cq versus log2(RNA concentration), plots graph, reports slope (ideally should be -1), and efficiency (with 95 percent credible limits)
PrimEff(data, plot = TRUE)
PrimEff(data, plot = TRUE)
data |
a dataframe containing three columns. First is RNA concentration. This could be absolute as well as relative concentration (1/dilution factor). Second is the Cq value. Third is gene name. Replicate the same name across all the corresponding RNA concentrations. The dataframe may contain data for multiple genes. |
plot |
set plot=FALSE if the plot is not required |
Run with at least 8 2-fold dilutions per gene
Plots the regression and under it, the values of slope and efficiency (plus and minus one SD). The dataframe may contain data for multiple genes, which will all be plotted together (so the reasonable limit is something like 25 genes)
Also returns a dataframe with columns: gene, efficiency, plus one SD, minus one SD, and intercept.
Mikhail V. Matz, UT Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
data(dilutions) PrimEff(dilutions)
data(dilutions) PrimEff(dilutions)
adds a fake NORM gene with counts equal to geometric mean of conrtol genes, removes control genes from the dataset
softNorm(data,controls)
softNorm(data,controls)
data |
The dataset; output of cq2counts() |
controls |
Vector fo control gene names |
A modified dataset with NORM gene set as the reference, and no control genes
Mikhail Matz, UT Austin <[email protected]>
This function is called automatically by HPDsummary() and also can be used separately to plot the results produced by HPDsummary() with more plotting options
summaryPlot(data, xgroup, facet = NA, type = "bar", x.order = NA, whiskers = "ci", genes = NA, log.base=2)
summaryPlot(data, xgroup, facet = NA, type = "bar", x.order = NA, whiskers = "ci", genes = NA, log.base=2)
data |
A summary table generated by HPDplot(), it is the first element in the returned list. |
xgroup |
Which factor will be used to form the x axis (for 2-way designs). |
facet |
The factor by which the plot will be split into facets (for 2-way designs). |
type |
Two types are supported: "bar" and "line" ("line" also has points). "bar" is more useful to plot fold-changes returned when HPDsummary() is run with the option 'relative=TRUE'. "line" is better for plotting actual inferred transcript abundances across factor levels; it is particularly good for time courses and other cases when multiple factor levels must be compared to each other. "bar" is good to plot log(fold-changes) relative to global control. |
x.order |
A vector giving the order of factor levels on the x-axis. If unspecified, an alphanumeric order will be used. |
whiskers |
The interval indicated by the whiskers. Default is "ci", the 95% credible interval; another option is "sd" - standard deviation of the posterior. |
genes |
Vector of gene names to plot. By default, all genes in the summary will be plotted. |
log.base |
Base of the logarithm to indicate on y-axis (does not affect plotting). |
The function invokes ggplot() functon from the ggplot2 package to plot the results either as a single panel (one-way designs) or a multi-panel (2-way designs, one panel per level of the factor specified by 'facet' argument).
A ggplot object. See http://docs.ggplot2.org/0.9.2.1/theme.html for ways to modify it, such as add text, rotate labels, change fonts, etc.
Mikhail V. Matz, University of Texas at Austin <[email protected]>
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
For each gene, plots model-predicted values and 95% credible intervals.
trellisByGene(modelSummary,xFactor,groupFactor, nrow=2,lineWidth=0.4,whiskerWidth=0.2,pointSize=2.5, facetScales="free_y",ylab="log(abundance)", legendPos="bottom",posDodge=0.3)
trellisByGene(modelSummary,xFactor,groupFactor, nrow=2,lineWidth=0.4,whiskerWidth=0.2,pointSize=2.5, facetScales="free_y",ylab="log(abundance)", legendPos="bottom",posDodge=0.3)
modelSummary |
two-way design model summary produced by HPDsummary() |
xFactor |
factor to form the x-axis |
groupFactor |
factor to form separate lines on the plot |
nrow |
number of rows in the resulting trellis plot |
lineWidth |
line width, passed as 'lwd' to geom_errorbar function (ggplot2) |
whiskerWidth |
width of the line denoting 95% CI margin, passed as 'width' to geom_errorbar function (ggplot2) |
pointSize |
passed as 'size' to geom_point function of ggplot2 |
facetScales |
passed as 'scales' to facet_wrap function of ggplot2 |
ylab |
y-axis label |
legendPos |
passed as 'legend.position' to theme function of ggplot2 |
posDodge |
position dodge, increase for more jitter |
A ggplot2 type object
Mikhal V. Matz, UT Austin, [email protected]
Matz MV, Wright RM, Scott JG (2013) No Control Genes Required: Bayesian Analysis of qRT-PCR Data. PLoS ONE 8(8): e71448. doi:10.1371/journal.pone.0071448
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=coral.stress,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the 2-way model mm=mcmc.qpcr( fixed="condition+timepoint+condition:timepoint", data=dd, nitt=4000 # remark this line to analyze real data! ) # summarizing results ss=HPDsummary(mm,data=dd,summ.plot=FALSE) # plotting predicted means and 95% CIs gene by gene trellisByGene(ss,xFactor="condition",groupFactor="timepoint")
# loading Cq data and amplification efficiencies data(coral.stress) data(amp.eff) genecolumns=c(5,6,16,17) # specifying columns corresponding to genes of interest conditions=c(1:4) # specifying columns containing factors # calculating molecule counts and reformatting: dd=cq2counts(data=coral.stress,genecols=genecolumns, condcols=conditions,effic=amp.eff,Cq1=37) # fitting the 2-way model mm=mcmc.qpcr( fixed="condition+timepoint+condition:timepoint", data=dd, nitt=4000 # remark this line to analyze real data! ) # summarizing results ss=HPDsummary(mm,data=dd,summ.plot=FALSE) # plotting predicted means and 95% CIs gene by gene trellisByGene(ss,xFactor="condition",groupFactor="timepoint")