update chipflowR

3dca65fd · Rachel LEGENDRE · 790c34d5 · 3dca65fd
Commit 3dca65fd authored 3 years ago by Rachel LEGENDRE
--- a/workflow/scripts/Report_ChIPflowR.Rmd
+++ b/workflow/scripts/Report_ChIPflowR.Rmd
@@ -122,7 +122,6 @@ kable(apply(counts,2,fun_summary), caption = "Table 3: Summary of the raw counts
 Figure 1 shows the total number of mapped and counted reads for each sample. Total read counts are expected to be similar within conditions but may differ across conditions. Total counts sometimes vary widely between replicates.
 ```{r barplot, echo=FALSE,fig.align="center",fig.cap="Figure 1: Number of mapped reads per sample. Colors refer to the biological condition of the sample.", out.width="600px"}
-# Producing Barplot
 barplotTotal(counts = counts, conditions = Conditions)
 ```
@@ -132,8 +131,7 @@ A pairwise scatter plot is produced (figure 2) to show how replicates and sample
 - 1 for technical replicates (technical variability follows a Poisson distribution)
 - greater than 1 for biological replicates and samples from different biological conditions (biological variability is higher than technical variability, so the data are over-dispersed with respect to Poisson). The higher the SERE value, the lower the similarity. It is expected to be lower between biological replicates than between samples of different biological conditions. Hence, the SERE statistic can be used to detect inversions between samples.
-```{r pairewiseScatter, echo=FALSE,fig.align="center",fig.cap="Figure 2: Pairwise comparison of samples (not produced when more than 12 samples).", out.width="1200px"}
+```{r pairewiseScatter, echo=FALSE,fig.align="center",fig.cap="Figure 2: Pairwise comparison of samples (not produced when more than 12 samples).", out.width="1200px", fig.width=3*ncol(counts), fig.height=2*ncol(counts)}
-#PairwiseScatter
 pairwiseScatterPlots(counts = counts, outfile = FALSE)
 ```
@@ -144,14 +142,12 @@ The main variability within the experiment is expected to come from biological d
 Figure 3 shows the sample clustering based on normalized data. A Euclidean distance is computed between samples, and the dendrogram is built using the Ward criterion. We expect this dendrogram to group replicates and separate biological conditions.
 ```{r clusterplot, echo=FALSE,fig.align="center",fig.cap="Figure 3: Sample clustering based on normalized data.", out.width="600px", warning=FALSE, message=FALSE}
-# Cluster plot
 clusterPlot(counts.trans = counts.trans,conditions = Conditions, outfile = FALSE)
 ```
 Another way of visualizing the experiment variability is to look at the first principal components of the PCA, as shown in figure 4. In this figure, the first principal component (PC1) is expected to separate samples from the different biological conditions, meaning that the biological variability is the main source of variance in the data.
-```{r PCA, echo=FALSE,fig.align="center",fig.cap="Figure 4: First three components of a Principal Component Analysis, with percentages of variance associated with each axis.", out.width="1200px"}
+```{r PCA, echo=FALSE,fig.align="center",fig.cap="Figure 4: First three components of a Principal Component Analysis, with percentages of variance associated with each axis.", fig.height=4, out.width="1200px"}
-# PCA plot
 PCAPlot(counts.trans = counts.trans, conditions = Conditions, outfile = FALSE)
 ```
@@ -189,7 +185,7 @@ kable(sf, caption = "Table X: Normalization factors",format = "html") %>%
 Boxplots are often used as a qualitative measure of the quality of the normalization process, as they show how distributions are globally affected during this process. We expect normalization to stabilize distributions across samples. Figure 5 shows boxplots of raw (left) and normalized (right) data respectively.
-```{r boxplot, echo=FALSE,fig.align="center",fig.cap="Figure 5: Boxplots of raw (left) and normalized (right) read counts.", out.width="1200px", warning=FALSE}
+```{r boxplot, echo=FALSE,fig.align="center",fig.cap="Figure 5: Boxplots of raw (left) and normalized (right) read counts.", out.width="1200px", fig.height=4, warning=FALSE}
 countsBoxplots(results = resAnDif$results, conditions = Conditions, method = method)
 ```
@@ -197,7 +193,7 @@ countsBoxplots(results = resAnDif$results, conditions = Conditions, method = met
 ## 5.1. Dispersions estimation
-```{r dispersionPlot, echo=FALSE, fig.align="center", fig.cap="Figure 6: Dispersion estimation", out.width="1200px", results="asis"}
+```{r dispersionPlot, echo=FALSE, fig.align="center", fig.cap="Figure 6: Dispersion estimation", out.width="600px", results="asis"}
 if (method=="DESeq2") {
  cat("The DESeq2 model assumes that the count data follow a negative binomial distribution which is a robust alternative to the Poisson law when data are over-dispersed (the variance is higher than the mean). The first step of the statistical procedure is to estimate the dispersion of the data. Its purpose is to determine the shape of the mean-variance relationship. The default is to apply a GLM (Generalized Linear Model) based method (fitType='parametric'), which can handle complex designs but may not converge in some cases.\n")
@@ -208,7 +204,7 @@ cat("The figure 6 shows the result of the dispersion estimation step. The x- and
 }
 ```
-```{r meanvar, echo=FALSE, results="asis", fig.align="center", fig.cap="Figure 6: Mean-variance trend", out.width="1200px"}
+```{r meanvar, echo=FALSE, results="asis", fig.align="center", fig.cap="Figure 6: Mean-variance trend", out.width="600px"}
 if (method=="Limma") {
  cat("For the differential marking/binding analysis we use the limma approach to RNA-seq [@ritchie2015]. Read counts are converted to log2-counts-per-million (logCPM) and the mean-variance relationship is modelled either with precision weights or with an empirical Bayes prior trend. Here we use the precision weights approach called “voom” [@law2014]. This transformation allows the linear modelling in the limma package to be applied to sequencing data. The systematic variability of the data is modeled with a linear approach to differentiate it from the random variability. This linear modeling is very similar to classical ANOVA or multiple regression, except that a model is fitted to each peak.")
@@ -225,7 +221,6 @@ cat("The figure 6 shows the result of the variance estimation step. The x- and y
 Figure 7 shows the distributions of raw p-values computed by the statistical test for the comparison(s) done. This distribution is expected to be a mixture of a uniform distribution on $[0,1]$ and a peak around 0 corresponding to the differentially expressed features.
 ```{r, echo=FALSE,fig.align="center",fig.cap="Figure 7: Distribution(s) of raw p-values", out.width="600px"}
-# Pvalue Hist
 rawpHist(result = resAnDif$results, outfile = FALSE)
 ```
@@ -251,7 +246,6 @@ kable(df, caption = "Table 4: Normalization factors",format = "html") %>%
 Figure 8 represents the MA-plot of the data for the comparisons done, where differentially expressed features are highlighted in red. An MA-plot represents the log ratio of differential expression as a function of the mean intensity for each feature. Triangles correspond to features whose $\log_2(\text{FC})$ is too low or too high to be displayed on the plot.
 ```{r MAplot, echo=FALSE,fig.align="center",fig.cap="Figure 8: MA-plot(s) of each comparison. Red dots represent significantly differentially expressed features.", out.width="600px"}
-# Producing MAplots
 MAPlot(results = resAnDif$results, method = method, alpha = alpha, outfile = FALSE)
 ```
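
Note on the SERE statistic described in the pairwise scatter plot hunk: the behaviour stated in the report text (about 1 for Poisson-like technical replicates, above 1 for over-dispersed data) can be checked with a small sketch. This is not the code behind `pairwiseScatterPlots()`. The helper name `sere` and the simulated counts are hypothetical, and the formula is the usual Pearson-type ratio as I understand it, assuming every sample has a non-zero total count.

```r
# Hypothetical helper, not taken from this commit: SERE for two or more samples.
# Rows are features (peaks), columns are samples.
sere <- function(observed) {
  observed <- as.matrix(observed)
  # Features with no reads in any sample carry no information; drop them
  observed <- observed[rowSums(observed) > 0, , drop = FALSE]
  lib_sizes <- colSums(observed)
  # Expected counts if samples differ only by sequencing depth
  expected <- outer(rowSums(observed) / sum(lib_sizes), lib_sizes)
  # Pearson-type statistic; one degree of freedom is used per feature
  chi2 <- sum((observed - expected)^2 / expected)
  df <- nrow(observed) * (ncol(observed) - 1)
  sqrt(chi2 / df)
}

set.seed(1)
n <- 5000
# Simulated technical replicates: pure Poisson noise around the same mean
technical <- cbind(rep1 = rpois(n, lambda = 50), rep2 = rpois(n, lambda = 50))
# Simulated over-dispersed replicates: negative binomial noise
biological <- cbind(rep1 = rnbinom(n, size = 5, mu = 50),
                    rep2 = rnbinom(n, size = 5, mu = 50))

sere_technical <- sere(technical)    # expected to be close to 1
sere_biological <- sere(biological)  # expected to be clearly above 1
```

Two samples that differ only by a depth factor (for example, one column exactly twice the other) give a value of 0, which is why values well below 1 usually point to duplicated data rather than to good replicates.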