diff --git a/README.md b/README.md
index 0af675f4dcbd86ac5ea432ca147fc314654b3b1b..701cabd011e46be84e38b7d999d3734a24b2831a 100644
--- a/README.md
+++ b/README.md
@@ -1 +1 @@
-A repository of R snippets, mostly for creating graphical visualization of data.
\ No newline at end of file
+A repository of R snippets, mostly for creating graphical visualization of data.
diff --git a/Tridimensional barplots in R/README.md b/Tridimensional barplots in R/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..45b27a21527ef145c2e072e2ef41fc80e07aef36
--- /dev/null
+++ b/Tridimensional barplots in R/README.md
@@ -0,0 +1,84 @@
+# Tridimensional plots in R
+
+This is a task that occurs frequently when presenting experimental data: creating a bar plot that shows every experimental value as a dot and also includes error bars, usually computed as the standard deviation. I use the term "tridimensional" here because the bar plot is based on three different types of categorical information. In my example, the first dimension is the vector: the data are grouped by vector, with four different reporters used in this set of experiments. The second dimension is the tested strain, and the third is given by the independent replicates behind each of the compared means.
+
+The original data look like this:
+
+
+
+The data can be read from a tab-delimited file located in the same directory as the R script (the working directory was set with `setwd`):
+
+```R
+mydata <- read.delim("experimental_data.txt", stringsAsFactors = FALSE)
+```
+
+ggplot2 works best with data in "long" format, which can easily be obtained from any data frame with a package such as `reshape2`:
+
+```R
+library(reshape2)
+# convert the table to long format, keeping as identifiers the columns
+# needed for plotting, given by position (column 2 = vector, column 1 = strain)
+melted <- melt(mydata, id.vars=c(2,1))
+names(melted) <- c("vector", "strain", "experiment", "value")
+
+melted$strain <- factor(x=melted$strain, levels=c("wt", "mutA", "mutB"))
+# setting "strain" manually as a factor with levels in a given order
+# ensures that these categories are plotted in that order
+```
+
+The "melted" data look like this:
+
+
+
+Next, we compute the mean and standard deviation of each group of replicate experiments (same strain, same vector):
+
+```R
+library(dplyr)
+# compute mean and standard deviation of values, grouped by vector AND strain
+averages <- melted %>% group_by(vector, strain) %>%
+  summarise(avg = mean(value), sd = sd(value), .groups = "drop")
+# create the columns that define the range of the error bars
+averages$ymin <- averages$avg - averages$sd
+averages$ymax <- averages$avg + averages$sd
+```
+
+Finally, the plot itself, built from three elements: the bars, the error bars and the individual data points:
+
+```R
+library(ggplot2)
+# dodging bars, error bars and dots by the same amount is crucial to align them;
+# the following value is used three times in the plot; adjust to your liking
+dodge_value <- 0.8
+
+ggplot(data=averages,
+       aes(x=vector, y=avg, color=strain, fill=strain))+
+  geom_col(width=0.7, position = position_dodge(dodge_value))+
+  scale_fill_manual(values = c("gray30", "gray70", "white"))+
+  # one fill color per strain (3 levels), reused for every vector;
+  # adjust the number of values and the colors to your own strains
+  scale_color_manual(values = rep("black", 3))+
+  # black outlines for bars, error bars and points
+  geom_point(data=melted,
+             aes(x=vector, y=value, color=strain),
+             shape=21, fill="white",
+             position=position_jitterdodge(dodge.width=dodge_value, jitter.width = 0.3))+
+  # add the data points. Even though all points are black, mapping color to strain
+  # is what makes ggplot2 group (and dodge) the points by strain
+  geom_errorbar(data=averages,
+                aes(ymin=ymin, ymax=ymax),
+                position = position_dodge(dodge_value),
+                width=0.2)+
+  # error bars: mean ± one standard deviation
+  ylim(0, 1.4)+
+  # y-axis limits: adjust to the range of your data, or remove for automatic limits
+  # theme settings for the PDF output (the ArialMT font must be available to the PDF device)
+  theme_classic(base_size=10)+
+  theme(text=element_text(size=6, family="ArialMT"), legend.key.size = unit(0.3, 'cm'))
+
+ggsave("mybarplot.pdf", width=8, height=5, units="cm")
+
+```
+
+For the final figure, the result will need further adjustments, for example in Inkscape:
+
+
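The README code above cannot be run as is without `experimental_data.txt`, and this patch does not include that file. The following is a minimal sketch that writes a hypothetical stand-in file. The layout (strain in column 1, vector in column 2, then one column per replicate) is an assumption inferred from `melt(mydata, id.vars=c(2,1))` and the column names assigned immediately afterwards. The vector names, the number of replicates and all values are made up for illustration.

```r
# Hypothetical stand-in for experimental_data.txt (made-up numbers, not real results).
# Assumed layout, inferred from the melt() call: column 1 = strain,
# column 2 = vector, then one column per independent replicate.
set.seed(1)  # make the random values reproducible

strains <- c("wt", "mutA", "mutB")       # strain levels used in the README
vectors <- paste0("vector", 1:4)         # hypothetical names for the 4 vectors
n_rep   <- 3                             # hypothetical number of replicates

# one row per strain x vector combination (strain varies fastest)
mydata <- expand.grid(strain = strains, vector = vectors,
                      stringsAsFactors = FALSE)

# one column of random values per replicate; the 0.3-1.0 range keeps
# mean +/- SD inside the ylim(0, 1.4) used in the plot
for (i in seq_len(n_rep)) {
  mydata[[paste0("exp", i)]] <- round(runif(nrow(mydata), 0.3, 1.0), 2)
}

# save as tab-delimited text, the format read.delim() expects by default
write.table(mydata, "experimental_data.txt", sep = "\t",
            quote = FALSE, row.names = FALSE)
```

If this file is in the working directory and reshape2, dplyr and ggplot2 are installed, the README code from `read.delim()` onward should produce a plot of the same shape (4 vectors by 3 strains). The simulated values are arbitrary and say nothing about the real experiments.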