# Density coloured scatter plots to avoid overplotting
Scatter plots are a natural way to look at correlations between features in large data sets. However, as the number of points grows, the central region of the plot becomes so crowded that it is no longer possible to tell how many points it contains. One way to visualize the data density is to colour the points by it, as in the excellent [ggpointdensity](https://cran.r-project.org/package=ggpointdensity) R package by Lukas PM Kremer, which provides the `geom_pointdensity` function. Plotting that many dots still causes problems in the final figures, however: each tiny dot has to be rendered individually by the PDF viewer, and the files carry a lot of redundant information.
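For reference, a minimal sketch of the ggpointdensity approach mentioned above. It assumes the `ggplot2` and `ggpointdensity` packages are installed; the simulated data frame `dat` and its column names are placeholders for illustration only.

```r
library(ggplot2)
library(ggpointdensity)

# Simulated, correlated data (illustrative only)
set.seed(1)
dat <- data.frame(x = rnorm(5000))
dat$y <- dat$x + rnorm(5000)

# Each point is coloured by the number of neighbouring points
p <- ggplot(dat, aes(x = x, y = y)) +
  geom_pointdensity() +
  scale_colour_viridis_c()

print(p)
```

Every point is still drawn individually here, so the size and rendering time of the output file grow with the number of points.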
The overplotting problem has long been solved in the flow cytometry field, where FACS software routinely displays tens or hundreds of thousands of events across multiple scatter plots. A solution for R was proposed in one of the answers to [this](https://stackoverflow.com/questions/13094827/how-to-reproduce-smoothscatters-outlier-plotting-in-ggplot/59147836#59147836) question on Stack Overflow, and I adapted it to my own needs.
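The general idea of replacing individual dots with a density summary can be sketched in base R as follows. This is not the code from the Stack Overflow answer or from the script in this repository, only an illustration under simple assumptions: `bin_counts` is a hypothetical helper that counts points on a regular grid, and the data are simulated.

```r
# Hypothetical helper: count the points falling in each cell of a regular grid
bin_counts <- function(x, y, n_bins = 50) {
  # Equal-width cell boundaries spanning the data range on each axis
  x_breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  y_breaks <- seq(min(y), max(y), length.out = n_bins + 1)
  # Cell index of every point; all.inside keeps indices within 1..n_bins
  x_bin <- findInterval(x, x_breaks, all.inside = TRUE)
  y_bin <- findInterval(y, y_breaks, all.inside = TRUE)
  # Cross-tabulate; the factor levels keep empty cells in the result
  counts <- table(factor(x_bin, levels = seq_len(n_bins)),
                  factor(y_bin, levels = seq_len(n_bins)))
  list(x_breaks = x_breaks, y_breaks = y_breaks,
       counts = matrix(as.integer(counts), nrow = n_bins))
}

# Simulated, correlated data (illustrative only)
set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5)

b <- bin_counts(x, y)

# Draw one coloured tile per occupied cell instead of 100,000 dots
z <- b$counts
z[z == 0] <- NA
image(b$x_breaks, b$y_breaks, z, col = hcl.colors(64, "viridis"),
      xlab = "x", ylab = "y")
```

With this kind of binning the figure contains at most `n_bins^2` tiles regardless of how many points were measured, which is what keeps the output file small.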
Running the included example from the R script leads to this image: