Commit d39c6785 authored by Cosmin SAVEANU

Update README.md

parent 2d6eed5d
# Density coloured scatter plots to avoid overplotting
Correlations between features in large-scale data sets are best examined with scatter plots. However, as the number of values increases, the central region of a scatter plot becomes so crowded that the number of points present can no longer be judged. One way to better visualize data density is to colour the points by density, as in the excellent [ggpointdensity](https://cran.r-project.org/package=ggpointdensity) R package by Lukas PM Kremer, which provides the `geom_pointdensity` function. However, plotting so many dots becomes a problem when drawing figures: each tiny dot is rendered individually by the PDF viewer, and the final files contain a lot of useless information.
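For reference, a minimal sketch of the density colouring approach with `geom_pointdensity` is given below. The data are simulated for illustration only, and the point size and colour scale are arbitrary choices, not settings taken from this repository.

```r
library(ggplot2)
library(ggpointdensity)

# Simulated, correlated data (hypothetical; replace with your own data frame).
set.seed(1)
df <- data.frame(x = rnorm(5000))
df$y <- 0.7 * df$x + rnorm(5000, sd = 0.5)

# Each point is coloured by the number of neighbouring points.
p <- ggplot(df, aes(x = x, y = y)) +
  geom_pointdensity(size = 0.5) +
  scale_color_viridis_c()
print(p)
```

Every point is still drawn individually here, so a vector output such as PDF grows with the number of points.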
The overplotting problem has been solved in the flow cytometry field by several FACS software solutions, since tens to hundreds of thousands of events are usually displayed in multiple scatter plots. A solution for R was proposed in one of the answers to [this](https://stackoverflow.com/questions/13094827/how-to-reproduce-smoothscatters-outlier-plotting-in-ggplot/59147836#59147836) question on Stack Overflow, and I adapted it to my own needs.
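To illustrate the general idea of drawing density instead of individual dots, here is a base R sketch that uses a plain 2D histogram. This is a substituted technique chosen for brevity. It is not the Stack Overflow answer, nor the script included in this repository, and the data, the grid size `nbins` and the palette are all assumptions.

```r
# Simulated data, purely for illustration (not from the repository).
set.seed(1)
n <- 50000
x <- rnorm(n)
y <- 0.7 * x + rnorm(n, sd = 0.5)

# Assign every point to a cell of an nbins x nbins grid.
nbins <- 100
xbreaks <- seq(min(x), max(x), length.out = nbins + 1)
ybreaks <- seq(min(y), max(y), length.out = nbins + 1)
xi <- cut(x, xbreaks, include.lowest = TRUE, labels = FALSE)
yi <- cut(y, ybreaks, include.lowest = TRUE, labels = FALSE)

# Count the points per cell; empty cells become NA so they are not drawn.
counts <- table(factor(xi, levels = 1:nbins), factor(yi, levels = 1:nbins))
counts <- matrix(as.numeric(counts), nrow = nbins)
counts[counts == 0] <- NA

# Draw one coloured cell per occupied bin instead of one dot per point.
xmid <- (head(xbreaks, -1) + tail(xbreaks, -1)) / 2
ymid <- (head(ybreaks, -1) + tail(ybreaks, -1)) / 2
image(xmid, ymid, log10(counts), col = hcl.colors(50, "viridis"),
      xlab = "x", ylab = "y")
```

With this approach the number of drawn elements is bounded by the number of grid cells rather than by the number of points, which keeps the output file small.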
Running the included example from the R script leads to this image: