Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. Here the two states are aged (CSI_A and CSI_B) and non-aged (WT and UVSS).
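To make the idea of "concordant differences" concrete, here is a minimal sketch of an unweighted, Kolmogorov-Smirnov-like enrichment score in base R. The gene names, scores and the `enrichment_score` helper are invented for illustration; real GSEA implementations typically weight each step by the ranking statistic and normalise the score against permutations to obtain the NES, which this toy version does not do.

```r
# Toy unweighted enrichment score (running-sum statistic).
# All gene names and scores are made up for illustration.
enrichment_score <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked_genes) - n_hit
  stopifnot(n_hit > 0, n_miss > 0)
  # Step up at each set member and down otherwise, so the walk ends at zero.
  steps   <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  running <- cumsum(steps)
  # The enrichment score is the running-sum value farthest from zero.
  running[which.max(abs(running))]
}

# Rank genes by a hypothetical difference statistic (aged vs. non-aged).
scores <- c(g1 = 3.2, g2 = 2.9, g3 = 2.1, g4 = 0.4,
            g5 = -0.3, g6 = -1.1, g7 = -1.8, g8 = -2.5)
ranked <- names(sort(scores, decreasing = TRUE))

es_top    <- enrichment_score(ranked, c("g1", "g2", "g4"))  # set near the top
es_bottom <- enrichment_score(ranked, c("g6", "g7", "g8"))  # set at the bottom
print(c(top = es_top, bottom = es_bottom))
```

A set concentrated at the top of the ranking gets a positive score, and one concentrated at the bottom gets a negative score.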
The methylGSA package (https://doi.org/10.1093/bioinformatics/bty892) provides an implementation of GSEA tailored to methylation datasets. It adapts the robust rank aggregation (RRA) approach to adjust for the number of CpGs per gene in DNA methylation gene set testing.
## ID          Description                                                Size  NES       padj
## GO:0007600  sensory perception                                         342   1.364675  0.001720672
## GO:0098742  cell-cell adhesion via plasma-membrane adhesion molecules  162   1.515603  0.001720672
## GO:0048562  embryonic organ morphogenesis                              247   1.424912  0.001720672
## GO:0048706  embryonic skeletal system development                      109   1.592570  0.001854411
## GO:0043269  regulation of ion transport                                479   1.288359  0.002542880
## GO:0048705  skeletal system morphogenesis                              187   1.450632  0.002542880
## GO:0031012  extracellular matrix                                       339   1.343270  0.002650369
## GO:0048568  embryonic organ development                                361   1.327196  0.002650369
## GO:0030312  external encapsulating structure                           340   1.346781  0.003173612
## GO:0006836  neurotransmitter transport                                 163   1.449518  0.004120658
## GO:0060078  regulation of postsynaptic membrane potential              107   1.539520  0.004514456
## GO:0022803  passive transmembrane transporter activity                 321   1.314685  0.005426962
## GO:0048667  cell morphogenesis involved in neuron differentiation      444   1.285406  0.005426962
## GO:0099177  regulation of trans-synaptic signaling                     333   1.325271  0.005426962
## GO:0098656  anion transmembrane transport                              161   1.429413  0.005426962
How would you run the same analysis using pathways instead of GO terms?
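One possible starting point is sketched below. It assumes the methylGSA package is installed and that `cpg_pval` is a named numeric vector of CpG-level p-values whose names are CpG IDs. The wrapper name `run_methyl_gsea` and the `"450K"` default are assumptions made for this sketch, not part of the original analysis. The only change from a GO run is the `GS.type` argument of `methylRRA()`.

```r
# Sketch only: wraps methylGSA::methylRRA() so the gene-set collection can be
# switched between pathways and GO. `run_methyl_gsea` is a hypothetical helper.
run_methyl_gsea <- function(cpg_pval,
                            gs_type = c("KEGG", "Reactome", "GO"),
                            array_type = "450K") {
  gs_type <- match.arg(gs_type)
  methylGSA::methylRRA(
    cpg.pval   = cpg_pval,    # named vector: CpG ID -> p-value
    array.type = array_type,  # assumption: use "EPIC" for EPIC arrays
    method     = "GSEA",
    GS.type    = gs_type,     # "KEGG" or "Reactome" for pathways
    minsize    = 100,
    maxsize    = 500
  )
}

# Hypothetical usage, once `cpg_pval` has been computed:
# res_kegg <- run_methyl_gsea(cpg_pval, gs_type = "KEGG")
# head(res_kegg, 15)
```

Pathway collections tend to contain fewer and smaller sets than GO, so the size bounds may need revisiting when switching.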
What is the meaning of the following parameters? How do they change the results? minsize = 100 and maxsize = 500
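As a hint, the effect of size bounds can be explored on toy data. The sketch below uses invented gene sets and a hypothetical `filter_by_size` helper. It assumes the bounds are inclusive: sets with fewer genes than `minsize` or more than `maxsize` are not tested.

```r
# Toy gene sets with made-up names and sizes.
gene_sets <- list(
  tiny_set   = paste0("gene", 1:20),
  medium_set = paste0("gene", 1:150),
  large_set  = paste0("gene", 1:480),
  huge_set   = paste0("gene", 1:900)
)

# Keep only the sets whose size lies within [minsize, maxsize].
filter_by_size <- function(sets, minsize = 100, maxsize = 500) {
  sizes <- lengths(sets)
  sets[sizes >= minsize & sizes <= maxsize]
}

kept_default <- names(filter_by_size(gene_sets))
kept_narrow  <- names(filter_by_size(gene_sets, minsize = 200, maxsize = 500))
print(kept_default)
print(kept_narrow)
```

Changing the bounds changes how many sets are tested, which in turn changes the multiple-testing correction applied to every remaining set.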
From the enriched GO terms, select the most significant ones and summarize them using the REVIGO tool (http://revigo.irb.hr).
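A minimal sketch of preparing the input is shown below. It assumes REVIGO accepts one GO ID per line, optionally followed by a value such as an adjusted p-value. The IDs and values are the five most significant rows of the table above; the file name and the use of `tempdir()` are arbitrary choices.

```r
# GO IDs and adjusted p-values copied from the five most significant rows.
top_go <- data.frame(
  ID   = c("GO:0007600", "GO:0098742", "GO:0048562",
           "GO:0048706", "GO:0043269"),
  padj = c(0.001720672, 0.001720672, 0.001720672,
           0.001854411, 0.002542880)
)

# Write a tab-separated file without header or quotes; its contents can be
# pasted into the REVIGO input box.
revigo_file <- file.path(tempdir(), "revigo_input.txt")
write.table(top_go, file = revigo_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
cat(readLines(revigo_file), sep = "\n")
```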
In the reproducible research framework, an important step is to record the versions of all software used to perform the statistical analysis. They must be provided when submitting a paper.
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
## [2] minfi_1.40.0
## [3] bumphunter_1.36.0
## [4] locfit_1.5-9.5
## [5] iterators_1.0.14
## [6] foreach_1.5.2
## [7] Biostrings_2.62.0
## [8] XVector_0.34.0
## [9] SummarizedExperiment_1.24.0
## [10] Biobase_2.54.0
## [11] MatrixGenerics_1.6.0
## [12] matrixStats_0.62.0
## [13] GenomicRanges_1.46.1
## [14] GenomeInfoDb_1.30.1
## [15] IRanges_2.28.0
## [16] S4Vectors_0.32.4
## [17] BiocGenerics_0.40.0
## [18] methylGSA_1.12.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2
## [2] tidyselect_1.1.2
## [3] RSQLite_2.2.14
## [4] AnnotationDbi_1.56.2
## [5] grid_4.1.2
## [6] BiocParallel_1.28.3
## [7] scatterpie_0.1.7
## [8] munsell_0.5.0
## [9] codetools_0.2-18
## [10] preprocessCore_1.56.0
## [11] statmod_1.4.36
## [12] colorspace_2.0-3
## [13] GOSemSim_2.20.0
## [14] filelock_1.0.2
## [15] highr_0.9
## [16] knitr_1.39
## [17] rstudioapi_0.13
## [18] DOSE_3.20.1
## [19] GenomeInfoDbData_1.2.7
## [20] polyclip_1.10-0
## [21] bit64_4.0.5
## [22] farver_2.1.0
## [23] rhdf5_2.38.1
## [24] downloader_0.4
## [25] treeio_1.18.1
## [26] vctrs_0.4.1
## [27] generics_0.1.2
## [28] xfun_0.31
## [29] BiocFileCache_2.2.1
## [30] R6_2.5.1
## [31] illuminaio_0.36.0
## [32] graphlayouts_0.8.0
## [33] bitops_1.0-7
## [34] rhdf5filters_1.6.0
## [35] cachem_1.0.6
## [36] reshape_0.8.9
## [37] fgsea_1.20.0
## [38] gridGraphics_0.5-1
## [39] DelayedArray_0.20.0
## [40] assertthat_0.2.1
## [41] promises_1.2.0.1
## [42] BiocIO_1.4.0
## [43] scales_1.2.0
## [44] ggraph_2.0.5
## [45] enrichplot_1.14.2
## [46] gtable_0.3.0
## [47] tidygraph_1.2.1
## [48] rlang_1.0.2
## [49] genefilter_1.76.0
## [50] splines_4.1.2
## [51] lazyeval_0.2.2
## [52] rtracklayer_1.54.0
## [53] GEOquery_2.62.2
## [54] yaml_2.3.5
## [55] reshape2_1.4.4
## [56] GenomicFeatures_1.46.5
## [57] httpuv_1.6.5
## [58] qvalue_2.26.0
## [59] clusterProfiler_4.2.2
## [60] tools_4.1.2
## [61] ggplotify_0.1.0
## [62] nor1mix_1.3-0
## [63] ggplot2_3.3.6
## [64] ellipsis_0.3.2
## [65] jquerylib_0.1.4
## [66] RColorBrewer_1.1-3
## [67] siggenes_1.68.0
## [68] Rcpp_1.0.8.3
## [69] plyr_1.8.7
## [70] sparseMatrixStats_1.6.0
## [71] progress_1.2.2
## [72] zlibbioc_1.40.0
## [73] purrr_0.3.4
## [74] RCurl_1.98-1.6
## [75] prettyunits_1.1.1
## [76] openssl_2.0.1
## [77] viridis_0.6.2
## [78] ggrepel_0.9.1
## [79] magrittr_2.0.3
## [80] data.table_1.14.2
## [81] DO.db_2.9
## [82] reactome.db_1.77.0
## [83] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
## [84] missMethyl_1.28.0
## [85] mime_0.12
## [86] hms_1.1.1
## [87] patchwork_1.1.1
## [88] evaluate_0.15
## [89] xtable_1.8-4
## [90] XML_3.99-0.9
## [91] RobustRankAggreg_1.1
## [92] mclust_5.4.9
## [93] gridExtra_2.3
## [94] compiler_4.1.2
## [95] biomaRt_2.50.3
## [96] tibble_3.1.7
## [97] shadowtext_0.1.2
## [98] crayon_1.5.1
## [99] htmltools_0.5.2
## [100] later_1.3.0
## [101] ggfun_0.0.6
## [102] tzdb_0.3.0
## [103] tidyr_1.2.0
## [104] aplot_0.1.4
## [105] DBI_1.1.2
## [106] tweenr_1.0.2
## [107] dbplyr_2.1.1
## [108] MASS_7.3-57
## [109] rappdirs_0.3.3
## [110] Matrix_1.4-1
## [111] readr_2.1.2
## [112] cli_3.3.0
## [113] quadprog_1.5-8
## [114] igraph_1.3.1
## [115] pkgconfig_2.0.3
## [116] GenomicAlignments_1.30.0
## [117] xml2_1.3.3
## [118] ggtree_3.2.1
## [119] annotate_1.72.0
## [120] bslib_0.3.1
## [121] rngtools_1.5.2
## [122] multtest_2.50.0
## [123] beanplot_1.3.1
## [124] doRNG_1.8.2
## [125] scrime_1.3.5
## [126] yulab.utils_0.0.4
## [127] stringr_1.4.0
## [128] digest_0.6.29
## [129] rmarkdown_2.14
## [130] base64_2.0
## [131] fastmatch_1.1-3
## [132] tidytree_0.3.9
## [133] DelayedMatrixStats_1.16.0
## [134] restfulr_0.0.13
## [135] curl_4.3.2
## [136] shiny_1.7.1
## [137] Rsamtools_2.10.0
## [138] rjson_0.2.21
## [139] lifecycle_1.0.1
## [140] nlme_3.1-157
## [141] jsonlite_1.8.0
## [142] Rhdf5lib_1.16.0
## [143] viridisLite_0.4.0
## [144] askpass_1.1
## [145] limma_3.50.3
## [146] fansi_1.0.3
## [147] pillar_1.7.0
## [148] lattice_0.20-45
## [149] KEGGREST_1.34.0
## [150] fastmap_1.1.0
## [151] httr_1.4.3
## [152] survival_3.3-1
## [153] GO.db_3.14.0
## [154] glue_1.6.2
## [155] png_0.1-7
## [156] bit_4.0.4
## [157] ggforce_0.3.3
## [158] stringi_1.7.6
## [159] sass_0.4.1
## [160] HDF5Array_1.22.1
## [161] blob_1.2.3
## [162] org.Hs.eg.db_3.14.0
## [163] memoise_2.0.1
## [164] dplyr_1.0.9
## [165] ape_5.6-2
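To keep this information alongside the results, the output of `sessionInfo()` can also be written to a text file. The sketch below uses only base R; the file name and the use of `tempdir()` are arbitrary choices, and in a real project the file would normally be saved next to the analysis outputs.

```r
# Capture the printed session information and write it to a plain-text file.
session_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), session_file)

# Quick check that the file was written.
head(readLines(session_file), 3)
```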