Commit be3e61b2 authored by Hanna JULIENNE

improved doc
Clone the current repository locally:
```
git clone https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline.git
```
<!--
Download the test data through the interface, using wget or git lfs,
and place it in the ./test_data/hg38_EAS folder.

Option with wget:
```
cd ${PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/
wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/RBC_EAS_chr22.tsv?inline=false && mv RBC_EAS_chr22.tsv\?inline\=false RBC_EAS_chr22.tsv
wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/PLT_EAS_chr22.tsv?inline=false && mv PLT_EAS_chr22.tsv\?inline\=false PLT_EAS_chr22.tsv
wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/WBC_EAS_chr22.tsv?inline=false && mv WBC_EAS_chr22.tsv\?inline\=false WBC_EAS_chr22.tsv
```

Option with git-lfs (requires installing git lfs):
```
git lfs pull --include PLT_EAS_chr22.tsv
git lfs pull --include RBC_EAS_chr22.tsv
git lfs pull --include WBC_EAS_chr22.tsv
```
-->
Test data are located in the ${PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ folder.
These are extracts of summary statistics from a trans-ancestry GWAS on blood traits ([Chen et al](https://www.sciencedirect.com/science/article/pii/S0092867420308229?via%3Dihub)): WBC, white blood cell count; RBC, red blood cell count; PLT, platelet count.
They correspond to chromosomes 21 and 22 for the East Asian ancestry.

Once done, you can launch the pipeline as:
```
nextflow run jass_pipeline.nf --ref_panel_WG {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/Ref_panel --gwas_folder {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ -with-report jass_report.html
```
You can specify parameters in the jass_pipeline.nf header if preferred.

If all went well, you have cleaned the three summary statistics files, aligned them on the reference panel, and integrated them in one database. This database was used to perform a multi-trait GWAS on the three traits.

Here are the output files produced by the pipeline:
* ${PIPELINE_FOLDER}/harmonized_GWAS_1_file/ : genome-wide harmonized summary statistics
* ${PIPELINE_FOLDER}/harmonized_GWAS_files/ : harmonized summary statistics by chromosome
* ${PIPELINE_FOLDER}/init_table : database containing all summary statistics, used to perform the multi-trait GWAS
* ${PIPELINE_FOLDER}/worktable : multi-trait GWAS results file
* ${PIPELINE_FOLDER}/quadrant : quadrant plot of the multi-trait GWAS
* ${PIPELINE_FOLDER}/manhattan : Manhattan plot of the multi-trait GWAS
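After a run, you can quickly verify that each expected output folder listed above was produced. This is a small convenience sketch, not part of the pipeline itself; the folder names come from the list above.

```shell
# check_outputs: report which of the expected JASS pipeline output
# folders exist under the directory given as the first argument
# (defaults to the current directory).
check_outputs() {
    base="${1:-.}"
    for d in harmonized_GWAS_1_file harmonized_GWAS_files init_table worktable quadrant manhattan; do
        if [ -d "${base}/${d}" ]; then
            echo "OK      ${d}"
        else
            echo "MISSING ${d}"
        fi
    done
}
```

For example, `check_outputs ${PIPELINE_FOLDER}` prints one OK/MISSING line per expected folder.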
## Required Input
For this step you will need to install an additional dependency, [RAISS](https://statistical-genetics.pages.pasteur.fr/raiss/).
* --ref_panel : A folder containing a reference panel in .bim, .bed, .fam format for imputation with RAISS
* --ld-folder : A path toward a folder containing LD matrices (these can be generated from the reference panel with the raiss package, as described here: http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
Imputed files will be stored in:

* ${PIPELINE_FOLDER}/imputed_GWAS/ : imputed summary statistics by chromosome
## Available reference panels

To make reference panels readily available, we use git lfs.
You can download the five panels using the command:
```
git lfs fetch --all
```
or manually through the GitLab interface:

![download test files](./doc/download_test_files.png)
## Imputing your summary statistics using RAISS

If you wish to perform the imputation step using RAISS, you will need to:
1. Switch the parameter params.compute_imputation to true
2. Install the python package RAISS
3. Follow the RAISS documentation to generate linkage disequilibrium matrices
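Instead of editing the pipeline header, the same switch can be flipped on the command line, since Nextflow maps any `--name value` pair onto `params.name`. A sketch with placeholder paths (the paths are not from this repository's docs), built as a string so the flags are easy to audit:

```shell
# Launch command with imputation enabled; --compute_imputation,
# --ref_panel and --ld_folder match parameters defined in
# jass_pipeline.nf. All paths below are placeholders.
launch_cmd="nextflow run jass_pipeline.nf \
    --compute_imputation true \
    --ref_panel /path/to/Ref_panel_by_chr \
    --ld_folder '/path/to/LD_matrices/*.ld'"
printf '%s\n' "$launch_cmd"
```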
## Running the LDSC regression covariance step

### To infer the multi-trait z-score null distribution, heritabilities, and genetic correlations using LDscore regression
and require an HPC cluster.
If you want to analyze data in hg38 and for all ancestries, you can contact the main developer of this pipeline (hanna.julienne@pasteur.fr)
to request the needed input files.

2. To activate the LDscore option, set this flag to true:
```
--compute_LDSC_matrix=true
```
3. Give the path of the reference panel used for the LDscore regression:
```
--LD_SCORE_folder ${PATH_to_REFERENCE}
```
If you run this additional step, the following outputs will be generated:

* ${PIPELINE_FOLDER}/ldsc_data : preprocessed data used to run the LDscore regression
* ${PIPELINE_FOLDER}/h2_data : heritability estimation logs
* ${PIPELINE_FOLDER}/cor_data : covariance estimation logs
* ${PIPELINE_FOLDER}/Correlation_matrices : parsed covariance matrices

The H0 matrix will be integrated into the inittable file by the pipeline.
## Usage Example on HPC Cluster

If you are working with an HPC server (Slurm job scheduler), you can adapt the nextflow_sbatch.config file and launch the pipeline with a command like:
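A minimal sketch of such a launch, assuming submission via Slurm's `sbatch --wrap` and using Nextflow's `-c` flag to select the adapted config file; all paths are placeholders, not commands prescribed by this repository:

```shell
# Sketch of a Slurm launch: the nextflow command is wrapped in an
# sbatch job. Adapt paths and the config file to your cluster.
submit_cmd='sbatch --wrap "nextflow run jass_pipeline.nf -c nextflow_sbatch.config --gwas_folder /path/to/my_gwas/ -with-report jass_report.html"'
printf '%s\n' "$submit_cmd"
```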
Relevant parameter and channel definitions in jass_pipeline.nf:
```
params.compute_imputation=false

/* Path of input data */
params.meta_data = "${projectDir}"+"/input_files/Data_test_EAS.csv" // file describing gwas summary statistic format
params.gwas_folder = "${projectDir}"+'/test_data/hg38_EAS/'
params.region = "${projectDir}"+"/input_files/All_Regions_ALL_ensemble_1000G_hg38_EAS.bed"
params.ref_panel_WG = "${projectDir}"+"/Ref_Panel/1000G_EAS_0_01_chr22_21.csv" //"${projectDir}/Ref_Panel/1000G_EAS_0_01_chr22.csv"
params.ancestry="EAS"

/* see https://statistical-genetics.pages.pasteur.fr/raiss/#optimizing-raiss-parameters */
params.r2threshold = 0.6
params.eigenthreshold = 0.05
params.minimumld = 5
params.ld_folder="/pasteur/zeus/projets/p02/GGS_WKD/DATA_1000G/Panels/Matrix_LD_RAISS/EAS/*.ld"
params.ref_panel = '/pasteur/zeus/projets/p02/GGS_JASS/jass_analysis_pipeline/Ref_panel_by_chr/'

/* Project group */
params.group = "${projectDir}/input_files/group.txt"
group = file(params.group)

Region_channel2 = Channel.fromPath(params.region)
chr_channel = Channel.from(1..22)
ref_chr_channel=Channel.fromPath(params.ref_panel+"/ALL_ensemble_1000G_hg38_EAS_chr*.bim")
ld_channel=Channel.fromPath(params.ld_folder)
extract_sample_size_script_channel = Channel.fromPath("${projectDir}/bin/extract_sample_size.py")
generate_trait_pairs_channel = Channel.fromPath("${projectDir}/bin/generate_trait_pairs.py")
```
Nextflow configuration for cluster execution:
```
dag {
    enabled = true
    overwrite = true
    file = 'dag.dot'
}

report {
    enabled = true
    file = 'nextflow_logs/report.html'
}

trace {
    enabled = true
    overwrite = true
    file = 'nextflow_logs/trace.txt'
}

singularity {
    enabled = true
    autoMounts = true
    runOptions = '--home $HOME:/home/$USER -B /pasteur/zeus/projets/p02/'
}

executor {
    submitRateLimit = '10 sec'
    maxErrors = 20
    maxRetries = 4
    errorStrategy = 'terminate'
    maxForks = 400
    queueSize = 500
}

process {
    withName: 'Compute_MAF' {
        container = 'plink_1.90b5--heea4ae3_0.sif'
        time = '1h'
        queue = 'dedicated,common,ggs'
        cpus = 1
    }
    withName: 'Impute_GWAS' {
        memory = { 8.GB * task.attempt }
        time = { 72.h * task.attempt }
        maxRetries = 4
        queue = 'dedicated,ggs,common'
        cpus = 1
    }
    withName: 'Munge_LDSC_data' {
        container = 'ldsc_1.0.1--py_0.sif'
        cpus = 1
    }
    withName: 'Correlation_LDSC_data' {
        container = "ldsc_1.0.1--py_0.sif"
        cpus = 1
    }
}
```