diff --git a/README.md b/README.md
index 1f34a89ca579ba422864973165c6e17c29d3a42b..0fd7d21d665f807717ad990318f1054995ead5e7 100644
--- a/README.md
+++ b/README.md
@@ -24,25 +24,27 @@ Clone the current repository locally:
 ```
 git clone https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline.git
 ```
-
+<!-- Download the test data through the GitLab interface, using wget or git lfs, and place it in the ./test_data/hg38_EAS folder.
+
 Option with wget
 ```
-cd ${PATH_TO_PIPELINE_FOLDER}/Ref_Panel/
+cd ${PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/
 wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/RBC_EAS_chr22.tsv?inline=false && mv RBC_EAS_chr22.tsv\?inline\=false RBC_EAS_chr22.tsv
 wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/PLT_EAS_chr22.tsv?inline=false && mv PLT_EAS_chr22.tsv\?inline\=false PLT_EAS_chr22.tsv
 wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/WBC_EAS_chr22.tsv?inline=false && mv WBC_EAS_chr22.tsv\?inline\=false WBC_EAS_chr22.tsv
 ```
 Option with git-lfs (require installing git lfs)
-
 ```
 git lfs pull --include PLT_EAS_chr22.tsv
 git lfs pull --include RBC_EAS_chr22.tsv
 git lfs pull --include WBC_EAS_chr22.tsv
 ```
+-->
+Test data are located in the ${PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ folder.
 These are extracts of summary statistics from a trans ancestry GWAS on blood traits ([Chen et al](https://www.sciencedirect.com/science/article/pii/S0092867420308229?via%3Dihub)): WBC, White blood cell count; RBC, Red blood cell count; PLT, platelet count.
@@ -50,11 +52,21 @@ They correspond to the chromosome 21 and 22 for the East asian ancestry.
 Once done you can launch the pipeline as:
 ```
-
- nextflow run jass_pipeline.nf --ref_panel {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/Ref_panel --gwas_folder {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/test_data/hg_ -with-report jass_report.html
+ nextflow run jass_pipeline.nf --ref_panel_WG {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/Ref_Panel/1000G_EAS_0_01_chr22_21.csv --gwas_folder {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ -with-report jass_report.html
 ```
+Alternatively, you can set these parameters in the header of jass_pipeline.nf.
+
+If all went well, you have cleaned the three summary statistic files, aligned them on the reference panel, and integrated them into a single database. This database was then used to perform a multi-trait GWAS on the three traits.
+
+The pipeline produces the following outputs:
+* ${PIPELINE_FOLDER}/harmonized_GWAS_1_file/ : genome-wide harmonized summary statistics
+* ${PIPELINE_FOLDER}/harmonized_GWAS_files/ : harmonized summary statistics by chromosome
+* ${PIPELINE_FOLDER}/init_table : database gathering all summary statistics, used to perform the multi-trait GWAS
+* ${PIPELINE_FOLDER}/worktable : multi-trait GWAS results file
+* ${PIPELINE_FOLDER}/quadrant : quadrant plot of the multi-trait GWAS
+* ${PIPELINE_FOLDER}/manhattan : Manhattan plot of the multi-trait GWAS
+
-If all went well, you have cleaned the three summary statistic files, aligned them on the reference panel, and integrated them in one database. The database was used to perform a multi-trait GWAS on the three traits.
 ## Required Input
@@ -83,6 +95,9 @@ For this step you will need to install an additional dependency [RAISS](https://
 * --ref_panel : A folder containing a Reference Panel in the .bim, .bed, .fam format for imputation with RAISS
 * --ld-folder : A path toward a folder containing LD matrices (that can be generated from the reference panel with the raiss package as described here : http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
+Imputed files are stored in:
+* ${PIPELINE_FOLDER}/imputed_GWAS/ : imputed and harmonized summary statistics by chromosome
+
 ## Available reference panels
 To make reference panel readily available, we use git lfs.
@@ -103,15 +118,7 @@ You can download the five panel using the command:
 git lfs fetch --all
 ```
 or manualy through the gitlab interface:
-
-
-## Imputing your summary statistics using RAISS
-
-If you wish to perform imputation step using RAISS you will need to:
-
-1. Switch the parameter params.compute_imputation to true
-2. Install the python package RAISS
-3. Follow RAISS documentation to generate Linkage desiquilibrium matrices
+
 ## Running the LDSC regression covariance step
 ### To infer multi-trait z-scores null distribution, heritabilities, genetic correlations using the LDscore regression
@@ -131,7 +138,9 @@ and require a HPC cluster.
 If you want to analyze data in hg38 and for all ancestries, you can contact the main developper of this pipeline (hanna.julienne@pasteur.fr) to request the needed input files
 2. To activate the LDscore option turn this flag to true:
- * --group If you wish to compute joint analyses with the pipeline, a group file with the each phenotype group written on a separated line
+```
+ --compute_LDSC_matrix=true
+```
 3. Give the path of the reference panel
 Using the LDscore regression on
@@ -139,6 +148,13 @@ Using the LDscore regression on
 --LD_SCORE_folder ${PATH_to_REFERENCE}
 ```
+If you run this additional step, the following outputs are generated:
+* ${PIPELINE_FOLDER}/ldsc_data : data preprocessed for the LDSC regression
+* ${PIPELINE_FOLDER}/h2_data : heritability estimation logs
+* ${PIPELINE_FOLDER}/cor_data : covariance estimation logs
+* ${PIPELINE_FOLDER}/Correlation_matrices : parsed covariance matrices
+
+The pipeline integrates the H0 matrix into the inittable file, so that it is taken into account in the multi-trait GWAS.
 ## Usage Example on HPC Cluster
 If you are working with a HPC server (Slurm job scheduler), you can adapt the nextflow_sbatch.config file and launch the pipeline with a command like:
diff --git a/doc/download_test_files.png b/doc/download_test_files.png
new file mode 100644
index 0000000000000000000000000000000000000000..c23dbeee4336ef510d7e9915e3319d791b3a72ca
Binary files /dev/null and b/doc/download_test_files.png differ
diff --git a/jass_pipeline.nf b/jass_pipeline.nf
index 816d7b80c23964e62f2454f61861265e8f4878ea..4747e2a57a89d92b9e12ce2ef05ece64524e543c 100644
--- a/jass_pipeline.nf
+++ b/jass_pipeline.nf
@@ -14,9 +14,8 @@ params.compute_imputation=false
 /* Path of input data */
 params.meta_data = "${projectDir}"+"/input_files/Data_test_EAS.csv" // file describing gwas summary statistic format
 params.gwas_folder = "${projectDir}"+'/test_data/hg38_EAS/'
-params.ref_panel = '/pasteur/zeus/projets/p02/GGS_JASS/jass_analysis_pipeline/Ref_panel_by_chr/'
-params.region = "${projectDir}"+"/input_files/All_Regions_ALL_ensemble_1000G_hg38_EAS.bed"
+params.region = "${projectDir}"+"/input_files/All_Regions_ALL_ensemble_1000G_hg38_EAS.bed"
 params.ref_panel_WG = "${projectDir}"+"/Ref_Panel/1000G_EAS_0_01_chr22_21.csv"//"${projectDir}/Ref_Panel/1000G_EAS_0_01_chr22.csv"
 params.ancestry="EAS"
@@ -41,7 +40,8 @@ see https://statistical-genetics.pages.pasteur.fr/raiss/#optimizing-raiss-parame
 params.r2threshold = 0.6
 params.eigenthreshold = 0.05
 params.minimumld = 5
-
+params.ld_folder="/pasteur/zeus/projets/p02/GGS_WKD/DATA_1000G/Panels/Matrix_LD_RAISS/EAS/*.ld"
+params.ref_panel = '/pasteur/zeus/projets/p02/GGS_JASS/jass_analysis_pipeline/Ref_panel_by_chr/'
 /* Project group */
 params.group = "${projectDir}/input_files/group.txt"
 group = file(params.group)
@@ -52,7 +52,7 @@ Region_channel2 = Channel.fromPath(params.region)
 chr_channel = Channel.from(1..22)
 ref_chr_channel=Channel.fromPath(params.ref_panel+"/ALL_ensemble_1000G_hg38_EAS_chr*.bim")
-ld_channel=Channel.fromPath("/pasteur/zeus/projets/p02/GGS_WKD/DATA_1000G/Panels/Matrix_LD_RAISS/EAS/*.ld")
+ld_channel=Channel.fromPath(params.ld_folder)
 extract_sample_size_script_channel = Channel.fromPath("${projectDir}/bin/extract_sample_size.py")
 generate_trait_pairs_channel = Channel.fromPath("${projectDir}/bin/generate_trait_pairs.py")
diff --git a/nextflow_local.config b/nextflow_local.config
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f47dffbe5830faff6083afe70e07ad4489ba0a29 100644
--- a/nextflow_local.config
+++ b/nextflow_local.config
@@ -0,0 +1,63 @@
+
+
+dag {
+    enabled = true
+    overwrite = true
+    file = 'dag.dot'
+}
+
+report {
+    enabled = true
+    file = 'nextflow_logs/report.html'
+}
+
+trace {
+    enabled = true
+    overwrite = true
+    file = 'nextflow_logs/trace.txt'
+}
+
+singularity {
+    enabled = true
+    autoMounts = true
+    runOptions = '--home $HOME:/home/$USER -B /pasteur/zeus/projets/p02/'
+}
+executor {
+    submitRateLimit = '10 sec'
+    queueSize = 500
+
+}
+
+process{
+    maxErrors=20
+    maxRetries=4
+    errorStrategy='terminate'
+    maxForks=400
+
+    withName: 'Compute_MAF' {
+        container='plink_1.90b5--heea4ae3_0.sif'
+        time='1h'
+        queue='dedicated,common,ggs'
+        cpus=1
+    }
+
+    withName: 'Impute_GWAS' {
+        memory={8.GB * task.attempt}
+        time={72.h * task.attempt}
+        maxRetries=4
+        queue='dedicated,ggs,common'
+        cpus=1
+    }
+
+    withName: 'Munge_LDSC_data' {
+        container='ldsc_1.0.1--py_0.sif'
+        cpus=1
+    }
+
+
+    withName: 'Correlation_LDSC_data' {
+        container="ldsc_1.0.1--py_0.sif"
+        cpus=1
+    }
+
+}
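
The output lists that this patch adds to the README can double as a post-run sanity check. The Python helper below is a hypothetical sketch and is not part of the pipeline. The function name `missing_outputs` is illustrative, and the output names are copied from the README lists in this diff. It also assumes that every output sits directly under the pipeline folder, as the `${PIPELINE_FOLDER}/...` paths suggest. It only checks that each path exists, not what the path contains.

```python
import os

# Hypothetical helper: output names copied from the README lists in this patch.
# Each name may be a file or a directory, so only existence is checked.
CORE_OUTPUTS = [
    "harmonized_GWAS_1_file",
    "harmonized_GWAS_files",
    "init_table",
    "worktable",
    "quadrant",
    "manhattan",
]
# Only expected when the pipeline runs with params.compute_imputation set to true.
IMPUTATION_OUTPUTS = ["imputed_GWAS"]
# Only expected when the pipeline runs with --compute_LDSC_matrix=true.
LDSC_OUTPUTS = ["ldsc_data", "h2_data", "cor_data", "Correlation_matrices"]


def missing_outputs(pipeline_folder, imputation=False, ldsc=False):
    """Return the expected output names that are absent from pipeline_folder.

    Assumes outputs are written directly under pipeline_folder; the content of
    each output is not inspected.
    """
    expected = list(CORE_OUTPUTS)
    if imputation:
        expected += IMPUTATION_OUTPUTS
    if ldsc:
        expected += LDSC_OUTPUTS
    return [
        name
        for name in expected
        if not os.path.exists(os.path.join(pipeline_folder, name))
    ]
```

For example, `missing_outputs("/path/to/jass_suite_pipeline", ldsc=True)` returns an empty list when every default and LDSC output is present. Otherwise, it lists the names that are absent.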