Commit be3e61b2 authored by Hanna JULIENNE

improved doc
Clone the current repository locally:
```
git clone https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline.git
```
<!--
Download the test data through the interface, using wget or git lfs,
and place it in the ./test_data/hg38_EAS folder.

Option with wget:
```
cd ${PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/
wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/RBC_EAS_chr22.tsv?inline=false && mv RBC_EAS_chr22.tsv\?inline\=false RBC_EAS_chr22.tsv
wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/PLT_EAS_chr22.tsv?inline=false && mv PLT_EAS_chr22.tsv\?inline\=false PLT_EAS_chr22.tsv
wget https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/raw/pipeline_ancestry/test_data/hg38_EAS/WBC_EAS_chr22.tsv?inline=false && mv WBC_EAS_chr22.tsv\?inline\=false WBC_EAS_chr22.tsv
```

Option with git-lfs (requires installing git lfs):
```
git lfs pull --include PLT_EAS_chr22.tsv
git lfs pull --include RBC_EAS_chr22.tsv
git lfs pull --include WBC_EAS_chr22.tsv
```
-->
Test data are located in the ${PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ folder.
These are extracts of summary statistics from a trans-ancestry GWAS on blood traits ([Chen et al](https://www.sciencedirect.com/science/article/pii/S0092867420308229?via%3Dihub)): WBC, white blood cell count; RBC, red blood cell count; PLT, platelet count.
They correspond to chromosomes 21 and 22 for the East Asian ancestry.

Once done, you can launch the pipeline as:
```
nextflow run jass_pipeline.nf --ref_panel_WG {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/Ref_panel --gwas_folder {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ -with-report jass_report.html
```
You can specify parameters in the jass_pipeline.nf header if preferred.

If all went well, you have cleaned the three summary statistics files, aligned them on the reference panel, and integrated them in one database. This database was used to perform a multi-trait GWAS on the three traits.

Here are the output files produced by the pipeline:
* ${PIPELINE_FOLDER}/harmonized_GWAS_1_file/ : genome-wide harmonized summary statistics
* ${PIPELINE_FOLDER}/harmonized_GWAS_files/ : harmonized summary statistics by chromosome
* ${PIPELINE_FOLDER}/init_table : database containing all summary statistics, used to perform the multi-trait GWAS
* ${PIPELINE_FOLDER}/worktable : multi-trait GWAS results file
* ${PIPELINE_FOLDER}/quadrant : quadrant plot of the multi-trait GWAS
* ${PIPELINE_FOLDER}/manhattan : Manhattan plot of the multi-trait GWAS
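After a run, you can quickly verify that each expected output folder listed above was produced. This is a small convenience sketch, not part of the pipeline itself; the folder names come from the list above.

```shell
# check_outputs: report which of the expected JASS pipeline output
# folders exist under the directory given as the first argument
# (defaults to the current directory).
check_outputs() {
    base="${1:-.}"
    for d in harmonized_GWAS_1_file harmonized_GWAS_files init_table worktable quadrant manhattan; do
        if [ -d "${base}/${d}" ]; then
            echo "OK      ${d}"
        else
            echo "MISSING ${d}"
        fi
    done
}
```

For example, `check_outputs ${PIPELINE_FOLDER}` prints one OK/MISSING line per expected folder.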
## Required Input
For this step you will need to install an additional dependency, [RAISS](https://statistical-genetics.pages.pasteur.fr/raiss/).
* --ref_panel : A folder containing a reference panel in .bim, .bed, .fam format for imputation with RAISS
* --ld-folder : A path toward a folder containing LD matrices (these can be generated from the reference panel with the raiss package, as described here: http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
Imputed files will be stored in:

* ${PIPELINE_FOLDER}/imputed_GWAS/ : imputed summary statistics by chromosome
## Available reference panels

To make reference panels readily available, we use git lfs.
You can download the five panels using the command:
```
git lfs fetch --all
```
or manually through the GitLab interface:

![download test files](./doc/download_test_files.png)
## Imputing your summary statistics using RAISS

If you wish to perform the imputation step using RAISS, you will need to:
1. Switch the parameter params.compute_imputation to true
2. Install the python package RAISS
3. Follow the RAISS documentation to generate linkage disequilibrium matrices
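Instead of editing the pipeline header, the same switch can be flipped on the command line, since Nextflow maps any `--name value` pair onto `params.name`. A sketch with placeholder paths (the paths are not from this repository's docs), built as a string so the flags are easy to audit:

```shell
# Launch command with imputation enabled; --compute_imputation,
# --ref_panel and --ld_folder match parameters defined in
# jass_pipeline.nf. All paths below are placeholders.
launch_cmd="nextflow run jass_pipeline.nf \
    --compute_imputation true \
    --ref_panel /path/to/Ref_panel_by_chr \
    --ld_folder '/path/to/LD_matrices/*.ld'"
printf '%s\n' "$launch_cmd"
```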
## Running the LDSC regression covariance step

### To infer the multi-trait z-score null distribution, heritabilities, and genetic correlations using LDscore regression
and require an HPC cluster.
If you want to analyze data in hg38 and for all ancestries, you can contact the main developer of this pipeline (hanna.julienne@pasteur.fr)
to request the needed input files.

2. To activate the LDscore option, set this flag to true:
```
--compute_LDSC_matrix=true
```
3. Give the path of the reference panel used for the LDscore regression:
```
--LD_SCORE_folder ${PATH_to_REFERENCE}
```
If you run this additional step, the following outputs will be generated:

* ${PIPELINE_FOLDER}/ldsc_data : preprocessed data used to run the LDscore regression
* ${PIPELINE_FOLDER}/h2_data : heritability estimation logs
* ${PIPELINE_FOLDER}/cor_data : covariance estimation logs
* ${PIPELINE_FOLDER}/Correlation_matrices : parsed covariance matrices

The H0 matrix will be integrated into the inittable file by the pipeline.
## Usage Example on HPC Cluster

If you are working with an HPC server (Slurm job scheduler), you can adapt the nextflow_sbatch.config file and launch the pipeline with a command like:
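A minimal sketch of such a launch, assuming submission via Slurm's `sbatch --wrap` and using Nextflow's `-c` flag to select the adapted config file; all paths are placeholders, not commands prescribed by this repository:

```shell
# Sketch of a Slurm launch: the nextflow command is wrapped in an
# sbatch job. Adapt paths and the config file to your cluster.
submit_cmd='sbatch --wrap "nextflow run jass_pipeline.nf -c nextflow_sbatch.config --gwas_folder /path/to/my_gwas/ -with-report jass_report.html"'
printf '%s\n' "$submit_cmd"
```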
Relevant parameter and channel definitions in jass_pipeline.nf:
```
params.compute_imputation=false

/* Path of input data */
params.meta_data = "${projectDir}"+"/input_files/Data_test_EAS.csv" // file describing gwas summary statistic format
params.gwas_folder = "${projectDir}"+'/test_data/hg38_EAS/'
params.region = "${projectDir}"+"/input_files/All_Regions_ALL_ensemble_1000G_hg38_EAS.bed"
params.ref_panel_WG = "${projectDir}"+"/Ref_Panel/1000G_EAS_0_01_chr22_21.csv" //"${projectDir}/Ref_Panel/1000G_EAS_0_01_chr22.csv"
params.ancestry="EAS"

/* see https://statistical-genetics.pages.pasteur.fr/raiss/#optimizing-raiss-parameters */
params.r2threshold = 0.6
params.eigenthreshold = 0.05
params.minimumld = 5
params.ld_folder="/pasteur/zeus/projets/p02/GGS_WKD/DATA_1000G/Panels/Matrix_LD_RAISS/EAS/*.ld"
params.ref_panel = '/pasteur/zeus/projets/p02/GGS_JASS/jass_analysis_pipeline/Ref_panel_by_chr/'

/* Project group */
params.group = "${projectDir}/input_files/group.txt"
group = file(params.group)

Region_channel2 = Channel.fromPath(params.region)
chr_channel = Channel.from(1..22)
ref_chr_channel=Channel.fromPath(params.ref_panel+"/ALL_ensemble_1000G_hg38_EAS_chr*.bim")
ld_channel=Channel.fromPath(params.ld_folder)
extract_sample_size_script_channel = Channel.fromPath("${projectDir}/bin/extract_sample_size.py")
generate_trait_pairs_channel = Channel.fromPath("${projectDir}/bin/generate_trait_pairs.py")
```
Nextflow configuration for cluster execution:
```
dag {
    enabled = true
    overwrite = true
    file = 'dag.dot'
}

report {
    enabled = true
    file = 'nextflow_logs/report.html'
}

trace {
    enabled = true
    overwrite = true
    file = 'nextflow_logs/trace.txt'
}

singularity {
    enabled = true
    autoMounts = true
    runOptions = '--home $HOME:/home/$USER -B /pasteur/zeus/projets/p02/'
}

executor {
    submitRateLimit = '10 sec'
    maxErrors = 20
    maxRetries = 4
    errorStrategy = 'terminate'
    maxForks = 400
    queueSize = 500
}

process {
    withName: 'Compute_MAF' {
        container = 'plink_1.90b5--heea4ae3_0.sif'
        time = '1h'
        queue = 'dedicated,common,ggs'
        cpus = 1
    }
    withName: 'Impute_GWAS' {
        memory = { 8.GB * task.attempt }
        time = { 72.h * task.attempt }
        maxRetries = 4
        queue = 'dedicated,ggs,common'
        cpus = 1
    }
    withName: 'Munge_LDSC_data' {
        container = 'ldsc_1.0.1--py_0.sif'
        cpus = 1
    }
    withName: 'Correlation_LDSC_data' {
        container = "ldsc_1.0.1--py_0.sif"
        cpus = 1
    }
}
```