Test data are located in the ${PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ folder.
These are extracts of summary statistics from a trans-ancestry GWAS on blood traits ([Chen et al.](https://www.sciencedirect.com/science/article/pii/S0092867420308229?via%3Dihub)): WBC, white blood cell count; RBC, red blood cell count; PLT, platelet count.
...
@@ -50,11 +52,21 @@ They correspond to the chromosome 21 and 22 for the East asian ancestry.
Once done, you can launch the pipeline as:
```
nextflow run jass_pipeline.nf --ref_panel_WG {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/Ref_panel --gwas_folder {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/test_data/hg38_EAS/ -with-report jass_report.html
```
```
nextflow run jass_pipeline.nf --ref_panel {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/Ref_panel --gwas_folder {ABSOLUTE_PATH_TO_PIPELINE_FOLDER}/test_data/hg_ -with-report jass_report.html
```
You can also specify parameters in the jass_pipeline.nf header if preferred.
If all went well, you have cleaned the three summary statistic files, aligned them on the reference panel, and integrated them into one database. This database was used to perform a multi-trait GWAS on the three traits.
Here are the output files produced by the pipeline:
* ${PIPELINE_FOLDER}/quadrant : quadrant plot of the multi-trait GWAS
* ${PIPELINE_FOLDER}/manhattan : Manhattan plot of the multi-trait GWAS
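
To confirm that the test run produced its outputs, a minimal sketch such as the following can be used. The function name `check_outputs` and the default value of `PIPELINE_FOLDER` are assumptions made for illustration. Only the `quadrant` and `manhattan` folder names come from the list above.

```shell
# Hypothetical helper, not part of the pipeline.
# PIPELINE_FOLDER is assumed to point at the pipeline folder (defaults to the current directory).
PIPELINE_FOLDER="${PIPELINE_FOLDER:-.}"

check_outputs() {
    # Report each expected output folder; return non-zero if any is missing.
    local base="$1" status=0 dir
    for dir in quadrant manhattan; do
        if [ -d "${base}/${dir}" ]; then
            echo "found: ${base}/${dir}"
        else
            echo "missing: ${base}/${dir}"
            status=1
        fi
    done
    return "$status"
}

check_outputs "$PIPELINE_FOLDER" || echo "Some expected outputs are missing."
```

A missing folder most likely means the run stopped early. The report written by `-with-report` is a reasonable place to look first.
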
## Required Input
...
@@ -83,6 +95,9 @@ For this step you will need to install an additional dependency [RAISS](https://
* --ref_panel : A folder containing a reference panel in the .bim, .bed, .fam format for imputation with RAISS
* --ld-folder : A path to a folder containing LD matrices (these can be generated from the reference panel with the raiss package, as described here: http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
Imputed files will be stored in:
* ${PIPELINE_FOLDER}/imputed_GWAS/ : harmonized summary statistics, by chromosome
## Available reference panels
To make reference panels readily available, we use git lfs.
...
@@ -103,15 +118,7 @@ You can download the five panel using the command:
```
git lfs fetch --all
```
or manually through the GitLab interface:


## Imputing your summary statistics using RAISS
If you wish to perform the imputation step using RAISS, you will need to:
1. Switch the parameter params.compute_imputation to true
2. Install the Python package RAISS
3. Follow the RAISS documentation to generate linkage disequilibrium (LD) matrices
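
The steps above can be sketched as a dry run that only prints the command it would launch. This is an illustration under assumptions, not the pipeline's documented invocation. It assumes `params.compute_imputation` can be set from the command line as `--compute_imputation=true` (Nextflow's usual way of overriding `params`), and that `--ref_panel`, `--ld-folder` and `--gwas_folder` are passed as in the option list and test command shown elsewhere in this README. The helper name `build_imputation_cmd` and the placeholder paths are hypothetical.

```shell
# Dry-run sketch: builds and prints the command instead of running it.
# Both paths below are placeholders to replace with your own absolute paths.
PIPELINE_FOLDER="${PIPELINE_FOLDER:-/absolute/path/to/pipeline}"
LD_FOLDER="${LD_FOLDER:-/absolute/path/to/ld_matrices}"

build_imputation_cmd() {
    # $1: pipeline folder, $2: folder holding the LD matrices generated with RAISS
    echo "nextflow run jass_pipeline.nf" \
        "--compute_imputation=true" \
        "--ref_panel $1/Ref_panel" \
        "--ld-folder $2" \
        "--gwas_folder $1/test_data/hg38_EAS/"
}

build_imputation_cmd "$PIPELINE_FOLDER" "$LD_FOLDER"
```

Printing the command first makes it easy to check the paths before starting a long imputation run.
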
## Running the LDSC regression covariance step
### To infer the multi-trait z-score null distribution, heritabilities, and genetic correlations using the LDscore regression
...
@@ -131,7 +138,9 @@ and require a HPC cluster.
If you want to analyze data in hg38 and for all ancestries, you can contact the main developer of this pipeline (hanna.julienne@pasteur.fr) to request the needed input files.
* --group : If you wish to compute joint analyses with the pipeline, a group file with each phenotype group written on a separate line
2. To activate the LDscore option, set this flag to true:
```
--compute_LDSC_matrix=true
```
3. Give the path of the reference panel
Using the LDscore regression on
...
@@ -139,6 +148,13 @@ Using the LDscore regression on
```
--LD_SCORE_folder ${PATH_to_REFERENCE}
```
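
Putting the two LDSC options together, a launch command could look like the dry-run sketch below. It is an assumption-laden illustration. The flags `--compute_LDSC_matrix=true` and `--LD_SCORE_folder` are taken from this section and `--gwas_folder` from the test command. The helper name `build_ldsc_cmd` and the placeholder paths are hypothetical, and any other options your run requires (for example the reference panel) are deliberately left out.

```shell
# Dry-run sketch: prints the command rather than executing it.
# Placeholders: replace with the folders you actually use.
GWAS_FOLDER="${GWAS_FOLDER:-/absolute/path/to/gwas_folder}"
PATH_to_REFERENCE="${PATH_to_REFERENCE:-/absolute/path/to/ldscore_reference}"

build_ldsc_cmd() {
    # $1: folder with the GWAS summary statistics, $2: LD score reference folder
    echo "nextflow run jass_pipeline.nf" \
        "--gwas_folder $1" \
        "--compute_LDSC_matrix=true" \
        "--LD_SCORE_folder $2"
}

build_ldsc_cmd "$GWAS_FOLDER" "$PATH_to_REFERENCE"
```
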
If you run this additional step, the following outputs will be generated:
* ${PIPELINE_FOLDER}/ldsc_data : preprocessed data to run