@@ -14,8 +14,8 @@ The current pipeline integrate the following workflow:
This pipeline enables you to run multi-trait GWAS in a computationally efficient way.
1. Install Nextflow as explained here: https://www.nextflow.io/docs/latest/getstarted.html
2. Install the [jass_preprocessing](https://statistical-genetics.pages.pasteur.fr/jass_preprocessing/#installation) python package or use its docker container (see below).
3. Install the [JASS](https://statistical-genetics.pages.pasteur.fr/jass/install.html) python package or download its docker container.
### Launch pipeline on test data ###
...
@@ -63,13 +63,26 @@ The following Item are necessary to run JASS pipeline on real data
1. --meta_data : A path to a metadata file describing the GWAS (see the example file in ./input_files/test1.csv and the [jass_preprocessing documentation](http://statistical-genetics.pages.pasteur.fr/jass_preprocessing/))
2. --gwas_folder : A path to a folder containing the summary statistics to analyze
3. --ref_panel_WG : A path to a reference panel (whole genome in a single file). See below to download curated reference panels, by ancestry, derived from 1000 Genomes (1000G).
4. --ld-folder : A path to a folder containing LD matrices (these can be generated from the reference panel with the RAISS package, as described here: http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
5. --group : A group file listing each phenotype group on a separate line, needed if you wish to compute joint analyses with the pipeline
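For illustration, here is a minimal Python sketch that assembles the required parameters above into a `nextflow run` command. The script name `jass_pipeline.nf` and all paths except `./input_files/test1.csv` are hypothetical placeholders. Replace them with the entry point and data locations of your own setup.

```python
import shlex

# Hypothetical entry-point name; replace with the pipeline's actual .nf script.
PIPELINE_SCRIPT = "jass_pipeline.nf"


def build_command(params, script=PIPELINE_SCRIPT):
    """Turn a {parameter: value} mapping into a `nextflow run` argument list.

    Keys are written exactly as documented (e.g. "meta_data", "ld-folder"),
    so each one becomes a "--key value" pair on the command line.
    """
    cmd = ["nextflow", "run", script]
    for key, value in params.items():
        cmd += [f"--{key}", str(value)]
    return cmd


# Placeholder paths for illustration only.
required = {
    "meta_data": "input_files/test1.csv",
    "gwas_folder": "data/summary_stats/",
    "ref_panel_WG": "data/ref_panel_EUR.csv",
    "ld-folder": "data/ld_matrices/",
}

cmd = build_command(required)
print(shlex.join(cmd))
# To launch it (requires nextflow on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string lets you pass it directly to `subprocess.run` without any shell quoting issues.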
## Optional parameters
* --output_folder : A path to a folder where pipeline results (inittable, worktable...) are written. By default, results are published in the workflow directory.
### To launch multi-trait GWAS at the end of the pipeline
You can use this pipeline to launch a batch of multi-trait GWAS as its final step:
* --group : A group file listing each phenotype group on a separate line
Alternatively, run the **jass create-project-data** command line on the stored inittable file (which contains all your harmonized summary statistics).
See the JASS documentation for its usage (https://statistical-genetics.pages.pasteur.fr/jass/generating_joint_analysis.html).
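The description of the `--group` file above leaves its exact format open. The sketch below assumes that phenotype IDs on each line are separated by whitespace, which you can change through the hypothetical `sep` argument. It also checks the groups against the phenotype IDs you expect from your metadata file. Both helper names and the trait IDs are illustrative and are not part of JASS.

```python
import tempfile
from pathlib import Path


def read_groups(path, sep=None):
    """Parse a group file with one phenotype group per line.

    ASSUMPTION: IDs within a line are whitespace-separated (sep=None);
    pass e.g. sep=";" if your file uses another delimiter.
    Blank lines are skipped.
    """
    groups = []
    for line in Path(path).read_text().splitlines():
        ids = [p.strip() for p in line.split(sep) if p.strip()]
        if ids:
            groups.append(ids)
    return groups


def unknown_phenotypes(groups, known_ids):
    """Return the sorted phenotype IDs used in `groups` but absent from `known_ids`."""
    return sorted({p for group in groups for p in group} - set(known_ids))


# Hypothetical trait IDs, written to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("TRAIT_A TRAIT_B\n\nTRAIT_A TRAIT_C TRAIT_D\n")
groups = read_groups(fh.name)
Path(fh.name).unlink()

print(groups)  # [['TRAIT_A', 'TRAIT_B'], ['TRAIT_A', 'TRAIT_C', 'TRAIT_D']]
print(unknown_phenotypes(groups, ["TRAIT_A", "TRAIT_B", "TRAIT_C"]))  # ['TRAIT_D']
```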
### To launch imputation based on summary statistics
For this step, you will need to install an additional dependency, the [RAISS](https://gitlab.pasteur.fr/statistical-genetics/raiss) Python package.
* --ref_panel : A folder containing a reference panel in PLINK binary format (.bed, .bim, .fam) for imputation with RAISS
* --ld-folder : A path to a folder containing LD matrices (these can be generated from the reference panel with the RAISS package, as described here: http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
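Before launching imputation, it can help to check that the `--ref_panel` folder holds complete PLINK filesets. The following sketch groups files by stem and reports which stems are missing a `.bed`, `.bim` or `.fam` companion. The helper and file names are hypothetical. The sketch does not validate file contents or whatever naming scheme RAISS expects.

```python
import tempfile
from pathlib import Path

PLINK_EXTS = {".bed", ".bim", ".fam"}


def check_plink_filesets(folder):
    """Report complete and incomplete PLINK filesets in `folder`.

    Files are grouped by stem (name without the last extension).
    Returns (complete, incomplete): a sorted list of stems that have all
    three files, and a dict mapping other stems to their missing extensions.
    """
    found = {}
    for f in Path(folder).iterdir():
        if f.suffix in PLINK_EXTS:
            found.setdefault(f.stem, set()).add(f.suffix)
    complete = sorted(stem for stem, exts in found.items() if exts == PLINK_EXTS)
    incomplete = {
        stem: sorted(PLINK_EXTS - exts)
        for stem, exts in sorted(found.items())
        if exts != PLINK_EXTS
    }
    return complete, incomplete


# Demo on a temporary folder with hypothetical file names.
with tempfile.TemporaryDirectory() as d:
    for name in ["panel.chr1.bed", "panel.chr1.bim", "panel.chr1.fam",
                 "panel.chr2.bed", "panel.chr2.bim"]:
        (Path(d) / name).touch()
    complete, incomplete = check_plink_filesets(d)

print(complete)    # ['panel.chr1']
print(incomplete)  # {'panel.chr2': ['.fam']}
```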
## Available reference panels
To make the reference panels readily available, we use Git LFS.
...
@@ -101,26 +114,47 @@ If you wish to perform imputation step using RAISS you will need to:
3. Follow the RAISS documentation to generate linkage disequilibrium (LD) matrices
## Running the LDSC regression covariance step
### To infer the null distribution of multi-trait z-scores, heritabilities, and genetic correlations using LD score regression
For accuracy, we recommend using LD score regression to infer the multivariate distribution of z-scores under the null hypothesis.
The alternative, used by default, estimates the null distribution by computing the covariance of z-scores that carry low genetic signal.
Hence, this step is not strictly required.
When computed for a large number of traits, this step can be computationally intensive and may require an HPC cluster.
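To make the default approach concrete, here is a simplified pure-Python sketch that estimates the null covariance of z-scores from low-signal SNPs. The selection rule (every |z| below a hypothetical threshold of 1.0) and the function name are assumptions for illustration. The rule JASS actually applies may differ, and this sketch is not the LD score regression approach.

```python
from statistics import mean


def null_covariance(z_rows, threshold=1.0):
    """Estimate the trait-by-trait covariance of z-scores under the null.

    z_rows: one list per SNP, holding one z-score per trait (same trait order).
    ASSUMPTION: a SNP is "low signal" when |z| < threshold for every trait;
    only those SNPs enter the sample covariance (n - 1 denominator).
    """
    low = [row for row in z_rows if all(abs(z) < threshold for z in row)]
    n = len(low)
    if n < 2:
        raise ValueError("need at least two low-signal SNPs")
    k = len(low[0])
    means = [mean(row[j] for row in low) for j in range(k)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in low) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]


# Toy z-scores for 2 traits; the last SNP carries strong signal and is excluded.
z = [[0.5, 0.4], [-0.3, -0.2], [0.1, 0.2], [5.0, 4.0]]
cov = null_covariance(z)
print([[round(x, 4) for x in row] for row in cov])  # [[0.16, 0.12], [0.12, 0.0933]]
```

Restricting the estimate to low-signal SNPs keeps true associations from inflating it. The off-diagonal terms then mainly reflect correlation induced by samples shared between studies.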
1. For hg37 and the EUR ancestry, download and extract the LD score reference panel into the pipeline folder: