diff --git a/README.md b/README.md
index 1e91c1d2e0542bc8250d3c3eb660a75f62b04637..1f34a89ca579ba422864973165c6e17c29d3a42b 100644
--- a/README.md
+++ b/README.md
@@ -14,8 +14,8 @@ The current pipeline integrate the following workflow:
 This pipeline enables you to run multi-trait GWAS in a computationaly efficient way
 
 1. Install nextflow as explain here : https://www.nextflow.io/docs/latest/getstarted.html
-2. Install the jass_preprocessing python package or use its docker container (see below).
-3. Install the JASS python package or download its docker container.
+2. Install the [jass_preprocessing](https://statistical-genetics.pages.pasteur.fr/jass_preprocessing/#installation) python package or use its docker container (see below).
+3. Install the [JASS](https://statistical-genetics.pages.pasteur.fr/jass/install.html) python package or download its docker container.
 
 ### Launch pipeline on test data ###
 
@@ -63,13 +63,26 @@ The following Item are necessary to run JASS pipeline on real data
 1. --meta_data : A path toward a meta-data file describing GWAS (see example file in ./input_files/test1.csv and [jass_preprocessing documentation](http://statistical-genetics.pages.pasteur.fr/jass_preprocessing/))
 2. --gwas_folder : A path toward a folder containing the summary statistics to analyze
 3. --ref_panel_WG : a path toward a reference panel (all genome as 1 file). See below to download curated reference panels by ancestries derived from 1000G
-4. --ld-folder : A path toward a folder containing LD matrices (that can be generated from the reference panel with the raiss package as described here : http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
-5. --group If you wish to compute joint analyses with the pipeline, a group file with the each phenotype group written on a separated line
 
 ## Optional parameters
 * --output_folder : A path toward a folder to write pipeline results (inittable, worktable...).
 by default results will be publish in the workflow directory.
+
+### To launch multi-trait GWAS at the end of the pipeline
+
+You can use this pipeline to launch a batch of multi-trait GWAS as its final step:
+* --group : If you wish to compute joint analyses with the pipeline, a group file with each phenotype group written on a separate line
+
+Alternatively, run the **jass create-project-data** command on the stored inittable file (which contains all your harmonized summary statistics).
+See the JASS documentation for its usage (https://statistical-genetics.pages.pasteur.fr/jass/generating_joint_analysis.html).
+
+### To launch imputation based on summary statistics
+
+For this step you will need to install an additional dependency, the [RAISS](https://gitlab.pasteur.fr/statistical-genetics/raiss) python package.
+
 * --ref_panel : A folder containing a Reference Panel in the .bim, .bed, .fam format for imputation with RAISS
+* --ld-folder : A path toward a folder containing LD matrices (that can be generated from the reference panel with the raiss package as described here : http://statistical-genetics.pages.pasteur.fr/raiss/#precomputation-of-ld-correlation)
+
 
 ## Available reference panels
 To make reference panel readily available, we use git lfs.
@@ -101,26 +114,47 @@ If you wish to perform imputation step using RAISS you will need to:
 3. Follow RAISS documentation to generate Linkage desiquilibrium matrices
 
 ## Running the LDSC regression covariance step
+### To infer multi-trait z-scores null distribution, heritabilities, and genetic correlations using the LDscore regression
 
-Download and extract reference panel for LD-score in the pipeline folder:
+For accuracy, we recommend using the LDscore regression to infer the multivariate distribution of Z-scores under the null.
+The alternative, implemented by default, is to estimate the null distribution by computing the covariance of Z-scores with low genetic signal.
+Hence this step is not strictly required.
+
+When computed for a large number of traits, this step can be computationally intensive,
+and may require an HPC cluster.
+
+1. For hg37 and the EUR ancestry, download and extract the LD-score reference panel in the pipeline folder:
 ```
 wget https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2
 tar -jxvf eur_w_ld_chr.tar.bz2
 ```
+If you want to analyze data in hg38 and for all ancestries, you can contact the main developer of this pipeline (hanna.julienne@pasteur.fr)
+to request the needed input files.
+2. To activate the LDscore option turn this flag to true:
+ * --group If you wish to compute joint analyses with the pipeline, a group file with each phenotype group written on a separate line
+
+3. Give the path of the reference panel
+Using the LDscore regression on
+```
+ --LD_SCORE_folder ${PATH_to_REFERENCE}
+```
 
 
 ## Usage Example on HPC Cluster
 
 If you are working with a HPC server (Slurm job scheduler), you can adapt the nextflow_sbatch.config file and launch the pipeline with a command like:
 
-sbatch --mem-per-cpu 32G -p common,dedicated,ggs --qos=long --wrap "module load java/13.0.2;module load singularity/3.8.3;module load graphviz/2.42.3;./nextflow run imputation_only.nf -with-report imput_report.html -with-timeline imput_timeline.html -c nextflow_sbatch.config -qs 300"
+sbatch --mem-per-cpu 32G -p common,dedicated,ggs --qos=long --wrap "module load java/13.0.2;module load singularity/3.8.3;module load graphviz/2.42.3;./nextflow run imputation_only.nf -with-report imput_report.html -with-timeline imput_timeline.html -c nextflow_slurm.config -qs 300"
 
 ## Using docker container
 
-Stable versions of JASS tools are available as docker container:
+Stable versions of JASS tools and dependencies are available as docker containers:
+- plink:
+https://quay.io/repository/biocontainers/plink?tab=tags
+- LDscore:
+https://quay.io/repository/biocontainers/ldsc?tab=tags
 - JASS preprocessing:
 https://quay.io/repository/biocontainers/jass_preprocessing?tab=tags
-
 - JASS containers:
 https://quay.io/repository/biocontainers/jass?tab=tags
 - RAISS containers:
diff --git a/nextflow_local.config b/nextflow_local.config
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/nextflow_test.config b/nextflow_slurm.config
similarity index 100%
rename from nextflow_test.config
rename to nextflow_slurm.config