.. RAISS documentation master file, created by
   sphinx-quickstart on Mon Aug 20 16:17:59 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to the Robust and Accurate Imputation from Summary Statistics (RAISS) documentation!
============================================================================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

What is RAISS?
===================

RAISS is a Python package to impute missing SNP summary statistics from
neighboring SNPs in linkage disequilibrium (LD).

The statistical model used for the imputation is described in :cite:`Pasaniuc2014`.

The imputation execution time is optimized by precomputing the linkage disequilibrium between SNPs.
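
As an illustration of this kind of model, the Z-score of an untyped SNP can be
estimated as its conditional expectation given the Z-scores of typed SNPs and
the LD correlation between them. The sketch below is a minimal NumPy version
under that assumption, not the RAISS implementation; the function name
``impute_z`` and the ridge parameter ``lamb`` are hypothetical.

.. code-block:: python

  import numpy as np

  def impute_z(z_typed, ld_typed, ld_cross, lamb=0.01):
      """Impute Z-scores of untyped SNPs from typed neighbors (illustrative).

      z_typed  : (t,) Z-scores of the typed SNPs
      ld_typed : (t, t) LD correlation among the typed SNPs
      ld_cross : (u, t) LD correlation between untyped and typed SNPs
      lamb     : ridge term added to the diagonal (hypothetical default)
      """
      n_typed = ld_typed.shape[0]
      regularized = ld_typed + lamb * np.eye(n_typed)
      # weights = ld_cross @ inv(regularized), obtained by solving a linear
      # system instead of forming the inverse explicitly
      weights = np.linalg.solve(regularized, ld_cross.T).T
      z_imputed = weights @ z_typed
      # Conditional variance of each untyped Z-score given the typed ones
      # (exact for lamb = 0, an approximation otherwise)
      var = 1.0 - np.einsum("ij,ij->i", weights, ld_cross)
      return z_imputed, var

  # Toy example: two typed SNPs and one untyped SNP
  ld_typed = np.array([[1.0, 0.2], [0.2, 1.0]])
  ld_cross = np.array([[0.8, 0.3]])
  z_imputed, var = impute_z(np.array([-1.1, 0.4]), ld_typed, ld_cross)

A variance close to 0 means the untyped SNP is well tagged by its neighbors,
whereas a variance close to 1 means the typed SNPs carry little information
about it.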


Dependencies
============
RAISS requires plink version 1.9: `<https://www.cog-genomics.org/plink2>`_

Installation
============

.. code-block:: shell

  pip3 install git+https://gitlab.pasteur.fr/statistical-genetics/raiss.git

Precomputation of LD-correlation
=================================

The imputation is based on the linkage disequilibrium (LD)
between SNPs.

To save computation time, the LD is computed before imputation and saved in a tabular format.
To limit the number of SNP pairs, the LD is computed only between pairs of
SNPs within approximately LD-independent regions. For European ancestry, you can use
the regions defined by :cite:`Berisa2015`, which are provided in the package data folder.
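
To illustrate how restricting the computation to regions limits the number of
SNP pairs, the sketch below assigns SNPs to regions by position and computes
one correlation matrix per region. It is not how RAISS computes LD (RAISS
relies on plink); the function name ``ld_by_region``, the genotype matrix
layout and the region boundaries are assumptions made for the example.

.. code-block:: python

  import numpy as np

  def ld_by_region(genotypes, positions, region_starts):
      """Return {region index: correlation matrix} for SNPs grouped by region.

      genotypes     : (n_individuals, n_snps) allele counts
      positions     : (n_snps,) base-pair positions of the SNPs
      region_starts : sorted start positions of the regions
      """
      # Index of the region containing each SNP
      # (-1 for a SNP located before the first region start)
      region_of_snp = np.searchsorted(region_starts, positions, side="right") - 1
      blocks = {}
      for region in np.unique(region_of_snp):
          in_region = region_of_snp == region
          # Pairwise Pearson correlation between the SNPs (columns) of the region
          corr = np.corrcoef(genotypes[:, in_region], rowvar=False)
          blocks[int(region)] = np.atleast_2d(corr)
      return blocks

  # Toy data: 50 individuals, 5 SNPs spread over 2 regions
  rng = np.random.default_rng(0)
  genotypes = rng.integers(0, 3, size=(50, 5)).astype(float)
  positions = np.array([10, 20, 110, 120, 130])
  blocks = ld_by_region(genotypes, positions, region_starts=np.array([0, 100]))

With five SNPs, a genome-wide computation would cover 10 pairs, while the two
blocks above cover only 1 + 3 = 4 pairs.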

To compute the LD, you need to specify a reference panel split by chromosome
(bed, fam and bim formats of plink, see `PLINK formats <https://www.cog-genomics.org/plink2/formats>`_):


.. code-block:: python

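  import raiss

  # Path to the region file
  region_berisa = "/mnt/atlas/PCMA/WKD_Hanna/cleaned_jass_input/Region_LD.csv"
  # Path to the reference panel
  ref_folder = "/mnt/atlas/PCMA/1._DATA/ImpG_refpanel"
  # Path to the folder where the results are stored
  ld_folder_out = "/mnt/atlas/PCMA/WKD_Hanna/impute_for_jass/berisa_ld_block"
  # The argument order (region file, reference panel, output folder) is
  # assumed here; check the function's docstring before running
  raiss.LD.generate_genome_matrices(region_berisa, ref_folder, ld_folder_out)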

Input format
============

GWAS results files must be provided in a tabular (tab-separated) format, one file per chromosome,
all in the same folder, with the following columns and the same header:

+----------+-------+------+-----+--------+
| rsID     | pos   | A0   | A1  |  Z     |
+==========+=======+======+=====+========+
| rs6548219| 30762 | A    | G   | -1.133 |
+----------+-------+------+-----+--------+

This format can be obtained with the `JASS PreProcessing package <https://gitlab.pasteur.fr/statistical-genetics/JASS_Pre-processing>`_.
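
A quick sanity check of an input file could look like the sketch below, which
parses a tab-separated table with the standard ``csv`` module and verifies the
expected header. The helper name ``read_gwas_chromosome`` is hypothetical, and
the inline text stands in for a real per-chromosome file.

.. code-block:: python

  import csv
  import io

  EXPECTED_COLUMNS = ["rsID", "pos", "A0", "A1", "Z"]

  def read_gwas_chromosome(handle):
      """Read one tab-separated GWAS file and check its header."""
      reader = csv.DictReader(handle, delimiter="\t")
      if reader.fieldnames != EXPECTED_COLUMNS:
          raise ValueError(f"unexpected header: {reader.fieldnames}")
      rows = []
      for row in reader:
          # Convert the numeric columns; the other columns stay as text
          row["pos"] = int(row["pos"])
          row["Z"] = float(row["Z"])
          rows.append(row)
      return rows

  example = io.StringIO("rsID\tpos\tA0\tA1\tZ\nrs6548219\t30762\tA\tG\t-1.133\n")
  gwas = read_gwas_chromosome(example)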


Launching imputation on one chromosome
======================================

RAISS has a command-line interface (see Command Line Usage below).

If you have access to a cluster, an efficient way to use RAISS is to launch
the imputation of each chromosome on a separate cluster node. The script
`launch_imputation_all_gwas.sh <https://gitlab.pasteur.fr/statistical-genetics/raiss/blob/master/launch_imputation_all_gwas.sh>`_
contains an example of RAISS usage with a SLURM scheduler.

Output
======

The RAISS package outputs imputed GWAS files in a tabular format:

.. TODO suppress complementary columns

+------------+---+--+----------------+-----+-----+----------------+------------------+---------+---------+
|            |A0 |A1| Nsnp_to_impute |Var  |Z    |condition_number|correct_inversion |ld_score |   pos   |
+============+===+==+================+=====+=====+================+==================+=========+=========+
| rs11584349 |C  | T|        18      | 0.85|-0.28|     116.9      |      False       | 1.34    | 1000156 |
+------------+---+--+----------------+-----+-----+----------------+------------------+---------+---------+

.. TODO keep only useful columns
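
Until the output is trimmed, the extra columns can be dropped after the fact.
The sketch below reduces an output table to the columns of the input format. It
assumes that the output is tab-separated and that the SNP identifier sits in
the first, unnamed column, as in the table above; the helper name
``keep_useful_columns`` and the choice of columns to keep are hypothetical.

.. code-block:: python

  import csv
  import io

  KEPT_COLUMNS = ["rsID", "pos", "A0", "A1", "Z"]

  def keep_useful_columns(handle):
      """Return the rows of an output table restricted to KEPT_COLUMNS."""
      reader = csv.DictReader(handle, delimiter="\t")
      rows = []
      for row in reader:
          # The SNP identifier is stored under the empty header of column one
          row["rsID"] = row.pop("")
          rows.append({name: row[name] for name in KEPT_COLUMNS})
      return rows

  output_text = (
      "\tA0\tA1\tNsnp_to_impute\tVar\tZ\tcondition_number"
      "\tcorrect_inversion\tld_score\tpos\n"
      "rs11584349\tC\tT\t18\t0.85\t-0.28\t116.9\tFalse\t1.34\t1000156\n"
  )
  trimmed = keep_useful_columns(io.StringIO(output_text))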

Command Line Usage
==================

.. argparse::
  :ref: impute_jass.__main__.add_chromosome_imputation_argument

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

.. automodule:: impute_jass
   :members:



.. autosummary::
   :toctree: _autosummary


.. bibliography:: reference.bib