Statistical-Genetics / RAISS
Commit f6f688c3 authored Nov 02, 2021 by Hanna JULIENNE
add code Snippet in documentation (on perf testing)
parent 13b5783e
Pipeline #68634 passed with stages in 1 minute and 33 seconds
Changes
3
Pipelines
1
doc/source/index.rst
...
...
@@ -97,6 +97,61 @@ The raiss package outputs imputed GWAS files in the tabular format:
| rs111876722 | 201922 | C | T | 0.297 | 0.16 | 5.412 |
+-------------+----------+------------+------------+---------+-------+----------+
Optimizing RAISS parameters for your data
=========================================
The RAISS package contains a function (raiss.imputation_R2.grid_search)
to assess its performance on your data and fine-tune the RAISS parameters.
Test procedure:
1. Mask N SNPs on a chromosome
2. Impute the masked file
3. Compute the correlation between genotyped and imputed Z-scores
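The masking step of the procedure above can be sketched as follows. This is only an illustration on toy data, not the RAISS implementation: the helper `mask_snps`, the SNP ids, and the random Z-scores are all hypothetical.

```python
import numpy as np
import pandas as pd


def mask_snps(zscores: pd.Series, n_to_mask: int, seed: int = 0):
    """Split Z-scores into a kept part and a held-out (masked) part.

    Hypothetical helper for illustration; not part of the raiss package.
    """
    rng = np.random.default_rng(seed)
    # Draw n_to_mask distinct SNP ids to hide from the imputation input.
    masked_ids = rng.choice(zscores.index.to_numpy(), size=n_to_mask, replace=False)
    kept = zscores.drop(index=masked_ids)
    held_out = zscores.loc[masked_ids]
    return kept, held_out


# Toy data: 20 made-up SNP ids with random Z-scores.
toy_rng = np.random.default_rng(42)
z = pd.Series(toy_rng.normal(size=20), index=[f"rs{i}" for i in range(20)])
kept, held_out = mask_snps(z, n_to_mask=5)
print(len(kept), len(held_out))
```

The held-out Z-scores play the role of the "genotyped" truth that the imputed values are later compared against.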
To perform this test, follow this procedure:
1. Create a folder to store the masked z-score files
2. Create a folder to store the z-score files imputed with different parameters
3. Adapt the following code snippet to apply the function to your data:
.. code-block::
   :linenos:

   import raiss.imputation_R2

   # Replace every ${...} placeholder with a quoted path on your system.
   gwas_tag = "GWAS_TAG"  # tag identifying your GWAS
   perf_results = raiss.imputation_R2.grid_search(
       ${path_z-scores_folder},
       ${path_to_masked_z-scores_folder},
       ${path_to_imputed_z-scores_folder},
       ${path_to_reference_panel_folder},
       ${path_to_LD_matrices_folder},
       gwas_tag, chrom="chr22",
       eigen_ratio_grid=[1, 0.5, 0.1, 0.01],  # Enter the values you want to test in this list
       window_size=500000, buffer_size=125000, l2_regularization=0.1,
       R2_threshold=0.6)

   fout = "./Perf_" + gwas_tag + ".csv"
   print(perf_results)
   perf_results.to_csv(fout, sep="\t")
The file Perf_GWAS_TAG.csv resembles the following output:
+----+----+--------------------+-----------------+
|    |cor |mean_absolute_error |fraction_imputed |
+====+====+====================+=================+
|1.0 |0.95| 0.243              | 1.0             |
+----+----+--------------------+-----------------+
| 0.5|0.94| 0.246              | 0.95            |
+----+----+--------------------+-----------------+
The row names correspond to the eigen_ratio values that were tested.
The second column is the correlation between imputed and genotyped Z-scores.
The third column is the mean L1-error (mean absolute error) between imputed and genotyped Z-scores.
The fourth column is the fraction of the 5000 masked SNPs that were imputed.
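To make the three columns concrete, here is a minimal sketch of how such metrics could be computed from two pandas Series. It is an assumption about the definitions, not the code used by grid_search; the function `imputation_metrics` and the toy values are hypothetical.

```python
import pandas as pd


def imputation_metrics(genotyped: pd.Series, imputed: pd.Series) -> dict:
    """Compare held-out genotyped Z-scores with imputed ones (illustrative only)."""
    # Only SNPs that were actually imputed can be compared.
    common = genotyped.index.intersection(imputed.dropna().index)
    g, i = genotyped.loc[common], imputed.loc[common]
    return {
        "cor": g.corr(i),  # Pearson correlation
        "mean_absolute_error": (g - i).abs().mean(),  # mean L1-error
        "fraction_imputed": len(common) / len(genotyped),
    }


# Toy example: four masked SNPs, three of which were imputed.
genotyped = pd.Series([1.0, -2.0, 0.5, 3.0], index=["rs1", "rs2", "rs3", "rs4"])
imputed = pd.Series([0.8, -1.5, 0.9], index=["rs1", "rs2", "rs3"])
metrics = imputation_metrics(genotyped, imputed)
print(metrics)
```

In this toy case three of the four masked SNPs are imputed, so the fraction is 0.75 and the error is averaged over those three SNPs only.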
The optimal eigen_ratio can vary depending on the density of your reference panel and of your input data.
Hence, we recommend running a grid search to pick the best parameter for your data.
However, empirically, we have never observed a difference in performance from one chromosome to another.
We suggest testing on chr22 for computational efficiency.
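Once the grid search has run, the best parameter can be read off the results table. The sketch below rebuilds the example table shown earlier and picks the eigen_ratio with the highest correlation. Using correlation alone as the selection criterion is an assumption made here for illustration; you may prefer to also weigh fraction_imputed.

```python
import pandas as pd

# Example table from this section; in practice it could be loaded with
# pd.read_csv(fout, sep="\t", index_col=0).
perf = pd.DataFrame(
    {
        "cor": [0.95, 0.94],
        "mean_absolute_error": [0.243, 0.246],
        "fraction_imputed": [1.0, 0.95],
    },
    index=[1.0, 0.5],  # tested eigen_ratio values
)

# Assumed criterion: keep the eigen_ratio with the highest correlation.
best_eigen_ratio = perf["cor"].idxmax()
print(best_eigen_ratio)
```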
Command Line Usage
==================
...
...
raiss/imputation_R2.py
...
...
@@ -24,14 +24,13 @@ def generated_test_data(zscore, N_to_mask=5000, condition=None, stratifying_vec
    """
    try:
        if isinstance(condition, pd.Series) == True:
            print("Condition vector")
            masked = np.random.choice(zscore.index[condition], N_to_mask, replace=False)
        else:
            print("Stratifying vector?")
            inter_id = zscore.index.intersection(stratifying_vector.index).drop_duplicates(keep='first')
            print(inter_id[1:10])
            stratifying_vector = stratifying_vector.loc[inter_id]
            if isinstance(stratifying_vector, pd.Series) == True:
                print("Stratifying vector")
                inter_id = zscore.index.intersection(stratifying_vector.index).drop_duplicates(keep='first')
                stratifying_vector = stratifying_vector.loc[inter_id]
                masked = []
                binned = np.digitize(stratifying_vector, stratifying_bins)
                N_bins = len(stratifying_bins) - 1
...
...
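The hunk above is cut off right after the binning step, so the rest of the function is not visible here. As a rough illustration of what stratified masking with `np.digitize` can look like, here is a self-contained sketch; the helper `stratified_mask`, the even per-bin sampling, and the toy allele frequencies are assumptions and may differ from what `generated_test_data` actually does.

```python
import numpy as np
import pandas as pd


def stratified_mask(strat: pd.Series, bins, n_per_bin: int, seed: int = 0) -> list:
    """Pick up to n_per_bin SNP ids from each bin of a stratifying variable.

    Hypothetical helper for illustration; not the raiss implementation.
    """
    rng = np.random.default_rng(seed)
    # np.digitize returns k when bins[k-1] <= x < bins[k].
    binned = np.digitize(strat.to_numpy(), bins)
    masked = []
    for k in range(1, len(bins)):
        ids = strat.index[binned == k].to_numpy()
        n = min(n_per_bin, len(ids))
        if n > 0:
            masked.extend(rng.choice(ids, size=n, replace=False))
    return masked


# Toy stratifying vector: made-up allele frequencies for six SNPs.
maf = pd.Series([0.02, 0.04, 0.12, 0.18, 0.30, 0.45],
                index=[f"rs{i}" for i in range(6)])
strat_bins = [0.0, 0.05, 0.2, 0.5]
masked = stratified_mask(maf, bins=strat_bins, n_per_bin=1)
print(masked)
```

Sampling per bin keeps rare and common variants both represented among the masked SNPs, which a plain uniform draw does not guarantee.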
raiss/stat_models.py
...
...
@@ -104,6 +104,6 @@ def raiss_model(zt, sig_t, sig_i_t, lamb=0.01, rcond=0.01, batch=True):
    var_norm = var_in_boundaries(var, lamb)
    R2 = ((1 + lamb) - var_norm)
    print(R2)
    mu = mu / np.sqrt(R2)
    return({"var": var, "mu": mu, "ld_score": ld_score, "condition_number": condition_number, "correct_inversion": correct_inversion})
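The rescaling at the end of this hunk divides the imputed mean by the square root of R2 = (1 + lamb) - var_norm. The sketch below reproduces that arithmetic on toy arrays. The body of `var_in_boundaries` is not shown in the diff, so a simple clip to [0, 1] stands in for it here; that stand-in and the function name `rescale_mu` are assumptions.

```python
import numpy as np


def rescale_mu(mu: np.ndarray, var: np.ndarray, lamb: float = 0.01):
    """Rescale imputed means by sqrt(R2), following the formula in the hunk."""
    # Assumed stand-in for var_in_boundaries: keep the variance within [0, 1].
    var_norm = np.clip(var, 0.0, 1.0)
    r2 = (1 + lamb) - var_norm
    # With lamb > 0 and var_norm <= 1, r2 >= lamb > 0, so the square root is safe.
    return mu / np.sqrt(r2), r2


mu = np.array([1.0, 2.0])
var = np.array([0.01, 0.51])
mu_scaled, r2 = rescale_mu(mu, var, lamb=0.01)
print(r2, mu_scaled)
```

A lower R2 (a less certain imputation) inflates the rescaled value more, so the second toy entry grows by a factor of 1/sqrt(0.5) while the first is essentially unchanged.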