Statistical-Genetics / jass_preprocessing

Commit f28feeee authored 5 years ago by Hanna JULIENNE
Parent: 823976b5

corrected problem when the same file is used for several GWAS

Pipeline #18402 passed 5 years ago (stages: test, deploy)

Showing 3 changed files with 20 additions and 11 deletions:

doc/source/index.rst: 3 additions, 2 deletions
jass_preprocessing/__main__.py: 15 additions, 5 deletions
jass_preprocessing/map_gwas.py: 2 additions, 4 deletions
doc/source/index.rst

```diff
@@ -70,7 +70,7 @@ Input
 | 1 |14930| rs75454623 | A | G | 0.482228|
 +-----+-----+------------+-----+-----+---------+
-* Folder containing all raw gwas data (all chromosomes in one file) (minimal conditions?? tab separated?)
+* Folder containing all raw gwas data: (all chromosomes in one file) (minimal conditions?? tab separated?)
 * a list containing the name of GWAS file to the string format.
 * A descriptor csv files that will described each GWAS summary statistic files:
```
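The input list above asks for a folder of raw GWAS files plus the name of each file, and after this commit several descriptor rows may name the same file. A minimal sketch of resolving a file name against that folder, assuming the file can sit anywhere below it; `find_gwas_file` is a hypothetical helper built on `os.walk`, not the project's `walkfs`:

```python
import os
from typing import Optional


def find_gwas_file(input_folder: str, gwas_filename: str) -> Optional[str]:
    """Return the full path of the first file named `gwas_filename` found
    under `input_folder` (searched recursively), or None if it is absent.

    Hypothetical helper for illustration; not the project's `walkfs`.
    """
    for dirpath, _dirnames, filenames in os.walk(input_folder):
        if gwas_filename in filenames:
            return os.path.join(dirpath, gwas_filename)
    return None
```

Because the lookup depends only on the file name, two studies that share one raw file resolve to the same path, which is the situation this commit makes legal.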
```diff
@@ -84,7 +84,7 @@ Input
 +===========================================+============================================================+
 | path to the data | filename |
 +-------------------------------------------+------------------------------------------------------------+
-| study info fields | consortia,outcome,fullName,type,Nsample,Ncase,Ncontrol,Nsnp|
+| study info fields | Consortium,Outcome,fullName,type,Nsample,Ncase,Ncontrol,Nsnp|
 +-------------------------------------------+------------------------------------------------------------+
 | names of the header in the GWAS file | snpid,a1,a2,freq,pval,n,z,OR,se,code,imp,ncas,ncont |
 +-------------------------------------------+------------------------------------------------------------+
```
```diff
@@ -92,6 +92,7 @@ Input
 .. Give an example
 .. | I don't know | altNcas,altNcont|
+Note that the combination of Consortium and outcome must be unique because it will be used as an index in the cleaning process.
 Here is an example of descriptor field, the field irrelevant (for example odd ratio for continuous trait) for the study must be filled with na.
```
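The note added in this hunk makes the (Consortium, Outcome) pair the key of each study, and irrelevant fields are written as `na`. A minimal sketch of loading such a descriptor with pandas and enforcing the uniqueness rule, assuming a tab-separated file (as `__main__.py` reads it) and made-up example values; `read_descriptor`, `check_unique_pairs` and `DESCRIPTOR_TSV` are hypothetical names. Lowercase `na` is not one of pandas' default missing-value markers, which is why `na_values="na"` is passed explicitly:

```python
import io

import pandas as pd

# Hypothetical two-row descriptor: both rows point at the same raw file
# (allowed after this commit) but describe different outcomes. The OR field
# is irrelevant here, so it is filled with "na" as the documentation asks.
DESCRIPTOR_TSV = (
    "filename\tConsortium\tOutcome\tOR\n"
    "shared.txt\tCONS1\tHEIGHT\tna\n"
    "shared.txt\tCONS1\tWEIGHT\tna\n"
)


def read_descriptor(text: str) -> pd.DataFrame:
    """Parse a tab-separated descriptor; cells equal to 'na' become NaN."""
    return pd.read_csv(io.StringIO(text), sep="\t", na_values="na")


def check_unique_pairs(descriptor: pd.DataFrame) -> None:
    """Raise ValueError if a (Consortium, Outcome) pair appears twice."""
    dup = descriptor.duplicated(subset=["Consortium", "Outcome"])
    if dup.any():
        pairs = descriptor.loc[dup, ["Consortium", "Outcome"]].values.tolist()
        raise ValueError(f"duplicated Consortium/Outcome pairs: {pairs}")
```

Sharing a filename passes the check; only a repeated (Consortium, Outcome) pair is rejected.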
jass_preprocessing/__main__.py

```diff
@@ -21,22 +21,32 @@ import argparse
 #| pathOUT | **unused in main_preprocessing.py** | netPath+'PCMA/1._DATA/RAW.summary/'|
 #| ImpG_output_Folder | main ouput folder | netPath+ 'PCMA/1._DATA/preprocessing_test/' |
+def raise_duplicated_index(tag):
+    duplicated_index = tag.duplicated()
+    raise ValueError("'Consortium_Outcome' are duplicated for: {0}".format(duplicated_index))
 def launch_preprocessing(args):
     """
     Preprocessing GWAS dataset
     """
     gwas_map = pd.read_csv(args.gwas_info, sep="\t")
-    gwas_map.set_index("filename", inplace=True)
-    for gwas_filename in gwas_map.index:
-        tag = "{0}_{1}".format(gwas_map.loc[gwas_filename, 'Consortium'], gwas_map.loc[gwas_filename, 'Outcome'])
+    #define an unique
+    gwas_map['tag'] = gwas_map.Consortium + "_" + D.Outcome
+    if gwas_map.tag.duplicated().any():
+        raise_duplicated_index(gwas_map.tag)
+    gwas_map.set_index("tag", inplace=True)
+    for tag in gwas_map.index:
+        gwas_filename = D.loc[tag, "filename"]
         print('processing GWAS: {}'.format(tag))
         start = time.time()
         GWAS_link = jp.map_gwas.walkfs(args.input_folder, gwas_filename)[2]
-        mapgw = jp.map_gwas.map_columns_position(GWAS_link, args.gwas_info)
+        mapgw = jp.map_gwas.map_columns_position(GWAS_link, gwas_map.loc[tag])
         gw_df = jp.map_gwas.read_gwas(GWAS_link, mapgw)
```
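Indexing the descriptor by `filename` breaks down once one file serves several GWAS: the loop then visits that filename once per row, and `gwas_map.loc[gwas_filename, 'Consortium']` returns a Series instead of a single value. The hunk switches to a unique `Consortium_Outcome` tag. The visible lines build the tag from `D.Outcome` and read `D.loc[tag, "filename"]`, but `D` is not defined in the lines shown. A minimal sketch of the same tag-based indexing that uses `gwas_map` throughout (an assumption about the intent) and reports the duplicated tag values rather than the boolean mask that `tag.duplicated()` returns; `index_by_tag` and `iter_studies` are hypothetical names:

```python
from typing import Iterator, Tuple

import pandas as pd


def raise_duplicated_index(tag: pd.Series) -> None:
    # Report the offending tag values, not the full boolean mask.
    duplicated = tag[tag.duplicated(keep=False)].unique().tolist()
    raise ValueError(f"'Consortium_Outcome' are duplicated for: {duplicated}")


def index_by_tag(gwas_map: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the descriptor indexed by a unique Consortium_Outcome tag."""
    gwas_map = gwas_map.copy()
    gwas_map["tag"] = gwas_map["Consortium"].astype(str) + "_" + gwas_map["Outcome"].astype(str)
    if gwas_map["tag"].duplicated().any():
        raise_duplicated_index(gwas_map["tag"])
    return gwas_map.set_index("tag")


def iter_studies(gwas_map: pd.DataFrame) -> Iterator[Tuple[str, str, pd.Series]]:
    """Yield (tag, filename, descriptor row) once per study, even when files repeat."""
    indexed = index_by_tag(gwas_map)
    for tag in indexed.index:
        # With a unique index, .loc[tag, col] is a scalar and .loc[tag] a Series.
        yield tag, indexed.loc[tag, "filename"], indexed.loc[tag]
```

The row yielded last is what the new call `map_columns_position(GWAS_link, gwas_map.loc[tag])` passes on, so each study carries its own column mapping even when two studies read the same file.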
jass_preprocessing/map_gwas.py

```diff
@@ -76,21 +76,19 @@ def convert_missing_values(df):
     return df.replace(def_missing, nan_vec)
-def map_columns_position(gwas_internal_link, GWAS_labels):
+def map_columns_position(gwas_internal_link, my_labels):
     """
     Find column position for each specific Gwas
     Args:
         gwas_internal_link (str): filename of the GWAS data (with path)
-        GWAS_labels (str): filename of the csv information file
+        GWAS_labels (pd.DataFrame): corresponding row of the information file
     Return:
         pandas Series with column position and column names as index
     """
-    column_dict = pd.read_csv(GWAS_labels, sep='\t', na_values='na')
-    column_dict.set_index("filename", inplace=True)
     print(gwas_internal_link)
     gwas_file = gwas_internal_link.split('/')[-1]
     my_labels = column_dict.loc[gwas_file]
```
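After this commit `map_columns_position` receives the descriptor row itself (`gwas_map.loc[tag]`, a `pd.Series` when the index is unique) instead of the descriptor file path, so it no longer has to re-read the descriptor and look the row up by file name, a lookup that is ambiguous when one file serves several GWAS. A hypothetical sketch of a column-position lookup over such a row, assuming a tab-separated header line and a row whose values are the file's column labels (`NaN` for fields marked `na`); `map_columns_position_sketch` is an illustrative name, not the project's function:

```python
import pandas as pd


def map_columns_position_sketch(header_line: str, my_labels: pd.Series) -> pd.Series:
    """Map each descriptor field (e.g. 'snpid', 'a1') to the 0-based position
    of its label in a tab-separated GWAS header.

    Hypothetical illustration: fields left as NaN in the descriptor, or whose
    label is missing from the header, are skipped.
    """
    header = header_line.rstrip("\n").split("\t")
    positions = {
        field: header.index(label)
        for field, label in my_labels.dropna().items()
        if label in header
    }
    # Positions as values, descriptor field names as index, ordered by position.
    return pd.Series(positions, dtype="int64").sort_values()
```

For example, with the header `rsID, chr, A1, A2, Z` and a row mapping `snpid`, `a1`, `a2`, `z` to those labels (and `OR` set to `na`), the result maps `snpid` to 0, `a1` to 2, `a2` to 3 and `z` to 4, and `OR` is absent.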