diff --git a/jass_preprocessing/doc/source/index.rst b/jass_preprocessing/doc/source/index.rst
index 6ffaf180dfb084318a807f7ad679246680e1d440..d85c2a115ff909c139b419671f9cd3b88adc236b 100644
--- a/jass_preprocessing/doc/source/index.rst
+++ b/jass_preprocessing/doc/source/index.rst
@@ -41,25 +41,55 @@ execute the following lines:
     cd JASS_Pre-processing/
     pip3 install ./jass_preprocessing/
 
-
-
-
 Preprocessing example
 =====================
 
-The file : "/JASS_Pre-processing/main_preprocessing.py" gives an example on how to use
-this package.
+The file "/JASS_Pre-processing/main_preprocessing.py" gives a complete example of
+how to use this package.
 
 Input
 ======
-* A reference panel (1000 genome format)
+* A reference panel (1000 Genomes format). It must be provided as a tab-separated (TSV) file, without a header line, with the following columns in this order:
+
++-----+-----+------------+-----+-----+---------+
+| chr | pos | snp_id     | ref | alt | MAF     |
++=====+=====+============+=====+=====+=========+
+|  1  |13116| rs62635286 |  T  |  G  |0.0970447|
++-----+-----+------------+-----+-----+---------+
+|  1  |13118| rs200579949|  A  |  G  |0.0970447|
++-----+-----+------------+-----+-----+---------+
+|  1  |14604| rs541940975|  A  |  G  | 0.147564|
++-----+-----+------------+-----+-----+---------+
+|  1  |14930| rs75454623 |  A  |  G  | 0.482228|
++-----+-----+------------+-----+-----+---------+
+
 * Folder containing all raw gwas data (all chromosomes in one file)
 * a list containing the name of GWAS file to the string format.
-* A descriptor csv files that will described each GWAS summary statistic files
+* A descriptor CSV file that describes each GWAS summary statistics file:
+
+  * a header line
+  * one line per study
+  * the fields are:
+
+
++-------------------------------+-------------------------------------------------------------+
+| category                      | field names                                                 |
++===============================+=============================================================+
+| path to the data              | filename                                                    |
++-------------------------------+-------------------------------------------------------------+
+| study info fields             | consortia,outcome,fullName,type,Nsample,Ncase,Ncontrol,Nsnp |
++-------------------------------+-------------------------------------------------------------+
+| column names in the GWAS file | snpid,a1,a2,freq,pval,n,z,OR,se,code,imp,ncas,ncont         |
++-------------------------------+-------------------------------------------------------------+
+
+.. | I don't know | altNcas,altNcont|
+
+  * it must contain the following columns:
+Hard-coded paths (l.20-29 of JASS_Pre-processing/main_preprocessing.py)
 
 Indices and tables
 ==================
diff --git a/main_preprocessing.py b/main_preprocessing.py
index d95a03439986bd84e0edf04bd98c2d5b1465fd36..97b963721cd0ed2a0b3441a3607fd7e204dd3597 100644
--- a/main_preprocessing.py
+++ b/main_preprocessing.py
@@ -16,6 +16,24 @@ import pandas as pd
 import seaborn as sns
 import time
 
+
+#Hard-coded paths (l.20-29 of JASS_Pre-processing/main_preprocessing.py)
+
+#| variable name | description | current default value|
+#|---------------|-------------|----------------------|
+#| netPath | Main project folder, must end with "/" | /mnt/atlas/ |
+#| GWAS_labels* | Path to the file describing the format of the individual GWAS files | netPath+'PCMA/1._DATA/RAW.GWAS/GWAS_labels.csv' |
+#| GWAS_path* | Path to the folder containing the GWAS summary statistics files, must end with "/" | netPath+'PCMA/1._DATA/RAW.GWAS/'|
+#| diagnostic_folder | Folder for histograms of the sample size distribution among SNPs | /mnt/atlas/PCMA/1._DATA/sample_size_distribution/ |
+#| ldscore_format | Data formatted for use with LDscore, one file per study | /mnt/atlas/PCMA/1._DATA/ldscore_data/ |
+#| REF_filename* | File containing the reference panel for imputation | netPath+'PCMA/0._REF/1KGENOME/summary_genome_Filter_part2.out' |
+#| pathOUT | **unused in main_preprocessing.py** | netPath+'PCMA/1._DATA/RAW.summary/'|
+#| ImpG_output_Folder | Main output folder | netPath+ 'PCMA/1._DATA/preprocessing_test/' |
+
+#+ Hard-coded variable: perSS = 0.7, the proportion of the 90th percentile of the sample size used to filter the SNPs
+
+
+
 perSS = 0.7
 netPath = "/mnt/atlas/"
 GWAS_labels = netPath+'PCMA/1._DATA/RAW.GWAS/GWAS_labels.csv'
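The reference panel format described in the RST input section (tab-separated, no header, columns `chr`, `pos`, `snp_id`, `ref`, `alt`, `MAF` in that order) can be checked with a small parser. This is a minimal sketch and not part of `jass_preprocessing`: `read_reference_panel` and `RefVariant` are hypothetical names, and it assumes every non-blank row has exactly six fields.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List

# Column order as documented for the reference panel (no header line in the file).
REF_COLUMNS = ("chr", "pos", "snp_id", "ref", "alt", "MAF")


@dataclass
class RefVariant:
    chr: str
    pos: int
    snp_id: str
    ref: str
    alt: str
    maf: float


def read_reference_panel(lines: Iterable[str]) -> List[RefVariant]:
    """Parse a headerless, tab-separated reference panel (hypothetical helper)."""
    variants = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != len(REF_COLUMNS):
            raise ValueError(f"expected {len(REF_COLUMNS)} columns, got {len(row)}: {row}")
        chrom, pos, snp_id, ref, alt, maf = row
        variants.append(RefVariant(chrom, int(pos), snp_id, ref, alt, float(maf)))
    return variants


if __name__ == "__main__":
    sample = "1\t13116\trs62635286\tT\tG\t0.0970447\n1\t14604\trs541940975\tA\tG\t0.147564\n"
    for v in read_reference_panel(io.StringIO(sample)):
        print(v.snp_id, v.pos, v.maf)
```

With pandas, which `main_preprocessing.py` already imports, the same table can be loaded with `pd.read_csv(path, sep="\t", header=None, names=["chr", "pos", "snp_id", "ref", "alt", "MAF"])`; the explicit parser above mainly adds a per-row column-count check.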
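The `perSS = 0.7` comment in `main_preprocessing.py` describes a sample-size filter on SNPs. One plausible reading, assumed here rather than taken from the package, is: keep a SNP only if its sample size is at least `perSS` times the 90th percentile of the sample sizes over all SNPs. The sketch below implements that reading. `percentile` and `filter_by_sample_size` are hypothetical helpers, the percentile uses linear interpolation between closest ranks (NumPy's default convention), and keeping SNPs exactly at the threshold is also an assumption.

```python
import math
from typing import List, Sequence, Tuple

PER_SS = 0.7  # mirrors perSS in main_preprocessing.py


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between closest ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def filter_by_sample_size(
    snps: Sequence[Tuple[str, float]], per_ss: float = PER_SS, q: float = 90.0
) -> List[Tuple[str, float]]:
    """Keep (snp_id, n) pairs with n >= per_ss * (q-th percentile of all n).

    Assumed interpretation of the perSS comment, not the package's own code.
    """
    threshold = per_ss * percentile([n for _, n in snps], q)
    return [(snp, n) for snp, n in snps if n >= threshold]


if __name__ == "__main__":
    snps = [("rs1", 1000), ("rs2", 1000), ("rs3", 1000), ("rs4", 500), ("rs5", 1000)]
    print(filter_by_sample_size(snps))  # rs4 falls below 0.7 * 1000 and is dropped
```

Anchoring the threshold on a high percentile rather than the maximum keeps a few SNPs with unusually large reported sample sizes from pushing the cutoff up for everyone else.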
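In the descriptor CSV, the fields `snpid`, `a1`, `a2`, `freq`, `pval`, `n`, `z`, `OR`, `se`, `code`, `imp`, `ncas`, `ncont` hold the names that each study's GWAS file uses for those columns. A sketch of turning one descriptor row into a rename mapping follows. `read_descriptor` and `column_mapping` are hypothetical helpers, not the package's API, and treating a missing or empty cell as "this column is absent from the GWAS file" is an assumption.

```python
import csv
import io
from typing import Dict

# Descriptor fields whose values are column names of the GWAS file itself.
HEADER_FIELDS = ["snpid", "a1", "a2", "freq", "pval", "n", "z",
                 "OR", "se", "code", "imp", "ncas", "ncont"]


def read_descriptor(text: str) -> Dict[str, Dict[str, str]]:
    """Parse the descriptor CSV (header + one line per study), keyed by filename."""
    return {row["filename"]: row for row in csv.DictReader(io.StringIO(text))}


def column_mapping(study: Dict[str, str]) -> Dict[str, str]:
    """Map a study's own GWAS column names to the standard descriptor field names.

    Assumption: a missing or empty cell means the GWAS file lacks that column.
    """
    mapping = {}
    for field in HEADER_FIELDS:
        name = (study.get(field) or "").strip()
        if name:
            mapping[name] = field
    return mapping


if __name__ == "__main__":
    # Made-up descriptor content, for illustration only.
    demo = ("filename,consortia,outcome,snpid,a1,a2,pval,z\n"
            "study1.txt,DEMO,TRAIT,MarkerName,Allele1,Allele2,P,Zscore\n")
    print(column_mapping(read_descriptor(demo)["study1.txt"]))
```

The resulting dict has the shape expected by `pandas.DataFrame.rename(columns=...)`, so each study's table could be given the standard column names before further processing.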