Changes

Apolline GALLOIS · 5701d70a
--- a/Input-format.md
+++ b/Input-format.md
+# Input format
+- **Genotypes bim file**  
+This is a PLINK file with tab-separated columns and no header line. It contains one line per variant, with the following six fields: chromosome, variant identifier, position in morgans or centimorgans, base-pair coordinate, allele 1 and allele 2.  
+Example:  
+*(chromosome)* | *(variant identifier)* | *(position)* | *(base-pair coordinate)* | *(A1)* | *(A2)*
+:---: | :-------: | :----: | :-----: | :---: | :---:
+1 | rs123456 | 7568 | 15411 | A | T
+5 | rs6715 | 89863 | 41347 | G | A
+21 | rs75354 | 148962 | 305716 | C | A
+- **Genotypes raw file**  
+This is a PLINK file, with columns separated by spaces and a header line. It contains one line per sample with V+6 fields, where V is the number of variants.  
+To recode bed/bim/fam files into a raw file, use the following PLINK command:
+```bash
+plink --bfile $inputFile --recodeA --out $outputFile
+```
+Example:  
+FID | IID | PAT | MAT | SEX | PHENOTYPE | SNP1 | SNP2 | SNP3 | ..........
+:---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---:
+1 | 1 | 0 | 0 | 2 | 0 | 0 | 1 | 2 | ..........
+2 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 2 | ..........
+- **Phenotypes file**  
+This is a text file with tab-separated columns and a header line. It contains one line per individual. The first column must be the individual ID.  
+Example:  
+ID | Sex | Age | LDL-C | HDL-C |  HDL-D | HDL-TG | ..........
+:---: | :---: | :---: | :---: | :---: | :---: | :---: | :---:
+1 | 1 | 45 | 0.1 | 0.48 | 0.85 | 0.89 | ..........
+2 | 1 | 32 | 0.2 | 0.65 | 0.1 | 0.41 | ..........
+3 | 2 | 47 | 0.8 | 0.21 | 0.5 | 0.3 | ..........
+- **Summary file**  
+This is a CSV file with columns separated by commas and a header line. It describes the role of each variable contained in the phenotypes file. For each selected variable, the user must provide a label and three binary indicators: confounding factor (i.e. a variable systematically included as a covariate), outcome (i.e. a variable that will be treated as a primary outcome) and candidate covariate (i.e. a variable that will be assessed by CMS for inclusion as a covariate).  
+`Note that a variable classified as a confounding factor cannot also be used as an outcome or a candidate covariate; such a combination will be flagged as an error.`  
+By default, all variables flagged in the "Covariate" column will be included as candidate covariates in each outcome analysis. The "Excluded" column makes it possible to exclude specific variables from the covariates for a given outcome. These variables must be separated by ";" without any spaces. If no variables need to be excluded, simply leave the column empty. In the example, all other "HDL" variables are excluded when analysing one of them.  
+Example:  
+Label | Conf | Outcome | Covariate | Excluded
+:---: | :---: | :---: | :---: | :---:
+Sex | 1 | 0 | 0 |
+Age | 1 | 0 | 0 |
+LDL-C | 0 | 1 | 1 |
+HDL-C | 0 | 1 | 1 | HDL-D;HDL-TG
+HDL-D | 0 | 1 | 1 | HDL-C;HDL-TG
+HDL-TG | 0 | 1 | 1 | HDL-C;HDL-D
\ No newline at end of file
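
The summary-file rules described in the diff above can be sketched in a few lines of Python. This is only an illustration, not the way CMS itself reads the file: the function names (`read_summary`, `check_summary`, `covariates_for`) are hypothetical, the column names are taken from the example table, and two behaviours are assumptions rather than documented facts: the outcome is dropped from its own covariate list, and every name in "Excluded" is expected to be a column of the phenotypes file.

```python
import csv
import io


def read_summary(handle):
    """Parse the summary CSV and return its rows as dictionaries."""
    return list(csv.DictReader(handle))


def split_excluded(row):
    """Split the ';'-separated "Excluded" cell, tolerating an empty or missing cell."""
    return [name for name in (row.get("Excluded") or "").split(";") if name]


def check_summary(rows, phenotype_columns):
    """Return a list of human-readable problems found in the summary rows."""
    known = set(phenotype_columns)
    problems = []
    for row in rows:
        label = row["Label"]
        if label not in known:
            problems.append(f"{label}: not a column of the phenotypes file")
        # A confounding factor cannot also be an outcome or a candidate covariate.
        if row["Conf"] == "1" and "1" in (row["Outcome"], row["Covariate"]):
            problems.append(f"{label}: confounding factor also flagged as outcome or covariate")
        # Assumption: excluded names must refer to phenotype columns.
        for name in split_excluded(row):
            if name not in known:
                problems.append(f"{label}: excluded variable {name} is unknown")
    return problems


def covariates_for(rows, outcome):
    """Candidate covariates for one outcome, after applying its exclusions."""
    excluded = set()
    for row in rows:
        if row["Label"] == outcome:
            excluded = set(split_excluded(row))
    # Assumption: an outcome is never a covariate of itself.
    return [
        row["Label"]
        for row in rows
        if row["Covariate"] == "1"
        and row["Label"] != outcome
        and row["Label"] not in excluded
    ]


# The example summary table from the diff, written out as CSV.
SUMMARY = """Label,Conf,Outcome,Covariate,Excluded
Sex,1,0,0,
Age,1,0,0,
LDL-C,0,1,1,
HDL-C,0,1,1,HDL-D;HDL-TG
HDL-D,0,1,1,HDL-C;HDL-TG
HDL-TG,0,1,1,HDL-C;HDL-D
"""
# Header of the example phenotypes file.
PHENOTYPE_COLUMNS = ["ID", "Sex", "Age", "LDL-C", "HDL-C", "HDL-D", "HDL-TG"]

rows = read_summary(io.StringIO(SUMMARY))
print(check_summary(rows, PHENOTYPE_COLUMNS))  # []
print(covariates_for(rows, "HDL-C"))           # ['LDL-C']
print(covariates_for(rows, "LDL-C"))           # ['HDL-C', 'HDL-D', 'HDL-TG']
```

Running a check like this before launching an analysis would surface a mislabelled variable or a confounder/outcome clash early, which is the error case the note in the summary-file section warns about.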