Update Input format authored by Apolline GALLOIS
Here is a description of the four input files required to launch runCMS:
### 1) Genotypes .bim file
This is a PLINK file with tab-separated columns and no header line. It contains one line per variant with the following six fields: chromosome, variant identifier, position in morgans or centimorgans, base-pair coordinate, allele 1 and allele 2.
...@@ -11,7 +14,7 @@ Example:
21 | rs75354 | 148962 | 305716 | C | A
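The six-field layout described above can be checked with a few lines of Python. This is only a minimal sketch, not part of runCMS: the names `BimRecord` and `parse_bim_line` are hypothetical, and the sample line is the example row above rewritten with tab separators.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    """One variant line of a .bim file (hypothetical helper type)."""
    chromosome: str
    variant_id: str
    genetic_position: float  # morgans or centimorgans
    bp_position: int         # base-pair coordinate
    allele1: str
    allele2: str


def parse_bim_line(line: str) -> BimRecord:
    """Split one tab-separated .bim line into its six fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    chrom, variant_id, position, bp, a1, a2 = fields
    return BimRecord(chrom, variant_id, float(position), int(bp), a1, a2)


# The example row above, with tabs as separators.
record = parse_bim_line("21\trs75354\t148962\t305716\tC\tA")
```

A line with the wrong number of fields (for instance one separated by spaces instead of tabs) raises a `ValueError` instead of being silently misread.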
### 2) Genotypes .raw file
This is a PLINK file with space-separated columns and a header line. It contains one line per sample with V+6 fields, where V is the number of variants.
To recode bed/bim/fam files to a .raw file, use this PLINK command:
...@@ -28,7 +31,7 @@ FID | IID | PAT | MAT | SEX | PHENOTYPE | SNP1 | SNP2 | SNP3 | ..........
2 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 2 | ..........
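The V+6 rule can be verified before launching an analysis. The sketch below is an illustration under stated assumptions: `count_raw_variants` is a hypothetical helper, and the toy input keeps only three SNP columns, whereas a real .raw file has one column per variant.

```python
def count_raw_variants(lines):
    """Return V for a .raw file given as a list of lines (header first).

    Raises ValueError if a sample line does not have V + 6 fields.
    """
    header = lines[0].split()
    n_variants = len(header) - 6  # six leading columns: FID ... PHENOTYPE
    if n_variants < 0:
        raise ValueError("header has fewer than the 6 mandatory columns")
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(
                f"line {number}: expected {len(header)} fields, got {len(fields)}"
            )
    return n_variants


# Toy input limited to three SNP columns.
toy_raw = [
    "FID IID PAT MAT SEX PHENOTYPE SNP1 SNP2 SNP3",
    "2 2 0 0 1 0 1 0 2",
]
n_variants = count_raw_variants(toy_raw)
```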
### 3) Phenotypes file
This is a text file with tab-separated columns and a header line. It contains one line per individual. The first column must be the individual ID.
...@@ -41,7 +44,7 @@ ID | Sex | Age | LDL-C | HDL-C | HDL-D | HDL-TG | ..........
3 | 2 | 47 | 0.8 | 0.21 | 0.5 | 0.3 | ..........
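A phenotypes file in this layout can be read with Python's standard `csv` module. This is a minimal sketch rather than the loader runCMS uses: `read_phenotypes` is a hypothetical helper, and the toy input reuses only the columns visible in the example above.

```python
import csv
import io


def read_phenotypes(handle):
    """Read a tab-separated phenotypes file with a header line.

    Returns the variable names (every column after the ID) and a
    dictionary mapping each individual ID to its values, kept as strings.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    variables = header[1:]  # the first column is the individual ID
    table = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(f"individual {row[0]}: wrong number of columns")
        table[row[0]] = dict(zip(variables, row[1:]))
    return variables, table


# Toy input built from the columns shown in the example above.
toy_file = io.StringIO(
    "ID\tSex\tAge\tLDL-C\tHDL-C\tHDL-D\tHDL-TG\n"
    "3\t2\t47\t0.8\t0.21\t0.5\t0.3\n"
)
variables, phenotypes = read_phenotypes(toy_file)
```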
### 4) Phenotypes summary file
This is a CSV file with comma-separated columns and a header line. It describes the role of each variable contained in the phenotypes file. For each selected variable, the user must provide a label and binary indicators classifying it as a confounding factor (i.e. a variable systematically included as a covariate), an outcome (i.e. a variable that will be treated as a primary outcome) or a candidate covariate (i.e. a variable that will be assessed by CMS for inclusion as a covariate).
`Note that a variable classified as a confounding factor cannot also be used as either an outcome or a candidate covariate; such a combination will be flagged as an error.`
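The rule in the note can be checked before running the analysis. The sketch below is purely illustrative: the column names `variable`, `label`, `confounding`, `outcome` and `candidate` are assumptions, not the names runCMS expects, and `find_role_conflicts` is a hypothetical helper.

```python
import csv
import io


def find_role_conflicts(handle):
    """List variables flagged as confounding AND as outcome or candidate.

    Column names are hypothetical; adapt them to the real summary file.
    """
    conflicts = []
    for row in csv.DictReader(handle):
        is_confounder = row["confounding"] == "1"
        has_other_role = row["outcome"] == "1" or row["candidate"] == "1"
        if is_confounder and has_other_role:
            conflicts.append(row["variable"])
    return conflicts


# Made-up summary in which Age is wrongly given two roles.
toy_summary = io.StringIO(
    "variable,label,confounding,outcome,candidate\n"
    "Sex,Sex,1,0,0\n"
    "Age,Age,1,0,1\n"
    "LDL-C,LDL cholesterol,0,1,0\n"
)
conflicts = find_role_conflicts(toy_summary)
```

Here `conflicts` lists only `Age`, the one variable marked both as a confounding factor and as a candidate covariate.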
...