README improved

f580b223 · Gael MILLOT · f1dc7e60 · f580b223
Commit f580b223 authored 7 months ago by Gael MILLOT
--- a/README.md
+++ b/README.md
@@ -226,7 +226,7 @@ An example of results obtained with the dataset is present at this address: http
 | micmac_<UNIQUE_ID> folder | Description |
 | :--- | :--- |
 | **reports** | Folder containing all the reports of the different processes, including the *nextflow.config* file used. |
-| **<FILE_NAME>_micmac.tsv** | Statistical analysis of the corresponding file.<br />Column description: <br /><ul><li>X_MAG: Name of MAG 1.<br /></li><li>Y_MAG: Name of MAG 2.<br /></li><li>COUNT: Count (number of reads).<br /></li><li>ni.: Total number of reads in all inter contact maps of MAG 1.<br /></li><li>pi.: Proportion of reads in MAG 1 among the total number of reads in all inter contact maps ($p_{i.} = \frac{n_{i.}}{n}$).<br /></li><li>n.j: Total number of reads in all inter contact maps of MAG 2.<br /></li><li>p.j: Proportion of reads in MAG 2 among the total number of reads in all inter contact maps ($p_{.j} = \frac{n_{.j}}{n}$).<br /></li><li>n: Total number of reads in all inter contact maps.<br /></li><li>pij: Proportion of reads in the MAG 1 and MAG 2 inter contact map, among the total number of reads in inter contact maps ($p_{ij} = \frac{COUNT}{n}$).<br /></li><li>pi._p.j: Expected proportion of reads ${P(MAG_{1} \cap MAG_{2})}$ in the MAG 1 and MAG 2 inter contact map, equal to $p_{i.} \times p_{.j}$ under the $H_{0}$ assumption of independent events (no physical links between the two MAGs).<br /></li><li>tij: Theoretical number of reads in the MAG 1 and MAG 2 inter contact map, under the $H_{0}$ assumption ($t_{ij}=pi.\_p.j \times n$).<br /></li><li>n_score: optional score ${\frac{COUNT}{\sqrt{n_{i.} \times n_{.j}}}}$ (deprecated but kept for comparison).<br /></li><li>chi2_p: $\chi^2$ score between 0 and 1 ($chi2\_p=\frac{(pij - pi.\_p.j)^2}{pi.\_p.j}$). A 0 value means no difference between pi._p.j and pij, meaning that under $H_{0}$, the pij observed is due to randomness, i.e., noise, i.e., no physical contact between MAG 1 and MAG 2. A 1 value means maximal difference between pi._p.j and pij. This difference can be due to sampling fluctuation but also to a real physical link between MAG 1 and MAG 2. 
It thus implies that the $H_{1}$ hypothesis, alternative to $H_{0}$, is $pij > pi.\_p.j \Leftrightarrow pij \times n > pi.\_p.j \times n \Leftrightarrow COUNT > tij$, because a number of reads in an inter contact map lower than what is expected under $H_{0}$ means nothing, while a higher number means "physical contact between MAG 1 and MAG 2 above noise".<br /></li><li>res: like chi2 (below) but without the square ($res=\frac{(pij - pi.\_p.j)}{pi.\_p.j} \times n$), in order to determine whether the (pij - pi._p.j) difference is negative or positive. A negative value indicates that pij < pi._p.j.<br /></li><li>chi2: $\chi^2$ score ($chi2=chi2\_p \times n$).<br /></li><li>df: degrees of freedom (number of inter contact maps - 1).<br /></li><li>cutoff_0.05: $\chi^2$ score cut-off using the $\chi^2(df)$ probability distribution curve (5% of the area under the right tail of the distribution).<br /></li><li>p: p value, i.e., area under the $\chi^2(df)$ probability distribution curve, on the right of the chi2 computed value. Warning: the chi2 score used is computed on a single inter contact map (single cell of the matrix) while the $\chi^2(df)$ probability distribution curve used is for the sum of all the chi2 scores of all the inter contact maps (chi2 score total). Thus, it is as if the chi2 score total, which should be positioned on $\chi^2(df)$ to get the p value, were cut into many pieces, all positioned on the same curve to get $n_{c}$ p values. This strongly lowers the sensitivity of the tests but increases the specificity.<br /></li><li>signif_0.05: $p \leq 5\%$ and $res > 0$ is indicated by a star.<br /></li><li>signif_0.05_BF: Bonferroni correction of $p$ using the number of inter contact maps $n_{c}$, i.e., $p \times n_{c} \leq 5\%$ and $res > 0$, indicated by a star.</li></ul> |
+| **<FILE_NAME>_micmac.tsv** | Statistical analysis of the corresponding file.<br />Column description: <br /><ul><li>X_MAG: Name of MAG 1.<br /></li><li>Y_MAG: Name of MAG 2.<br /></li><li>COUNT: Count (number of reads).<br /></li><li>ni.: Total number of reads in all inter contact maps of MAG 1.<br /></li><li>pi.: Proportion of reads in MAG 1 among the total number of reads in all inter contact maps ($p_{i.} = \frac{n_{i.}}{n}$).<br /></li><li>n.j: Total number of reads in all inter contact maps of MAG 2.<br /></li><li>p.j: Proportion of reads in MAG 2 among the total number of reads in all inter contact maps ($p_{.j} = \frac{n_{.j}}{n}$).<br /></li><li>n: Total number of reads in all inter contact maps.<br /></li><li>pij: Proportion of reads in the MAG 1 and MAG 2 inter contact map, among the total number of reads in inter contact maps ($p_{ij} = \frac{COUNT}{n}$).<br /></li><li>pi.\_p.j: Expected proportion of reads ${P(MAG_{1} \cap MAG_{2})}$ in the MAG 1 and MAG 2 inter contact map, equal to $p_{i.} \times p_{.j}$ under the $H_{0}$ assumption of independent events (no physical links between the two MAGs).<br /></li><li>tij: Theoretical number of reads in the MAG 1 and MAG 2 inter contact map, under the $H_{0}$ assumption ($t_{ij}=pi.\_p.j \times n$).<br /></li><li>n_score: optional score ${\frac{COUNT}{\sqrt{n_{i.} \times n_{.j}}}}$ (deprecated but kept for comparison).<br /></li><li>chi2_p: $\chi^2$ score between 0 and 1 ($chi2\_p=\frac{(pij - pi.\_p.j)^2}{pi.\_p.j}$). A 0 value means no difference between pi._p.j and pij, meaning that under $H_{0}$, the pij observed is due to randomness, i.e., noise, i.e., no physical contact between MAG 1 and MAG 2. A 1 value means maximal difference between pi._p.j and pij. This difference can be due to sampling fluctuation but also to a real physical link between MAG 1 and MAG 2. 
It thus implies that the $H_{1}$ hypothesis, alternative to $H_{0}$, is $pij > pi.\_p.j \Leftrightarrow pij \times n > pi.\_p.j \times n \Leftrightarrow COUNT > tij$, because a number of reads in an inter contact map lower than what is expected under $H_{0}$ means nothing, while a higher number means "physical contact between MAG 1 and MAG 2 above noise".<br /></li><li>res: like chi2 (below) but without the square ($res=\frac{(pij - pi.\_p.j)}{pi.\_p.j} \times n$), in order to determine whether the (pij - pi._p.j) difference is negative or positive. A negative value indicates that pij < pi._p.j.<br /></li><li>chi2: $\chi^2$ score ($chi2=chi2\_p \times n$).<br /></li><li>df: degrees of freedom (number of inter contact maps - 1).<br /></li><li>cutoff_0.05: $\chi^2$ score cut-off using the $\chi^2(df)$ probability distribution curve (5% of the area under the right tail of the distribution).<br /></li><li>p: p value, i.e., area under the $\chi^2(df)$ probability distribution curve, on the right of the chi2 computed value. Warning: the chi2 score used is computed on a single inter contact map (single cell of the matrix) while the $\chi^2(df)$ probability distribution curve used is for the sum of all the chi2 scores of all the inter contact maps (chi2 score total). Thus, it is as if the chi2 score total, which should be positioned on $\chi^2(df)$ to get the p value, were cut into many pieces, all positioned on the same curve to get $n_{c}$ p values. This strongly lowers the sensitivity of the tests but increases the specificity.<br /></li><li>signif_0.05: $p \leq 5\%$ and $res > 0$ is indicated by a star.<br /></li><li>signif_0.05_BF: Bonferroni correction of $p$ using the number of inter contact maps $n_{c}$, i.e., $p \times n_{c} \leq 5\%$ and $res > 0$, indicated by a star.</li></ul> |
 <br /><br />
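
The per-pair columns described in the table row above can be followed with a small worked example. This is only a sketch under stated assumptions, not the pipeline's code: the function name `micmac_row` and its arguments are hypothetical, and the totals ni., n.j and n are taken as given inputs rather than derived from contact maps. The columns that need the $\chi^2(df)$ distribution (df, cutoff_0.05, p, signif_0.05, signif_0.05_BF) are left out.

```python
import math


def micmac_row(count, ni, nj, n):
    """Compute the per-pair columns for one MAG 1 / MAG 2 pair.

    Hypothetical helper for illustration only.
    count: COUNT, ni: ni., nj: n.j, n: total number of reads.
    """
    pi = ni / n                          # pi.
    pj = nj / n                          # p.j
    pij = count / n                      # observed proportion
    pi_pj = pi * pj                      # pi._p.j, expected proportion under H0
    tij = pi_pj * n                      # theoretical number of reads under H0
    chi2_p = (pij - pi_pj) ** 2 / pi_pj  # chi2 score on proportions
    res = (pij - pi_pj) / pi_pj * n      # signed: negative when pij < pi._p.j
    chi2 = chi2_p * n                    # chi2 score on read numbers
    n_score = count / math.sqrt(ni * nj)  # deprecated score, kept for comparison
    return {
        "pi.": pi, "p.j": pj, "pij": pij, "pi._p.j": pi_pj, "tij": tij,
        "n_score": n_score, "chi2_p": chi2_p, "res": res, "chi2": chi2,
    }


# Made-up numbers, chosen only to make the arithmetic easy to follow.
row = micmac_row(count=30, ni=100, nj=200, n=1000)
for name, value in row.items():
    print(name, round(value, 6))
```

With these made-up inputs, COUNT is above tij, so res is positive, and chi2 agrees with the usual single-cell form $(COUNT - t_{ij})^2 / t_{ij}$.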