Commit 04bdca8f authored by Alexis CRISCUOLO

0.3.3.1

parent fe2cb809
@@ -180,9 +180,9 @@ Run _SAM2MAP_ without option to read the following documentation:
* By default, all sequenced bases with Phred score < 20 are not considered (option `-q`), therefore minimizing the impact of sequencing errors when computing the consensus sequence. By default, all read alignments with Phred score < 20 (as assessed by the read mapping program) are also not considered (option `-Q`), therefore discarding from the consensus sequence every region with low mappability (e.g. low complexity or repeated regions).
* _SAM2MAP_ estimates a Poisson+Negative Binomial (NB) theoretical distribution from the observed read coverage distribution, and writes the results into an output file (cov.txt file extension). <br>
The Poisson distribution is dedicated to observed (near-)zero read coverage distribution (called the coverage tail distribution into output files *.cov.txt). It is determined by the probability mass function (PMF) <b>P</b><sub><em>&lambda;</em></sub>(<em>x</em>) = <em>&lambda;</em><sup><em>x</em></sup> <em>e</em><sup>-<em>&lambda;</em></sup> &Gamma;(<em>x</em>+1)<sup>-1</sup>, where &Gamma; is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function). <br>
The (main) NB distribution is used to determine the min/max coverage depths (as ruled by option `-p`) to assess reference regions where the consensus sequence can be trustingly built. The NB(<em>p</em>,<em>r</em>) distribution is determined by the PMF <b>P</b><sub><em>p</em>,<em>r</em></sub>(<em>x</em>) = &Gamma;(<em>r</em>+<em>x</em>) &Gamma;(<em>x</em>+1)<sup>-1</sup> &Gamma;(<em>r</em>)<sup>-1</sup> <em>p</em><sup><em>x</em></sup> (1-<em>p</em>)<sup><em>r</em></sup>. However, when the observed read coverage distribution is not overdispersed (i.e. the NB parameter <em>r</em> tends to infinity), the NB distribution is replaced by the Generalized Poisson (GP) one. The GP(<em>&lambda;'</em>,<em>&rho;</em>) distribution is here determined by the PMF <b>P</b><sub><em>&lambda;'</em>,<em>&rho;</em></sub>(<em>x</em>) = <em>&lambda;'</em> (<em>&lambda;'</em>+<em>&rho;x</em>)<sup><em>x</em>-1</sup> <em>e</em><sup>-<em>&lambda;'</em>-<em>&rho;x</em></sup> &Gamma;(<em>x</em>+1)<sup>-1</sup>, where <em>&rho;</em> < 0; when <em>&rho;</em> = 0, GP(<em>&lambda;'</em>,0) reduces to a Poisson distribution of parameter <em>&lambda;'</em> (for more details, see e.g. Consul and Shoukri 1985). <br>
From the above formalizations, the Poisson+NB theoretical distribution is therefore determined by the PMF <em>w</em> <b>P</b><sub><em>&lambda;</em></sub>(<em>x</em>) + (1-<em>w</em>) <b>P</b><sub><em>p</em>,<em>r</em></sub>(<em>x</em>). The values of the different parameters <em>w</em>, <em>&lambda;</em>, <em>p</em> and <em>r</em> are written into output files *.cov.txt. Of note, such statistical results can also be used jointly with a genome coverage profile analysis (e.g. Lindner et al. 2013).
The Poisson distribution is dedicated to observed (near-)zero read coverage distribution (called the coverage tail distribution into output files *.cov.txt). It is determined by the probability mass function (PMF) <b>P</b><sub><em>&lambda;</em></sub>(<em>x</em>) = <em>&lambda;</em><sup><em>x</em></sup> <em>e</em><sup>-<em>&lambda;</em></sup> &Gamma;(<em>x</em>+1)<sup>-1</sup>, where &Gamma; is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function) and <em>&lambda;</em> &le; 1. <br>
The (main) NB distribution is used to determine the min/max coverage depths (as ruled by option `-p`) to assess reference regions where the consensus sequence can be trustingly built. The NB(<em>p</em>,<em>r</em>) distribution is determined by the PMF <b>P</b><sub><em>p</em>,<em>r</em></sub>(<em>x</em>) = &Gamma;(<em>r</em>+<em>x</em>) &Gamma;(<em>x</em>+1)<sup>-1</sup> &Gamma;(<em>r</em>)<sup>-1</sup> <em>p</em><sup><em>x</em></sup> (1-<em>p</em>)<sup><em>r</em></sup>. However, when there is no overdispersion (i.e. the NB parameter <em>r</em> tends to infinity), the NB distribution is replaced by the Generalized Poisson (GP) one. The GP(<em>&lambda;'</em>,<em>&rho;</em>) distribution is here determined by the PMF <b>P</b><sub><em>&lambda;'</em>,<em>&rho;</em></sub>(<em>x</em>) = <em>&lambda;'</em> (<em>&lambda;'</em>+<em>&rho;x</em>)<sup><em>x</em>-1</sup> <em>e</em><sup>-<em>&lambda;'</em>-<em>&rho;x</em></sup> &Gamma;(<em>x</em>+1)<sup>-1</sup>, where <em>&rho;</em> < 0; when <em>&rho;</em> = 0, GP(<em>&lambda;'</em>,0) reduces to a Poisson distribution with parameter <em>&lambda;'</em> (for more details, see e.g. Consul and Shoukri 1985). <br>
From the above formalizations, the Poisson+NB theoretical distribution is therefore determined by the PMF <em>w</em> <b>P</b><sub><em>&lambda;</em></sub>(<em>x</em>) + (1-<em>w</em>) <b>P</b><sub><em>p</em>,<em>r</em></sub>(<em>x</em>). The values of the different parameters <em>w</em>, <em>&lambda;</em>, <em>p</em> and <em>r</em> are written into output files *.cov.txt. Of note, such statistical results can be useful when _SAM2MAP_ is used to perform a genome coverage profile analysis (e.g. Lindner et al. 2013).
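
The PMFs stated above can be evaluated numerically with a short sketch such as the one below. It is only an illustration of the formulas as written in this README, not the code used by _SAM2MAP_: the function names and the parameter values in the usage lines are hypothetical, and returning 0 when <em>&lambda;'</em>+<em>&rho;x</em> &le; 0 in the GP case is an assumption of this sketch (a common convention when <em>&rho;</em> < 0).

```python
import math

def poisson_pmf(x, lam):
    """Poisson PMF: lam^x * exp(-lam) / Gamma(x+1), computed in log space."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def nb_pmf(x, p, r):
    """NB(p, r) PMF: Gamma(r+x) / (Gamma(x+1) Gamma(r)) * p^x * (1-p)^r."""
    log_pmf = (math.lgamma(r + x) - math.lgamma(x + 1) - math.lgamma(r)
               + x * math.log(p) + r * math.log1p(-p))
    return math.exp(log_pmf)

def gp_pmf(x, lam, rho):
    """GP(lam, rho) PMF: lam * (lam + rho*x)^(x-1) * exp(-lam - rho*x) / Gamma(x+1)."""
    base = lam + rho * x
    if base <= 0:
        # Assumption of this sketch: zero mass once lam + rho*x is no longer positive.
        return 0.0
    return math.exp(math.log(lam) + (x - 1) * math.log(base)
                    - base - math.lgamma(x + 1))

def mixture_pmf(x, w, lam, p, r):
    """Poisson+NB mixture: w * P_lam(x) + (1 - w) * P_{p,r}(x)."""
    return w * poisson_pmf(x, lam) + (1 - w) * nb_pmf(x, p, r)

# Hypothetical parameter values, chosen only for illustration.
w, lam, p, r = 0.05, 0.5, 0.5, 30.0
total = sum(mixture_pmf(x, w, lam, p, r) for x in range(500))
nb_mean = sum(x * nb_pmf(x, p, r) for x in range(500))
print(f"mixture mass over 0..499: {total:.6f}")
print(f"NB mean (expected r*p/(1-p) = {r * p / (1 - p):.1f}): {nb_mean:.4f}")
```

Working with `math.lgamma` in log space avoids overflow of the gamma terms at large coverage depths. With the parameterization above, the NB mean is <em>r</em><em>p</em>/(1-<em>p</em>), which gives a quick sanity check when reading the <em>p</em> and <em>r</em> values reported in a *.cov.txt file.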
......