Commit 04bdca8f by Alexis CRISCUOLO

### 0.3.3.1

parent fe2cb809
@@ -180,9 +180,9 @@ Run _SAM2MAP_ without option to read the following documentation:
* By default, all sequenced bases with Phred score < 20 are not considered (option `-q`), thereby minimizing the impact of sequencing errors when computing the consensus sequence. By default, all read alignments with mapping quality < 20 (Phred-scaled, as assessed by the read mapping program) are also discarded (option `-Q`), thereby excluding from the consensus sequence every region with low mappability (e.g. low-complexity or repeated regions).
* _SAM2MAP_ estimates a Poisson+Negative Binomial (NB) theoretical distribution from the observed read coverage distribution, and writes the results into an output file (`.cov.txt` file extension).
The Poisson distribution is dedicated to the observed (near-)zero read coverage distribution (called the coverage tail distribution in the output files `*.cov.txt`). It is determined by the probability mass function (PMF) $P_\lambda(x) = \lambda^x e^{-\lambda}\,\Gamma(x+1)^{-1}$, where Γ is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function).
The (main) NB distribution is used to determine the min/max coverage depths (as controlled by option `-p`) that delimit the reference regions where the consensus sequence can be reliably built. The NB(p,r) distribution is determined by the PMF $P_{p,r}(x) = \Gamma(r+x)\,\Gamma(x+1)^{-1}\,\Gamma(r)^{-1}\,p^x (1-p)^r$. However, when the observed read coverage distribution is not overdispersed (i.e. the NB parameter r tends to infinity), the NB distribution is replaced by the Generalized Poisson (GP) one. The GP(λ',ρ) distribution is here determined by the PMF $P_{\lambda',\rho}(x) = \lambda' (\lambda'+\rho x)^{x-1} e^{-\lambda'-\rho x}\,\Gamma(x+1)^{-1}$, where ρ < 0; when ρ = 0, GP(λ',0) reduces to a Poisson distribution of parameter λ' (for more details, see e.g. Consul and Shoukri 1985).
From the above formalizations, the Poisson+NB theoretical distribution is therefore determined by the PMF $w P_\lambda(x) + (1-w) P_{p,r}(x)$. The values of the parameters w, λ, p and r are written into the output files `*.cov.txt`. Of note, such statistical results can also be used jointly with a genome coverage profile analysis (e.g. Lindner et al. 2013).
The Poisson distribution is dedicated to the observed (near-)zero read coverage distribution (called the coverage tail distribution in the output files `*.cov.txt`). It is determined by the probability mass function (PMF) $P_\lambda(x) = \lambda^x e^{-\lambda}\,\Gamma(x+1)^{-1}$, where Γ is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function) and λ ≤ 1.
The (main) NB distribution is used to determine the min/max coverage depths (as controlled by option `-p`) that delimit the reference regions where the consensus sequence can be reliably built. The NB(p,r) distribution is determined by the PMF $P_{p,r}(x) = \Gamma(r+x)\,\Gamma(x+1)^{-1}\,\Gamma(r)^{-1}\,p^x (1-p)^r$. However, when there is no overdispersion (i.e. the NB parameter r tends to infinity), the NB distribution is replaced by the Generalized Poisson (GP) one. The GP(λ',ρ) distribution is here determined by the PMF $P_{\lambda',\rho}(x) = \lambda' (\lambda'+\rho x)^{x-1} e^{-\lambda'-\rho x}\,\Gamma(x+1)^{-1}$, where ρ < 0; when ρ = 0, GP(λ',0) reduces to a Poisson distribution with parameter λ' (for more details, see e.g. Consul and Shoukri 1985).
From the above formalizations, the Poisson+NB theoretical distribution is therefore determined by the PMF $w P_\lambda(x) + (1-w) P_{p,r}(x)$. The values of the parameters w, λ, p and r are written into the output files `*.cov.txt`. Of note, such statistical results can be useful when _SAM2MAP_ is used to perform a genome coverage profile analysis (e.g. Lindner et al. 2013).
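The default base and alignment filtering (options `-q` and `-Q`) can be illustrated with the minimal Python sketch below. It is not _SAM2MAP_'s own code: the function names `phred_scores` and `usable_positions` are hypothetical, and the sketch assumes the standard SAM encoding of base qualities (ASCII Phred+33 in the QUAL field) and a Phred-scaled MAPQ field. Special SAM values (QUAL `*`, MAPQ 255 meaning "unavailable") are deliberately not handled.

```python
from typing import List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a SAM QUAL string (ASCII Phred+33 by default) into Phred scores."""
    return [ord(c) - offset for c in qual]


def usable_positions(qual: str, mapq: int,
                     min_base_q: int = 20, min_map_q: int = 20) -> List[bool]:
    """Hypothetical helper: flag which bases of one aligned read would be kept.

    The whole alignment is discarded when its mapping quality is below
    min_map_q (cf. option -Q); otherwise each base whose Phred score is
    below min_base_q is discarded (cf. option -q).
    """
    if mapq < min_map_q:
        return [False] * len(qual)
    return [score >= min_base_q for score in phred_scores(qual)]
```

For example, `usable_positions("I5#4", 60)` keeps the first two bases (Phred 40 and 20) and drops the last two (Phred 2 and 19), while any alignment with MAPQ below 20 is dropped entirely.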
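The PMFs described above can be evaluated numerically as in the following Python sketch (hypothetical function names; this is not _SAM2MAP_'s own implementation). Working in log space with `math.lgamma` avoids overflowing Γ at high coverage depths. For the GP PMF with ρ < 0, the sketch assumes the usual convention that the probability is 0 whenever λ'+ρx ≤ 0 (the remaining probabilities then sum only approximately to 1).

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson PMF: lam^x e^(-lam) Gamma(x+1)^(-1), computed in log space."""
    if x < 0:
        return 0.0
    if lam == 0.0:
        # Degenerate case: all probability mass at x = 0.
        return 1.0 if x == 0 else 0.0
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def nb_pmf(x: int, p: float, r: float) -> float:
    """NB PMF: Gamma(r+x) Gamma(x+1)^(-1) Gamma(r)^(-1) p^x (1-p)^r, with 0 < p < 1, r > 0."""
    if x < 0:
        return 0.0
    log_coef = math.lgamma(r + x) - math.lgamma(x + 1) - math.lgamma(r)
    return math.exp(log_coef + x * math.log(p) + r * math.log1p(-p))


def gp_pmf(x: int, lam1: float, rho: float) -> float:
    """GP PMF: lam1 (lam1 + rho x)^(x-1) e^(-lam1 - rho x) Gamma(x+1)^(-1).

    Assumption: when rho < 0, the probability is set to 0 whenever
    lam1 + rho * x <= 0 (truncated support).
    """
    if x < 0:
        return 0.0
    base = lam1 + rho * x
    if base <= 0.0:
        return 0.0
    log_p = (math.log(lam1) + (x - 1) * math.log(base)
             - lam1 - rho * x - math.lgamma(x + 1))
    return math.exp(log_p)


def mixture_pmf(x: int, w: float, lam: float, p: float, r: float) -> float:
    """Poisson+NB mixture PMF: w P_lam(x) + (1-w) P_{p,r}(x)."""
    return w * poisson_pmf(x, lam) + (1.0 - w) * nb_pmf(x, p, r)
```

With this NB parameterization the mean coverage is r p / (1-p), so e.g. p = 0.6 and r = 20 center the main component around depth 30. When the GP distribution replaces the NB one, `nb_pmf` in `mixture_pmf` would be swapped for `gp_pmf`.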