Commit 902e4958 authored by Alexis CRISCUOLO

1.0

parent 245214b1
@@ -82,9 +82,9 @@ Run _GenoMed_ without option to read the following documentation:
## Notes
* In brief, _GenoMed_ uses the tool [_mash_](https://mash.readthedocs.io/en/latest/) to compute all pairwise _p_-distances between genomes, and then transforms them into EI/F81 evolutionary distances (see Criscuolo 2020). To obtain accurate _p_-distance estimates with [_mash_](https://mash.readthedocs.io/en/latest/), the sketch size is defined as the average genome length, and the _k_-mer length _k_ as the integer part (floor) of log<sub>4</sub>&nbsp;(_m_<sup>2</sup>-_m_), where _m_ is the maximum genome length (this optimal estimate of _k_ is derived from Formula 1 in Fofanov et al. 2004). These pairwise evolutionary distances are finally used to compute the average distance &delta;<sub>_g_</sub> of each genome _g_ to all other ones. The medoid genome is the one that minimizes &delta;<sub>_g_</sub>.
* The medoid genome inference is assessed by an original bootstrap procedure. The initial set of genomes is first sampled with replacement (default: 500 resamplings). Next, the medoid genome is determined for each resampled set. Finally, a (kind of) _p_-value is defined for each genome as the proportion of resampled sets in which it was the medoid.
* All input files (at least 3) should be in FASTA format and uncompressed. _GenoMed_ can take many input files specified using [filename expansion](https://tldp.org/LDP/abs/html/globbingref.html), e.g. `dirname/*.fasta`.
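
The parameter choices and the medoid/bootstrap logic described in these notes can be illustrated with a short Python sketch. This is not _GenoMed_'s actual code: the function names are hypothetical, the distance matrix is assumed to already hold the pairwise evolutionary distances (the _mash_ step and the EI/F81 transformation are not reproduced), and the handling of duplicated genomes in a resampled set (a duplicate counts as a distance of 0 to itself) as well as the tie-breaking rule (lowest index wins) are assumptions of this sketch.

```python
import random


def kmer_length(max_genome_length):
    """k = floor(log4(m^2 - m)), with m the maximum genome length (m >= 2)."""
    x = max_genome_length * max_genome_length - max_genome_length
    # Exact integer arithmetic: floor(log2(x)) is x.bit_length() - 1,
    # and floor(log4(x)) is floor(log2(x)) // 2.
    return (x.bit_length() - 1) // 2


def sketch_size(genome_lengths):
    """Sketch size taken as the average genome length (rounded down here)."""
    return sum(genome_lengths) // len(genome_lengths)


def medoid(dist, sample):
    """Index of the genome in `sample` with the smallest average distance.

    `dist` is a symmetric matrix with a zero diagonal; `sample` is a list of
    genome indices, possibly with repeats (assumption: repeats are kept).
    """
    size = len(sample)
    best, best_delta = None, float("inf")
    for g in sorted(set(sample)):
        # Average distance of g to the other entries of the sample.
        delta = sum(dist[g][h] for h in sample) / (size - 1)
        if delta < best_delta:  # strict comparison: lowest index wins ties
            best, best_delta = g, delta
    return best


def bootstrap_support(dist, replicates=500, seed=0):
    """Proportion of resampled sets in which each genome is the medoid."""
    n = len(dist)
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(replicates):
        sample = rng.choices(range(n), k=n)  # sampling with replacement
        counts[medoid(dist, sample)] += 1
    return [c / replicates for c in counts]


# Toy example with three genomes (made-up lengths and distances).
lengths = [4_800_000, 5_000_000, 5_200_000]
toy_dist = [
    [0.0, 0.1, 0.2],
    [0.1, 0.0, 0.1],
    [0.2, 0.1, 0.0],
]
print(kmer_length(max(lengths)))           # 22
print(sketch_size(lengths))                # 5000000
print(medoid(toy_dist, [0, 1, 2]))         # 1
print(bootstrap_support(toy_dist))
```

The integer form of the logarithm avoids floating-point rounding for large genome lengths; the support values returned by `bootstrap_support` sum to 1 across genomes.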