@@ -36,14 +36,14 @@ This will create the executable binary file `req` that could be launched with th
Launch _REQ_ without any option to read the following documentation:
```
USAGE: REQ <dfile> <tfile> [-v]
<dfile> a distance matrice file in either PHYLIP lower-
triangular or square format
<tfile> an unrooted binary phylogenetic tree file with
no confidence value at branches in NEWICK format
-v verbose mode
USAGE: REQ <dfile> <tfile> <outfile> [-v]
<dfile> distance matrix file in either PHYLIP lower-triangular or
square format
<tfile> unrooted binary phylogenetic tree file with no confidence
value at branches in NEWICK format
<outfile> outfile name
-v verbose mode
```
## Example
...
...
@@ -54,7 +54,7 @@ The directory _example_ contains two files from the study of [Garcia-Hermoso et
The following command line writes into the file _tree.req.t_ the phylogenetic tree from _example/tree.t_ with the rate of elementary quartets at each internal branch estimated from the distance matrix _example/matrix.d_:
author={Garcia-Hermoso D and Criscuolo A and Lee SC and Legrand M and Chaouat M and Denis B and Lafaurie M and Rouveau M and Soler C and Schaal JV and Mimoun M and Mebazaa A and Heitman J and Dromer F and Brisse S and Bretagne S and Alanio A},
year={2018},
title={Outbreak of invasive wound mucormycosis in a burn unit due to multiple strains of Mucor circinelloides f. circinelloides resolved by whole-genome sequencing},
journal={MBio},
volume={9},
number={2},
pages={e00573-18},
doi={10.1128/mBio.00573-18},
url={http://mbio.asm.org/content/9/2/e00573-18}
}
@article{gascuel1997,
author={Gascuel O},
year={1997},
title={BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data},
-name:Institut Pasteur – Bioinformatics and Biostatistics Hub – C3BI, USR 3756 IP CNRS – 25-28 Rue du Docteur Roux, 75015 Paris, France
index:1
date:13 July 2018
bibliography:paper.bib
---
# Summary
``REQ`` is a program for quickly estimating a confidence value at each branch of a distance-based phylogenetic tree. Branch support assessment is commonly based on bootstrap procedures [@felsenstein1985; @makarenkov2010; @lemoine2018]. Unfortunately, because they rely on repeated resampling of aligned characters, such procedures require long running times, despite some recent advances [@minh2013; @hoang2018a; @hoang2018b]. To achieve faster running times, direct branch support methods have already been developed for character-based approaches that optimize maximum-parsimony or maximum-likelihood criteria [@bremer1988; @bremer1994; @anisimova2006; @anisimova2011]. However, to our knowledge, no practical implementation of direct branch support methods is currently available for distance-based approaches.
Distance-based approaches proceed in two steps: first, an evolutionary distance is estimated between each pair of (biological) objects; next, an algorithm is used to infer the tree with branch lengths that best fits the evolutionary distance matrix [@pardi2016]. Because of their speed, distance-based methods are widely used for inferring phylogenetic trees. Moreover, as such algorithms only need a distance matrix, they allow phylogenetic analyses to be carried out from a wide range of data types, e.g. DNA-DNA hybridization experiments [@krajewski1990], gene orders [@wang2006; @house2014], gene content [@spencer2007], or unaligned genome sequences [@chapus2005; @henz2005; @cohen2012; @garcia2018]. Nevertheless, in such cases, standard bootstrap-based methods cannot be used for estimating branch confidence values.
In order to fill this void, the program ``REQ`` was developed. This tool estimates the rate of elementary quartets (REQ) for each branch of a given phylogenetic tree from the associated distance matrix, as described by [@guenoche2001]. This method simply computes the proportion of four-leaf subtrees (i.e. quartets) induced by every internal branch that are supported by the four-point condition applied to the six corresponding pairwise evolutionary distances [@zaretskii1965; @buneman1971]. Unlike bootstrap-based confidence supports, this measure therefore does not rely on random sampling. The closer this measure is to 1, the more strongly the corresponding branch is supported by the pairwise evolutionary distances.
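As an illustration of the measure described above, the following minimal Java sketch counts, for one internal branch, the quartets whose topology is supported by the four-point condition. It is not the code of ``REQ`` itself: the class and method names (`QuartetSupportSketch`, `supports`, `rate`) are hypothetical, the four leaf sets around the branch are assumed to be given as index arrays, and a quartet is assumed to be supported only when the sum for ab|cd is strictly the smallest of the three pairings (the handling of ties in ``REQ`` may differ).

```java
public class QuartetSupportSketch {

    /** True when the four-point condition on dm supports the topology ab|cd (strict inequality: an assumption). */
    static boolean supports(double[][] dm, int a, int b, int c, int d) {
        double abcd = dm[a][b] + dm[c][d]; // pairing induced by the branch
        double acbd = dm[a][c] + dm[b][d]; // first alternative pairing
        double adbc = dm[a][d] + dm[b][c]; // second alternative pairing
        return abcd < Math.min(acbd, adbc);
    }

    /** Proportion of quartets (a,b,c,d), one leaf per set, that support ab|cd. */
    static double rate(double[][] dm, int[] sta, int[] stb, int[] stc, int[] std) {
        long supported = 0, total = 0;
        for (int a : sta)
            for (int b : stb)
                for (int c : stc)
                    for (int d : std) {
                        ++total;
                        if (supports(dm, a, b, c, d)) ++supported;
                    }
        return total == 0 ? 0.0 : (double) supported / total;
    }

    public static void main(String[] args) {
        // Additive distances on the four-leaf tree ((0:1,1:1):1,(2:1,3:1)):
        // d(0,1) = d(2,3) = 2, every other pair = 3.
        double[][] dm = {
            {0, 2, 3, 3},
            {2, 0, 3, 3},
            {3, 3, 0, 2},
            {3, 3, 2, 0}
        };
        System.out.println(rate(dm, new int[]{0}, new int[]{1}, new int[]{2}, new int[]{3}));
    }
}
```

With these toy distances, the split 01|23 is supported by its single quartet, whereas the same call with the leaf sets arranged as 02|13 yields a rate of 0.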
The program ``REQ`` is available on [GitLab](https://gitlab.pasteur.fr/GIPhy/REQ) under the [GNU GPLv3 licence](https://www.gnu.org/licenses/gpl-3.0.en.html). Implemented in Java, ``REQ`` can be used on every operating system with a simple command line. ``REQ`` only needs two input files: a distance matrix file in either PHYLIP lower-triangular or square format, and a phylogenetic tree file in NEWICK format created from the distance matrix by any standard phylogenetic tree reconstruction method, e.g. neighbor-joining [@saitou1987; @studier1988], BioNJ [@gascuel1997], or FastME [@desper2002]. Although computing the REQ value for every branch of a phylogenetic tree on *n* leaves takes *O*(*n*<sup>5</sup>) time, ``REQ`` runs quickly in practice (e.g. ~5 seconds with *n* = 500 on a standard computer) and can therefore be used with large phylogenetic trees.
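To make the lower-triangular input format concrete, here is a minimal Java sketch that turns such a matrix into a symmetric array. It is an illustration only, not the parser used by ``REQ``: the class and method names are hypothetical, and it assumes that the first non-blank line holds the number of objects, that each following row is written on a single line, and that labels contain no whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class PhylipLowerSketch {

    /** Parses a PHYLIP lower-triangular matrix (given as text) into a symmetric n x n array. */
    static double[][] parseLowerTriangular(String text) {
        // keep non-blank lines only
        List<String> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.trim().isEmpty()) rows.add(line.trim());
        }
        if (rows.isEmpty()) throw new IllegalArgumentException("matrix file is empty");

        int n = Integer.parseInt(rows.get(0));
        if (rows.size() < n + 1) throw new IllegalArgumentException("expected " + n + " rows");

        double[][] dm = new double[n][n];
        for (int i = 0; i < n; i++) {
            // row i: a label followed by the i distances to the previous objects
            String[] tok = rows.get(i + 1).split("\\s+");
            if (tok.length < i + 1) throw new IllegalArgumentException("row " + i + " is too short");
            for (int j = 0; j < i; j++) {
                dm[i][j] = dm[j][i] = Double.parseDouble(tok[j + 1]);
            }
        }
        return dm;
    }

    public static void main(String[] args) {
        double[][] dm = parseLowerTriangular("3\nA\nB 1.0\nC 2.0 3.0\n");
        System.out.println(dm[2][1]); // distance between C and B
    }
}
```

Filling both `dm[i][j]` and `dm[j][i]` keeps later lookups independent of the order in which two leaves are given.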
//### man ############################################################################################################################################################################
while(true)try{if((line=in.readLine().trim()).length()!=0)break;}catch(NullPointerException e){System.out.println("matrix file is empty");System.exit(1);}
//### estimating the rate of EQ re for each internal branch e ########################################################################################################################
tro=new StringBuilder(tr.toString());
//# first tr rooting: ((ST1),(ST2),(ST3)); => (((ST1),(ST2)),(ST3)); ###############################################################
//# | |
//# sup last
tr=tr.insert(0,'(');sup=apc(tr,tr.lastIndexOf(")"));tr=tr.insert(sup,')');//# NOTE: closing parenthesis at index 'sup' should not be considered for REQ calculations
tr=tr.insert(0,'(');last=apc(tr,tr.lastIndexOf(")"));tr=tr.insert(last,')');// closing parenthesis at index 'last' should not be considered for EWDQ calculations
//### estimating the rate of EWDQ re for each internal branch e ######################################################################################################################
//# parsing every internal branch e at index u in order to obtain lbl(STa) lbl(STb) | lbl(STc) lbl(T)-lbl(STa U STb U STc) ###########################################
//# second tr rooting: ((ST1),(ST2),(ST3)); => ((ST1),((ST2),(ST3))); ##########################################################
//# comparing every quartet ab|cd with the topology induced by dm, where a,b,c,d belongs to STa,STb,STc,STd, respectively #############################################
//# storing re value inside are ########################################################################################################################################
//### writing re values into nwk #####################################################################################################################################################