Commit 69575d3f authored by Alexis CRISCUOLO

v1.3.190304ac

parent 35f8c05b
......@@ -20,15 +20,17 @@ This will create the executable jar file `REQ.jar` that could be launched with t
java -jar REQ.jar [files]
```
#### Building a native code binary
#### Building a native executable
On computers with the [GNU compiler GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcj/) installed, a binary could also be built. In a command-line window, go to the _src_ directory, and type:
On computers with [GraalVM](https://www.graalvm.org/downloads/) installed, a native executable can also be built. In a command-line window, go to the _src_ directory, and type:
```bash
make
javac REQ.java
native-image -H:Name=REQ -H:-MultiThreaded REQ
rm REQ.class
```
This will create the executable binary file `req` that could be launched with the following command line model:
This will create the native executable `REQ` that could be launched with the following command line model:
```bash
./req [files]
./REQ [files]
```
## Usage
......@@ -56,7 +58,7 @@ The following command line writes into the file _tree.req.t_ the phylogenetic tr
```bash
REQ example/matrix.d example/tree.t tree.req.t -v
```
Because the option -v is set, the verbose mode will output the tree topology in NEWICK format, the list of leaf names, and, for each internal branch, the leaf quadripartition together with the rate of elementary quartets _Re_:
Because the option `-v` is set, the verbose mode will output the tree topology in NEWICK format, the list of leaf names, and, for each internal branch, the leaf quadripartition followed by the rate of elementary quartets _Re_:
```
# (((((((17,18),16),((20,21),19)),(((4,((6,7),5)),(((2,3),1),0)),(8,9))),(10,11)),(12,13)),14,15);
0: P07_621_SLS
......@@ -95,7 +97,7 @@ Because the option -v is set, the verbose mode will output the tree topology in
[5,7,6,4][0,1,3,2][9,8][21,20,19,18,17,16,15,14,13,12,11,10] Re=1.000 (384/384)
[8][9][0,1,3,2,5,7,6,4][21,20,19,18,17,16,15,14,13,12,11,10] Re=1.000 (96/96)
[0,1,3,2,5,7,6,4][9,8][19,21,20,16,18,17][15,14,13,12,11,10] Re=0.453 (261/576)
[19,21,20,16,18,17][9,8,0,1,3,2,5,7,6,4][11,10][15,14,13,12] Re=0.488 (234/480)ù
[19,21,20,16,18,17][9,8,0,1,3,2,5,7,6,4][11,10][15,14,13,12] Re=0.487 (234/480)
[10][11][9,8,0,1,3,2,5,7,6,4,19,21,20,16,18,17][15,14,13,12] Re=1.000 (64/64)
[9,8,0,1,3,2,5,7,6,4,19,21,20,16,18,17][11,10][13,12][15,14] Re=0.594 (76/128)
[12][13][11,10,9,8,0,1,3,2,5,7,6,4,19,21,20,16,18,17][15,14] Re=1.000 (36/36)
......
---
title: 'REQ: assessing branch supports of a distance-based phylogenetic tree with the rate of elementary quartets'
tags:
  - phylogenetics
  - branch support
  - evolutionary distances
  - quartets
  - Java
authors:
  - name: Alexis Criscuolo
    orcid: 0000-0002-8212-5215
    affiliation: 1
affiliations:
  - name: Institut Pasteur – Bioinformatics and Biostatistics Hub – C3BI, USR 3756 IP CNRS – 25-28 Rue du Docteur Roux, 75015 Paris, France
    index: 1
date: 13 July 2018
bibliography: paper.bib
---
# Summary
*REQ* is a program for quickly estimating a confidence value at each branch of a distance-based phylogenetic tree. Branch support assessment is commonly based on bootstrap procedures [@felsenstein1985; @makarenkov2010; @lemoine2018]. Unfortunately, because they rely on repeated resampling of aligned characters, such procedures require long running times, despite some recent advances [@minh2013; @hoang2018a; @hoang2018b]. To achieve faster running times, direct branch support methods have already been developed for character-based approaches that optimize maximum-parsimony or maximum-likelihood criteria [@bremer1988; @bremer1994; @anisimova2006; @anisimova2011]. However, to our knowledge, no practical implementation of direct branch support methods is currently available for distance-based approaches.
Distance-based approaches proceed in two steps: first, an evolutionary distance is estimated between each pair of (biological) objects; next, an algorithm is used to infer the tree with branch lengths that best fits the evolutionary distance matrix [@pardi2016]. Because of their speed, distance-based methods are widely used for inferring phylogenetic trees. Moreover, as such algorithms only need a distance matrix, they allow phylogenetic analyses to be carried out from a wide range of data types, e.g. DNA-DNA hybridization experiments [@krajewski1990], gene orders [@wang2006; @house2014], gene content [@spencer2007], or unaligned genome sequences [@chapus2005; @henz2005; @cohen2012; @garcia2018]. Nevertheless, in such cases, standard bootstrap-based methods cannot be used for estimating branch confidence values.
To fill this void, the program *REQ* was developed. This tool estimates the rate of elementary quartets (REQ) for each branch of a given phylogenetic tree from the associated distance matrix, as described by @guenoche2001. This method simply computes the proportion of four-leaf subtrees (i.e. quartets) induced by every internal branch that are supported by the four-point condition applied to the six corresponding pairwise evolutionary distances [@zaretskii1965; @buneman1971]. Therefore, unlike bootstrap-based confidence supports, this measure does not rely on random sampling. The closer this measure is to 1, the more strongly the corresponding branch is supported by the pairwise evolutionary distances.
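The quartet count described above can be illustrated with a minimal Java sketch. It is not taken from the *REQ* source: the class and method names are hypothetical, the distance matrix is assumed to be a full symmetric array, and a quartet is assumed to support the branch only when the sum for the induced split is strictly smaller than the two alternative sums (how *REQ* itself handles ties is not stated here).

```java
public class QuartetRateSketch {

    // Four-point condition for the split ab|cd: the sum d(a,b)+d(c,e) must be
    // strictly smaller than the two alternative pairings (assumption: ties do not count).
    static boolean supports(double[][] d, int a, int b, int c, int e) {
        double abCd = d[a][b] + d[c][e];
        double acBd = d[a][c] + d[b][e];
        double adBc = d[a][e] + d[b][c];
        return abCd < Math.min(acBd, adBc);
    }

    // Rate of elementary quartets for one internal branch whose removal, together
    // with its four neighbouring branches, yields the leaf sets sa, sb | sc, sd.
    static double rate(double[][] d, int[] sa, int[] sb, int[] sc, int[] sd) {
        long up = 0;
        for (int a : sa)
            for (int b : sb)
                for (int c : sc)
                    for (int e : sd)
                        if (supports(d, a, b, c, e)) ++up;
        // number of quartets with one leaf in each of the four sets
        double dn = (double) sa.length * sb.length * sc.length * sd.length;
        return up / dn;
    }

    public static void main(String[] args) {
        // additive distances on the tree ((0,1),(2,3)) with unit branch lengths
        double[][] d = {
            {0, 2, 3, 3},
            {2, 0, 3, 3},
            {3, 3, 0, 2},
            {3, 3, 2, 0}
        };
        // the single quartet 01|23 satisfies the condition, so this prints 1.0
        System.out.println(rate(d, new int[]{0}, new int[]{1}, new int[]{2}, new int[]{3}));
    }
}
```

The denominator is the product of the four leaf-set sizes, which matches the ratios reported in the verbose output of the README example (e.g. 1 × 1 × 8 × 12 = 96 quartets).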
The program *REQ* is available on [GitLab](https://gitlab.pasteur.fr/GIPhy/REQ) under the [GNU GPLv3 licence](https://www.gnu.org/licenses/gpl-3.0.en.html). Implemented in Java, *REQ* can be used on any operating system with a simple command line. *REQ* only needs two input files: a distance matrix file in either PHYLIP lower-triangular or square format, and a phylogenetic tree file in NEWICK format created from the distance matrix by any standard phylogenetic tree reconstruction method, e.g. neighbor-joining [@saitou1987; @studier1988], BioNJ [@gascuel1997], or FastME [@desper2002]. Although computing the REQ value for every branch of a phylogenetic tree on *n* leaves requires $O(n^5)$ time, *REQ* runs quickly in practice (e.g. ~5 seconds with *n* = 500 on a standard computer) and can therefore be used with large phylogenetic trees.
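As an illustration of the matrix input format, the following Java sketch reads a PHYLIP-style matrix from a string and keeps only the lower triangle. It is a simplified stand-in, not the parser used by *REQ*: the class name, the `parse` method and the error messages are hypothetical, and labels are assumed to contain no whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class LowerTriangularSketch {

    // Parses a PHYLIP-style matrix: first non-blank line is the number of taxa n,
    // then n rows, each made of a label followed by distances. Only the first i
    // values of row i are read, so square matrices are accepted as well.
    static double[][] parse(String text, List<String> labels) {
        List<String> rows = new ArrayList<>();
        for (String line : text.split("\\R"))
            if (!line.trim().isEmpty()) rows.add(line.trim());
        if (rows.isEmpty()) throw new IllegalArgumentException("matrix is empty");
        int n = Integer.parseInt(rows.get(0));
        if (rows.size() - 1 != n)
            throw new IllegalArgumentException("expected " + n + " rows, found " + (rows.size() - 1));
        double[][] dm = new double[n][];
        for (int i = 0; i < n; i++) {
            String[] tok = rows.get(i + 1).split("\\s+");
            if (tok.length < i + 1)
                throw new IllegalArgumentException("row " + i + " is too short");
            labels.add(tok[0]);
            dm[i] = new double[i];               // row i stores d(i,0) .. d(i,i-1)
            for (int j = 0; j < i; j++) dm[i][j] = Double.parseDouble(tok[j + 1]);
        }
        return dm;
    }

    public static void main(String[] args) {
        List<String> labels = new ArrayList<>();
        double[][] dm = parse("3\nA\nB 0.1\nC 0.2 0.3\n", labels);
        System.out.println(labels + " d(C,B)=" + dm[2][1]);
    }
}
```

Storing only the lower triangle halves the memory needed for large matrices, at the cost of ordering the two indices before each lookup.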
# References
GCJ=gcj
GCJFLAGS=-fsource=1.6 -march=native -msse2 -O3 -minline-all-stringops -fomit-frame-pointer -momit-leaf-frame-pointer -fstrict-aliasing -fno-store-check -fno-bounds-check -funroll-all-loops -Wall
OTHERFLAGS=-funsafe-math-optimizations -ffast-math
MAIN=REQ
EXEC=req
REQ: REQ.java
$(GCJ) $(GCJFLAGS) --main=$(MAIN) $(MAIN).java -o $(EXEC)
......@@ -3,7 +3,7 @@
REQ: estimating the rate of elementary quartets (REQ) for each
branch of a phylogenetic tree from a distance matrix
Copyright (C) 2017,2018 Alexis Criscuolo
Copyright (C) 2017,2018,2019 Alexis Criscuolo
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
......@@ -35,12 +35,13 @@ import java.text.*;
public class REQ {
// constants
final static String VERSION = "v1.2.180713ac";
final static String VERSION = "v1.3.190304ac";
final static int INF = Integer.MAX_VALUE;
// io
static BufferedReader in;
static BufferedWriter out;
static File dfile, tfile;
static boolean verbose;
static NumberFormat df;
......@@ -64,6 +65,8 @@ public class REQ {
//### man ############################################################################################################################################################################
if ( args.length < 3 ) {
System.out.println("");
System.out.println(" REQ " + VERSION);
System.out.println("");
System.out.println(" USAGE: REQ <dfile> <tfile> <outfile> [-v]");
System.out.println("");
......@@ -81,21 +84,23 @@ public class REQ {
df = NumberFormat.getNumberInstance(Locale.US); df.setGroupingUsed(false); df.setMaximumFractionDigits(3); df.setMinimumFractionDigits(3);
verbose = ( (args.length > 3) && args[3].equals("-v") ) ? true : false;
if ( verbose ) System.out.println("REQ " + VERSION);
if ( ! (dfile = new File(args[0])).exists() ) { System.err.println("distance matrix file does not exist: " + args[0]); System.exit(1); }
if ( ! (tfile = new File(args[1])).exists() ) { System.err.println("tree file does not exist: " + args[1]); System.exit(1); }
//### reading distance matrix dm #####################################################################################################################################################
in = new BufferedReader(new FileReader(new File(args[0])));
in = new BufferedReader(new FileReader(dfile));
while ( true ) try { if ( (line=in.readLine().trim()).length() != 0 ) break; } catch ( NullPointerException e ) { System.out.println("matrix file is empty"); System.exit(1); }
try { n = Integer.parseInt(line); } catch ( NumberFormatException e ) { System.out.println("matrix file is incorrectly formatted"); System.exit(1); }
try { n = Integer.parseInt(line); } catch ( NumberFormatException e ) { System.out.println("distance matrix file is incorrectly formatted: " + args[0]); System.exit(1); }
if ( n > 32760 ) { System.out.println("too many taxa (>32760)"); System.exit(1); }
lbl = new ArrayList<String>(n); dm = new float[n][]; i = -1;
while ( true ) {
try { line = in.readLine().trim(); } catch ( NullPointerException e ) { in.close(); break; }
split = line.split("\\s+"); lbl.add(split[0]); dm[++i] = new float[i]; j = 0; while ( ++j <= i ) dm[i][--j] = Float.parseFloat(split[++j]);
}
if ( ++i != n ) { System.out.println("matrix file is incorrectly formatted"); System.exit(1); }
if ( ++i != n ) { System.out.println("distance matrix file is incorrectly formatted: " + args[0]); System.exit(1); }
//### reading phylogenetic tree nwk ##################################################################################################################################################
nwk = new StringBuilder(""); in = new BufferedReader(new FileReader(new File(args[1])));
nwk = new StringBuilder(""); in = new BufferedReader(new FileReader(tfile));
while ( true ) try { nwk = nwk.append(in.readLine().trim()); } catch ( NullPointerException e ) { in.close(); break; }
tr = new StringBuilder(nwk.toString());
......@@ -129,7 +134,7 @@ public class REQ {
if ( u == sup ) {
last = tro.lastIndexOf(")"); tr = tro.insert(last, ')'); v = apc(tr, apc(tr, last)); tr = tr.insert(++v, '(');
++last; //# NOTE: closing parenthesis at index 'last' should not be considered for REQ calculations
//if ( verbose ) System.out.println("# " + tr.toString());
/*if ( verbose ) System.out.println("# " + tr.toString());*/
sup = 0; --u; continue;
}
//# parsing every internal branch e at index u in order to obtain lbl(STa) lbl(STb) | lbl(STc) lbl(T)-lbl(STa U STb U STc) ########
......@@ -151,7 +156,7 @@ public class REQ {
}
}
//# storing re value inside are #####################################################################################################
are.add(Double.valueOf(re=up/(dn=((double)sta.length)*((double)stb.length)*stc.length*std.length)));
are.add(Double.valueOf(re=up/(dn=((double)sta.length)*((double)stb.length)*((double)stc.length)*((double)std.length))));
if ( verbose ) System.out.println((Arrays.toString(sta) + Arrays.toString(stb) + Arrays.toString(stc) + Arrays.toString(std)).replaceAll(" ","") + " Re=" + df.format(re) + " (" + ((long)up) + "/" + ((long)dn) + ")");
}
......