Commit 63b7e9f0 authored by Alexis  CRISCUOLO's avatar Alexis CRISCUOLO
Browse files

Initial commit

parents
This diff is collapsed.
# REQ
_REQ_ is a command line program written in [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/language/index.html) that allows estimating the rate of elementary quartets (REQ) for each internal branch of a phylogenetic tree from a distance matrix, as described by [Guénoche and Garreta (2001)](https://doi.org/10.1007/3-540-45727-5_5).
## Compilation and execution
The source code of _REQ_ is inside the _src_ directory and could be compiled and executed in two different ways.
#### Building an executable jar file
On computers with [Oracle JDK](http://www.oracle.com/technetwork/java/javase/downloads/index.html) (6 or higher) installed, a Java executable jar file could be created. In a command-line window, go to the _src_ directory and type:
```bash
javac REQ.java
echo Main-Class: REQ > MANIFEST.MF
jar -cmvf MANIFEST.MF REQ.jar REQ.class
rm MANIFEST.MF REQ.class
```
This will create the executable jar file `REQ.jar` that could be launched with the following command line model:
```bash
java -jar REQ.jar [files]
```
#### Building a native code binary
On computers with the [GNU compiler GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcj/) installed, a binary could also be built. In a command-line window, go to the _src_ directory, and type:
```bash
make
```
This will create the executable binary file `req` that could be launched with the following command line model:
```bash
./req [files]
```
## Usage
Launch _REQ_ without option to read the following documentation:
```
USAGE: REQ <dfile> <tfile> [-v]
<dfile> a distance matrice file in either PHYLIP lower-
triangular or square format
<tfile> an unrooted binary phylogenetic tree file with
no confidence value at branches in NEWICK format
-v verbose mode
```
## Example
The directory _example_ contains two files from the study of [Garcia-Hermoso et al. (2018)](https://doi.org/10.1128/mBio.00573-18):
* _matrix.d_: a distance matrix in PHYLIP square format estimated from 22 _Mucor circinelloides_ f. _circinelloides_ genomes,
* _tree.t_: the associated Minimum Evolution phylogenetic tree in NEWICK format.
The following command line writes into the file _tree.req.t_ the phylogenetic tree from _example/tree.t_ with the rate of elementary quartets at each internal branch estimated from the distance matrix _example/matrix.d_:
```bash
REQ example/matrix.d example/tree.t > tree.req.t
```
## References
Garcia-Hermoso D, Criscuolo A, Lee SC, Legrand M, Chaouat M, Denis B, Lafaurie M, Rouveau M, Soler C, Schaal JV, Mimoun M, Mebazaa A, Heitman J, Dromer F, Brisse S, Bretagne S, Alanio A (2018) Outbreak of invasive wound mucormycosis in a burn unit due to multiple strains of _Mucor circinelloides_ f. _circinelloides_ resolved by whole-genome sequencing. MBio, 24;9(2):e00573-18. [doi:10.1128/mBio.00573-18](https://doi.org/10.1128/mBio.00573-18).
Guénoche A, Garreta H (2001) Can we have confidence in a tree representation? _In_: Gascuel O, Sagot MF (eds) Computational Biology. Lecture Notes in Computer Science, vol 2066. Springer, Berlin, Heidelberg. [doi:10.1007/3-540-45727-5_5](https://doi.org/10.1007/3-540-45727-5_5). [[pdf]](http://iml.univ-mrs.fr/editions/biblio/guenoche/QualiTree-1.pdf)
22
P07_621_SLS 0.00000000 0.02865660 0.02870160 0.02878250 0.03950280 0.03988420 0.04011750 0.04016660 0.03901690 0.03900900 0.03904710 0.03898010 0.03899590 0.03895910 0.03899060 0.03892640 0.03951620 0.03908920 0.03908920 0.03895390 0.03892510 0.03892380
1006PHL 0.02865660 0.00000000 0.00521582 0.00512027 0.03869940 0.03915910 0.03936560 0.03941090 0.03840960 0.03840700 0.03841730 0.03833760 0.03835940 0.03831570 0.03836200 0.03831830 0.03885310 0.03844440 0.03843530 0.03829910 0.03828620 0.03829780
P08_701_BU2_PER 0.02870160 0.00521582 0.00000000 0.00277604 0.03888190 0.03931120 0.03951210 0.03953220 0.03862280 0.03862150 0.03860070 0.03850890 0.03854890 0.03850630 0.03853730 0.03848310 0.03903660 0.03860720 0.03859040 0.03849340 0.03848310 0.03847920
P01_617_BU1_SLS 0.02878250 0.00512027 0.00277604 0.00000000 0.03885570 0.03929930 0.03948950 0.03952680 0.03858650 0.03858260 0.03858390 0.03850760 0.03855540 0.03853210 0.03853470 0.03848690 0.03904320 0.03860590 0.03859940 0.03850110 0.03848690 0.03848050
P11_702_BU2_PER 0.03950280 0.03869940 0.03888190 0.03885570 0.00000000 0.00167453 0.00178869 0.00187529 0.01829370 0.01827060 0.01831740 0.01824630 0.01831260 0.01823540 0.01825780 0.01824690 0.01881830 0.01830400 0.01829310 0.01823110 0.01822690 0.01825300
E01_615_SLS 0.03988420 0.03915910 0.03931120 0.03929930 0.00167453 0.00000000 0.00190845 0.00200267 0.01876060 0.01873950 0.01878480 0.01871840 0.01878110 0.01872150 0.01871530 0.01872280 0.01866770 0.01877920 0.01877670 0.01872400 0.01871530 0.01873820
P04_601_BU1_SLS 0.04011750 0.03936560 0.03951210 0.03948950 0.00178869 0.00190845 0.00000000 0.00028225 0.01891870 0.01889250 0.01891060 0.01884950 0.01894490 0.01887500 0.01885380 0.01886690 0.01899250 0.01892810 0.01892370 0.01887000 0.01886690 0.01888190
P04_559_BU1_SLS 0.04016660 0.03941090 0.03953220 0.03952680 0.00187529 0.00200267 0.00028225 0.00000000 0.01895120 0.01892560 0.01895620 0.01888870 0.01898690 0.01891810 0.01889750 0.01890930 0.01903140 0.01897680 0.01896500 0.01891560 0.01890740 0.01892930
P05_600_BU1_SLS 0.03901690 0.03840960 0.03862280 0.03858650 0.01829370 0.01876060 0.01891870 0.01895120 0.00000000 0.00038241 0.00165844 0.00158976 0.00160947 0.00153078 0.00140824 0.00158766 0.00198330 0.00144933 0.00150698 0.00154334 0.00155120 0.00156509
P05_598_BU1_SLS 0.03900900 0.03840700 0.03862150 0.03858260 0.01827060 0.01873950 0.01889250 0.01892560 0.00038241 0.00000000 0.00169672 0.00163578 0.00164078 0.00157007 0.00145350 0.00162498 0.00203393 0.00149000 0.00154203 0.00159317 0.00160027 0.00161183
P04_603_BU1_SLS 0.03904710 0.03841730 0.03860070 0.03858390 0.01831740 0.01878480 0.01891060 0.01895620 0.00165844 0.00169672 0.00000000 0.00043542 0.00161920 0.00154701 0.00145558 0.00147774 0.00215112 0.00160815 0.00165264 0.00152371 0.00154675 0.00154937
P04_602_BU1_SLS 0.03898010 0.03833760 0.03850890 0.03850760 0.01824630 0.01871840 0.01884950 0.01888870 0.00158976 0.00163578 0.00043542 0.00000000 0.00153496 0.00147748 0.00137866 0.00139526 0.00206551 0.00152607 0.00158293 0.00144881 0.00147226 0.00147539
P06_032_BU1_SLS 0.03899590 0.03835940 0.03854890 0.03855540 0.01831260 0.01878110 0.01894490 0.01898690 0.00160947 0.00164078 0.00161920 0.00153496 0.00000000 0.00028128 0.00133800 0.00144543 0.00195214 0.00148531 0.00156535 0.00136725 0.00144673 0.00145871
P06_023_BU1_SLS 0.03895910 0.03831570 0.03850630 0.03853210 0.01823540 0.01872150 0.01887500 0.01891810 0.00153078 0.00157007 0.00154701 0.00147748 0.00028128 0.00000000 0.00125210 0.00137970 0.00190122 0.00144283 0.00150959 0.00129487 0.00137736 0.00138618
P02_783_BU1_SLS 0.03899060 0.03836200 0.03853730 0.03853470 0.01825780 0.01871530 0.01885380 0.01889750 0.00140824 0.00145350 0.00145558 0.00137866 0.00133800 0.00125210 0.00000000 0.00126652 0.00191809 0.00145298 0.00150698 0.00136156 0.00143294 0.00144100
P12_579_STR 0.03892640 0.03831830 0.03848310 0.03848690 0.01824690 0.01872280 0.01886690 0.01890930 0.00158766 0.00162498 0.00147774 0.00139526 0.00144543 0.00137970 0.00126652 0.00000000 0.00209039 0.00155434 0.00161315 0.00144283 0.00151273 0.00152162
P05_622_BU1_SLS 0.03951620 0.03885310 0.03903660 0.03904320 0.01881830 0.01866770 0.01899250 0.01903140 0.00198330 0.00203393 0.00215112 0.00206551 0.00195214 0.00190122 0.00191809 0.00209039 0.00000000 0.00175683 0.00181076 0.00172423 0.00178284 0.00178949
P10_703_BU2_PER 0.03908920 0.03844440 0.03860720 0.03860590 0.01830400 0.01877920 0.01892810 0.01897680 0.00144933 0.00149000 0.00160815 0.00152607 0.00148531 0.00144283 0.00145298 0.00155434 0.00175683 0.00000000 0.00036705 0.00122665 0.00130622 0.00131190
P09_704_BU2_PER 0.03908920 0.03843530 0.03859040 0.03859940 0.01829310 0.01877670 0.01892370 0.01896500 0.00150698 0.00154203 0.00165264 0.00158293 0.00156535 0.00150959 0.00150698 0.00161315 0.00181076 0.00036705 0.00000000 0.00129074 0.00135845 0.00137088
P03_594_BU1_SLS 0.03895390 0.03829910 0.03849340 0.03850110 0.01823110 0.01872400 0.01887000 0.01891560 0.00154334 0.00159317 0.00152371 0.00144881 0.00136725 0.00129487 0.00136156 0.00144283 0.00172423 0.00122665 0.00129074 0.00000000 0.00030335 0.00031354
P05_599_BU1_SLS 0.03892510 0.03828620 0.03848310 0.03848690 0.01822690 0.01871530 0.01886690 0.01890740 0.00155120 0.00160027 0.00154675 0.00147226 0.00144673 0.00137736 0.00143294 0.00151273 0.00178284 0.00130622 0.00135845 0.00030335 0.00000000 0.00032861
P03_592_BU1_SLS 0.03892380 0.03829780 0.03847920 0.03848050 0.01825300 0.01873820 0.01888190 0.01892930 0.00156509 0.00161183 0.00154937 0.00147539 0.00145871 0.00138618 0.00144100 0.00152162 0.00178949 0.00131190 0.00137088 0.00031354 0.00032861 0.00000000
(((((((P10_703_BU2_PER:0.000157849066,P09_704_BU2_PER:0.000209200934):0.000483122851,P05_622_BU1_SLS:0.001116966803):0.000018368078,((P05_599_BU1_SLS:0.000159586258,P03_592_BU1_SLS:0.000169023742):0.000015792325,P03_594_BU1_SLS:0.000128336869):0.000462006451):0.000081437521,(((P11_702_BU2_PER:0.000615002481,((P04_601_BU1_SLS:0.000117392803,P04_559_BU1_SLS:0.000164857197):0.000925936953,E01_615_SLS:0.000884566733):0.000161626490):0.008730158578,(((P08_701_BU2_PER:0.001390856250,P01_617_BU1_SLS:0.001385183750):0.001280565582,1006PHL:0.002499367591):0.011475862713,P07_621_SLS:0.014634622105):0.015451073655):0.008164485602,(P05_600_BU1_SLS:0.000172301484,P05_598_BU1_SLS:0.000210108516):0.000603348539):0.000010417106):0.000045431349,(P04_603_BU1_SLS:0.000255598014,P04_602_BU1_SLS:0.000179821986):0.000552675249):0.000029208891,(P06_032_BU1_SLS:0.000173131711,P06_023_BU1_SLS:0.000108148289):0.000554756800):0.000022656931,P02_783_BU1_SLS:0.000592772460,P12_579_STR:0.000673747540);
GCJ=gcj
GCJFLAGS=-fsource=1.6 -march=native -msse2 -O3 -minline-all-stringops -fomit-frame-pointer -momit-leaf-frame-pointer -fstrict-aliasing -fno-store-check -fno-bounds-check -funroll-all-loops -Wall
OTHERFLAGS=-funsafe-math-optimizations -ffast-math
MAIN=REQ
EXEC=req
REQ: REQ.java
$(GCJ) $(GCJFLAGS) --main=$(MAIN) $(MAIN).java -o $(EXEC)
/*
####################################################################
REQ: estimating the rate of elementary quartets (REQ) for each
branch of a phylogenetic tree from a distance matrix
Copyright (C) 2017,2018 Alexis Criscuolo
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
Institut Pasteur
Bioinformatics and Biostatistics Hub
C3BI, USR 3756 IP CNRS
Paris, FRANCE
alexis.criscuolo@pasteur.fr
####################################################################
*/
import java.io.*;
import java.util.*;
import java.text.*;
public class REQ {
// constants
final static int INF = Integer.MAX_VALUE;
// io
static BufferedReader in;
static boolean verbose;
static NumberFormat df;
// data
static int n;
static ArrayList<String> lbl;
static float[][] dm;
static StringBuilder nwk, tr;
// stuffs
static int i, j, m, u, v, w, x, y, up, dn, last;
static double re, ab, ac, bc, abcd;
static short si;
static String line;
static String[] split;
static short[] sta, stb, stc, std, sa;
static TreeSet<Short> ts, stdts;
static ArrayList<Double> are;
public static void main(String[] args) throws IOException {
//### man ############################################################################################################################################################################
if ( args.length < 2 ) {
System.out.println("");
System.out.println(" USAGE: REQ <dfile> <tfile> [-v]");
System.out.println("");
System.out.println(" <dfile> a distance matrice file in either PHYLIP lower-");
System.out.println(" triangular or square format");
System.out.println(" <tfile> an unrooted binary phylogenetic tree file with");
System.out.println(" no confidence value at branches in NEWICK format");
System.out.println(" -v verbose mode");
System.out.println("");
System.exit(0);
}
//### init ###########################################################################################################################################################################
df = NumberFormat.getNumberInstance(Locale.US); df.setGroupingUsed(false); df.setMaximumFractionDigits(3); df.setMinimumFractionDigits(3);
verbose = false; if ( args.length > 2 ) verbose = true;
//### reading distance matrix dm #####################################################################################################################################################
in = new BufferedReader(new FileReader(new File(args[0])));
while ( true ) try { if ( (line=in.readLine().trim()).length() != 0 ) break; } catch ( NullPointerException e ) { System.out.println("matrix file is empty"); System.exit(1); }
try { n = Integer.parseInt(line); } catch ( NumberFormatException e ) { System.out.println("matrix file is incorrectly formatted"); System.exit(1); }
if ( n > 32760 ) { System.out.println("too many taxa (>32760)"); System.exit(1); }
lbl = new ArrayList<String>(n); dm = new float[n][]; i = -1;
while ( true ) {
try { line = in.readLine().trim(); } catch ( NullPointerException e ) { in.close(); break; }
split = line.split("\\s+"); lbl.add(split[0]); dm[++i] = new float[i]; j = 0; while ( ++j <= i ) dm[i][--j] = (float) Double.parseDouble(split[++j]);
}
if ( ++i != n ) { System.out.println("matrix file is incorrectly formatted"); System.exit(1); }
//### reading phylogenetic tree nwk ##################################################################################################################################################
nwk = new StringBuilder(""); in = new BufferedReader(new FileReader(new File(args[1])));
while ( true ) try { nwk = nwk.append(in.readLine().trim()); } catch ( NullPointerException e ) { in.close(); break; }
tr = new StringBuilder(nwk.toString());
//### discarding branch lengths from tr ##############################################################################################################################################
//u = -1; while ( (u=tr.indexOf(":", ++u)) != -1 ) { x = ( (x=tr.indexOf(",", u)) != -1 ) ? x : INF; y = ( (y=tr.indexOf(")", u)) != -1 ) ? y : INF; tr = tr.delete(u, (x<y)?x:y); }
u = -1 ; while ( (u=tr.indexOf(":", ++u)) != -1 ) tr = tr.delete(u, (x=((x=tr.indexOf(",",u))!=-1)?x:INF) < (y=((y=tr.indexOf(")",u))!=-1)?y:INF) ? x : y);
//### encoding lbl within tr #########################################################################################################################################################
u = -1;
while ( (u = (x=((x=tr.indexOf(",",u))!=-1)?x:INF) < (y=((y=tr.indexOf("(",u))!=-1)?y:INF) ? x : y) < INF ) {
if ( tr.charAt(++u) == '(' ) continue;
tr = tr.replace(u, (v = (x=((x=tr.indexOf(",",u))!=-1)?x:INF) < (y=((y=tr.indexOf(")",u))!=-1)?y:INF) ? x : y), String.valueOf(j=lbl.indexOf(tr.substring(u, v))));
if ( j == -1 ) { System.out.println(tr.substring(u, v) + " is not inside distance matrix"); System.exit(1); }
}
//### rooting tr #####################################################################################################################################################################
tr = tr.insert(0, '('); last = apc(tr, tr.lastIndexOf(")")); tr = tr.insert(last, ')'); // closing parenthesis at index 'last' should not be considered for EWDQ calculations
if ( verbose ) { i = -1; while ( ++i < n ) System.out.println(i + ": " + lbl.get(i)); System.out.println(tr.toString()); }
//### estimating the rate of EWDQ re for each internal branch e ######################################################################################################################
ts = new TreeSet<Short>(); si = 0; --si; while ( ++si < n ) ts.add(new Short(si)); are = new ArrayList<Double>(); u = -1;
while ( (u=tr.indexOf(")", ++u)) != last ) {
//# parsing every internal branch e at index u in order to obtain lbl(STa) lbl(STb) | lbl(STc) lbl(T)-lbl(STa U STb U STc) ###########################################
//# ( ... (((STa),(STb)),STc) ... );
//# || | | |
//# xv w 'u' y
v = aop(tr, u); w = apc(tr, u); x = cop(tr, u); y = ccp(tr, u); stdts = new TreeSet<Short>(ts);
//System.out.print(u + ": " + tr.substring(v+1,w) + " " + tr.substring(w+1,u) + " " + ((u+1!=y)?tr.substring(u+2,y):tr.substring(x+1,v-1)) + " " );
sta = new short[j=(split=tr.substring(v+1, w).replaceAll("[,\\(\\)]+", " ").trim().split(" ")).length]; for (String s: split) stdts.remove(Short.valueOf(sta[--j]=Short.parseShort(s)));
stb = new short[j=(split=tr.substring(++w, u).replaceAll("[,\\(\\)]+", " ").trim().split(" ")).length]; for (String s: split) stdts.remove(Short.valueOf(stb[--j]=Short.parseShort(s)));
stc = new short[j=(split=((u+1 != y) ? tr.substring(u+2, y) : tr.substring(++x, --v)).replaceAll("[,\\(\\)]+", " ").trim().split(" ")).length]; for (String s: split) stdts.remove(Short.valueOf(stc[--j]=Short.parseShort(s)));
std = new short[j=stdts.size()]; for (Short si: stdts) std[--j] = si.shortValue();
//# comparing every quartet ab|cd with the topology induced by dm, where a,b,c,d belongs to STa,STb,STc,STd, respectively #############################################
//Arrays.sort(sta); Arrays.sort(stb); Arrays.sort(stb);
up = 0;
for (short a: sta) {
for (short b: stb) { ab = (a>b)?dm[a][b]:dm[b][a];
for (short c: stc) { ac = (a>c)?dm[a][c]:dm[c][a]; bc = (b>c)?dm[b][c]:dm[c][b];
for (short d: std) up += ( (abcd=ab+((c>d)?dm[c][d]:dm[d][c])) < ac+((b>d)?dm[b][d]:dm[d][b]) && abcd < bc+((a>d)?dm[a][d]:dm[d][a]) ) ? 1 : 0;
}
}
}
//# storing re value inside are ########################################################################################################################################
are.add(Double.valueOf(re=up/(double)(dn=sta.length*stb.length*stc.length*std.length)));
if ( verbose ) System.out.println((Arrays.toString(sta) + Arrays.toString(stb) + Arrays.toString(stc) + Arrays.toString(std)).replaceAll(" ","") + " Re=" + df.format(re) + " (" + up + "/" + dn + ")");
}
//### writing re values into nwk #####################################################################################################################################################
u = -1; v = -1;
while ( (u=nwk.indexOf(")", ++u)) != -1 ) {
if ( nwk.charAt(u+1) == ';' ) break;
nwk = nwk.insert(u+1, df.format(are.get(++v).doubleValue()));
}
System.out.println(nwk.toString());
}
//## containing opening parenthesis (cop): returns x from t and i, where i is a closing parenthesis index
// t = ( ... ((STa),(STb)) ... );
// | |
// x i
final static int cop (StringBuilder t, int i) { short p = 0; ++p; int x = i; while ( --x >= 0 ) switch ( t.charAt(x) ) { case ')': ++p; continue; case '(': if ( --p < 0 ) return x; } return -1; }
//## containing closing parenthesis (ccp): returns x from t and i, where i is a closing parenthesis index
// t = ( ... ((STa),(STb)) ... );
// | |
// i x
final static int ccp (StringBuilder t, int i) { short p = 0; int x = i; while ( true ) switch ( t.charAt(++x) ) { case ';': return -1; case '(': ++p; continue; case ')': if ( --p < 0 ) return x; } }
//## associated opening parenthesis (aop): returns x from t and i, where i is a closing parenthesis index
// t = ( ... ((STa),(STb)) ... );
// | |
// x i
final static int aop (StringBuilder t, int i) { short p = 0; ++p; int x = i; while ( --x >= 0 ) switch ( t.charAt(x) ) { case ')': ++p; continue; case '(': if ( --p == 0 ) return x; } return -1; }
//## associated partitionning comma (apc): returns x from t and i, where i is a closing parenthesis index
// t = ( ... ((STa),(STb)) ... );
// | |
// x i
final static int apc (StringBuilder t, int i) { short p = 0; int x = i; while ( --x >= 0 ) switch ( t.charAt(x) ) { case ')': ++p; continue; case '(': --p; continue; case ',': if ( p == 0 ) return x; } return -1; }
}
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment