diff --git a/README.md b/README.md index a9c9b01086fafbe905d71177e394d259a1182c77..11654fdf55756f78fd7692ad0ed2eab99c870d87 100644 --- a/README.md +++ b/README.md @@ -93,12 +93,13 @@ To specify the location of a specific binary, edit the file `forest.sh` and indi Run _forest_ without option to read the following documentation: ``` - USAGE: forest -i <infile> -t <treefile> -m <model> [-o <outfile>] [-s] [-v] - [-d <delta>] [-n <max>] [-w <tmpdir>] [-p <threads>] [-c] [-h] + USAGE: forest -i <infile> -m <model> -t <treefile> [-f <forest>] [-o <outfile>] [-s] + [-d <delta>] [-n <max>] [-v] [-V] [-w <tmpdir>] [-p <threads>] [-c] [-h] OPTIONS: -i <file> FASTA-formatted multiple sequence alignment file (mandatory) - -t <file> NEWICK-formatted tree file (mandatory) + -t <file> NEWICK-formatted tree file (mandatory unless option -f is set) + -f <file> input file containing precomputed forest (default: none) -m <string> IQ-TREE-formatted evolutionary model (mandatory) -o <file> outfile name (default: standard output) -d <real> maximum log-likelihood score difference (default: 10) @@ -106,10 +107,10 @@ Run _forest_ without option to read the following documentation: -n <int> maximum size of the forest (default: 1000000) -p <int> number of threads (default: 1) -w <dir> path to the tmp directory (default: $TMPDIR, otherwise /tmp) - -c checks dependencies and exit -v verbose mode (default: not set) - -V prints the marginal log likelihood of the forest, i.e. the log + -V computes and prints the marginal log likelihood of the forest, i.e. the log of the sum of all the likelihood values (default: not set) + -c checks dependencies and exit -h prints this help and exit ``` @@ -123,13 +124,13 @@ Run _forest_ without option to read the following documentation: * By default, _forest_ builds Nearest Neighbor Interchange (NNI) neighborhoods using [_gotree_](https://research.pasteur.fr/en/software/goalign-gotree/) (Lemoine and Gascuel 2021). 
However, _forest_ can optionally build Subtree Pruning and Regrafting (SPR) neighborhoods (option `-s`) using the utility program [_uspr_neighbors_](https://github.com/cwhidden/uspr) (Whidden and Matsen 2019). SPR neighborhoods are larger that NNI ones, therefore leading to important running times to process trees, most of them being often outside the forest <em>F</em><sub>Δ</sub>. -* Mandatory input files are a FASTA-formatted multiple sequence alignment (option `-i`) and a NEWICK-formatted tree (option `-t`) files. Multiple trees can be present in the tree file, but it is expected that the first tree is the one with the highest likelihood (with optimized branch lengths; ideally the ML tree returned by [_IQ-TREE_](https://www.iqtree.org/)). Every tree in the input tree file should be written one per line. Duplicated input tree topologies are detected and discarded by _forest_. Trivially, the leaf names of the input tree(s) may match the sequence names into the multiple sequence alignment. +* The input files are a FASTA-formatted multiple sequence alignment (option `-i`, mandatory) and a NEWICK-formatted tree file (option `-t`, mandatory unless a precomputed forest is provided with option `-f`). Multiple trees can be present in the tree file, but the first tree is expected to be the one with the highest likelihood (with optimized branch lengths; ideally the ML tree returned by [_IQ-TREE_](https://www.iqtree.org/)). Each tree in the input tree file should be written on its own line. Duplicated input tree topologies are detected and discarded by _forest_. Obviously, the leaf names of the input tree(s) must match the sequence names in the multiple sequence alignment. * The option `-m` is mandatory and should be set with a substitution model with a format compatible with the specifications of [_IQ-TREE_](https://www.iqtree.org/) (e.g. GTR+F+I+G, LG+F+R4; see details about its option `-m` [here](http://www.iqtree.org/doc/Substitution-Models)).
* The option `-d` determines the log-likelihood difference Δ between the ML tree and every other tree in the forest <em>F</em><sub>Δ</sub>. Very large forests are often returned when setting large value of Δ (e.g. Δ > 20). However, the option `-n` enables to control the size of the forest (1,000,000 by default). Note that in many cases, dealing with 10 ≤ Δ ≤ 20 is often sufficient to build a forest that is representative of the overall phylogenetic signal induced by a dataset for e.g. inferring a consensus tree (see Example 1 below). -* The output file (option `-o`) is a tab-delimited file containing the sorted log-likelihood score and the NEWICK representation of the trees in the forest. As the forest content is written into the output file at the end of each step, the script _forest_ can be stopped at any time (using `Ctrl+C`). +* The output file (option `-o`) is a tab-delimited file containing, for each tree in the forest, its log-likelihood score and its NEWICK representation, sorted by decreasing log-likelihood. As the forest content is written into the output file at the end of each step, the script _forest_ can be stopped at any time (using `Ctrl+C`). The construction can then be resumed later by providing this output file as a precomputed forest (option `-f`). * At each step, _forest_ prints a summary of the ongoing construction, i.e. the number of examinated trees (`viewed`), the size of the queue (`queued`), the number of trees currently in the forest (`kept`) and the reached maximum log-likelihood value (`best log-lk`). This summary can be used to decide when to stop the execution (e.g. when the forest is populated by a sufficient number of trees, or when the current forest seems to have converged to its optimum content). @@ -267,7 +268,6 @@ However, it is worth noting that using SPR neighborhoods allows to quickly reach When building (majority-rule) consensus trees from the forest content (using e.g.
[_TreeCons_](https://gitlab.pasteur.fr/GIPhy/TreeCons)), setting large Δ thresholds is generally useless. To illustrate this assertion, _forest_ was run on the _Listeria_ dataset with increasing Δ thresholds (using NNI neighborhoods); for each Δ value, the size of the forest is reported below, as well as the marginal log-likelihood (i.e. the log of the sum of all tree likelihood values; option `-V`). -<sup> <div align="center"> | Δ | forest<br>size | marginal log-<br>likelihood | | Δ | forest<br>size | marginal log-<br>likelihood | @@ -284,7 +284,6 @@ To illustrate this assertion, _forest_ was run on the _Listeria_ dataset with in | 10 | 7654 | −2308.78489 | | 20 | 264457 | −2308.78100 | </div> -</sup> As expected, the size of the forest grows with the value of Δ, whereas its marginal log-likelihood converges to an asymptotic limit (≈ −2308.78100). Indeed, for each increasing Δ value threshold, supplementary trees in F<sub>Δ</sub> correspond to smaller likelihood values (very close to 0) that have little impact on their overall sum. 
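The forest output file described above (one tree per line: the log-likelihood, a tab, then the NEWICK tree, best tree first) is easy to post-process. The sketch below assumes that format and illustrates two operations discussed in this section: re-applying the Δ threshold relative to the first (best) tree, which is the selection performed by the new `-f` branch of forest.sh, and computing the marginal log-likelihood reported with option `-V`. Because individual likelihoods are far too small for double precision (e<sup>−2308</sup> underflows to 0), the sum uses the log-sum-exp identity log Σ<sub>i</sub> e<sup>ℓ<sub>i</sub></sup> = m + log Σ<sub>i</sub> e<sup>ℓ<sub>i</sub> − m</sup>, where m is the largest ℓ<sub>i</sub>. The function names `filter_forest` and `marginal_loglk` are hypothetical, and this is not necessarily how forest.sh computes its `log sum lk` column.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of forest.sh) for a forest file whose lines
# are "log-likelihood<TAB>NEWICK", sorted with the best tree first.

# filter_forest <forest file> <delta>
# Prints the trees whose log-likelihood is at most <delta> below the one of
# the first (ML) tree, i.e. the ($1-$2<=d) test used by forest.sh.
filter_forest() {
  awk -F'\t' -v d="$2" '
    NR == 1 { best = $1 + 0 }            # first line holds the ML tree
    best - ($1 + 0) <= d + 0 { print }   # keep trees within delta of the best
  ' "$1"
}

# marginal_loglk <forest file>
# Prints log(sum_i exp(llk_i)) with the log-sum-exp trick; returns a
# non-zero status on an empty file.
marginal_loglk() {
  awk -F'\t' '
    $1 != "" {
      v = $1 + 0                         # force a numeric value
      llk[++n] = v
      if (n == 1 || v > m) m = v         # m = largest log-likelihood
    }
    END {
      if (n == 0) exit 1                 # empty forest: nothing to sum
      s = 0
      for (i = 1; i <= n; i++)
        s += exp(llk[i] - m)             # terms in [0, 1]; the largest is 1
      printf("%.5f\n", m + log(s))
    }' "$1"
}
```

For instance, with trees scoring −100.0, −105.5 and −112.0, `filter_forest forest.tsv 10` keeps the first two (−112.0 lies 12 units below the best score), and a file holding two trees both at −10 gives a marginal log-likelihood of −10 + log 2 ≈ −9.30685. As in the Listeria table above, trees far below the best score contribute almost nothing to the sum.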
diff --git a/forest.sh b/forest.sh index 2fb2180730d2f6091f3f2536d05d72cec7147b34..14ededdb0b178dbaee98b1fa9f018c2d5878fbef 100755 --- a/forest.sh +++ b/forest.sh @@ -47,7 +47,10 @@ # = VERSIONS = # # ============ # # # - VERSION=0.4; # + VERSION=0.5; # +# + new option -f to restart from a previously computed forest # +# # +# VERSION=0.4; # # + new option -V to print the log of the sum of all likelihood values # # # # VERSION=0.3; # @@ -157,18 +160,19 @@ # ============ # # # mandoc() { - echo -e "\n\033[1m forest v$VERSION $COPYRIGHT\033[0m"; + echo -e "\n\033[1m forest v$VERSION $COPYRIGHT\033[0m"; cat <<EOF Build a forest of near-ML (maximum likelihood) phylogenetic trees https://gitlab.pasteur.fr/GIPhy/forest - USAGE: forest -i <infile> -t <treefile> -m <model> [-o <outfile>] [-s] [-v] - [-d <delta>] [-n <max>] [-w <tmpdir>] [-p <threads>] [-c] [-h] + USAGE: forest -i <infile> -m <model> -t <treefile> [-f <forest>] [-o <outfile>] [-s] + [-d <delta>] [-n <max>] [-v] [-V] [-w <tmpdir>] [-p <threads>] [-c] [-h] OPTIONS: -i <file> FASTA-formatted multiple sequence alignment file (mandatory) - -t <file> NEWICK-formatted tree file (mandatory) + -t <file> NEWICK-formatted tree file (mandatory unless option -f is set) + -f <file> input file containing precomputed forest (default: none) -m <string> IQ-TREE-formatted evolutionary model (mandatory) -o <file> outfile name (default: standard output) -d <real> maximum log-likelihood score difference (default: 10) @@ -176,10 +180,10 @@ mandoc() { -n <int> maximum size of the forest (default: 1000000) -p <int> number of threads (default: 1) -w <dir> path to the tmp directory (default: \$TMPDIR, otherwise /tmp) - -c checks dependencies and exit -v verbose mode (default: not set) - -V prints the marginal log likelihood of the forest, i.e. the log + -V computes and prints the marginal log likelihood of the forest, i.e. 
the log of the sum of all the likelihood values (default: not set) + -c checks dependencies and exit -h prints this help and exit EOF @@ -337,6 +341,7 @@ export LC_ALL=C; INFILE="$NA"; # infile -i MODEL="$NA"; # model -m TREEFILE="$NA"; # treefile -t +FORESTFILE="$NA"; # forest file -f OUTFILE=/dev/stdout; # outfile -o DELTA=10; # delta -d SPR=false; # SPR -s @@ -347,24 +352,25 @@ VERBOSE=false; # verbose mode -v MARGINAL=false; # marginal lh -V DEBUG=false; # debug mode -X -while getopts i:t:m:d:n:o:w:p:svVchX option +while getopts i:t:f:m:d:n:o:w:p:svVchX option do case $option in - i) INFILE="$OPTARG" ;; - t) TREEFILE="$OPTARG" ;; - m) MODEL="$OPTARG" ;; - d) DELTA=$OPTARG ;; - s) SPR=true ;; - n) MAX=$OPTARG ;; - o) OUTFILE="$OPTARG" ;; - w) TMP_DIR="$OPTARG" ;; - p) NTHREADS=$OPTARG ;; - c) dcheck ;; - v) VERBOSE=true ;; - V) MARGINAL=true ;; - X) DEBUG=true ;; - h) mandoc ; exit 0 ;; - \?) mandoc ; exit 1 ;; + i) INFILE="$OPTARG" ;; + t) TREEFILE="$OPTARG" ;; + f) FORESTFILE="$OPTARG" ;; + m) MODEL="$OPTARG" ;; + d) DELTA=$OPTARG ;; + s) SPR=true ;; + n) MAX=$OPTARG ;; + o) OUTFILE="$OPTARG" ;; + w) TMP_DIR="$OPTARG" ;; + p) NTHREADS=$OPTARG ;; + c) dcheck ;; + v) VERBOSE=true ;; + V) MARGINAL=true ;; + X) DEBUG=true ;; + h) mandoc ; exit 0 ;; + \?) mandoc ; exit 1 ;; esac done @@ -373,18 +379,24 @@ dcontrol; $MARGINAL && VERBOSE=true; $DEBUG && VERBOSE=true; -[ "$INFILE" == "$NA" ] && echoxit "[ERROR] infile not specified (option -i)" ; -[ ! -e $INFILE ] && echoxit "[ERROR] specified infile does not exist (option -i): $INFILE" ; -[ ! -s $INFILE ] && echoxit "[ERROR] empty infile (option -i): $INFILE" ; -[ ! -r $INFILE ] && echoxit "[ERROR] no read permission (option -i): $INFILE" ; - -[ "$TREEFILE" == "$NA" ] && echoxit "[ERROR] treefile not specified (option -t)" ; -[ ! -e $TREEFILE ] && echoxit "[ERROR] specified treefile does not exist (option -t): $TREEFILE" ; -[ ! -s $TREEFILE ] && echoxit "[ERROR] empty treefile (option -t): $TREEFILE" ; -[ ! 
-r $TREEFILE ] && echoxit "[ERROR] no read permission (option -t): $TREEFILE" ; +[ "$INFILE" == "$NA" ] && echoxit "[ERROR] infile not specified (option -i)" ; +[ ! -e $INFILE ] && echoxit "[ERROR] specified infile does not exist (option -i): $INFILE" ; +[ ! -s $INFILE ] && echoxit "[ERROR] empty infile (option -i): $INFILE" ; +[ ! -r $INFILE ] && echoxit "[ERROR] no read permission (option -i): $INFILE" ; -n=$(grep -o -F ";" $TREEFILE | wc -l); -[ $n -eq 0 ] && echoxit "[ERROR] incorrect treefile (option -t): $TREEFILE" ; +if [ "$FORESTFILE" != "$NA" ] +then + [ ! -e $FORESTFILE ] && echoxit "[ERROR] specified file does not exist (option -f): $FORESTFILE" ; + [ ! -s $FORESTFILE ] && echoxit "[ERROR] empty file (option -f): $FORESTFILE" ; + [ ! -r $FORESTFILE ] && echoxit "[ERROR] no read permission (option -f): $FORESTFILE" ; +else + [ "$TREEFILE" == "$NA" ] && echoxit "[ERROR] treefile not specified (option -t)" ; + [ ! -e $TREEFILE ] && echoxit "[ERROR] specified treefile does not exist (option -t): $TREEFILE" ; + [ ! -s $TREEFILE ] && echoxit "[ERROR] empty treefile (option -t): $TREEFILE" ; + [ ! -r $TREEFILE ] && echoxit "[ERROR] no read permission (option -t): $TREEFILE" ; + n=$(grep -o -F ";" $TREEFILE | wc -l); + [ $n -eq 0 ] && echoxit "[ERROR] incorrect treefile (option -t): $TREEFILE" ; +fi [ "$MODEL" == "$NA" ] && echoxit "[ERROR] model not specified (option -m)" ; @@ -481,129 +493,194 @@ then echo "> Bash: $BASH_VERSION" ; echo ; echo "data file: $INFILE" ; - echo "tree file: $TREEFILE" ; - echo "model: $MODEL" ; - echo "Delta: $DELTA" ; - echo "max forest size: $MAX" ; - echo "no. threads: $NTHREADS"; fi -# copying input files -STEP=1; -cp $INFILE $ALN ; -sed 's/)[0-9\.eE-]*/)/g' $TREEFILE | - tr -d '\n' | - sed 's/;/;\n/g' > $MLTREE ; +if [ "$FORESTFILE" != "$NA" ] ##### <=== option -f +then + + if $VERBOSE + then + echo "forest file: $FORESTFILE" ; + echo "model: $MODEL" ; + echo "Delta: $DELTA" ; + echo "max forest size: $MAX" ; + echo "no.
threads: $NTHREADS"; + fi -# checking no. sequences and leaves -na=$(grep "^>" $ALN | wc -l); -while read t -do - nt=$(tr '(,' '\n' <<<"$t" | grep -c -v "^$"); + # copying input files + STEP=1; + cp $INFILE $ALN ; + cp $FORESTFILE $FOREST ; + + # getting the ML tree + sed -n '1p;q' $FOREST > $TMP ; + BESTLK=$($TAWK '(NR==1){print$1}' $TMP); + $TAWK '(NR==1){print$2}' $TMP > $MLTREE ; + + # checking no. sequences and leaves + na=$(grep "^>" $ALN | wc -l); + nt=$(tr '(,' '\n' < $MLTREE | grep -c -v "^$"); if [ $nt -ne $na ] then finalize ; - echoxit "[ERROR] different taxon numbers (options -i and -t): $na != $nt" ; + echoxit "[ERROR] different taxon numbers (options -i and -f): $na != $nt" ; fi -done < $MLTREE -# no. input trees -nt=$(grep -o -F ";" $MLTREE | wc -l); + if $VERBOSE + then + f=$(grep -o -F ";" $FOREST | wc -l); + echo "forest size: $f"; + echo "no. taxa: $na"; + if $SPR ; then ngb="SPR"; else ngb="NNI"; fi + echo "neighborhood: $ngb"; + if $SPR ; then s=$(( 2 * ($na - 3) * (2 * $na - 7) )); else s=$(( 2 * $na - 6 )); fi + echo "neighborhood size: $s"; + echo ; + fi -if $VERBOSE -then - echo "no. taxa: $na"; - echo "no. tree(s): $nt" ; -fi + # printing stat header + $MARGINAL && echo -e "#step\tviewed\tqueued\tkept\tbest log-lk\tlog sum lk" || echo -e "#step\tviewed\tqueued\tkept\tbest log-lk" ; + + # selecting trees + mv $FOREST $TMP ; + while IFS=$'\t' read -r llk tre + do + chk=$($BAWK -v d=$DELTA '($1-$2<=d){print"OK"}' <<<"$BESTLK $llk"); + [ "$chk" == "OK" ] && echo -e "$llk\t$tre" >> $FOREST ; + hashtree $tre ; + done < $TMP > $HASHSET ; + rm $TMP ; + [ "$OUTFILE" != "/dev/stdout" ] && cat $FOREST > $OUTFILE ; + +else ############################## <=== option -t + + if $VERBOSE + then + echo "tree file: $TREEFILE" ; + echo "model: $MODEL" ; + echo "Delta: $DELTA" ; + echo "max forest size: $MAX" ; + echo "no. 
threads: $NTHREADS"; + fi -# deduplicating multiple input trees (if any) -if [ $nt -gt 1 ] # multiple input trees -then - # gathering ML tree (the first) - tml="$(sed -n '1p;q' $MLTREE)"; - hml="$(hashtree $tml)"; - # discarding duplicate trees (if any) - while read tre ; do echo -e "$(hashtree $tre)\t$tre" ; done < $MLTREE | - grep -v -F "$hml" | - sort | - $TAWK '($1==p){next} {p=$1;print$2}' > $TMP ; - # getting ML tree + remaining distinct tree(s) - { echo "$tml" ; cat $TMP ; } > $MLTREE ; - rm -f $TMP ; -fi + # copying input files + STEP=1; + cp $INFILE $ALN ; + sed 's/)[0-9\.eE-]*/)/g' $TREEFILE | + tr -d '\n' | + sed 's/;/;\n/g' > $MLTREE ; -# no. distinct trees -nt=$(grep -o -F ";" $MLTREE | wc -l); + # checking no. sequences and leaves + na=$(grep "^>" $ALN | wc -l); + while read t + do + nt=$(tr '(,' '\n' <<<"$t" | grep -c -v "^$"); + if [ $nt -ne $na ] + then + finalize ; + echoxit "[ERROR] different taxon numbers (options -i and -t): $na != $nt" ; + fi + done < $MLTREE -if $VERBOSE -then - echo "no. distinct tree(s): $nt" ; - if $SPR ; then ngb="SPR"; else ngb="NNI"; fi - echo "neighborhood: $ngb"; - if $SPR ; then s=$(( 2 * ($na - 3) * (2 * $na - 7) )); else s=$(( 2 * $na - 6 )); fi - echo "neighborhood size: $s"; - echo ; -fi + # no. input trees + nt=$(grep -o -F ";" $MLTREE | wc -l); + + if $VERBOSE + then + echo "no. taxa: $na"; + echo "no. tree(s): $nt" ; + fi + + # deduplicating multiple input trees (if any) + if [ $nt -gt 1 ] # multiple input trees + then + # gathering ML tree (the first) + tml="$(sed -n '1p;q' $MLTREE)"; + hml="$(hashtree $tml)"; + # discarding duplicate trees (if any) + while read tre ; do echo -e "$(hashtree $tre)\t$tre" ; done < $MLTREE | + grep -v -F "$hml" | + sort | + $TAWK '($1==p){next} {p=$1;print$2}' > $TMP ; + # getting ML tree + remaining distinct tree(s) + { echo "$tml" ; cat $TMP ; } > $MLTREE ; + rm -f $TMP ; + fi + + # no. 
distinct trees + nt=$(grep -o -F ";" $MLTREE | wc -l); -# printing stat header -$MARGINAL && echo -e "#step\tviewed\tqueued\tkept\tbest log-lk\tlog sum lk" || echo -e "#step\tviewed\tqueued\tkept\tbest log-lk" ; + if $VERBOSE + then + echo "no. distinct tree(s): $nt" ; + if $SPR ; then ngb="SPR"; else ngb="NNI"; fi + echo "neighborhood: $ngb"; + if $SPR ; then s=$(( 2 * ($na - 3) * (2 * $na - 7) )); else s=$(( 2 * $na - 6 )); fi + echo "neighborhood size: $s"; + echo ; + fi + + # printing stat header + $MARGINAL && echo -e "#step\tviewed\tqueued\tkept\tbest log-lk\tlog sum lk" || echo -e "#step\tviewed\tqueued\tkept\tbest log-lk" ; -# first processing of the input tree(s) -if [ $nt -eq 1 ] # <=== one input tree -then + # first processing of the input tree(s) + if [ $nt -eq 1 ] # <=== one input tree + then - # building neighborhood of the input tree - { cat $MLTREE ; cat $MLTREE ; # twice because incorrect log-lk format for tree 1 when using IQ-TREE v2.3.6 - sed 's/:[0-9\.eE-]*//g;s/)[0-9\.eE-]*/)/g' $MLTREE | bash -c "$NEIGHBORHOOD" ; } > $INTREES ; - -else # <=== multiple input trees - - mv $MLTREE $TMP ; - # getting the first tree - sed -n '1p;q' $TMP > $MLTREE ; - tml="$(cat $MLTREE)"; - hml="$(hashtree $tml)"; - # building neighborhood of the ML tree - sed 's/:[0-9\.eE-]*//g;s/)[0-9\.eE-]*/)/g' $MLTREE | bash -c "$NEIGHBORHOOD" > $INTREES ; - # adding other trees - cat $TMP >> $INTREES ; - # discarding duplicate trees (if any) and the ML one - while read tre ; do echo -e "$(hashtree $tre)\t$tre" ; done < $INTREES | - grep -v -F "$hml" | - sort | - $TAWK '($1==p){next} {p=$1;print$2}' > $TMP ; - # gathering trees - { cat $MLTREE ; # twice because incorrect log-lk format - cat $MLTREE ; # for tree 1 when using IQ-TREE v2.3.6 - cat $TMP ; } > $INTREES ; - rm -f $TMP ; + # building neighborhood of the input tree + { cat $MLTREE ; cat $MLTREE ; # twice because incorrect log-lk format for tree 1 when using IQ-TREE v2.3.6 + sed 's/:[0-9\.eE-]*//g;s/)[0-9\.eE-]*/)/g' 
$MLTREE | bash -c "$NEIGHBORHOOD" ; } > $INTREES ; + + else # <=== multiple input trees + + mv $MLTREE $TMP ; + # getting the first tree + sed -n '1p;q' $TMP > $MLTREE ; + tml="$(cat $MLTREE)"; + hml="$(hashtree $tml)"; + # building neighborhood of the ML tree + sed 's/:[0-9\.eE-]*//g;s/)[0-9\.eE-]*/)/g' $MLTREE | bash -c "$NEIGHBORHOOD" > $INTREES ; + # adding other trees + cat $TMP >> $INTREES ; + # discarding duplicate trees (if any) and the ML one + while read tre ; do echo -e "$(hashtree $tre)\t$tre" ; done < $INTREES | + grep -v -F "$hml" | + sort | + $TAWK '($1==p){next} {p=$1;print$2}' > $TMP ; + # gathering trees + { cat $MLTREE ; # twice because incorrect log-lk format + cat $MLTREE ; # for tree 1 when using IQ-TREE v2.3.6 + cat $TMP ; } > $INTREES ; + rm -f $TMP ; -fi + fi -# estimating likelihood of every NNI tree -$IQTREE -s $ALN -m $MODEL -te $MLTREE --trees $INTREES -pre $PREFIXNAME ; + # estimating likelihood of every NNI tree + $IQTREE -s $ALN -m $MODEL -te $MLTREE --trees $INTREES -pre $PREFIXNAME ; -# processing outtrees -sed 1d $OUTTREES | - sed 's/.* lh=-//g' | - tr -d ']' | - sort -g | - $BAWK '{printf("%.5f\t%s\n", 0-$1, $2)}' > $TMP ; -rm -f $PREFIXNAME.* ; + # processing outtrees + sed 1d $OUTTREES | + sed 's/.* lh=-//g' | + tr -d ']' | + sort -g | + $BAWK '{printf("%.5f\t%s\n", 0-$1, $2)}' > $TMP ; + rm -f $PREFIXNAME.* ; -# getting the ML tree -BESTLK=$($BAWK '(NR==1){print$1}' $TMP); -$BAWK '(NR==1){print$2}' $TMP> $MLTREE ; + # getting the ML tree + BESTLK=$($BAWK '(NR==1){print$1}' $TMP); + $BAWK '(NR==1){print$2}' $TMP> $MLTREE ; -# selecting trees -while IFS=$'\t' read -r llk tre -do - chk=$($BAWK -v d=$DELTA '($1-$2<=d){print"OK"}' <<<"$BESTLK $llk"); - [ "$chk" == "OK" ] && echo -e "$llk\t$tre" >> $FOREST ; - hashtree $tre ; -done < $TMP > $HASHSET ; -rm $TMP ; -[ "$OUTFILE" != "/dev/stdout" ] && cat $FOREST > $OUTFILE ; + # selecting trees + while IFS=$'\t' read -r llk tre + do + chk=$($BAWK -v d=$DELTA '($1-$2<=d){print"OK"}' 
<<<"$BESTLK $llk"); + [ "$chk" == "OK" ] && echo -e "$llk\t$tre" >> $FOREST ; + hashtree $tre ; + done < $TMP > $HASHSET ; + rm $TMP ; + [ "$OUTFILE" != "/dev/stdout" ] && cat $FOREST > $OUTFILE ; + +fi # init queue sed 1d $FOREST |