Commit b62cceaf authored by Alexis CRISCUOLO

updating README

parent a63d0d9a
......@@ -21,7 +21,7 @@ _JolyTree_ runs on UNIX, Linux and most OS X operating systems.
git clone https://gitlab.pasteur.fr/GIPhy/JolyTree.git
```
**C.** If at least one of the four required binaries (step A) is not available on your `$PATH` variable, edit the file `JolyTree.sh` and indicate the local path to the mash, gawk, FastME and/or REQ binary(ies) (approximately between lines 100 and 200):
**C.** If at least one of the four required binaries (step A) is not available on your `$PATH` variable, edit the file `JolyTree.sh` and indicate the local path to the `mash`, `gawk`, `FastME` and/or `REQ` binary(ies) (approximately between lines 100 and 200):
```bash
#############################################################################################################
......@@ -130,52 +130,49 @@ Launch _JolyTree_ without option to read the following documentation:
## Example
In order to illustrate the usefulness of _jolyTree_ and to describe its output files, the following use case example describes its usage for inferring an exploratory phylogenetic tree of _Klebsiella_ genomes.
In order to illustrate the usefulness of _JolyTree_ and to describe its output files, the following use case example describes its usage for inferring a phylogenetic tree of _Klebsiella_ genomes derived from the analysis of [Rodrigues et al. (2019)](https://doi.org/10.1016/j.resmic.2019.02.003).
##### Downloading genome sequences
The following command lines allows downloading the genome sequences of 39 _Klebsiella_ species from the [NCBI genome repository](https://www.ncbi.nlm.nih.gov/genome) inside a directory named _genomes_:
The following [Bash](https://www.gnu.org/software/bash/) command lines allow the genome sequences of 40 _Klebsiella_ strains (36 belonging to the _Klebsiella pneumoniae_ complex –Kp1 to Kp7– and 4 representing outgroup species –Kog–) to be downloaded from the [NCBI genome repository](https://www.ncbi.nlm.nih.gov/genome) inside a directory named _genomes_:
```bash
mkdir genomes/ ;
EUTILS="wget -q -O- https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=";
NCBIFTP="wget -q -O- https://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/"; Z=".1.fsa_nt.gz";
t="K.pneumoniae";
echo -e "HS11286 CP003200\nNTUH-K2044 AP006725\nMGH78578 CP000647\nKCTC2242 CP002910\nATCCBAA-2146 CP006659\nCAV1217 CP018676" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.oxytoca";
echo -e "CAV1374 CP011636\nKONIH1 CP008788\nAR_0147 CP020358\nFDAARGOS_335 CP027426\nJKo3 AP014951\nAR380 CP029128" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.aerogenes";
echo -e "KCTC2190 CP002824\nEA1509E FO203355\nG7 CP011539\nAR_0062 CP026756\nCAV1320 CP011574\nFDAARGOS_139 CP014748" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.quasipneumoniae";
echo -e "ATCC700603 CP014696\nHKUOPA4 CP014154\nKPC142 CP023478\nMGH44 AYIV01\nSKLX2781 LYWP01\nCCBH16302 MDCA01" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.variicola";
echo -e "At-22 CP001891\nDSM15968 CP010523\nGJ2 CP017849\nWCHKP19 CP028555\nBIDMC88 LFBA01" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.quasivariicola";
echo -e "KPN1705 CP022823\n10982 AKYX01\nPO552 NFVM01\nVRCO0126 FWGJ01\nVRCO0168 FWNZ01" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.pneumoniae";
echo -e "ATCC13883 JOOW01\nMGH78578 CP000647\nKp13 CP003999\nNTUH-K2044 AP006725" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.quasipneumoniae.subsp.quasipneumoniae";
echo -e "01A030 CCDF01" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.variicola";
echo -e "342 CP000964\nAt-22 CP001891" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.quasipneumoniae.subsp.similipneumoniae";
echo -e "07A044 CBZR01" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
t="K.quasivariicola";
echo -e "10982 AKYX01\nKPN1705 CP022823" |
while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
NCBIFTP="wget -q -O- https://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/";
A1='^[A-Z]{2}[0-9]*$'; A2='^[A-Z]{6}[0-9]*$'; Z=".1.fsa_nt.gz";
t="Kp1-K.pneumoniae";
echo -e "SB4-2\tCAAHFS01\nATCC13883_T\tJOOW01\nMGH78578\tCP000647\nSB1139\tCAAHFT01\n5-2\tCAAHGI01\n04A025\tCAAHFZ01\n2-3\tCAAHGH01\nKp13\tCP003999\nNTUH-K2044\tAP006725\nBJ1-GA\tCAAHGC01" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
t="Kp2-K.quasipneumoniae.subsp.quasipneumoniae";
echo -e "01A030_T\tCCDF01\nSB1124\tCAAHFU01\nU41\tCAAHGA01\n18A069\tCAAHGF01\n0320584\tCAAHGK01" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
t="Kp3-K.variicola";
echo -e "01A065\tCAAHFX01\nF2R9_T\tCAAHGE01\n342\tCP000964\nAt-22\tCP001891" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
t="Kp4-K.quasipneumoniae.subsp.similipneumoniae";
echo -e "09A323\tCAAHFV01\n12A476\tCAAHFY01\n07A044_T\tCBZR01\nCIP110288\tCAAHGD01\n1-1\tCAAHGG01" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
t="Kp5-K.variicola.subsp.tropicalensis";
echo -e "CDC4241-71\tCAAHGJ01\n814\tCAAHGL01\n885\tCAAHGM01\n1266_T\tCAAHGN01\n1283\tCAAHGO01\n1375\tCAAHGP01" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
t="Kp6-K.quasivariicola";
echo -e "08A119\tCAAHGB01\n10982\tAKYX01\nKPN1705\tCP022823\n01-467-2ECBU\tCAAHGR01\n01-310MBV\tCAAHGS01" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
t="Kp7-K.africanensis";
echo -e "200023\tCAAHGQ01" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
t="Kog-K";
echo -e "oxytoca.ATCC13182\tCAAHFW01\naerogenes.ATCC13048\tQVMZ01\ngrimontii.06D021\tFZTC01\nmichiganensis.DSM25444\tPRDB01" |
while read -r s a; do echo $t.$s;([[ $a =~ $A1 ]]&&$EUTILS$a||$NCBIFTP${a:0:2}/${a:2:2}/$([[ $a =~ $A2 ]]&&echo ${a:4:2}/)$a/$a$Z|zcat)>genomes/$t.$s.fa; done
```
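The compact `while read` loops above choose a download source from the shape of each accession. As a reading aid, the sketch below unpacks that logic into a function that only prints the URL that would be fetched, without downloading anything. The function name `accession_url` and the variables `EUTILS_URL` and `WGS_URL` are hypothetical and not part of _JolyTree_; the patterns `A1` and `A2`, the suffix `Z` and the two base URLs are copied from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: prints the URL the download
# loops above would request for a given accession (no network access).

EUTILS_URL="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id="
WGS_URL="https://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/"
A1='^[A-Z]{2}[0-9]*$'   # two letters + digits, e.g. CP000647 (single nuccore record)
A2='^[A-Z]{6}[0-9]*$'   # six letters + digits, e.g. CAAHFS01 (six-letter WGS prefix)
Z=".1.fsa_nt.gz"

accession_url() {
  local a=$1 sub=""
  if [[ $a =~ $A1 ]]; then
    # Finished sequence: fetched as FASTA through E-utilities.
    printf '%s%s\n' "$EUTILS_URL" "$a"
  else
    # WGS project: the directory path is built from the prefix letters, two at a time;
    # six-letter prefixes need one extra directory level.
    [[ $a =~ $A2 ]] && sub="${a:4:2}/"
    printf '%s%s/%s/%s%s/%s%s\n' "$WGS_URL" "${a:0:2}" "${a:2:2}" "$sub" "$a" "$a" "$Z"
  fi
}

# Usage: show where three accessions from the lists above would be fetched from.
for acc in CP000647 JOOW01 CAAHFS01; do
  echo "$acc -> $(accession_url "$acc")"
done
```

Printing the URLs first is a convenient way to check an accession list before launching the actual downloads; the WGS files are gzip-compressed, which is why the original commands pipe them through `zcat`.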
##### Launching _jolyTree_
......@@ -205,6 +202,8 @@ Nei M, Kumar S (2000) Molecular Evolution and Phylogenetics. Oxford University P
Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM (2016) Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology, 17(1):132. [doi:10.1186/s13059-016-0997-x](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x).
Rodrigues C, Passet V, Rakotondrasoa A, Abdoulaye Diallo T, Criscuolo A, Brisse S (2019) Description of _Klebsiella africanensis_ sp. nov., _Klebsiella variicola_ subsp. _tropicalensis_ subsp. nov. and _Klebsiella variicola_ subsp. _variicola_ subsp. nov. Research in Microbiology. [doi:10.1016/j.resmic.2019.02.003](https://doi.org/10.1016/j.resmic.2019.02.003).
Tajima F, Nei M (1982) Biases of the estimates of DNA divergence obtained by the restriction enzyme technique. Journal of Molecular Evolution, 18(2):115-120. [doi:10.1007/BF01810830](https://link.springer.com/article/10.1007/BF01810830).
Tajima F, Nei M (1984) Estimation of evolutionary distance between nucleotide sequences. Molecular Biology and Evolution, 1(3):269-285. [doi:10.1093/oxfordjournals.molbev.a040317](https://academic.oup.com/mbe/article/1/3/269/1244029).
......