diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000000000000000000000000000000000000..be3f7b28e564e7dd05eaf59d64adba1a4065ac0e
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,661 @@
+                    GNU AFFERO GENERAL PUBLIC LICENSE
+                       Version 3, 19 November 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU Affero General Public License is a free, copyleft license for
+software and other kinds of works, specifically designed to ensure
+cooperation with the community in the case of network server software.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+our General Public Licenses are intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  Developers that use our General Public Licenses protect your rights
+with two steps: (1) assert copyright on the software, and (2) offer
+you this License which gives you legal permission to copy, distribute
+and/or modify the software.
+
+  A secondary benefit of defending all users' freedom is that
+improvements made in alternate versions of the program, if they
+receive widespread use, become available for other developers to
+incorporate.  Many developers of free software are heartened and
+encouraged by the resulting cooperation.  However, in the case of
+software used on network servers, this result may fail to come about.
+The GNU General Public License permits making a modified version and
+letting the public access it on a server without ever releasing its
+source code to the public.
+
+  The GNU Affero General Public License is designed specifically to
+ensure that, in such cases, the modified source code becomes available
+to the community.  It requires the operator of a network server to
+provide the source code of the modified version running there to the
+users of that server.  Therefore, public use of a modified version, on
+a publicly accessible server, gives the public access to the source
+code of the modified version.
+
+  An older license, called the Affero General Public License and
+published by Affero, was designed to accomplish similar goals.  This is
+a different license, not a version of the Affero GPL, but Affero has
+released a new version of the Affero GPL which permits relicensing under
+this license.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU Affero General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Remote Network Interaction; Use with the GNU General Public License.
+
+  Notwithstanding any other provision of this License, if you modify the
+Program, your modified version must prominently offer all users
+interacting with it remotely through a computer network (if your version
+supports such interaction) an opportunity to receive the Corresponding
+Source of your version by providing access to the Corresponding Source
+from a network server at no charge, through some standard or customary
+means of facilitating copying of software.  This Corresponding Source
+shall include the Corresponding Source for any work covered by version 3
+of the GNU General Public License that is incorporated pursuant to the
+following paragraph.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the work with which it is combined will remain governed by version
+3 of the GNU General Public License.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU Affero General Public License from time to time.  Such new versions
+will be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU Affero General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU Affero General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU Affero General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If your software can interact with users remotely through a computer
+network, you should also make sure that it provides a way for users to
+get its source.  For example, if your program is a web application, its
+interface could display a "Source" link that leads users to an archive
+of the code.  There are many ways you could offer source, and different
+solutions will be better for different programs; see section 13 for the
+specific requirements.
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU AGPL, see
+<https://www.gnu.org/licenses/>.
diff --git a/JolyTree.sh b/JolyTree.sh
new file mode 100755
index 0000000000000000000000000000000000000000..2c5c57048492e0e2175b1082924aa9f0043e9932
--- /dev/null
+++ b/JolyTree.sh
@@ -0,0 +1,451 @@
+#!/bin/bash
+
+#############################################################################################################
+#                                                                                                           #
+# JolyTree: fast distance-based phylogenetic inference from unaligned genome sequences                      #
+#                                                                                                           #
+# Copyright (C) 2017,2018,2019  Alexis Criscuolo                                                            #
+#                                                                                                           #
+# This program  is free software:  you can  redistribute it  and/or modify it  under the terms  of the GNU  #
+# General Public License as published by the Free Software Foundation, either version 3 of the License, or  #
+# (at your option) any later version.                                                                       #
+#                                                                                                           #
+# This program is distributed in the hope that it will be useful,  but WITHOUT ANY WARRANTY;  without even  #
+# the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public  #
+# License for more details.                                                                                 #
+#                                                                                                           #
+# You should have received a copy of the  GNU General Public License along with this program.  If not, see  #
+# <http://www.gnu.org/licenses/>.                                                                           #
+#                                                                                                           #
+#  Contact:                                                                                                 #
+#  Institut Pasteur                                                                                         #
+#  Bioinformatics and Biostatistics Hub                                                                     #
+#  C3BI, USR 3756 IP CNRS                                                                                   #
+#  Paris, FRANCE                                                                                            #
+#                                                                                                           #
+#  alexis.criscuolo@pasteur.fr                                                                              #
+#                                                                                                           #
+#############################################################################################################
+
+#############################################################################################################
+#                                                                                                           #
+# ============                                                                                              #
+# = VERSIONS =                                                                                              #
+# ============                                                                                              #
+#                                                                                                           #
+  VERSION=1.1.181205ac                                                                                      #
+# + option -q to set desired probability of observing a random k-mer                                        #
+# + ability to be run on clusters managed by SLURM                                                          #
+#                                                                                                           #
+# VERSION=1.0.180115ac                                                                                      #
+# + option -n to only estimate evolutionary distances                                                       #
+# + option -r to set the number of iterations when performing the ratchet-based BME tree search             #
+# + important bug fixed when sorting the input file names                                                   #
+# + reimplementation of the F81 distance estimation                                                         #
+#                                                                                                           #
+# VERSION=0.8.171207ac                                                                                      #
+# + k-mer size could be set by the user                                                                     #
+# + bug fixed for manual tbl estimation                                                                     #
+# + .fas allowed inside the input directory                                                                 #
+#                                                                                                           #
+# VERSION=0.7.170919ac                                                                                      #
+# + tree output file suffix is now .nwk                                                                     #
+#                                                                                                           #
+# VERSION=0.6.170728ac                                                                                      #
+# + implements the F81  transformation suggested  by Tamura & Kumar (2002)  in order to deal with putative  #
+#   heterogeneous substitution pattern among lineages                                                       #
+#                                                                                                           #
+# VERSION=0.5.170727ac                                                                                      #
+# + automatic estimation of the k-mer size                                                                  #
+#                                                                                                           #
+# VERSION=0.4.170726ac                                                                                      #
+# + precomputed pairwise p-distances could be used (option -d)                                              #
+# + no limit with the length of the input FASTA filenames                                                   #
+#                                                                                                           #
+# VERSION=0.3.170724ac                                                                                      #
+# + implements the F81 transformation when at least one p-distance is larger than a specified cutoff        #
+#                                                                                                           #
+# VERSION=0.2.170721ac                                                                                      #
+# + uses FastME 2.1.5.1                                                                                     #
+#                                                                                                           #
+#############################################################################################################
+
+#############################################################################################################
+#                                                                                                           #
+# ============                                                                                              #
+# = DOC      =                                                                                              #
+# ============                                                                                              #
+#                                                                                                           #
+  if [ "$1" = "-?" ] || [ "$1" = "-h" ] || [ $# -le 1 ]                                                     #
+  then                                                                                                      #
+    cat << EOF                                                       
+
+ JolyTree v.$VERSION
+
+ USAGE:
+    JolyTree.sh  [options]
+ where:
+    -i <directory>  a directory name containing FASTA-formatted contig files; only files
+                    ending with .fa, .fna, .fas or .fasta will be considered (mandatory)
+    -b <basename>   the basename of every written output file (mandatory)
+    -s <int>        the sketch size when preprocessing contig files (default: automatic)
+    -q <double>     desired probability of observing a random k-mer (default: 0.00001)
+    -k <int>        the k-mer size when  preprocessing contig files  (default: estimated
+                    from the average genome size with option -q)
+    -c <real>       if at least one of the estimated p-distances is above this specified 
+                    cutoff, then an F81 transformation will be performed (default: 0.1)
+    -n              no BME tree inference (only pairwise distance estimation)
+    -r <int>        number of steps  when performing the  ratchet-based  BME tree search
+                    (default: 100)
+    -t <int>        number of threads (default: 2)
+
+EOF
+    exit 1 ;                                                                                                #
+  fi                                                                                                        #
+#                                                                                                           #
+#############################################################################################################
+
+  
+#############################################################################################################
+#                                                                                                           #
+# ================                                                                                          #
+# = INSTALLATION =                                                                                          #
+# ================                                                                                          #
+#                                                                                                           #
+# [1] REQUIREMENTS =======================================================================================  #
+#  JolyTree depends on Mash, gawk, FastME and REQ (see below),  each with a minimum version required. You   #
+#  should have them installed on your computer prior to using JolyTree.  Make sure that each is available   #
+#  on your $PATH variable, or specify below the full path to each of them.                                  #
+#                                                                                                           #
+# -- Mash: fast pairwise p-distance estimation --------------------------------------------------------     #
+#    VERSION >= 1.0.2                                                                                       #
+#    src: github.com/marbl/Mash                                                                             #
+#    Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM (2016) Mash: fast     #
+#      genome  and  metagenome  distance  estimation  using  MinHash.   Genome  Biology,  17:132.  doi:     #
+#      10.1186/s13059-016-0997-x                                                                            #
+#                                                                 ################################################
+                                                                  ################################################
+  MASH=mash;                                                      ## <=== WRITE HERE THE PATH TO THE MASH       ##
+                                                                  ##      BINARY (VERSION 1.0.2 MINIMUM)        ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+# -- gawk: fast text file processing ------------------------------------------------------------------     #
+#    VERSION >= 4.1.0                                                                                       #
+#    src: ftp.gnu.org/gnu/gawk                                                                              #
+#    Robbins AD  (2018)  GAWK:  Effective AWK Programming -- A User’s Guide  for GNU Awk  (Edition 4.2)     #
+#      www.gnu.org/software/gawk/manual                                                                     #
+#                                                                 ################################################
+                                                                  ################################################
+  GAWK=gawk;                                                      ## <=== WRITE HERE THE PATH TO THE GAWK       ##
+                                                                  ##      BINARY (VERSION 4.1.0 MINIMUM)        ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+# -- FastME: fast distance-based phylogenetic tree inference ------------------------------------------     #
+#    VERSION >= 2.1.5.1                                                                                     #
+#    src: gite.lirmm.fr/atgc/FastME/                                                                        #
+#    Lefort V, Desper R, Gascuel O  (2015)  FastME 2.0:  a comprehensive, accurate,  and fast distance-     #
+#      based  phylogeny  inference  program.  Molecular Biology and Evolution,  32(10):2798–2800.  doi:     #
+#      10.1093/molbev/msv150                                                                                #
+#                                                                 ################################################
+                                                                  ################################################
+  FASTME=fastme;                                                  ## <=== WRITE HERE THE PATH TO THE FASTME     ##
+                                                                  ##      BINARY (VERSION 2.1.5.1 MINIMUM)      ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+# -- REQ: fast computation of the rates of elementary quartets ----------------------------------------     #
+#    VERSION >= 1.2                                                                                         #
+#    src: gitlab.pasteur.fr/GIPhy/REQ                                                                       #
+#    Guenoche A, Garreta H (2001) Can we have confidence in a tree representation. In: Gascuel O, Sagot     #
+#      MF (eds) Computational Biology.  Lecture Notes in Computer Science, vol 2066.  Springer, Berlin,     #
+#      Heidelberg. doi:10.1007/3-540-45727-5_5                                                              #
+#                                                                 ################################################
+                                                                  ################################################
+  REQ=REQ;                                                        ## <=== WRITE HERE THE PATH TO THE REQ        ##
+                                                                  ##      BINARY (VERSION 1.2 MINIMUM)          ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+#                                                                                                           #
+#                                                                                                           #
+# [2] EXECUTE PERMISSION =================================================================================  #
+#  In order to run JolyTree, give the execute permission on the script JolyTree.sh:                         #
+#    chmod +x JolyTree.sh                                                                                   #
+#                                                                                                           #
+#                                                                                                           #
+#                                                                                                           #
+# [3] NOTES ON THE USE OF JOLYTREE WITH SLURM (slurm.schedmd.com) ========================================  #
+#  By default, JolyTree is able to perform the pairwise p-distance estimate step on multiple threads (the   #
+#  option -t  allows the  number of  threads  to  be specified).  The corresponding  pieces of  code  are   #
+#  therefore executed concurrently via the following standard 'launcher':                                   #
+#                                                                                                           #
+   EXEC="sh -c";                                                                                            #
+#                                                                                                           #
+#  It is therefore possible to use JolyTree on a computer that allows multiple threads to be executed. It   #
+#  is  also possible  to launch  JolyTree on  multiple  threads  on a  cluster managed  by Slurm  via the   #
+#  following command line models (with t = number of threads):                                              #
+#    srun   <Slurm options> -c $t  ./JolyTree.sh  <JolyTree options>  -t $t                                 #
+#    sbatch <Slurm options> -c $t  ./JolyTree.sh  <JolyTree options>  -t $t                                 #
+#  Moreover, it is also possible to launch JolyTree on  multiple cores on a cluster managed by Slurm. For   #
+#  this particular case, you should first uncomment the following line:                                     #
+#                                                                                                           #
+#  EXEC="srun -n 1 -N 1 $EXEC";                                                                             #
+#                                                                                                           #
+#  and launch JolyTree via the following command line models (with t = number of cores):                    #
+#    salloc <Slurm options> -n $t  ./JolyTree.sh  <JolyTree options>  -t $t                                 #
+#    sbatch <Slurm options> -n $t  ./JolyTree.sh  <JolyTree options>  -t $t                                 #
+#                                                                                                           #
+#############################################################################################################
+
+  
+#############################################################################################################
+#############################################################################################################
+#### INITIALIZING PARAMETERS AND READING OPTIONS                                                         ####
+#############################################################################################################
+#############################################################################################################
+
+if ! command -v "$MASH"   > /dev/null ; then echo "$MASH not found"   >&2 ; exit 1 ; fi
+if ! command -v "$GAWK"   > /dev/null ; then echo "$GAWK not found"   >&2 ; exit 1 ; fi
+if ! command -v "$FASTME" > /dev/null ; then echo "$FASTME not found" >&2 ; exit 1 ; fi
+if ! command -v "$REQ"    > /dev/null ; then echo "$REQ not found"    >&2 ; exit 1 ; fi
+
+DATADIR="N.O.D.I.R";            # -i (mandatory)
+BASEFILE="N.O.B.A.S.E.F.I.L.E"; # -b (mandatory)
+
+SKETCH=0;                       # -s (auto from data)
+Q=0.00001;                      # -q (0.00001)
+K=0;                            # -k (auto from -q)
+CUTOFF=0.1;                     # -c (0.1)
+
+INFERTREE=true;                 # -n (none)
+RATCHET=100;                    # -r (100)
+RATCHET_LIMIT=200;              #    (static)
+
+NPROC=2;                        # -t (2)
+WAITIME=0.5;                    #    (auto from -t)
+
+while getopts :i:b:s:q:k:c:r:t:n option
+do
+  case $option in
+    i) DATADIR="$OPTARG"                                  ;;
+    b) BASEFILE="$OPTARG"                                 ;;
+    s) SKETCH=$OPTARG                                     ;;
+    q) Q=$OPTARG                                          ;;
+    k) K=$OPTARG                                          ;;
+    c) CUTOFF=$OPTARG                                     ;;
+    n) INFERTREE=false                                    ;;
+    r) RATCHET=$OPTARG                                    ;;
+    t) NPROC=$OPTARG                                      ;;
+    :) echo "option $OPTARG : missing argument" ; exit 1  ;;
+   \?) echo "$OPTARG : invalid option" ;          exit 1  ;;
+  esac
+done
+if [ "$DATADIR" == "N.O.D.I.R" ];             then echo "genome directory is not specified (option -i)" >&2 ; exit 1 ; fi
+if [ ! -e "$DATADIR" ];                       then echo "genome directory does not exist (option -i)"   >&2 ; exit 1 ; fi
+if [ ! -d "$DATADIR" ];                       then echo "$DATADIR is not a directory (option -i)"       >&2 ; exit 1 ; fi
+if [ "$BASEFILE" == "N.O.B.A.S.E.F.I.L.E" ];  then echo "basename is not specified (option -b)"         >&2 ; exit 1 ; fi
+if [ $SKETCH -ne 0 ] && [ $SKETCH -le 1000 ]; then echo "sketch size $SKETCH is too low (option -s)"    >&2 ; exit 1 ; fi
+
+### verifying the number of threads
+[ $NPROC -le 0 ] && NPROC=2;
+echo "$NPROC thread(s)" ;
+WAITIME=$($GAWK -v x=$NPROC 'BEGIN{print 1/sqrt(x)}');
+
+### gathering the genome list
+GLIST=$(ls $DATADIR/*.fna $DATADIR/*.fas $DATADIR/*.fa $DATADIR/*.fasta 2> /dev/null | sort);
+n=$(echo $GLIST | $GAWK '{print NF}');
+if [ $n -lt 4 ]; then echo "directory $DATADIR should contain at least 4 files *.fna, *.fas, *.fasta or *.fa" >&2 ; exit 1 ; fi
+echo "$n taxa" ;
+
+### creating output file names
+ACGT=$BASEFILE.acgt;    # ACGT content of each input genome
+OEPL=$BASEFILE.oepl;    # p-distance estimates in OEPL (One Entry Per Line) format 
+DFILE=$BASEFILE.d;      # evolutionary distances in PHYLIP square format
+
+
+#############################################################################################################
+#############################################################################################################
+#### PREPROCESSING GENOMES                                                                               ####
+#############################################################################################################
+#############################################################################################################
+
+### estimating ACGT content
+rm -f $ACGT ; 
+for f in $GLIST
+do
+  echo "parsing $(basename ${f%.*})" >&2 ;
+  $EXEC "x=\$($GAWK '! /^>/{i=split(\$0,c,\"\");++i;while(--i>0)w[c[i]]++}END{print w[\"A\"]+w[\"a\"]\" \"w[\"C\"]+w[\"c\"]\" \"w[\"G\"]+w[\"g\"]\" \"w[\"T\"]+w[\"t\"]}' $f); flock -x $ACGT echo \"$(basename $f) \$x\" >> $ACGT;" &
+  while [ $(jobs -r | wc -l) -gt $NPROC ]; do sleep $WAITIME ; done
+done
+
+wait ;
+
+sort $ACGT > $ACGT.tmp ;
+mv $ACGT.tmp $ACGT ;
+
+### estimating k-mer and sketch size
+[ $K -le 0 ] && K=$($GAWK -v q=$Q '{n=$2+$3+$4+$5;kc=int(log(n*(1-q)/q)/log(4))+1;k=(kc>k)?kc:k}END{print k}' $ACGT) && [ $K -le 0 ] && K=19;
+echo "k-mer size: $K (q=$Q)" ;
+[ $SKETCH -le 0 ] && SKETCH=$($GAWK '{s+=$2+$3+$4+$5;n+=4}END{printf("%d\n", 100000*int((s/n)/100000))}' $ACGT) && [ $SKETCH -eq 0 ] && SKETCH=10000;
+echo "sketch size: $SKETCH" ;
+
+### sketching genomes
+TLIST="" ;
+for f in $GLIST
+do
+  TLIST="$TLIST $(basename ${f%.*})" ;
+  echo "sketching $(basename ${f%.*})" >&2 ;
+  $EXEC "$MASH sketch -o ${f%.*} -s $SKETCH -k $K $f" &> /dev/null &
+  while [ $(jobs -r | wc -l) -gt $NPROC ]; do sleep $WAITIME ; done
+done
+
+wait ; 
+
+#############################################################################################################
+#############################################################################################################
+#### DISTANCE ESTIMATES                                                                                  ####
+#############################################################################################################
+#############################################################################################################
+
+### estimating and writing pairwise p-distances
+echo $TLIST > $OEPL ;
+a=($(ls $DATADIR/*.msh | sort));
+i=${#a[@]}; 
+while [ $((j=--i)) -ge 0 ]
+do
+  mi=${a[$i]};
+  ti=$(basename ${mi%.*});
+  while [ $((--j)) -ge 0 ]
+  do
+    mj=${a[$j]};
+    tj=$(basename ${mj%.*});
+    echo "estimating p-distance between $ti ($(( $i + 1 ))) and $tj ($(( $j + 1 )))" >&2 ;
+    $EXEC "d=\$(timeout 5 $MASH dist -s $SKETCH $mi $mj | $GAWK '{printf(\"%.8f\\n\",\$3)}'); [ -n \"\$d\" ] && flock -x $OEPL echo \"$(( $i + 1 )) $(( $j + 1 )) \$d\" >> $OEPL ;" &
+    while [ $(jobs -r | wc -l) -gt $NPROC ]; do sleep $WAITIME ; done
+  done
+done
+
+wait ;
+
+### verifying every p-distance estimate
+$GAWK '(NR==1){n=NF;while((j=++i)<=n)while(--j>0)d[i][j]=d[j][i]=-1;next}
+              {d[$1][$2]=(d[$2][$1]=$3)}
+       END    {i=0;while((j=++i)<=n)while(--j>0)if(d[i][j]<0||d[j][i]<0)print i"\t"j}' $OEPL |
+  while read i j
+  do
+    let i--;
+    mi=${a[$i]};
+    ti=$(basename ${mi%.*});
+    let j--;
+    mj=${a[$j]};
+    tj=$(basename ${mj%.*});
+    echo "re-estimating p-distance between $ti ($(( $i + 1 ))) and $tj ($(( $j + 1 )))" >&2 ;
+    d=$($MASH dist -s $SKETCH $mi $mj | $GAWK '{printf("%.8f\n",$3)}'); flock -x $OEPL echo "$(( $i + 1 )) $(( $j + 1 )) $d" >> $OEPL ;
+  done
+
+wait ;
+
+### transforming (if required) p-distances and writing in PHYLIP square format
+if [ -n "$($GAWK -v c=$CUTOFF '(NR==1){next}($3>c){print;exit}' $OEPL)" ]
+then
+  $GAWK -v p=8 'function s(x){return x*x}
+                (ARGIND==1)        {++x;sx=$2+$3+$4+$5;a[x]=$2/sx;c[x]=$3/sx;g[x]=$4/sx;t[x]=$5/sx}
+                (ARGIND==2&&FNR==1){while(++n<=NF){m=(m>(l=length(lbl[n]=$n)))?m:l;d[n][n]=0}--n;  print(b=" ")n;x=0.5;while((x*=2)<m)b=b""b;  next}
+                (ARGIND==2)        {d[$1][$2]=(d[$2][$1]=$3)}
+                END                {while(++i<=n){printf substr(lbl[i]b,1,m);ai=a[i];ci=c[i];gi=g[i];ti=t[i];j=0;
+                                      while(++j<=n)printf(" %."p"f",((dij=d[i][j])==0)?0:((x=1-dij/(1-ai*a[j]-ci*c[j]-gi*g[j]-ti*t[j]))>0)?((s(ai+a[j])+s(ci+c[j])+s(gi+g[j])+s(ti+t[j]))/4-1)*log(x):1.23456789);
+                                      print""}}' $ACGT $OEPL > $DFILE ;
+  echo "F81 distances written into $DFILE" ;
+else
+  $GAWK -v p=8 '(NR==1){while(++n<=NF){m=(m>(l=length(lbl[n]=$n)))?m:l;d[n][n]=0}--n;  print(b=" ")n;x=0.5;while((x*=2)<m)b=b""b;  next}  
+                       {d[$1][$2]=(d[$2][$1]=$3)}
+                END    {while(++i<=n){printf substr(lbl[i]b,1,m);j=0;while(++j<=n)printf(" %."p"f",d[i][j]);print""}}' $OEPL > $DFILE ;
+  echo "p-distances written into $DFILE" ;
+fi
+
+### deleting all *.msh files
+for f in $DATADIR/*.msh ; do rm -f $f ; done
+
+
+if ! $INFERTREE ; then exit 0 ; fi
+
+
+#############################################################################################################
+#############################################################################################################
+#### BME TREE INFERENCE                                                                                  ####
+#############################################################################################################
+#############################################################################################################
+
+echo "searching for the BME phylogenetic tree..." ;
+
+TAXFILE=$BASEFILE.tax;
+grep -v "^ " $DFILE | $GAWK '{print $1}' > $TAXFILE ;
+
+$GAWK '(NR==1){print;next}{printf"@"(++i)"@";j=1;while(++j<=NF)printf" "$j;print""}' $DFILE > $BASEFILE.dd ;
+DFILE=$BASEFILE.dd;
+
+BMETREE=$BASEFILE.nwk;  # BME phylogenetic tree in NEWICK format
+OUTTREE=$BASEFILE.tt;
+STATFILE=$DFILE""_fastme_stat.txt;
+
+### first BME tree inference
+$FASTME -i $DFILE -o $OUTTREE -s -f 12 -T 1 &> /dev/null ;
+tblo=$(grep -B1 "Performed" $STATFILE | sed -n 1p | sed 's/.* //g' | sed 's/\.$//g');
+[ -z "$tblo" ] && tblo=$(grep -o ":[0-9\.-]*" $OUTTREE | tr -d :- | paste -sd+ | bc -l | sed 's/^\./0./');
+echo "  step 0   $tblo" >&2 ;
+echo "step 0   tbl=$tblo" ;
+cp $OUTTREE $BMETREE;
+i=0; while read tax; do let i++; sed -i "s/@$i""@/$tax/" $BMETREE ; done < $TAXFILE ;
+
+### ratchet-based search of the BME tree
+ct=0;
+for s in $(seq 1 $RATCHET)
+do
+  ### noising evolutionary distances
+  v=0.$s; [ $(echo "$v>=0.7" | bc) -eq 1 ] && v=0$(echo "scale=4;$v*$v" | bc -l);
+  $GAWK -v v=$v -v s=$s 'BEGIN  {srand(s)}
+                         (NR==1){n=$0;next}  {lbl[++i]=$1;d[i][i]=0;j=0;f=1;while(++f<=i){++j;d[i][j]=(d[j][i]=($f*(1-v)+2*v*$f*rand()))}}
+                         END    {print" "n;i=0;while(++i<=n){printf lbl[i];j=0;while(++j<=n){printf(" %.8f",d[i][j])}print""}}' $DFILE > $DFILE.noised ;
+
+  ### using current BME tree as starting tree for a new BME tree search
+  $FASTME -i $DFILE.noised -u $OUTTREE        -o $OUTTREE.noised    -nB -s -T 1 &> /dev/null ;
+  sed -i 's/:-/:/g' $OUTTREE.noised ;
+  $FASTME -i $DFILE        -u $OUTTREE.noised -o $OUTTREE.candidate     -s -T 1 &> /dev/null ;
+
+  tbl=$(grep -B1 "Performed" $STATFILE | sed -n 1p | sed 's/.* //g' | sed 's/\.$//g'); 
+  out=" ";
+  [ -z "$tbl" ] && tbl=$(grep -o ":[0-9\.-]*" $OUTTREE.candidate | tr -d :- | paste -sd+ | bc | sed 's/^\./0./') && out="+";
+  echo -n "$out step $s   $tbl" >&2 ; 
+  if [ $(echo "$tbl<$tblo" | bc) -eq 0 ]
+  then
+    echo >&2 ;
+  else
+    ct=0;
+    tblo=$tbl;
+    mv $OUTTREE.candidate $OUTTREE ;
+    cp $OUTTREE $BMETREE;
+    i=0; while read tax; do let i++; sed -i "s/@$i""@/$tax/" $BMETREE ; done < $TAXFILE ;
+    echo " *" >&2 ;
+    echo "step $s   tbl=$tbl"; 
+  fi
+
+  rm -f $DFILE.noised_fastme_stat.txt $STATFILE $OUTTREE.noised $OUTTREE.candidate $DFILE.noised ;
+
+  if [ $((++ct)) -eq $RATCHET_LIMIT ]; then break; fi
+done
+
+### confidence value at every branch
+$REQ $BASEFILE.d $BMETREE $OUTTREE ;
+mv $OUTTREE $BMETREE ;
+echo "BME tree (tbl=$tblo) with branch supports written into $BMETREE" ;
+rm -f $DFILE $TAXFILE $OUTTREE ;
+
+
+exit ;
+
+
+
+
+
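The gawk block that writes the F81 distances above is dense, so here is a minimal sketch of the same transformation for a single pair of genomes. It mirrors the expression in the script: with base frequencies (Ai, Ci, Gi, Ti) and (Aj, Cj, Gj, Tj), it computes F = 1 - sum(x_i * x_j), b = 1 - sum((x_i + x_j)^2) / 4 and returns -b * ln(1 - p/F), falling back to the same 1.23456789 placeholder when the logarithm is undefined. The helper name `f81_distance` and the use of plain `awk` instead of `$GAWK` are assumptions made for illustration; they are not part of `JolyTree.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: f81_distance is a hypothetical helper, not a JolyTree.sh function.
# usage: f81_distance p  Ai Ci Gi Ti  Aj Cj Gj Tj
f81_distance() {
  awk -v p="$1" -v ai="$2" -v ci="$3" -v gi="$4" -v ti="$5" \
      -v aj="$6" -v cj="$7" -v gj="$8" -v tj="$9" '
    function sq(x) { return x * x }
    BEGIN {
      # identical genomes: distance is zero, as in the script
      if (p == 0) { printf("%.8f\n", 0); exit }
      F = 1 - ai*aj - ci*cj - gi*gj - ti*tj
      x = 1 - p / F
      # saturated p-distance: the script prints this placeholder value
      if (x <= 0) { print "1.23456789"; exit }
      b = 1 - (sq(ai+aj) + sq(ci+cj) + sq(gi+gj) + sq(ti+tj)) / 4
      printf("%.8f\n", -b * log(x))
    }'
}

# equal base frequencies reduce the formula to -0.75 * ln(1 - p/0.75)
f81_distance 0.15  0.25 0.25 0.25 0.25  0.25 0.25 0.25 0.25   # prints 0.16735766
```

With equal base frequencies the correction turns a p-distance of 0.15 into about 0.167, which shows why the transformation only matters once p-distances exceed the cutoff.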
diff --git a/README.md b/README.md
index cd6d304a7714a03bc1fde1a724b83f007ff9c30d..50df6691c688c84410fbdbee48ba8f076e186895 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,212 @@
 # JolyTree
 
+_JolyTree_ (named in memory of [Nicolas Joly](https://research.pasteur.fr/en/member/nicolas-joly/)) is a command line script written in [Bash](https://www.gnu.org/software/bash/) that allows a distance-based phylogenetic tree with branch supports to be quickly inferred from non-aligned genome sequences.
+_JolyTree_ runs on UNIX, Linux and most OS X operating systems.
+
+## Installation and execution
+
+**A.** Install the following programs and tools, or verify that they are already installed with the required version:
+* [mash](http://mash.readthedocs.io/en/latest/) [(Ondov et al. 2016)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x) version >= 1.0.2
+  * binaries: [github.com/marbl/Mash/releases](https://github.com/marbl/Mash/releases)
+  * sources: [github.com/marbl/Mash](https://github.com/marbl/Mash)
+* [gawk](https://www.gnu.org/software/gawk/manual/) version >= 4.1.0
+  * sources: [ftp.gnu.org/gnu/gawk](http://ftp.gnu.org/gnu/gawk/)
+* [FastME](http://www.atgc-montpellier.fr/fastme/usersguide.php) [(Lefort et al. 2015)](https://doi.org/10.1093/molbev/msv150) version >= 2.1.5.1
+  * sources: [gite.lirmm.fr/atgc/FastME](https://gite.lirmm.fr/atgc/FastME)
+* [REQ](https://research.pasteur.fr/en/tool/r%ce%b5q-assessing-branch-supports-o%c6%92-a-distance-based-phylogenetic-tree-with-the-rate-o%c6%92-elementary-quartets/) version >= 1.2
+  * sources: [gitlab.pasteur.fr/GIPhy/REQ](https://gitlab.pasteur.fr/GIPhy/REQ)
+
+**B.** Clone this repository with the following command line:
+```bash
+git clone https://gitlab.pasteur.fr/GIPhy/jolytree.git
+```
+
+**C.** If at least one of the four required binaries (step A) is not available on your `$PATH` variable, edit the file `JolyTree.sh` and indicate the local path to the mash, gawk, FastME and/or REQ binary(ies) (approximately between lines 100 and 200):
+
+```bash
+#############################################################################################################
+#                                                                                                           #
+# ================                                                                                          #
+# = INSTALLATION =                                                                                          #
+# ================                                                                                          #
+#                                                                                                           #
+# [1] REQUIREMENTS =======================================================================================  #
+# JolyTree depends on Mash, gawk,  FastME and REQ (see below),  each with a minimum version required.  You  #
+# should have them installed on your computer prior to using JolyTree. Make sure that each is installed on  #
+# your $PATH variable, or specify below the full path to each of them.                                      #
+#                                                                                                           #
+# -- Mash: fast pairwise p-distance estimation --------------------------------------------------------     #
+#    VERSION >= 1.0.2                                                                                       #
+#    src: github.com/marbl/Mash                                                                             #
+#                                                                 ################################################
+                                                                  ################################################
+  MASH=mash;                                                      ## <=== WRITE HERE THE PATH TO THE MASH       ##
+                                                                  ##      BINARY (VERSION 1.0.2 MINIMUM)        ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+# -- gawk: fast text file processing ------------------------------------------------------------------     #
+#    VERSION >= 4.1.0                                                                                       #
+#    src: ftp.gnu.org/gnu/gawk                                                                              #
+#                                                                 ################################################
+                                                                  ################################################
+  GAWK=gawk;                                                      ## <=== WRITE HERE THE PATH TO THE GAWK       ##
+                                                                  ##      BINARY (VERSION 4.1.0 MINIMUM)        ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+# -- FastME: fast distance-based phylogenetic tree inference ------------------------------------------     #
+#    VERSION >= 2.1.5.1                                                                                     #
+#    src: gite.lirmm.fr/atgc/FastME/                                                                        #
+#                                                                 ################################################
+                                                                  ################################################
+  FASTME=fastme;                                                  ## <=== WRITE HERE THE PATH TO THE FASTME     ##
+                                                                  ##      BINARY (VERSION 2.1.5.1 MINIMUM)      ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+# -- REQ: fast computation of the rates of elementary quartets ----------------------------------------     #
+#    VERSION >= 1.2                                                                                         #
+#    src: gitlab.pasteur.fr/GIPhy/REQ                                                                       #
+#                                                                 ################################################
+                                                                  ################################################
+  REQ=REQ;                                                        ## <=== WRITE HERE THE PATH TO THE REQ        ##
+                                                                  ##      BINARY (VERSION 1.2 MINIMUM)          ##
+                                                                  ################################################
+                                                                  ################################################
+#                                                                                                           #
+#############################################################################################################
+
+```
+
+**D.** Give the execute permission to the file `JolyTree.sh`:
+```bash
+chmod +x JolyTree.sh
+```
+
+**E.** Execute _JolyTree_ with the following command line model:
+```bash
+./JolyTree.sh  [options]
+```
+
+## Usage
+
+Launch _JolyTree_ without any option to display the following documentation:
+
+```
+ USAGE:
+    JolyTree.sh  [options]
+ where:
+    -i <directory>  directory name containing  FASTA-formatted contig files;  only files
+                    ending with .fa, .fna, .fas or .fasta will be considered (mandatory)
+    -b <basename>   basename of every written output file (mandatory)
+    -s <int>        sketch size (default: 25% of the largest genome size)
+    -q <double>     probability of observing a random k-mer (default: 0.0001)
+    -k <int>        k-mer size (default: estimated from the average genome size with the
+                    probability set by option -q)
+    -c <real>       if at least one of the estimated p-distances is above this specified
+                    cutoff, then a F81 correction is performed (default: 0.1)
+    -n              no BME tree inference (only pairwise distance estimation)
+    -r <int>        number of steps  when performing the  ratchet-based  BME tree search
+                    (default: 100)
+    -t <int>        number of threads (default: 2)
+```
+
+## Notes
+
+* It is not recommended to modify the option -k. The optimal value of _k_ is automatically estimated using equation (2) in [Ondov et al. (2016)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x) from the desired probability _q_ of observing a random _k_-mer (option -q). Increasing _q_ (e.g. > 0.001) is not recommended, especially when dealing with distantly related genomes. Lowering _q_ (e.g. < 0.00001) leads to a larger _k_-mer size, which increases the variance of the estimated evolutionary distances.
+
+* Increasing the sketch size (option -s) does not generally modify the inferred phylogenetic tree; on the other hand, it is not recommended to set a sketch size lower than 10,000 (except for small genomic sequences, e.g. plasmids, viruses).
+
+* Lowering the cutoff value for correcting the evolutionary distances (option -c) does not generally modify the inferred phylogenetic tree; on the other hand, increasing this cutoff value is strongly discouraged.
+
+* The option -c allows multiple substitutions per character to be accurately estimated when an observed _p_-distance is quite large (e.g. > 0.1; see [Figure 3.1](https://books.google.fr/books?id=3Xc8DwAAQBAJ&pg=PA41) in Nei and Kumar 2000). In such cases, the F81 correction is performed using equation (4) in [Tamura and Kumar (2002)](https://academic.oup.com/mbe/article/19/10/1727/1258975), which estimates the pairwise distance based on the Equal-Input model of evolution ([Felsenstein 1981](https://link.springer.com/article/10.1007/BF01734359); [Tajima and Nei 1982](https://link.springer.com/article/10.1007/BF01810830), [1984](https://academic.oup.com/mbe/article/1/3/269/1244029)). This transformation was chosen because it can be directly computed from a _p_-distance value, and it takes into account putative unequal base frequencies and heterogeneous base composition among lineages.
+
+* Faster running times will be observed when using multiple threads; of note, only the pairwise distance estimation step benefits from a large number of threads (the other steps are quite fast).
+
+* The verbosity of _JolyTree_ can be reduced by ending the command line with `2>/dev/null`.
+
+* To launch _JolyTree_ on multiple cores on a cluster managed by [SLURM](https://slurm.schedmd.com), edit the file `JolyTree.sh` and read subsection [3] of the _Installation_ section (approximately line 200).
+
+
+## Example
+
+In order to illustrate the usefulness of _JolyTree_ and to describe its output files, the following example shows how to infer an exploratory phylogenetic tree of _Klebsiella_ genomes.
+
+##### Downloading genome sequences
+
+The following command lines download the genome sequences of 39 _Klebsiella_ strains from the [NCBI genome repository](https://www.ncbi.nlm.nih.gov/genome) into a directory named _genomes_:
+
+```bash
+mkdir genomes/ ;
+EUTILS="wget -q -O- https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=";
+NCBIFTP="wget -q -O- https://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/"; Z=".1.fsa_nt.gz";
+t="K.pneumoniae";
+echo -e "HS11286 CP003200\nNTUH-K2044 AP006725\nMGH78578 CP000647\nKCTC2242 CP002910\nATCCBAA-2146 CP006659\nCAV1217 CP018676" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.oxytoca";
+echo -e "CAV1374 CP011636\nKONIH1 CP008788\nAR_0147 CP020358\nFDAARGOS_335 CP027426\nJKo3 AP014951\nAR380 CP029128" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.aerogenes";
+echo -e "KCTC2190 CP002824\nEA1509E FO203355\nG7 CP011539\nAR_0062 CP026756\nCAV1320 CP011574\nFDAARGOS_139 CP014748" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.quasipneumoniae";
+echo -e "ATCC700603 CP014696\nHKUOPA4 CP014154\nKPC142 CP023478\nMGH44 AYIV01\nSKLX2781 LYWP01\nCCBH16302 MDCA01" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.variicola";
+echo -e "At-22 CP001891\nDSM15968 CP010523\nGJ2 CP017849\nWCHKP19 CP028555\nBIDMC88 LFBA01" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.quasivariicola";
+echo -e "KPN1705 CP022823\n10982 AKYX01\nPO552 NFVM01\nVRCO0126 FWGJ01\nVRCO0168 FWNZ01" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+
+
+t="K.pneumoniae";
+echo -e "ATCC13883 JOOW01\nMGH78578 CP000647\nKp13 CP003999\nNTUH-K2044 AP006725" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.quasipneumoniae.subsp.quasipneumoniae";
+echo -e "01A030 CCDF01" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.variicola";
+echo -e "342 CP000964\nAt-22 CP001891" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.quasipneumoniae.subsp.similipneumoniae";
+echo -e "07A044 CBZR01" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+t="K.quasivariicola"; 
+echo -e "10982 AKYX01\nKPN1705 CP022823" |
+ while IFS=" " read -r s a;do echo $t.$s;([ ${#a} -eq 6 ]&&$NCBIFTP/${a:0:2}/${a:2:2}/$a/$a$Z|zcat||$EUTILS$a)>genomes/$t.$s.fa; done
+
+```
+
+##### Launching _JolyTree_
+
+The following command line launches the script `JolyTree.sh` with default options on 8 threads:
+```bash
+./JolyTree.sh  -i genomes  -b klebsiella  -t 8  2>/dev/null
+```
+Of note, the verbosity can be increased by omitting the final `2>/dev/null`.
+
+As the basename was set to 'klebsiella', _JolyTree_ writes the following four output files within a few minutes:
+
+* `klebsiella.acgt`: the A, C, G and T residue counts for each genome
+* `klebsiella.oepl`: every pairwise _p_-distance in [OEPL (One Entry Per Line) format](http://giphy.pasteur.fr/faq/phylogenetics/distance-matrix-file-conversion/#how-to-deal-with-the-one-entry-per-line-oepl-matrix-format)
+* `klebsiella.d`: the matrix of (corrected) pairwise evolutionary distances in PHYLIP square format
+* `klebsiella.nwk`: the BME phylogenetic tree in NEWICK format with REQ confidence support at branches
+
+
+
+## References
+
+Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17(6):368-376. [doi:10.1007/BF01734359](https://link.springer.com/article/10.1007/BF01734359).
+
+Lefort V, Desper R, Gascuel O (2015) FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Molecular Biology and Evolution, 32(10):2798-2800. [doi:10.1093/molbev/msv150](https://doi.org/10.1093/molbev/msv150).
+
+Nei M, Kumar S (2000) Molecular Evolution and Phylogenetics. Oxford University Press, Oxford. ISBN: 0-19-513584-9.
+
+Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM (2016) Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology, 17(1):132. [doi:10.1186/s13059-016-0997-x](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x).
+
+Tajima F, Nei M (1982) Biases of the estimates of DNA divergence obtained by the restriction enzyme technique. Journal of Molecular Evolution, 18(2):115-120. [doi:10.1007/BF01810830](https://link.springer.com/article/10.1007/BF01810830).
+
+Tajima F, Nei M (1984) Estimation of evolutionary distance between nucleotide sequences. Molecular Biology and Evolution, 1(3):269-285. [doi:10.1093/oxfordjournals.molbev.a040317](https://academic.oup.com/mbe/article/1/3/269/1244029).
+
+Tamura K, Kumar S (2002) Evolutionary distance estimation under heterogeneous substitution pattern among lineages. Molecular Biology and Evolution, 19(10):1727-1736. [doi:10.1093/oxfordjournals.molbev.a003995](https://academic.oup.com/mbe/article/19/10/1727/1258975).
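
The README note on option -k cites equation (2) of Ondov et al. (2016), which gives the smallest k-mer size for which a random k-mer is observed with probability at most q in a genome of n bases: k = ceil(log4(n(1-q)/q)). The sketch below evaluates that formula to show how q drives k. The helper name `kmer_size` and the 5 Mb genome size are assumptions chosen for illustration, and `JolyTree.sh` may compute k differently (the usage text says it starts from the average genome size).

```shell
#!/usr/bin/env bash
# Sketch only: kmer_size is a hypothetical helper, not a JolyTree.sh function.
# usage: kmer_size <genome_size_in_bases> <q>
kmer_size() {
  awk -v n="$1" -v q="$2" 'BEGIN {
    x = log(n * (1 - q) / q) / log(4)   # log base 4: DNA alphabet of size 4
    k = int(x); if (k < x) k++          # ceiling of x
    print k
  }'
}

kmer_size 5000000 0.0001   # prints 18
```

For the same genome size, q = 0.001 gives k = 17 and q = 0.00001 gives k = 20, which matches the note that lowering q leads to a larger k-mer size.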