Commit d8363b4a authored by Alexis CRISCUOLO

Initial commit

# C2A / A2C
_C2A_ and _A2C_ are command line programs written in [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/language/index.html) that allow translating and back-translating FASTA-formatted codon and amino-acid sequence files, respectively. These tools were implemented to easily infer multiple sequence alignments at the codon level.
## Compilation and execution
The source code is in the _src_ directory and can be compiled and executed in two different ways.
#### Building an executable jar file
On computers with the [Oracle JDK](http://www.oracle.com/technetwork/java/javase/downloads/index.html) (6 or higher) installed, executable jar files can be created. In a command-line window, go to the _src_ directory and type:
```bash
javac C2A.java A2C.java
echo Main-Class: C2A > MANIFEST.MF
jar -cmvf MANIFEST.MF C2A.jar C2A.class
echo Main-Class: A2C > MANIFEST.MF
jar -cmvf MANIFEST.MF A2C.jar A2C.class
rm MANIFEST.MF C2A.class A2C.class
```
This creates the two executable jar files `C2A.jar` and `A2C.jar`, which can be launched with the following command-line templates:
```bash
java -jar C2A.jar [file]
java -jar A2C.jar [files]
```
#### Building a native code binary
On computers with the [GNU compiler GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcj/) installed, native binaries can also be built. In a command-line window, go to the _src_ directory and type:
```bash
make
```
This creates the two executable binary files `c2a` and `a2c`, which can be launched with the following command-line templates:
```bash
./c2a [file]
./a2c [files]
```
## Usage
Launch _C2A_ without any argument to read the following documentation:
```
USAGE: C2A <seq.fna>
where <seq.fna> is a FASTA-formatted codon sequence file.
This will output in stdout the translation (standard
genetic code) of each sequence in the same format.
```
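The translation step can be illustrated with a minimal sketch. It is not the code of _C2A_ (which uses nested `switch` statements); it is a lookup-table variant under the following assumptions: the class and method names are hypothetical, stop codons are written as `X` (as _C2A_ does), and any codon containing a character other than A, C, G or T is also written as `X`.

```java
// Hypothetical sketch: translating codons with the standard genetic code.
class CodonTranslationSketch {
    // Base order used to index the table below.
    private static final String BASES = "TCAG";
    // Standard genetic code in TCAG order; stop codons are encoded as 'X' (assumption matching C2A).
    private static final String AMINO_ACIDS =
        "FFLLSSSSYYXXCCXW" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    // Translates one codon (three upper-case characters).
    static char translateCodon(String codon) {
        int i1 = BASES.indexOf(codon.charAt(0));
        int i2 = BASES.indexOf(codon.charAt(1));
        int i3 = BASES.indexOf(codon.charAt(2));
        if (i1 < 0 || i2 < 0 || i3 < 0) return 'X';   // gap or ambiguous nucleotide
        return AMINO_ACIDS.charAt(16 * i1 + 4 * i2 + i3);
    }

    // Translates a whole codon sequence; its length must be a multiple of 3.
    static String translate(String codons) {
        String seq = codons.toUpperCase();
        if (seq.length() % 3 != 0)
            throw new IllegalArgumentException("sequence length is not a multiple of 3");
        StringBuilder aa = new StringBuilder();
        for (int p = 0; p < seq.length(); p += 3)
            aa.append(translateCodon(seq.substring(p, p + 3)));
        return aa.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGACCTTCGACGGCTAA"));   // prints MTFDGX
    }
}
```

Note that this sketch rejects sequences whose length is not a multiple of 3, whereas _C2A_ silently skips them.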
Launch _A2C_ without any argument to read the following documentation:
```
USAGE: A2C <ali.faa> <seq.fna>
where <ali.faa> is a FASTA-formatted multiple amino-acid
sequence alignment file and <seq.fna> a FASTA-formatted
file containing the associated codon sequences. This
will output in stdout the multiple back-translated
sequence alignment.
```
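In the _A2C_ source, each aligned amino-acid sequence is paired with the codon sequence whose FASTA header line is identical (`fh.indexOf(line)`), so the two files do not need to list the sequences in the same order. The following minimal sketch illustrates this pairing; the class name, the method name and the toy records are hypothetical and only serve as an illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reading FASTA records and pairing them by header line.
class FastaByHeaderSketch {
    // Returns header -> concatenated sequence, in order of first appearance.
    static Map<String, String> parseFasta(List<String> lines) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                header = line;
                records.put(header, "");
            } else if (header != null) {
                // sequence lines of a record may span several lines
                records.put(header, records.get(header) + line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // toy alignment and codon sequences (not taken from the example directory)
        Map<String, String> ali = parseFasta(Arrays.asList(">s1", "M-T", "F", ">s2", "MKTF"));
        Map<String, String> cod = parseFasta(Arrays.asList(">s2", "ATGAAAACCTTC", ">s1", "ATGACC", "TTC"));
        for (String h : ali.keySet())
            System.out.println(h + "\t" + ali.get(h) + "\t" + cod.get(h));
    }
}
```

Because the pairing relies on exact header matches, the header lines should be kept unchanged by the alignment program used between the two steps.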
## Example
To illustrate the usefulness of _C2A_ and _A2C_, the directory _example_ contains FASTA files from the study of [Drini et al. (2016)](https://doi.org/10.1093/gbe/evw140). The first file _seq.fna_ contains several _Leishmania_ and _Trypanosoma_ codon sequences from the sub-family HSPA1. In order to easily infer an accurate multiple sequence alignment at the codon level, _C2A_ and _A2C_ can be used together with a standard multiple sequence alignment program.
First, _C2A_ creates the file _seq.faa_, which contains the translation of every codon sequence in _seq.fna_:
```bash
C2A seq.fna > seq.faa
```
Second, the resulting `seq.faa` can be used to infer a multiple amino-acid sequence alignment, which is expected to be more accurate than one inferred directly from the initial codon sequences. The directory _src_ contains such an alignment inside the file _ali.faa_.
Finally, _A2C_ creates the file _ali.fna_ by back-translating the aligned amino-acid sequences in _ali.faa_ using the associated codon sequences in _seq.fna_:
```bash
A2C ali.faa seq.fna > ali.fna
```
In this way, the file _ali.fna_ contains an accurate multiple sequence alignment at the codon level, i.e. homology is recovered at each codon position.
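The back-translation step can be sketched as follows: every gap character of the aligned amino-acid sequence becomes `---`, and every other character consumes the next codon of the unaligned codon sequence. This is a minimal illustration of that idea, not the _A2C_ source itself; the class and method names are hypothetical, and the explicit length check is an addition of this sketch.

```java
// Hypothetical sketch: threading codons through a gapped amino-acid sequence.
class BackTranslationSketch {
    static String backTranslate(String alignedAa, String codons) {
        StringBuilder out = new StringBuilder();
        int x = 0;   // position of the next unused codon
        for (int c = 0; c < alignedAa.length(); c++) {
            if (alignedAa.charAt(c) == '-') {
                out.append("---");               // one amino-acid gap = three nucleotide gaps
            } else {
                if (x + 3 > codons.length())
                    throw new IllegalArgumentException("codon sequence is too short");
                out.append(codons, x, x + 3);    // copy the codon of this residue
                x += 3;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(backTranslate("M-TF", "ATGACCTTC"));   // prints ATG---ACCTTC
    }
}
```

Each output row is three times as long as the corresponding amino-acid row, so the columns of the amino-acid alignment map directly onto codon columns.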
## Reference
Drini S, Criscuolo A, Lechat P, Imamura H, Skalický T, Rachidi N, Lukeš J, Dujardin JC, Späth GF (2016) Species- and strain-specific adaptation of the HSP70 super family in pathogenic Trypanosomatids. Genome Biology and Evolution, 8(6):1980-1995. [doi:10.1093/gbe/evw140](https://doi.org/10.1093/gbe/evw140).
>LinJ.28.2950
MPWYLFSTTSSPRCALPLPPFYPPNHTQDPKHALALKLSEENTYAHRHTSLSLCALLRNP
ITTLLPPPPIPHAHTHTTTAAEMTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRV--------------------------------------
------------------------------------------------------------
-------------------------------------------
>LDPBQ7IC8_280037100
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LDPBQ7IC8_280037000
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LDPBQ7IC8_280036500
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LDPBQ7IC8_280036600
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGMMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LDPBQ7IC8_280036700
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LDPBQ7IC8_280036800
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LDPBQ7IC8_280036900
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LinJ.28.3000
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LinJ.28.2960
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LinJ.28.3060
MPWYLFSTTSSPRCALPLPPFYPPNHTQDPKHALALKLSEENTYAHRHTSLSLCALLRNP
ITTLLPPPPIPHAHTHTTTAAEMTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPMIAVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDRAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDSDKATLNKEI
DVTLEWLSSNQEATKEEYEHKQKELESVCNPIMTKMYQSMGGA--G--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LmjF.28.2770
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPVISVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENIDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEEDDKAQRDRVEAKNGLENYAYSMKNTLSDSNVSGKLDDSDKATLNKEI
DAALEWLSSNQEATKEEYEHRQKELESVCNPIMTKMYQSMGGA--AG-----------GM
PGGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LmjF.28.2780
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNERVDIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPVISVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRALRRLRTACERAKRTL
SSATQATIEIDALFENIDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEEDDKAQRDRVEAKNGLENYAYSMKNTLSDSNVSGKLDDSDKATLNKEI
DAALEWLSSNQEATKEEYEHRQKELESVCNPIMTKMYQSMGGA--AG-----------GM
PGGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LmxM.28.2780
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNDRVEIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPVISVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTISGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKNXKGKNLASSHRSLRRLRTACERAKRTL
SSATQATIEIDALFDNVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDKAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDTDKSTLNKEI
DAALEWLSSNQEATKEEYEHKQKELENVCNPIMTKMYQSMGGG--A--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVD-
>LmxM.28.2770
------------------------------------------------------------
----------------------MTFDGAIGIDLGTTYSCVGVWQNDRVEIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKFNDSVVQSDMKHWPFKVTTKG
DDKPVISVQYRGEEKTFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTISGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFTEEFKRKN-KGKNLASSHRSLRRLRTACERAKRTL
SSATQATIEIDALFDNVDFQATITRARFEELCGDLFRSTIQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQITITNDKGRLSKDE
IERMVNDAMKYEADDKAQRDRVEAKNGLENYAYSMKNTLGDSNVSGKLDDTDKSTLNKEI
DAALEWLSSNQEATKEEYEHKQKELENVCNPIMTKMYQSMGGG--A--------------
-GGMPGGMPD--MSGMSGGAG---PA---GGASSGPKVEEVDX
>LbrM.28.2990
------------------------------------------------------------
----------------------MTFEGAIGIDLGTTYSCVGVWQNERVEIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKYGDPVVQADMKHWPFKVKTKG
EDKPVISVQYCNEEKIFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFSEEFKRKN-KGKDLSSSHRALRRLRTACERAKRTL
SSATQATIEIDALFDNVDFQANITRARFEELCGDLFRSTMQPVERVLQDAKMDKRSVHDV
VLVGGSTRIPKVQSLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNHITITNDKGRLSKDE
IERMVNDASKYEQADKMQRERVEAKNGLENYAYSMKNTVSDTNVSGKLEESDRSALNSAI
DAALEWLNSNQEASKEEYEHRQKELESTCNPIMTKMYQSMGGG--A--------------
-GGMPGGMPD--MSGMGGGAG---PA---AGASSGPKVEEVDX
>LbrM.28.2980
------------------------------------------------------------
----------------------MTFEGAIGIDLGTTYSCVGVWQNERVEIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPHNTVFDAKRLIGRKYGDPVVQADMKHWPFKVKTKG
EDKPVISVQYCNEEKIFTPEEISSMVLLKMKETAEAYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKGDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVTFFSEE--------------------------------
------------------------------------------------------------
------------------------------------------------------------
-----------------------------XXXXFLTYGANQPGWPSR-FRGRARDEKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNHITITNDKGRLSKDE
IERMVNDASKYEQADKMQRERVEAKNGLENYAYSMKNTVSDTNVSGKLEESDRSALNSAI
DAALEWLNSNQEASKEEYEHRQKELESTCNPIMTKMYQS---------------------
-------------------------------------------
>TvY486_1112330
------------------------------------------------------------
----------------------MTYEGAIGIDLGTTYSCVGVWQNERVEIIANDQGNRTT
PSYVAFTDTERLIGDAAKNQVAMNPLNTVFDAKRLIGRKFSDSVVQADMKHWPFKVTTKG
DDKPVIQVQFHGETKTFNPEEISSTVLLKMKEVAESYLGKQVTKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKVDDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVAHFTEEFKRKN-KGKDLTTSQRALRRLRTACERAKRTL
SSAAQATIEIDALFDSVDFQSTITRARFEELCGDLFRGTLQPVERVLQDAKMDKRAVHDV
VLVGGSTRIPKVMQLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVAPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLNADGILSVSAEEKGTGKRNQIVITNDKGRLSKAE
IERMVNDAAKYEAQDKAQRERVDAKNALENYAFSMKSTATDPNVGGKLDEADKNTIIEAV
DGALQWLNNNQEASKEEYEHRQKELEGVCTPIMTKMYQGMGGG--A--------------
-GGMPGGMPGGMPGGMPGGMGGG-------APSAGPRVEEVDX
>Tb927.11.11330
------------------------------------------------------------
----------------------MTYEGAIGIDLGTTYSCVGVWQNERVEIIANDQGNRTT
PSYVAFTDSERLIGDAAKNQVAMNPTNTVFDAKRLIGRKFSDSVVQSDMKHWPFKVVTKG
DDKPVIQVQFRGETKTFNPEEISSMVLLKMKEVAESYLGKQVAKAVVTVPAYFNDSQRQA
TKDAGTIAGLEVLRIINEPTAAAIAYGLDKADEGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVAHFTEEFKRKN-KGKDLSSNLRALRRLRTACERAKRTL
SSAAQATIEIDALFENIDFQATITRARFEELCGDLFRGTLQPVERVLQDAKMDKRAVHDV
VLVGGSTRIPKVMQLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVAPLTLGIETAGGVMTALIKRNTTIPTKKSQIFSTYSDNQPGVHIQVFEGERTMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILSVSAEEKGTGKRNQIVITNDKGRLSKAD
IERMVSDAAKYEAEDKAQRERIDAKNGLENYAFSMKNTINDPNVAGKLDDADKNAVTTAV
EEALRWLNDNQEASLDEYNHRQKELEGVCAPILSKMYQGMGGGDAA--------------
-GGMPGGMPGGMPGGMPGGMGGGMGG---AAASSGPKVEEVDX
>TcCLB.511211.170
------------------------------------------------------------
----------------------MTYEGAIGIDLGTTYSCVGVWQNERVEIIANDQGSRTT
PSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRKFSDPVVQSDMKHWPFKVITKG
DDKPVIQVQFRGETKTFNPEEVSSMVLSKMKEIAESYLGKQVKKAVVTVPAYFNDSQRQA
TKDAGTIAGMEVLRIINEPTAAAIAYGLDKVEDGKERNVLIFDLGGGTFDVTLLTIDGGI
FEVKATNGDTHLGGEDFDNRLVSHFTDEFKRKN-KGKDLTTSQRALRRLRTACERAKRTL
SSAAQATIEIDALFDNVDFQATITRARFEELCGDLFRGTLQPVERVLQDAKMDKRAVHDV
VLVGGSTRIPKVMQLVSDFFGGKELNKSINPDEAVAYGAAVQAFILTGGKSKQTEGLLLL
DVTPLTLGIETAGGVMTSLIKRNTTIPTKKSQIFSTYADNQPGVHIQVFEGERAMTKDCH
LLGTFDLSGIPPAPRGVPQIEVTFDLDANGILNVSAEEKGTGKRNQIVITNDKGRLSKAD
IERMVSEAAKYEAQDKEQRERIDAKNGLENYAFSMKNTVNEPNVAGKIEEADKNTITSAV
EEALQWLNNNQEASKEEYEHRQKELENLCTPIMTKMYQGMGAG--GGMPGGMPGGMPGGM
PGGMPGGMPGGMPGGMPGGMPGGMPGGANPSSSSGPKVEEVDX
/*
####################################################################
A2C: back-translating a multiple amino-acid sequence alignment into
a multiple codon sequence alignment
Copyright (C) 2015-2018 Alexis Criscuolo
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
Institut Pasteur
Bioinformatics and Biostatistics Hub
C3BI, USR 3756 IP CNRS
Paris, FRANCE
alexis.criscuolo@pasteur.fr
####################################################################
*/
import java.io.*;
import java.util.*;
public class A2C {
    static File aafile, ntfile;
    static BufferedReader in;
    static ArrayList<String> fh;              // FASTA header lines of the alignment
    static ArrayList<StringBuilder> aa, nt;   // aligned amino-acid and codon sequences, indexed like fh
    static int i, c, l, n, x;
    static String line, aaseq, ntseq;
    static StringBuilder coseq;

    public static void main(String[] args) throws IOException {
        if ( args.length < 2 ) {
            System.out.println("");
            System.out.println(" USAGE: A2C <ali.faa> <seq.fna>"); System.out.println("");
            System.out.println(" where <ali.faa> is a FASTA-formatted multiple amino-acid");
            System.out.println(" sequence alignment file and <seq.fna> a FASTA-formatted");
            System.out.println(" file containing the associated codon sequences. This");
            System.out.println(" will output in stdout the multiple back-translated");
            System.out.println(" sequence alignment.");
            System.out.println(""); System.exit(0);
        }
        fh = new ArrayList<String>(); aa = new ArrayList<StringBuilder>(); nt = new ArrayList<StringBuilder>(); i = n = -1;

        // reading the amino-acid alignment
        if ( ! (aafile=new File(args[0])).exists() ) { System.err.println("file " + args[0] + " does not exist"); System.exit(1); }
        in = new BufferedReader(new FileReader(aafile));
        while ( (line=in.readLine()) != null ) {
            line = line.trim();
            if ( line.startsWith(">") ) { ++n; fh.add(line); aa.add(new StringBuilder("")); nt.add(new StringBuilder("")); continue; }
            if ( n >= 0 ) aa.get(n).append(line);   // lines preceding the first header are ignored
        }
        ++n; in.close();                            // n is now the number of aligned sequences

        // reading the codon sequences; only those whose header occurs in the alignment are stored
        if ( ! (ntfile=new File(args[1])).exists() ) { System.err.println("file " + args[1] + " does not exist"); System.exit(1); }
        in = new BufferedReader(new FileReader(ntfile));
        while ( (line=in.readLine()) != null ) {
            line = line.trim();
            if ( line.startsWith(">") ) { i = fh.indexOf(line); continue; }
            if ( i >= 0 ) nt.get(i).append(line);
        }
        i = -1; in.close();

        // back-translating: a gap gives "---", any other character consumes the next codon
        while ( ++i < n ) {
            if ( (ntseq=nt.get(i).toString()).length() == 0 ) continue;   // no codon sequence with this header
            coseq = new StringBuilder(""); l = (aaseq=aa.get(i).toString()).length(); c = -1; x = 0;
            while ( ++c < l ) coseq = ( aaseq.charAt(c) == '-' ) ? coseq.append("---") : coseq.append(ntseq.substring(x, (x += 3)));
            System.out.println(fh.get(i)); System.out.println(coseq.toString());
        }
    }
}
/*
####################################################################
C2A: translating a FASTA-formatted codon sequence file into an
amino-acid one
Copyright (C) 2015-2018 Alexis Criscuolo
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
Institut Pasteur
Bioinformatics and Biostatistics Hub
C3BI, USR 3756 IP CNRS
Paris, FRANCE
alexis.criscuolo@pasteur.fr
####################################################################
*/
import java.io.*;
public class C2A {
    static BufferedReader in;
    static String line, fh;
    static int lgt;
    static StringBuilder sb;

    public static void main(String[] args) throws IOException {
        if ( args.length < 1 ) {
            System.out.println("");
            System.out.println(" USAGE: C2A <seq.fna>"); System.out.println("");
            System.out.println(" where <seq.fna> is a FASTA-formatted codon sequence file.");
            System.out.println(" This will output in stdout the translation (standard");
            System.out.println(" genetic code) of each sequence in the same format.");
            System.out.println(""); System.exit(0);
        }
        try { in = new BufferedReader(new FileReader(new File(args[0]))); sb = new StringBuilder(""); }
        catch ( FileNotFoundException e ) { System.err.println("file " + args[0] + " does not exist"); System.exit(1); }
        while ( (line=in.readLine()) != null ) {
            line = line.trim();
            if ( line.startsWith(">") ) {
                // a sequence is written only if its length is a positive multiple of 3
                if ( ((lgt=sb.length()) > 0) && (lgt % 3 == 0) ) { System.out.println(fh); System.out.println(toaa(sb)); }
                fh = line; sb = new StringBuilder(""); continue;
            }
            sb = sb.append(line.toUpperCase());
        }
        in.close();
        if ( ((lgt=sb.length()) > 0) && (lgt % 3 == 0) ) { System.out.println(fh); System.out.println(toaa(sb)); }
    }

    // translates a codon sequence (standard genetic code); stop and unknown codons are written as X
    static String toaa(StringBuilder cod) {
        int c = -1, l = cod.length(); StringBuilder aa = new StringBuilder("");
        while ( ++c < l )
            switch (cod.charAt(c)) {
            case 'A':
                switch (cod.charAt(++c)) { // A..
                case 'A': switch (cod.charAt(++c)) { case 'A': case 'G': aa = aa.append('K'); continue; case 'C': case 'T': aa = aa.append('N'); continue; default: aa = aa.append('X'); continue; } // AA.
                case 'C': aa = aa.append('T'); ++c; continue; // AC.
                case 'G': switch (cod.charAt(++c)) { case 'A': case 'G': aa = aa.append('R'); continue; case 'C': case 'T': aa = aa.append('S'); continue; default: aa = aa.append('X'); continue; } // AG.
                case 'T': switch (cod.charAt(++c)) { case 'A': case 'C': case 'T': aa = aa.append('I'); continue; case 'G': aa = aa.append('M'); continue; default: aa = aa.append('X'); continue; } // AT.
                default: aa = aa.append('X'); ++c; continue; } // unknown second base: no fall-through into case 'C'
            case 'C':
                switch (cod.charAt(++c)) { // C..
                case 'A': switch (cod.charAt(++c)) { case 'A': case 'G': aa = aa.append('Q'); continue; case 'C': case 'T': aa = aa.append('H'); continue; default: aa = aa.append('X'); continue; } // CA.
                case 'C': aa = aa.append('P'); ++c; continue; // CC.
                case 'G': aa = aa.append('R'); ++c; continue; // CG.
                case 'T': aa = aa.append('L'); ++c; continue; // CT.
                default: aa = aa.append('X'); ++c; continue; } // unknown second base
            case 'G':
                switch (cod.charAt(++c)) { // G..
                case 'A': switch (cod.charAt(++c)) { case 'A': case 'G': aa = aa.append('E'); continue; case 'C': case 'T': aa = aa.append('D'); continue; default: aa = aa.append('X'); continue; } // GA.
                case 'C': aa = aa.append('A'); ++c; continue; // GC.
                case 'G': aa = aa.append('G'); ++c; continue; // GG.
                case 'T': aa = aa.append('V'); ++c; continue; // GT.
                default: aa = aa.append('X'); ++c; continue; } // unknown second base
            case 'T':
                switch (cod.charAt(++c)) { // T..
                case 'A': switch (cod.charAt(++c)) { case 'A': case 'G': aa = aa.append('X'); continue; case 'C': case 'T': aa = aa.append('Y'); continue; default: aa = aa.append('X'); continue; } // TA.
                case 'C': aa = aa.append('S'); ++c; continue; // TC.
                case 'G': switch (cod.charAt(++c)) { case 'A': aa = aa.append('X'); continue; case 'C': case 'T': aa = aa.append('C'); continue; case 'G': aa = aa.append('W'); continue; default: aa = aa.append('X'); continue; } // TG.
                case 'T': switch (cod.charAt(++c)) { case 'A': case 'G': aa = aa.append('L'); continue; case 'C': case 'T': aa = aa.append('F'); continue; default: aa = aa.append('X'); continue; } // TT.
                default: aa = aa.append('X'); ++c; continue; } // unknown second base
            default: aa = aa.append('X'); ++c; ++c; continue; // unknown first base: skip the two remaining bases
            }
        return aa.toString();
    }
}
GCJ=gcj
GCJFLAGS=-fsource=1.6 -march=native -msse2 -O3 -minline-all-stringops -fomit-frame-pointer -momit-leaf-frame-pointer -fstrict-aliasing -fno-store-check -fno-bounds-check -funroll-all-loops -Wall
OTHERFLAGS=-funsafe-math-optimizations -ffast-math
all: C2A A2C
C2A: C2A.java
	$(GCJ) $(GCJFLAGS) --main=C2A C2A.java -o c2a
A2C: A2C.java
	$(GCJ) $(GCJFLAGS) --main=A2C A2C.java -o a2c