Commit 11b81e74 authored by Alexis CRISCUOLO

1.3b

parent 0c7d6ca8
# C2A / A2C
_C2A_ and _A2C_ are command line programs written in [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/language/index.html) that allow translating and back-translating FASTA-formatted codon and amino-acid sequence files, respectively. These tools were implemented to easily infer multiple sequence alignments at the codon level.
_C2A_ and _A2C_ are command line programs written in [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/language/index.html) to translate and back-translate FASTA-formatted codon and amino-acid sequence files, respectively. These tools were implemented to easily infer multiple sequence alignments at the codon level.
## Compilation and execution
......@@ -8,6 +8,10 @@ The source codes are inside the _src_ directory and could be compiled and execut
#### Building an executable jar file
Clone this repository with the following command line:
```bash
git clone https://gitlab.pasteur.fr/GIPhy/C2A.A2C.git
```
On computers with [Oracle JDK](http://www.oracle.com/technetwork/java/javase/downloads/index.html) (6 or higher) installed, Java executable jar files could be created. In a command-line window, go to the _src_ directory and type:
```bash
javac C2A.java A2C.java
......@@ -17,7 +21,7 @@ echo Main-Class: A2C > MANIFEST.MF
jar -cmvf MANIFEST.MF A2C.jar A2C.class
rm MANIFEST.MF C2A.class A2C.class
```
This will create the two executable jar files `C2A.jar` and `A2C.jar` that could be launched with the following command line models:
This will create the two executable jar files `C2A.jar` and `A2C.jar` that could be run with the following command line models:
```bash
java -jar C2A.jar [file]
java -jar A2C.jar [files]
......@@ -25,38 +29,48 @@ java -jar A2C.jar [files]
#### Building a native code binary
On computers with the [GNU compiler GCJ](https://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcj/) installed, binaries could also be built. In a command-line window, go to the _src_ directory, and type:
Clone this repository with the following command line:
```bash
make
git clone https://gitlab.pasteur.fr/GIPhy/C2A.A2C.git
```
This will create the two executable binary files `c2a` and `a2c` that could be launched with the following command line models:
On computers with [GraalVM](https://www.graalvm.org/downloads/) installed, native executables can be built. In a command-line window, go to the _src_ directory, and type:
```bash
./c2a [file]
./a2c [files]
javac C2A.java A2C.java
native-image C2A C2A
native-image A2C A2C
rm C2A.class A2C.class
```
This will create the two native executables `C2A` and `A2C` that can be run with the following command line models:
```bash
./C2A [file]
./A2C [files]
```
## Usage
Launch _C2A_ without option to read the following documentation:
Run _C2A_ without option to read the following documentation:
```
C2A
USAGE: C2A <seq.fna>
where <seq.fna> is a FASTA-formatted codon sequence file.
This will output in stdout the translation (standard
genetic code) of each sequence in the same format.
where <seq.fna> is a FASTA-formatted codon sequence file. This will
output in stdout the translation (standard genetic code) of each
sequence in the same format.
```
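As an illustration of the translation step described in the usage text above, here is a minimal sketch of codon-to-amino-acid translation with the standard genetic code. It is not the code of _C2A_: the class and method names (`CodonTranslationSketch`, `translateCodon`, `translate`) are hypothetical, and the handling of gap codons (`---` gives `-`), of unknown codons (`X`) and of a trailing incomplete codon (ignored) are assumptions made for this example.

```java
public class CodonTranslationSketch {
    // Standard genetic code, written in the usual T, C, A, G order:
    // the amino acid of codon b1b2b3 sits at index 16*i(b1) + 4*i(b2) + i(b3).
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Translates one codon (exactly three characters).
    // Assumption: "---" is kept as a gap and any other unrecognized codon gives 'X'.
    static char translateCodon(String codon) {
        if (codon.equals("---")) return '-';
        int index = 0;
        for (int k = 0; k < 3; k++) {
            char c = Character.toUpperCase(codon.charAt(k));
            int b = BASES.indexOf(c == 'U' ? 'T' : c); // accept RNA input
            if (b < 0) return 'X';
            index = 4 * index + b;
        }
        return AMINO_ACIDS.charAt(index);
    }

    // Translates a whole sequence codon by codon.
    // Assumption: one or two trailing nucleotides are silently ignored.
    static String translate(String nt) {
        StringBuilder aa = new StringBuilder();
        for (int i = 0; i + 3 <= nt.length(); i += 3)
            aa.append(translateCodon(nt.substring(i, i + 3)));
        return aa.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGCCAAATGGTAA")); // prints MAKW*
    }
}
```

Encoding the genetic code as a 64-character string keeps the lookup to a single index computation per codon, which is one common way to avoid a 64-entry map.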
Launch _A2C_ without option to read the following documentation:
Run _A2C_ without option to read the following documentation:
```
A2C
USAGE: A2C <ali.faa> <seq.fna>
where <ali.faa> is a FASTA-formatted multiple amino-acid
sequence alignment file and <seq.ali> a FASTA-formatted
file containing the associated codon sequences. This
will output in stdout the multiple back-translated
sequence alignment.
where <ali.faa> is a FASTA-formatted multiple amino acid sequence
alignment file and <seq.ali> a FASTA-formatted file containing the
associated codon sequences. This will output in stdout the multiple
back-translated sequence alignment.
```
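The back-translation described in the usage text above can be sketched as follows: each aligned amino-acid character consumes the next codon of the associated unaligned codon sequence, and each gap character becomes a three-character gap. This is only a sketch under stated assumptions, not the code of _A2C_: the names `BackTranslationSketch` and `backTranslate` are hypothetical, gaps are assumed to be written `-`, and no check is made that each codon actually encodes the aligned residue.

```java
public class BackTranslationSketch {
    // Back-translates one aligned amino-acid sequence using its unaligned
    // codon sequence. Assumptions: gaps are '-' characters, and codons left
    // over at the end (e.g. a terminal stop codon) are simply dropped.
    static String backTranslate(String alignedAa, String codons) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // position of the next unused codon
        for (int i = 0; i < alignedAa.length(); i++) {
            if (alignedAa.charAt(i) == '-') {
                out.append("---"); // one amino-acid gap = three nucleotide gaps
                continue;
            }
            if (pos + 3 > codons.length())
                throw new IllegalArgumentException("codon sequence too short");
            out.append(codons, pos, pos + 3);
            pos += 3;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Aligned residues M, gap, A, K with codons ATG GCC AAA
        System.out.println(backTranslate("M-AK", "ATGGCCAAA")); // prints ATG---GCCAAA
    }
}
```

Because gaps are inserted in whole codons, the resulting nucleotide alignment keeps the reading frame of every sequence.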
## Example
......@@ -67,7 +81,7 @@ First, using _C2A_ allows creating the file _seq.faa_ that contains the translat
```bash
C2A seq.fna > seq.faa
```
Second, the created _seq.faa_ could be used to infer a multiple amino-acid sequence alignment, which is expected to be more accurate than the one inferred from the initial codon sequences. The directory _src_ contains such an alignment inside the file _ali.faa_.
Second, the created _seq.faa_ could be used to infer a multiple amino-acid sequence alignment, which is expected to be more accurate than the one inferred from the initial codon sequences. The directory _example_ contains such an alignment inside the file _ali.faa_.
Finally, using _A2C_ allows creating the file _ali.fna_ by back-translating the amino-acid sequences inside _ali.faa_ with the associated codon sequences inside _seq.fna_:
```bash
......
/*
####################################################################
A2C: back-translating a multiple amino-acid sequence alignment into
a multiple codon sequence alignment
########################################################################################################
Copyright (C) 2015-2018 Alexis Criscuolo
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
A2C: back-translating a multiple amino-acid sequence alignment into a multiple codon sequence alignment
Copyright (C) 2015-2020 Institut Pasteur
This program is free software: you can redistribute it and/or modify it under the terms of the GNU
General Public License as published by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
Institut Pasteur
Bioinformatics and Biostatistics Hub
C3BI, USR 3756 IP CNRS
Paris, FRANCE
You should have received a copy of the GNU General Public License along with this program. If not, see
<http://www.gnu.org/licenses/>.
Contact:
Alexis Criscuolo alexis.criscuolo@pasteur.fr
Genome Informatics & Phylogenetics (GIPhy) giphy.pasteur.fr
Bioinformatics and Biostatistics Hub research.pasteur.fr/team/hub-giphy
USR 3756 IP CNRS research.pasteur.fr/team/bioinformatics-and-biostatistics-hub
Dpt. Biologie Computationnelle research.pasteur.fr/department/computational-biology
Institut Pasteur, Paris, FRANCE research.pasteur.fr
alexis.criscuolo@pasteur.fr
####################################################################
########################################################################################################
*/
import java.io.*;
import java.util.*;
public class A2C {
final static String VERSION = "1.3b.201024ac";
static File aafile, ntfile;
static BufferedReader in;
static ArrayList<String> fh;
......@@ -40,13 +40,16 @@ public class A2C {
public static void main(String[] args) throws IOException {
if ( args.length < 2 ) {
System.out.println("");
System.out.println(" USAGE: A2C <ali.faa> <seq.fna>"); System.out.println("");
System.out.println(" where <ali.faa> is a FASTA-formatted multiple amino-acid");
System.out.println(" sequence alignment file and <seq.ali> a FASTA-formatted");
System.out.println(" file containing the associated codon sequences. This");
System.out.println(" will output in stdout the multiple back-translated");
System.out.println(" sequence alignment.");
System.out.println(""); System.exit(0);
System.out.println(" A2C v." + VERSION + " Copyright (C) 2015-2020 Institut Pasteur");
System.out.println("");
System.out.println(" USAGE: A2C <ali.faa> <seq.fna>");
System.out.println("");
System.out.println(" where <ali.faa> is a FASTA-formatted multiple amino acid sequence");
System.out.println(" alignment file and <seq.ali> a FASTA-formatted file containing the");
System.out.println(" associated codon sequences. This will output in stdout the multiple");
System.out.println(" back-translated sequence alignment.");
System.out.println("");
System.exit(0);
}
fh = new ArrayList<String>(); aa = new ArrayList<StringBuilder>(); nt = new ArrayList<StringBuilder>(); i = n = -1;
if ( ! (aafile=new File(args[0])).exists() ) { System.err.println("file " + args[0] + " does not exist"); System.exit(1); }
......
/*
####################################################################
C2A: translating a FASTA-formatted codon sequence file into an
amino-acid one
########################################################################################################
Copyright (C) 2015-2018 Alexis Criscuolo
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
C2A: translating a FASTA-formatted codon sequence file into an amino-acid one
Copyright (C) 2015-2020 Institut Pasteur
This program is free software: you can redistribute it and/or modify it under the terms of the GNU
General Public License as published by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact:
Institut Pasteur
Bioinformatics and Biostatistics Hub
C3BI, USR 3756 IP CNRS
Paris, FRANCE
You should have received a copy of the GNU General Public License along with this program. If not, see
<http://www.gnu.org/licenses/>.
Contact:
Alexis Criscuolo alexis.criscuolo@pasteur.fr
Genome Informatics & Phylogenetics (GIPhy) giphy.pasteur.fr
Bioinformatics and Biostatistics Hub research.pasteur.fr/team/hub-giphy
USR 3756 IP CNRS research.pasteur.fr/team/bioinformatics-and-biostatistics-hub
Dpt. Biologie Computationnelle research.pasteur.fr/department/computational-biology
Institut Pasteur, Paris, FRANCE research.pasteur.fr
alexis.criscuolo@pasteur.fr
####################################################################
########################################################################################################
*/
import java.io.*;
public class C2A {
final static String VERSION = "1.3b.201024ac";
static BufferedReader in;
static String line, fh;
static int lgt;
......@@ -36,11 +36,15 @@ public class C2A {
public static void main(String[] args) throws IOException {
if ( args.length < 1 ) {
System.out.println("");
System.out.println(" USAGE: C2A <seq.fna>"); System.out.println("");
System.out.println(" where <seq.fna> is a FASTA-formatted codon sequence file.");
System.out.println(" This will output in stdout the translation (standard");
System.out.println(" genetic code) of each sequence in the same format.");
System.out.println(""); System.exit(0);
System.out.println(" C2A v." + VERSION + " Copyright (C) 2015-2020 Institut Pasteur");
System.out.println("");
System.out.println(" USAGE: C2A <seq.fna>");
System.out.println("");
System.out.println(" where <seq.fna> is a FASTA-formatted codon sequence file. This will");
System.out.println(" output in stdout the translation (standard genetic code) of each");
System.out.println(" sequence in the same format.");
System.out.println("");
System.exit(0);
}
try { in = new BufferedReader(new FileReader(new File(args[0]))); sb = new StringBuilder(""); }
catch ( FileNotFoundException e ) { System.out.println("file " + args[0] + " does not exist"); System.exit(1); }
......
GCJ=gcj
GCJFLAGS=-fsource=1.6 -march=native -msse2 -O3 -minline-all-stringops -fomit-frame-pointer -momit-leaf-frame-pointer -fstrict-aliasing -fno-store-check -fno-bounds-check -funroll-all-loops -Wall
OTHERFLAGS=-funsafe-math-optimizations -ffast-math
all: C2A A2C
C2A: C2A.java
$(GCJ) $(GCJFLAGS) --main=C2A C2A.java -o c2a
A2C: A2C.java
$(GCJ) $(GCJFLAGS) --main=A2C A2C.java -o a2c