# Description
When considering a set of aligned sequences, identical character states can be [homoplasic](https://en.wikipedia.org/wiki/Homoplasy) (_i.e._ the result of convergent evolution), or [synapomorphic](https://en.wikipedia.org/wiki/Synapomorphy_and_apomorphy) (_i.e._ inherited from a common ancestor).
This script finds character states shared within a multiple sequence alignment, irrespective of the evolutionary history of the sequences.
It takes as input a FASTA file containing aligned sequences, and a list of sequence names that match those in the alignment.
The output is a list of the shared character states, each followed by an entropy score: the closer the score is to 1, the more significant the result.
If the sequences you gave as parameters share a common ancestor, then the reported states are synapomorphies. Otherwise they are homoplasies.
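
To make the notion of a "shared character state" concrete, here is a minimal sketch, not the code of `find_synapomorphies.sh`. The function name `shared_columns` is hypothetical. It assumes that a sequence name is the header text up to the first whitespace, and it only reports alignment columns where all selected sequences carry the same non-gap character. It does not compute the entropy score, whose definition is not given here.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of find_synapomorphies.sh.
# Prints "<column> <state>" for every alignment column in which all the
# selected sequences carry the same non-gap character.
shared_columns() {
    local names="$1" fasta="$2"
    awk -v names="$names" '
        BEGIN {
            n = split(names, list, ",")
            for (i = 1; i <= n; i++) wanted[list[i]] = 1
        }
        /^>/ {
            # Assumption: the name is the header up to the first whitespace
            name = substr($1, 2)
            next
        }
        (name in wanted) {
            # Join sequence lines that are wrapped over several lines
            seq[name] = seq[name] $0
        }
        END {
            ref = seq[list[1]]
            for (pos = 1; pos <= length(ref); pos++) {
                state = substr(ref, pos, 1)
                if (state == "-") continue      # ignore gap columns
                shared = 1
                for (i = 2; i <= n; i++)
                    if (substr(seq[list[i]], pos, 1) != state) { shared = 0; break }
                if (shared) print pos, state
            }
        }
    ' "$fasta"
}
```

It could be called as `shared_columns "seq1,seq2" example.fasta`. Columns are numbered from 1.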

# Installation
Download `find_synapomorphies.sh`, navigate to its location, then make it executable:

```bash
chmod +x find_synapomorphies.sh
```

# Usage

```bash
./find_synapomorphies.sh "seq1,seq2,seq3" <input_file> <output_file>
```

* the sequences of interest are comma-separated and enclosed in a pair of double quotes
* the input file is a FASTA file containing a multiple sequence alignment
* the output file receives the list of shared character states and their entropy scores
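
Since the names passed on the command line must match those in the alignment, a quick check before running the script can save a failed run. The sketch below is not part of `find_synapomorphies.sh`. The function name `missing_names` is hypothetical, and it assumes that a sequence name is the header text up to the first whitespace, which may differ from how the script itself matches names.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of find_synapomorphies.sh.
# Prints every requested name that has no matching FASTA header and
# returns a non-zero status if at least one name is missing.
missing_names() {
    local names="$1" fasta="$2"
    awk -v names="$names" '
        BEGIN { n = split(names, list, ",") }
        # Assumption: the name is the header up to the first whitespace
        /^>/  { seen[substr($1, 2)] = 1 }
        END {
            status = 0
            for (i = 1; i <= n; i++)
                if (!(list[i] in seen)) { print list[i]; status = 1 }
            exit status
        }
    ' "$fasta"
}
```

For instance, `missing_names "seq1,seq2,seq3" <input_file>` prints nothing and returns 0 when all three names are present.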

# Example

```bash
./find_synapomorphies.sh "seq1,seq2" example.fasta example.output
```