Description
When considering a set of aligned sequences, identical character states can be homoplastic (i.e. the result of convergent evolution) or synapomorphic (acquired by descent from a common ancestor). This script finds the character states shared by a chosen set of sequences inside a multiple sequence alignment, irrespective of the evolutionary history of those sequences. It takes as input a fasta file containing aligned sequences and a list of sequence names that match those in the alignment. The output is a list of the shared character states, each followed by an entropy score: the closer the score is to 1, the more significant the result. If the sequences you gave as parameters share a common ancestor, the reported states are synapomorphies. Otherwise they are homoplasies.
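To make the idea of a "shared character state" concrete, here is a minimal sketch of one plausible reading of it: report every alignment column in which all the selected sequences carry the same character. The function name `shared_states` is hypothetical and is not part of find_synapomorphies.sh. The sketch assumes that a sequence name is the first word of its fasta header and that gap characters (`-`) do not count as shared states. It does not attempt to reproduce the entropy score, whose exact definition is not given here.

```shell
# Hypothetical helper, not taken from find_synapomorphies.sh.
# Usage: shared_states "seq1,seq2" alignment.fasta
# Prints "<column> <state>" for each column where every named sequence
# has the same character. Columns are 1-based; gaps ("-") are skipped.
shared_states() {
    awk -v names="$1" '
        BEGIN {
            # Split the comma-separated list and remember the wanted names.
            n = split(names, wanted, ",")
            for (i = 1; i <= n; i++) keep[wanted[i]] = 1
        }
        /^>/ {
            # Header line: assume the name is the first word after ">".
            name = substr($1, 2)
            next
        }
        (name in keep) {
            # Sequence line: concatenate, so wrapped sequences also work.
            seq[name] = seq[name] $0
        }
        END {
            len = length(seq[wanted[1]])
            for (col = 1; col <= len; col++) {
                state = substr(seq[wanted[1]], col, 1)
                shared = (state != "-")
                for (i = 2; i <= n && shared; i++)
                    if (substr(seq[wanted[i]], col, 1) != state) shared = 0
                if (shared) print col, state
            }
        }
    ' "$2"
}
```

After sourcing the function, `shared_states "seq1,seq2" example.fasta` lists the columns on which the two sequences agree. Whether those columns are synapomorphies or homoplasies still depends on the phylogeny, as described above.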
Installation
This script is written in Bash and should therefore run on any Unix platform where Bash is available. Download the file, navigate to its location, and make it executable:
chmod +x find_synapomorphies.sh
Usage
./find_synapomorphies.sh "seq1,seq2,seq3" <input_file> <output_file>
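- the names of the sequences of interest are comma-separated and enclosed in a pair of double quotes
- the input file is a fasta file containing a multiple sequence alignment
- the output file receives the list of shared character states and their entropy scores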
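Because the sequence names must match those in the alignment, it can help to check them before running the script. The following is a small sketch of such a check. The function name `missing_names` is hypothetical, and it assumes that a sequence name is the first whitespace-delimited word of a fasta header, which may differ from how the script itself matches names.

```shell
# Hypothetical pre-flight check, not part of find_synapomorphies.sh.
# Usage: missing_names "seq1,seq2,seq3" alignment.fasta
# Prints each requested name that has no matching header in the file.
# The body runs in a subshell so its variables do not leak to the caller.
missing_names() (
    # First word of every header line, one name per line.
    headers=$(sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$2")
    # Walk through the comma-separated list of requested names.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r name; do
        # -x: whole-line match, -F: literal string (no regex surprises).
        printf '%s\n' "$headers" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
)
```

An empty result from `missing_names "seq1,seq2" example.fasta` means every requested name was found in the alignment.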
Example
./find_synapomorphies.sh "seq1,seq2" example.fasta example.output