VIRUS POP
Virus Pop is a tool for generating realistic synthetic viral protein sequences that belong to a taxonomic group given as input.
Installing
This pipeline requires snakemake (e.g. installed via pip: pip3 install snakemake) and Apptainer.
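Before going further, it can help to confirm that both prerequisites are reachable from the shell. The following is a minimal sketch, not part of Virus Pop: check_tool is a hypothetical helper name, and the sketch accepts either the apptainer or the singularity executable because the build command in this README calls singularity.

```shell
# check_tool is a hypothetical helper (not shipped with Virus Pop).
# It reports whether an executable with the given name is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# snakemake drives the pipeline.
check_tool snakemake || true
# Assumption: depending on the installation, the container runtime may be
# exposed as "apptainer", "singularity", or both.
check_tool apptainer || check_tool singularity || true
```

Any "missing" line points at a prerequisite that still has to be installed or added to PATH.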
Let $MV_DIR be the directory in which Virus Pop is located.
Clone the repository:
git clone git@gitlab.pasteur.fr:ldp/mock-viruses.git
Then build the Singularity container (root or sudo is needed only for building, not for usage):
cd $MV_DIR/Virus_Pop
sudo singularity build singularity/virus_pop.sif singularity/virus_pop.def
Examples
cd $MV_DIR/Virus_Pop
# fetch up to 15 genomes per species, work on clusters of at least 20 sequences
python3 ./run_virus_pop.py group_name Orthopneumovirus Orthopneumovirus_simu -t 15 -s 20
# load the real protein trees in iTOL (useful for choosing the nodes from which you want to branch simulations)
python3 ./load_in_ITOL.py Orthopneumovirus_simu --ITOL_info ${your_itol_batch_number} ${existing_itol_project_name}
# fetch up to 15 genomes per species and generate simulations on all ancestral nodes
python3 ./run_virus_pop.py group_name Pestivirus Pestivirus_simu -t 15 --all_ancestral_nodes
# use the provided homologous dataset, generate 2 ancestral sequences at each internal node, simulate the specified evolutionary distances, and infer a tree with the simulations
python3 ./run_virus_pop.py homologous_protein_file DATA_example/sarbecovirus.spike.fa Sarbecovirus_spike_simu --all_ancestral_nodes -n 2 --evolution_distance '0.1 0.5 1 2 3 4 5'
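To run the first example for several taxonomic groups in a row, the command can be generated in a loop. The sketch below is a dry run that only prints the commands. build_cmd is a hypothetical helper name, the output directory naming (group name plus _simu) simply mirrors the examples above, and the reading of -t 15 and -s 20 as "up to 15 genomes per species" and "clusters of at least 20 sequences" is inferred from the example comments.

```shell
# build_cmd is a hypothetical helper (not part of Virus Pop).
# It prints the run_virus_pop.py command for one taxonomic group,
# reusing only the arguments shown in the examples above.
build_cmd() {
    group="$1"
    echo "python3 ./run_virus_pop.py group_name ${group} ${group}_simu -t 15 -s 20"
}

# Dry run: print one command per group without executing anything.
for group in Orthopneumovirus Pestivirus; do
    build_cmd "$group"
done
```

Once the printed commands look right, they can be executed from $MV_DIR/Virus_Pop, for instance by piping the loop's output to sh.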