diff --git a/README.md b/README.md
index 8a31352244e141dd295025df1b1e29a3c732e8a4..a058f7225091cc706704a6b949f61ae2a774c9ee 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 Name | Description
 ---- | -----------
-[Data simulation](simulation/) | Generate metagenomics simulated data for benchmarking
+[Data simulation](simulation/) | Generate simulated metagenomics data for benchmarking
 
 ## Projects and repository
diff --git a/simulation/README.md b/simulation/README.md
index 3ab0886f6b4ddc9ee4da2f93a55685d9e15b360d..6545498628736e2c830cdff1c4038a9c421edd53 100644
--- a/simulation/README.md
+++ b/simulation/README.md
@@ -1 +1,96 @@
-# Simulation
\ No newline at end of file
+# Simulation
+
+[CAMISIM](https://github.com/CAMI-challenge/CAMISIM) can model different microbial abundance
+profiles (from multi-sample time series to differential abundance studies) and was used to
+generate the benchmark data sets of the first CAMI challenge.
+
+Here we only describe de novo metagenome simulation. For more exhaustive
+documentation, please refer to the [CAMISIM wiki](https://github.com/CAMI-challenge/CAMISIM/wiki).
+
+## Install
+
+CAMISIM has many dependencies; the full list can be found [here](https://github.com/CAMI-challenge/CAMISIM/wiki/User-manual#installation).
+
+We therefore recommend using the Docker image `cami/camisim:latest` or the Singularity image (work in progress).
+
+## Run a de novo metagenome simulation
+
+The [CAMISIM repository](https://github.com/CAMI-challenge/CAMISIM) comes with a
+series of sample data that make it possible to test the tool directly without downloading
+anything else.
+
+For this example, we start from the provided files and change a few parameters to
+make the run faster.
+
+### Configuration file
+
+The simulation parameters are set in a configuration file; the file to customize is
+`config.ini`.
+
+We quickly go through the different sections of this configuration file.
+The complete description of every option is available in the [documentation](https://github.com/CAMI-challenge/CAMISIM/wiki/Configuration-File-Options).
+
+#### Main
+
+```ini
+[Main]
+seed=632741178 # if None, a random seed is chosen
+phase=0 # 0: full run; 1: only community design; 2: start with read simulation
+max_processors=8
+dataset_id=RL # name of the created sample
+output_directory=out
+temp_directory=/tmp
+gsa=True # whether a gold standard assembly should be created
+pooled_gsa=True # whether a pooled gold standard over all samples is created
+anonymous=False # whether the output is anonymized
+compress=1 # 0 is no compression, 9 is maximum compression
+```
+
+Since we do not need the data for a challenge, we can switch off anonymization (`anonymous=False`).
+
+#### Read Simulator
+
+```ini
+[ReadSimulator]
+readsim=tools/art_illumina-2.3.6/art_illumina # keep the default path since we run in the container
+error_profiles=tools/art_illumina-2.3.6/profiles # keep the default
+samtools=tools/samtools-1.3/samtools # keep the default
+profile=mbarc # ART error profile: mi/hi/hi150/mbarc
+size=0.1 # size of a single sample in gigabase pairs (Gbp)
+type=art # simulation tool
+fragments_size_mean=270
+fragment_size_standard_deviation=27
+```
+
+All tool paths are left at their defaults since we run the tool from a container.
+The `profile` option selects the ART error profile; the documentation mentions
+that `mbarc` is recommended for bacterial communities.
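+
+Before launching a run, the values above can be read back and checked against the ranges given in the inline comments. The following is a minimal Python sketch using only the standard `configparser` module; it is not part of CAMISIM, `check_main_and_reads` is a hypothetical helper, and the allowed values are taken solely from the comments in the two blocks above. It passes `inline_comment_prefixes=("#",)` so that the trailing `# ...` annotations are not read as part of the values.
+
+```python
+import configparser
+
+# Excerpt of the [Main] and [ReadSimulator] sections above (tool paths omitted).
+CONFIG_TEXT = """
+[Main]
+seed=632741178 # if None, a random seed is chosen
+phase=0
+max_processors=8
+anonymous=False
+compress=1
+
+[ReadSimulator]
+profile=mbarc
+size=0.1
+"""
+
+# Allowed values, as listed in the comments of the configuration blocks above.
+PHASES = {0, 1, 2}
+ART_PROFILES = {"mi", "hi", "hi150", "mbarc"}
+
+
+def load_config(text):
+    # A "#" preceded by whitespace starts an inline comment that is stripped from the value.
+    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
+    parser.read_string(text)
+    return parser
+
+
+def check_main_and_reads(parser):
+    """Hypothetical helper: return a list of problems (empty if none are found)."""
+    problems = []
+    main = parser["Main"]
+    reads = parser["ReadSimulator"]
+
+    seed = main.get("seed")
+    if seed != "None":
+        try:
+            int(seed)
+        except ValueError:
+            problems.append(f"seed must be an integer or None, got {seed!r}")
+    if main.getint("phase") not in PHASES:
+        problems.append("phase must be 0, 1 or 2")
+    if not 0 <= main.getint("compress") <= 9:
+        problems.append("compress must be between 0 and 9")
+    if reads.get("profile") not in ART_PROFILES:
+        problems.append(f"unknown ART profile {reads.get('profile')!r}")
+    if reads.getfloat("size") <= 0:
+        problems.append("size (Gbp per sample) must be positive")
+    return problems
+
+
+if __name__ == "__main__":
+    print(check_main_and_reads(load_config(CONFIG_TEXT)))  # [] for the values above
+```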
+
+#### Community Design
+
+```ini
+[CommunityDesign]
+#distribution_file_paths='out/abundance0.tsv', 'out/abundance1.tsv', 'out/abundance2.tsv', 'out/abundance3.tsv', 'out/abundance4.tsv', 'out/abundance5.tsv', 'out/abundance6.tsv', 'out/abundance7.tsv', 'out/abundance8.tsv', 'out/abundance9.tsv'
+ncbi_taxdump=tools/ncbi-taxonomy_20170222.tar.gz
+strain_simulation_template=scripts/StrainSimulationWrapper/sgEvolver/simulation_dir
+number_of_samples=10
+```
+
+#### Community
+
+```ini
+[community0]
+metadata=defaults/metadata.tsv
+id_to_genome_file=defaults/genome_to_id.tsv
+id_to_gff_file=
+genomes_total=24
+genomes_real=24
+max_strains_per_otu=1
+ratio=1
+mode=differential
+log_mu=1
+log_sigma=2
+gauss_mu=1
+gauss_sigma=1
+view=False
+```
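+
+The `size` option in `[ReadSimulator]` is given per sample, so with `number_of_samples=10` the simulated reads add up to roughly 10 × 0.1 = 1 Gbp across all samples. The commented `distribution_file_paths` line also lists one abundance file per sample under the output directory. The minimal Python sketch below makes both relations explicit; it uses only the standard `configparser` module, is not part of CAMISIM, and `summarize` and `abundance_paths` are hypothetical helpers whose file-name pattern is simply copied from that commented line.
+
+```python
+import configparser
+
+# Excerpt of the options used below, taken from the blocks above.
+CONFIG_TEXT = """
+[Main]
+output_directory=out
+
+[ReadSimulator]
+size=0.1 # size of a single sample in Gbp
+
+[CommunityDesign]
+number_of_samples=10
+"""
+
+
+def abundance_paths(output_directory, n_samples):
+    # Same pattern as the commented distribution_file_paths line: one file per sample.
+    return ", ".join(f"'{output_directory}/abundance{i}.tsv'" for i in range(n_samples))
+
+
+def summarize(text):
+    """Hypothetical helper: total simulated volume and per-sample abundance files."""
+    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
+    parser.read_string(text)
+    size_gbp = parser["ReadSimulator"].getfloat("size")
+    n_samples = parser["CommunityDesign"].getint("number_of_samples")
+    output_directory = parser["Main"].get("output_directory")
+    return {
+        "samples": n_samples,
+        "total_gbp": round(size_gbp * n_samples, 6),
+        "distribution_file_paths": abundance_paths(output_directory, n_samples),
+    }
+
+
+if __name__ == "__main__":
+    summary = summarize(CONFIG_TEXT)
+    print(summary["total_gbp"])  # 1.0
+```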