start documentation about simulation

Commit fcb574a2 authored 5 years ago by Kenzo-Hugo Hillion
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 Name | Description
 ---- | -----------
-[Data simulation](simulation/) | Generate metagenomics simulated data for benchmarking
+[Data simulation](simulation/) | Generate simulated metagenomics data for benchmarking
 ## Projects and repository

--- a/simulation/README.md
+++ b/simulation/README.md
 # Simulation
\ No newline at end of file
+[CAMISIM](https://github.com/CAMI-challenge/CAMISIM) can model different microbial abundance
+profiles (from multi-sample time series to differential abundance studies) and was used to
+generate the benchmark data sets of the first CAMI challenge.
+Here we describe only de novo metagenome simulation. For more exhaustive
+documentation, please refer to the [CAMISIM wiki](https://github.com/CAMI-challenge/CAMISIM/wiki).
+## Install
+CAMISIM has many dependencies; the full list can be found [here](https://github.com/CAMI-challenge/CAMISIM/wiki/User-manual#installation).
+However, we recommend using the Docker image `cami/camisim:latest` or the Singularity one (WIP).
+## Run de novo metagenome simulation
+The tool's [repository](https://github.com/CAMI-challenge/CAMISIM) comes with a
+series of sample data sets, which makes it possible to test the tool directly without downloading
+anything.
+For this example, we start from the provided files and slightly modify the parameters to
+make the process faster.
+### Configuration file
+The simulation parameters are set in the configuration file. The file customized here is the
+`config.ini` file.
+We will quickly go through the different parts of this config file. You can find the
+complete description in the [documentation](https://github.com/CAMI-challenge/CAMISIM/wiki/Configuration-File-Options).
+#### Main
+```ini
+[Main]
+seed=632741178          # if None is used, random seed is chosen
+phase=0                 # 0: Full run; 1: Only community design; 2: Start with read simulation
+max_processors=8
+dataset_id=RL           # name of the created sample
+output_directory=out
+temp_directory=/tmp
+gsa=True                # whether a gold standard assembly should be created
+pooled_gsa=True        # whether a pooled gold standard over all samples is created
+anonymous=False         # whether the output is anonymized
+compress=1              # 0 is for no compression, 9 is maximum compression
+```
+Since we do not need the data for a challenge, we can switch off anonymization (`anonymous=False`).
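As a quick sanity check before launching a run, the `[Main]` values can be read back with Python's standard `configparser`. This is only a minimal sketch: the excerpt is inlined so the snippet is self-contained, `load_main` is a hypothetical helper name, and the snippet says nothing about how CAMISIM itself parses the file.

```python
import configparser

# Excerpt of the [Main] section from the README, inlined for a self-contained sketch.
CONFIG_TEXT = """
[Main]
seed=632741178          # if None is used, random seed is chosen
phase=0                 # 0: Full run; 1: Only community design; 2: Start with read simulation
max_processors=8
dataset_id=RL           # name of the created sample
output_directory=out
gsa=True                # whether a gold standard assembly should be created
anonymous=False         # whether the output is anonymized
compress=1              # 0 is for no compression, 9 is maximum compression
"""


def load_main(text):
    """Hypothetical helper: parse the [Main] section into typed Python values."""
    # inline_comment_prefixes strips the trailing "# ..." comments used in the excerpt.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    main = parser["Main"]
    return {
        "seed": main.getint("seed"),
        "phase": main.getint("phase"),
        "max_processors": main.getint("max_processors"),
        "dataset_id": main.get("dataset_id"),
        "output_directory": main.get("output_directory"),
        "gsa": main.getboolean("gsa"),
        "anonymous": main.getboolean("anonymous"),
        "compress": main.getint("compress"),
    }


settings = load_main(CONFIG_TEXT)
print(settings)
```

Reading the values back this way makes it easy to spot a mistyped option (for example, `anonymous` left at `True`) before starting a long simulation.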
+#### Read Simulator
+```ini
+[ReadSimulator]
+readsim=tools/art_illumina-2.3.6/art_illumina       # leave by default since we are in a container
+error_profiles=tools/art_illumina-2.3.6/profiles    # leave by default
+samtools=tools/samtools-1.3/samtools                # leave by default
+profile=mbarc                                       # choose for ART: mi/hi/hi150/mbarc
+size=0.1                                            # size of a single sample in Gigabasepairs (Gbp)
+type=art                                            # simulation tool
+fragments_size_mean=270
+fragment_size_standard_deviation=27
+```
+All the tool paths keep their default values since we are running the tool from a container.
+The `profile` option selects an ART error profile; the documentation mentions
+that `mbarc` is recommended for bacterial communities.
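To get a feel for what `size=0.1` means in practice, the sample size can be converted into an approximate number of read pairs. The sketch below is a back-of-the-envelope estimate, not CAMISIM's own computation: it assumes paired-end reads of 150 bp (an assumption about the `mbarc` profile, to be checked against the CAMISIM documentation), and `estimate_read_pairs` is a hypothetical helper name.

```python
def estimate_read_pairs(size_gbp, read_length):
    """Hypothetical helper: approximate read pairs needed to reach size_gbp gigabases."""
    # Convert gigabasepairs to bases; round() guards against floating-point noise.
    total_bases = round(size_gbp * 1e9)
    # Each pair contributes two reads of read_length bases.
    return total_bases // (2 * read_length)


# Assumed values: size from the config above, 150 bp reads (assumption, see lead-in).
pairs_per_sample = estimate_read_pairs(size_gbp=0.1, read_length=150)
print(pairs_per_sample)
```

Reducing `size` is therefore the most direct way to shorten the read simulation step when only a quick test is needed.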
+#### Community Design
+```ini
+[CommunityDesign]
+#distribution_file_paths='out/abundance0.tsv', 'out/abundance1.tsv', 'out/abundance2.tsv', 'out/abundance3.tsv', 'out/abundance4.tsv', 'out/abundance5.tsv', 'out/abundance6.tsv', 'out/abundance7.tsv', 'out/abundance8.tsv', 'out/abundance9.tsv'
+ncbi_taxdump=tools/ncbi-taxonomy_20170222.tar.gz
+strain_simulation_template=scripts/StrainSimulationWrapper/sgEvolver/simulation_dir
+number_of_samples=10
+```
+#### Community
+```ini
+[community0]
+metadata=defaults/metadata.tsv
+id_to_genome_file=defaults/genome_to_id.tsv
+id_to_gff_file=
+genomes_total=24
+genomes_real=24
+max_strains_per_otu=1
+ratio=1
+mode=differential
+log_mu=1
+log_sigma=2
+gauss_mu=1
+gauss_sigma=1
+view=False
+```
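The `log_mu` and `log_sigma` options in the community section are the parameters of a log-normal distribution. The sketch below illustrates what drawing relative abundances for 24 genomes from such a distribution could look like. It is an illustration under stated assumptions only: it ignores `mode`, `gauss_mu`, and `gauss_sigma`, does not reproduce CAMISIM's actual sampling code, and `sample_relative_abundances` is a hypothetical helper name.

```python
import random


def sample_relative_abundances(n_genomes, log_mu, log_sigma, seed):
    """Hypothetical helper: draw log-normal abundances and normalize them to sum to 1."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    raw = [rng.lognormvariate(log_mu, log_sigma) for _ in range(n_genomes)]
    total = sum(raw)
    return [value / total for value in raw]


# Values taken from the config above: genomes_total=24, log_mu=1, log_sigma=2.
abundances = sample_relative_abundances(24, log_mu=1, log_sigma=2, seed=632741178)
for index, abundance in enumerate(abundances):
    print(f"genome_{index}\t{abundance:.4f}")
```

With `log_sigma=2` the draw is strongly skewed, so a few genomes typically dominate the community while most stay at low abundance.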