From 34adf7327b9b6576003785fce09dd28e1c7b15fe Mon Sep 17 00:00:00 2001
From: Yoann Dufresne <yoann.dufresne0@gmail.com>
Date: Tue, 26 May 2020 11:41:19 +0200
Subject: [PATCH] README update

---
 README.md         | 16 ++++++++++++++++
 Snakefile_d2_eval |  4 ++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 8444f79..c70b2a4 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 
 A compilation of scripts and pipelines to count and extract scaffolds of barcodes from linked reads datasets.
 
+**WARNING**: This code is a proof of concept, not production-ready software. If the code is too slow for your tests or you encounter bugs (maybe it's a feature? :p), don't hesitate to contact us via the issue tracker or by email (yoann [dot] dufresne [at] pasteur [dot] fr).
+
 ## Nomenclature warnings
 
 During the process of writing a scientific article, some of the datastructure names have been modified. In this repository the majority of the names are old names.
@@ -27,6 +29,20 @@ Install the package from the root directory.
 
 For the majority of the scripts, argparse is used. To know how to use it please use the -h command line option.
 
+### Test the complete pipeline on simulated data
+
+For a complete test, we provide several Snakemake files.
+If you are looking for a complete pipeline starting from synthetic data generation, look at the "Snakefile_d2_eval" file.
+You can vary the N (number of molecules in the interval graph), M (average number of merges to perform in a barcode) and DEV (standard deviation of the number of merges) variables to see their impact on performance.
+These variables are lists: you can enter multiple values, and every combination of values will be run.
+A summary is written to the TSV file "{WORKDIR}/eval_compare_maxclique.tsv".
+Warning: the pipeline can be very slow when many parameter values are given.
+
+Command to run the pipeline:
+```
+snakemake -s Snakefile_d2_eval
+```
+
 ### Data simulation
 
 * generate_fake_molecule_graph.py: Create a linear molecule graph, where the molecules are linked to the d molecules on their left and d molecules on their right.
diff --git a/Snakefile_d2_eval b/Snakefile_d2_eval
index e88cef7..6a7f1ec 100644
--- a/Snakefile_d2_eval
+++ b/Snakefile_d2_eval
@@ -3,8 +3,8 @@ include: "Snakefile_d2"
 include: "Snakefile_d2_path"
 
 WORKDIR = "snake_experiments" if "workdir" not in config else config["workdir"]
-N = [500]
-D = [6]
+N = [5000]
+D = [10]
 M = [2]
 DEV = [0]
 
-- 
GitLab
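The README says that N, M and DEV are lists and that every combination of values is run. The sketch below illustrates why adding values quickly slows the pipeline down: the number of experiments is the product of the list lengths. It assumes that Snakefile_d2_eval combines the lists as a Cartesian product, which is what Snakemake's `expand()` does by default. The helper `parameter_grid` is hypothetical and is not part of the repository. D is included because this patch changes it, but the README does not describe its meaning.

```python
from itertools import product

# Parameter lists as set in Snakefile_d2_eval after this patch.
# Assumption: each combination of values is one experiment, mirroring
# Snakemake's expand(), which builds a Cartesian product by default.
N = [5000]   # number of molecules in the interval graph
D = [10]     # changed by this patch; meaning not described in the README
M = [2]      # average number of merges to perform in a barcode
DEV = [0]    # standard deviation of the number of merges


def parameter_grid(**params):
    """Hypothetical helper: yield one dict per combination of parameter values."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    grid = list(parameter_grid(N=N, D=D, M=M, DEV=DEV))
    print(len(grid), grid)
    # Extra values multiply the run count: 3 x 1 x 2 x 1 = 6 experiments.
    bigger = list(parameter_grid(N=[500, 1000, 5000], D=D, M=[2, 4], DEV=DEV))
    print(len(bigger))
```

With the values from this patch there is a single experiment. Listing three values for N and two for M already gives six experiments, and each one goes through the full pipeline.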