A compilation of scripts and pipelines to count and extract scaffolds of barcodes from linked-read datasets.
**WARNING**: This code is a proof of concept, not production-ready software. If the code is too slow for your tests or you are encountering bugs (maybe it's a feature? :p), don't hesitate to contact us via the issues or by mailing me directly (yoann [dot] dufresne [at] pasteur [dot] fr).
## Nomenclature warnings
While writing a scientific article, some of the data structure names were changed.
In this repository, most of the names are still the old ones.
...
...
Install the package from the root directory.
Most of the scripts use argparse.
To see how to use a script, run it with the `-h` command-line option.
### Test the complete pipeline on simulated data
For a complete test, we provide several Snakemake files.
If you are looking for a complete pipeline starting from synthetic data generation, look at the "Snakefile_d2_eval" file.
You can vary the N (number of molecules in the interval graph), M (average number of merges to perform in a barcode), and DEV (standard deviation of the number of merges) variables to see their impact on performance.
These variables are arrays: you can enter multiple values, and every combination will be run.
A summary is written to the TSV file "{WORKDIR}/eval_compare_maxclique.tsv".
Warning: the pipeline can be very slow when many parameter values are given.
Command to run the pipeline:
```
snakemake -s Snakefile_d2_eval
```
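To get a feel for how quickly the number of runs grows, here is a minimal sketch of the "all combinations" behaviour described above. It does not read the Snakefile: the values of `N`, `M` and `DEV` are made up for illustration, and `parameter_grid` is a hypothetical helper, not something defined in this repository.

```python
from itertools import product

# Hypothetical values, chosen only for illustration.
# The real arrays are the ones defined in Snakefile_d2_eval.
N = [100, 1000]   # number of molecules in the interval graph
M = [2, 3]        # average number of merges to perform in a barcode
DEV = [0, 1]      # standard deviation of the number of merges


def parameter_grid(n_values, m_values, dev_values):
    """Return every (n, m, dev) combination as a list of tuples."""
    return list(product(n_values, m_values, dev_values))


grid = parameter_grid(N, M, DEV)
for n, m, dev in grid:
    print(f"N={n} M={m} DEV={dev}")
print(f"{len(grid)} runs in total")
```

The number of runs is the product of the array lengths, so adding one value to any array multiplies the total running time accordingly.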
### Data simulation
* generate_fake_molecule_graph.py: Create a linear molecule graph, where the molecules are linked to the d molecules on their left and d molecules on their right.
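As an illustration of the linear molecule graph described above, here is a small pure-Python sketch. It is not the code of generate_fake_molecule_graph.py: the function name `linear_molecule_edges` and the representation of molecules as integers `0..n-1` are assumptions made for this example.

```python
def linear_molecule_edges(n, d):
    """Build the edge set of a linear graph on molecules 0..n-1.

    Each molecule is linked to the d molecules on its left and the d
    molecules on its right (fewer near the two ends of the line).
    Edges are stored once, as (smaller index, larger index) pairs.
    """
    edges = set()
    for i in range(n):
        # Only look to the right: the left links of i are the right
        # links of the molecules before it.
        for j in range(i + 1, min(i + d, n - 1) + 1):
            edges.add((i, j))
    return edges


edges = linear_molecule_edges(6, 2)
print(sorted(edges))
```

With `n` molecules and `d < n`, the graph has `(n-1) + (n-2) + ... + (n-d)` edges, since there are `n-k` pairs of molecules at distance `k`.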
parser.add_argument('--component_min_size','-c',type=int,default=10,help="Minimum size of a component to keep it in the d2-graph, after the edge simplifications.")
parser.add_argument('lcp_graph',help='lcp graph to reduce. Must be a GEXF-formatted file.')
parser.add_argument('--component_min_size','-c',type=int,default=10,help="Minimum size of a component to keep it in the lcp graph, after the edge simplifications.")
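
A minimal sketch of how the two arguments above behave once they are part of a parser. The `build_parser` function, the description string and the file name `graph.gexf` are hypothetical; only the two `add_argument` calls come from the lines above.

```python
import argparse


def build_parser():
    """Hypothetical parser holding only the two arguments shown above."""
    parser = argparse.ArgumentParser(
        description="Illustrative parser, not the repository's actual script."
    )
    parser.add_argument(
        'lcp_graph',
        help='lcp graph to reduce. Must be a GEXF-formatted file.',
    )
    parser.add_argument(
        '--component_min_size', '-c', type=int, default=10,
        help="Minimum size of a component to keep it in the lcp graph, "
             "after the edge simplifications.",
    )
    return parser


# Explicit value given with the short option.
args = build_parser().parse_args(['graph.gexf', '-c', '25'])
# Option omitted: the default of 10 is used.
defaults = build_parser().parse_args(['graph.gexf'])
print(args.lcp_graph, args.component_min_size, defaults.component_min_size)
```

Calling such a script with `-h` prints the help strings of both arguments, which is the usage documentation referred to earlier.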