A compilation of scripts and pipelines to extract orderings of barcodes from linked reads datasets.
**WARNING**: This code is a proof of concept, not production-ready software. If the code is too slow
for your tests or you encounter bugs (maybe it's a feature? :p),
don't hesitate to contact us via the issues or by sending me an email (yoann \[dot] dufresne \[at] pasteur \[dot] fr).
## Nomenclature warnings
While writing the manuscript for this work, we renamed some of the data structures.
Old names may still appear in the code.
Here is a short list of equivalent terms:
- unit d-graph -> local clique pair
- udg -> lcp
- d²-graph (or d2-graph) -> lcp graph
...
...
* generate_fake_barcode_graph.py: Takes a molecule graph (GEXF format) as input and outputs a barcode graph. The barcode graph is created by merging nodes of the molecule graph.
* Use the Snakefile "Snakemake_data_simu".
All the parameters can be integers or lists of integers.
Each combination of parameters will generate a barcode graph.
The parameters are:
* n: the number of initial molecules
* m: average number of nodes merged into each barcode
* d: average coverage of a molecule in the initial graph
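The following is a minimal sketch of how a parameter sweep of this kind could work: expanding a config where each value is an integer or a list of integers, then building one barcode graph per combination by merging molecule nodes. It is not the pipeline's actual code. The function names `expand_config` and `merge_into_barcodes` are hypothetical. It assumes a simplified fusion rule: shuffled molecules are packed into barcodes of exactly `m` nodes (the last barcode may hold fewer), rather than `m` on average. `d` is left unused because the sketch takes the molecule graph as given.

```python
import itertools
import random


def expand_config(config):
    """Expand {"n": 6, "m": [2, 3]} into one dict per parameter combination.

    Each value may be a single integer or a list of integers, as in the
    Snakemake config described above.
    """
    keys = sorted(config)
    # Wrap scalars so that every value becomes a list of candidate values.
    values = [v if isinstance(v, list) else [v] for v in (config[k] for k in keys)]
    return [dict(zip(keys, combo)) for combo in itertools.product(*values)]


def merge_into_barcodes(molecule_edges, n, m, seed=0):
    """Hypothetical fusion step: assign each of the n molecule nodes to a barcode.

    Nodes are shuffled and cut into consecutive groups of m (simplification:
    fixed size instead of an average). Two barcodes are linked whenever at
    least one molecule edge joins them.
    """
    rng = random.Random(seed)
    nodes = list(range(n))
    rng.shuffle(nodes)
    barcode_of = {node: i // m for i, node in enumerate(nodes)}
    barcode_edges = set()
    for u, v in molecule_edges:
        bu, bv = barcode_of[u], barcode_of[v]
        if bu != bv:  # an edge inside a single barcode would be a self-loop; drop it
            barcode_edges.add((min(bu, bv), max(bu, bv)))
    return barcode_of, barcode_edges


if __name__ == "__main__":
    config = {"n": 6, "m": [2, 3], "d": 2}
    for params in expand_config(config):
        # Toy molecule graph: a path 0-1-...-(n-1), standing in for a real input graph.
        path = [(i, i + 1) for i in range(params["n"] - 1)]
        _, edges = merge_into_barcodes(path, params["n"], params["m"])
        print(params, sorted(edges))
```

With `m` given as a list of two values, the loop produces two barcode graphs. This matches the rule that every combination of parameters generates its own graph.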