Trying to deconvolve single tag assignment for multiple molecules
A compilation of scripts and pipelines to count and extract scaffolds of barcodes from linked reads datasets.
## Nomenclature warnings
While the scientific article was being written, some of the data structure names were changed.
Most of the names used in this repository are still the old ones.
Here is a short list of equivalences (old name -> new name):
- unit d-graph -> local clique pair
- udg -> lcp
- d²-graph (or d2-graph) -> lcp graph
- udg divergence -> lcp weight
- udg edge distance -> lcp edge weight
## Installation
...
...
Install the package from the root directory.
## Scripts
Most of the scripts use argparse.
To see how to use a script, run it with the -h command line option.
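As an illustration of what the -h option gives you, here is a minimal sketch of an argparse-based script. The program name and the arguments (`input_graph`, `--output`) are hypothetical and do not correspond to any particular script of this repository; only the automatic `-h/--help` behaviour comes from argparse itself.

```python
import argparse


def build_parser():
    # Hypothetical parser: names and defaults are illustrative only.
    parser = argparse.ArgumentParser(
        prog="example_script.py",
        description="Illustrative command line interface.",
    )
    parser.add_argument("input_graph", help="path to an input graph file (gexf)")
    parser.add_argument("--output", "-o", default="out.gexf", help="output file name")
    return parser


# argparse adds -h/--help automatically; this is the text that -h would print.
help_text = build_parser().format_help()

# Parsing an explicit argument list instead of the real command line.
args = build_parser().parse_args(["graph.gexf", "-o", "result.gexf"])
print(args.input_graph, args.output)
```

Running any real script of the repository with -h should print a comparable usage message, with that script's actual arguments.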
### Data simulation
* generate_fake_molecule_graph.py: Create a linear molecule graph, where the molecules are linked to the d molecules on their left and d molecules on their right.
* generate_fake_barcode_graph.py: Take a molecule graph as input (gexf formatted) and output a barcode graph. The barcode graph is created by fusing nodes of the molecule graph.
* use the snakefile "Snakemake_data_simu".
Each parameter can be an integer or a list of integers.
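The two simulation steps can be sketched in plain Python as follows. This is not the code of the scripts above, only a minimal illustration under assumptions: graphs are stored as dictionaries of adjacency sets rather than gexf files, and the fusion rule (random groups of up to `m` molecules per barcode, with a hypothetical parameter `m` and `seed`) is chosen for the example.

```python
import random


def linear_molecule_graph(n, d):
    """Adjacency sets for n molecules, each linked to its d left and d right neighbours."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Link i to the (at most) d molecules on its right; symmetry gives the left side.
        for j in range(i + 1, min(i + d, n - 1) + 1):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def fuse_into_barcodes(mol_graph, m, seed=0):
    """Fuse random groups of up to m molecules into barcode nodes (assumed rule)."""
    rng = random.Random(seed)
    molecules = sorted(mol_graph)
    rng.shuffle(molecules)
    # Molecule -> barcode assignment: consecutive shuffled molecules share a barcode.
    barcode_of = {mol: idx // m for idx, mol in enumerate(molecules)}
    barcodes = {b: set() for b in set(barcode_of.values())}
    # Two barcodes are linked if any of their molecules are linked.
    for mol, neighbours in mol_graph.items():
        for nei in neighbours:
            a, b = barcode_of[mol], barcode_of[nei]
            if a != b:
                barcodes[a].add(b)
                barcodes[b].add(a)
    return barcodes, barcode_of


molecule_graph = linear_molecule_graph(n=10, d=2)
barcode_graph, barcode_of = fuse_into_barcodes(molecule_graph, m=2)
```

Because several molecules collapse into one barcode node, a barcode's neighbourhood mixes the neighbourhoods of unrelated molecules, which is the situation the deconvolution aims to untangle.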
...
...
* to_d2_graph.py: Load a barcode graph into memory and create a d2 graph from it.
* evaluate.py: Take a d2 graph gexf file and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. The input must be a d2 graph where the truth is known from the node names (the format used to create fake data).
* analyse_d2_tsv.py: Take a tsv optimization file of a d2 graph and check the coverage of its variables. Outputs the missing variables, if any.
* d2_to_path.py: Take a d2 graph as input and explore its nodes to extract a udg path.
## Run the tests
    export PYTHONPATH=deconvolution/
    pytest tests
    export PYTHONPATH=