Commit 2403e42b authored by Yoann Dufresne's avatar Yoann Dufresne

update readme regarding the ongoing work

parent ed49295c
# 10X-deconvolve
# Linked Reads molecule separation
Trying to deconvolve single tag assignment for multiple molecules
A compilation of scripts and pipelines to count and extract scaffolds of barcodes from linked reads datasets.
## Nomenclature warnings
During the process of writing a scientific article, some of the datastructure names have been modified.
In this repository the majority of the names are old names.
So, here is a short list of equivalences:
- unit d-graph -> local clique pair
- udg -> lcp
- d²-graph (or d2-graph) -> lcp graph
- udg divergence = lcp weight
- udg edge distance = lcp edge weight
## Installation
......@@ -15,13 +25,13 @@ Install the package from the root directory.
## Scripts
For the majority of the scripts, argparse is used.
To know how to use it please use the -h option.
To know how to use it please use the -h command line option.
### Data simulation
* generate_fake_molecule_graph.py: Create a linear molecule graph, where the molecules are linked to the d molecules on their left and d molecules on their right.
* generate_fake_barcode_graph.py: Take a barcode graph as input (gexf formated) and outputs a barcode graph. The barcode graph is create by fusion of nodes from the molecule graph.
* generate_fake_barcode_graph.py: Take a barcode graph as input (gexf formatted) and outputs a barcode graph. The barcode graph is create by fusion of nodes from the molecule graph.
* use the snakefile "Snakemake_data_simu".
All the parameters can be an integer or a list of integer.
......@@ -43,20 +53,6 @@ Config parameters:
* to_d2_graph.py: Mount a barcode graph into memory and create a d2 graph from it.
* evaluate.py: take a d2 graph gexf file and and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input a d2 graph where the truth is known in the node names (the format used to create fake data).
* analyse_d2_tsv.py: Take an tsv optimization file of a d2 graph and look for the variables coverage. Outputs the missing variables (if exists).
## Run the tests
export PYTHONPATH=deconvolution/
pytest tests
export PYTHONPATH=
* d2_to_path.py: take a d2 graph as input and explore the nodes to extract a udg path.
## Tests for Cedric
```bash
snakemake -s Snakefile_data_simu --config n=10000 m=[4,6,8,10,12] m_dev=[0,0.5,1,2,3]
snakemake -s Snakefile_d2 --config input=[snake_exec/simu_bar_n10000_d5_m10-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m10-dev0.gexf,snake_exec/simu_bar_n10000_d5_m10-dev1.gexf,snake_exec/simu_bar_n10000_d5_m10-dev2.gexf,snake_exec/simu_bar_n10000_d5_m10-dev3.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.gexf,snake_exec/simu_bar_n10000_d5_m12-dev1.gexf,snake_exec/simu_bar_n10000_d5_m12-dev2.gexf,snake_exec/simu_bar_n10000_d5_m12-dev3.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.gexf,snake_exec/simu_bar_n10000_d5_m4-dev1.gexf,snake_exec/simu_bar_n10000_d5_m4-dev2.gexf,snake_exec/simu_bar_n10000_d5_m4-dev3.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.gexf,snake_exec/simu_bar_n10000_d5_m6-dev1.gexf,snake_exec/simu_bar_n10000_d5_m6-dev2.gexf,snake_exec/simu_bar_n10000_d5_m6-dev3.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.gexf,snake_exec/simu_bar_n10000_d5_m8-dev1.gexf,snake_exec/simu_bar_n10000_d5_m8-dev2.gexf,snake_exec/simu_bar_n10000_d5_m8-dev3.gexf]
```
* evaluate.py: take a d2 graph gexf file and and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input a d2 graph where the truth is known in the node names (the format used to create fake data).
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment