diff --git a/README.md b/README.md
index b9f7294e5e08de101da4adbc51a148f1a2c8e393..8444f7902054eabb10296e73ea2d73d121c144ab 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,16 @@
-# 10X-deconvolve
+# Linked Reads molecule separation
 
-Trying to deconvolve single tag assignment for multiple molecules
+A collection of scripts and pipelines to count and extract barcode scaffolds from linked-reads datasets.
+
+## Nomenclature warnings
+While writing a scientific article, some data structure names were changed.
+Most names in this repository are still the old ones.
+Here are the equivalences (old name -> new name):
+- unit d-graph -> local clique pair
+- udg -> lcp
+- d²-graph (or d2-graph) -> lcp graph
+- udg divergence -> lcp weight
+- udg edge distance -> lcp edge weight
 
 ## Installation
 
@@ -15,13 +25,13 @@ Install the package from the root directory.
 ## Scripts
 
 For the majority of the scripts, argparse is used.
-To know how to use it please use the -h option.
+To see how to use a script, run it with the -h command line option.
 
 ### Data simulation
 
 * generate_fake_molecule_graph.py: Create a linear molecule graph, where the molecules are linked to the d molecules on their left and d molecules on their right.
 
-* generate_fake_barcode_graph.py: Take a barcode graph as input (gexf formated) and outputs a barcode graph. The barcode graph is create by fusion of nodes from the molecule graph.
+* generate_fake_barcode_graph.py: Take a molecule graph as input (gexf formatted) and output a barcode graph, created by fusing nodes of the molecule graph.
 
 * use the snakefile "Snakemake_data_simu". All the parameters can be an integer or a list of integer.
 
@@ -43,20 +53,6 @@ Config parameters:
 
 * to_d2_graph.py: Mount a barcode graph into memory and create a d2 graph from it.
 
-* evaluate.py: take a d2 graph gexf file and and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph.
-Take as input a d2 graph where the truth is known in the node names (the format used to create fake data).
-
-* analyse_d2_tsv.py: Take an tsv optimization file of a d2 graph and look for the variables coverage. Outputs the missing variables (if exists).
-
-## Run the tests
-    export PYTHONPATH=deconvolution/
-    pytest tests
-    export PYTHONPATH=
+* d2_to_path.py: Take a d2 graph as input and explore its nodes to extract a udg path.
 
-
-## Tests for Cedric
-
-```bash
-    snakemake -s Snakefile_data_simu --config n=10000 m=[4,6,8,10,12] m_dev=[0,0.5,1,2,3]
-    snakemake -s Snakefile_d2 --config input=[snake_exec/simu_bar_n10000_d5_m10-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m10-dev0.gexf,snake_exec/simu_bar_n10000_d5_m10-dev1.gexf,snake_exec/simu_bar_n10000_d5_m10-dev2.gexf,snake_exec/simu_bar_n10000_d5_m10-dev3.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.gexf,snake_exec/simu_bar_n10000_d5_m12-dev1.gexf,snake_exec/simu_bar_n10000_d5_m12-dev2.gexf,snake_exec/simu_bar_n10000_d5_m12-dev3.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.gexf,snake_exec/simu_bar_n10000_d5_m4-dev1.gexf,snake_exec/simu_bar_n10000_d5_m4-dev2.gexf,snake_exec/simu_bar_n10000_d5_m4-dev3.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.gexf,snake_exec/simu_bar_n10000_d5_m6-dev1.gexf,snake_exec/simu_bar_n10000_d5_m6-dev2.gexf,snake_exec/simu_bar_n10000_d5_m6-dev3.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.gexf,snake_exec/simu_bar_n10000_d5_m8-dev1.gexf,snake_exec/simu_bar_n10000_d5_m8-dev2.gexf,snake_exec/simu_bar_n10000_d5_m8-dev3.gexf]
-```
+* evaluate.py: Take a d2 graph gexf file and analyse it: look for an approximation of the longest correct path to reconstruct a molecule graph. The input d2 graph must have the ground truth encoded in its node names (the format used to create fake data).
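The two data simulation steps described for generate_fake_molecule_graph.py and generate_fake_barcode_graph.py can be sketched in plain Python. This is a minimal sketch, not the repository's code: the function names `linear_molecule_graph` and `fuse_into_barcodes` are hypothetical, graphs are plain adjacency dicts instead of gexf files, and reading `m`/`dev` as the mean and standard deviation of the number of molecules per barcode is an assumption guessed from the `simu_bar_n10000_d5_m10-dev0.5` file names.

```python
import random


def linear_molecule_graph(n, d):
    """Molecules 0..n-1 on a line; molecule i is linked to the d molecules
    on its left and the d molecules on its right (fewer near both ends)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, min(n, i + d + 1)):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def fuse_into_barcodes(molecule_adj, mean_size, dev=0.0, seed=None):
    """Hypothetical fusion step (assumed, not the repository's method):
    shuffle the molecules, cut them into groups whose sizes are drawn from
    a normal law N(mean_size, dev) (at least 1 molecule per group), and
    merge each group into a single barcode node."""
    rng = random.Random(seed)
    molecules = list(molecule_adj)
    rng.shuffle(molecules)

    barcode_of = {}  # molecule -> barcode id
    barcode, pos = 0, 0
    while pos < len(molecules):
        size = max(1, round(rng.gauss(mean_size, dev)))
        for mol in molecules[pos:pos + size]:
            barcode_of[mol] = barcode
        pos += size
        barcode += 1

    # Two barcodes are linked when at least one pair of their molecules is
    # linked; links between molecules of the same barcode are dropped.
    barcode_adj = {}
    for mol, neighbours in molecule_adj.items():
        b = barcode_of[mol]
        barcode_adj.setdefault(b, set())
        for other in neighbours:
            if barcode_of[other] != b:
                barcode_adj[b].add(barcode_of[other])
    return barcode_adj, barcode_of


if __name__ == "__main__":
    molecules = linear_molecule_graph(n=10, d=2)
    barcodes, assignment = fuse_into_barcodes(molecules, mean_size=2, dev=0.5, seed=0)
    print(assignment)
    print(barcodes)
```

Because every molecule edge is stored in both directions, the resulting barcode graph is symmetric as well; with `dev=0.0` every barcode receives exactly `mean_size` molecules (except possibly the last group).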