update readme regarding the ongoing work

2403e42b · Yoann Dufresne · ed49295c · 2403e42b
Commit 2403e42b authored 5 years ago by Yoann Dufresne
--- a/README.md
+++ b/README.md
-# 10X-deconvolve
+# Linked Reads molecule separation

-Trying to deconvolve single tag assignment for multiple molecules
+A compilation of scripts and pipelines to count and extract scaffolds of barcodes from linked reads datasets.
+
+## Nomenclature warnings
+While writing a scientific article, some of the data structure names were changed.
+Most of the names used in this repository are still the old ones.
+Here is a short list of equivalences (old name -> new name):
+- unit d-graph -> local clique pair
+- udg -> lcp
+- d²-graph (or d2-graph) -> lcp graph
+- udg divergence -> lcp weight
+- udg edge distance -> lcp edge weight

 ## Installation

@@ -15,13 +25,13 @@ Install the package from the root directory.
 ## Scripts

 For the majority of the scripts, argparse is used.
-To know how to use it please use the -h option.
+To see how to use a script, run it with the -h command line option.

 ### Data simulation

 * generate_fake_molecule_graph.py: Create a linear molecule graph, where the molecules are linked to the d molecules on their left and d molecules on their right.

-* generate_fake_barcode_graph.py: Take a barcode graph as input (gexf formated) and outputs a barcode graph. The barcode graph is create by fusion of nodes from the molecule graph.
+* generate_fake_barcode_graph.py: Take a molecule graph as input (gexf formatted) and output a barcode graph. The barcode graph is created by merging nodes of the molecule graph.

 * use the snakefile "Snakemake_data_simu".
 Each parameter can be an integer or a list of integers.
@@ -43,20 +53,6 @@ Config parameters:

 * to_d2_graph.py: Load a barcode graph into memory and create a d2 graph from it.

-* evaluate.py: take a d2 graph gexf file and and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input a d2 graph where the truth is known in the node names (the format used to create fake data).
-
-* analyse_d2_tsv.py: Take an tsv optimization file of a d2 graph and look for the variables coverage. Outputs the missing variables (if exists).
-
-## Run the tests
-
-    export PYTHONPATH=deconvolution/
-    pytest tests
-    export PYTHONPATH=
+* d2_to_path.py: take a d2 graph as input and explore the nodes to extract a udg path.

-
-## Tests for Cedric
-
-```bash
-    snakemake -s Snakefile_data_simu --config n=10000 m=[4,6,8,10,12] m_dev=[0,0.5,1,2,3]
-    snakemake -s Snakefile_d2 --config input=[snake_exec/simu_bar_n10000_d5_m10-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m10-dev0.gexf,snake_exec/simu_bar_n10000_d5_m10-dev1.gexf,snake_exec/simu_bar_n10000_d5_m10-dev2.gexf,snake_exec/simu_bar_n10000_d5_m10-dev3.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.gexf,snake_exec/simu_bar_n10000_d5_m12-dev1.gexf,snake_exec/simu_bar_n10000_d5_m12-dev2.gexf,snake_exec/simu_bar_n10000_d5_m12-dev3.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.gexf,snake_exec/simu_bar_n10000_d5_m4-dev1.gexf,snake_exec/simu_bar_n10000_d5_m4-dev2.gexf,snake_exec/simu_bar_n10000_d5_m4-dev3.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.gexf,snake_exec/simu_bar_n10000_d5_m6-dev1.gexf,snake_exec/simu_bar_n10000_d5_m6-dev2.gexf,snake_exec/simu_bar_n10000_d5_m6-dev3.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.gexf,snake_exec/simu_bar_n10000_d5_m8-dev1.gexf,snake_exec/simu_bar_n10000_d5_m8-dev2.gexf,snake_exec/simu_bar_n10000_d5_m8-dev3.gexf]
-```
+* evaluate.py: take a d2 graph gexf file and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Takes as input a d2 graph where the truth is known from the node names (the format used to create fake data).
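
To make the "Data simulation" entries in the README above more concrete, here is a minimal Python sketch of the two ideas they describe: a linear molecule graph in which each molecule is linked to its d left and d right neighbours, and a barcode graph obtained by merging molecule nodes. This is not the code of generate_fake_molecule_graph.py or generate_fake_barcode_graph.py. The function names, the plain-dict graph representation, the fixed group size `m` and the random grouping are all assumptions made for illustration, and gexf reading and writing is left out.

```python
import random


def linear_molecule_graph(n, d):
    """Build a linear molecule graph as a dict of adjacency sets.

    Molecule i is linked to the d molecules on its left and the d
    molecules on its right (fewer near the two ends of the line).
    """
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Only look right; the symmetric insert covers the left side.
        for j in range(i + 1, min(i + d, n - 1) + 1):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def merge_into_barcodes(adj, m, seed=0):
    """Merge molecule nodes into barcode nodes (hypothetical scheme).

    Assumption: molecules are shuffled and grouped m at a time, and each
    group becomes one barcode. Two barcodes are linked when at least one
    pair of their molecules is linked in the molecule graph.
    """
    rng = random.Random(seed)
    molecules = list(adj)
    rng.shuffle(molecules)
    barcode_of = {mol: idx // m for idx, mol in enumerate(molecules)}

    bar_adj = {b: set() for b in set(barcode_of.values())}
    for u, neighbours in adj.items():
        for v in neighbours:
            bu, bv = barcode_of[u], barcode_of[v]
            if bu != bv:  # skip links that collapse inside one barcode
                bar_adj[bu].add(bv)
                bar_adj[bv].add(bu)
    return bar_adj, barcode_of


if __name__ == "__main__":
    mol_graph = linear_molecule_graph(n=10, d=2)
    print(sorted(mol_graph[5]))  # [3, 4, 6, 7]
    bar_graph, barcode_of = merge_into_barcodes(mol_graph, m=2)
    print(len(bar_graph))  # 5
```

Using a seeded `random.Random` keeps the merge reproducible, which is convenient when simulated graphs are compared across runs.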