# 10X-deconvolve

Trying to deconvolve single tag assignment for multiple molecules

## Installation

Install the package from the root directory.
```bash
    # For users
    pip install . --user
    # For developers
    pip install -e . --user
```

## Scripts

Most of the scripts use argparse.
Run a script with the -h option to see how to use it.

### Data simulation

* generate_fake_molecule_graph.py: Create a linear molecule graph, where each molecule is linked to the d molecules on its left and the d molecules on its right.

* generate_fake_barcode_graph.py: Take a molecule graph as input (gexf formatted) and output a barcode graph. The barcode graph is created by merging nodes of the molecule graph.

* use the snakefile "Snakefile_data_simu".
Each parameter can be an integer or a list of integers.
Each combination of parameters will generate a barcode graph.  
Config parameters:
  * n: the number of initial molecules
  * m: average number of nodes merged in each barcode
  * d: average coverage of a molecule in the initial graph
  * workdir: the directory to create and use as output
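
The two simulation steps described above can be illustrated with a minimal sketch. It is not the code of the scripts themselves: the function names `linear_molecule_graph` and `merge_into_barcodes` are hypothetical, graphs are plain adjacency dictionaries, and each barcode merges exactly `m` molecules chosen at random (the real scripts treat `m` as an average).

```python
import random


def linear_molecule_graph(n, d):
    """Link each molecule i to the d molecules on its left and on its right."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        # Only look to the right: the edge is undirected, so the left
        # neighbours are filled in when their own turn comes.
        for j in range(i + 1, min(i + d, n - 1) + 1):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def merge_into_barcodes(mol_graph, m, seed=0):
    """Merge random groups of m molecules into single barcode nodes.

    Assumption: groups have exactly m molecules (the last one may be smaller).
    """
    rng = random.Random(seed)
    molecules = list(mol_graph)
    rng.shuffle(molecules)
    # Map every molecule to the barcode that absorbs it.
    barcode_of = {mol: idx // m for idx, mol in enumerate(molecules)}
    bar_graph = {barcode: set() for barcode in set(barcode_of.values())}
    for mol, neighbours in mol_graph.items():
        for other in neighbours:
            b1, b2 = barcode_of[mol], barcode_of[other]
            if b1 != b2:  # drop self-loops created by the merge
                bar_graph[b1].add(b2)
                bar_graph[b2].add(b1)
    return bar_graph


molecules = linear_molecule_graph(n=100, d=5)
barcodes = merge_into_barcodes(molecules, m=4)
```

A molecule far from both ends of the line has `2 * d` neighbours, while the barcode graph is denser because each barcode inherits the neighbours of all the molecules merged into it.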

### Data structures and algorithms

* Create a d2 graph from a barcode graph: use the snakefile "Snakefile_d2".  
The result will be generated as a compressed file in the workdir.  
Config parameters:

  * input: the input barcode graph (gexf format preferred).
  * workdir: The working and output directory.

* to_d2_graph.py: Load a barcode graph into memory and create a d2 graph from it.

* evaluate.py: Take a d2 graph gexf file and analyse it. Look for an approximation of the longest correct path to reconstruct a molecule graph. Take as input a d2 graph where the truth is known in the node names (the format used to create fake data).

* analyse_d2_tsv.py: Take a tsv optimization file of a d2 graph and look at the coverage of the variables. Output the missing variables (if any).
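
The graph files passed between these scripts are in GEXF, an XML format. As a rough illustration of what loading such a file into memory involves, the sketch below parses a tiny hand-written GEXF document with the Python standard library only. The helper names `local_name` and `load_gexf_adjacency` and the example document are hypothetical, and the scripts themselves may well rely on a graph library instead.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written GEXF document: a path 0 - 1 - 2.
GEXF_EXAMPLE = """
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="0" label="0"/>
      <node id="1" label="1"/>
      <node id="2" label="2"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="1" target="2"/>
    </edges>
  </graph>
</gexf>
""".strip()


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def load_gexf_adjacency(text):
    """Return an undirected adjacency dict {node id: set of neighbour ids}."""
    root = ET.fromstring(text)
    adjacency = {}
    # First pass: register every node, even isolated ones.
    for element in root.iter():
        if local_name(element.tag) == "node":
            adjacency[element.get("id")] = set()
    # Second pass: add each edge in both directions.
    for element in root.iter():
        if local_name(element.tag) == "edge":
            source, target = element.get("source"), element.get("target")
            adjacency.setdefault(source, set()).add(target)
            adjacency.setdefault(target, set()).add(source)
    return adjacency


adjacency = load_gexf_adjacency(GEXF_EXAMPLE)
```

Node identifiers are kept as strings, exactly as they appear in the file, which matters when the truth is encoded in the node names.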

## Run the tests

    export PYTHONPATH=deconvolution/
    pytest tests
    export PYTHONPATH=


## Tests for Cedric

```bash
    snakemake -s Snakefile_data_simu --config n=10000 m=[4,6,8,10,12] m_dev=[0,0.5,1,2,3]
    snakemake -s Snakefile_d2 --config input=[snake_exec/simu_bar_n10000_d5_m10-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m10-dev0.gexf,snake_exec/simu_bar_n10000_d5_m10-dev1.gexf,snake_exec/simu_bar_n10000_d5_m10-dev2.gexf,snake_exec/simu_bar_n10000_d5_m10-dev3.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.gexf,snake_exec/simu_bar_n10000_d5_m12-dev1.gexf,snake_exec/simu_bar_n10000_d5_m12-dev2.gexf,snake_exec/simu_bar_n10000_d5_m12-dev3.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.gexf,snake_exec/simu_bar_n10000_d5_m4-dev1.gexf,snake_exec/simu_bar_n10000_d5_m4-dev2.gexf,snake_exec/simu_bar_n10000_d5_m4-dev3.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.gexf,snake_exec/simu_bar_n10000_d5_m6-dev1.gexf,snake_exec/simu_bar_n10000_d5_m6-dev2.gexf,snake_exec/simu_bar_n10000_d5_m6-dev3.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.gexf,snake_exec/simu_bar_n10000_d5_m8-dev1.gexf,snake_exec/simu_bar_n10000_d5_m8-dev2.gexf,snake_exec/simu_bar_n10000_d5_m8-dev3.gexf]
```
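
The long `input=[...]` list does not have to be typed by hand. The sketch below rebuilds it from the parameter lists, assuming the file naming pattern visible in the command above (`simu_bar_n{n}_d{d}_m{m}-dev{dev}.gexf` inside `snake_exec/`) and taking `d = 5` from those file names. The variable names are illustrative only.

```python
from itertools import product

n = 10000
d = 5  # assumption: value read off the file names in the command above
ms = [4, 6, 8, 10, 12]
# Kept as strings so that "0.5" and "0" are written exactly as in the file names.
m_devs = ["0", "0.5", "1", "2", "3"]

# One simulated barcode graph per (m, m_dev) combination, sorted by name.
inputs = sorted(
    f"snake_exec/simu_bar_n{n}_d{d}_m{m}-dev{dev}.gexf"
    for m, dev in product(ms, m_devs)
)

command = "snakemake -s Snakefile_d2 --config input=[" + ",".join(inputs) + "]"
print(command)
```

Sorting by name reproduces the order of the list in the command above, although the order should not matter for the result.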