Yoann Dufresne authored

10X-deconvolve

Trying to deconvolve single tag assignment for multiple molecules

Installation

Install the package from the root directory.

    # For users
    pip install . --user
    # For developers
    pip install -e . --user

Scripts

Most of the scripts use argparse. Run a script with the -h option to see its usage.

Data simulation

  • generate_fake_molecule_graph.py: Creates a linear molecule graph in which each molecule is linked to the d molecules on its left and the d molecules on its right.

  • generate_fake_barcode_graph.py: Takes a molecule graph as input (gexf formatted) and outputs a barcode graph. The barcode graph is created by merging nodes of the molecule graph.

  • Use the snakefile "Snakefile_data_simu". Each parameter can be an integer or a list of integers. Each combination of parameters generates one barcode graph.
    Config parameters:

    • n: the number of initial molecules
    • m: average number of nodes merged in each barcode
    • d: average coverage of a molecule in the initial graph
    • workdir: the directory to create and use as output
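
The simulation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code of generate_fake_molecule_graph.py or generate_fake_barcode_graph.py: the function names, the adjacency-dict representation, the random assignment of molecules to barcodes, and the fixed group size m (the README describes m as an average) are all hypothetical.

```python
import random


def linear_molecule_graph(n, d):
    """Return an adjacency dict where molecule i is linked to the d molecules
    on its left and the d molecules on its right (when they exist)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Linking i to its d right-hand neighbours also covers the left side
        # of those neighbours, since edges are stored in both directions.
        for j in range(i + 1, min(i + d, n - 1) + 1):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def merge_into_barcodes(graph, m, seed=0):
    """Merge molecules into barcodes of m molecules each.

    Assumptions (not taken from the project): fixed group size and random
    assignment. Returns (barcode graph, molecule -> barcode mapping).
    """
    rng = random.Random(seed)
    molecules = list(graph)
    rng.shuffle(molecules)
    barcode_of = {mol: idx // m for idx, mol in enumerate(molecules)}
    barcode_graph = {b: set() for b in set(barcode_of.values())}
    for mol, neighbours in graph.items():
        for other in neighbours:
            a, b = barcode_of[mol], barcode_of[other]
            if a != b:  # drop self-loops created by the merge
                barcode_graph[a].add(b)
                barcode_graph[b].add(a)
    return barcode_graph, barcode_of


molecule_graph = linear_molecule_graph(n=20, d=2)
barcode_graph, barcode_of = merge_into_barcodes(molecule_graph, m=4)
print(sorted(molecule_graph[5]))  # neighbours of molecule 5
print(len(barcode_graph))         # number of barcodes
```

Each barcode node stands for several molecules, so its edges mix the neighbourhoods of unrelated molecules. That mixing is what the deconvolution tries to undo.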

Data structures and algorithms

  • Create a d2 graph from a barcode graph: use the snakefile "Snakefile_d2".
    The result is generated as a compressed file in the workdir.
    Config parameters:

    • input: the input barcode graph (gexf format preferred).
    • workdir: The working and output directory.
  • to_d2_graph.py: Loads a barcode graph into memory and creates a d2 graph from it.

  • evaluate.py: Takes a d2 graph gexf file and analyses it. Looks for an approximation of the longest correct path to reconstruct a molecule graph. The input must be a d2 graph where the truth is known from the node names (the format used to create fake data).

  • analyse_d2_tsv.py: Takes a tsv optimization file of a d2 graph and checks the coverage of the variables. Outputs the missing variables (if any).

Run the tests

    export PYTHONPATH=deconvolution/
    pytest tests
    export PYTHONPATH=

Tests for Cedric

    snakemake -s Snakefile_data_simu --config n=10000 m=[4,6,8,10,12] m_dev=[0,0.5,1,2,3]
    snakemake -s Snakefile_d2 --config input=[snake_exec/simu_bar_n10000_d5_m10-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m10-dev0.gexf,snake_exec/simu_bar_n10000_d5_m10-dev1.gexf,snake_exec/simu_bar_n10000_d5_m10-dev2.gexf,snake_exec/simu_bar_n10000_d5_m10-dev3.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m12-dev0.gexf,snake_exec/simu_bar_n10000_d5_m12-dev1.gexf,snake_exec/simu_bar_n10000_d5_m12-dev2.gexf,snake_exec/simu_bar_n10000_d5_m12-dev3.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m4-dev0.gexf,snake_exec/simu_bar_n10000_d5_m4-dev1.gexf,snake_exec/simu_bar_n10000_d5_m4-dev2.gexf,snake_exec/simu_bar_n10000_d5_m4-dev3.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m6-dev0.gexf,snake_exec/simu_bar_n10000_d5_m6-dev1.gexf,snake_exec/simu_bar_n10000_d5_m6-dev2.gexf,snake_exec/simu_bar_n10000_d5_m6-dev3.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.5.gexf,snake_exec/simu_bar_n10000_d5_m8-dev0.gexf,snake_exec/simu_bar_n10000_d5_m8-dev1.gexf,snake_exec/simu_bar_n10000_d5_m8-dev2.gexf,snake_exec/simu_bar_n10000_d5_m8-dev3.gexf]
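
The input list in the second command follows a regular pattern, so it can be built rather than typed by hand. The sketch below assumes the file naming visible in that command (simu_bar_n{n}_d{d}_m{m}-dev{m_dev}.gexf under snake_exec/) and d=5. The helper name is hypothetical and is not part of the repository.

```python
from itertools import product


def simulated_graph_paths(n, d, ms, m_devs, workdir="snake_exec"):
    """List one barcode graph path per (m, m_dev) combination.

    The naming pattern is inferred from the command above (assumption).
    """
    return [
        f"{workdir}/simu_bar_n{n}_d{d}_m{m}-dev{dev}.gexf"
        for m, dev in product(ms, m_devs)
    ]


paths = simulated_graph_paths(10000, 5, [4, 6, 8, 10, 12], [0, 0.5, 1, 2, 3])
# Assemble the Snakefile_d2 invocation with the same list syntax as above.
print(f"snakemake -s Snakefile_d2 --config input=[{','.join(paths)}]")
```

The 5 values of m and 5 values of m_dev give 25 paths, one per barcode graph produced by the simulation step. The order differs from the hand-written list.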