# "Let's call a graph a d-graph if the vertices can be ordered on a line and
# any vertices within d/2 of each other are connected by an edge. Observe that
# a d-graph has all vertices of degree d, except at the very end and beginning.
# For example, for d=6, the vertex at position 10 will be connected to
# 7,8,9,11,12,13. A d-graph (sort of?) captures what the neighborhood of a
# molecule would look like."
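
# A minimal sketch of the definition quoted above, not part of the original
# script. The helper name `d_graph_adjacency` and the plain dict-of-sets
# representation (instead of networkx) are assumptions made so that the
# example runs standalone.
def d_graph_adjacency(n, d):
    """Return {vertex: set(neighbors)} for an ideal d-graph on vertices 0..n-1."""
    half = d // 2
    adj = {}
    for i in range(n):
        # vertices at distance 1..d/2 on either side, clipped at the line ends
        lo, hi = max(0, i - half), min(n - 1, i + half)
        adj[i] = {j for j in range(lo, hi + 1) if j != i}
    return adj

# Usage: with adj = d_graph_adjacency(30, 6), sorted(adj[10]) lists
# 7, 8, 9, 11, 12, 13, and only the d/2 vertices at either end of the line
# have degree below d.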

#

# Alternatively, observe that a d-graph can be seen as a line in a special
# overlap graph, where the nodes are the neighbors of a certain barcode graph
# node, and two nodes in the special overlap graph are linked by an edge if
# their neighbor sets intersect over d-1 (or d-2?) elements.
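
# A small sketch, not part of the original script, for the 'd-1 (or d-2?)'
# question above, under the assumption of an ideal d-graph laid out on a line.
# `line_neighbors` and `shared_neighbors` are hypothetical helper names.
def line_neighbors(i, d, n):
    """Neighbors of position i in an ideal d-graph on positions 0..n-1."""
    half = d // 2
    return {j for j in range(max(0, i - half), min(n, i + half + 1)) if j != i}


def shared_neighbors(i, j, d, n):
    """Number of neighbors that positions i and j have in common."""
    return len(line_neighbors(i, d, n) & line_neighbors(j, d, n))

# Away from the ends, consecutive positions share d-2 neighbors when a node is
# not counted in its own neighbor set: shared_neighbors(10, 11, 6, 30) is 4.
# Counting each node in its own set would give d instead. Real barcode graphs
# are noisier, so this only describes the ideal case.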

# arguments: graphml file

# attempts to deconvolve a barcode graph using d-graph detection

# This actually uses a very loose definition of d-graph: the neighbor sets of
# consecutive nodes on the line don't need to differ by exactly +1/-1 in
# cardinality (for that, use strict_d_line_compatible_neighbors), but can
# differ by up to +3/-3.

import networkx as nx


def is_d_graph(graph, all_neighbors_graph):
    # Graph of 'neighbor-compatibility': whether two nodes in the original
    # graph have 'almost' the same neighbor set, apart from one to the left
    # and one to the right (typical in d-graphs).
    Gn = nx.Graph()
    # print("tentative d graph", list(nx.connected_components(Gn)))
    if len(list(nx.connected_components(Gn))) != 1:
        return False

    # This is really a heuristic:
    #
    # for every node in the putative d-graph, make sure the majority of its
    # neighbors lie in the component rather than elsewhere in the whole graph.
    # This is a critical filter to remove putative d-graphs that are made of
    # multiple molecules.
    # See e.g. 100_5_2-neighbors_of_molecule_21_83 without that filter:
    # 3 d-graphs found (2 OK), of which 1 is not OK:
    # d graph found ['10:24_65', '12:66_19', '22:25_81', '37:22_42', '46:63_86', '9:85_45'] -> chimera of the molecules neighboring 21 and 83, via barcode 22:25_81

# number of molecules per barcode: 10 (~/10x/drosophila/chen_data_longranger_run_on_ref/outs$ samtools view phased_possorted_bam.bam | python ~/10x-barcode-graph/scripts/sam_stats.py)
# so in total, 500k molecules
# i.e. 140 Mbp / 500k molecules = one molecule start every 280 bp on average
# (with ~70 kbp molecules, that is a molecule coverage of roughly 250x)

# conservatively, it seems that we can get overlaps for at least 20 neighbor molecules

# so in that setting, considering each molecule as '1bp', i.e. scaling the genome down to 140 Mbp / 70 kbp = 2 kbp