Commit 3c815d52 authored by Blaise Li

Started removing duplicates when joining counts.

I should check to be sure that this is the correct thing to do. The
other possibility would be to sum using agg:
https://stackoverflow.com/a/35403903/1878788
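
To make the two options concrete, here is a minimal sketch on toy data. The tables, gene names and library columns (`table_1`, `table_2`, `lib_1`, `lib_2`) are hypothetical and only stand in for the real count tables. One thing worth checking: `DataFrame.drop_duplicates()` compares row values and ignores the index, so two different genes with identical counts would be collapsed as well.

```python
import pandas as pd

# Hypothetical per-type count tables (illustration only).
# "gene_b" appears in both tables; "gene_a" and "gene_c" are different
# genes that happen to have identical counts.
table_1 = pd.DataFrame(
    {"lib_1": [5, 3], "lib_2": [7, 1]},
    index=pd.Index(["gene_a", "gene_b"], name="gene"))
table_2 = pd.DataFrame(
    {"lib_1": [3, 5], "lib_2": [1, 7]},
    index=pd.Index(["gene_b", "gene_c"], name="gene"))

counts_data = pd.concat((table_1, table_2))

# What the commit does: rows are compared by value, the index is ignored,
# so gene_c is dropped because its counts equal those of gene_a.
by_values = counts_data.drop_duplicates()

# Option 1: drop duplicates based on the gene label (keep first occurrence).
by_index = counts_data[~counts_data.index.duplicated(keep="first")]

# Option 2: sum the counts of rows sharing the same gene label.
summed = counts_data.groupby(level="gene").sum()

print(by_values)
print(by_index)
print(summed)
```

Which option is right depends on why a gene shows up in several tables: if the rows are exact repeats of the same measurement, dropping by index label is enough; if they are partial counts, summing is the safer choice.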
parent c25cadd8
@@ -992,7 +992,8 @@ rule join_all_counts:
     output:
         counts_table = OPJ(output_dir, "{trimmer}", aligner, "mapped_C_elegans", "{counter}", "all_on_C_elegans", "alltypes_{orientation}_counts.txt"),
     run:
-        counts_data = pd.concat((pd.read_table(table, index_col="gene") for table in input.counts_tables))
+        counts_data = pd.concat((pd.read_table(table, index_col="gene") for table in input.counts_tables)).drop_duplicates()
+        assert len(counts_data.index.unique()) == len(counts_data.index), "Some genes appear several times in the counts table."
         counts_data.index.names = ["gene"]
         counts_data.to_csv(output.counts_table, sep="\t")
......
@@ -258,7 +258,7 @@ rule sam2indexedbam:
     resources:
         io=45
     threads:
-        4
+        8
     wrapper:
         "file:///pasteur/homes/bli/src/bioinfo_utils/snakemake_wrappers/sam2indexedbam"
@@ -712,7 +712,8 @@ rule join_all_counts:
     output:
         counts_table = OPJ(mapping_dir, aligner, "mapped_C_elegans", "{counter}", "all_on_C_elegans", "alltypes_{orientation}_counts.txt"),
     run:
-        counts_data = pd.concat((pd.read_table(table, index_col="gene") for table in input.counts_tables))
+        counts_data = pd.concat((pd.read_table(table, index_col="gene") for table in input.counts_tables)).drop_duplicates()
+        assert len(counts_data.index.unique()) == len(counts_data.index), "Some genes appear several times in the counts table."
         counts_data.index.names = ["gene"]
         counts_data.to_csv(output.counts_table, sep="\t")
......
@@ -1236,6 +1236,7 @@ rule gather_small_RNA_counts:
         counts_data.to_csv(output.counts_table, sep="\t")
+# TODO: drop duplicates or sum counts when duplicate row indices?
 rule join_si_counts:
     """concat SI_TYPES (prot_si, te_si, pseu_si, satel_si and simrep_si) into si"""
     input:
......