Commit 9b4e17e3 authored by Blaise Li

Optional fillna in exclude_all_nan_cols.

The default behaviour has been changed to be more generally useful.
The previous default was to fill the NaNs that do not belong to
all-NaN columns with 0, which was suitable for standardized usage
biases (which are expected to be centered on 0). When dealing with
usage metrics other than biases, however, we might prefer to keep
those NaNs so that they can later be replaced with something else.
If a single filling value is desired, it can be set using the
fill_other_nas argument.

Implementation detail: fillna is still called, but with a default
value of np.nan, which should leave the remaining NaNs unchanged.
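
The implementation detail above can be checked in isolation. The following is a minimal sketch, assuming pandas and NumPy are installed; the table `df` is made-up illustration data and is not taken from the library.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: column "b" contains a single NaN.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 3.0]})

# Filling NaNs with NaN is expected to leave the table unchanged.
filled = df.fillna(np.nan)

# DataFrame.equals treats NaNs at the same position as equal.
same = filled.equals(df)
print(same)
```

If `same` is true, calling `fillna` with the new default behaves as if no filling had been requested.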
parent 2f2bba77
 __copyright__ = "Copyright (C) 2022-2023 Blaise Li"
 __licence__ = "GNU GPLv3"
-__version__ = "0.27.5"
+__version__ = "0.28.0"
 from .libcodonusage import (
     aa2colour,
     aa_usage,
......
@@ -794,27 +794,28 @@ across genes) so that they are more comparable between amino-acids.
     return standardized_aa_usage_biases
-def exclude_all_nan_cols(standardized_usage_biases):
+def exclude_all_nan_cols(usage_table, fill_other_nas=np.nan):
     """
-    Detect columns in *standardized_usage_biases* that contain only NaNs
-    and remove them from the table.
+    Detect columns in *usage_table* that contain only NaNs
+    and remove them from the table. Other NaN values are replaced
+    with *fill_other_nas*.
     """
     render_md("""
-Standardization may result in division by zero for usage biases
+Standardization may result in division by zero for usage data
 that have a zero standard deviation.
 This is expected to be the case for "by amino-acid" usage biases
 for codons corresponding to amino-acids having only one codon:
 methionine (M) and tryptophan (W).
 """)
-    all_nan_cols = standardized_usage_biases.columns[
-        standardized_usage_biases.isna().all()]
+    all_nan_cols = usage_table.columns[
+        usage_table.isna().all()]
     if len(all_nan_cols):
         render_md("The following columns contain only NaNs:")
         display(all_nan_cols)
         render_md("This likely resulted from a division by zero.")
         render_md("These columns will be excluded.")
     return (
-        standardized_usage_biases.drop(columns=all_nan_cols).fillna(0),
+        usage_table.drop(columns=all_nan_cols).fillna(fill_other_nas),
         all_nan_cols)
......
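
To illustrate the effect of the new argument, here is a simplified stand-in for the changed function. It is a sketch only: the name `exclude_all_nan_cols_sketch` and the example table are hypothetical, and the notebook helpers `render_md` and `display` used by the real function are left out so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def exclude_all_nan_cols_sketch(usage_table, fill_other_nas=np.nan):
    """Simplified stand-in for the function in the diff (no notebook output)."""
    # Columns in which every value is NaN.
    all_nan_cols = usage_table.columns[usage_table.isna().all()]
    return (
        usage_table.drop(columns=all_nan_cols).fillna(fill_other_nas),
        all_nan_cols)


# Made-up data: "ATG" is entirely NaN, "GCT" has one isolated NaN.
table = pd.DataFrame({
    "ATG": [np.nan, np.nan],
    "GCT": [0.5, np.nan],
    "GCC": [-0.5, 1.0],
})

# New default: the isolated NaN in "GCT" is kept.
kept_default, dropped = exclude_all_nan_cols_sketch(table)

# Explicit fill value: the isolated NaN is replaced with 0.
kept_zero, _ = exclude_all_nan_cols_sketch(table, fill_other_nas=0)
```

Under these assumptions, passing `fill_other_nas=0` should reproduce the behaviour of the previous default, while omitting the argument leaves the remaining NaNs in place for later handling.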