From edf49df94ad0b2f709c29c97750bd49153487bd1 Mon Sep 17 00:00:00 2001 From: Etienne Kornobis <ekornobis@gmail.com> Date: Mon, 19 Sep 2022 23:58:16 +0200 Subject: [PATCH] update seaborn practicals (2) --- notebooks/seaborn_TP.ipynb | 330 ++++++++++++++++++++++++++++++++++--- 1 file changed, 308 insertions(+), 22 deletions(-) diff --git a/notebooks/seaborn_TP.ipynb b/notebooks/seaborn_TP.ipynb index 0ef1a6f..47d0a4c 100644 --- a/notebooks/seaborn_TP.ipynb +++ b/notebooks/seaborn_TP.ipynb @@ -2,10 +2,10 @@ "cells": [ { "cell_type": "markdown", - "id": "rotary-designation", + "id": "instrumental-personal", "metadata": {}, "source": [ - "# <center>**TP**</center>\n", + "# <center><b>Hands-on</b></center>\n", "\n", "<div style=\"text-align:center\">\n", " <img src=\"images/seaborn.png\" width=\"600px\">\n", @@ -21,106 +21,392 @@ }, { "cell_type": "markdown", - "id": "respected-history", + "id": "compliant-basis", "metadata": {}, "source": [ "Practice your graphing skills using data from milieu intérieur in `data/mi.csv`:" ] }, + { + "cell_type": "markdown", + "id": "departmental-exhibition", + "metadata": {}, + "source": [ + "- Do a boxplot showing the differences in temperature between females and males:" + ] + }, { "cell_type": "code", "execution_count": null, - "id": "adolescent-spirituality", + "id": "98e904b6-6e90-4c74-a463-2339d3961250", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "widespread-rendering", + "id": "portuguese-worse", "metadata": {}, "source": [ - "- Do a boxplot showing the differences in temperature between females and males:" + "- Using a histogram and continuous probability density curve, display the distribution of age in the dataset" ] }, { "cell_type": "code", "execution_count": null, - "id": "dressed-performer", + "id": "55756807-e1fb-4fb5-878c-5e46acea7a11", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "acute-debut", + "id": "prepared-stephen", "metadata": {}, 
"source": [ - "- Using an histogram, display the distribution of age in the dataset (with kde as well)" + "- Use a barplot to show the count of vaccinated for yellow fever (see the documentation for a countplot)" ] }, { "cell_type": "code", "execution_count": null, - "id": "crucial-bracelet", + "id": "1425046c-a058-45fe-95b5-5eca6ebbd33a", "metadata": {}, "outputs": [], "source": [] }, + { + "cell_type": "markdown", + "id": "immediate-method", + "metadata": {}, + "source": [ + "- Plot the distribution of age for the people vaccinated for the flu" + ] + }, { "cell_type": "code", "execution_count": null, - "id": "minor-secretariat", + "id": "d567194c-3698-44c9-b5f8-b8a3d3493b0c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "processed-diameter", + "id": "temporal-synthesis", "metadata": {}, "source": [ - "- Use a barplot to show the count of vaccinated for yellow fever (see the documentation for a countplot)" + "- Feel free to explore more of [seaborn](https://seaborn.pydata.org/examples/index.html) !" 
+ ] + }, + { + "cell_type": "markdown", + "id": "db56d49a-4770-4f9e-af6b-78960574d338", + "metadata": {}, + "source": [ + "# Exploring count matrices from RNA-seq data" + ] + }, + { + "cell_type": "markdown", + "id": "5377668b-dea5-4c20-8249-5266f98774eb", + "metadata": {}, + "source": [ + "<img src=\"images/rnaseq.png\" style=\"margin:0 auto;width:800px\">" + ] + }, + { + "cell_type": "markdown", + "id": "ebf1606b-0b21-4821-a899-551ec33c977e", + "metadata": {}, + "source": [ + "- Import the count_matrix tsv file from the data folder" ] }, { "cell_type": "code", "execution_count": null, - "id": "indian-response", + "id": "eb53a1f5-9ea7-491e-bcfa-820cb1663af5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "scenic-adoption", + "id": "c80d9947-9ccf-4499-a1c2-9194377cd054", "metadata": {}, "source": [ - "- Plot the distribution of age for the people vaccinated for the flu" + "- Reduce the dataframe to only the \"Geneid\", \"WTx\" and \"Cx\" columns" ] }, { "cell_type": "code", "execution_count": null, - "id": "weighted-terrain", + "id": "56e90032-75ce-47b5-9cd3-95219cd7b26e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "operating-union", + "id": "eb65b51f-f689-4a66-b47c-e79f0e9eba52", "metadata": {}, "source": [ - "- Feel free to explore more of [seaborn](https://seaborn.pydata.org/examples/index.html) !" + "- Format your DataFrame properly so that it can be passed to https://seaborn.pydata.org/generated/seaborn.clustermap.html to draw a heatmap." ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b422fcb-7cc1-4766-92e3-276742381ae6", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "f8d6188e-3a37-4ba5-b377-a11696054e9c", + "metadata": {}, + "source": [ + "- Explore the clustermap documentation to make the heatmap more readable by standardizing the data within each gene."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06be3f98-2167-44ac-9318-955286d77903", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "2e61a207-223a-4c01-88ea-76b1b8c3a0b9", + "metadata": {}, + "source": [ + "- Reformat the counts_df dataframe to have genes in columns and samples in rows.\n", + "- Add a \"group\" column defining the grouping of the samples:\n", + " - \"WTx\" samples will be from the \"WT\" group.\n", + " - \"Cx\" samples will be from the \"C\" group." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eea3f521-6960-44ab-ac0b-fcf5a002237f", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "9a88ecb1-9ed3-4160-91ee-24a30e994b71", + "metadata": {}, + "source": [ + "- Display a barplot showing the mean expression for each group for a particular gene (for example \"gene-LEPBI_RS00065\")." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf74e85e-eef3-4023-bb88-5a864cf3c3f9", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "99e2455a-cb7d-44d5-a4a0-2cf272c814ab", + "metadata": {}, + "source": [ + "- Try plotting a swarmplot on top of the previous barplot:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7cf225f9-aea7-4cd9-ac90-a99592799527", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "d200d375-362e-4c1d-a88e-130b094e6feb", + "metadata": {}, + "source": [ + "- Now plot the same data using a boxplot. Can you see the problem with displaying boxplots for this kind of data?"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e4daf00e-9a2c-4ec4-9d26-aa18aae5d82d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "2e1cabe0-aab7-4f0e-888e-81aae7d5df8d", + "metadata": {}, + "source": [ + "- Compute the median of each gene by group:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6ffd0f59-0fd7-41b9-a87a-c6e1a74145e8", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "308cc10b-6727-4bc5-b05d-4777037e252e", + "metadata": {}, + "source": [ + "We are now going to add extra annotations to this median table in order to identify genes of interest.\n", + "- Import the annotation.csv table from the data folder: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9be6ee5b-d497-47fa-8ac5-cf5514fd52c0", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "50fa81a7-3f34-4160-ad2d-f77d21be9ac0", + "metadata": {}, + "source": [ + "Annotations in this table are available for many types of loci (the \"genetic_type\" column), but here we will focus on the \"gene\" genetic_type. \n", + "- Filter the annotation dataframe to keep only the rows whose \"genetic_type\" is \"gene\"."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9a8bcf7-0bcc-43e8-828a-ec204658e528", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "f8a4e744-e7e2-43b6-b3d4-e59feb40d3ff", + "metadata": {}, + "source": [ + "- Concatenate the dataframe of medians by group and the annotation dataframe:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "afd8467a-33e1-4b9e-8f6d-b2229099c874", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "af9f8e1f-5f8b-4152-b08a-44e957f13cec", + "metadata": {}, + "source": [ + "- Calculate an estimate of the gene expression fold change for each gene (by dividing the C median expression by the WT median expression).\n", + "- Add it as a \"FoldChange\" column to the previous dataframe." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bb617d00-2c2d-45cc-ace0-3656dc999b17", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "d70eb26b-0a26-4bbc-af03-ba8781b09fb5", + "metadata": {}, + "source": [ + "- Use a barplot to display the fold changes, labelled with the new gene annotation (the \"Name\" column)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4dd4cbee-547f-43f1-9ed7-173f3040b8d5", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "34a26492-7c6b-4a07-a4de-67ec8f693cdc", + "metadata": {}, + "source": [ + "- Calculate the length of each gene and use a visualisation to check: does gene expression appear correlated with gene length?"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f35b696-0807-4df4-9310-cb9197e7bf85", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "a2627322-e6a5-422f-8a69-b89dbd4b777e", + "metadata": {}, + "source": [ + "- Create a function that produces a single image with four different plots of your choice and saves it to a PDF file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70e001a1-2848-4fb7-9f33-7beb4475e0fc", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "0d05aba4-3c85-4cd9-85f3-5296b19308fb", + "metadata": {}, + "source": [ + "# Extras" + ] + }, + { + "cell_type": "markdown", + "id": "66d6668e-683f-462e-a72f-28bdda8736f2", + "metadata": {}, + "source": [ + "- Using ipywidgets, write a function that displays a barplot of gene expression by group, with the gene selected by the user (using a Dropdown widget, for example)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e587f202-7ca4-43fb-ac3c-015c740c69d2", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "Python [conda env:dev]", "language": "python", - "name": "python3" + "name": "conda-env-dev-py" }, "language_info": { "codemirror_mode": { @@ -132,7 +418,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.4" } }, "nbformat": 4, -- GitLab