"- Import the count_matrix tsv file from the data folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "indian-response",
"id": "eb53a1f5-9ea7-491e-bcfa-820cb1663af5",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "scenic-adoption",
"id": "c80d9947-9ccf-4499-a1c2-9194377cd054",
"metadata": {},
"source": [
"- Plot the distribution of age for the people vaccinated for the flu"
"- Simplify the dataframe to only have the \"Geneid\", \"WTx\" and \"Cx\" columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "weighted-terrain",
"id": "56e90032-75ce-47b5-9cd3-95219cd7b26e",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "operating-union",
"id": "eb65b51f-f689-4a66-b47c-e79f0e9eba52",
"metadata": {},
"source": [
"- Feel free to explore more of [seaborn](https://seaborn.pydata.org/examples/index.html) !"
"- Format properly your DataFrame to be able to use https://seaborn.pydata.org/generated/seaborn.clustermap.html to realize a heatmap."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b422fcb-7cc1-4766-92e3-276742381ae6",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "f8d6188e-3a37-4ba5-b377-a11696054e9c",
"metadata": {},
"source": [
"- Explore the clustermap documentation to have a more visual heatmap by standardizing the data within genes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06be3f98-2167-44ac-9318-955286d77903",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "2e61a207-223a-4c01-88ea-76b1b8c3a0b9",
"metadata": {},
"source": [
"- Reformat the counts_df dataframe to have genes in columns and samples in rows.\n",
"- Add a \"group\" column defining the grouping of the samples:\n",
" - \"WTx\" samples will be from the \"WT\" group.\n",
" - \"Cx\" samples will be from the \"C\" group."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eea3f521-6960-44ab-ac0b-fcf5a002237f",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "9a88ecb1-9ed3-4160-91ee-24a30e994b71",
"metadata": {},
"source": [
"- Display a barplot showing the mean expression for each group for a particular gene (for example \"gene-LEPBI_RS00065\")."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf74e85e-eef3-4023-bb88-5a864cf3c3f9",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "99e2455a-cb7d-44d5-a4a0-2cf272c814ab",
"metadata": {},
"source": [
"- Try plotting a swarmplot on top of the previous barplot:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7cf225f9-aea7-4cd9-ac90-a99592799527",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "d200d375-362e-4c1d-a88e-130b094e6feb",
"metadata": {},
"source": [
"- Now plot the same data using a boxplot. Can you see the problem of displaying boxplots for this kind of data ?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4daf00e-9a2c-4ec4-9d26-aa18aae5d82d",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "2e1cabe0-aab7-4f0e-888e-81aae7d5df8d",
"metadata": {},
"source": [
"- Compute the median of each genes by groups:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ffd0f59-0fd7-41b9-a87a-c6e1a74145e8",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "308cc10b-6727-4bc5-b05d-4777037e252e",
"metadata": {},
"source": [
"We are going now to add extra annotations to this median table in order to identify genes of interest.\n",
"- Import the annotation.csv table from the data folder: "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9be6ee5b-d497-47fa-8ac5-cf5514fd52c0",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "50fa81a7-3f34-4160-ad2d-f77d21be9ac0",
"metadata": {},
"source": [
"Annotations in this table are available for many types of loci (the \"genetic_type\" column), but here we will focus on the \"gene\" genetic_type. \n",
"- Filter the annotation dataframe to have only \"gene\" as \"genetic_type\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9a8bcf7-0bcc-43e8-828a-ec204658e528",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "f8a4e744-e7e2-43b6-b3d4-e59feb40d3ff",
"metadata": {},
"source": [
"- Concatenate the dataframe with median by group and the annotation dataframe together:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "afd8467a-33e1-4b9e-8f6d-b2229099c874",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "af9f8e1f-5f8b-4152-b08a-44e957f13cec",
"metadata": {},
"source": [
"- Calculate an estimate of the gene expression fold change for each gene (by dividing the C median expressions by WT median expressions).\n",
"- Add it as a \"FoldChange\" column to the previous dataframe."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb617d00-2c2d-45cc-ace0-3656dc999b17",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "d70eb26b-0a26-4bbc-af03-ba8781b09fb5",
"metadata": {},
"source": [
"- Use a barplot to display fold changes and using the new gene annotation (The \"Name\" column)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4dd4cbee-547f-43f1-9ed7-173f3040b8d5",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "34a26492-7c6b-4a07-a4de-67ec8f693cdc",
"metadata": {},
"source": [
"- By calculating the length of each gene and using a visualisation, does gene expression appears correlated with gene length ?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f35b696-0807-4df4-9310-cb9197e7bf85",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "a2627322-e6a5-422f-8a69-b89dbd4b777e",
"metadata": {},
"source": [
"- Create a function which produce a single image with four different plots of your choice and save it to pdf file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70e001a1-2848-4fb7-9f33-7beb4475e0fc",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "0d05aba4-3c85-4cd9-85f3-5296b19308fb",
"metadata": {},
"source": [
"# Extras"
]
},
{
"cell_type": "markdown",
"id": "66d6668e-683f-462e-a72f-28bdda8736f2",
"metadata": {},
"source": [
"- Using ipywidget, make a function to display barplot of gene expression by groups with the gene being selected by the user (using a Dropdown widget for example)."