diff --git a/notebooks/jupyter_TP.ipynb b/notebooks/jupyter_TP.ipynb index ce6afd5ac52e821553139ef77be418f226df68f8..67a96a43f5b91c050fe3da391bb73794d1ef69ad 100644 --- a/notebooks/jupyter_TP.ipynb +++ b/notebooks/jupyter_TP.ipynb @@ -2,7 +2,26 @@ "cells": [ { "cell_type": "markdown", - "id": "cultural-palestine", + "id": "functional-attraction", + "metadata": {}, + "source": [ + "# <center>**TP**</center>\n", + "\n", + "<div style=\"text-align:center\">\n", + " <img src=\"images/jupyter.png\" width=\"600px\">\n", + " <div>\n", + " Bertrand Néron, François Laurent, Etienne Kornobis\n", + " <br />\n", + " <a src=\" https://research.pasteur.fr/en/team/bioinformatics-and-biostatistics-hub/\">Bioinformatics and Biostatistiqucs HUB</a>\n", + " <br />\n", + " © Institut Pasteur, 2021\n", + " </div> \n", + "</div>" + ] + }, + { + "cell_type": "markdown", + "id": "advance-vaccine", "metadata": {}, "source": [ "# Introduction to JupyterLab\n", @@ -115,7 +134,7 @@ { "cell_type": "code", "execution_count": 13, - "id": "public-nightlife", + "id": "respective-prize", "metadata": {}, "outputs": [ { @@ -140,7 +159,7 @@ }, { "cell_type": "markdown", - "id": "marine-arctic", + "id": "pharmaceutical-college", "metadata": {}, "source": [ "- The exclamation mark character ``!`` can be used as well to execute the following line in a bash subprocess. For example:" @@ -149,7 +168,7 @@ { "cell_type": "code", "execution_count": 21, - "id": "considerable-fleet", + "id": "drawn-soldier", "metadata": {}, "outputs": [ { @@ -166,7 +185,7 @@ }, { "cell_type": "markdown", - "id": "satellite-disposal", + "id": "natural-submission", "metadata": {}, "source": [ "- `%timeit` can be used to check for execution times:" @@ -175,7 +194,7 @@ { "cell_type": "code", "execution_count": 18, - "id": "delayed-thunder", + "id": "under-embassy", "metadata": {}, "outputs": [ { @@ -192,7 +211,7 @@ }, { "cell_type": "markdown", - "id": "vocational-jacksonville", + "id": "constant-driving", "metadata": {}, "source": [ "- Load more extension for the notebook, for example `autoreload` is useful extension to automatically reload a module imported in a Jupyter notebook if the module has changed locally:" @@ -201,7 +220,7 @@ { "cell_type": "code", "execution_count": 22, - "id": "physical-steering", + "id": "waiting-credit", "metadata": {}, "outputs": [ { @@ -220,7 +239,7 @@ }, { "cell_type": "markdown", - "id": "regular-tiger", + "id": "laden-seeker", "metadata": {}, "source": [ "# Exercices" @@ -228,7 +247,7 @@ }, { "cell_type": "markdown", - "id": "rotary-bouquet", + "id": "fitted-insert", "metadata": {}, "source": [ "The aim here is to get comfortable in Jupyterlab.\n", @@ -245,14 +264,14 @@ { "cell_type": "code", "execution_count": null, - "id": "chinese-values", + "id": "international-thomson", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "useful-segment", + "id": "olympic-shoot", "metadata": {}, "source": [ "## Exercise\n", @@ -268,14 +287,14 @@ { "cell_type": "code", "execution_count": null, - "id": "classical-extraction", + "id": "creative-conditioning", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "iraqi-wholesale", + "id": "bibliographic-concern", "metadata": {}, "source": [ "## Exercise\n", @@ -293,14 +312,14 @@ { "cell_type": "code", "execution_count": null, - "id": "refined-relation", + "id": "ruled-bottle", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "manufactured-treatment", + "id": "verbal-field", "metadata": {}, "source": [ "## Exercise\n", @@ -317,14 +336,14 @@ { "cell_type": "code", "execution_count": null, - "id": "featured-converter", + "id": "identified-calculation", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "constant-thriller", + "id": "unauthorized-carter", "metadata": {}, "source": [ "## Exercise\n", @@ -335,14 +354,14 @@ { "cell_type": "code", "execution_count": null, - "id": "illegal-preserve", + "id": "abandoned-shareware", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "written-bidding", + "id": "molecular-census", "metadata": {}, "source": [ "## Exercise\n", @@ -353,14 +372,14 @@ { "cell_type": "code", "execution_count": null, - "id": "waiting-concord", + "id": "overall-assurance", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", - "id": "varying-providence", + "id": "static-array", "metadata": {}, "source": [ "# More documentation\n", diff --git a/notebooks/pandas_TP_solution.ipynb b/notebooks/pandas_TP_solution.ipynb index ee623af1451cb9e82d3ee8e622a86fe8bbd6560b..42fb09a32044037b6030dac06fcd94a272e538e7 100644 --- a/notebooks/pandas_TP_solution.ipynb +++ b/notebooks/pandas_TP_solution.ipynb @@ -2,14 +2,14 @@ "cells": [ { "cell_type": "markdown", - "id": "right-artwork", + "id": "integral-thermal", "metadata": {}, "source": [ "# <center>**TP**</center>\n", "\n", "<img src=\"./images/pandas_logo.svg\">\n", "<div style=\"text-align:center\">\n", - " Bertrand Néron\n", + " Bertrand Néron, François Laurent, Etienne Kornobis\n", " <br />\n", " <a src=\" https://research.pasteur.fr/en/team/bioinformatics-and-biostatistics-hub/\">Bioinformatics and Biostatistiqucs HUB</a>\n", " <br />\n", @@ -19,7 +19,7 @@ }, { "cell_type": "markdown", - "id": "sacred-breathing", + "id": "trained-fighter", "metadata": {}, "source": [ "# Exploring Blast results" @@ -27,7 +27,7 @@ }, { "cell_type": "markdown", - "id": "technical-crystal", + "id": "manufactured-cursor", "metadata": {}, "source": [ "- Import the file data/blast.txt into a pandas dataframe variable (named `blast_res`). Verify that its type is a pandas\n", @@ -40,7 +40,7 @@ { "cell_type": "code", "execution_count": 1, - "id": "recreational-seller", + "id": "musical-violence", "metadata": {}, "outputs": [], "source": [ @@ -50,7 +50,7 @@ { "cell_type": "code", "execution_count": 4, - "id": "major-dream", + "id": "loaded-transfer", "metadata": {}, "outputs": [], "source": [ @@ -61,7 +61,7 @@ { "cell_type": "code", "execution_count": 5, - "id": "parliamentary-heaven", + "id": "streaming-regulation", "metadata": {}, "outputs": [ { @@ -82,7 +82,7 @@ { "cell_type": "code", "execution_count": 6, - "id": "changing-drive", + "id": "unsigned-coast", "metadata": {}, "outputs": [ { @@ -332,7 +332,7 @@ }, { "cell_type": "markdown", - "id": "productive-chorus", + "id": "dominant-knowing", "metadata": {}, "source": [ "Explore ``blast_res`` dataframe:\n", @@ -346,7 +346,7 @@ { "cell_type": "code", "execution_count": 7, - "id": "yellow-matthew", + "id": "simplified-progress", "metadata": {}, "outputs": [ { @@ -492,7 +492,7 @@ { "cell_type": "code", "execution_count": 8, - "id": "handled-details", + "id": "narrow-smell", "metadata": {}, "outputs": [ { @@ -689,7 +689,7 @@ { "cell_type": "code", "execution_count": 9, - "id": "virgin-forestry", + "id": "identical-guest", "metadata": {}, "outputs": [ { @@ -868,7 +868,7 @@ { "cell_type": "code", "execution_count": 10, - "id": "superb-papua", + "id": "alpine-cleveland", "metadata": {}, "outputs": [ { @@ -888,7 +888,7 @@ }, { "cell_type": "markdown", - "id": "fourth-pennsylvania", + "id": "imposed-squad", "metadata": {}, "source": [ "- Extract 3rd line from the ``blast_res`` dataframe. Which type of data structure is returned by this extraction ?" @@ -897,7 +897,7 @@ { "cell_type": "code", "execution_count": 11, - "id": "binding-interest", + "id": "complicated-football", "metadata": {}, "outputs": [ { @@ -930,7 +930,7 @@ { "cell_type": "code", "execution_count": 12, - "id": "careful-dining", + "id": "administrative-biodiversity", "metadata": {}, "outputs": [ { @@ -950,7 +950,7 @@ }, { "cell_type": "markdown", - "id": "common-sixth", + "id": "equipped-amendment", "metadata": {}, "source": [ "- Extract the *sseqid* column from the ``blast_res`` dataframe. " @@ -959,7 +959,7 @@ { "cell_type": "code", "execution_count": 20, - "id": "located-waters", + "id": "seasonal-europe", "metadata": {}, "outputs": [ { @@ -994,7 +994,7 @@ }, { "cell_type": "markdown", - "id": "searching-coach", + "id": "major-leave", "metadata": {}, "source": [ "- Get the minimum and maximum value of a the *evalue* column." @@ -1003,7 +1003,7 @@ { "cell_type": "code", "execution_count": 21, - "id": "square-airplane", + "id": "varied-influence", "metadata": {}, "outputs": [ { @@ -1024,7 +1024,7 @@ { "cell_type": "code", "execution_count": 22, - "id": "innovative-audio", + "id": "little-recipient", "metadata": {}, "outputs": [ { @@ -1044,7 +1044,7 @@ }, { "cell_type": "markdown", - "id": "broad-password", + "id": "sitting-blackberry", "metadata": {}, "source": [ "- Get the median and the mean of the *bitscore* column." @@ -1053,7 +1053,7 @@ { "cell_type": "code", "execution_count": 23, - "id": "tamil-aggregate", + "id": "polyphonic-retro", "metadata": {}, "outputs": [ { @@ -1074,7 +1074,7 @@ { "cell_type": "code", "execution_count": 24, - "id": "sitting-metallic", + "id": "advisory-symphony", "metadata": {}, "outputs": [ { @@ -1094,7 +1094,7 @@ }, { "cell_type": "markdown", - "id": "excessive-tournament", + "id": "friendly-extra", "metadata": {}, "source": [ "- Filter in all hits with a percentage of identity (*pident*) superior to 75%." @@ -1103,7 +1103,7 @@ { "cell_type": "code", "execution_count": 25, - "id": "duplicate-ghana", + "id": "rough-globe", "metadata": {}, "outputs": [ { @@ -1266,7 +1266,7 @@ { "cell_type": "code", "execution_count": 35, - "id": "developing-browser", + "id": "novel-turkey", "metadata": {}, "outputs": [ { @@ -1429,7 +1429,7 @@ }, { "cell_type": "markdown", - "id": "nonprofit-fitting", + "id": "several-light", "metadata": {}, "source": [ "- Based on the bitscore alone, extract only the best hit(s) (i.e. the highest(s) bitscore(s))." @@ -1438,7 +1438,7 @@ { "cell_type": "code", "execution_count": 26, - "id": "chronic-wallace", + "id": "arbitrary-style", "metadata": {}, "outputs": [ { @@ -1518,7 +1518,7 @@ }, { "cell_type": "markdown", - "id": "saving-homeless", + "id": "heated-poultry", "metadata": {}, "source": [ "- Filter in all hits which are corresponding to human hits in the database (*sseqid*)." @@ -1527,7 +1527,7 @@ { "cell_type": "code", "execution_count": 28, - "id": "western-language", + "id": "failing-crossing", "metadata": {}, "outputs": [ { @@ -1776,7 +1776,7 @@ { "cell_type": "code", "execution_count": 29, - "id": "taken-palmer", + "id": "trained-durham", "metadata": {}, "outputs": [ { @@ -2025,7 +2025,7 @@ { "cell_type": "code", "execution_count": 38, - "id": "tracked-reform", + "id": "structural-hybrid", "metadata": {}, "outputs": [ { @@ -2153,7 +2153,7 @@ }, { "cell_type": "markdown", - "id": "reliable-dream", + "id": "incorporate-interface", "metadata": {}, "source": [ "- Plot a histogram of the bitscores. " @@ -2162,7 +2162,7 @@ { "cell_type": "code", "execution_count": 32, - "id": "suspected-substance", + "id": "liable-wheat", "metadata": {}, "outputs": [ { @@ -2194,7 +2194,7 @@ }, { "cell_type": "markdown", - "id": "attempted-development", + "id": "reflected-intervention", "metadata": {}, "source": [ "- Plot a barplot of the number of hits per species (species are considered the last code after the \"_\" in the sseqid column)" @@ -2203,7 +2203,7 @@ { "cell_type": "code", "execution_count": 60, - "id": "swiss-provider", + "id": "normal-glenn", "metadata": {}, "outputs": [ { @@ -2323,7 +2323,7 @@ { "cell_type": "code", "execution_count": 61, - "id": "russian-mystery", + "id": "arranged-intervention", "metadata": {}, "outputs": [ { @@ -2356,7 +2356,7 @@ }, { "cell_type": "markdown", - "id": "reliable-shark", + "id": "generous-regression", "metadata": {}, "source": [ "# Extra exercise" @@ -2365,7 +2365,7 @@ { "cell_type": "code", "execution_count": 1, - "id": "fabulous-endorsement", + "id": "experienced-prediction", "metadata": {}, "outputs": [], "source": [ @@ -2374,7 +2374,7 @@ }, { "cell_type": "markdown", - "id": "boxed-basin", + "id": "purple-legend", "metadata": {}, "source": [ "read the 'data/city_temperature.csv'\n", @@ -2390,7 +2390,7 @@ { "cell_type": "code", "execution_count": 2, - "id": "positive-gateway", + "id": "arctic-pickup", "metadata": {}, "outputs": [ { @@ -2409,7 +2409,7 @@ { "cell_type": "code", "execution_count": 3, - "id": "noble-economics", + "id": "conventional-section", "metadata": {}, "outputs": [ { @@ -2431,7 +2431,7 @@ }, { "cell_type": "markdown", - "id": "international-glenn", + "id": "authentic-hearts", "metadata": {}, "source": [ "We will work only on Europe Region. so creat data named europe with only these data" @@ -2440,7 +2440,7 @@ { "cell_type": "code", "execution_count": 4, - "id": "exciting-founder", + "id": "strong-skirt", "metadata": {}, "outputs": [], "source": [ @@ -2449,7 +2449,7 @@ }, { "cell_type": "markdown", - "id": "dressed-carbon", + "id": "protected-desperate", "metadata": {}, "source": [ "wich country are in europe?" @@ -2458,7 +2458,7 @@ { "cell_type": "code", "execution_count": 5, - "id": "crude-pillow", + "id": "unable-establishment", "metadata": {}, "outputs": [ { @@ -2484,7 +2484,7 @@ }, { "cell_type": "markdown", - "id": "dated-guest", + "id": "engaged-republican", "metadata": {}, "source": [ "remove columns 'Region' and 'State' from the data" @@ -2493,7 +2493,7 @@ { "cell_type": "code", "execution_count": 6, - "id": "valued-minutes", + "id": "anticipated-illinois", "metadata": {}, "outputs": [], "source": [ @@ -2502,7 +2502,7 @@ }, { "cell_type": "markdown", - "id": "million-blank", + "id": "amino-grant", "metadata": {}, "source": [ "from europe data create a new dataset containing countries: 'France', 'Spain', 'Italy'" @@ -2511,7 +2511,7 @@ { "cell_type": "code", "execution_count": 7, - "id": "textile-proof", + "id": "organized-tender", "metadata": {}, "outputs": [], "source": [ @@ -2520,7 +2520,7 @@ }, { "cell_type": "markdown", - "id": "statutory-hierarchy", + "id": "endangered-weekly", "metadata": {}, "source": [ "group the data on 'City' and 'Year' compute the mean of each group and keep only the 'AvgTemperature' column." @@ -2529,7 +2529,7 @@ { "cell_type": "code", "execution_count": 8, - "id": "induced-finish", + "id": "sacred-secret", "metadata": {}, "outputs": [ { @@ -2562,7 +2562,7 @@ }, { "cell_type": "markdown", - "id": "bibliographic-bidding", + "id": "north-condition", "metadata": {}, "source": [ "do the same but compute the standard deviation" @@ -2571,7 +2571,7 @@ { "cell_type": "code", "execution_count": 9, - "id": "valued-smooth", + "id": "absent-envelope", "metadata": {}, "outputs": [ { @@ -2604,7 +2604,7 @@ }, { "cell_type": "markdown", - "id": "outdoor-content", + "id": "continued-tiger", "metadata": {}, "source": [ "* reset the index fo the mean data and std data\n", @@ -2615,7 +2615,7 @@ { "cell_type": "code", "execution_count": 10, - "id": "dangerous-republican", + "id": "geological-newman", "metadata": {}, "outputs": [], "source": [ @@ -2627,7 +2627,7 @@ }, { "cell_type": "markdown", - "id": "equivalent-grove", + "id": "ongoing-armenia", "metadata": {}, "source": [ "merge the two table data_mean and data_std" @@ -2636,7 +2636,7 @@ { "cell_type": "code", "execution_count": 11, - "id": "appreciated-europe", + "id": "liquid-brighton", "metadata": {}, "outputs": [ { @@ -2778,7 +2778,7 @@ }, { "cell_type": "markdown", - "id": "asian-evanescence", + "id": "egyptian-restoration", "metadata": {}, "source": [ "save the data in a file" @@ -2787,7 +2787,7 @@ { "cell_type": "code", "execution_count": 12, - "id": "analyzed-beaver", + "id": "agreed-diesel", "metadata": {}, "outputs": [], "source": [ @@ -2796,7 +2796,7 @@ }, { "cell_type": "markdown", - "id": "elect-percentage", + "id": "interim-interstate", "metadata": {}, "source": [ "# Teasing\n", @@ -2807,7 +2807,7 @@ { "cell_type": "code", "execution_count": 13, - "id": "beneficial-coordinator", + "id": "animated-alert", "metadata": {}, "outputs": [ { @@ -2903,7 +2903,7 @@ { "cell_type": "code", "execution_count": null, - "id": "neither-popularity", + "id": "random-mediterranean", "metadata": {}, "outputs": [], "source": []