Commit 7bddf7fb authored by Etienne Kornobis

minor author/cosmetic edits

parent 21a2434b
......@@ -2,7 +2,26 @@
"cells": [
{
"cell_type": "markdown",
"id": "cultural-palestine",
"id": "functional-attraction",
"metadata": {},
"source": [
"# <center>**TP**</center>\n",
"\n",
"<div style=\"text-align:center\">\n",
" <img src=\"images/jupyter.png\" width=\"600px\">\n",
" <div>\n",
" Bertrand Néron, François Laurent, Etienne Kornobis\n",
" <br />\n",
" <a href=\"https://research.pasteur.fr/en/team/bioinformatics-and-biostatistics-hub/\">Bioinformatics and Biostatistics HUB</a>\n",
" <br />\n",
" © Institut Pasteur, 2021\n",
" </div> \n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "advance-vaccine",
"metadata": {},
"source": [
"# Introduction to JupyterLab\n",
......@@ -115,7 +134,7 @@
{
"cell_type": "code",
"execution_count": 13,
"id": "public-nightlife",
"id": "respective-prize",
"metadata": {},
"outputs": [
{
......@@ -140,7 +159,7 @@
},
{
"cell_type": "markdown",
"id": "marine-arctic",
"id": "pharmaceutical-college",
"metadata": {},
"source": [
"- The exclamation mark character ``!`` can also be used to execute the rest of the line in a bash subprocess. For example:"
......@@ -149,7 +168,7 @@
{
"cell_type": "code",
"execution_count": 21,
"id": "considerable-fleet",
"id": "drawn-soldier",
"metadata": {},
"outputs": [
{
......@@ -166,7 +185,7 @@
},
{
"cell_type": "markdown",
"id": "satellite-disposal",
"id": "natural-submission",
"metadata": {},
"source": [
"- `%timeit` can be used to check for execution times:"
......@@ -175,7 +194,7 @@
{
"cell_type": "code",
"execution_count": 18,
"id": "delayed-thunder",
"id": "under-embassy",
"metadata": {},
"outputs": [
{
......@@ -192,7 +211,7 @@
},
{
"cell_type": "markdown",
"id": "vocational-jacksonville",
"id": "constant-driving",
"metadata": {},
"source": [
"- Load additional extensions for the notebook. For example, `autoreload` is a useful extension that automatically reloads a module imported in a Jupyter notebook when the module has changed locally:"
......@@ -201,7 +220,7 @@
{
"cell_type": "code",
"execution_count": 22,
"id": "physical-steering",
"id": "waiting-credit",
"metadata": {},
"outputs": [
{
......@@ -220,7 +239,7 @@
},
{
"cell_type": "markdown",
"id": "regular-tiger",
"id": "laden-seeker",
"metadata": {},
"source": [
"# Exercises"
......@@ -228,7 +247,7 @@
},
{
"cell_type": "markdown",
"id": "rotary-bouquet",
"id": "fitted-insert",
"metadata": {},
"source": [
"The aim here is to get comfortable in JupyterLab.\n",
......@@ -245,14 +264,14 @@
{
"cell_type": "code",
"execution_count": null,
"id": "chinese-values",
"id": "international-thomson",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "useful-segment",
"id": "olympic-shoot",
"metadata": {},
"source": [
"## Exercise\n",
......@@ -268,14 +287,14 @@
{
"cell_type": "code",
"execution_count": null,
"id": "classical-extraction",
"id": "creative-conditioning",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "iraqi-wholesale",
"id": "bibliographic-concern",
"metadata": {},
"source": [
"## Exercise\n",
......@@ -293,14 +312,14 @@
{
"cell_type": "code",
"execution_count": null,
"id": "refined-relation",
"id": "ruled-bottle",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "manufactured-treatment",
"id": "verbal-field",
"metadata": {},
"source": [
"## Exercise\n",
......@@ -317,14 +336,14 @@
{
"cell_type": "code",
"execution_count": null,
"id": "featured-converter",
"id": "identified-calculation",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "constant-thriller",
"id": "unauthorized-carter",
"metadata": {},
"source": [
"## Exercise\n",
......@@ -335,14 +354,14 @@
{
"cell_type": "code",
"execution_count": null,
"id": "illegal-preserve",
"id": "abandoned-shareware",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "written-bidding",
"id": "molecular-census",
"metadata": {},
"source": [
"## Exercise\n",
......@@ -353,14 +372,14 @@
{
"cell_type": "code",
"execution_count": null,
"id": "waiting-concord",
"id": "overall-assurance",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "varying-providence",
"id": "static-array",
"metadata": {},
"source": [
"# More documentation\n",
......
%% Cell type:markdown id:cultural-palestine tags:
%% Cell type:markdown id:functional-attraction tags:
# <center>**TP**</center>
<div style="text-align:center">
<img src="images/jupyter.png" width="600px">
<div>
Bertrand Néron, François Laurent, Etienne Kornobis
<br />
<a href="https://research.pasteur.fr/en/team/bioinformatics-and-biostatistics-hub/">Bioinformatics and Biostatistics HUB</a>
<br />
© Institut Pasteur, 2021
</div>
</div>
%% Cell type:markdown id:advance-vaccine tags:
# Introduction to JupyterLab
## Aim of this section
- Whet your appetite on notebook technologies.
- Discover JupyterLab, get comfortable with it, and create notebooks.
## Install
After creating a folder for the course, use `venv` to create a virtual environment named, for example, `sp_env`:
```shell
python3 -m venv sp_env
```
This will create a folder `sp_env` in your working directory. The corresponding virtual environment can be activated with:
```shell
source sp_env/bin/activate
```
You are now in a virtual environment. You can install libraries in it using pip, and they will be installed only in this environment (not globally on your machine). For more on virtual environments, [see the documentation](https://docs.python.org/3/library/venv.html).
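To double-check from Python that the interpreter in use belongs to a virtual environment, a minimal sketch (the helper name `in_virtualenv` is illustrative and not part of the course material) can compare `sys.prefix` with `sys.base_prefix`:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points to the environment folder, while
    # sys.base_prefix still points to the Python installation it was built from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If this prints `False` after activation, the shell is probably still using a Python interpreter from outside `sp_env`.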
Once the virtual environment is activated, we can start populating it, beginning with JupyterLab:
```shell
pip install jupyterlab
```
You can now start the jupyter server as follows:
```shell
jupyter lab
```
Then open the specified URL in your web browser (Chrome and Firefox are
best supported). By default, the address is http://localhost:8888 and a browser tab should open on it automatically.
Once all your work is done, you can exit the virtual environment with:
```shell
deactivate
```
You will need to reactivate it (with `source sp_env/bin/activate`) in order to use it again.
For more exhaustive guidelines on JupyterLab installation, see [the official Jupyter documentation](https://jupyter.org/install).
## Basic functioning of Jupyter notebooks
### Using the interface
Jupyter notebooks are organized in **cells** which are executed
separately. Code execution is therefore not necessarily sequential, as it is
in classical scripts. The execution order is shown by the number between
square brackets `[]` to the left of each cell. **You
should be careful with cell execution order in Jupyter**.
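Why execution order matters can be sketched outside of Jupyter. The example below is only a simulation, not how a kernel is implemented: the hypothetical helper `run_cells` executes code snippets in a shared namespace, which stands in for the kernel's memory.

```python
def run_cells(cells, order):
    """Execute code snippets in the given order within one shared namespace."""
    namespace = {}  # plays the role of the kernel's memory
    for index in order:
        exec(cells[index], namespace)
    return namespace


cells = [
    "total = 10",         # cell 0
    "total = total + 5",  # cell 1
]

once = run_cells(cells, order=[0, 1])      # each cell run once, top to bottom
twice = run_cells(cells, order=[0, 1, 1])  # cell 1 executed a second time

print(once["total"])   # 15
print(twice["total"])  # 20
```

The same two cells give different results depending on how often and in which order they were run, which is why restarting the kernel and running all cells from the top is a good habit before sharing a notebook.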
Jupyter cells can be of various **types**:
- **Code**: actual code blocks of your notebooks (which will be interpreted).
- **Markdown**: To integrate explanations within your notebooks.
- **Raw**: Content that the notebook neither interprets nor renders.
- And more...
Jupyter uses **2 editing modes**:
- **Command mode** (``Esc``): To organize cells and browse the notebook.
Using keystrokes, you can:
| key | effect |
|-|-|
| ↑, ↓ | Move up and down in cells |
| a | Add cell above |
| b | Add cell below |
| dd | Delete cell |
You can also use drag and drop with your mouse to move cells or groups of cells around.
Cell type can be changed in command mode using the graphical interface or shortcuts:
|key|Switch cell to this mode|
|-|-|
|y|Code|
|m|Markdown|
|r|Raw|
Markdown is a lightweight markup language with plain text formatting syntax. To help you remember the markdown syntax and format your markdown cells, here is a [cheatsheet](https://www.markdownguide.org/cheat-sheet).
- **Edit mode** (`Enter/Return`): To edit the active cell. To then execute the cell, you have 2 options:
|keys|effect|
|-|-|
|`Ctrl + Enter`|Execute current cell|
|`Shift + Enter`|Execute current cell and move to the next cell|
### Jupyter magic commands
Jupyter provides extra functionality through **magic commands**, which can be added at the beginning of a code cell. Here is [an exhaustive list of Jupyter magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html).
Here are some examples of useful magic commands:
- Run cell with bash in subprocess:
%% Cell type:code id:public-nightlife tags:
%% Cell type:code id:respective-prize tags:
``` python
%%bash
echo "This is a bash script"
for i in {1..3}; do echo $i; done
echo "Over and out"
```
%%%% Output: stream
This is a bash script
1
2
3
Over and out
%% Cell type:markdown id:marine-arctic tags:
%% Cell type:markdown id:pharmaceutical-college tags:
- The exclamation mark character ``!`` can also be used to execute the rest of the line in a bash subprocess. For example:
%% Cell type:code id:considerable-fleet tags:
%% Cell type:code id:drawn-soldier tags:
``` python
! echo "This is executed in a bash subprocess"
```
%%%% Output: stream
This is executed in a bash subprocess
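Outside of a notebook, where `!` is not available, a comparable effect can be sketched with the standard `subprocess` module. In this illustration the child process is the current Python interpreter rather than bash, an assumption made only so the example does not depend on a particular shell being installed:

```python
import subprocess
import sys

# Run a child process and capture what it prints, similar in spirit to `!`.
result = subprocess.run(
    [sys.executable, "-c", "print('This is executed in a subprocess')"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode the captured bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.stdout.strip())
```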
%% Cell type:markdown id:satellite-disposal tags:
%% Cell type:markdown id:natural-submission tags:
- `%timeit` can be used to check for execution times:
%% Cell type:code id:delayed-thunder tags:
%% Cell type:code id:under-embassy tags:
``` python
%timeit for _ in range(1000): True
```
%%%% Output: stream
14.1 µs ± 436 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
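The `%timeit` magic builds on Python's standard `timeit` module, so a similar measurement can be sketched in a plain script. The repeat and loop counts below simply mirror the output shown above; the variable names are illustrative:

```python
import statistics
import timeit

LOOPS = 10_000  # executions of the statement per run
RUNS = 7        # number of independent runs

# Each entry is the total time, in seconds, taken by LOOPS executions.
timings = timeit.repeat(
    stmt="for _ in range(1000): True",
    repeat=RUNS,
    number=LOOPS,
)

per_loop = [total / LOOPS for total in timings]
mean_us = statistics.mean(per_loop) * 1e6
std_us = statistics.stdev(per_loop) * 1e6
print(f"{mean_us:.1f} µs ± {std_us:.2f} µs per loop "
      f"(mean ± std. dev. of {RUNS} runs, {LOOPS} loops each)")
```

The numbers printed depend on the machine and will differ from the notebook output.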
%% Cell type:markdown id:vocational-jacksonville tags:
%% Cell type:markdown id:constant-driving tags:
- Load additional extensions for the notebook. For example, `autoreload` is a useful extension that automatically reloads a module imported in a Jupyter notebook when the module has changed locally:
%% Cell type:code id:physical-steering tags:
%% Cell type:code id:waiting-credit tags:
``` python
%load_ext autoreload
%autoreload 2
```
%%%% Output: stream
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
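To see the problem that `autoreload` addresses, the sketch below reloads a module by hand with the standard `importlib.reload`. It illustrates the underlying idea and is not how the extension is implemented; the module name `demo_settings` is hypothetical and created in a temporary folder just for the demonstration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_settings.py"  # hypothetical module
    module_path.write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make the new file visible to the import system
    demo_settings = importlib.import_module("demo_settings")
    first = demo_settings.VALUE

    # Edit the module on disk: the already imported module is unchanged...
    module_path.write_text("VALUE = 42  # edited\n")
    stale = demo_settings.VALUE

    # ...until it is reloaded explicitly.
    importlib.reload(demo_settings)
    fresh = demo_settings.VALUE

    sys.path.remove(tmp)

print(first, stale, fresh)  # 1 1 42
```

With `%autoreload 2` active, the explicit reload step is handled for you before each cell execution.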
%% Cell type:markdown id:regular-tiger tags:
%% Cell type:markdown id:laden-seeker tags:
# Exercises
%% Cell type:markdown id:rotary-bouquet tags:
%% Cell type:markdown id:fitted-insert tags:
The aim here is to get comfortable in JupyterLab.
## Exercise
- Start a JupyterLab server.
- Create a new notebook with a Python 3 kernel.
- Create, delete and move cells around using shortcuts and the graphical interface.
NB: A kernel provides support for a programming language in Jupyter. Kernels are available for Python, R, Julia, and many more.
%% Cell type:code id:chinese-values tags:
%% Cell type:code id:international-thomson tags:
``` python
```
%% Cell type:markdown id:useful-segment tags:
%% Cell type:markdown id:olympic-shoot tags:
## Exercise
In the notebook, create a code cell containing simple Python code with a
``print`` call, execute the cell and observe its output.
For example::
print("Hello World !")
%% Cell type:code id:classical-extraction tags:
%% Cell type:code id:creative-conditioning tags:
``` python
```
%% Cell type:markdown id:iraqi-wholesale tags:
%% Cell type:markdown id:bibliographic-concern tags:
## Exercise
In the notebook, create a markdown cell with:
- A Header
- Bold text
- A list
- A link to the Jupyter documentation, i.e. https://jupyter.org/documentation
Render (execute) the cell to display its formatted content.
%% Cell type:code id:refined-relation tags:
%% Cell type:code id:ruled-bottle tags:
``` python
```
%% Cell type:markdown id:manufactured-treatment tags:
%% Cell type:markdown id:verbal-field tags:
## Exercise
Grasp the concept of cell execution by creating three cells:
- 1 cell defining a variable with a simple value. (e.g. `myvar=12`)
- 1 cell defining the same variable with a different value from the previous cell (e.g. `myvar=42`)
- 1 cell printing the value of the variable (`print(myvar)`).
Observe how the execution order of your cells affects the output of the cell that prints the variable. This is a potential source of errors in notebooks and has to be kept in mind both when writing and when running them.
%% Cell type:code id:featured-converter tags:
%% Cell type:code id:identified-calculation tags:
``` python
```
%% Cell type:markdown id:constant-thriller tags:
%% Cell type:markdown id:unauthorized-carter tags:
## Exercise
Using a Jupyter magic command, create a cell listing the files in the current directory using a bash subprocess.
%% Cell type:code id:illegal-preserve tags:
%% Cell type:code id:abandoned-shareware tags:
``` python
```
%% Cell type:markdown id:written-bidding tags:
%% Cell type:markdown id:molecular-census tags:
## Exercise
Using the graphical interface, export your notebook as an HTML file.
%% Cell type:code id:waiting-concord tags:
%% Cell type:code id:overall-assurance tags:
``` python
```
%% Cell type:markdown id:varying-providence tags:
%% Cell type:markdown id:static-array tags:
# More documentation
JupyterLab: https://jupyterlab.readthedocs.io/en/latest/
## A note about Extensions
JupyterLab is highly extensible. Many extensions are developed by the Jupyter community and let you tune your Jupyter configuration. Here are a couple of examples:
- Visualize and fold your code according to the table of contents of your notebook: toc
- Import/Export your notebook as simple script/markdown files: jupytext.
- Deal with your conda environments in JupyterLab: nbconda
You can discover and install many more extensions using the Extension Manager in the JupyterLab interface.
......
......@@ -2,14 +2,14 @@
"cells": [
{
"cell_type": "markdown",
"id": "right-artwork",
"id": "integral-thermal",
"metadata": {},
"source": [
"# <center>**TP**</center>\n",
"\n",
"<img src=\"./images/pandas_logo.svg\">\n",
"<div style=\"text-align:center\">\n",
" Bertrand Néron\n",
" Bertrand Néron, François Laurent, Etienne Kornobis\n",
" <br />\n",
" <a href=\"https://research.pasteur.fr/en/team/bioinformatics-and-biostatistics-hub/\">Bioinformatics and Biostatistics HUB</a>\n",
" <br />\n",
......@@ -19,7 +19,7 @@
},
{
"cell_type": "markdown",
"id": "sacred-breathing",
"id": "trained-fighter",
"metadata": {},
"source": [
"# Exploring Blast results"
......@@ -27,7 +27,7 @@
},
{
"cell_type": "markdown",
"id": "technical-crystal",
"id": "manufactured-cursor",
"metadata": {},
"source": [
"- Import the file data/blast.txt into a pandas dataframe variable (named `blast_res`). Verify that its type is a pandas\n",
......@@ -40,7 +40,7 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "recreational-seller",
"id": "musical-violence",
"metadata": {},
"outputs": [],
"source": [
......@@ -50,7 +50,7 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "major-dream",
"id": "loaded-transfer",
"metadata": {},
"outputs": [],
"source": [
......@@ -61,7 +61,7 @@
{
"cell_type": "code",
"execution_count": 5,
"id": "parliamentary-heaven",
"id": "streaming-regulation",
"metadata": {},
"outputs": [
{
......@@ -82,7 +82,7 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "changing-drive",
"id": "unsigned-coast",
"metadata": {},
"outputs": [
{
......@@ -332,7 +332,7 @@
},
{
"cell_type": "markdown",
"id": "productive-chorus",
"id": "dominant-knowing",
"metadata": {},
"source": [
"Explore ``blast_res`` dataframe:\n",
......@@ -346,7 +346,7 @@
{
"cell_type": "code",
"execution_count": 7,
"id": "yellow-matthew",
"id": "simplified-progress",
"metadata": {},
"outputs": [
{
......@@ -492,7 +492,7 @@
{
"cell_type": "code",
"execution_count": 8,
"id": "handled-details",
"id": "narrow-smell",
"metadata": {},
"outputs": [
{
......@@ -689,7 +689,7 @@
{
"cell_type": "code",
"execution_count": 9,
"id": "virgin-forestry",
"id": "identical-guest",
"metadata": {},
"outputs": [
{
......@@ -868,7 +868,7 @@
{
"cell_type": "code",
"execution_count": 10,
"id": "superb-papua",
"id": "alpine-cleveland",
"metadata": {},
"outputs": [
{
......@@ -888,7 +888,7 @@
},
{
"cell_type": "markdown",
"id": "fourth-pennsylvania",
"id": "imposed-squad",
"metadata": {},
"source": [
"- Extract the 3rd row from the ``blast_res`` dataframe. Which type of data structure is returned by this extraction?"
......@@ -897,7 +897,7 @@
{
"cell_type": "code",
"execution_count": 11,
"id": "binding-interest",
"id": "complicated-football",
"metadata": {},
"outputs": [
{
......@@ -930,7 +930,7 @@
{
"cell_type": "code",
"execution_count": 12,
"id": "careful-dining",
"id": "administrative-biodiversity",
"metadata": {},
"outputs": [
{
......@@ -950,7 +950,7 @@
},
{
"cell_type": "markdown",
"id": "common-sixth",
"id": "equipped-amendment",
"metadata": {},
"source": [
"- Extract the *sseqid* column from the ``blast_res`` dataframe. "
......@@ -959,7 +959,7 @@
{
"cell_type": "code",
"execution_count": 20,
"id": "located-waters",
"id": "seasonal-europe",
"metadata": {},
"outputs": [
{
......@@ -994,7 +994,7 @@
},
{
"cell_type": "markdown",
"id": "searching-coach",
"id": "major-leave",
"metadata": {},
"source": [
"- Get the minimum and maximum values of the *evalue* column."
......@@ -1003,7 +1003,7 @@
{
"cell_type": "code",
"execution_count": 21,
"id": "square-airplane",
"id": "varied-influence",
"metadata": {},
"outputs": [
{
......@@ -1024,7 +1024,7 @@
{
"cell_type": "code",
"execution_count": 22,
"id": "innovative-audio",
"id": "little-recipient",
"metadata": {},
"outputs": [
{
......@@ -1044,7 +1044,7 @@
},
{
"cell_type": "markdown",
"id": "broad-password",
"id": "sitting-blackberry",
"metadata": {},
"source": [
"- Get the median and the mean of the *bitscore* column."
......@@ -1053,7 +1053,7 @@
{
"cell_type": "code",
"execution_count": 23,
"id": "tamil-aggregate",
"id": "polyphonic-retro",
"metadata": {},
"outputs": [
{
......@@ -1074,7 +1074,7 @@
{
"cell_type": "code",
"execution_count": 24,
"id": "sitting-metallic",
"id": "advisory-symphony",
"metadata": {},
"outputs": [
{
......@@ -1094,7 +1094,7 @@
},
{
"cell_type": "markdown",
"id": "excessive-tournament",
"id": "friendly-extra",
"metadata": {},
"source": [
"- Keep all hits with a percentage of identity (*pident*) greater than 75%."
......@@ -1103,7 +1103,7 @@
{
"cell_type": "code",
"execution_count": 25,
"id": "duplicate-ghana",
"id": "rough-globe",
"metadata": {},
"outputs": [
{
......@@ -1266,7 +1266,7 @@
{
"cell_type": "code",
"execution_count": 35,
"id": "developing-browser",
"id": "novel-turkey",
"metadata": {},
"outputs": [
{
......@@ -1429,7 +1429,7 @@
},
{
"cell_type": "markdown",
"id": "nonprofit-fitting",
"id": "several-light",
"metadata": {},
"source": [
"- Based on the bitscore alone, extract only the best hit(s) (i.e. those with the highest bitscore)."
......@@ -1438,7 +1438,7 @@
{
"cell_type": "code",
"execution_count": 26,
"id": "chronic-wallace",
"id": "arbitrary-style",
"metadata": {},
"outputs": [
{
......@@ -1518,7 +1518,7 @@
},
{
"cell_type": "markdown",
"id": "saving-homeless",
"id": "heated-poultry",
"metadata": {},
"source": [
"- Keep all hits that correspond to human sequences in the database (*sseqid*)."
......@@ -1527,7 +1527,7 @@
{
"cell_type": "code",
"execution_count": 28,
"id": "western-language",
"id": "failing-crossing",
"metadata": {},
"outputs": [
{
......@@ -1776,7 +1776,7 @@
{
"cell_type": "code",
"execution_count": 29,
"id": "taken-palmer",
"id": "trained-durham",
"metadata": {},
"outputs": [
{
......@@ -2025,7 +2025,7 @@
{
"cell_type": "code",
"execution_count": 38,
"id": "tracked-reform",
"id": "structural-hybrid",
"metadata": {},
"outputs": [
{
......@@ -2153,7 +2153,7 @@
},
{
"cell_type": "markdown",
"id": "reliable-dream",
"id": "incorporate-interface",
"metadata": {},
"source": [
"- Plot a histogram of the bitscores. "
......@@ -2162,7 +2162,7 @@
{
"cell_type": "code",
"execution_count": 32,
"id": "suspected-substance",
"id": "liable-wheat",
"metadata": {},
"outputs": [
{
......@@ -2194,7 +2194,7 @@
},
{
"cell_type": "markdown",
"id": "attempted-development",
"id": "reflected-intervention",
"metadata": {},
"source": [
"- Plot a barplot of the number of hits per species (the species is the code that follows the \"_\" in the *sseqid* column)"
......@@ -2203,7 +2203,7 @@
{
"cell_type": "code",
"execution_count": 60,
"id": "swiss-provider",
"id": "normal-glenn",
"metadata": {},
"outputs": [
{
......@@ -2323,7 +2323,7 @@
{
"cell_type": "code",
"execution_count": 61,
"id": "russian-mystery",
"id": "arranged-intervention",
"metadata": {},
"outputs": [
{
......@@ -2356,7 +2356,7 @@
},
{
"cell_type": "markdown",
"id": "reliable-shark",