diff --git a/source/Good_Practices/conda.rst b/source/Good_Practices/conda.rst
index 2d0bfe2c4894ecf61982f70ce814d82ae876759e..5cb2abbb9f438e844bb2334d14829df21cd6fe00 100644
--- a/source/Good_Practices/conda.rst
+++ b/source/Good_Practices/conda.rst
@@ -24,7 +24,7 @@ Miniconda will create a conda environment with the **conda** command line interf
 
 For Linux users, you should see a section like this one:
 
-.. image:: ../_static/conda/miniconda.png
+.. image:: ../_static/good_practices/conda_miniconda.png
 
 .. note::
 
diff --git a/source/Good_Practices/containers.rst b/source/Good_Practices/containers.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e9a508ad43080d60c15bb48e2e3b968046b6c877
--- /dev/null
+++ b/source/Good_Practices/containers.rst
@@ -0,0 +1,530 @@
+.. Pasteur Network Course: Bioinformatics for SARS-CoV-2 sequence analysis
+
+.. role:: red
+
+.. _containers:
+
+==========
+Containers
+==========
+
+Several container solutions exist; we will see only two of them: Docker and Apptainer.
+
+Containers use Linux kernel primitives, so they can work only on Linux.
+On other systems they run inside WSL or a Linux virtual machine, even if this aspect is hidden from the user (Docker on Windows/Mac).
+
+
+Architecture
+============
+
+A few reminders about OS/application architecture.
+
+Architecture of Linux bare metal
+--------------------------------
+
+.. image:: ../_static/good_practices/container_bar_metal_arch.png
+   :width: 300px
+
+The applications communicate
+
+* with the root file system
+* with the Linux kernel
+* with the physical hardware (through the Linux kernel)
+
+Architecture of virtual machine
+-------------------------------
+
+.. image:: ../_static/good_practices/container_vm_arch.png
+   :width: 400px
+
+In a virtual machine, a complete machine is emulated:
+
+* The physical layer
+* The kernel
+* The file system
+
+The application cannot communicate with the host.
+The hypervisor translates all commands to the host's native layers.
+
+
+Architecture of Docker container
+--------------------------------
+
+.. image:: ../_static/good_practices/container_docker_arch.png
+   :width: 600px
+
+In Docker, the applications
+
+* are isolated from the other native applications
+* the file system is also isolated from the bare metal file system
+  (unless we explicitly mount some parts inside the container)
+* inside the container, commands :red:`are executed as **ROOT**`
+
+
+Architecture of Apptainer container
+-----------------------------------
+
+.. image:: ../_static/good_practices/container_apptainer_arch.png
+   :width: 700px
+
+In Apptainer, the applications
+
+* are isolated from the other native applications
+* some parts of the file system (/tmp, $HOME) are accessible inside the containers
+* the applications are executed under the identity of the user who runs the container
+* it can also easily access the physical layer, for instance the video card (useful for GPU computation)
+
+
+Architecture comparison
+-------------------------
+
+.. image:: ../_static/good_practices/container_compare_arch.png
+   :width: 600px
+
+
+Image vs Container
+------------------
+
+Whatever the container solution used, there is one main concept to understand.
+
+.. graphviz::
+
+   digraph builtin_exception_hierarchy {
+      rankdir="LR";
+      graph[fontsize = 8];
+      node[fontsize = 12];
+      edge[dir=forward];
+      "Recipe" -> "Image" [label="build"]
+      "Image"-> "Container" [label="exec / run"];
+   }
+
+
+#. We write a recipe (a text file) which describes how to build an image.
+#. We build an image with the command `build` and the recipe.
+
+   An image is a kind of template.
+
+#. The command `run`/`exec` instantiates a container and executes code.
+
+   * Each time you run the command `run`/`exec` you create a new container, even if you use the same image.
+   * Several containers can exist from one image.
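
The image/container distinction can be sketched with the public `hello-world` image. This is only a minimal sketch, assuming Docker is installed: the `run` helper and the `DRY_RUN` variable are hypothetical conveniences of this example, not part of Docker. With `DRY_RUN=1` (the default here) the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: one image, several containers.
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 on a machine where Docker is installed to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper (not a Docker command): print or execute a command line.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Each `docker run` instantiates a new container from the same image.
run docker run hello-world
run docker run hello-world

# List the containers (running or exited) created from that image.
run docker ps --all --filter ancestor=hello-world
```

With `DRY_RUN=0`, the last command should list at least two containers with different IDs and names, all created from the same image (depending on your installation, you may need `sudo`).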
+
+Container vs Environment
+========================
+
+A virtual environment is a tool that helps to keep the dependencies required by different projects separated.
+
+| **Virtualenv** is a virtual environment with a packaging system. But it is OS dependent and specific to the Python language.
+| **Conda** is also a virtual environment and a packaging system. Unlike virtualenv, it is language agnostic. But it is also OS dependent.
+
+So you cannot share these environments between different hosts (different operating systems, different versions of the system libraries).
+
+
+
+
+Docker
+======
+
+What are “traditional” containers designed for?
+------------------------------------------------
+
+.. image:: ../_static/good_practices/container_boat_old.png
+
+
+Once upon a time, the administrator installed
+applications on a physical machine.
+The administrator had to make the different applications coexist. Each of these apps had dependencies...
+Sometimes these were not compatible.
+
+.. image:: ../_static/good_practices/container_boat.png
+
+
+Here come the containers.
+
+Each application is in a container with its specific dependencies.
+From the administrator's point of view, whatever is inside these containers, they just have to run them side by side.
+
+Docker main usages
+""""""""""""""""""
+
+Mainly used for encapsulating, deploying and running large web applications.
+
+
+.. figure:: ../_static/good_practices/container_docker_main_usage.png
+   :width: 500px
+
+   Extract from the course:
+   `Good practices for reproducible bioinformatics data analysis <https://reproducibility.pages.pasteur.fr/teaching/pasteur_formation_repro_2022/>`_
+
+
+Docker usage in science
+"""""""""""""""""""""""
+
+We will use it for encapsulating and running bioinformatics tools.
+
+
+.. figure:: ../_static/good_practices/container_docker_workflow.png
+   :width: 700px
+
+   Extract from the course:
+   `Good practices for reproducible bioinformatics data analysis <https://reproducibility.pages.pasteur.fr/teaching/pasteur_formation_repro_2022/>`_
+
+
+
+
+Docker in a nutshell
+--------------------
+
+Installation
+""""""""""""
+
+Set up the repository
+'''''''''''''''''''''
+
+Update the apt package index and install packages to allow apt to use a repository over HTTPS:
+
+:code:`sudo apt-get update`
+
+Install some dependencies:
+
+.. code-block::
+
+    sudo apt-get install \
+        ca-certificates \
+        curl \
+        gnupg
+
+
+Add Docker’s official GPG key:
+''''''''''''''''''''''''''''''
+
+.. code-block::
+
+    sudo mkdir -p /etc/apt/keyrings
+
+    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
+
+
+Use the following command to set up the repository:
+
+.. code-block::
+
+    echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
+    $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+
+Install Docker Engine
+'''''''''''''''''''''
+
+Update the apt package index, and install the latest version of Docker Engine:
+
+.. code-block::
+
+    sudo apt-get update
+
+    sudo apt-get install docker-ce docker-ce-cli
+
+
+Verify that Docker Engine is installed correctly by running the hello-world image.
+
+:code:`sudo docker run hello-world`
+
+
+Run Docker images
+"""""""""""""""""
+
+Unfortunately, we will not learn to build Docker images from scratch (it is beyond the scope of this course).
+But we are going to learn to run existing images.
+
+To get an image from Docker Hub:
+
+explore Docker Hub https://hub.docker.com/search?q=
+
+Exercise
+""""""""
+
+For instance, search for *bwa*:
+
+* choose evolbioinfo/bwa
+* click on `tags`
+* choose the v0.7.17 tag and pull the image
+
+
+Check your local images
+'''''''''''''''''''''''
+
+.. code-block::
+
+    $ docker images
+    REPOSITORY               TAG   IMAGE ID       CREATED      SIZE
+    gempasteur/macsyfinder   2.0   355dfb3aa017   8 days ago   438MB
+
+
+To run an image: :code:`docker run <image_name:tag> <arguments specific to the image>`
+
+.. code-block::
+
+    $ docker run gempasteur/macsyfinder:2.0 macsyfinder --version
+    Macsyfinder 2.0
+    using:
+    - Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
+    - NetworkX 2.8.6
+    - Pandas 1.4.3
+
+    MacsyFinder is distributed under the terms of the GNU General Public License (GPLv3).
+    See the COPYING file for details.
+
+    If you use this software please cite:
+    Abby SS, Néron B, Ménager H, Touchon M, Rocha EPC (2014)
+    MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems.
+    PLoS ONE 9(10): e110726. doi:10.1371/journal.pone.0110726
+
+    and don't forget to cite models used:
+    macsydata cite <model>
+
+
+Now it's time to run MacSyFinder on real data
+''''''''''''''''''''''''''''''''''''''''''''''
+
+The problem with Docker, if you remember the architecture, is that the file system in the container is totally isolated from the host file system.
+Our input data are on the host file system, and we also want the results on the host file system.
+So we need a way for the Docker container to share a part of its file system with the host.
+
+We can do this by binding a directory of the host file system inside the container file system.
+The syntax is :code:`-v dir_on_host:dir_on_container`
+
+For instance:
+
+:code:`docker run -v $PWD:/home/msf gempasteur/macsyfinder:2.0 <macsyfinder args and option>`
+
+.. note::
+
+    The `-v` option must be placed after the `run` subcommand and before the image name.
+    Both paths must be absolute.
+
+So with the line above I bind the current directory ($PWD) to /home/msf in the container.
+
+Suppose $PWD is /home/login/Project/msf_analyse and this directory contains a file genome.fasta.
+This file will be accessible inside the container at the following path: /home/msf/genome.fasta
+
+So the command line will be
+
+:code:`docker run -v $PWD:/home/msf gempasteur/macsyfinder:2.0 macsyfinder --sequence-db /home/msf/genome.fasta`
+
+The other problem is that inside the Docker container the user is `msf` (see the documentation), but the `msf` user does not exist on the host.
+So it will not have access to the bound directory. We have to tell Docker to run the container with my user and group IDs from the host.
+
+This is done with the `-u` option: :code:`-u $(id -u ${USER}):$(id -g ${USER})`
+
+So the complete command line will be:
+
+.. code-block::
+
+    docker run -v $PWD:/home/msf -u $(id -u ${USER}):$(id -g ${USER}) gempasteur/macsyfinder:2.0 macsyfinder --sequence-db /home/msf/genome.fasta ...
+
+
+
+Exercise
+""""""""
+
+Run bwa on real data
+
+* Two compressed fastq files containing paired-end reads (download the reads here: https://www.ebi.ac.uk/ena/browser/view/SRX9443330?show=reads)
+* The reference genome to map the reads against (download the reference here: https://www.ncbi.nlm.nih.gov/nuccore/MN908947 and choose the fasta format)
+
+**Note** that the user in evolbioinfo/bwa is **root**.
+
+#. index the reference with the command :code:`bwa index reference.fa` (translate this command to use the evolbioinfo/bwa Docker image)
+#. map the reads against the reference genome with :code:`bwa mem -t 1 reference.fa reads1.fq reads2.fq > tmp.sam` (translate this command to use the evolbioinfo/bwa Docker image)
+
+
+Container and Image management
+""""""""""""""""""""""""""""""
+
+Now you have an image and containers.
+
+You can monitor containers with :code:`docker ps`
+
+Without arguments, :code:`docker ps` displays only running containers.
+To see all containers, running and exited ones, add the option `-a` or `--all`.
+
+You can also use :code:`docker container ls -a`
+
+.. code-block::
+
+    $ docker ps --all
+    CONTAINER ID   IMAGE                        COMMAND                  CREATED         STATUS                       PORTS   NAMES
+    8eedc25d84d2   gempasteur/macsyfinder:2.0   "/usr/local/bin/macs…"   4 minutes ago   Exited (0) 4 minutes ago             relaxed_wiles
+    2d3f3a3632cb   gempasteur/macsyfinder:2.0   "/usr/local/bin/macs…"   4 minutes ago   Exited (0) 4 minutes ago             pedantic_jackson
+    6a75049dd213   gempasteur/macsyfinder:2.0   "/usr/local/bin/macs…"   4 minutes ago   Exited (127) 4 minutes ago           reverent_rubin
+    80a2a6206c21   continuumio/miniconda3       "/bin/bash"              5 days ago      Exited (0) 5 days ago                determined_curie
+    97854079d0f4   continuumio/miniconda3       "/bin/bash"              5 days ago      Exited (0) 5 days ago                nifty_pascal
+    a6f74f6e370a   continuumio/miniconda3       "/bin/bash"              5 days ago      Exited (1) 5 days ago                laughing_nightingale
+    eb3119dc48b7   satellite_finder:0.9         "/usr/local/lib/sate…"   7 days ago      Exited (0) 7 days ago                strange_turing
+    4b08009c87c8   satellite_finder:0.9         "/usr/local/lib/sate…"   7 days ago      Exited (1) 7 days ago                beautiful_hertz
+    3c3501f0497c   22849e5c9dbf                 "/usr/local/lib/sate…"   7 days ago      Exited (0) 7 days ago                angry_kowalevski
+    ...
+
+Beware, all these containers can take a lot of disk space. :red:`You have to clean your local repository`
+
+:code:`docker rm <container ID> ...`
+
+It can be tedious to remove containers one by one, or to copy/paste all the container IDs.
+
+:code:`docker rm $(docker ps --all -q -f status=exited)`
+
+will remove all exited containers.
+
+
+To remove a Docker image, use :code:`docker rmi <IMAGE ID>`
+
+There is another command to remove stopped containers **and** dangling images:
+
+:code:`docker system prune`
+
+.. code-block::
+
+    $ docker system prune
+    WARNING! This will remove:
+      - all stopped containers
+      - all networks not used by at least one container
+      - all dangling images
+      - all dangling build cache
+
+    Are you sure you want to continue? [y/N] y
+    Deleted Containers:
+    0af5663b31dedea08c93c32177122ac46ab680b0084ee77cf2b01bb23a27df22
+
+    Deleted Images:
+    deleted: sha256:22849e5c9dbffcdce6f07d674caf9ff86b4fa8eeacd368009e40a2a9dc93ccaf
+    deleted: sha256:331c1b7705749e03da8308b9df0bde3b4e27f100ce1769abf5d70e2b1dba2ded
+    deleted: sha256:b2c7583232c0c7f9ec6b80b5d460f4744bfbe134acfa29854323b3e5b2a8b165
+    deleted: sha256:8ce0511142024004772071aba48fedac943b6a1df0ca0f830bdd96df5d7d858f
+
+    Deleted build cache objects:
+    594cygos858w6n0ysg5is6trt
+    mkopfhw6p50897z3d3bhtgog9
+    w1tipmo21sin5hkhjx5e9m5ml
+    tig11nb3qp7kfihiwjhvjedmb
+    pw4yq9zmf509o8zukc70oyl4z
+    suedvqgi2gc0xax6t54jlmw01
+    ...
+
+For all :code:`docker system prune` options, see https://docs.docker.com/engine/reference/commandline/system_prune/
+
+.. warning::
+
+    By default you execute code as root in a Docker container, so if you mount a file system in your Docker container,
+    you can access it as root. This is a real security issue, and it is why Docker is usually forbidden on clusters.
+    It is also dangerous to run an image you know nothing about.
+
+    Do not execute a Docker image if you do not :red:`TRUST` the builder or if you do not have the Dockerfile.
+    So do not blindly trust every source on Docker Hub.
+
+
+Apptainer
+=========
+
+Why do we need containers in science?
+-------------------------------------
+
+.. image:: ../_static/good_practices/container_boat_failed.png
+
+The problems with these traditional containers are:
+
+* By default they are executed as **root**
+* By default they do not communicate with the host file system, so it is hard to share data between containers
+
+.. image:: ../_static/good_practices/container_pirates.png
+
+* We do not need to run tons of containers at the same time.
+* We still need containers to ship the application with its dependencies.
+* We need to run them as a “regular” user.
+* We need to easily share data between the container and the host file system.
+
+
+Apptainer overview
+------------------
+
+.. image:: ../_static/good_practices/container_apptainer_build_vs_run.png
+
+Apptainer workflow.
+
+We have to consider two steps, in two environments:
+
+* **build**: for most build operations we need to be **root**
+* **execution**: any user is able to run Apptainer images
+
+
+Apptainer in a nutshell
+-----------------------
+
+Installation
+""""""""""""
+
+Visit the releases page https://github.com/apptainer/apptainer/releases
+
+
+.. image:: ../_static/good_practices/container_apptainer_releases.png
+
+
+For CentOS/RedHat guys (from .rpm)
+''''''''''''''''''''''''''''''''''
+
+Download the `.rpm` from a release on the GitHub repository
+
+.. code-block::
+
+    sudo dnf install ./package.rpm
+    # or
+    sudo yum localinstall apptainer-1.0.2-1.x86_64.rpm
+
+For Debian/Ubuntu guys (from .deb)
+''''''''''''''''''''''''''''''''''
+
+Download the `.deb` from a release on the GitHub repository
+
+.. code-block::
+
+    wget https://github.com/apptainer/apptainer/releases/download/v1.0.3/apptainer_1.0.3_amd64.deb
+
+    sudo dpkg -i apptainer_1.0.3_amd64.deb
+
+For Gentoo guys
+'''''''''''''''
+
+.. code-block::
+
+    sudo emerge apptainer
+
+For the other guys (mac, windows, ...)
+''''''''''''''''''''''''''''''''''''''
+
+You have to install from source; see https://apptainer.org/docs/user/main/quick_start.html#quick-start
+
+
+
+Exercise
+""""""""
+
+#. build an Apptainer `bwa` image from the evolbioinfo/bwa:v0.7.17 Docker image
+#. do the same operations as with Docker (bwa index and bwa mapping) but with the Apptainer image you just built.
+
+
+.. warning::
+
+    By default, Apptainer mounts your HOME and /tmp in the container. It does the same during the build phase.
+    This means that it has access to ~/.cache or ~/.local/cache.
+    So it will use or put data in this cache, especially when you use pip, conda and so on. This is not what we want if we aim
+    to be reproducible.
+
+    So if you build/install packages with pip, set the environment variable
+    :code:`PYTHONNOUSERSITE=1`
+    and :code:`export PYTHONNOUSERSITE`
+    in your recipe, at build and run time.
+
+    If PYTHONNOUSERSITE is set, Python won’t add the user site-packages directory to sys.path.
+    https://docs.python.org/3/using/cmdline.html#envvar-PYTHONNOUSERSITE
+
+    And/or use the `pip` option `--no-cache-dir` in your recipes.
+
+
diff --git a/source/Good_Practices/index.rst b/source/Good_Practices/index.rst
index 270c0bd0b917af44b80ed30608da11fadd9530a8..3f77990451d83ee0b01cd3809aff8f1fdcf29d6b 100644
--- a/source/Good_Practices/index.rst
+++ b/source/Good_Practices/index.rst
@@ -11,3 +11,4 @@ Good Practices
 
    cookiecutter
    conda
+   containers
\ No newline at end of file
diff --git a/source/_static/good_practices/container_apptainer_arch.png b/source/_static/good_practices/container_apptainer_arch.png
new file mode 100644
index 0000000000000000000000000000000000000000..c7b78ba35142bdc9ddeafe70954012ee5e9ce94a
Binary files /dev/null and b/source/_static/good_practices/container_apptainer_arch.png differ
diff --git a/source/_static/good_practices/container_apptainer_build.png b/source/_static/good_practices/container_apptainer_build.png
new file mode 100644
index 0000000000000000000000000000000000000000..2d4011e425eb5ff120dd1f6315b651edb945ddcf
Binary files /dev/null and b/source/_static/good_practices/container_apptainer_build.png differ
diff --git a/source/_static/good_practices/container_apptainer_build_vs_run.png b/source/_static/good_practices/container_apptainer_build_vs_run.png
new file mode 100644
index 0000000000000000000000000000000000000000..6914fe9a8517cebb01aa3451c2c441a03eb3013b
Binary files /dev/null and b/source/_static/good_practices/container_apptainer_build_vs_run.png differ
diff --git a/source/_static/good_practices/container_apptainer_mount.png b/source/_static/good_practices/container_apptainer_mount.png
new file mode 100644
index 0000000000000000000000000000000000000000..352e02fe5e0c99bde0f70a6133c2b2b58d4ac753
Binary files /dev/null and b/source/_static/good_practices/container_apptainer_mount.png differ
diff --git a/source/_static/good_practices/container_apptainer_releases.png b/source/_static/good_practices/container_apptainer_releases.png
new file mode 100644
index 0000000000000000000000000000000000000000..b755bcef50ded94961fba3da831cc703385078d9
Binary files /dev/null and b/source/_static/good_practices/container_apptainer_releases.png differ
diff --git a/source/_static/good_practices/container_apptainer_time_line.png b/source/_static/good_practices/container_apptainer_time_line.png
new file mode 100644
index 0000000000000000000000000000000000000000..d0ded32ac58f50c8b35a916d3ad9dad39bef66a8
Binary files /dev/null and b/source/_static/good_practices/container_apptainer_time_line.png differ
diff --git a/source/_static/good_practices/container_bar_metal_arch.png b/source/_static/good_practices/container_bar_metal_arch.png
new file mode 100644
index 0000000000000000000000000000000000000000..f951ff3e07a83a238440a8314616f2f139d37f86
Binary files /dev/null and b/source/_static/good_practices/container_bar_metal_arch.png differ
diff --git a/source/_static/good_practices/container_boat.png b/source/_static/good_practices/container_boat.png
new file mode 100644
index 0000000000000000000000000000000000000000..b0ee3973230c15bf10546c6b7c4dbd5e0b5764a0
Binary files /dev/null and b/source/_static/good_practices/container_boat.png differ
diff --git a/source/_static/good_practices/container_boat_failed.png b/source/_static/good_practices/container_boat_failed.png
new file mode 100644
index 0000000000000000000000000000000000000000..09941f51c5ebd360b544122a560ffa55f135f521
Binary files /dev/null and b/source/_static/good_practices/container_boat_failed.png differ
diff --git a/source/_static/good_practices/container_boat_old.png b/source/_static/good_practices/container_boat_old.png
new file mode 100644
index 0000000000000000000000000000000000000000..92df63542c329e50837292e844669721c0b4c7ed
Binary files /dev/null and b/source/_static/good_practices/container_boat_old.png differ
diff --git a/source/_static/good_practices/container_compare_arch.png b/source/_static/good_practices/container_compare_arch.png
new file mode 100644
index 0000000000000000000000000000000000000000..4d317190afc878ed5ef58059e9c6616d10b4e255
Binary files /dev/null and b/source/_static/good_practices/container_compare_arch.png differ
diff --git a/source/_static/good_practices/container_docker_arch.png b/source/_static/good_practices/container_docker_arch.png
new file mode 100644
index 0000000000000000000000000000000000000000..2db785e23301fdd3e8bff32b2d1c4a9d2c3c589c
Binary files /dev/null and b/source/_static/good_practices/container_docker_arch.png differ
diff --git a/source/_static/good_practices/container_docker_main_usage.png b/source/_static/good_practices/container_docker_main_usage.png
new file mode 100644
index 0000000000000000000000000000000000000000..929bbf8b6f5641b2c91a9d2c6d6d0e75fe6a585b
Binary files /dev/null and b/source/_static/good_practices/container_docker_main_usage.png differ
diff --git a/source/_static/good_practices/container_docker_workflow.png b/source/_static/good_practices/container_docker_workflow.png
new file mode 100644
index 0000000000000000000000000000000000000000..054b07514d226b114a6a1eaa593544eb32f186ed
Binary files /dev/null and b/source/_static/good_practices/container_docker_workflow.png differ
diff --git a/source/_static/good_practices/container_performance.png b/source/_static/good_practices/container_performance.png
new file mode 100644
index 0000000000000000000000000000000000000000..409e41b5c22c6bdb799884b85b09f1073ac50b2b
Binary files /dev/null and b/source/_static/good_practices/container_performance.png differ
diff --git a/source/_static/good_practices/container_pirates.png b/source/_static/good_practices/container_pirates.png
new file mode 100644
index 0000000000000000000000000000000000000000..5da7f35de0f982725c1c0f80bde6dab26311f527
Binary files /dev/null and b/source/_static/good_practices/container_pirates.png differ
diff --git a/source/_static/good_practices/container_vm_arch.png b/source/_static/good_practices/container_vm_arch.png
new file mode 100644
index 0000000000000000000000000000000000000000..329799892f49edc8c6f00ec9893921e97f941cc1
Binary files /dev/null and b/source/_static/good_practices/container_vm_arch.png differ