# Identification and quantification of bacteria present in the microbiota of the OligoMM12 mouse line
[Amine Ghozlane](https://research.pasteur.fr/fr/member/amine-ghozlane/) (amine.ghozlane@pasteur.fr)  
Quentin Letourneur  
Fabien Mareuil

## Contents

- [Introduction](#introduction)
- [Usage in galaxy](#usage-in-galaxy)
- [Usage in command line](#usage-in-command-line)

## Introduction

Project [#68](https://biomics.pasteur.fr/projects/Project/?id=68) aims to determine the relative proportions of murine bacteria in the gut of the controlled-microbiota mouse line OligoMM12,
bred at the animal facility of the Institut Pasteur. The workflow consists of the following steps:  
<img src="img/workflow.png" align="center" />

Four types of files are generated:
- Count file: table with the number of reads aligned, per sample, against each reference bacterial genome of the MM12 line.
- Annotation file: table that provides the annotation of reads at the blast and diamond steps.
- Krona: html file containing the number of reads per sample at the quantification step (corresponding to the count file) and after the two annotation steps by blast and diamond for not-annotated-sample_name (counted from the annotation file).
- Resume: table that summarizes the number of reads obtained after filtering and trimming, aligned against the reference bacterial genomes, and annotated with nt and nr.
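
Since the project is about relative proportions, the read counts can be turned into per-sample fractions. The sketch below is only an illustration: it assumes the count table is tab-separated, with a header row, one genome per row, and one column per sample. That layout, the genome names, and the values are hypothetical and should be checked against a real `count_table.tsv`.

```shell
# Hypothetical count table: layout and values are made up for illustration.
counts_file="$(mktemp)"
printf 'genome\tW1\tW2\n' > "$counts_file"
printf 'genome_A\t30\t10\n' >> "$counts_file"
printf 'genome_B\t70\t90\n' >> "$counts_file"

# Convert read counts to per-sample relative proportions.
# The file is read twice: first to sum each sample column, then to divide.
relative_proportions() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i; next }
        FNR == 1  { print; next }
        {
            for (i = 2; i <= NF; i++)
                $i = (total[i] > 0) ? sprintf("%.3f", $i / total[i]) : "NA"
            print
        }
    ' "$1" "$1"
}

relative_proportions "$counts_file"
```

Each sample column of the result sums to 1, so samples with different sequencing depths can be compared directly.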

## Usage in galaxy
The workflow is deployed on galaxy.pasteur.fr in the section Animalerie/animalerie-wf (1), as follows:
<img src="img/galaxy.png" align="center" />

First, the fastq files must be loaded into a history to be available to the workflow (2). Here, two samples were loaded: 1:MM12-W1 and 2:MM12-W2.  
If a fastq file is smaller than 2 GB, it can be loaded directly into Galaxy with the upload button.  
Otherwise, it must be uploaded with FileZilla, as described in slides 25-30:
[galaxy doc](https://c3bi-pasteur-fr.github.io/Galaxy_training_material/galaxy_initiation/slides/galaxy_initiation#1)

Then, in section (3), each fastq sample must be added separately using “insert inputs data”, and a name must be given for each sample in the sample name section (here W1 and W2).

## Usage in command line
The workflow is also available on the Institut Pasteur cluster, tars.pasteur.fr, as follows:  

Load the workflow:
```
$ module use /pasteur/projets/Matrix/modules  
$ module add java/1.8.0 animalerie-wf/tars
```

Check the help:
```
$ animalerie-wf --help
N E X T F L O W  ~  version 0.27.6  
Launching `/pasteur/projets/policy01/Matrix/metagenomics/animalerie-wf/blast_approach_tars.nf` [irreverent_swirles] - revision: c739c17047  
animalerie-wf.nf --in <input_dir> --out <output_dir>  
--in Directory containing fastq files (Single end only, in format .fastq or .fastq.gz, default /pasteur/projets/policy01/Matrix/metagenomics/animalerie-wf/test/).  
--out Output directory (default /pasteur/homes/aghozlan/).  
--kronaout Output krona file (default /pasteur/homes/aghozlan/krona.html)  
--resumeout Output resume file (default /pasteur/homes/aghozlan/resume_table.tsv)  
--annotationout Output annotation file (default /pasteur/homes/aghozlan/annotation_table.tsv)  
--countout Output count file (default /pasteur/homes/aghozlan/count_table.tsv)  
--cpus Number of cpus for process (default 6)  
-w Temporary output (usually /pasteur/scratch/animalerie-wf)  
--identity Read minimum identity with blast in percent (Default 50)  
--coverage Read minimum coverage with blast in percent (Default 50)  
--dia_identity Read minimum identity with diamond in percent (Default 40)  
--dia_coverage Read minimum coverage with diamond in percent (Default 40)  
--evalue E-value threshold (Default 0.001)  
```
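
The options listed in the help can be combined on a single command line. As a sketch, the snippet below assembles such a call and prints it instead of running it, since `animalerie-wf` is only available on the cluster. The thresholds are simply the defaults reported by `--help`, the scratch path is the one the help gives as usual, and the two directory names are placeholders.

```shell
# Placeholder directories; replace with real paths.
in_dir="fastq_dir/"
out_dir="output_dir/"

# Thresholds set explicitly to the defaults reported by --help.
cmd="animalerie-wf --in $in_dir --out $out_dir --cpus 6"
cmd="$cmd --identity 50 --coverage 50"
cmd="$cmd --dia_identity 40 --dia_coverage 40"
cmd="$cmd --evalue 0.001 -w /pasteur/scratch/animalerie-wf"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

Spelling out the thresholds, even at their default values, records in the job log which settings produced a given result.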

Reads in fastq or fastq.gz format must be in a single directory (here fastq_dir), and the workflow also needs an output directory, as follows:
```
$ animalerie-wf.nf --in fastq_dir/ --out output_dir/
```
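
Before launching, it can help to confirm that the input directory really contains reads in one of the two accepted formats. This is a minimal sketch, not part of the workflow: the helper name `count_fastq`, the demo directory, and the sample file names are hypothetical, and it assumes the reads sit directly in the directory rather than in subdirectories.

```shell
# Count .fastq and .fastq.gz files located directly inside a directory.
count_fastq() {
    find "$1" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) \
        | wc -l | tr -d ' '
}

# Demo on a throwaway directory with hypothetical sample names.
demo_dir="$(mktemp -d)"
touch "$demo_dir/W1.fastq" "$demo_dir/W2.fastq.gz" "$demo_dir/notes.txt"

n="$(count_fastq "$demo_dir")"
if [ "$n" -eq 0 ]; then
    echo "No fastq files found in $demo_dir" >&2
else
    echo "$n fastq file(s) ready"
fi
```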

When running the computation in batch mode, it is good practice to submit it with `sbatch` and redirect the output to a file:
```
$ sbatch --partition common --qos normal --mem 4000 --wrap="animalerie-wf --in fastq_dir/ --out output_dir/ > progress.txt"
```
Check progress:
```
$ tail -f  progress.txt
```
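
The same submission can also be written as a batch script, which is easier to keep alongside the results. The sketch below only mirrors the `sbatch` flags used above as `#SBATCH` directives. The file name `job.sh` is hypothetical, and the script assumes the `animalerie-wf` module was already loaded in the submitting shell, since `sbatch` exports the caller's environment by default.

```shell
# Write a batch script mirroring the sbatch flags used above.
# job.sh is a hypothetical file name; the directories are placeholders.
cat > job.sh <<'EOF'
#!/bin/sh
#SBATCH --partition=common
#SBATCH --qos=normal
#SBATCH --mem=4000
animalerie-wf --in fastq_dir/ --out output_dir/ > progress.txt
EOF

# On the cluster, submit it with: sbatch job.sh
```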