Commit df6230e8 authored by Amine GHOZLANE

Add README file

parent 8449fa7d
# Identification and quantification of bacteria present in mouse microbiota from line OligoMM12
Amine Ghozlane, Quentin Letourneur
## Contents
- [Introduction](#introduction)
- [Usage in galaxy](#usage-in-galaxy)
- [Usage in command line](#usage-in-command-line)
## Introduction
Project #68 aims to determine the relative proportions of murine bacteria in the gut of the controlled-microbiota mouse line OligoMM12,
bred at the animal facility of the Institut Pasteur. The workflow consists of the following steps:
<img src="img/workflow.png" align="center" />
Four types of files are generated:
- Count file: table giving the number of reads per sample aligned against each reference bacterial genome of the MM12 line.
- Annotation file: table providing the annotation of reads at the blast and diamond steps.
- Krona: HTML file containing the number of reads per sample at the quantification step (corresponding to the count file) and after the two annotation steps with blast and diamond for not-annotated-sample_name (counted from the annotation file).
- Resume: table summarizing the number of reads obtained after filtering and trimming, aligned against the reference bacterial genomes, and annotated with nt and nr.
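Since the goal is relative proportions, the count file can be converted to per-sample percentages once the run is finished. The sketch below is only an illustration: the function name `relative_abundance` is hypothetical, and it assumes a tab-separated count file with one header line, the genome name in the first column, and one read-count column per sample. The actual layout of `count_table.tsv` should be checked before using it.

```shell
# Hypothetical helper: convert a count table to per-sample percentages.
# Assumed layout (not confirmed by the workflow): tab-separated, one header
# line, genome name in column 1, one read-count column per sample.
relative_abundance() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 { print; next }            # keep the header line unchanged
        {
            name[NR] = $1
            for (i = 2; i <= NF; i++) {
                count[NR, i] = $i          # store the raw count
                total[i] += $i             # accumulate the per-sample total
            }
            if (NF > ncol) ncol = NF
            last = NR
        }
        END {
            for (r = 2; r <= last; r++) {
                line = name[r]
                for (i = 2; i <= ncol; i++) {
                    pct = (total[i] > 0) ? 100 * count[r, i] / total[i] : 0
                    line = line OFS sprintf("%.2f", pct)
                }
                print line
            }
        }
    ' "$1"
}

# Example call (file names are placeholders):
# relative_abundance count_table.tsv > relative_abundance.tsv
```

Each sample column is normalized independently, so the percentages in a column sum to 100 up to rounding.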
## Usage in galaxy
The workflow is deployed on Galaxy in section Animalerie/animalerie-wf (1), as follows:
<img src="img/galaxy.png" align="center" />
First, fastq files need to be loaded into a history to be available to the workflow (2). Here, two samples were loaded: 1:MM12-W1 and 2:MM12-W2.
If a fastq file is smaller than 2 GB, it can be loaded directly into Galaxy with the upload button.
Otherwise, it needs to be uploaded using FileZilla, as described in slides 25-30:
[galaxy doc](
Then, in section (3), each fastq sample needs to be added separately using “insert inputs data”, and a name must be given for each sample in the sample name section (W1, W2).
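To decide in advance which upload route applies to each file, the size limit mentioned above can be checked locally. This is a minimal sketch: the function name `upload_route` and the `LIMIT_BYTES` variable are hypothetical, and the limit is assumed to be 2 × 1024³ bytes, which may not match the exact cutoff applied by the Galaxy instance.

```shell
# Hypothetical helper: print the suggested upload route for each file.
# The default limit (2 * 1024^3 bytes) is an assumption; override it with
# LIMIT_BYTES if the Galaxy instance uses a different cutoff.
upload_route() {
    limit_bytes=${LIMIT_BYTES:-2147483648}
    for f in "$@"; do
        [ -f "$f" ] || continue                   # skip missing paths
        size=$(wc -c < "$f" | tr -d ' ')          # file size in bytes
        if [ "$size" -lt "$limit_bytes" ]; then
            echo "$f: upload button"
        else
            echo "$f: FileZilla"
        fi
    done
}

# Example call (file names are placeholders):
# upload_route MM12-W1.fastq.gz MM12-W2.fastq.gz
```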
## Usage in command line
The workflow is also available on the Institut Pasteur cluster, as follows:
Load the workflow
$ module use /pasteur/projets/Matrix/modules
$ module add java/1.8.0 animalerie-wf/tars
Display the help:
$ animalerie-wf --help
N E X T F L O W ~ version 0.27.6
Launching `/pasteur/projets/policy01/Matrix/metagenomics/animalerie-wf/` [irreverent_swirles] - revision: c739c17047 --in <input_dir> --out <output_dir>
--in Directory containing fastq files (Single end only, in format .fastq or .fastq.gz, default /pasteur/projets/policy01/Matrix/metagenomics/animalerie-wf/test/).
--out Output directory (default /pasteur/homes/aghozlan/).
--kronaout Output krona file (default /pasteur/homes/aghozlan/krona.html)
--resumeout Output resume file (default /pasteur/homes/aghozlan/resume_table.tsv)
--annotationout Output annotation file (default /pasteur/homes/aghozlan/annotation_table.tsv)
--countout Output count file (default /pasteur/homes/aghozlan/count_table.tsv)
--cpus Number of cpus for process (default 6)
-w Temporary output (usually /pasteur/scratch/animalerie-wf)
--identity Read minimum identity with blast in percent (Default 50)
--coverage Read minimum coverage with blast in percent (Default 50)
--dia_identity Read minimum identity with diamond in percent (Default 40)
--dia_coverage Read minimum coverage with diamond in percent (Default 40)
--evalue E-value threshold (Default 0.001)
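The blast thresholds listed in the help can be overridden on the command line. The wrapper below is a sketch under assumptions: the name `run_wf` and the `DRY_RUN` variable are hypothetical, and only the `--in`, `--out`, `--identity` and `--coverage` options shown in the help are used. By default it only prints the command, so it can be inspected before anything is launched.

```shell
# Hypothetical wrapper around animalerie-wf.
# Arguments: input dir, output dir, blast identity (%), blast coverage (%).
# Identity and coverage fall back to the documented defaults (50 and 50).
run_wf() {
    in_dir=$1
    out_dir=$2
    identity=${3:-50}
    coverage=${4:-50}
    # Rebuild the positional parameters as the full command line.
    set -- animalerie-wf --in "$in_dir" --out "$out_dir" \
        --identity "$identity" --coverage "$coverage"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"      # dry run (default): print the command only
    else
        "$@"           # DRY_RUN=0: actually launch the workflow
    fi
}

# Example: stricter blast thresholds than the defaults.
# run_wf fastq_dir/ output_dir/ 80 70
```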
Reads in fastq or fastq.gz format need to be in one directory (here fastq_dir), and the workflow also needs an output directory, as follows:
$ animalerie-wf --in fastq_dir/ --out output_dir/
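Before launching, it can be useful to confirm that the input directory actually contains files in one of the two accepted formats. This is a minimal sketch: the function name `check_fastq_dir` is hypothetical, and it assumes the workflow selects files by the `.fastq` and `.fastq.gz` extensions, as the help text suggests.

```shell
# Hypothetical pre-flight check: list the .fastq and .fastq.gz files in a
# directory and return a non-zero status if none is found.
check_fastq_dir() {
    dir=${1%/}                         # drop a trailing slash, if any
    n=0
    for f in "$dir"/*.fastq "$dir"/*.fastq.gz; do
        if [ -f "$f" ]; then           # unmatched patterns are skipped here
            echo "$f"
            n=$((n + 1))
        fi
    done
    [ "$n" -gt 0 ]
}

# Example call (directory name is a placeholder):
# check_fastq_dir fastq_dir/ || echo "no fastq files found" >&2
```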
When running the calculation in batch mode, a good practice is to use the following command:
$ sbatch --partition common --qos normal --mem 4000 --wrap="animalerie-wf --in fastq_dir/ --out output_dir/ > progress.txt"
Check progress:
$ tail -f progress.txt