RNAsig
Authors
- Rachel Legendre (@rlegendr)
What is RNAsig?
RNAsig is a snakemake pipeline dedicated to identifying RNA signatures on Retinoic acid-inducible gene I (RIG-I)-like receptors (RLRs) during viral infection. These analyses are based on high-throughput identification of viral RNA ligands for the RIG-I, MDA5 or LGP2 cytoplasmic sensors.
How to install RNAsig?
Installation with singularity
You need to install:
- python >= 3.6
- snakemake >=4.8.0
- pandas
- singularity
Then download the singularity container:
singularity pull --arch amd64 --name rnasig.img library://rlegendre/default/rnasig:1.0
Manual installation
In addition to the tools listed above, you need to install the pipeline-related tools:
- cutadapt
- fastqc
- samtools
- bowtie2
- bedtools
- R (>= 4.0.2)
- ggplot2
- tidyverse
- gggenes
- rtracklayer
How to run RNAsig?
Usage
- Step 1: Install workflow
If you simply want to use this workflow, download and extract the latest release, or clone the repository:
git clone git@gitlab.pasteur.fr:rlegendr/rnasig.git
In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository.
- Step 2: Configure workflow
Configure the workflow according to your needs by editing the config.yaml, design.txt and multiqc_config.yaml files in the config/ directory.
- Step 3: Execute workflow
Test your configuration by performing a dry run via:
snakemake --use-singularity -n
Run it in a cluster environment via:
snakemake --use-singularity --singularity-args "-B '/home/login/'" --cluster-config config/cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads} " -j 200 --nolock
Visualize how the rules are connected via:
snakemake -s Snakefile --rulegraph --nolock | dot -Tsvg > rulegraph.svg
or how the files are processed via:
snakemake -s Snakefile -j 10 --dag --nolock | dot -Tsvg > dag.svg
Structure of data
graph TD
A[TOTAL] -->|Receptor| F(Cherry)
D[BEAD] --> F(Cherry)
A[TOTAL] -->|Receptor| G(RIG-I)
D[BEAD] --> G(RIG-I)
A[TOTAL] -->|Receptor| H(MDA5)
D[BEAD] --> H(MDA5)
A[TOTAL] -->|Receptor| E(LGP2)
D[BEAD] --> E(LGP2)
F --> C(#1)
F --> |replicate| B(#2)
F --> I(#3)
G --> C(#1)
G --> |replicate| B(#2)
G --> I(#3)
H --> C(#1)
H --> |replicate| B(#2)
H --> I(#3)
E --> C(#1)
E --> |replicate| B(#2)
E --> I(#3)
Rename FASTQ files
All FASTQ files must follow this naming scheme: CONDITION-RECEPTOR-REPLICATE_MATE.fastq.gz
Wildcard | Description |
---|---|
CONDITION | Fraction of ST-RLR and ST-CH RNA, either Total or Bead |
RECEPTOR | Cytoplasmic sensor name (e.g. Cherry, RIGI, LGP2, MDA5) |
REPLICATE | Replicate number (e.g. Rep1 or Rep2) |
MATE | Mate identifier of the sequencing read (e.g. R1) |
All the FASTQ files must be stored in the same directory.
Example of FASTQ file names:
Total-Cherry-Rep1_R1.fastq.gz
Bead-RIGI-Rep1_R1.fastq.gz
TOTAL-LGP2-REP1_R1.fastq.gz
BEAD-MDA5-rep3_R1.fastq.gz
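Before launching the pipeline, it can help to check that every file name fits the scheme above. The following is a minimal sketch, not part of RNAsig: the regular expression, the function name `parse_fastq_name`, the case-insensitive matching (suggested by the mixed-case examples) and the acceptance of `R2` as a mate are all assumptions.

```python
import re

# Assumed pattern for CONDITION-RECEPTOR-REPLICATE_MATE.fastq.gz.
# Case-insensitive matching and the R1/R2 mate values are assumptions
# drawn from the example names, not documented pipeline behaviour.
FASTQ_NAME = re.compile(
    r"^(?P<condition>Total|Bead)-"
    r"(?P<receptor>Cherry|RIGI|LGP2|MDA5)-"
    r"(?P<replicate>Rep\d+)_"
    r"(?P<mate>R[12])\.fastq\.gz$",
    re.IGNORECASE,
)


def parse_fastq_name(name):
    """Return the four wildcards as a dict, or None if the name does not fit."""
    match = FASTQ_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # The second name uses underscores instead of hyphens and is rejected.
    for name in ["Total-Cherry-Rep1_R1.fastq.gz", "Bead_RIGI_Rep1.fastq.gz"]:
        print(name, "->", parse_fastq_name(name))
```

A name that returns `None` should be renamed before it is listed in the design file.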
How to fill the design
The experimental design is summarised in a tabulated design file that the user has to fill in before running the pipeline.
Design columns:
Column | Description |
---|---|
File | FASTQ file prefix (e.g. Bead-Cherry-Rep1) |
Cond | Condition (BEAD or TOTAL) |
Receptor | RLR receptor name (e.g. Cherry, LGP2, RIG-I or MDA5) |
replicate | Number of replicates of file (specify one line per replicate) |
Link to an Example: design.txt
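A quick consistency check of the design file can catch mismatches between the File prefix and the Cond and Receptor columns. The sketch below is illustrative only and makes several assumptions: the file is tab-separated with a header row using the four column names above, the File prefix has the form CONDITION-RECEPTOR-REPLICATE, and labels are compared after upper-casing and removing hyphens (so that `RIGI` in a file name matches `RIG-I` in the design). The function names and the replicate values in the usage example are hypothetical.

```python
import csv
import io


def normalise(label):
    """Upper-case a label and drop hyphens so that RIG-I and RIGI compare equal."""
    return label.upper().replace("-", "")


def check_design(text):
    """Return a list of problems found in a tab-separated design table.

    Assumes a header row with the columns File, Cond, Receptor, replicate.
    The replicate column is read but not interpreted here.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, start=2):
        parts = row["File"].split("-")
        if len(parts) != 3:
            problems.append(f"line {line_no}: unexpected File prefix {row['File']!r}")
            continue
        condition, receptor, _replicate = parts
        if normalise(condition) != normalise(row["Cond"]):
            problems.append(f"line {line_no}: Cond does not match {row['File']!r}")
        if normalise(receptor) != normalise(row["Receptor"]):
            problems.append(f"line {line_no}: Receptor does not match {row['File']!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical design content; the replicate values are placeholders.
    design = (
        "File\tCond\tReceptor\treplicate\n"
        "Bead-Cherry-Rep1\tBEAD\tCherry\t1\n"
        "Bead-RIGI-Rep1\tTOTAL\tRIG-I\t1\n"
    )
    for problem in check_design(design):
        print(problem)
```

In practice the text would come from `open("config/design.txt").read()`; the example design.txt linked above remains the reference for the exact layout.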
How to fill the config file
- Genome Section
genome:
genome_directory: /path/to/genome/directory/
name: measle
fasta_file: /path/to/genome/directory/measle.fa
gff_file: /path/to/genome/directory/measle.gff
host_mapping: true
host_name: hg38
host_fasta_file: /path/to/genome/directory/hg38.fa
- Read trimming
Cutadapt version 3.4 is used for read trimming and cleaning.
adapters:
remove: yes
adapter_list: file:config/TruSeq_Stranded_RNA.fa
m: 25
mode: a
options: -O 6 --trim-n --max-n 1
quality: 30
threads: 4
- Read mapping
Bowtie2 version 2.3.5.1 is used for alignment against the reference genomes (both viral and host). bedtools v2.27.1 is then used to compute read coverage at each genome position for each strand.
bowtie2_mapping:
options: "--very-sensitive "
threads: 4
Plot RNA Signatures
Analyses were performed with R version 4.1.0 and the R packages ggplot2 and dplyr, as described in Chazal et al. The read coverage of each Bead sample was normalized by the mean read coverage of its Total sample. The normalized Bead samples were then normalized, at each genomic position, by the mean of the Cherry triplicates to obtain RLR binding. For each receptor (and Cherry), RLR binding was plotted using ggplot2.
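The two normalization steps can be illustrated with a small numeric sketch. This is not the pipeline's R script: it is written in Python for illustration, the function names are hypothetical, and it assumes that "normalized by" means division. Positions where the Cherry mean is zero are not handled.

```python
from statistics import mean


def normalise_by_total(bead, total):
    """Divide per-position Bead coverage by the mean coverage of its Total sample."""
    total_mean = mean(total)
    return [depth / total_mean for depth in bead]


def rlr_binding(receptor_norm, cherry_norm_replicates):
    """Divide normalized receptor coverage by the per-position mean of the
    normalized Cherry replicates (assumed interpretation of the text)."""
    cherry_mean = [mean(values) for values in zip(*cherry_norm_replicates)]
    return [r / c for r, c in zip(receptor_norm, cherry_mean)]


if __name__ == "__main__":
    # Toy coverage vectors over three genomic positions.
    receptor_norm = normalise_by_total([10, 20, 30], [5, 5, 5])
    cherry_norm = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    print(rlr_binding(receptor_norm, cherry_norm))
```

With these toy values the normalized receptor coverage is `[2.0, 4.0, 6.0]` and the resulting binding is `[2.0, 2.0, 2.0]`, i.e. a uniform two-fold enrichment over Cherry.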
Example plot with measles virus data from Chazal et al.
Note that the R script runs only on viral genomes with a single chromosome. If the studied virus has several segments, each segment needs to be analysed separately.
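For a segmented virus, one way to prepare the per-segment inputs is to split the multi-record FASTA into one record per segment. The helper below is a hypothetical sketch, not part of RNAsig; it assumes every header line carries an identifier as its first token after `>`.

```python
def split_fasta(text):
    """Map each record identifier to its own single-record FASTA string."""
    records = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token after '>'.
            current = line[1:].split()[0]
            records[current] = [line]
        elif current is not None and line.strip():
            records[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in records.items()}


if __name__ == "__main__":
    # Toy two-segment genome; real input would be read from the FASTA file.
    fasta = ">seg1 first segment\nACGT\nAC\n>seg2\nGGCC\n"
    for name, record in split_fasta(fasta).items():
        print(name, len(record))
```

Each returned string can be written to its own file and referenced as `fasta_file` in a separate run; the matching annotation would presumably need the same per-segment treatment.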
Literature
https://elifesciences.org/articles/11275
https://www.cell.com/cell-reports/fulltext/S2211-1247(18)30957-4