RNAsig

Snakemake Python 3.6

Authors

What is RNAsig?

RNAsig is a Snakemake pipeline dedicated to identifying RNA signatures on Retinoic acid-inducible gene I (RIG-I)-like receptors (RLRs) during viral infection. These analyses are based on high-throughput identification of viral RNA ligands for the RIG-I, MDA5 or LGP2 cytoplasmic sensors.

How to install RNAsig?

Installation with singularity

You need to install:

  • python >= 3.6
  • snakemake >=4.8.0
  • pandas
  • singularity

Download the singularity container:

singularity pull --arch amd64 --name rnasig.img library://rlegendre/default/rnasig:1.0

Manual installation

In addition to the tools above, you need to install the pipeline-related tools:

  • cutadapt
  • fastqc
  • samtools
  • bowtie2
  • bedtools
  • R (>= 4.0.2)
    • ggplot2
    • tidyverse
    • gggenes
    • rtracklayer

How to run RNAsig?

Usage

  • Step 1: Install workflow

If you simply want to use this workflow, download and extract the latest release, or clone the repository:

git clone git@gitlab.pasteur.fr:rlegendr/rnasig.git

In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository.

  • Step 2: Configure workflow

Configure the workflow according to your needs by editing the config.yaml, design.txt and multiqc_config.yaml files in the config/ directory.

  • Step 3: Execute workflow

Test your configuration by performing a dry run via:

snakemake --use-singularity -n

Run it in a cluster environment via:

snakemake --use-singularity --singularity-args "-B '/home/login/'" --cluster-config config/cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads} " -j 200 --nolock

Visualize how the rules are connected via:

snakemake -s Snakefile --rulegraph --nolock | dot -Tsvg > rulegraph.svg

or how the files are processed via:

snakemake -s Snakefile -j 10 --dag --nolock | dot -Tsvg > dag.svg

Structure of data

graph TD
    A[TOTAL] -->|Receptor| F(Cherry)
    D[BEAD] --> F(Cherry)
    A[TOTAL] -->|Receptor| G(RIG-I)
    D[BEAD] --> G(RIG-I)
    A[TOTAL] -->|Receptor| H(MDA5)
    D[BEAD] --> H(MDA5)
    A[TOTAL] -->|Receptor| E(LGP2)
    D[BEAD] -->  E(LGP2)
    F --> C(#1)
    F --> |replicate| B(#2)
    F --> I(#3)
    G --> C(#1)
    G --> |replicate| B(#2)
    G --> I(#3)
    H --> C(#1)
    H --> |replicate| B(#2)
    H --> I(#3)
    E --> C(#1)
    E --> |replicate| B(#2)
    E --> I(#3)

Rename FASTQ files

All FASTQ files must follow this naming convention: CONDITION-RECEPTOR-REPLICATE_MATE.fastq.gz.

Wildcard | Description
CONDITION | Fraction of ST-RLR and ST-CH RNA, either Total or Bead
RECEPTOR | Cytoplasmic sensor name (e.g. Cherry, RIGI, LGP2, MDA5)
REPLICATE | Replicate number (e.g. Rep1 or Rep2)
MATE | Mate identifier for paired-end sequencing (e.g. R1)

All the FASTQ files must be stored in the same directory.

Example of FASTQ file names:

  • Total-Cherry-Rep1_R1.fastq.gz
  • Bead-RIGI-Rep1_R1.fastq.gz
  • TOTAL-LGP2-REP1_R1.fastq.gz
  • BEAD-MDA5-rep3_R1.fastq.gz
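Before launching the pipeline, it can help to check file names against the convention above. The following is a minimal sketch, not part of RNAsig: the regular expression, the function name `parse_fastq_name`, the case-insensitive matching (chosen because the examples mix `Total`/`TOTAL` and `Rep1`/`rep3`) and the acceptance of `R2` as a mate are all assumptions made for illustration.

```python
import re

# Assumed pattern for CONDITION-RECEPTOR-REPLICATE_MATE.fastq.gz.
# Case-insensitive because the example names mix upper and lower case.
# Accepting R2 as a mate is an assumption; the README only shows R1.
FASTQ_NAME = re.compile(
    r"^(?P<condition>Total|Bead)-"
    r"(?P<receptor>Cherry|RIGI|MDA5|LGP2)-"
    r"(?P<replicate>Rep\d+)_"
    r"(?P<mate>R[12])\.fastq\.gz$",
    re.IGNORECASE,
)


def parse_fastq_name(filename):
    """Return the four wildcards as a dict, or None if the name does not match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ["Total-Cherry-Rep1_R1.fastq.gz", "Bead_RIGI_Rep1.fastq.gz"]:
        print(name, "->", parse_fastq_name(name))
```

A name that returns `None` here (such as the second one, which uses underscores instead of hyphens) is worth renaming before running the workflow.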

How to fill the design

The experimental design is summarised in a tab-delimited design file that the user has to fill in before running the pipeline.

Design columns:

Column | Description
File | FASTQ file prefix (e.g. Bead-Cherry-Rep1)
Cond | Condition (e.g. BEAD or TOTAL)
Receptor | RLR receptor name (e.g. Cherry, LGP2, RIG-I or MDA5)
replicate | Replicate number of the file (specify one line per replicate)

Link to an Example: design.txt
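A design file can be drafted from the FASTQ file names rather than typed by hand. The sketch below is an illustration only: the helper names `design_rows` and `write_design` are hypothetical, and the exact values expected in each column (upper-case `Cond`, a label such as `Rep1` in `replicate`) are assumptions, so the output should be compared with the example design.txt before use.

```python
import csv
import io
import re

COLUMNS = ["File", "Cond", "Receptor", "replicate"]


def design_rows(fastq_names):
    """Build one design row per FASTQ prefix (one line per replicate)."""
    rows, seen = [], set()
    for name in sorted(fastq_names):
        # Drop the _MATE.fastq.gz suffix to get the file prefix.
        prefix = re.sub(r"_R[12]\.fastq\.gz$", "", name)
        parts = prefix.split("-")
        if prefix in seen or len(parts) != 3:
            continue  # skip second mates and names that do not fit the convention
        seen.add(prefix)
        condition, receptor, replicate = parts
        rows.append({
            "File": prefix,
            "Cond": condition.upper(),  # assumption: BEAD/TOTAL in upper case
            "Receptor": receptor,
            "replicate": replicate,     # assumption: replicate label, e.g. Rep1
        })
    return rows


def write_design(rows, handle):
    """Write the rows as a tab-delimited table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_design(design_rows(["Bead-Cherry-Rep1_R1.fastq.gz",
                              "Total-Cherry-Rep1_R1.fastq.gz"]), buffer)
    print(buffer.getvalue(), end="")
```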

How to fill the config file

  1. Genome section

genome:
    genome_directory: /path/to/genome/directory/
    name: measle
    fasta_file: /path/to/genome/directory/measle.fa
    gff_file: /path/to/genome/directory/measle.gff
    host_mapping: true
    host_name: hg38
    host_fasta_file: /path/to/genome/directory/hg38.fa

  2. Read trimming

Cutadapt version 3.4 is used for read trimming and cleaning.

adapters:
    remove: yes
    adapter_list: file:config/TruSeq_Stranded_RNA.fa
    m: 25
    mode: a
    options: -O 6 --trim-n --max-n 1 
    quality: 30
    threads: 4

  3. Read mapping

Bowtie2 version 2.3.5.1 is used for alignment against the reference genomes (both viral and host). Then bedtools v2.27.1 is used to compute read coverage at each genome position for each strand.

bowtie2_mapping:
    options: "--very-sensitive "
    threads: 4

Plot RNA Signatures

Analyses were performed with R version 4.1.0 and the R packages ggplot2 and dplyr, as described in Chazal et al. Read coverage of each Bead sample was normalized by the mean read coverage of its Total sample. Then, at each genomic position, the normalized Bead samples were normalized by the mean of the Cherry triplicates to obtain RLR binding. For each receptor (and Cherry), RLR binding was plotted using ggplot2.
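The two normalization steps described above can be illustrated with a small worked example. This is a Python sketch of the arithmetic only, not the pipeline's R script: it assumes that "normalized by" means division, that the Total mean is taken over all positions, and that positions where the Cherry mean is zero are reported as NaN. The function names are hypothetical.

```python
from statistics import mean


def normalize_by_total(bead_cov, total_cov):
    """Divide Bead coverage at each position by the mean coverage of its Total sample."""
    total_mean = mean(total_cov)
    return [b / total_mean for b in bead_cov]


def rlr_binding(receptor_norm, cherry_norm_replicates):
    """Divide normalized receptor coverage by the per-position mean of the Cherry replicates."""
    cherry_mean = [mean(values) for values in zip(*cherry_norm_replicates)]
    # Assumption: a zero Cherry mean yields NaN instead of raising an error.
    return [r / c if c else float("nan") for r, c in zip(receptor_norm, cherry_mean)]


if __name__ == "__main__":
    # Toy coverage vectors over three genomic positions.
    rigi = normalize_by_total([10, 20, 30], [5, 5, 5])
    cherry = [normalize_by_total([5, 10, 15], [5, 5, 5]) for _ in range(3)]
    print(rlr_binding(rigi, cherry))
```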

Example plot with measles data from Chazal et al.

Note that the R script runs only on viral genomes with a single chromosome. If the studied virus has several segments, each segment needs to be analysed separately.
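For a segmented virus, one way to prepare per-segment inputs is to split the multi-record FASTA into one record per segment. The sketch below is not part of RNAsig; `split_fasta` is a hypothetical helper, and the matching annotation (`gff_file`) would presumably need to be restricted to the same segment as well.

```python
def split_fasta(text):
    """Return {record_id: fasta_text} for each '>' record in a multi-FASTA string."""
    records = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first word of the header as the record ID.
            current = fields[0] if fields else "record{}".format(len(records) + 1)
            records[current] = [line]
        elif current is not None and line.strip():
            records[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in records.items()}


if __name__ == "__main__":
    fasta = ">seg1 example segment\nACGT\nAC\n>seg2\nGGCC\n"
    for name, record in split_fasta(fasta).items():
        # Each record could be written to its own file, e.g. <name>.fa.
        print(name, len(record.splitlines()) - 1, "sequence line(s)")
```

Each resulting record can then be used as the `fasta_file` of a separate run.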

Literature

https://elifesciences.org/articles/11275

https://www.cell.com/cell-reports/fulltext/S2211-1247(18)30957-4