README.md 18.9 KB
Newer Older
1
<img align="left" width="100" height="100" src="images/EpiFlow_logo.png" alt="ePeak">
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
2

3
# ePeak: from replicated chromatin profiling data to epigenomic dynamics
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
4
5


Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
6
7
8

[![Snakemake](https://img.shields.io/badge/snakemake-≥5.2.1-brightgreen.svg)](https://snakemake.readthedocs.io/en/stable/) [![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
9
## Table of content
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
10

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
11
[[_TOC_]] 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
12

13
# What is ePeak ?
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
14

15
ePeak is a snakemake-based workflow for the analysis of ChIP-seq data from raw FASTQ files to differential analysis of transcription factor binding or histone modification marking. It streamlines critical steps like the quality assessment of the immunoprecipitation using the cross correlation and the replicate comparison for both narrow and broad peaks. For the differential analysis ePeak provides linear and non linear methods for normalisation between samples as well as conservative and stringent models for estimating the variance and testing the significance of the observed differences (see [chipflowr](https://gitlab.pasteur.fr/hub/chipflowr)).
16

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
17
<img src="images/epeak_workflow.svg" width="700">
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
18
19


Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
20

21
# How to install ePeak ?
Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
22
23

## Installation with singularity
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
24

25
Pre-required tools:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
26
27
28
29
30
31
- python >= 3.8
- snakemake >=4.8.0
- pandas
- pysam
- singularity

32
A tutorial to create a conda environment with all dependencies is available here : [env.sh](https://gitlab.pasteur.fr/hub/ePeak/-/blob/master/env.sh)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
33
34
35
    
Download the singularity container:
 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
36
` singularity pull --arch amd64 --name epeak.img  library://rlegendre/epeak/epeak:1.0` 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
37

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
38
## Manual installation 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
39

40
List of tools to install:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
41

42
43
44
45
46
- python >= 3.8
- snakemake >=4.8.0
- pandas
- pysam
- singularity
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
47
48
49
50
51
52
53
54
55
56
57
- cutadapt >= 3.2
- fastqc == 0.11.9
- samtools >= 1.10
- bowtie2 >= 2.3.5
- macs2 >= 2.2.7
- picard >= 2.18.25
- featureCounts (from subread) >= 2.0.0
- bedtools >= 2.27.1
- idr == 2.0.3
- R >= 4.0.3
- spp == 1.15.2
58
59
    - R cran packages: snow, snowfail, bitops, caTools, RCurl, Rcpp
    - R bioconductor packages: GenomeInfoDB, genomicRanges, Rsamtools
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
60
- [chipflowr](https://gitlab.pasteur.fr/hub/chipflowr)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
61

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
62
## Run it on IFB Core Cluster without any installation
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
63

64
* Load modules:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
65
66
67
68
69
70
71
72
73

```
module load snakemake/6.5.0
module load python/3.7
module load singularity
module load git-lfs/2.13.1
module load pysam
```

74
* Clone workflow:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
75

76
`git clone https://gitlab.pasteur.fr/hub/ePeak.git`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
77
78
79
80

* Download singularity container:

```
81
cd ePeak
82
singularity pull --arch amd64 --name epeak.img  library://rlegendre/epeak/epeak:1.0
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
83
84
```

85
Then, you can configure the workflow.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
86

87
# How to run ePeak ?
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
88

89
> Note: you can test the pipeline with an example datasets: [here](https://gitlab.pasteur.fr/hub/ePeak#run-the-pipeline-on-test-data)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
90

91
## Usage
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
92

93
* Step 1: Download workflow
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
94

95
`git clone https://gitlab.pasteur.fr/hub/ePeak.git`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
96

97
In any case, if you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository and, its DOI (see [above](https://gitlab.pasteur.fr/hub/ePeak#how-to-cite-ePeak-))
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
98

99
If you are using Singularity, you need to copy the Singularity image in the cloned ePeak directory.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
100

101
*  Step 2: Configure workflow
102

103
To configure your analysis, you need to fill 2 configuration files, one to specify the design experimental of you project and one to parameter each step of the pipeline (stored in `config/`):
104
105
- [config.yaml](https://gitlab.pasteur.fr/hub/ePeak#how-to-fill-the-design)
- [design.txt](https://gitlab.pasteur.fr/hub/ePeak#how-to-fill-the-design)
106

107
In addition, you can customize the MultiQC report by configuring this file: [multiqc_config.yaml](https://gitlab.pasteur.fr/hub/ePeak#how-to-fill-the-multiqc-config)
108

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
109
110
*  Step 3: Execute workflow

111
Test your configuration by performing a dry-run via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
112

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
113
`snakemake --use-singularity -n --cores 1`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
114

115
Execute the workflow locally using $N cores via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
116

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
117
`snakemake --use-singularity --singularity-args "-B '/home/login/'" --cores $N`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
118

119
Run it in a cluster environment via this standard command line:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
120

121
`snakemake --use-singularity --singularity-args "-B '$HOME'" --cluster-config config/cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads} " -j 200 --nolock --cores 1`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
122

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
123
124
125
126
127
128
129
130
Or specifically on Slurm cluster:

`sbatch snakemake --use-singularity --singularity-args "-B '$HOME'" --cluster-config config/cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads} " -j 200 --nolock --cores $SLURM_JOB_CPUS_PER_NODE`

or on SGE cluster:

`snakemake --use-singularity --singularity-args "-B '$HOME'" --cluster-config config/cluster_config.json --cluster "qsub -cwd -V -b y -l h_vmem={cluster.ram} -pe [PE] {threads}" -j 200 --nolock --cores 1`

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
131

132
Visualize how the rules are connected via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
133
134
135

`snakemake -s Snakefile --rulegraph --nolock | dot -Tsvg > rulegraph.svg`

136
or how the files are processed via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
137
138
139
140
141
142
143
144
145
146
147

`snakemake -s Snakefile -j 10 --dag --nolock | dot -Tsvg > dag.svg`


### Rename FASTQ files

All FASTQ files have to observe the following name nomenclature: `MARK_COND_REP_MATE.fastq.gz`. For FASTQ files with only one mate (single-end sequencing), or if no replicates are available, set MATE to R1 or REP to Rep1, respectively.

| Wildcard |                          Description                                |
|----------|---------------------------------------------------------------------|
|   MARK   | Histone mark or Transcription Factor (TF) name (i.e. H3K4me1, Klf4) |
148
|   COND   | Biological condition name (i.e. Normal, Cancer, Cells)              |
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
149
150
151
152
153
|   REP    | Replicate number (i.e. Rep1 or Rep2)                                |
|   MATE   | Identification of mate pair sequencing (i.e. R1 or R2)              |

All the FASTQ files must be stored in the same directory.

154
155
156
157
158
159
Example of FASTQ file names:

- `H3K27ac_shCtrl_Rep1_R1.fastq.gz` and its corresponding INPUT file: `INPUT_shCtrl_Rep1_R1.fastq.gz`
- `TF4_HeLa_rep1_R1.fastq.gz` and its corresponding INPUT file: `Input_HeLa_rep1_R1.fastq.gz`
- `CTCF_WT_REP1_R1.fastq.gz` and its corresponding INPUT file: `INPUT_WT_REP1_R1.fastq.gz`

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
160
161
### How to fill the design

162
The experimental analysis design is summarised in a tabulated design file that the user have to fill before running the pipeline.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
163
164
165
166
167
168
169

Design columns:

|    Column    |                          Description                          |
|--------------|---------------------------------------------------------------|
|   IP_NAME    | IP FASTQ files prefix  (i.e. MARK_COND1)                      |
|   INPUT_NAME | INPUT FASTQ files prefix (i.e. INPUT_COND1)                   |
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
170
171
|   NB_IP      | Replicate number of the histone mark or TFs (i.e. 1 or 2)     |
|   NB_INPUT   | Replicate number of INPUT file (i.e. 1 or 2)                  |
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
172

173
Link to an Example: [design.txt](https://gitlab.pasteur.fr/hub/ePeak/-/blob/master/test/design.txt)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
174
175
176
177


### How to fill the config file

178
All the parameters to run the pipeline are gathered in a YAML config file that the user has to fill before running the pipeline. Here is an filled example: [config.yaml](https://gitlab.pasteur.fr/hub/ePeak/-/blob/master/test/config.yaml)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224

This config file is divided in 2 sections:

1. Section 1: Running pipeline chunks

This first chunk provides input information and assigns working directories. 
`input_dir` path to FASTQ files directory. 
`input_mate` mate pair format (i.e. `_R[12]` for *MATE* = R1 or R2) , must match the *MATE* parameter in FASTQ files.
`input_extension` filename extension format (i.e. `fastq.gz` or `fq.gz`).
`analysis_dir` path to analysis directory.
`tmpdir` path to temporary directory (i.e. `/tmp/` or `/local/scratch/`)

```
input_dir: /path/to/raw_data
input_mate: '_R[12]'
input_extension: '.fastq.gz'
analysis_dir: /path/to/directory/analysis
tmpdir: "/tmp/"
```

The design chunk aims to check that the FASTQ files name match the design file information. The `marks`, `conditions` and `replicates` parameters must respectively match the *MARK*, *COND* and *REP* parameters of the FASTQ files name and the design file. 
For spike-in data, set `spike` on "True" and provide the spike-in genome FASTA file path through the `spike_genome_file` parameter.

```
design:
    design_file: /path/to/directory/analysis/config/design.txt    
    marks: K4Me3
    condition: C, U
    replicates: Rep
    spike: false
    spike_genome_file: /path/to/genome/directory/dmel9.fa
```

2. Section 2: Rule parameters chunks

Each step has it proper setting in independent chunk.
The second section of the configuration file is divided in multiple chunks. Each chunk gather the parameters of one step of the pipeline: `adapters`, `bowtie2_mapping`, `mark_duplicates`, `remove_blacklist`, `peak_calling`, `compute_idr` and `differential_analysis`. 

The `options` parameter present in `adapters`, `bowtie2_mapping` and `peak_calling` allows you to provide any parameter recognised by cutadapt, bowtie2 and macs2 respectively. For example for the `bowtie2_mapping` chunk, `options` can be fill with "--very-sensitive".

```
bowtie2_mapping:
    options: "--very-sensitive"
    threads: 4
```

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
225
226
227
228
229
230
231
232
### How to fill the multiqc config

At the beginning of `config/multiqc_config.yaml` file, you have the possibility to customize header of MultiQC report according to your experiment as you can see below: 

```
# Title to use for the report.
title: "ChIP-seq analysis"
subtitle: "ChIP-seq analysis of CTCF factor in breast tumor cells"                  # Set your own text
233
intro_text: "MultiQC reports summarise analysis results produced from ePeak"     
Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
234
235
236
237
238
239
240
241
242
243
244
245
246

# Add generic information to the top of reports
report_header_info:
    - Contact E-mail: 'rlegendre@pasteur.fr'                                        # Set your own text
    - Application Type: 'ChIP-seq'                                                  # Set your own text
    - Project Type: 'Differential peak expression'                                  # Set your own text
    - Sequencing Platform: 'NextSeq2000'                                            # Set your own text
    - Sequencing Setup: 'PE75'                                                      # Set your own text
```
<img src="images/multiqc_header.png" width="600">



247
## ePeak running modes
248
249
250
251
252
253
254
255
256
257
258

After the read pre-processing steps of the pipeline, MACS2 peak calling software is used to identify enriched regions. Several settings of MACS2 are possible: 
- To estimate the fragment size you can: use MACS2 model (default) or use PhantomPeakQualTool.
- To call the peaks you can either choose narrow or broad.

For the selection of reproducible peaks between replicates we implemented 2 methods:
- The IDR that is the recommended method by ENCODE to measure consistency between replicates for narrow peaks.
- The intersection approach that select the peaks that overlap (default:80%) between the replicates. We optimised this approach for broad peaks. Indeed, the IDR is not optimised for broad peaks. This because the broader size of the peaks induces a bigger position and signal variability making that very few peaks are passing the IDR.

In summary, the user as the possibility to choose multiple fragment size estimation method, multiple peak calling modes and multiple methods of selection of reproducible peaks between replicates. The following subsection shows example of running mode that we advise:

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
259
### Default narrow mode with IDR
260
261
262
263
264
265
266
267
268
269
270

```
peak_calling:
    mode_choice: 'narrow'    
    no_model: no             
    options: "--keep-dup all "
    cutoff: 0.1
    genomeSize: mm


compute_idr:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
271
    do: yes
272
273
    rank: 'signal.value'
    thresh: 0.05
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
274
275
276
277


intersectionApproach: 
    do: no
278
    ia_overlap: 0.8
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
279

280
281
282
283
```

> If none or very few peaks pass the IDR, this means that there is to much variability between your replicates aka they are probably not good replicates. If you want to proceed anyway with the analysis, you can use the intersection approach (less stringent) instead of the IDR by setting `intersectionApproach` to 'yes'.

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
284
### Default broad mode with intersection approach
285
286
287
288
289
290
291
292
293
294
295

In broad mode, perform a stringent peak calling (cutoff of 0.01) and use Intersection Approach. Overlapping parameter of IA on broad peaks is set at 0.8.

```
peak_calling:
    mode_choice: 'broad'    
    no_model: no             
    options: "--keep-dup all --broad-cutoff 0.01"
    cutoff: 0.01
    genomeSize: mm
    
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
296

297
compute_idr:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
298
    do: no
299
300
    rank: 'signal.value'
    thresh: 0.01
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
301
302
303
304


intersectionApproach: 
    do: yes
305
    ia_overlap: 0.8
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
306

307
308
```

309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
### Default mode for cut&run

With cut&run data, make deduplication only on INPUT/IgG data (dedup_IP to False). Then perform a stringent peak calling with SEACR and use Intersection Approach. Overlapping parameter of IA on peaks is set at 0.8.

```
mark_duplicates:
    do: yes
    dedup_IP: 'False' 
    threads: 4

seacr:
    do: yes
    threshold: 'stringent'
    norm: 'norm'

compute_idr:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
325
    do: no
326
327
    rank: 'signal.value'
    thresh: 0.01
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
328
329
330
331


intersectionApproach: 
    do: yes
332
333
334
    ia_overlap: 0.8
```

335

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
336
## Run the pipeline on test data 
337
338
339
340
341

You need to have sra-toolkit installed before to download test data.

```
# Download genome references
342
343
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz
gunzip mm10.fa.gz
344
345
346
347
348
349


# get blacklisted regions
wget http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/mm10-mouse/mm10.blacklist.bed.gz
gunzip mm10.blacklist.bed.gz

350
351
352
353
# download bed file for GeneBody plot
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.refGene.gtf.gz
gunzip mm10.refGene.gtf.gz

354
355
# create genome directory
mkdir genome
356
mv mm10.* genome/
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378

# copy config file
cp test/config.yaml config/config.yaml
cp test/design.txt config/design.txt

# Download FastQ files from GEO (GSE99009) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99009
# Only include the H3K27ac histone mark and Klf4 transcription factor with their associated inputs for the shUbc9 and shCtrl conditions
SRR=("SRR5572646" "SRR5572647" "SRR5572658" "SRR5572659" "SRR5572668" "SRR5572669" "SRR5572676" "SRR5572677" "SRR5572652" "SRR5572653" "SRR5572664" "SRR5572665")
sample=("H3K27ac_shCtrl_Rep1_R1" "H3K27ac_shCtrl_Rep2_R1" "H3K27ac_shUbc9_Rep1_R1" "H3K27ac_shUbc9_Rep2_R1" "Klf4_shCtrl_Rep1_R1" "Klf4_shCtrl_Rep2_R1" "Klf4_shUbc9_Rep1_R1" "Klf4_shUbc9_Rep2_R1" "INPUT_shCtrl_Rep1_R1" "INPUT_shCtrl_Rep2_R1" "INPUT_shUbc9_Rep1_R1" "INPUT_shUbc9_Rep2_R1")

mkdir data
cd data
for i in  ${!SRR[*]} ; do
    echo ${SRR[$i]}, ${sample[$i]}
    prefetch ${SRR[$i]} -o ${sample[$i]}.sra
    fastq-dump ${sample[$i]}.sra
done

rm *.sra 
for file in *.fastq ; do 
    pigz $file ; 
done
379

380
381
382
```


Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
383
# Q&A
384

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
385
386
**What's a IP?**

387
- IP stand for ImmunoPrecipitation. The IP samples contains the DNA that bind the protein of interest (i.e. H3K4me1, CTCF, TFIIA). These samples has been cross-linked, sonicated and immuno-precipitated.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
388

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
389
**What's an INPUT?**
390
391
- INPUT stand for control DNA. The control DNA has been cross-linked and sonicated but not immuno-precipitated.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
392
**Replicate format in FASTQ file name?**
393
394
395
 - Rep1, rep1 and REP1 accepted, all samples must have the same format:
 - Rep1 and Rep2 works but Rep1 and REP2 doesn't

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
396
**What is the `IP_NAME` format columns? (i.e. What if I want to specify something more (species, incubation time...))**
Maëlle Daunesse's avatar
Maëlle Daunesse committed
397
 - Inside `IP_NAME` you can use "-" but do not "\_" because this is used to separate `MARK`, `COND`, `REP` and `MATE` from FASTQ filenames. For example: `TF4_Ctl-HeLa_rep1_R1.fastq.gz`
398

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
399
**Can I use relative path in config ?**
400
- yes, but you need to consider ePeak directory as origin.
401
 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
402
**What if I have 3 INPUT replicates?**
403
 - You can use ePeak with more than 2 replicates, replicates will be merged in a Maxi Pool.
404

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
405
**What if I have 3 IP replicate?**
406
407
 - The IDR for 3 IP replicates is not yet implemented.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
408
**What INPUT is used when I have 2 INPUTs?**
409
 - First INPUT could be used for peak calling on all IP to reduce technical variability, but each IP could be associated to a specific INPUT in the design file.
410

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
411
**What if I have no INPUT ?**
412
413
- The pipeline failed to run if no INPUT are done. Following ENCODE guidelines, control experiment must be performed and are used during peak calling.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
414
**What if I have one INPUT for one mark, and two for another ?**
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
415
- Experimental design with different numbers of replicates for IP and/or INPUT are included to ChIPflow. 
416

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
417
**What if I have one replicate for one mark ?**
418
419
- The pipeline will stop after peak calling, without selection of peaks as reproducibility cannot be ensure with only one replicate.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
420
**What are the default ADAPTERS for cutadapt?**
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
421
- By default, indexes from TruSeq and NEBNext Kits for Illumina are done in the config/adapt.fa.
422

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
423
**Can I give my own adapters list for cutadapt?**
424
 - In-house adapters could be added to the config/adapt.fa or in an independent file. Don't forget to change the value of `adapter_list` in cutadapt rule in `config.yaml` file.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
425

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
426
**I want to test other options for Bowtie2, how can I re-run the mapping step ?**
427
- As any Snakemake pipeline, you can force the re-calculation of one or more rule. After changing the options of `bowtie2_mapping` chunk, you can restart the pipeline adding this snakemake option : `--forcerun bowtie2_mapping`. All rules depending of `bowtie2_mapping` will be re-calculated.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
428

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
429
**Can I force the re-calculation of all the steps ?**
430
431
- Yes, you can add this snakemake option `--forceall` to force the execution of the first rule.

432
**Can I rename ePeak directory ?**
433
434
- yes, but must to be the same as in config.yaml (`analysis_dir`) or use relative path

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
435
**The pipeline fails because the IDR doesn't select enough reads?**
436
- If none or very few peaks pass the IDR, this means that there is to much variability between your replicates aka they are probably not good replicates. If you want to proceed anyway with the analysis, you can use the intersection approach (less stringent) instead of the IDR by setting `intersectionApproach:do` to 'yes'.
437

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
438
**When should I use PhantompeakCalling's fragment size estimation instead of MACS2's one?**
439
440
- If MACS2 cannot compute the fragment size estimation (or if you want), set `no_model` to yes, and the fragment length use for MACS2 will be the one computed by PhantompeakQualTools for each sample.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
441
**What if I don't know if my chromatim factor in narrow or broad?**
442
- The output directories names of peak Calling, peak reproducibility and differential analysis steps includes the peak calling mode name, the peak reproducibility method name and the normalisation and variance estimation method name. This allows ePeak to test multiple combinations of peak calling, peak reproducibility and differential analysis parameters without erasing any output.
443
- For example, if you have run the pipeline in narrow mode, and you want broad mode, you just need to modify the corresponding parameter in the configuration YAML file. The pipeline will then restart at the peak calling step and all the output will be stored in "06-PeakCalling/{}" directories.
444
- In case of unknown chromatin factor, we advice to run ePeak in narrow mode with IDR and IA, and afterward in broad mode. Results from narrow peak calling will be stored in "06-PeakCalling/Narrow" directory, and in "06-PeakCalling/Broad" for broad peak calling. 
445
446
447


---
Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
448
# Authors
449
450
451
452
453
454
455
456

* Rachel Legendre (@rlegendr)
* Maëlle Daunesse
* Adrien Pain
* Hugo Varet
* Claudia Chica


457
# How to cite ePeak ?
458
459

https://doi.org/10.1101/2021.02.02.429342