README.md 18.4 KB
Newer Older
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
1
2
3
4
5
<img align="left" width="100" height="100" src="images/EpiFlow_logo.png" alt="ChIPflow">

# ChIPflow: from replicated ChIP-seq raw data to epigenomic dynamics  


Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
6
7
8

[![Snakemake](https://img.shields.io/badge/snakemake-≥5.2.1-brightgreen.svg)](https://snakemake.readthedocs.io/en/stable/) [![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
9
## Table of content
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
10

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
11
[[_TOC_]] 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
12

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
13
# What is ChIPflow ?
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
14

15
16
17
ChIPflow is a snakemake-based workflow for the analysis of ChIP-seq data from raw FASTQ files to differential analysis of transcription factor binding or histone modification marking. It streamlines critical steps like the quality assessment of the immunoprecipitation using the cross correlation and the replicate comparison for both narrow and broad peaks. For the differential analysis ChIPflow provides linear and non linear methods for normalisation between samples as well as conservative and stringent models for estimating the variance and testing the significance of the observed differences (see [chipflowr](https://gitlab.pasteur.fr/hub/chipflowr)).

<img src="images/chipflow_pipeline.svg" width="700">
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
18
19


Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
20
21
22
23

# How to install ChIPflow ?

## Installation with singularity
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
24

25
Pre-required tools:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
26
27
28
29
30
31
- python >= 3.8
- snakemake >=4.8.0
- pandas
- pysam
- singularity

32
A tutorial to create a conda environment with all dependencies is available here : [env.sh](https://gitlab.pasteur.fr/hub/chipflow/-/blob/master/env.sh)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
33
34
35
    
Download the singularity container:
 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
36
` singularity pull --arch amd64 --name chipflow.img  library://rlegendre/default/chipflow:latest ` 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
37

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
38
## Manual installation 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
39

40
List of tools to install:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
41

42
43
44
45
46
- python >= 3.8
- snakemake >=4.8.0
- pandas
- pysam
- singularity
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
47
48
49
50
51
52
53
54
55
56
57
- cutadapt >= 3.2
- fastqc == 0.11.9
- samtools >= 1.10
- bowtie2 >= 2.3.5
- macs2 >= 2.2.7
- picard >= 2.18.25
- featureCounts (from subread) >= 2.0.0
- bedtools >= 2.27.1
- idr == 2.0.3
- R >= 4.0.3
- spp == 1.15.2
58
59
    - R cran packages: snow, snowfail, bitops, caTools, RCurl, Rcpp
    - R bioconductor packages: GenomeInfoDB, genomicRanges, Rsamtools
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
60
- [chipflowr](https://gitlab.pasteur.fr/hub/chipflowr)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
61

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
62
## Run it on IFB Core Cluster without any installation
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
63

64
* Load modules:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
65
66
67
68
69
70
71
72
73

```
module load snakemake/6.5.0
module load python/3.7
module load singularity
module load git-lfs/2.13.1
module load pysam
```

74
* Clone workflow:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
75
76
77
78
79
80

`git clone https://gitlab.pasteur.fr/hub/chipflow.git`

* Download singularity container:

```
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
81
82
cd chipflow
singularity pull --arch amd64 --name chipflow.img  library://rlegendre/default/chipflow:latest
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
83
84
```

85
Then, you can configure the workflow.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
86
87
88

# How to run ChIPflow ?

89
> Note: you can test the pipeline with an example datasets: [here](https://gitlab.pasteur.fr/hub/chipflow#run-the-pipeline-on-test-data)
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
90

91
## Usage
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
92

93
* Step 1: Download workflow
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
94
95
96

`git clone https://gitlab.pasteur.fr/hub/chipflow.git`

97
In any case, if you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository and, its DOI (see [above](https://gitlab.pasteur.fr/hub/chipflow#how-to-cite-chipflow-))
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
98

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
99
If you are using Singularity, you need to copy the Singularity image in the cloned ChIPflow directory.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
100

101
*  Step 2: Configure workflow
102

103
104
105
To configure your analysis, you need to fill 2 configuration files, one to specify the design experimental of you project and one to parameter each step of the pipeline (stored in `config/`):
- [config.yaml](https://gitlab.pasteur.fr/hub/chipflow#how-to-fill-the-design)
- [design.txt](https://gitlab.pasteur.fr/hub/chipflow#how-to-fill-the-design)
106

107
In addition, you can customize the MultiQC report by configuring this file: [multiqc_config.yaml](https://gitlab.pasteur.fr/hub/chipflow#how-to-fill-the-multiqc-config)
108

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
109
110
*  Step 3: Execute workflow

111
Test your configuration by performing a dry-run via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
112

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
113
`snakemake --use-singularity -n --cores 1`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
114

115
Execute the workflow locally using $N cores via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
116

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
117
`snakemake --use-singularity --singularity-args "-B '/home/login/'" --cores $N`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
118

119
Run it in a cluster environment via this standard command line:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
120

121
`snakemake --use-singularity --singularity-args "-B '$HOME'" --cluster-config config/cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads} " -j 200 --nolock --cores 1`
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
122

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
123
124
125
126
127
128
129
130
Or specifically on Slurm cluster:

`sbatch snakemake --use-singularity --singularity-args "-B '$HOME'" --cluster-config config/cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads} " -j 200 --nolock --cores $SLURM_JOB_CPUS_PER_NODE`

or on SGE cluster:

`snakemake --use-singularity --singularity-args "-B '$HOME'" --cluster-config config/cluster_config.json --cluster "qsub -cwd -V -b y -l h_vmem={cluster.ram} -pe [PE] {threads}" -j 200 --nolock --cores 1`

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
131

132
Visualize how the rules are connected via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
133
134
135

`snakemake -s Snakefile --rulegraph --nolock | dot -Tsvg > rulegraph.svg`

136
or how the files are processed via:
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
137
138
139
140
141
142
143
144
145
146
147

`snakemake -s Snakefile -j 10 --dag --nolock | dot -Tsvg > dag.svg`


### Rename FASTQ files

All FASTQ files have to observe the following name nomenclature: `MARK_COND_REP_MATE.fastq.gz`. For FASTQ files with only one mate (single-end sequencing), or if no replicates are available, set MATE to R1 or REP to Rep1, respectively.

| Wildcard |                          Description                                |
|----------|---------------------------------------------------------------------|
|   MARK   | Histone mark or Transcription Factor (TF) name (i.e. H3K4me1, Klf4) |
148
|   COND   | Biological condition name (i.e. Normal, Cancer, Cells)              |
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
149
150
151
152
153
|   REP    | Replicate number (i.e. Rep1 or Rep2)                                |
|   MATE   | Identification of mate pair sequencing (i.e. R1 or R2)              |

All the FASTQ files must be stored in the same directory.

154
155
156
157
158
159
Example of FASTQ file names:

- `H3K27ac_shCtrl_Rep1_R1.fastq.gz` and its corresponding INPUT file: `INPUT_shCtrl_Rep1_R1.fastq.gz`
- `TF4_HeLa_rep1_R1.fastq.gz` and its corresponding INPUT file: `Input_HeLa_rep1_R1.fastq.gz`
- `CTCF_WT_REP1_R1.fastq.gz` and its corresponding INPUT file: `INPUT_WT_REP1_R1.fastq.gz`

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
160
161
### How to fill the design

162
The experimental analysis design is summarised in a tabulated design file that the user have to fill before running the pipeline.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224

Design columns:

|    Column    |                          Description                          |
|--------------|---------------------------------------------------------------|
|   IP_NAME    | IP FASTQ files prefix  (i.e. MARK_COND1)                      |
|   INPUT_NAME | INPUT FASTQ files prefix (i.e. INPUT_COND1)                   |
|   NB_IP      | Number of replicates of the histone mark or TFs (i.e. 1 or 2) |
|   NB_INPUT   | Number of replicates of INPUT files (i.e. 1 or 2)             |

Link to an Example: [design.txt](https://gitlab.pasteur.fr/hub/chipflow/-/blob/master/test/design.txt)


### How to fill the config file

All the parameters to run the pipeline are gathered in a YAML config file that the user has to fill before running the pipeline. Here is an filled example: [config.yaml](https://gitlab.pasteur.fr/hub/chipflow/-/blob/master/test/config.yaml)

This config file is divided in 2 sections:

1. Section 1: Running pipeline chunks

This first chunk provides input information and assigns working directories. 
`input_dir` path to FASTQ files directory. 
`input_mate` mate pair format (i.e. `_R[12]` for *MATE* = R1 or R2) , must match the *MATE* parameter in FASTQ files.
`input_extension` filename extension format (i.e. `fastq.gz` or `fq.gz`).
`analysis_dir` path to analysis directory.
`tmpdir` path to temporary directory (i.e. `/tmp/` or `/local/scratch/`)

```
input_dir: /path/to/raw_data
input_mate: '_R[12]'
input_extension: '.fastq.gz'
analysis_dir: /path/to/directory/analysis
tmpdir: "/tmp/"
```

The design chunk aims to check that the FASTQ files name match the design file information. The `marks`, `conditions` and `replicates` parameters must respectively match the *MARK*, *COND* and *REP* parameters of the FASTQ files name and the design file. 
For spike-in data, set `spike` on "True" and provide the spike-in genome FASTA file path through the `spike_genome_file` parameter.

```
design:
    design_file: /path/to/directory/analysis/config/design.txt    
    marks: K4Me3
    condition: C, U
    replicates: Rep
    spike: false
    spike_genome_file: /path/to/genome/directory/dmel9.fa
```

2. Section 2: Rule parameters chunks

Each step has it proper setting in independent chunk.
The second section of the configuration file is divided in multiple chunks. Each chunk gather the parameters of one step of the pipeline: `adapters`, `bowtie2_mapping`, `mark_duplicates`, `remove_blacklist`, `peak_calling`, `compute_idr` and `differential_analysis`. 

The `options` parameter present in `adapters`, `bowtie2_mapping` and `peak_calling` allows you to provide any parameter recognised by cutadapt, bowtie2 and macs2 respectively. For example for the `bowtie2_mapping` chunk, `options` can be fill with "--very-sensitive".

```
bowtie2_mapping:
    options: "--very-sensitive"
    threads: 4
```

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
### How to fill the multiqc config

At the beginning of `config/multiqc_config.yaml` file, you have the possibility to customize header of MultiQC report according to your experiment as you can see below: 

```
# Title to use for the report.
title: "ChIP-seq analysis"
subtitle: "ChIP-seq analysis of CTCF factor in breast tumor cells"                  # Set your own text
intro_text: "MultiQC reports summarise analysis results produced from ChIPflow"     

# Add generic information to the top of reports
report_header_info:
    - Contact E-mail: 'rlegendre@pasteur.fr'                                        # Set your own text
    - Application Type: 'ChIP-seq'                                                  # Set your own text
    - Project Type: 'Differential peak expression'                                  # Set your own text
    - Sequencing Platform: 'NextSeq2000'                                            # Set your own text
    - Sequencing Setup: 'PE75'                                                      # Set your own text
```
<img src="images/multiqc_header.png" width="600">



## ChIPflow running modes
248
249
250
251
252
253
254
255
256
257
258

After the read pre-processing steps of the pipeline, MACS2 peak calling software is used to identify enriched regions. Several settings of MACS2 are possible: 
- To estimate the fragment size you can: use MACS2 model (default) or use PhantomPeakQualTool.
- To call the peaks you can either choose narrow or broad.

For the selection of reproducible peaks between replicates we implemented 2 methods:
- The IDR that is the recommended method by ENCODE to measure consistency between replicates for narrow peaks.
- The intersection approach that select the peaks that overlap (default:80%) between the replicates. We optimised this approach for broad peaks. Indeed, the IDR is not optimised for broad peaks. This because the broader size of the peaks induces a bigger position and signal variability making that very few peaks are passing the IDR.

In summary, the user as the possibility to choose multiple fragment size estimation method, multiple peak calling modes and multiple methods of selection of reproducible peaks between replicates. The following subsection shows example of running mode that we advise:

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
259
### Default narrow mode with IDR
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278

```
peak_calling:
    mode_choice: 'narrow'    
    no_model: no             
    options: "--keep-dup all "
    cutoff: 0.1
    genomeSize: mm


compute_idr:
    rank: 'signal.value'
    thresh: 0.05
    intersectionApproach: no
    ia_overlap: 0.8
```

> If none or very few peaks pass the IDR, this means that there is to much variability between your replicates aka they are probably not good replicates. If you want to proceed anyway with the analysis, you can use the intersection approach (less stringent) instead of the IDR by setting `intersectionApproach` to 'yes'.

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
279
### Default broad mode with intersection approach
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297

In broad mode, perform a stringent peak calling (cutoff of 0.01) and use Intersection Approach. Overlapping parameter of IA on broad peaks is set at 0.8.

```
peak_calling:
    mode_choice: 'broad'    
    no_model: no             
    options: "--keep-dup all --broad-cutoff 0.01"
    cutoff: 0.01
    genomeSize: mm
    
compute_idr:
    rank: 'signal.value'
    thresh: 0.01
    intersectionApproach: yes
    ia_overlap: 0.8
```

298

Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
299
## Run the pipeline on test data 
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341

You need to have sra-toolkit installed before to download test data.

```
# Download genome references
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.26_GRCm38.p6/GCF_000001635.26_GRCm38.p6_genomic.fna.gz
gunzip GCF_000001635.26_GRCm38.p6_genomic.fna.gz


# get blacklisted regions
wget http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/mm10-mouse/mm10.blacklist.bed.gz
gunzip mm10.blacklist.bed.gz

# create genome directory
mkdir genome
mv GCF_000001635.26_GRCm38.p6_genomic.fna genome/mm10.fa
mv mm10.blacklist.bed genome/mm10.blacklist.bed

# copy config file
cp test/config.yaml config/config.yaml
cp test/design.txt config/design.txt

# Download FastQ files from GEO (GSE99009) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99009
# Only include the H3K27ac histone mark and Klf4 transcription factor with their associated inputs for the shUbc9 and shCtrl conditions
SRR=("SRR5572646" "SRR5572647" "SRR5572658" "SRR5572659" "SRR5572668" "SRR5572669" "SRR5572676" "SRR5572677" "SRR5572652" "SRR5572653" "SRR5572664" "SRR5572665")
sample=("H3K27ac_shCtrl_Rep1_R1" "H3K27ac_shCtrl_Rep2_R1" "H3K27ac_shUbc9_Rep1_R1" "H3K27ac_shUbc9_Rep2_R1" "Klf4_shCtrl_Rep1_R1" "Klf4_shCtrl_Rep2_R1" "Klf4_shUbc9_Rep1_R1" "Klf4_shUbc9_Rep2_R1" "INPUT_shCtrl_Rep1_R1" "INPUT_shCtrl_Rep2_R1" "INPUT_shUbc9_Rep1_R1" "INPUT_shUbc9_Rep2_R1")

mkdir data
cd data
for i in  ${!SRR[*]} ; do
    echo ${SRR[$i]}, ${sample[$i]}
    prefetch ${SRR[$i]} -o ${sample[$i]}.sra
    fastq-dump ${sample[$i]}.sra
done

rm *.sra 
for file in *.fastq ; do 
    pigz $file ; 
done
```


Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
342
# Q&A
343

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
344
345
**What's a IP?**

346
- IP stand for ImmunoPrecipitation. The IP samples contains the DNA that bind the protein of interest (i.e. H3K4me1, CTCF, TFIIA). These samples has been cross-linked, sonicated and immuno-precipitated.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
347

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
348
**What's an INPUT?**
349
350
- INPUT stand for control DNA. The control DNA has been cross-linked and sonicated but not immuno-precipitated.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
351
**Replicate format in FASTQ file name?**
352
353
354
 - Rep1, rep1 and REP1 accepted, all samples must have the same format:
 - Rep1 and Rep2 works but Rep1 and REP2 doesn't

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
355
**What is the `IP_NAME` format columns? (i.e. What if I want to specify something more (species, incubation time...))**
Maëlle Daunesse's avatar
Maëlle Daunesse committed
356
 - Inside `IP_NAME` you can use "-" but do not "\_" because this is used to separate `MARK`, `COND`, `REP` and `MATE` from FASTQ filenames. For example: `TF4_Ctl-HeLa_rep1_R1.fastq.gz`
357

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
358
**Can I use relative path in config ?**
359
360
- yes, but you need to consider ChIPflow directory as origin.
 
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
361
**What if I have 3 INPUT replicates?**
362
363
 - You can use ChIPflow with more than 2 replicates, replicates will be merged in a Maxi Pool.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
364
**What if I have 3 IP replicate?**
365
366
 - The IDR for 3 IP replicates is not yet implemented.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
367
**What INPUT is used when I have 2 INPUTs?**
368
369
 - INPUT1 only (variabilité technique, on prend la même base de bruit)

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
370
**What if I have no INPUT ?**
371
372
- The pipeline failed to run if no INPUT are done. Following ENCODE guidelines, control experiment must be performed and are used during peak calling.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
373
**What if I have one INPUT for one mark, and two for another ?**
374
375
- We advice to run 2 different analysis. Experimental design with different numbers of replicates for IP or INPUT is not included for now.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
376
**What if I have one replicate for one mark ?**
377
378
- The pipeline will stop after peak calling, without selection of peaks as reproducibility cannot be ensure with only one replicate.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
379
380
**What are the default ADAPTERS for cutadapt?**
- By default, TruSeq and NEBNext Kits for Illumina are done in the config/adapt.fa.
381

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
382
**Can I give my own adapters list for cutadapt?**
383
 - In-house adapters could be added to the config/adapt.fa or in an independent file. Don't forget to change the value of `adapter_list` in cutadapt rule in `config.yaml` file.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
384

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
385
**I want to test other options for Bowtie2, how can I re-run the mapping step ?**
386
- As any Snakemake pipeline, you can force the re-calculation of one or more rule. After changing the options of `bowtie2_mapping` chunk, you can restart the pipeline adding this snakemake option : `--forcerun bowtie2_mapping`. All rules depending of `bowtie2_mapping` will be re-calculated.
Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
387

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
388
**Can I force the re-calculation of all the steps ?**
389
390
- Yes, you can add this snakemake option `--forceall` to force the execution of the first rule.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
391
**Can I rename ChIPflow directory ?**
392
393
- yes, but must to be the same as in config.yaml (`analysis_dir`) or use relative path

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
394
**The pipeline fails because the IDR doesn't select enough reads?**
395
396
- If none or very few peaks pass the IDR, this means that there is to much variability between your replicates aka they are probably not good replicates. If you want to proceed anyway with the analysis, you can use the intersection approach (less stringent) instead of the IDR by setting `intersectionApproach` to 'yes'.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
397
**When should I use PhantompeakCalling's fragment size estimation instead of MACS2's one?**
398
399
- If MACS2 cannot compute the fragment size estimation (or if you want), set `no_model` to yes, and the fragment length use for MACS2 will be the one computed by PhantompeakQualTools for each sample.

Rachel  LEGENDRE's avatar
Rachel LEGENDRE committed
400
**What if I don't know if my chromatim factor in narrow or broad?**
401
402
403
404
405
406
- The output directories names of peak Calling, peak reproducibility and differential analysis steps includes the peak calling mode name, the peak reproducibility method name and the normalisation and variance estimation method name. This allows ChIPflow to test multiple combinations of peak calling, peak reproducibility and differential analysis parameters without erasing any output.
- For example, if you have run the pipeline in narrow mode, and you want broad mode, you just need to modify the corresponding parameter in the configuration YAML file. The pipeline will then restart at the peak calling step and all the output will be stored in "06-PeakCalling/{}" directories.
- In case of unknown chromatin factor, we advice to run ChIPflow in narrow mode with IDR and IA, and afterward in broad mode. Results from narrow peak calling will be stored in "06-PeakCalling/Narrow" directory, and in "06-PeakCalling/Broad" for broad peak calling. 


---
Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
407
# Authors
408
409
410
411
412
413
414
415

* Rachel Legendre (@rlegendr)
* Maëlle Daunesse
* Adrien Pain
* Hugo Varet
* Claudia Chica


Rachel  LEGENDRE's avatar
add ToC    
Rachel LEGENDRE committed
416
# How to cite ChIPflow ?
417
418

https://doi.org/10.1101/2021.02.02.429342