+# Hands-On: Annotation basics 
+## Installation
+Download the data from the server:
+`wget https://dl.pasteur.fr/fop/HJfzm2Py/ChIP_data.tar`
+Untar data:
+`tar xvf ChIP_data.tar`
+Download the reference genome files from the server:
+`wget https://dl.pasteur.fr/fop/lroDilwn/ReferenceGenome.tar`
+Untar the reference genome files:
+`tar xvf ReferenceGenome.tar`
+## get ePeak in your home directory
+* Load modules (ON CLUSTER ONLY):
+`module load snakemake/6.5.0`
+`module load python/3.7`
+`module load singularity`
+`module load git-lfs/2.13.1`
+`module load pysam`
+* Clone the workflow:
+`git clone https://gitlab.pasteur.fr/hub/ePeak.git`
+* Download the Singularity container:
+`cd ePeak`
+`singularity pull --arch amd64 --name epeak.img library://rlegendre/epeak/epeak:1.0`
+## configure ePeak
+Open the `config/config.yaml` and `config/design.txt` files.
+* **Design file:** tab-separated file with 4 columns:
+    - **Column 1**: name of the IP file
+    - **Column 2**: name of the corresponding INPUT file
+    - **Column 3**: replicate number of the IP file
+    - **Column 4**: replicate number of the corresponding INPUT file
+H3K27ac_shCtrl	INPUT_shCtrl	1	1
+H3K27ac_shCtrl	INPUT_shCtrl	2	1
+H3K27ac_shUbc9	INPUT_shUbc9	1	1
+H3K27ac_shUbc9	INPUT_shUbc9	2	1
+Klf4_shCtrl	INPUT_shCtrl	1	1
+Klf4_shCtrl	INPUT_shCtrl	2	2
+Klf4_shUbc9	INPUT_shUbc9	1	1
+Klf4_shUbc9	INPUT_shUbc9	2	2
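A minimal sketch of how a design file with this layout could be read and sanity-checked in Python. The function name `read_design` and the dictionary keys are illustrative assumptions; this is not ePeak's own code.

```python
import csv
import io

def read_design(handle):
    """Parse a 4-column tab-separated design file into a list of dicts."""
    rows = []
    for line_number, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip empty lines
        if len(fields) != 4:
            raise ValueError("line %d: expected 4 columns, got %d" % (line_number, len(fields)))
        ip_name, input_name, ip_rep, input_rep = fields
        rows.append({
            "ip": ip_name,
            "input": input_name,
            "ip_rep": int(ip_rep),
            "input_rep": int(input_rep),
        })
    return rows

# Two rows copied from the design file above
example = "H3K27ac_shCtrl\tINPUT_shCtrl\t1\t1\nKlf4_shUbc9\tINPUT_shUbc9\t2\t2\n"
design = read_design(io.StringIO(example))
for row in design:
    print(row["ip"], "rep", row["ip_rep"], "<->", row["input"], "rep", row["input_rep"])
```

To check a real file, the same function can be called on `open("config/design.txt")` instead of the in-memory example.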
+* **Config file:** YAML file containing the parameters of all tools.
+This file is divided into _chunks_. Each chunk corresponds to one step or one tool.
+The first chunk provides input information and assigns working directories.
+- `input_dir`: path to the directory containing the FASTQ files.
+- `input_mate`: mate pair format (e.g. `_R[12]` for *MATE* = R1 or R2); it must match the *MATE* part of the FASTQ file names.
+- `input_extension`: filename extension (e.g. `fastq.gz` or `fq.gz`).
+- `analysis_dir`: path to the analysis directory.
+- `tmpdir`: path to the temporary directory (e.g. `/tmp/`).
+    input_dir: ../ChIP_data
+    input_mate: '_R[12]'
+    input_extension: '.fastq.gz'
+    analysis_dir: $HOME #define for each user
+    tmpdir: $TMPDIR
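To see how `input_mate` and `input_extension` relate to file names, here is a small sketch that builds a regular expression from the two values and applies it to a few names. The file names are hypothetical, and the way the pattern is assembled is an assumption made for illustration, not ePeak's actual implementation.

```python
import re

input_mate = "_R[12]"          # value from the config chunk above
input_extension = ".fastq.gz"  # value from the config chunk above

# Assumption: a file is accepted when its name ends with <mate><extension>.
pattern = re.compile(
    "(?P<sample>.+)(?P<mate>" + input_mate + ")" + re.escape(input_extension) + "$"
)

# Hypothetical file names, for illustration only
names = [
    "H3K27ac_shCtrl_Rep1_R1.fastq.gz",
    "H3K27ac_shCtrl_Rep1_R2.fastq.gz",
    "Klf4_shCtrl_Rep1.fq.gz",
]

def split_name(name):
    """Return (sample, mate) if the name matches the config, else None."""
    match = pattern.match(name)
    if match is None:
        return None
    return match.group("sample"), match.group("mate")

for name in names:
    print(name, "->", split_name(name))
```

A name that returns `None` here (wrong extension or missing mate tag) is the kind of file that would need renaming, or a change to the two config values.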
+The design chunk is used to check that the FASTQ file names match the information in the design file. The `marks`, `condition` and `replicates` parameters must respectively match the *MARK*, *COND* and *REP* parts of the FASTQ file names and of the design file.
+For spike-in data, set `spike` to `true` and provide the path to the spike-in genome FASTA file through the `spike_genome_file` parameter.
+    design_file: config/design.txt    
+    marks: H3K27ac, Klf4
+    condition: shCtrl, shUbc9
+    replicates: Rep
+    spike: false
+    spike_genome_file: genome/dmel9.fa
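The consistency check described above can be sketched as follows: each IP name from the design file is split into its *MARK* and *COND* parts, which are compared with the values listed in the design chunk. The function name `check_ip_name` and the `MARK_COND` splitting rule are assumptions for illustration only.

```python
marks = ["H3K27ac", "Klf4"]        # from the design chunk
conditions = ["shCtrl", "shUbc9"]  # from the design chunk

def check_ip_name(ip_name):
    """Return a list of problems found in one IP name (empty list if none)."""
    problems = []
    # Assumption: IP names follow the MARK_COND layout seen in the design file.
    mark, sep, cond = ip_name.partition("_")
    if not sep:
        return ["%s: expected MARK_COND" % ip_name]
    if mark not in marks:
        problems.append("%s: unknown mark %r" % (ip_name, mark))
    if cond not in conditions:
        problems.append("%s: unknown condition %r" % (ip_name, cond))
    return problems

# The last name is deliberately absent from the config to show a failure.
for name in ["H3K27ac_shCtrl", "Klf4_shUbc9", "H3K4me3_shCtrl"]:
    print(name, check_ip_name(name) or "ok")
```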
+The genome chunk provides information about the reference genome: its directory, the name of the index and the path to the FASTA file.
+    index: yes
+    genome_directory: genome/
+    name: mm10
+    fasta_file: genome/mm10_chr1.fa
+The fastqc chunk sets the options for the quality control of the FASTQ files.
+    options: ''
+    threads: 4   
+The adapters chunk configures quality trimming and adapter removal with cutadapt. A list of common adapters is provided in the config directory and passed to cutadapt (`adapter_list`). The other parameters are then tuned to match the data.
+    remove: yes
+    adapter_list: file:config/adapt.fa
+    m: 25
+    mode: a
+    options: -O 6 --trim-n --max-n 1 
+    quality: 30
+    threads: 4
+The bowtie2_mapping chunk configures the mapping of reads against the genome file (provided by the genome chunk).
+    options: "--very-sensitive --no-unal"
+    threads: 4
+The mark duplicates chunk marks PCR duplicates in BAM files. For ChIP-seq data, both IP and INPUT need to be deduplicated, so the `dedup_IP` parameter is set to `True`.
+    do: yes
+    dedup_IP: 'True' 
+    threads: 4
+The remove_biasedRegions chunk removes biased genomic regions (previously called blacklisted regions).
+    do: yes
+    bed_file: genome/mm10.blacklist.bed
+    threads: 1
+To produce metaregion profiles, a coverage track must first be computed for each sample.
+For the value of `--effectiveGenomeSize`, see https://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html
+    do: yes
+    options: "--binSize 10 --effectiveGenomeSize 2652783500 --normalizeUsing RPGC" 
+    spike-in: no
+    threads: 4
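As a rough illustration of what `--normalizeUsing RPGC` does, the sketch below follows the deepTools description of 1x normalization: the average coverage is estimated as (mapped reads × fragment length) / effective genome size, and the coverage is rescaled so that this average becomes 1. The read count and fragment length are made-up numbers, and `rpgc_scale_factor` is a hypothetical helper, not a deepTools function.

```python
effective_genome_size = 2652783500  # value passed to --effectiveGenomeSize above

def rpgc_scale_factor(mapped_reads, fragment_length, genome_size=effective_genome_size):
    """Factor that rescales raw coverage to an average of 1x over the genome."""
    mean_coverage = mapped_reads * fragment_length / genome_size
    return 1.0 / mean_coverage

# Hypothetical library: 20 million mapped reads, 200 bp fragments
factor = rpgc_scale_factor(20000000, 200)
print("scale factor: %.3f" % factor)
```

A factor below 1 means the library covers the genome more than once on average, so the raw signal is scaled down; this is what makes tracks from libraries of different depths comparable.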
+Set `do` to `yes` in the geneBody chunk to produce metaregion profiles. This step needs a gene model file in BED format.
+    do: yes
+    regionsFileName: genome/mm10_chr1_RefSeq.bed
+    threads: 4
+For now, set `do` to `no` in all the following chunks.
+## run ePeak
+Test your configuration by performing a dry-run via:
+`snakemake --use-singularity -n --cores 1`
+Execute the workflow locally using $N cores via:
+`snakemake --use-singularity --singularity-args "-B '/home/'" --cores $N`
+To run it on a Slurm cluster:
+`sbatch snakemake --use-singularity --singularity-args "-B '$HOME'" --cluster-config config/cluster_config.json --cluster "sbatch --mem={cluster.ram} --cpus-per-task={threads} " -j 200 --nolock --cores $SLURM_JOB_CPUS_PER_NODE`
+## analyse QC reports
+### Look at MultiQC report
+- General statistics
+<img src="images/Multiqc_mainStats.png" width="1000" align="center" >
+- Mapping with bowtie2
+<img src="images/bowtie2_se_plot.png" width="700" align="center" > 
+- Deduplication with MarkDuplicates
+<img src="images/picard_deduplication.png" width="700" align="center" >
+- Fingerprint plot
+<img src="images/deeptools_fingerprint_plot.png" width="700" align="center" >
+### Look at 05-QC directory
+- Cross correlation
+<img src="images/H3K27ac_shCtrl_ppqt.png" width="700" align="center" > <img src="images/Klf4_shCtrl_ppqt.png" width="700" align="center" >
+- GeneBody plot/heatmap
+<img src="images/geneBodyplot.png" width="700" align="center" >
+Would you proceed with the analysis? Justify your answer.