Commit 53909846 authored by Blaise Li

Pipeline to process iCLIP data.

Currently goes from demultiplexing to mapping, via trimming and
deduplicating. The mapping is performed on 3 read types:
- adapt_nodedup (the adaptor was found, and the reads were trimmed but
  not deduplicated)
- adapt_deduped (the adaptor was found, and the reads were trimmed and
  deduplicated)
- noadapt_deduped (the adaptor was not found, and the reads were trimmed
  and deduplicated)

The trim_and_dedup script currently assumes that two low-diversity zones
are present, and ignores them for deduplication:

NNNNNGCACTANNNWWW[YYYY]NNNN
1---5 : 5' UMI
     6--11: barcode (lower diversity)
          12-14: UMI
            15-17: AT(or GC?)-rich (low diversity)
                [fragment]
                       -4 -> -1: 3' UMI
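
The layout above can be turned into a deduplication key by dropping the
two low-diversity zones and keeping everything else. The Python below is
only a minimal sketch under that assumption, not the actual
trim_and_dedup script: the names dedup_key and deduplicate are
hypothetical, reads are assumed to be plain sequence strings that still
carry the full 5' prefix and the 3' UMI, and the first read seen for each
key is the one kept.

```python
# Hypothetical sketch, not the actual trim_and_dedup script.
# Read layout (1-based positions, as in the diagram above):
#   1-5    5' UMI
#   6-11   barcode (low diversity, ignored)
#   12-14  UMI
#   15-17  low-diversity zone (ignored)
#   18...  fragment, followed by a 4-nt 3' UMI

# 17 nt of 5' prefix plus the 4-nt 3' UMI.
MIN_LEN = 17 + 4


def dedup_key(seq):
    """Return the part of the read used to detect duplicates."""
    if len(seq) < MIN_LEN:
        raise ValueError("read shorter than the assumed layout: %r" % seq)
    # 0-based slices: [0:5] is the 5' UMI, [11:14] the internal UMI,
    # [17:] the fragment together with the 3' UMI.
    return seq[0:5] + seq[11:14] + seq[17:]


def deduplicate(reads):
    """Keep the first read seen for each deduplication key."""
    seen = set()
    kept = []
    for seq in reads:
        key = dedup_key(seq)
        if key not in seen:
            seen.add(key)
            kept.append(seq)
    return kept
```

With this key, two reads that differ only within positions 6-11 or 15-17
count as duplicates, while a single mismatch anywhere else, including
the 3' UMI, makes them distinct.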

It may be a problem to deduplicate taking into account the end of the
reads, which tends to be of lower quality. Reads with sequencing errors
there do not collapse with their error-free duplicates, so they will be
over-represented. That is why we decided to also look at the
non-deduplicated reads.
parent eeec5bf1