Commit 53909846 authored 7 years ago by Blaise Li

Pipeline to process iCLIP data.

Currently goes from demultiplexing to mapping, via trimming and
deduplicating. The mapping is performed on 3 read types:
- adapt_nodedup (the adaptor was found, and the reads were trimmed but
  not deduplicated)
- adapt_deduped (the adaptor was found, and the reads were trimmed and
  deduplicated)
- noadapt_deduped (the adaptor was not found, and the reads were trimmed
  and deduplicated)

The trim_and_dedup script currently assumes that two low-diversity zones
are present, and ignores them for deduplication:

NNNNNGCACTANNNWWW[YYYY]NNNN
1---5 : 5' UMI
     6--11: barcode (lower diversity)
          12-14: UMI
            15-17: AT(or GC?)-rich (low diversity)
                [fragment]
                       -4 -> -1: 3' UMI
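
The layout above can be illustrated with a minimal Python sketch. This is
not the actual trim_and_dedup script: the function names (split_read,
dedup_key, deduplicate) are hypothetical, and it assumes that "ignoring"
the two low-diversity zones means leaving positions 6-11 and 15-17 out of
the key used to detect duplicates.

```python
def split_read(seq):
    """Split a read into the zones of the layout above.

    Hypothetical helper; positions in the comments are 1-based,
    as in the commit message.
    """
    if len(seq) < 17 + 4:
        raise ValueError("read too short for the assumed layout")
    return {
        "umi5": seq[0:5],       # positions 1-5: 5' UMI
        "barcode": seq[5:11],   # positions 6-11: barcode (low diversity)
        "umi_mid": seq[11:14],  # positions 12-14: UMI
        "low_div": seq[14:17],  # positions 15-17: low-diversity zone
        "fragment": seq[17:-4], # everything between the 5' part and the 3' UMI
        "umi3": seq[-4:],       # last 4 positions: 3' UMI
    }


def dedup_key(seq):
    """Build the deduplication key, leaving out both low-diversity zones.

    Assumption: the key is made of the three UMI parts and the fragment.
    """
    parts = split_read(seq)
    return (parts["umi5"], parts["umi_mid"], parts["fragment"], parts["umi3"])


def deduplicate(reads):
    """Keep the first read seen for each deduplication key."""
    seen = set()
    kept = []
    for seq in reads:
        key = dedup_key(seq)
        if key not in seen:
            seen.add(key)
            kept.append(seq)
    return kept
```

With this key, two reads that differ only in the barcode or in positions
15-17 count as duplicates, while a single mismatch in the 3' UMI is enough
to keep both reads.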

Taking the end of the reads into account for deduplication may be a
problem, because read ends tend to be of lower quality. A sequencing
error in the 3' UMI makes a duplicate look like a distinct read, so reads
with errors will be over-represented after deduplication. That is why we
decided to also look at the non-deduplicated reads.

parent eeec5bf1


Showing with 542 additions and 0 deletions