    Pipeline to process iCLIP data. · 53909846
    Blaise Li authored
    The pipeline currently goes from demultiplexing to mapping, via
    trimming and deduplication. Mapping is performed on three read types:
    - adapt_nodedup (the adaptor was found, and the reads were trimmed but
      not deduplicated)
    - adapt_deduped (the adaptor was found, and the reads were trimmed and
      deduplicated)
    - noadapt_deduped (the adaptor was not found, and the reads were trimmed
      and deduplicated)
    
    The trim_and_dedup script currently assumes that two low-diversity
    zones are present, and ignores them for deduplication:
    
    NNNNNGCACTANNNWWW[YYYY]NNNN
    1---5 : 5' UMI
         6--11: barcode (lower diversity)
              12-14: UMI
                15-17: AT(or GC?)-rich (low diversity)
                    [fragment]
                           -4 -> -1: 3' UMI
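
    The layout above can be sketched in Python as follows. This is only
    an illustration under stated assumptions, not the code of the
    trim_and_dedup script: the names split_read, dedup_key and
    deduplicate are hypothetical, the zone positions are taken to be
    fixed as in the diagram, and "ignoring" a zone is interpreted as
    leaving it out of the key used to recognize duplicates.

```python
from typing import Dict, Iterable, List

# 0-based slices for the 5' layout NNNNN GCACTA NNN WWW (assumed fixed).
UMI5 = slice(0, 5)       # positions 1-5: 5' UMI
BARCODE = slice(5, 11)   # positions 6-11: barcode (lower diversity)
UMI_MID = slice(11, 14)  # positions 12-14: UMI
LOW_DIV = slice(14, 17)  # positions 15-17: AT-rich zone (low diversity)
FIVE_PRIME_LEN = 17      # total length of the 5' zones
UMI3_LEN = 4             # last 4 positions: 3' UMI
MIN_LEN = FIVE_PRIME_LEN + UMI3_LEN


def split_read(read: str) -> Dict[str, str]:
    """Cut a read into the zones of the diagram (hypothetical helper)."""
    if len(read) < MIN_LEN:
        raise ValueError(f"read shorter than {MIN_LEN} nt: {read!r}")
    return {
        "umi5": read[UMI5],
        "barcode": read[BARCODE],
        "umi_mid": read[UMI_MID],
        "low_div": read[LOW_DIV],
        "fragment": read[FIVE_PRIME_LEN:len(read) - UMI3_LEN],
        "umi3": read[len(read) - UMI3_LEN:],
    }


def dedup_key(read: str) -> str:
    """Key for duplicate detection: the read minus both low-diversity zones."""
    zones = split_read(read)
    return zones["umi5"] + zones["umi_mid"] + zones["fragment"] + zones["umi3"]


def deduplicate(reads: Iterable[str]) -> List[str]:
    """Keep the first read seen for each key, preserving input order."""
    first_seen: Dict[str, str] = {}
    for read in reads:
        first_seen.setdefault(dedup_key(read), read)
    return list(first_seen.values())
```

    With this sketch, two reads that differ only in the barcode or in
    the 15-17 zone collapse into one, while a difference in any UMI or
    in the fragment keeps them apart.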
    
    Deduplicating on the full read, including its end, may be a problem:
    read ends tend to be of lower quality, so copies carrying sequencing
    errors there are not recognized as duplicates and end up
    over-represented. That is why we decided to also look at the
    non-deduplicated reads.
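
    The over-representation concern can be illustrated with a toy
    example. The sequences and the error_fraction helper below are made
    up for illustration, and deduplication is simplified to collapsing
    exact sequence matches.

```python
from typing import List


def error_fraction(reads: List[str], true_seq: str) -> float:
    """Fraction of reads that differ from the error-free sequence."""
    return sum(read != true_seq for read in reads) / len(reads)


# One molecule amplified into four copies (made-up sequence).
true_seq = "ACGTAGCACTATTGATAGGGCCCGGCATG"
# Same molecule with a sequencing error in the last base (G -> A).
erroneous = true_seq[:-1] + "A"

raw = [true_seq, true_seq, true_seq, erroneous]

# Exact-match deduplication: the erroneous copy survives as its own read.
deduped = list(dict.fromkeys(raw))

raw_fraction = error_fraction(raw, true_seq)          # 1 read out of 4
deduped_fraction = error_fraction(deduped, true_seq)  # 1 read out of 2
```

    Here the erroneous read goes from a quarter of the raw reads to
    half of the deduplicated ones, which is the kind of inflation that
    comparing against the non-deduplicated reads can help detect.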