Blaise Li authored
Currently goes from demultiplexing to mapping, via trimming and deduplicating.

The mapping is performed on 3 read types:
- adapt_nodedup (the adaptor was found, and the reads were trimmed but not deduplicated)
- adapt_deduped (the adaptor was found, and the reads were trimmed and deduplicated)
- noadapt_deduped (the adaptor was not found, and the reads were trimmed and deduplicated)

The trim_and_dedup script currently assumes that two low-diversity zones are present, and ignores them for deduplication:

NNNNNGCACTANNNWWW[YYYY]NNNN
1---5  : 5' UMI
6--11  : barcode (lower diversity)
12-14  : UMI
15-17  : AT (or GC?)-rich (low diversity)
[YYYY] : fragment
-4 -> -1 : 3' UMI

It may be a problem to deduplicate taking into account the end of the reads, which tends to be of lower quality: reads whose ends contain sequencing errors are not collapsed with their error-free duplicates, so they end up over-represented. That is why we decided to also look at the non-deduplicated reads.
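For illustration only, here is a minimal Python sketch (not the actual trim_and_dedup script) of how reads could be deduplicated on their informative positions: the 5' UMI (1-5), the middle UMI (12-14), the 3' UMI (last 4 bases) and the fragment, while ignoring the barcode (6-11) and the low-diversity zone (15-17). It assumes adaptor-trimmed reads whose last 4 bases are the 3' UMI; the position offsets, I/O and script name are assumptions, not the pipeline's real interface.

```python
#!/usr/bin/env python3
"""Illustrative UMI-aware deduplication sketch (not trim_and_dedup itself).

Assumes reads laid out as NNNNNGCACTANNNWWW[fragment]NNNN (see above),
read from stdin in FASTQ format, unique reads written to stdout.
"""

import sys
from itertools import islice


def dedup_key(seq):
    """Build a deduplication key from the informative parts of the read.

    Uses the 5' UMI (positions 1-5), the middle UMI (12-14), the fragment
    and the 3' UMI (last 4 bases); the barcode (6-11) and the low-diversity
    zone (15-17) are ignored.
    """
    umi5 = seq[0:5]
    umi_mid = seq[11:14]
    fragment = seq[17:-4]
    umi3 = seq[-4:]
    return (umi5, umi_mid, umi3, fragment)


def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ stream."""
    while True:
        chunk = list(islice(handle, 4))
        if len(chunk) < 4:
            return
        yield tuple(line.rstrip("\n") for line in chunk)


def main():
    seen = set()
    for header, seq, plus, qual in fastq_records(sys.stdin):
        key = dedup_key(seq)
        if key in seen:
            # Same UMIs and same fragment: treated as a PCR duplicate.
            continue
        seen.add(key)
        sys.stdout.write(f"{header}\n{seq}\n{plus}\n{qual}\n")


if __name__ == "__main__":
    main()
```

Because the key includes the 3' UMI and the fragment end, a sequencing error near the read's low-quality end makes the key unique and the read escapes collapsing, which is exactly the over-representation concern mentioned above.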