- Feb 24, 2020 (Blaise Li)
- Nov 18, 2019 (Blaise Li)
- Jun 08, 2018 (Blaise Li): When low-quality zones are kept for deduplication, they later need to be trimmed.
- May 18, 2018 (Blaise Li): The iCLIP data should now have better quality, and deduplication should be more efficient when those zones are taken into account.
- Feb 02, 2018 (Blaise Li): Currently goes from demultiplexing to mapping, via trimming and deduplicating. The mapping is performed on three read types:
  - adapt_nodedup (the adaptor was found, and the reads were trimmed but not deduplicated)
  - adapt_deduped (the adaptor was found, and the reads were trimmed and deduplicated)
  - noadapt_deduped (the adaptor was not found, and the reads were trimmed and deduplicated)

  The trim_and_dedup script currently assumes that two low-diversity zones are present, and ignores them for deduplication:

      NNNNNGCACTANNNWWW[YYYY]NNNN
      1-5  : 5' UMI
      6-11 : barcode (lower diversity)
      12-14: UMI
      15-17: AT (or GC?)-rich (low diversity)
      [fragment]
      -4 -> -1: 3' UMI

  It may be a problem to deduplicate taking the ends of the reads into account, since they tend to be of lower quality: reads with errors would then be over-represented. That is why we decided to also look at the non-deduplicated reads.
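A minimal Python sketch of how the zones described above might be separated when building a deduplication key. The helper `extract_umis` is hypothetical (the actual trim_and_dedup script may work differently); it follows the assumed layout and skips the barcode (positions 6-11) and the low-diversity zone (positions 15-17), as the message suggests.

```python
def extract_umis(read):
    """Split a read according to the assumed layout:
    5' UMI (1-5) + barcode (6-11) + UMI (12-14) + low-diversity zone
    (15-17) + fragment + 3' UMI (last 4 nt).

    Returns (dedup_key, fragment), where dedup_key concatenates only
    the UMI zones, skipping the two low-diversity zones.
    """
    umi5 = read[0:5]       # positions 1-5: 5' UMI
    umi_mid = read[11:14]  # positions 12-14: UMI
    # Positions 6-11 (barcode) and 15-17 (AT/GC-rich) have low
    # diversity and are ignored when building the deduplication key.
    fragment = read[17:-4]
    umi3 = read[-4:]       # last 4 nt: 3' UMI
    return umi5 + umi_mid + umi3, fragment
```

Reads sharing the same `dedup_key` and fragment would then be collapsed into one representative during deduplication.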
- Jan 18, 2018 (Blaise Li)
- Aug 03, 2017 (Blaise Li): The code in the snakefile produced weird results.
- Aug 02, 2017
- Aug 01, 2017 (Blaise Li)
- Apr 12, 2017 (Blaise Li): There are two deduplication flows: one for the reads with the adapter, and one for the reads without it.
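The split between the two flows could be sketched as follows. This is a hypothetical simplification, not the pipeline's actual code: `split_by_adapter` uses exact substring matching, whereas real adapter trimmers typically tolerate mismatches and partial matches.

```python
def split_by_adapter(reads, adapter):
    """Route each read into one of the two deduplication flows,
    depending on whether the adapter sequence is found in it.
    Reads containing the adapter are trimmed at its first occurrence.
    """
    with_adapt, without_adapt = [], []
    for read in reads:
        pos = read.find(adapter)
        if pos == -1:
            without_adapt.append(read)
        else:
            with_adapt.append(read[:pos])
    return with_adapt, without_adapt
```

Each of the two returned lists would then go through its own deduplication step, matching the adapt/noadapt read types mentioned in the Feb 02, 2018 entry.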