From 04c52c2ed528ded6e01b781626d6ed9ee58a76a1 Mon Sep 17 00:00:00 2001
From: Alexis CRISCUOLO <alexis.criscuolo@pasteur.fr>
Date: Sun, 5 Sep 2021 13:21:19 +0200
Subject: [PATCH] Update README.md

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 8984646..b39bc39 100644
--- a/README.md
+++ b/README.md
@@ -122,8 +122,7 @@ OPTIONS:
 
 * By default, _ROCK_ uses _k_-mers of length _k_ = 25 (option `-k`). Increasing this length is not recommanded when dealing with large FASTQ files (e.g. average coverage depth > 500x from genome size > 1 Gbps), as the total number of canonical _k_-mers can quickly grow, therefore implying a very large CMS (i.e. many hashing functions) to maintains low FPP (e.g. ≤ 0.05). Using small _k_-mers (e.g. _k_ < 21) is also not recommanded, as this can negatively affect the overall specificity (i.e. too many identical _k_-mers arising from different sequenced genome region).
 
-* All _ROCK_ steps are based on the usage of valid _k_-mers, i.e. _k_-mers that do not contain any undetermined base `N`. Valid _k_-mers can also be determined by bases associated to a Phred score greater than a specified threshold (option `-q`; Phred +33 offset, default: 0). A minimum number of valid _k_-mers can be specified to consider a SE/PE HTS read(s) (option `-m`; default: 1). All SE/PE HTS read(s) that do not contain enough valid _k_-mers are written into FASTQ file(s) with extension _.undetermined.fastq_.
-
+* All _ROCK_ steps are based on the usage of valid _k_-mers, i.e. _k_-mers that do not contain any undetermined base `N`. Valid _k_-mers can also be determined by bases associated to a Phred score greater than a specified threshold (option `-q`; Phred +33 offset, default: 0). A minimum number of valid _k_-mers can be specified to consider a SE/PE HTS read(s) (option `-m`; default: 1).
 
 ## References
 
-- 
GitLab