README.md 23.2 KB
Newer Older
Alexis  CRISCUOLO's avatar
Alexis CRISCUOLO committed
1 2
# SAM2MSA

Alexis  CRISCUOLO's avatar
0.01  
Alexis CRISCUOLO committed
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383
The package _SAM2MSA_ is a set of command line programs written in [Java](https://docs.oracle.com/javase/8/docs/technotes/guides/language/index.html) to build multiple sequence alignments (MSA) from read alignments in [SAM format](https://samtools.github.io/hts-specs/SAMv1.pdf).
_SAM2MSA_ tools are specifically designed to ease the building of MSA that are well-suited for phylogenetic inference.

_SAM2MSA_ is composed of three tools:
* _SAM2MAP_ to build a MAP file (i.e. a tab-delimited read alignment summary) from SAM-formatted read alignments,
* _MAP2FASTA_ to convert a MAP file into a FASTA-formatted consensus sequence file,
* _FASTA2MSA_ to pool different FASTA-formatted consensus sequence files into a unique MSA.

No need to sort or convert read alignments: _SAM2MSA_ only needs standard SAM-formatted data to quickly infer consensus sequences and MSA that can be directly used for phylogenetic analyses.



## Compilation and execution

The source code of _SAM2MSA_ is inside the _src_ directory and can be compiled and executed in two different ways. 

#### Building executable jar files

On computers with [Oracle JDK](http://www.oracle.com/technetwork/java/javase/downloads/index.html) (6 or higher) installed, Java executable jar files can be created. 
In a command-line window, go to the _src_ directory and type:
```bash
javac SAM2MAP.java 
echo Main-Class: SAM2MAP > MANIFEST.MF 
jar -cmvf MANIFEST.MF SAM2MAP.jar SAM2MAP.class 
javac MAP2FASTA.java 
echo Main-Class: MAP2FASTA > MANIFEST.MF 
jar -cmvf MANIFEST.MF MAP2FASTA.jar MAP2FASTA.class 
javac FASTA2MSA.java 
echo Main-Class: FASTA2MSA > MANIFEST.MF 
jar -cmvf MANIFEST.MF FASTA2MSA.jar FASTA2MSA.class 
rm MANIFEST.MF SAM2MAP.class MAP2FASTA.class FASTA2MSA.class 
```
This will create the three executable jar files `SAM2MAP.jar`, `MAP2FASTA.jar` and `FASTA2MSA.jar` that can be run with the following command line models:
```bash
java -jar SAM2MAP.jar   [options]
java -jar MAP2FASTA.jar [options]
java -jar FASTA2MSA.jar [options]
```

#### Building a native executable

On computers with [GraalVM](hhttps://www.graalvm.org/downloads/) installed, native executables can also be built. 
In a command-line window, go to the _src_ directory, and type:
```bash
javac SAM2MAP.java 
native-image SAM2MAP SAM2MAP
javac MAP2FASTA.java 
native-image MAP2FASTA MAP2FASTA
javac FASTA2MSA.java 
native-image FASTA2MSA FASTA2MSA
rm SAM2MAP.class MAP2FASTA.class FASTA2MSA.class 
```
This will create the three native executable `SAM2MAP`, `MAP2FASTA` and `FASTA2MSA` that can be launched with the following command line models:
```bash
./SAM2MAP   [options]
./MAP2FASTA [options]
./FASTA2MSA [options]
```

## Usage

### _SAM2MAP_

Run _SAM2MAP_ without option to read the following documentation:

```
 SAM2MAP

 SAM2MAP infers  a consensus  sequence from  read alignments  against a reference
 sequence in SAM format. The inferred consensus sequence has always the same size
 as the  reference one.  At each  position,  the inferred character  state is the
 the majority-rule one  within the aligned reads (option -f).  For each position,
 the inferred character state is associated to a map code:
   U  under-covered position (options -p or -c)
   u  position neighboring map code 'U'
   O  over-covered position (options -p or -C)
   o  position neighboring map code 'O'
   S  strand-biased position (option -s)
   X  position within SNP-rich or SNP-poor regions (options -x and -w)
   M  unbiased position
 Inferred  character  states  associated  to the  map code  'U' are  always  '?'.
 Inferred character states that differ  from the reference ones and associated to
 the map codes  'u', 'O', 'o' or 'S' can  be replaced by  'X' or not (option -m).
 Inferred character states  associated to the  map code 'X'  are replaced  by 'x'
 (options -x and -w).
 The main output file is  a map file that summarizes  the read alignments against
 the reference sequence. The inferred sequence is also written in FASTA format.

 USAGE: SAM2MAP [-i SAMFILE] [-r REFFILE] [-o BASENAME] ...

 GENERAL OPTIONS:

   -i FILE      read alignment in SAM format; set "-" to read from standard input
                (mandatory)
   -r FILE      reference sequence(s)  in FASTA format;  should contain  at least
                one sequence used for the read alignment (mandatory)
   -o BASENAME  basename for output files (mandatory)
   -n STRING    name of the inferred sequence;  when set,  a unique sequence will
                be written in a FASTA file with the specified name in the header
   -v           verbose mode

 READ ALIGNMENT:

   -Q INTEGER   minimum  allowed  Phred score;  sequenced bases  associated  to a
                Phred  score  smaller  than  the   specified  threshold  are  not
                considered (default: 20)
   -q INTEGER   minimum allowed mapping Phred score;  aligned reads associated to
                a Phred  score  smaller  than the  specified  threshold  are  not
                considered (default: 20)

 READ COVERAGE:

   -p NUMBER    p-value  to  determine the  coverage  depth  confidence interval;
                after  estimating  a  theoretical   coverage  depth  distribution
                (Poisson  + NB)  from the  read alignments,  the lower  and upper
                coverage bounds  are determined  by the  CDF NB values p and 1-p,
                respectively (default: 0.005)
   -c INTEGER   coverage depth  lower bound;  if the  number of  aligned reads is
                smaller  than  this  threshold,  the  corresponding  position  is
                considered as  under-covered and  associated  to the map code 'U'
                (default: estimated from the data via option -p, but at leat 10)
   -C INTEGER   coverage depth  upper bound;  if the  number of  aligned reads is
                larger  than  this  threshold,   the  corresponding  position  is
                considered as  over-covered  and associated  to the  map code 'O'
                (default: estimated from the data via option -p)
   -s INTEGER   minimum number of  reads for each strand  to trust a position; if
                if a position  does not verify  this condition,  it is considered
                as strand-biased and associated to the map code 'S' (default: 5)

 CHARACTER STATE INFERENCE:

   -f NUMBER    minimum proportion  to infer  the majority-rule  character state;
                at each position,  the set of  most occuring aligned  nucleotides
                is built  up to  this threshold  and the  corresponding character
                state is inferred from the one(s) in this set (default: 0.8)
   -m STRING    allowing mismatch in biased positions; when an inferred character
                state differs  from the  reference one  and is  associated to map
                codes  'u',  'O',  'o' or  'S', it is replaced by 'X' by default;
                this can  be relaxed  by specifying  the  map  code(s) for  which
                mismatches are allowed (default: "M")

 SNP DENSITY:

   -x NUMBER    threshold to determine  whether a position  does not belong  to a
                clonal SNP  density region;  after estimating  a theoretical  SNP
                density distribution  (NB) using  a sliding  window  (option -w),
                each position is associated  to an index that  is close to 0 when
                it belongs to  a region containing  significantly too few  or too
                many SNP;  every  position  with index  lower than  the specified
                threshold (e.g. 0.01) is associated to map codes 'X' and replaced
                by 'x' (default: 0)
   -w INTEGER   window size to assess clonal SNP density (default: 1000)

```

#### Notes on _SAM2MAP_

* As most read mapping programs (e.g. [BWA](http://bio-bwa.sourceforge.net/), [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml), [minimap2](https://github.com/lh3/minimap2)) output the read alignments in SAM format, _SAM2MAP_ can be directly used together with any of them without conversion, compression or sorting step.

* By default, all sequenced bases with Phread score < 20 are not considered (option -q), therefore minimizing the impact of sequencing errors when computing the consensus sequence. By default, all read alignment with Phred score < 20 (as assessed by the read mapping program) are also not considered (option -Q), therefore discarding from the consensus sequence every region with low mappability (e.g. low complexity or repeated regions).

* For each position of the specified reference, _SAM2MAP_ summarizes the corresponding aligned bases in a tab-delimited MAP file. _SAM2MAP_ also estimates a Poisson+Negative Binomial (NB) theoretical distribution from the observed read coverage distribution, and writes the results into an output file (.cov.txt); the Poisson+NB distribution is used to determine the min/max coverage depths to assess reference regions where the consensus sequence can be accurately built.

* For most positions (e.g. associated with the map code 'M' in the assessed min/max read coverage regions), the inferred consensus character state is the majority one(s) observed in at least 80% of the aligned bases (option -f). Of note, when more than one character state is required to determine at least 80% of the aligned ones, the corresponding degenerate nucleotide is inferred in the consensus sequence. This can be useful to build consensus sequences corresponding to quasi-species.

* As the main aim of _SAM2MSA_ is to quickly build MSA from read alignments, it should be stressed that insertions are never considered, therefore guaranteeing that the consensus sequence will have the same length as the the reference, which is expected to quickly build an MSA by pooling all consensus sequences.

* Basically, running _SAM2MAP_ with default options is sufficient to build a MAP file and its associated FASTA-formatted consensus sequence in many cases. However, if the consensus sequence is found to be incorrect, alternative ones can be quickly built by using _MAP2FASTA_ directly on the MAP file.

* The reference file used by _SAM2MAP_ (option -r) is not expected to be the same as the one used to align reads, but it should contain at least one common sequence (identified by its FASTA header). This can be useful to infer a specific consensus sequence (e.g. a given chromosome) from the read alignments against a complete genome (e.g. multiple chromosomes/replicons).



### _MAP2FASTA_

Run _MAP2FASTA_ without option to read the following documentation:

```
 MAP2FASTA

 MAP2FASTA infers a consensus sequence from a MAP file.

 USAGE: MAP2FASTA [-i MAPFILE] ...

 GENERAL OPTIONS:

   -i FILE      tab-delimited  MAP file;  set "-"  to read from  standard input
                (mandatory)
   -n STRING    name of the inferred sequence;  when set,  a unique sequence will
                be written in a FASTA file with the specified name in the header
   -v           verbose mode

 READ COVERAGE:

   -p NUMBER    p-value  to  determine the  coverage  depth  confidence interval;
                after  estimating  a  theoretical   coverage  depth  distribution
                (Poisson  + NB)  from the  read alignments,  the lower  and upper
                coverage bounds  are determined  by the  CDF NB values p and 1-p,
                respectively (default: 0.005)
   -c INTEGER   coverage depth  lower bound;  if the  number of  aligned reads is
                smaller  than  this  threshold,  the  corresponding  position  is
                considered as  under-covered and  associated  to the map code 'U'
                (default: estimated from the data via option -p, but at leat 10)
   -C INTEGER   coverage depth  upper bound;  if the  number of  aligned reads is
                larger  than  this  threshold,   the  corresponding  position  is
                considered as  over-covered  and associated  to the  map code 'O'
                (default: estimated from the data via option -p)
   -s INTEGER   minimum number of  reads for each strand  to trust a position; if
                if a position  does not verify  this condition,  it is considered
                as strand-biased and associated to the map code 'S' (default: 5)

 CHARACTER STATE INFERENCE:

   -f NUMBER    minimum proportion  to infer  the majority-rule  character state;
                at each position,  the set of  most occuring aligned  nucleotides
                is built  up to  this threshold  and the  corresponding character
                state is inferred from the one(s) in this set (default: 0.8)
   -m STRING    allowing mismatch in biased positions; when an inferred character
                state differs  from the  reference one  and is  associated to map
                codes  'u',  'O',  'o' or  'S', it is replaced by 'X' by default;
                this can  be relaxed  by specifying  the  map  code(s) for  which
                mismatches are allowed (default: "M")

 SNP DENSITY:

   -x NUMBER    threshold to determine  whether a position  does not belong  to a
                clonal SNP  density region;  after estimating  a theoretical  SNP
                density distribution  (NB) using  a sliding  window  (option -w),
                each position is associated  to an index that  is close to 0 when
                it belongs to  a region containing  significantly too few  or too
                many SNP;  every  position  with index  lower than  the specified
                threshold (e.g. 0.01) is associated to map codes 'X' and replaced
                by 'x' (default: 0)
   -w INTEGER   window size to assess clonal SNP density (default: 1000)
```

#### Notes on _MAP2FASTA_

* The program _MAP2FASTA_ has broadly the same options as _SAM2MAP_. When the consensus sequence built using _SAM2MAP_ is found to be incorrect (e.g. not enough mismatches in comparison with the reference), alternative ones can be built using _MAP2FASTA_ directly on the MAP file generated by _SAM2MAP_. As a MAP file (e.g. one line per reference position) is smaller than a SAM file (e.g. one line per sequenced read), _MAP2FASTA_ runs faster than _SAM2MAP_ for the same aim.

* Of note, output files written by _MAP2FASTA_ are derived from the input file name.


### _FASTA2MSA_

Run _FASTA2MSA_ without option to read the following documentation:

```
 FASTA2MSA

 FASTA2MSA combines  the different  consensus sequences  estimated by  SAM2MAP or
 MAP2FASTA into a multiple sequence alignment (MSA) in FASTA format.

 USAGE: FASTA2MSA [-i INFILE] [-o BASENAME] ...

 GENERAL OPTIONS:

   -i FILE      text file containing one line per taxon, each line containing the
                taxon  name  followed  by  the  corresponding  FASTA  file  name,
                separated by blank space(s) or tabulation(s) (mandatory)
   -o BASENAME  basename for output files (mandatory)
   -v           verbose mode

 CHARACTER FILTERING:

   -u STRING    the character state(s) to be considered as unknown ones (default:
                "xXnN-?")
   -p NUMBER    the maximum  allowed frequency  of unknown  character states  per
                aligned character (default: 0.5)
   -f STRING    the  character  state(s)  to be  considered  as  forbidden;  each
                character containing at least one  forbidden character state will
                be discarded (default: "")
   -V           when set, only variable characters will be selected (default: not
   -s NUMBER    discarding all characters before the specified position (default:
                not set)
   -e NUMBER    discarding all characters  after the specified position (default:
                not set)

 CODING REGIONS:

   -g FILE      GFF3 file containing  annotations of the  reference sequence used
                for creating  the input  FASTA files  (same contig  order as  the
                FASTA reference sequence file)
   -1           only codon positions 1 will be selected  (requires option -g; can
                be combined with options -2 and -3)
   -2           only codon positions 2 will be selected  (requires option -g; can
                be combined with options -1 and -3)
   -3           only codon positions 3 will be selected  (requires option -g; can
                be combined with options -1 and -2)
   -G NUMBER    genetic code  identifier according  to the translation  tables at
                www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi (default: 1)

```

#### Notes on _FASTA2MSA_

* _FASTA2MAP_ pools the consensus sequences generated by _SAM2MAP_ and/or _MAP2FASTA_ into a multiple sequence alignment (MSA). It is therefore expected that the input FASTA files (option -i) contain sequences of identical lengths and with comparable FASTA headers.

* Regions containing too many unknown character states (e.g. '?', 'N', 'X', gaps) can be filtered out (option -p; default 50%). 

* Variable characters in the MSA are summarized in a tab-delimited file (.var.tsv). A GFF3-formatted annotation file associated to the reference can be used (option -g) to determine variable characters that are synonymous/non-synonymous within the MSA (different genetic codes available using option -G). To obtain the list of all variable characters, use option -p 1.

* A MSA corresponding to a specific region of the reference can be obtained using option(s) -s and/or -e.

* The FASTA-formatted MSA file generated by _FASTA2MSA_ can be directly used to perform a phylogenetic analysis of the sequenced samples. Of note, recombination-purged MSA can be obtained by filtering out from the different consensus sequences the regions that contain significantly too many or not enough mismatches when compared to the reference sequence. This can be obtained when using _SAM2MAP_ and/or _MAP2FASTA_ (options -x and -w).



## Example

The directory _example/_ contains different files associated to the study of [Seth-Smith et al. (2019)](https://wwwnc.cdc.gov/eid/article/25/6/17-2119_article) describing an outbreak of _Burkholderia stabilis_ identified among hospitalized patients across several cantons in Switzerland during 2015-2016.
The different _SAM2MSA_ programs can be used to build a MSA from 38 isolates sequenced in the context of the [Seth-Smith et al. (2019)](https://wwwnc.cdc.gov/eid/article/25/6/17-2119_article) study.

#### Sample FASTQ files

The following command lines allow downloading from the [ENA](https://www.ebi.ac.uk/ena) ftp repository the 38 pairs of FASTQ files (Illumina MiSeq) associated to the 38 sequenced isolate genomes:
```bash
for err in ERR180650{8..9} ERR18065{10..34} ERR1810578 ERR18105{89..98}
do
  echo -n "$err ..."; 
  wget -q ftp.sra.ebi.ac.uk/vol1/fastq/${err:0:6}/00${err:9:1}/$err/$err""_1.fastq.gz & 
  wget -q ftp.sra.ebi.ac.uk/vol1/fastq/${err:0:6}/00${err:9:1}/$err/$err""_2.fastq.gz ; 
  wait ; 
  echo " [ok]";
done
```
Of note, faster downloading times can be obtained by using [axel](https://github.com/axel-download-accelerator/axel) instead of [wget](https://www.gnu.org/software/wget/): on computers with [axel](https://github.com/axel-download-accelerator/axel) installed, replace the two occurrences of `wget` by e.g. `axel -a -n 10`.

#### Reference FASTA and GFF3 files

The following command lines allow downloading from the [NCBI](https://www.ncbi.nlm.nih.gov/) ftp repository the FASTA and GFF3 files associated to the reference genome of [_B. stabilis_ CH16](https://www.ncbi.nlm.nih.gov/genome/45559?genome_assembly_id=413612) (3 chromosomes and 1 plasmid; see the corresponding [GenBank repository](ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/240/005/GCA_900240005.1)):
```bash
BASEURL="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/240/005/GCA_900240005.1_BStabCH16";
wget -q -O- $BASEURL/GCA_900240005.1_BStabCH16_genomic.fna.gz | zcat > CH16.fasta ;
wget -q -O- $BASEURL/GCA_900240005.1_BStabCH16_genomic.gff.gz | zcat > CH16.gff ;
```

#### _SAM2MAP_ on read alignments

The following command lines allow read mapping to be carried out against _CH16.fasta_ using [minimap2](https://github.com/lh3/minimap2) ([Li 2018](https://doi.org/10.1093/bioinformatics/bty19)) on 6 threads; the SAM-formatted read alignments are directly read via a pipe (|) by _SAM2MAP_ (default options) to obtain the MAP files and the FASTA-formatted consensus sequences:
```bash
for err in ERR180650{8..9} ERR18065{10..34} ERR1810578 ERR18105{89..98}
do
  minimap2 -ax sr -t 6 CH6.fasta $err""_1.fastq.gz $err""_2.fastq.gz 2>/dev/null | SAM2MAP -i - -r CH6.fasta -o $err -v ;
done
```
Faster running times can be obtained by using [minimap2](https://github.com/lh3/minimap2) on more threads (option -t).

The directory _example/_ contains the three output files written by _SAM2MAP_ for the sample ERR1806508:
* _ERR1806508.cov.txt_: the observed and theoretical mapping coverage distributions,
* _ERR1806508.map_: the MAP file summarizing the overall mapping for each reference position,
* _ERR1806508.fasta_: the consensus sequences in FASTA format.


#### _FASTA2MSA_ on consensus sequences

The following command line allows creating a MSA using _FASTA2MSA_ on the 38 generated consensus sequences (the input file _infile.txt_ is available in the directory _example/_):
```bash
FASTA2MSA -i infile.txt -o msa -g CH16.gff -v
```
The directory _example/_ contains the three output files written by _FASTA2MSA_:
* _msa.fasta_: a multiple sequence alignment of 8,371,282 aligned nucleotide characters in FASTA format,
* _msa.var.tsv_: a tab-delimited file summarizing the 489 variable characters inside _msa.fasta_.


#### Phylogenetic inference

A phylogenetic tree was inferred using [IQ-TREE 2](http://www.iqtree.org/) ([Minh et al. 2020](https://academic.oup.com/mbe/article/37/5/1530/5721363)) with evolutionary model HKY+F:
```bash
iqtree2 -s msa.fasta -T 12 -m HKY+F
```
Branch lengths of the ML tree (_msa.fasta.treefile_ in _example/_) were rescaled using the total number of aligned characters (i.e. 8,371,282) to estimate the number of SNPs on each branch with the following _awk_ one-liner:
```bash
awk 'BEGIN{s=8371282;p=0} {t=t$0} 
     END{while((c=substr(t,++x,1))!=";")
          if(c==":"){
           y=++x; while((c=substr(t,++y,1))!=","&&c!=")"){}
           t=substr(t,1,x-1)sprintf("%."p"f",s*substr(t,x,y-x))substr(t,y)} 
         print t}' msa.fasta.treefile > msa.nwk
```

Alexis  CRISCUOLO's avatar
0.01  
Alexis CRISCUOLO committed
384
The final phylogenetic tree (_msa.nwk_ in _example/_) is represented below (to be compared with Figure 2 in [Seth-Smith et al. 2019](https://wwwnc.cdc.gov/eid/article/25/6/17-2119_article)).
Alexis  CRISCUOLO's avatar
0.01  
Alexis CRISCUOLO committed
385

Alexis  CRISCUOLO's avatar
0.01  
Alexis CRISCUOLO committed
386 387
![CH16 SNP tree](example/msa.nwk.png "CH16 SNP tree")
<!--- <iframe width="800" height="900" src="https://phylogeny.io/share/0ad13a8140437565f9593fba1b04b36028dc882b"></iframe> --->
Alexis  CRISCUOLO's avatar
0.01  
Alexis CRISCUOLO committed
388 389 390 391 392 393 394 395 396 397 398
<!--- https://phylogeny.io/edit/0ad13a8140437565f9593fba1b04b36028dc882b/dcc6740bddd18a1f0f93c031ad38beaf5c17b4e6  --->

## References

Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. [doi:10.1093/bioinformatics/bty191](https://doi.org/10.1093/bioinformatics/bty19).

Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R (2020) IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution, 37(5):1530-1534. [doi:10.1093/molbev/msaa015](https://doi.org/10.1093/molbev/msaa015).

Seth-Smith HMB, Casanova C, Sommerstein R, Meinel DM, Abdelbary MMH, Blanc DS, Droz S, Führer U, Lienhard R, Lang C, Dubuis O, Schlegel M, Widmer A, Keller PM, Marschall J, Egli A (2019) Phenotypic and Genomic Analyses of Burkholderia stabilis Clinical Contamination, Switzerland. Emerging Infectious Diseases, 25(6):1084-1092. [doi:10.3201/eid2506.172119](https://doi.org/10.3201/eid2506.172119).