django-diu

  • Forked from Hervé MENAGER / django-diu
    Source project has a limited visibility.

    Beautiful Pattern Matching (BPM)

    BPM is software for exact pattern matching on DNA sequences. It can be seen as a multi-pattern "grep" for DNA.

    First, the tool preprocesses a set of DNA patterns to create an index. Then, the index is used to search for all the patterns in one or more genomes. The tool is very fast: it can process a typical bacterial genome, looking for hundreds of thousands of patterns, in less than a second using a single thread on a desktop computer.

    Compile and run

    Compile BPM

    	cmake .
    	make

    All generated binaries are placed in the bin folder.

    Pre-processing and indexing alleles

    Index specific kmers in a binary file
    Indexing takes two steps:

    • Run the KMC tools to build an index of all kmers of size 32 found in the alleles, with their frequencies:
    	kmc -k32 -ci0 -t<threads> -fm alleles.fasta alleles.dump /tmp/
    	kmc_dump alleles.dump alleles.kmc

    To install KMC, follow the instructions in its GitHub repository: https://github.com/refresh-bio/KMC/

    • Execute our indexing tool:
    	./bin/bpm index -kmc alleles.kmc -alleles alleles.fasta -bin alleles.bin
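
    To make the KMC step concrete, here is a minimal Python sketch of what it computes: a count for every kmer in the allele sequences. It is a pure-Python stand-in, not KMC itself. The function names (count_kmers, parse_kmc_dump) are hypothetical, and the sketch assumes that kmers are counted in canonical form (KMC's default when -b is not given), that windows containing non-ACGT symbols are skipped, and that the text dump holds one kmer and its count per line, separated by whitespace.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def count_kmers(sequences, k=32, canonical=True):
    """Count every kmer of length k across the given sequences.

    With canonical=True, a kmer and its reverse complement are counted
    together under whichever of the two sorts first (assumed to mirror
    KMC's default behaviour).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip windows containing N or other symbols
            if canonical:
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return counts


def parse_kmc_dump(lines):
    """Parse 'kmer<whitespace>count' lines into a dict (assumed dump format)."""
    counts = {}
    for line in lines:
        line = line.strip()
        if line:
            kmer, count = line.split()
            counts[kmer] = int(count)
    return counts
```

    For example, count_kmers(["ACGTA", "ACGGG"], k=3) counts ACG three times, because CGT is the reverse complement of ACG and is folded into the same canonical kmer.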

    The indexing step produces a binary file containing the rarest kmers of each allele, selected using the kmer counts and the allele FASTA sequences. The binary file also contains a Bloom filter built from the selected kmers.
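
    The idea of keeping rare kmers plus a Bloom filter can be sketched in Python as follows. This is an illustration under stated assumptions, not BPM's actual code or file format: it keeps a single rarest kmer per allele, counts kmers on the forward strand only, and uses a toy Bloom filter. All names (BloomFilter, rarest_kmer, build_index) and the filter parameters are hypothetical.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """A small Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item.encode(), digest_size=8,
                                     salt=seed.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def rarest_kmer(allele, counts, k):
    """Return (offset, kmer) for the allele's kmer with the lowest count."""
    candidates = [(counts[allele[i:i + k]], i)
                  for i in range(len(allele) - k + 1)]
    _, offset = min(candidates)  # ties go to the leftmost offset
    return offset, allele[offset:offset + k]


def build_index(alleles, k):
    """Pick one rare seed kmer per allele and register it in a Bloom filter."""
    # Forward-strand counts over all alleles (a stand-in for the KMC counts).
    counts = Counter(seq[i:i + k]
                     for seq in alleles.values()
                     for i in range(len(seq) - k + 1))
    seeds, bloom = {}, BloomFilter()
    for name, seq in alleles.items():
        offset, kmer = rarest_kmer(seq, counts, k)
        seeds[name] = (offset, kmer)
        bloom.add(kmer)
    return seeds, bloom
```

    Choosing rare kmers as seeds keeps the number of candidate alleles per seed small, and the Bloom filter lets a scanner discard most kmers of a genome before any lookup in the larger structure.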

    Matching alleles on sequences:

    	./bin/bpm match -bin <alleles.bin> -sequences <sequences.fasta>

    Arguments:

    • -b or -bin: A binary file generated by the indexing step. This file contains the relevant kmers, a Bloom filter and the allele sequences.
    • -s or -sequences: A FASTA file containing the sequence(s) in which you want to find the alleles.
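
    As a rough illustration of how a kmer index can drive exact matching, the Python sketch below uses a seed-and-verify scheme: every kmer of the sequence is looked up in a seed table, and each hit is confirmed by comparing the full allele at the implied position. This is an assumption about the general approach, not a description of BPM's internals. The names (build_seed_index, find_alleles) are hypothetical, the seed is simply the middle kmer of each allele rather than a rare one, and only the forward strand is searched.

```python
def build_seed_index(alleles, k):
    """Map one seed kmer per allele to a list of (allele name, seed offset)."""
    index = {}
    for name, seq in alleles.items():
        # Hypothetical seed choice: the kmer in the middle of the allele.
        offset = (len(seq) - k) // 2
        index.setdefault(seq[offset:offset + k], []).append((name, offset))
    return index


def find_alleles(sequence, alleles, index, k):
    """Return (allele name, start position) for every exact occurrence."""
    hits = []
    for pos in range(len(sequence) - k + 1):
        # A seed hit only suggests a candidate; the allele must be verified.
        for name, offset in index.get(sequence[pos:pos + k], ()):
            start = pos - offset
            if start >= 0 and sequence.startswith(alleles[name], start):
                hits.append((name, start))
    return hits


alleles = {"a1": "ACGTAC", "a2": "GGGTTT"}
index = build_seed_index(alleles, k=4)
print(find_alleles("TTACGTACGGGTTTA", alleles, index, k=4))
```

    In this scheme a Bloom filter would sit in front of the index lookup, so that most kmers of the sequence are rejected with a few bit tests before the table is consulted.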

    Tests:

    	cmake .
    	./bin/tests