Alignment-free classification tools have enabled high-throughput processing of sequencing data in many bioinformatics analysis pipelines, primarily due to their computational efficiency. Because they were originally k-mer based, such tools often lack sensitivity when faced with sequencing errors and polymorphisms. In response, some tools have been augmented with spaced seeds, which can tolerate mismatches. However, spaced seeds have seen little practical use in classification because they incur higher computational and memory costs than k-mer-based methods. Because storing spaced seeds is costly, these limitations have also constrained the design and length of practical spaced seeds. To address these challenges, we have designed a probabilistic data structure called a multi-index Bloom Filter (miBF). It can store multiple spaced seed sequences at a low memory cost that remains static regardless of seed length or seed design. We formalize how to minimize the false-positive rate of miBFs when classifying sequences from multiple targets or references. Using the implementation available within BioBloom Tools, we illustrate the utility of miBF in two use cases: read binning for targeted assembly, and taxonomic read assignment.
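The following minimal Python sketch illustrates why spaced seeds tolerate mismatches where contiguous k-mers do not, and how a hashed table can keep its memory footprint independent of seed length or design. A mask such as `11011` keeps the bases at `1` positions and treats `0` positions as wildcards, so a mismatch at a wildcard position still yields an identical seed. The helpers `spaced_seed` and `seeds_of` and the class `ToyIdTable` are hypothetical illustrations. This is not BioBloom Tools code and not the actual miBF layout, which the abstract does not describe. In particular, `ToyIdTable` simply overwrites IDs on hash collisions.

```python
import hashlib


def spaced_seed(window: str, mask: str) -> str:
    """Keep bases where the mask has '1'; '0' positions are wildcards."""
    return "".join(base for base, m in zip(window, mask) if m == "1")


def seeds_of(seq: str, mask: str):
    """Yield the spaced seed of every window of length len(mask)."""
    span = len(mask)
    for i in range(len(seq) - span + 1):
        yield spaced_seed(seq[i:i + span], mask)


class ToyIdTable:
    """Hypothetical, simplified ID table (not the miBF design).

    A fixed-size array of target IDs indexed by a hash of each seed.
    Its size depends only on num_slots, not on seed length or mask design.
    Collisions simply overwrite, so this is only a toy.
    """

    def __init__(self, num_slots: int):
        self.slots = [0] * num_slots  # 0 means "empty"

    def _index(self, seed: str, mask_id: int) -> int:
        # Salt the hash with the mask index so different seed designs
        # map the same substring to different slots.
        key = f"{mask_id}:{seed}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % len(self.slots)

    def insert(self, seq: str, masks: list, target_id: int) -> None:
        for mask_id, mask in enumerate(masks):
            for seed in seeds_of(seq, mask):
                self.slots[self._index(seed, mask_id)] = target_id

    def query(self, seq: str, masks: list) -> dict:
        """Count seed hits per stored target ID."""
        counts = {}
        for mask_id, mask in enumerate(masks):
            for seed in seeds_of(seq, mask):
                tid = self.slots[self._index(seed, mask_id)]
                if tid:
                    counts[tid] = counts.get(tid, 0) + 1
        return counts
```

For example, the reference window `ACGTA` and the read window `ACTTA` differ at position 2. Their contiguous 5-mers (mask `11111`) therefore differ, while both reduce to `ACTA` under the mask `11011`. Consequently, `ToyIdTable(1024)` populated with `insert("ACGTA", ["11011"], 1)` reports a hit for target 1 when queried with `ACTTA`.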
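For context on the false-positive rate, the classic approximation for a standard Bloom filter is given below. It assumes $m$ bits, $n$ inserted elements, and $h$ independent hash functions. This is offered only as background: it is not the miBF-specific formalization described above, which also accounts for multiple targets and spaced seeds.

$$
\mathrm{FPR} \approx \left(1 - e^{-hn/m}\right)^{h},
\qquad
h_{\mathrm{opt}} = \frac{m}{n}\ln 2 .
$$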