Common Microbes Database
Last updated July 24, 2017, by Brian Bushnell

This directory contains different versions of the common microbes reference used by RQCFilter or removemicrobes.sh. They are indexed under 4 BBMap index directories, denoted by builds 1-4; the builds are described in builds.txt. To access them, see removemicrobes.sh.

The following files are the concatenated references of the selected organisms listed in FilterMicrobes.txt:

fused2.fa.gz
fusedEPmasked2.fa.gz
fusedERPBmasked2.fa.gz
fusedERPBBmasked2.fa.gz

There are also versions without the "2", which are the same but lack Lambda. Lambda is also a common contaminant, and while it is a phage rather than a bacterium, it shares much of its genome with E. coli.

The process used to generate these files, along with related temporary files, is in commonMicorbeCreation. Essentially, once a core set of commonly observed contaminant microbes was identified, many other microbes with complete genomes were shredded and mapped to them. Any other microbe for which a large part of the genome was indistinguishable from a member of the contaminant set was then either added to the set (if it seemed likely to be a common contaminant itself) or excluded from further filtering.

The remaining non-contaminant microbes, which had little overlap with the contaminant microbes, were iteratively mapped to the contaminant microbes: the mapping locations were masked, and the process was repeated until the non-contaminant shreds no longer mapped to anything in the contaminant contigs. The resulting masked references are considered to be at relatively low risk of spurious read-length (150bp+) hits from organisms unrelated to members of the contaminant set.
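
The shred / map / mask / repeat loop described above can be illustrated with a toy sketch. This is not the actual pipeline: the real process used BBMap and the scripts in commonMicorbeCreation, whereas this example stands in exact substring matching for mapping. The function names (shred, mask_round, iterative_mask), the shred length, and the step size are all hypothetical and chosen only for illustration. As with a mapper that reports one location per read, each round masks only the first hit per shred, so repeated regions need more than one round.

```python
def shred(seq, length, step):
    """Cut seq into overlapping fixed-length fragments (stand-in for shredding a genome)."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def mask_round(reference, shreds):
    """Mask the first exact hit of each shred with N.

    Exact substring search is an ASSUMPTION standing in for real read mapping.
    Returns the masked reference and the number of shreds that hit.
    """
    ref = list(reference)
    hits = 0
    for s in shreds:
        pos = reference.find(s)  # one reported location per shred
        if pos >= 0:
            ref[pos:pos + len(s)] = "N" * len(s)
            hits += 1
    return "".join(ref), hits


def iterative_mask(reference, other_genomes, length=8, step=1):
    """Repeat mask_round until no non-contaminant shred maps to the reference."""
    shreds = [
        s
        for genome in other_genomes
        for s in shred(genome, length, step)
        if "N" not in s  # skip shreds that could match already-masked bases
    ]
    rounds = 0
    while True:
        reference, hits = mask_round(reference, shreds)
        if hits == 0:
            return reference, rounds
        rounds += 1


if __name__ == "__main__":
    # Toy contaminant reference containing two copies of a region shared
    # with an unrelated (non-contaminant) genome.
    contaminant = "TTTTTTTT" + "ACGTTGCA" + "CCCCCCCC" + "ACGTTGCA" + "GGGGGGGG"
    unrelated = ["AAAA" + "ACGTTGCA" + "AAAA"]
    masked, rounds = iterative_mask(contaminant, unrelated)
    print(masked)  # TTTTTTTTNNNNNNNNCCCCCCCCNNNNNNNNGGGGGGGG
    print(rounds)  # 2
```

In the toy data the shared region occurs twice in the contaminant reference, so the first round masks one copy and the second round masks the other; the third round finds no hits and the loop stops.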