Human reads incorrectly classified as Bacteria using Silva

I found a problem with my analysis of skin microbiota. My reads were sequenced on an Illumina MiSeq (2x300). I performed the following steps:

  • DADA2
  • qiime feature-classifier classify-sklearn using SILVA 132 database

I obtained the attached taxonomy.qza file. On manual inspection, many reads were classified as 'Bacteria' with no assignment below that rank. I ran BLAST on some of those reads, and they matched Homo sapiens DNA. I had assumed that everything SILVA assigned to Bacteria was indeed bacterial. Did I miss a step for filtering out human reads? I followed the tutorials on the QIIME 2 website, but I didn't see one.

Thanks a lot
taxonomy-silva.qza (170.4 KB)

That makes sense! If the database consists of only bacteria, there is no “outgroup” for the classifier to assign these human sequences to, so a sequence with even a slight similarity to bacterial sequences can end up classified as Bacteria with no phylum-level classification. In general, I find that any sequence that does not classify to at least phylum level is garbage, be it host DNA or an artifact of some sort, so I usually just throw these out after some basic inspection, as you have done.

This is the filtering step, in a sense. There are a few ways to do this and it depends on your pipeline.

  1. deblur uses a “positive filter” to make sure sequences have at least some resemblance to a set of reference sequences.
  2. q2-fragment-insertion allows you to filter out sequences from your table that do not splice into your reference tree, e.g., human DNA.
  3. q2-quality-control allows you to perform a “positive filter” of the sort deblur uses.
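
To make the “positive filter” idea in options 1 and 3 a bit more concrete, here is a minimal Python sketch. It is an illustration only: as far as I know, deblur and q2-quality-control rely on alignment-based searches against the reference, whereas this toy version swaps in the fraction of shared k-mers as a crude stand-in for similarity. The function names, `k`, and the `min_shared` threshold are hypothetical choices, not anything taken from those tools.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def positive_filter(
    queries: Dict[str, str],
    references: List[str],
    k: int = 8,
    min_shared: float = 0.3,
) -> Tuple[List[str], List[str]]:
    """Split query IDs into (hits, misses).

    A query is a hit when at least `min_shared` of its k-mers also occur
    somewhere in the reference set. This is a toy similarity measure,
    not the alignment-based search that real tools use.
    """
    ref_kmers: Set[str] = set()
    for ref in references:
        ref_kmers |= kmers(ref, k)

    hits: List[str] = []
    misses: List[str] = []
    for seq_id, seq in queries.items():
        query_kmers = kmers(seq, k)
        # Sequences shorter than k have no k-mers and count as misses.
        shared = len(query_kmers & ref_kmers) / len(query_kmers) if query_kmers else 0.0
        (hits if shared >= min_shared else misses).append(seq_id)
    return hits, misses
```

Anything that lands in `misses` would be a candidate for removal from the feature table. The threshold would need tuning on real data, since too strict a filter also discards genuine but divergent bacterial sequences.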

Or just do what you did: inspect the sequences that lack a phylum-level classification and discard them. That’s how I often do it.
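
For the taxonomy-based route (dropping anything without a phylum-level label), here is a small Python sketch. It assumes the taxonomy has been exported to a tab-separated file with `Feature ID` and `Taxon` columns and that ranks are separated by semicolons (for example `D_0__Bacteria;D_1__Firmicutes`). The function names are hypothetical, and the rule for spotting empty ranks (labels ending in `__`) is an assumption about the label format, so check it against your own file.

```python
import csv
import io
from typing import List, Tuple


def has_phylum(taxon: str) -> bool:
    """Return True if the taxon string has at least two informative ranks."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    # Assumption: empty ranks look like "", "__", "p__" or "D_1__".
    informative = [rank for rank in ranks if rank and not rank.endswith("__")]
    return len(informative) >= 2


def split_by_phylum(tsv_text: str) -> Tuple[List[str], List[str]]:
    """Split feature IDs into (keep, drop) based on phylum-level annotation."""
    keep: List[str] = []
    drop: List[str] = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        feature_id = row["Feature ID"]
        if feature_id.startswith("#"):  # skip comment/directive rows, if any
            continue
        (keep if has_phylum(row["Taxon"]) else drop).append(feature_id)
    return keep, drop
```

The `drop` list is worth a quick look (for example, BLAST a few of the sequences) before removing those features from the table.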

You can also add an outgroup (e.g., host DNA sequences) to your reference database so that these classifications are less ambiguous in the future.


Yeah, you have to get rid of those sequences.

I think that’s the major difference between QIIME 1 and QIIME 2: in QIIME 1, the clustering methods end up filtering out a lot of human hits early on, but in QIIME 2 there’s no such step unless you use one of the filtering methods.

Well, that depends: closed-reference OTU clustering will drop these human reads, but the other OTU clustering methods used in QIIME 1 will not. Both QIIME 1 and 2 have closed-reference OTU clustering methods, for what it’s worth. It just helps to be aware of what your pipeline is doing under the hood, so you understand what extra steps may be necessary (e.g., to remove non-target DNA, as in this case).


OK. From now on, I will filter out OTUs assigned to Bacteria;__. I had thought that SILVA would classify human reads as Unassigned.
Thanks a lot for all contributions!
