Dear QIIME 2 community,
I have a question about taxonomic classification after clustering with cluster-features-open-reference.
I run cluster-features-open-reference after DADA2 to cluster the sequences, with gg_13_8 as the reference. Do I need to classify the whole data set against gg_13_8 again? I can see why the new (de novo) sequences need a taxonomy classifier, but what about the sequences that were already matched to the reference?
I have tried the following three classifiers:
- classify-consensus-blast,
- classify-consensus-vsearch,
- classify-sklearn (using the Greengenes 13_8 99% OTUs full-length sequences classifier downloaded from the QIIME2 data resources as the trained classifier).
The results from the first two are quite similar, but the taxonomy they assign to a feature is not always the same as the taxonomy of its Greengenes ID.
From our wet-lab information, we know that some specific samples contain a spike-in of the following two bacteria:
- k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__Imtechella;s__halotolerans;
- k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;
Their IDs in the Greengenes database are 4416544 and 361710, respectively.
We can see both IDs after cluster-features-open-reference. But after classify-consensus-vsearch or classify-consensus-blast, these IDs are assigned to:
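- k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;
- k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus
respectively. The families are the same in both cases, but the assignments clearly differ: the first loses the genus and species, and the second gains a genus that the reference entry does not have.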
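To show exactly where the two assignments part ways, here is a minimal Python sketch that compares the taxonomy strings quoted above rank by rank. The function names `split_ranks` and `first_divergence` are my own, and dropping unnamed ranks such as `g__` is an assumption about how empty Greengenes ranks should be treated. It is not part of QIIME 2.

```python
def split_ranks(taxonomy):
    """Split a Greengenes-style taxonomy string into its named ranks.

    Empty fields (from a trailing ';') and unnamed ranks such as 'g__'
    are dropped. Treating 'g__' as "no rank" is an assumption.
    """
    ranks = []
    for field in taxonomy.split(";"):
        field = field.strip()
        if field and not field.endswith("__"):
            ranks.append(field)
    return ranks


def first_divergence(expected, observed):
    """Return (depth, expected_rank, observed_rank) at the first mismatch.

    depth is 0 for kingdom, 1 for phylum, and so on. A rank that is
    missing on one side is reported as None. Returns None when both
    strings name exactly the same ranks.
    """
    exp, obs = split_ranks(expected), split_ranks(observed)
    for depth in range(max(len(exp), len(obs))):
        e = exp[depth] if depth < len(exp) else None
        o = obs[depth] if depth < len(obs) else None
        if e != o:
            return depth, e, o
    return None


# Taxonomy we expect for the spike-ins, keyed by Greengenes ID.
expected = {
    "4416544": "k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;"
               "o__Flavobacteriales;f__Flavobacteriaceae;"
               "g__Imtechella;s__halotolerans;",
    "361710": "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
              "f__Bacillaceae;",
}

# Taxonomy reported by classify-consensus-vsearch / classify-consensus-blast.
observed = {
    "4416544": "k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;"
               "o__Flavobacteriales;f__Flavobacteriaceae;",
    "361710": "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
              "f__Bacillaceae;g__Bacillus",
}

report = {fid: first_divergence(expected[fid], observed[fid]) for fid in expected}
for fid, diff in report.items():
    print(fid, diff)
```

Both IDs agree down to family and diverge at the genus position, in opposite directions.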
This may not matter much in research projects, but it really counts in clinical tests, and ours is the latter.
(BTW, the third classifier does not perform as well: it leaves many OTUs unidentified, so I do not think we will choose it.)
My two questions are:
- Why is the taxonomy information different between cluster-features-open-reference and classify-consensus-vsearch? Which one should we believe?
- Given this, do we have to use the results from classify-consensus-vsearch? Or can we combine the following as our final taxonomy result?
1). features that already have a reference ID from cluster-features-open-reference
2). features unassigned in open-reference, but identified by classify-consensus-vsearch
3). the remaining bacteria, unassigned by both cluster-features-open-reference and classify-consensus-vsearch
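To make the proposed combination concrete, here is a minimal Python sketch of the three-tier rule. Everything in it is illustrative: `merge_taxonomy` is my own helper, not a QIIME 2 function, the feature IDs `denovo_a` and `denovo_b` are made up, and so is the vsearch label for `denovo_a`. I am also assuming that unclassified features carry the label `Unassigned`.

```python
UNASSIGNED = "Unassigned"  # assumed label for unclassified features


def merge_taxonomy(feature_ids, reference_taxonomy, classifier_taxonomy):
    """Combine reference and classifier taxonomy for each feature.

    Tier 1: the feature ID is a reference ID -> use the reference taxonomy.
    Tier 2: otherwise, use the classifier result if it is not Unassigned.
    Tier 3: otherwise, keep the feature as Unassigned.
    Returns {feature_id: (taxonomy, source)}.
    """
    merged = {}
    for fid in feature_ids:
        if fid in reference_taxonomy:
            merged[fid] = (reference_taxonomy[fid], "reference")
        elif classifier_taxonomy.get(fid, UNASSIGNED) != UNASSIGNED:
            merged[fid] = (classifier_taxonomy[fid], "vsearch")
        else:
            merged[fid] = (UNASSIGNED, "none")
    return merged


# Reference taxonomy for the two spike-in IDs.
reference_taxonomy = {
    "4416544": "k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;"
               "o__Flavobacteriales;f__Flavobacteriaceae;"
               "g__Imtechella;s__halotolerans;",
    "361710": "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
              "f__Bacillaceae;",
}

# Classifier output; the denovo_* entries are hypothetical.
classifier_taxonomy = {
    "4416544": "k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;"
               "o__Flavobacteriales;f__Flavobacteriaceae;",
    "denovo_a": "k__Bacteria;p__Firmicutes",  # hypothetical label
    "denovo_b": UNASSIGNED,
}

feature_ids = ["4416544", "361710", "denovo_a", "denovo_b"]
merged = merge_taxonomy(feature_ids, reference_taxonomy, classifier_taxonomy)
for fid, (taxonomy, source) in merged.items():
    print(fid, source, taxonomy, sep="\t")
```

Keeping the source next to each label would let us trace, for any feature in a clinical report, whether its taxonomy came from the reference ID or from the classifier.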
Thank you so much in advance!