Hi, I followed the tutorial “Training feature classifiers with q2-feature-classifier” and assigned taxonomy to my samples. However, the assignment does not look right to me: too many ASVs receive the same taxonomy, and in many cases the taxonomy appears incorrect, since the SILVA aligner and a BLAST search both give me the same result, which differs from the classifier's.
I’m wondering whether that’s because of a problem with training the classifier.
Why is this a problem? Do you know the (approximate) composition of these samples? Many ASVs receiving the same taxonomy just means those sequences are different, not that their taxonomy should be different. You could, e.g., have many low-abundance ASVs that receive the same classification, but their relative abundances could still average out to an “expected” composition.
I’d recommend using taxa barplot to check this before drawing conclusions. Sharing plots like these could also make it clearer to us what the problem is.
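For reference, the barplot can be generated along these lines (the artifact and metadata filenames below are placeholders; substitute your own):

```shell
# Visualize taxonomic composition per sample as interactive barplots
qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots.qzv
```

Collapsing to different taxonomic levels in the resulting visualization makes it easy to see whether the many same-taxonomy ASVs actually distort the overall composition.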
The SILVA aligner and QIIME2 BLAST are very similar: both are alignment-based classifiers with LCA consensus, so they are expected to behave similarly. Their agreement therefore does not necessarily mean that classify-sklearn is wrong. However, I would not expect them to give a very different classification from classify-sklearn; if they do, that might indicate a training issue.
How different are the classifications? Is the issue that BLAST/SILVA give deeper classifications (e.g., to species level) and classify-sklearn gives shallower classifications (e.g., to family level)?
Your commands look correct. You could use extract-reads (as shown in the tutorial) to trim your reference sequences. That will probably increase accuracy a bit. Note that classify-sklearn has not been optimized for 18S reads, so fiddling with some parameters (e.g., the confidence parameter) may also help. It’s worth a try, though without an 18S mock community you really don’t know the “true” composition of a sample.
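As a sketch, the extract-reads → retrain → reclassify sequence might look like the following (primer sequences, filenames, and the confidence value are placeholders; use your own primers and check each command's --help for your QIIME 2 version):

```shell
# Trim reference sequences to the amplified region (placeholder 18S primers)
qiime feature-classifier extract-reads \
  --i-sequences ref-seqs.qza \
  --p-f-primer GTACACACCGCCCGTC \
  --p-r-primer TGATCCTTCTGCAGGTTCACCTAC \
  --o-reads ref-seqs-extracted.qza

# Retrain the naive Bayes classifier on the trimmed references
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs-extracted.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza

# Reclassify, experimenting with the confidence threshold (default is 0.7)
qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --p-confidence 0.5 \
  --o-classification taxonomy.qza
```

Lowering --p-confidence trades depth of classification against a higher risk of overclassification, so it is worth inspecting the results at a couple of settings.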
The first plot is based on SILVA assignment down to level 7 and the second is based on PR2 assignment down to level 9. Given that I have over 6000 features, I feel PR2 is giving me too few assignments.
Thanks for sharing the plots! This helps — and these results do look suspicious.
So let me get this straight:
Classification with classify-consensus-blast against the SILVA 18S reference database gives somewhat reasonable results. (For some reason I thought your earlier message said you were also using the aligner on the SILVA website and getting results similar to BLAST, which would make sense, so I apologize for that misunderstanding.)
Classification with classify-sklearn against the PR2 database is getting mostly unassigned and poorly assigned features.
Classification with PR2 is giving so few taxonomy assignments because they are shallower: the classifier cannot confidently assign at deeper levels, so it is assigning at order or class level for the most part. This is resulting in most features being collapsed, even though they probably do belong to different species.
The problem here is almost certainly with the database. When we see poor classification like this (when results should be better, and your blast comparison is a good example), it is usually due to one of the following:
The database is no good to begin with (I know nothing about PR2 but would assume you wouldn’t be using it if this were the case!)
The database is trimmed to different regions than the query sequences. PR2 may contain only fragments of 18S, for example, that don’t fully overlap with your query sequences (which are also fragmentary), resulting in poor classification.
The query sequences contain non-biological sequence (e.g., barcodes or adapter sequences). Ironically, BLAST would be less sensitive to this, since it is just looking for the best alignments (even if those alignments are terrible!)
I would bet on the second possibility above, maybe the third — you should check to make sure these query/reference sequences actually overlap completely, and that the query sequences are “clean”.
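One quick way to check for the third possibility is to look at whether your ASVs still begin with the forward primer. Below is a toy sketch: the FASTA records and the primer are made up for illustration. For real data, export your sequences first (qiime tools export --input-path rep-seqs.qza --output-path exported) and point the script at exported/dna-sequences.fasta and your actual primer.

```shell
# Create a toy FASTA (stand-in for exported/dna-sequences.fasta)
cat > rep-seqs.fasta <<'EOF'
>ASV1
CCAGCACCTGCGGTAATTCCAGCTCCAATAGCGTATATT
>ASV2
GTGAATTCCAGCTCCAATAGCGTATATTAAAGTTG
EOF

# Hypothetical 18S forward primer with IUPAC ambiguity codes
PRIMER="CCAGCASCYGCGGTAATTCC"

# Translate common ambiguity codes into regex character classes
PATTERN=$(printf '%s' "$PRIMER" | sed 's/S/[CG]/g; s/Y/[CT]/g; s/R/[AG]/g; s/W/[AT]/g')

# Count sequences that still start with the primer (here: ASV1 matches, ASV2 does not)
grep -v '^>' rep-seqs.fasta | grep -c "^$PATTERN"
```

Any nonzero count means primers (and possibly adapters upstream of them) were not trimmed before denoising, which would hurt classify-sklearn much more than BLAST.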
Another good diagnostic test — switch up the classifiers/databases. What if you use classify-consensus-blast against PR2, and classify-sklearn against SILVA? This could help determine whether it’s the database or classifier that’s misbehaving (and either can be fixed, but this will help diagnose)
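The swap could look something like this (filenames are placeholders, and exact output arguments for classify-consensus-blast vary a bit between QIIME 2 releases, so check --help):

```shell
# BLAST + PR2
qiime feature-classifier classify-consensus-blast \
  --i-query rep-seqs.qza \
  --i-reference-reads pr2-seqs.qza \
  --i-reference-taxonomy pr2-taxonomy.qza \
  --o-classification blast-pr2-taxonomy.qza

# classify-sklearn + SILVA (classifier pre-trained on SILVA 18S)
qiime feature-classifier classify-sklearn \
  --i-classifier silva-18s-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification sklearn-silva-taxonomy.qza
```

If BLAST + PR2 also looks bad, the database is the likely culprit; if classify-sklearn + SILVA looks fine, that points the same way.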
Also, I’m not sure if this helps, but a colleague of mine ran the same data in PhyloSeq with PR2 and I think he gets a reasonable assignment. I think this means the database itself is good, but that something is wrong either with the files (sequence and taxonomy) I downloaded from the PR2 website for QIIME2 (I think PhyloSeq uses different files) or with the classifier?
This is not all that surprising, as I noted above:
Those alignments are very likely not terrible (since you have pretty good taxonomy assignments), there is probably just extra noise in there (e.g., the database does not fully overlap your query sequences) that is less upsetting to BLAST local alignment.
classify-consensus-blast is BLAST local alignment followed by LCA taxonomy classification. classify-sklearn is a naive Bayes classifier based on k-mer frequencies, similar to the RDP classifier. Neither has been tested on 18S, but on 16S and fungal ITS classify-sklearn does a little better. For more details and benchmark results, see here.
Something is making PR2 behave badly and it is probably the way the database is pre-processed/trimmed prior to training the classifier.
You mean BLAST + PR2 vs. classify-sklearn + PR2?
What about BLAST + PR2 vs. classify-sklearn + SILVA or BLAST + SILVA?
You could use qiime quality-control evaluate-taxonomy to compare how close these assignments are. (This just gives you a quick assessment of the similarity between the two at each taxonomic level. Anything with F > 0.9 is pretty similar)
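For example (placeholder filenames; here one set of assignments is arbitrarily treated as “expected”, and --p-depth is the number of taxonomic levels to evaluate; check --help for your version’s exact parameters):

```shell
# Compare two taxonomy artifacts level by level
qiime quality-control evaluate-taxonomy \
  --i-expected-taxa blast-pr2-taxonomy.qza \
  --i-observed-taxa sklearn-silva-taxonomy.qza \
  --p-depth 7 \
  --o-visualization taxonomy-comparison.qzv
```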
If I were in your shoes, I would probably just use the SILVA database with either classifier (whichever gives more pleasing results, since I already know both perform well on other marker genes), or use PR2 with BLAST (it looks like you get good classification there, though the fact that this looks like a database formatting issue makes me slightly uneasy). The reason: BLAST + PR2 does not look enormously different from classify-sklearn + SILVA (though this is just a passing glance), and I would rather not spend the time debugging a database formatting issue if SILVA gives me similar results.
The other problem: when you have multiple databases and multiple classifiers you wind up with the “two watches” problem (4 in your case!). Do you really know which is most accurate? I linked to an 18S mock community above that you could use to figure out which seems to be best — but that’s a single old data set and might not really generalize to your data.
Hence back to my “if I were in your shoes” suggestion: lacking benchmarking data or any other evidence to suggest one “watch” is better than the other, I would throw away the broken one and just stick with the one that I like the most.