The following is the error:
There was an issue with viewing the artifact taxonomy_hybrid.qza as QIIME 2 Metadata:
CategoricalMetadataColumn does not support values with leading or trailing whitespace characters. Column 'Taxon' has the following value: "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Chromatiales;D_4__Sedimenticolaceae;D_5__Sedimenticola;D_6__Escarpia spicata endosymbiont 'Alvin "
Plugin error from taxa:
Feature IDs found in the table are missing from the taxonomy: {'2c046ebcbe3c77d290b439269f2981acfbcdae2f', '79d318932fb2dadc0ec4676d3b316cecf2e26f3b', 'a903a7338a38a6374a6253c2d065e937f6c55ce9', 'aa0aafff20819bf70a6826b74db4beec8f0399fd', '17be879cdaa0d023c80fc7dd6c478e65b0e07a4e', '4e0dc35655ba79ac928f8dedbc45427189e0595a', '12976d6d53b1557c836d8cade02bf05d5732ebe1', '4c66c461ce900875f50ba137c391ee0c8f392fce', 'dc9848cd92de1f73865519f703a7d0489c6eba3e', '5cd1a928e44d917f463e19a9fa86f51399ef2d2b', '7fd4bbac9c8d0171cace590c05df0ed21b1064a8', 'd2603311c474aa61d72c3df73b96007bed997573', '1ec69f423b70c69555265e56c90f441013e2069c'}
Debug info has been saved to /tmp/qiime2-q2cli-err-2tpqfyaw.log
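The first message above is about whitespace in the 'Taxon' column, not about missing feature IDs. As a rough sketch, and assuming the taxonomy has been exported to a tab-separated file with a 'Taxon' column (the example table, the `feat1`/`feat2` IDs, and the function name below are made up for illustration), plain Python could be used to locate values with leading or trailing whitespace:

```python
import csv
import io


def find_padded_taxa(tsv_text, column="Taxon"):
    """Return (feature_id, raw_value) pairs whose `column` value has
    leading or trailing whitespace. The first column is treated as the ID."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    id_column = reader.fieldnames[0]
    return [
        (row[id_column], row[column])
        for row in reader
        if row[column] != row[column].strip()
    ]


# Hypothetical two-row taxonomy table; the second Taxon value ends with a space.
example = (
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\tD_0__Bacteria;D_1__Proteobacteria\t0.99\n"
    "feat2\tD_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria \t0.87\n"
)

padded = find_padded_taxa(example)
for feature_id, value in padded:
    # Show the offending value next to its stripped form.
    print(feature_id, repr(value), "->", repr(value.strip()))
```

Quoting is disabled in the reader because taxon strings such as the one in the error message can contain quote characters.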
The hybrid classifier includes an optional filtering step to remove sequences that poorly align to the reference, so it looks like you have a handful of sequences that are being filtered by that step. You have two options to resolve:
use the --p-no-prefilter option with the hybrid classifier to disable this filtering step
filter your table with qiime feature-table filter-features to only keep features found in the metadata file (i.e., those that pass this filter).
You should probably do the latter; it is a very rough prefilter (50% similarity to a random subsample of the reference sequences) so anything failing to match could be junk — but it is worth manually checking those features to see (if you are filtering real sequences you may need to increase the subsample or just disable the prefilter step).
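To make the manual check easier, here is a minimal sketch of the bookkeeping involved, assuming you already have the feature IDs from the table and from the taxonomy as plain Python lists (how you export them is up to you; the function names and the `hypothetical_feature_*` IDs are invented for this example). It lists the features that the prefilter dropped and the ones that would be kept:

```python
def missing_from_taxonomy(table_ids, taxonomy_ids):
    """Return the sorted feature IDs that are in the table but not in the taxonomy."""
    return sorted(set(table_ids) - set(taxonomy_ids))


def keep_classified(table_ids, taxonomy_ids):
    """Return the table feature IDs (in their original order) that have a taxonomy entry."""
    known = set(taxonomy_ids)
    return [fid for fid in table_ids if fid in known]


table_ids = [
    "2c046ebcbe3c77d290b439269f2981acfbcdae2f",  # reported in the error message
    "79d318932fb2dadc0ec4676d3b316cecf2e26f3b",  # reported in the error message
    "hypothetical_feature_a",                    # made-up ID
    "hypothetical_feature_b",                    # made-up ID
]
taxonomy_ids = ["hypothetical_feature_a", "hypothetical_feature_b"]

to_inspect = missing_from_taxonomy(table_ids, taxonomy_ids)
kept = keep_classified(table_ids, taxonomy_ids)

print("Check these manually:", to_inspect)
print("Would be kept:", kept)
```

The IDs in `to_inspect` are the ones worth looking at by hand before deciding whether to filter them out or disable the prefilter.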
Note that the hybrid classifier should really only be used if your reference database and sequences are trimmed to exactly the same sites, e.g., with extract-reads. The first step of this pipeline performs exact matching between query and reference, so is not the same as the default classify-consensus-vsearch method... reads will be unclassified if they do not match 100% with at least one reference sequence.
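A toy illustration of why trimming matters for an exact-match step. This is plain string equality in Python, not how vsearch is implemented, and the sequences, IDs, and function name are all made up. A read that is identical to the reference over its own length but covers fewer sites still does not count as a 100% match here:

```python
def classify_exact(queries, reference):
    """Split queries into exact-match hits and leftovers.

    queries:   dict of query ID -> sequence
    reference: dict of reference sequence -> taxonomy string
    """
    hits, leftovers = {}, []
    for query_id, seq in queries.items():
        taxon = reference.get(seq.upper())
        if taxon is None:
            leftovers.append(query_id)  # would need a second classification step
        else:
            hits[query_id] = taxon
    return hits, leftovers


# Hypothetical reference trimmed to one set of sites.
reference = {"ACGTACGTAC": "D_0__Bacteria;D_1__Proteobacteria"}

queries = {
    "same_sites": "ACGTACGTAC",    # identical to the reference
    "shorter_read": "ACGTACGT",    # same bases, but covers fewer sites
    "one_mismatch": "ACGTACGTAA",  # differs at the last base
}

hits, leftovers = classify_exact(queries, reference)
print(hits)
print(leftovers)
```

Only `same_sites` is classified by the exact-match step; the other two reads are left over.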
@colinbrislawn @Nicholas_Bokulich Thanks for your reply.
I used qiime tools import --type 'SampleData[Sequences]' to import my 16S miTags into QIIME 2, so I can't use extract-reads. I will try the --p-no-prefilter option.
With the --p-no-prefilter option, the error went away.
But in the classification result, about 10% of reads can only be classified to kingdom level (Bacteria). Could you give me some advice on how to improve this? The 16S miTags come from different variable regions, which makes them difficult to classify.
10% of reads only classifying to kingdom level? Such a low level is not too concerning — often some non-target DNA can be amplified or cross-contaminated, and should just be removed (you can spot-check a few of these unclassified ASVs with NCBI BLAST to see what they are first)
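For picking out the ASVs to spot-check, a small sketch along these lines might help. It assumes you have the taxonomy as a dict of feature ID to taxon string and the per-feature read counts as another dict; the `asv*` IDs, the counts, and the function names are invented. It also assumes that a label like `D_1__` with nothing after the prefix means "no assignment at that rank", which may not hold for every database:

```python
def assigned_ranks(taxon):
    """Return the non-empty rank labels in a semicolon-separated taxon string."""
    if taxon.strip().lower() == "unassigned":
        return []
    ranks = [rank.strip() for rank in taxon.split(";")]
    # Assumption: an empty label or a bare prefix such as "D_1__" is unassigned.
    return [rank for rank in ranks if rank and not rank.endswith("__")]


def shallow_features(taxonomy, max_depth=1):
    """Return sorted feature IDs classified to at most `max_depth` ranks."""
    return sorted(
        fid for fid, taxon in taxonomy.items()
        if len(assigned_ranks(taxon)) <= max_depth
    )


def read_fraction(feature_ids, counts):
    """Return the fraction of all reads that belong to `feature_ids`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts.get(fid, 0) for fid in feature_ids) / total


# Hypothetical taxonomy and read counts.
taxonomy = {
    "asv1": "D_0__Bacteria",
    "asv2": "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria",
    "asv3": "Unassigned",
    "asv4": "D_0__Bacteria;D_1__",
}
counts = {"asv1": 10, "asv2": 80, "asv3": 5, "asv4": 5}

shallow = shallow_features(taxonomy)
fraction = read_fraction(shallow, counts)
print(shallow, fraction)
```

The sequences for the IDs in `shallow` are the ones to paste into NCBI BLAST.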
So then it is also possible that some of these reads are not classifying due to the mixed amplicons? You can use a full-length 16S classifier to classify these (with classify-sklearn).
Do not bother using the hybrid classifier: it will not be useful for your data unless you use extract-reads to extract all possible primer pairs and merge those data together, since the first step of the classifier uses vsearch with exact match.
But when I BLAST some of these reads against NCBI, I find that most of them are 16S sequences, so I am confused.
The result in taxa-bar-plots_vsearch (2).qzv (2.3 MB) was classified with classify-consensus-vsearch using the full-length 16S classifier.
This one was classified with sklearn: taxa-bar-plots_sklearn.qzv (2.5 MB)
The sklearn result seems to have more reads classified only to kingdom level.
Do you still suggest I remove all of these reads and recalculate the abundance?
PS: The unassigned reads and the reads classified only to kingdom level account for about 10%~20%.
Thanks for sharing your results! Based on these results it sounds like this is probably related to the multi-amplicon protocol that you are using.
Sounds like those are definitely 16S reads (usually this issue indicates non-target DNA but there are exceptions which is why I always recommend checking).
As noted on the training a classifier tutorial, accuracy increases slightly when training on the primer region being targeted. Usually using the full-length classifier does not impact accuracy too much, but I have not tested all 16S domains... it is possible that some domains are impacted by this more than others. It would be very interesting to see if these unclassified/underclassified ASVs all belong to a specific 16S region, or to a specific clade.
Based on the profiles, it looks like removing these probably wouldn't impact the resulting proportions too much, since the unclassified/underclassified ASVs represent such a small fraction.
However, I hate to throw away "good" data if we are able to use it with another method. You could use classify-sklearn and try splitting out the different amplicon regions to train region-specific classifiers, then recombine after classification. But if I were in your shoes, I would use the classify-consensus-vsearch classifier, since it seems to perform better "out of the box" with your protocol. It looks like you could probably improve the results with that classifier a little more, too — maybe try using the --p-top-hits-only option and increase --p-perc-identity a bit.
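If you do go the region-specific route, the "recombine" step could look roughly like the sketch below. It assumes each region's classification has been loaded into a dict of feature ID to taxon string; the region names, feature IDs, taxa, and function name are all hypothetical, and raising an error on conflicting assignments is just one possible choice:

```python
def merge_taxonomies(per_region):
    """Merge per-region {feature_id: taxon} dicts into a single dict.

    Raises ValueError if the same feature ID gets different taxa in two regions.
    """
    merged = {}
    for region, taxonomy in per_region.items():
        for feature_id, taxon in taxonomy.items():
            if feature_id in merged and merged[feature_id] != taxon:
                raise ValueError(
                    f"{feature_id} has conflicting assignments (seen again in {region})"
                )
            merged[feature_id] = taxon
    return merged


# Hypothetical results from two region-specific classifiers.
per_region = {
    "V3": {"asv1": "D_0__Bacteria;D_1__Proteobacteria"},
    "V4": {"asv2": "D_0__Bacteria;D_1__Firmicutes"},
}

combined = merge_taxonomies(per_region)
print(combined)
```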