However, after I got the results (taxa-barplot), most sequences were not assigned at phylum level, but shown as “Eukaryota” (please see attached figure).
I think that something is wrong.
When I searched for some of the unassigned features in BLAST, some of them had high similarity to known species.
Hi @F18D029G,
It looks like your barplot did not attach. Could you please try again? It may be too large to upload here, so you may need to produce a smaller example barplot (filter out samples first), or post it to Dropbox or a similar service and share the link here.
Most or all? The difference is important — if all, this is usually human error (e.g., the wrong amplicon site was used). If most, then the query sequences usually have something wrong with them. Once I see your barplot I can diagnose a little better.
Are your sequences in mixed orientations? classify-sklearn cannot handle mixed-orientation reads currently — mixed orientation reads are also bad news for denoising and OTU clustering (since you effectively duplicate all unique sequence variants), so the best approach is to re-orient these reads prior to importing to QIIME 2 (we do not yet have a method in QIIME 2 to re-orient these).
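To illustrate what re-orienting means here, a minimal Python sketch follows. It is only an illustration under simplifying assumptions: the forward primer is matched exactly at the start of the read (no mismatches, no degenerate IUPAC bases), and the primer and reads shown are made up. Dedicated tools handle real data far more robustly.

```python
# Sketch only: exact primer matching, no IUPAC degenerate bases, no mismatches.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_read(seq: str, forward_primer: str):
    """Return the read in forward orientation, or None if the primer is not found.

    A read counts as forward if it starts with the forward primer, and as
    reverse if its reverse complement starts with the forward primer.
    """
    if seq.startswith(forward_primer):
        return seq
    flipped = reverse_complement(seq)
    if flipped.startswith(forward_primer):
        return flipped
    return None


# Hypothetical primer and reads, for illustration only.
primer = "ACGTAC"
reads = ["ACGTACGGGTTT", reverse_complement("ACGTACGGGTTT"), "TTTTTTTTTTTT"]
oriented = [orient_read(r, primer) for r in reads]
```

Reads that come back as `None` contain the primer in neither orientation and are worth inspecting separately, since they may be non-target amplicons.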
Could you try classifying with qiime feature-classifier classify-consensus-blast? That will also help diagnose if this is an issue with the classifier or reference sequences.
Thanks for uploading! I am guessing that this is a level 2 view and e.g., Dikarya are classified more deeply.
Since you say those unclassified sequences BLAST to known species, I suspect this may be a mixed read orientation issue. You can try re-orienting reads, or use one of QIIME 2's other taxonomy classifiers, which are not sensitive to read orientation. At the very least, try this and if you get similar results then there is a different issue afoot (probably issues with the reference since you made your own custom reference):
Hi @F18D029G,
Indeed, looks like this is probably an issue with the reference. What steps did you take to format the reference sequences and taxonomy?
I imagine SILVA is fine for LSU analysis but I am not sure. RDP also has a fungal LSU database that you could download for this. I have not used either of these so cannot comment on pros/cons.
Another possibility is that you are getting ~50% unclassified sequences because you are amplifying non-target DNA. Do you know if your primers amplify non-fungal DNA?
What I meant is: did you just download the SILVA LSU database and import to QIIME 2 or did you take any other steps to format those files? Could you send me the fasta and taxonomy file that you imported to QIIME 2? I can take a look to make sure the format is correct.
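If you want to check the files yourself before sending them, here is a rough Python sketch of one kind of check. It assumes a plain FASTA file plus a two-column, tab-separated taxonomy file with no header (ID, then taxonomy string). The file contents and IDs below are invented for illustration, and this is not an official QIIME 2 validator.

```python
import io


def fasta_ids(handle):
    """Collect sequence IDs: the text after '>' up to the first whitespace."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def taxonomy_ids(handle):
    """Collect IDs from a two-column TSV and report malformed line numbers."""
    ids, bad_lines = set(), []
    for number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            bad_lines.append(number)  # e.g. spaces used instead of a tab
            continue
        ids.add(fields[0])
    return ids, bad_lines


# Hypothetical in-memory files; replace with open("your-file") in practice.
fasta = io.StringIO(">seq1 description\nACGT\n>seq2\nGGCC\n")
taxonomy = io.StringIO("seq1\tEukaryota;Fungi\nseq3 Eukaryota;Fungi\n")

seq_ids = fasta_ids(fasta)
tax_ids, bad_lines = taxonomy_ids(taxonomy)
missing_taxonomy = sorted(seq_ids - tax_ids)   # sequences with no taxonomy row
missing_sequence = sorted(tax_ids - seq_ids)   # taxonomy rows with no sequence
```

IDs that appear in only one of the two files, or taxonomy lines that are not tab-separated, are common reasons a custom reference behaves oddly.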
You should contact the RDP admins for that. I do not know if it is available for download, but I do know that they have an LSU reference for their webtool classifier.
Could you pull out some of the sequences that are not classifying, and run an NCBI BLAST search on them to see what these sequences could be? (This will just confirm that these are in fact fungal LSU sequences.)
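One way to pull those sequences out is sketched below in Python. It assumes you have already exported the taxonomy and representative sequences into simple ID-to-string mappings, and it treats a feature as "not classifying" if its taxonomy string has fewer than two non-empty ranks. The IDs, sequences, and threshold are hypothetical, and taxonomy strings that pad empty ranks with prefixes would need extra handling.

```python
def shallow_features(taxonomy, min_depth=2):
    """Return feature IDs whose taxonomy has fewer than `min_depth` ranks."""
    selected = []
    for feature_id, taxon in taxonomy.items():
        ranks = [r.strip() for r in taxon.split(";") if r.strip()]
        if taxon.strip() == "Unassigned" or len(ranks) < min_depth:
            selected.append(feature_id)
    return selected


def to_fasta(sequences, ids):
    """Format the selected sequences as FASTA text, ready to paste into BLAST."""
    return "".join(f">{i}\n{sequences[i]}\n" for i in ids if i in sequences)


# Hypothetical data, for illustration only.
taxonomy = {
    "f1": "Eukaryota",
    "f2": "Eukaryota; Fungi; Dikarya",
    "f3": "Unassigned",
}
sequences = {"f1": "ACGT", "f2": "GGCC", "f3": "TTAA"}

shallow = shallow_features(taxonomy)
fasta_text = to_fasta(sequences, shallow)
```

The resulting FASTA text can be submitted to NCBI BLAST a few sequences at a time.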
Could you please share ref-seqs.qza and SILVA_119_LSURef_trunc_taxonomy.txt?
I wonder if something is going wrong at the extract-reads step. Is it possible that the SILVA 119 database does not contain full-length LSU sequences and so does not overlap your amplicon region?