I have a set of 16S data generated on the MiSeq platform, with the V3-V4 region sequenced. I used DADA2 for quality filtering, followed by taxonomic classification with a V3-V4 classifier. However, I am having a problem where the taxonomy classification for many sequences stops at the "Bacteria" level. In some samples, ~90% of the abundance is assigned to k__Bacteria;;;;;__.
These are animal samples, and some of them have as few as 100 reads. Any insight into this phenomenon? Should I use --p-exclude to remove these?
Chances are these unclassified reads are host DNA or other non-target DNA. Use NCBI BLAST (with the “exclude uncultured” option) to confirm, and if so, then yes, filter these out of your data before proceeding.
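One way to get sequences to paste into the NCBI BLAST web form is to pull out the features whose taxonomy stops at the kingdom level. Below is a rough sketch in plain Python, not an official QIIME 2 method. It assumes you have exported your taxonomy to a tab-separated file with "Feature ID" and "Taxon" columns and your representative sequences to a FASTA file. The function names (informative_ranks, kingdom_only_ids, read_fasta, select_for_blast) are made up for this example, and the toy data at the bottom is invented purely to show the call.

```python
import csv
import io


def informative_ranks(taxon):
    """Count ranks that carry an actual name, e.g. 'k__Bacteria; p__' -> 1."""
    count = 0
    for part in taxon.split(";"):
        # Drop a 'k__'-style rank prefix if present; empty ranks leave nothing.
        name = part.strip().rpartition("__")[2]
        if name:
            count += 1
    return count


def kingdom_only_ids(taxonomy_tsv):
    """Return feature IDs classified to at most one rank.

    Assumption: the file has 'Feature ID' and 'Taxon' header columns.
    Note this also catches labels such as 'Unassigned'.
    """
    reader = csv.DictReader(taxonomy_tsv, delimiter="\t")
    return {row["Feature ID"] for row in reader
            if informative_ranks(row["Taxon"]) <= 1}


def read_fasta(handle):
    """Yield (id, sequence) pairs from a FASTA file handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def select_for_blast(taxonomy_tsv, fasta, limit=10):
    """Return up to `limit` kingdom-only sequences as FASTA text."""
    wanted = kingdom_only_ids(taxonomy_tsv)
    picked = [(i, s) for i, s in read_fasta(fasta) if i in wanted]
    return "".join(f">{i}\n{s}\n" for i, s in picked[:limit])


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    taxonomy = io.StringIO(
        "Feature ID\tTaxon\tConfidence\n"
        "feat1\tk__Bacteria;;;;;__\t0.99\n"
        "feat2\tk__Bacteria; p__Firmicutes\t0.98\n"
    )
    sequences = io.StringIO(">feat1\nACGTACGT\n>feat2\nGGGGCCCC\n")
    print(select_for_blast(taxonomy, sequences), end="")
```

With real data you would open your own exported files in place of the io.StringIO objects and paste the printed FASTA text into a blastn search. A handful of the most abundant features is usually enough to see whether the top hits are host or other non-target DNA.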
Thanks for the reply. Sorry to ask a naive question, but do you mean running the NCBI BLAST search manually? I tried to find related docs on qiime2.org on how to do it, but no luck yet.