Hi there! I have some questions about taxonomic assignment.
The pre-formatted SILVA reference sequence and taxonomy files on the Data resources page of the QIIME 2 2020.6.0 documentation contain both 16S and 18S SSU references, right?
Is there any way to use only the 16S references?
Another question: are there any PR2 resources for BLAST classification in QIIME 2, similar to the SILVA reference sequences and associated taxonomy files?
One last question is about taxonomic assignment with vsearch vs. BLAST. I tried both on a set of 12 samples: the vsearch run took ~38 h, while BLAST took just 1 hour. Is this explained by the fact that vsearch performs global alignments while BLAST performs local ones?
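To make the global/local distinction in the question concrete, here is a minimal Python sketch of the two classic dynamic-programming scores: Needleman-Wunsch-style (global, end to end) and Smith-Waterman-style (local, best-scoring sub-region). The scoring values (match +1, mismatch -1, linear gap -2) are arbitrary assumptions, and neither function reflects how vsearch or BLAST are actually implemented.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch-style score: every base of both sequences must be aligned."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j] end to end
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # leading gaps are penalised in a global alignment
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,       # a[i-1] against a gap
                           dp[i][j - 1] + gap)       # b[j-1] against a gap
    return dp[n][m]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman-style score: the best-scoring pair of sub-regions."""
    n, m = len(a), len(b)
    # borders stay 0 and scores are floored at 0, so an alignment may start anywhere
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(0,
                           dp[i - 1][j - 1] + pair,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
            best = max(best, dp[i][j])  # ... and may end anywhere
    return best


if __name__ == "__main__":
    print(global_score("ACGT", "GGACGTGG"))  # -4: the four flanking Gs each cost a gap
    print(local_score("ACGT", "GGACGTGG"))   # 4: only the matching ACGT core is scored
```

In this naive form both functions fill a table of the same size, so they differ in which alignments are scored rather than in how much work is done.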
You can simply use RESCRIPt to make a 16S rRNA gene reference set from the SILVA database. This is the tool we currently use to generate the SILVA files on the Data resources page.
I often prefer to keep the 18S rRNA genes, as off-target amplification can occur. Leaving these 18S sequences in the reference set as “decoys” helps identify non-16S rRNA gene sequences. However, it can be beneficial to make a 16S rRNA-only SILVA reference database to limit memory usage. Following the tutorial linked above and keeping only Bacteria and Archaea should do the trick.
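As a rough illustration of keeping only Bacteria and Archaea, the sketch below filters a taxonomy table and a FASTA file down to the two prokaryotic domains. It assumes a tab-separated `Feature ID<TAB>Taxon` layout and SILVA 138-style domain labels (`d__Bacteria`, `d__Archaea`); check both against the files you actually have. Inside QIIME 2 you would normally do this with the filtering steps from the RESCRIPt tutorial rather than with a standalone script.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed SILVA 138-style domain labels; adjust to match your taxonomy strings.
KEEP_DOMAINS = ("d__Bacteria", "d__Archaea")


def parse_taxonomy(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'Feature ID<TAB>Taxon' rows, skipping blank lines and a header row if present."""
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("Feature ID"):
            continue
        feature_id, taxon = line.split("\t")[:2]
        taxonomy[feature_id] = taxon
    return taxonomy


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (id, sequence) pairs; the id is the first word of each header."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_prokaryotes(taxonomy: Dict[str, str], records: List[Tuple[str, str]]):
    """Keep only features whose first rank is one of KEEP_DOMAINS."""
    keep_ids = {fid for fid, taxon in taxonomy.items()
                if taxon.split(";")[0].strip() in KEEP_DOMAINS}
    kept_taxonomy = {fid: t for fid, t in taxonomy.items() if fid in keep_ids}
    kept_records = [(fid, seq) for fid, seq in records if fid in keep_ids]
    return kept_taxonomy, kept_records
```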
Not currently, although other users may have made their own; hopefully some of them will chime in on this thread. Otherwise, you can download 16S rRNA gene files from GTDB and reformat them for import into QIIME 2, and you should be able to do the same for PR2.
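A minimal sketch of that reformatting step for a PR2-like FASTA file. It assumes (hypothetically; check the header format of the PR2 release you download) that each header looks like `>ID|rank1|rank2|...`, and splits it into an ID-only FASTA plus a two-column, headerless `ID<TAB>taxonomy` table. Those two outputs are the kind of files that can be imported into QIIME 2 as `FeatureData[Sequence]` and as `FeatureData[Taxonomy]` with `HeaderlessTSVTaxonomyFormat`.

```python
from typing import Iterable, List, Tuple


def split_pr2_header(header: str) -> Tuple[str, List[str]]:
    """Split an assumed '>ID|rank1|rank2|...' header into (ID, ranks)."""
    fields = header.lstrip(">").strip().split("|")
    return fields[0], [f for f in fields[1:] if f]  # drop empty rank fields


def reformat_for_qiime(fasta_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (fasta_out_lines, taxonomy_tsv_lines), ready to be written to two files."""
    fasta_out, tsv_out = [], []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id, ranks = split_pr2_header(line)
            fasta_out.append(f">{seq_id}")                   # ID-only header
            tsv_out.append(f"{seq_id}\t{'; '.join(ranks)}")  # no header row
        else:
            fasta_out.append(line)  # sequence lines pass through unchanged
    return fasta_out, tsv_out
```

The accession used when testing this is made up; real PR2 headers may carry more or fewer fields.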
What parameter settings did you use for vsearch and BLAST?
I would avoid setting the taxonomic classification threshold this high; leave it at the default (0.8). Otherwise you’ll end up with very few taxonomic classifications, or only upper-level ones (e.g., class or order). For more details, I’d recommend reading the Taxonomy Overview and the q2-feature-classifier paper.
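To see how a strict threshold can leave many features unclassified, here is a toy sketch that treats the threshold as a minimum fractional identity applied to database hits before consensus. Which setting the threshold corresponds to is an assumption here, and the hit records and values are hypothetical. Any query with no surviving hits ends up as "Unassigned".

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (reference taxonomy string, fractional identity to the query)
Hit = Tuple[str, float]


def surviving_hits(hits: Dict[str, List[Hit]], threshold: float) -> Dict[str, List[Hit]]:
    """Drop every hit whose identity is below the threshold, query by query."""
    return {query: [h for h in hs if h[1] >= threshold] for query, hs in hits.items()}


def unassigned_queries(hits: Dict[str, List[Hit]], threshold: float) -> List[str]:
    """Queries with no hits left; a consensus classifier has nothing to report for these."""
    return sorted(q for q, hs in surviving_hits(hits, threshold).items() if not hs)


if __name__ == "__main__":
    example_hits = {  # made-up values for illustration only
        "asv1": [("d__Bacteria; p__Firmicutes", 0.99)],
        "asv2": [("d__Bacteria; p__Proteobacteria", 0.91),
                 ("d__Bacteria; p__Proteobacteria", 0.88)],
        "asv3": [("d__Archaea; p__Crenarchaeota", 0.84)],
    }
    print(unassigned_queries(example_hits, 0.8))   # []
    print(unassigned_queries(example_hits, 0.97))  # ['asv2', 'asv3']
```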
Remember that these classifiers do not necessarily return only the best/top hit. Often there will be equivalent “hits” across several members of the database, in which case a consensus taxonomy is returned. That is, both vsearch and BLAST+, as implemented in QIIME 2, perform a database search followed by LCA (lowest common ancestor) consensus taxonomy assignment.
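A simplified sketch of the "search, then consensus" step: given the taxonomy strings of all retained hits for one query, walk down the ranks and stop at the first rank where no single label reaches a minimum fraction of the hits. This illustrates the idea and is not the q2-feature-classifier source; the 0.51 default is chosen to mirror the QIIME 2 consensus classifiers' min-consensus default as I understand it, so check the plugin help to confirm.

```python
from collections import Counter
from typing import List


def consensus_taxonomy(hit_taxa: List[str], min_consensus: float = 0.51,
                       unassigned: str = "Unassigned") -> str:
    """LCA-style consensus over the taxonomy strings of one query's hits."""
    if not hit_taxa:
        return unassigned  # no hits at all
    lineages = [[rank.strip() for rank in taxon.split(";")] for taxon in hit_taxa]
    depth = max(len(lineage) for lineage in lineages)
    consensus = []
    for level in range(depth):
        labels = [lineage[level] for lineage in lineages if len(lineage) > level]
        label, count = Counter(labels).most_common(1)[0]
        # hits that lack this rank count against agreement
        if count / len(lineages) < min_consensus:
            break  # hits disagree here, so stop at the last agreed rank
        consensus.append(label)
    return "; ".join(consensus) if consensus else unassigned
```

For example, two hits that agree down to p__Firmicutes but split between two genera get a phylum-level consensus, which is one way a query ends up with only an upper-level classification.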
They are all good, which is why we offer them as options; see the Taxonomy Overview. Personally, I prefer the naïve Bayes classifier built on scikit-learn.
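The naïve Bayes classifier in q2-feature-classifier is a scikit-learn pipeline trained on k-mer (character n-gram) features of the reference sequences. Rather than reproduce that pipeline, here is a toy pure-Python multinomial naïve Bayes over k-mer counts that shows the core idea. k=4, add-one smoothing and the class design are illustrative choices, not the plugin's settings.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Toy multinomial naive Bayes over k-mer counts with add-one (Laplace) smoothing."""

    def __init__(self, k: int = 4):
        self.k = k

    def fit(self, seqs: List[str], labels: List[str]) -> "KmerNaiveBayes":
        self.class_counts = Counter(labels)  # used for the class priors
        self.kmer_counts: Dict[str, Counter] = defaultdict(Counter)
        for seq, label in zip(seqs, labels):
            self.kmer_counts[label].update(kmers(seq, self.k))
        self.vocab = {km for counts in self.kmer_counts.values() for km in counts}
        return self

    def predict(self, seq: str) -> Tuple[str, float]:
        """Return (most probable label, its posterior probability) for one query."""
        n = sum(self.class_counts.values())
        log_post = {}
        for label, counts in self.kmer_counts.items():
            denom = sum(counts.values()) + len(self.vocab)  # Laplace denominator
            lp = math.log(self.class_counts[label] / n)     # log prior
            for km in kmers(seq, self.k):
                lp += math.log((counts[km] + 1) / denom)    # log likelihood per k-mer
            log_post[label] = lp
        # normalise in log space to avoid underflow, then read off the winner's share
        top = max(log_post.values())
        norm = sum(math.exp(v - top) for v in log_post.values())
        best = max(log_post, key=log_post.get)
        return best, 1.0 / norm
```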