DADA2 giving fewer observed ASVs, and taxonomy classifier

Keegan-Evans · November 9, 2022, 12:29am

@Vetshweta,
Welcome to the community! Let's see if we can get you unstuck.

Good work on your attempts to get your analysis off the ground so far. I suggest sticking entirely within one set of tools, since the analysis process is complicated enough without jumping between software packages. In this case I am going to advocate for staying within QIIME 2 as much as possible, since we aim to deliver a complete set of tools laid out in a consistent, well-documented manner.

First, if you haven't seen them already, I want to make you aware of two resources: plugin-workflows and QIIME 2 for Experienced Microbiome Researchers. These pages can help clarify the overall process and help you plan out the roadmap for your analysis.

Along the same lines, support for DADA2 denoising of PacBio CCS data was added to the q2-dada2 plugin this year, so your initial denoising and ASV generation should now be able to happen inside QIIME 2! Check out the docs for this functionality here. Additionally, this functionality was largely developed by the DADA2 development team, so its settings may be better tuned for getting the most out of CCS data with DADA2 than the instructions provided by your sequencing center (no promises there).
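If you drive QIIME 2 from Python scripts, a minimal sketch of assembling the `qiime dada2 denoise-ccs` call might look like the following. The helper name, file names, and primer placeholders are hypothetical, and only a few options are shown; confirm every flag against `qiime dada2 denoise-ccs --help` in your installed release before relying on it.

```python
import shlex
from typing import List, Optional


def build_denoise_ccs_cmd(
    demux_qza: str,
    front: str,
    adapter: Optional[str] = None,
    out_prefix: str = "ccs",
) -> List[str]:
    """Assemble an argv list for `qiime dada2 denoise-ccs` (hypothetical helper).

    Only a handful of options are included; check the --help output of your
    installed QIIME 2 release before relying on any of them.
    """
    cmd = [
        "qiime", "dada2", "denoise-ccs",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-front", front,  # 5' primer sequence to remove from each read
    ]
    if adapter is not None:
        cmd += ["--p-adapter", adapter]  # 3' primer/adapter sequence to remove
    cmd += [
        "--o-table", f"{out_prefix}-table.qza",
        "--o-representative-sequences", f"{out_prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{out_prefix}-stats.qza",
    ]
    return cmd


if __name__ == "__main__":
    # Placeholder primers; substitute the ones your sequencing center used.
    argv = build_denoise_ccs_cmd("demux.qza", front="FWD_PRIMER", adapter="REV_PRIMER")
    print(shlex.join(argv))
    # To run it inside an activated QIIME 2 environment:
    # import subprocess; subprocess.run(argv, check=True)
```

Keeping the command as a list (rather than one string) avoids shell-quoting problems with primer sequences and makes it easy to log exactly what was run.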

Getting optimal denoising results can take a bit of tweaking. You will often lose usable sample data if DADA2 detects too low a quality score at any point in a read, so it is usually better to trim your data to where the quality remains high, even if you end up losing some base pairs (not really an issue with PacBio sequencing; there are lots!). This keeps entire samples from being thrown out so quickly. Here are some videos from one of our workshops that provide a bit more detail. A quality drop near the end of the read is in some ways less of a problem with long-read technologies, but the principles still apply, and getting these settings right could save a lot of data. You will have to use a 'Manifest' import (docs) and cutadapt demux-single (docs) to demultiplex.
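To make the trimming trade-off concrete, here is a simplified Python model (not DADA2's actual code) of how a truncQ-style quality cutoff and a truncLen-style length requirement interact; in q2-dada2 these roughly correspond to the trunc-q and trunc-len parameters. The quality profiles are made-up toy data.

```python
from typing import List, Optional, Sequence


def filter_read(quals: Sequence[int], trunc_q: int, trunc_len: int) -> Optional[List[int]]:
    """Simplified truncQ/truncLen model; returns kept qualities or None if discarded."""
    # truncQ: cut the read at the first base whose quality is <= trunc_q
    cut = next((i for i, q in enumerate(quals) if q <= trunc_q), len(quals))
    kept = list(quals[:cut])
    # truncLen (0 = off): reads now shorter than trunc_len are discarded,
    # longer reads are cut down to exactly trunc_len
    if trunc_len > 0:
        if len(kept) < trunc_len:
            return None
        kept = kept[:trunc_len]
    return kept


def count_kept(reads: Sequence[Sequence[int]], trunc_q: int, trunc_len: int) -> int:
    """Number of reads that survive filtering."""
    return sum(filter_read(r, trunc_q, trunc_len) is not None for r in reads)


if __name__ == "__main__":
    # Hypothetical quality profiles: each read dips to Q2 at a different position.
    reads = [
        [30] * 8 + [2] + [30] * 3,  # low-quality base at position 8
        [30] * 10 + [2] + [30],     # low-quality base at position 10
        [30] * 12,                  # no low-quality base
    ]
    print(count_kept(reads, trunc_q=2, trunc_len=11))  # → 1
    print(count_kept(reads, trunc_q=2, trunc_len=8))   # → 3
```

Requiring a length past where quality drops discards two of the three reads, while truncating to where quality still holds keeps all of them, at the cost of a few bases per read.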

It is likely still worth generating taxonomically annotated data rather than relying only on strictly distance-based analysis methods (such as the diversity methods), where the distances are computed directly from the sequences themselves.
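As an illustration of why such distance-based methods do not need taxonomy, here is a minimal sketch of Bray-Curtis dissimilarity (one of the standard beta-diversity metrics) computed on ASV counts keyed only by sequence. The sample data are hypothetical toy values, and the short keys stand in for full ASV sequences.

```python
from typing import Dict


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two samples' ASV count tables.

    Features are identified by their exact sequence; no taxonomy is involved.
    """
    features = set(a) | set(b)
    diff = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy samples.
    s1 = {"ACGTACGT": 10, "ACGTTCGT": 5}
    s2 = {"ACGTACGT": 4, "GGGTACGT": 6}
    print(bray_curtis(s1, s2))  # (6 + 5 + 6) / 25 → 0.68
```

The result tells you how different the samples are, but not which organisms drive the difference; that is what the taxonomic annotation adds.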

You can still use a SILVA classifier. Matching percentages only really apply to OTUs, not ASVs (which are generally a more accurate, modern approach; see this paper for more), but you can still use a classifier trained on OTUs! If you are interested in training your own classifier, I would check out RESCRIPt, as it can make this process a lot easier. It would probably be worth doing the classification with the generic classifier first, just to get the process down, then going back and training your own later if it still feels necessary, since training can be a slow process.
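For intuition on why "matching percentages" are an OTU concept, here is a minimal sketch of percent identity between two pre-aligned sequences, the kind of value compared against an OTU clustering threshold. The function and sequences are hypothetical, and tools differ in exactly how they count gaps.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap ('-') are ignored; this is one of
    several identity definitions in use.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(x == y and x != "-" for x, y in cols)
    return 100.0 * matches / len(cols)


if __name__ == "__main__":
    # Two hypothetical 100-bp ASVs differing at a single position.
    asv1 = "A" * 50 + "C" * 50
    asv2 = asv1[:-1] + "G"
    print(percent_identity(asv1, asv2))  # → 99.0
```

At a 97% clustering threshold these two sequences would fall into one OTU, whereas DADA2 keeps them as two distinct ASVs, which is why an identity percentage does not map directly onto ASVs.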

Hope this helps and if I have missed anything or you have other questions, let me know!