Hi, I’m working on my first major QIIME run and I have a few questions about the best layout for the pipeline. I have seen the examples online, but I was given an example that differs quite a bit from them, and I’d like some opinions on which approach is more likely to give a successful outcome.
Current Pipeline:
1. Import paired-end demultiplexed sequences
2. Denoise using dada2
3. Make feature table - check for low read counts, determine # of reads for rarefying
4. Run feature classifier to create taxonomy database
5. Run Vsearch to cluster de novo at 99% identity
6. Filter out low-read samples
7. Rarefy samples and make new feature table
8. Do all downstream analysis (diversity, heatplots, etc.)
I guess what I’m mostly wondering about is where to run vsearch. Should I run it before denoising (i.e. should step 5 move to step 2)?
Also, do you see any other issues with the pipeline as I currently have it?
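For readers following along, here is a minimal sketch of what the "check for low read counts" and "filter low-read samples" steps amount to. It uses a made-up toy table in plain Python rather than QIIME 2 itself; the sample IDs, feature IDs, counts, and function names are all hypothetical.

```python
# Toy feature table: sample ID -> {feature ID -> read count}.
# All IDs and counts are invented for illustration only.
table = {
    "S1": {"f1": 500, "f2": 300, "f3": 200},
    "S2": {"f1": 40, "f2": 10},
    "S3": {"f1": 900, "f3": 600},
    "S4": {"f2": 700, "f3": 100, "f4": 50},
}


def sample_depths(table):
    """Total number of reads in each sample."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


def filter_low_depth(table, min_reads):
    """Keep only the samples that have at least min_reads reads in total."""
    depths = sample_depths(table)
    return {s: counts for s, counts in table.items() if depths[s] >= min_reads}


def retained_at_depth(table, depth):
    """Number of samples that would survive rarefying to this depth."""
    return sum(1 for d in sample_depths(table).values() if d >= depth)


# Sample depths here are S1=1000, S2=50, S3=1500, S4=850.
for candidate in (50, 850, 1000, 1500):
    print(candidate, retained_at_depth(table, candidate))  # 4, 3, 2, 1 samples

filtered = filter_low_depth(table, min_reads=850)  # drops only S2
```

The point of the loop is the trade-off: a higher sampling depth keeps more reads per sample but throws away more samples.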
Hi @reige012,
Is there a specific reason why you have to include Vsearch at all? Is OTU clustering really needed for your project? I ask because you are already using dada2 to create ASVs, which are just higher-resolution analogues of OTUs. In most cases there is no longer any need for OTUs, so I would just remove that step altogether. Also, don’t forget about tree building if you want any phylogenetic insights.
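To make the ASV/OTU relationship concrete, here is a toy sketch of what clustering ASVs at an identity threshold does to a table. This is not vsearch's implementation: it assumes equal-length sequences, measures identity as the fraction of matching positions with no alignment, and uses a simple greedy centroid scheme. The sequences, counts, and function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; a toy measure for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_greedy(asv_counts, threshold):
    """Visit ASVs from most to least abundant; each one joins the first
    centroid it matches at >= threshold, otherwise it founds a new OTU."""
    otus = {}  # centroid sequence -> summed read count
    for seq, count in sorted(asv_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            otus[seq] = count
    return otus


def mutate(seq, positions):
    """Change the base at each given position (helper for building toy data)."""
    chars = list(seq)
    for i in positions:
        chars[i] = "A" if chars[i] != "A" else "C"
    return "".join(chars)


base = "ACGT" * 25                      # 100 bp toy ASV
one_off = mutate(base, [0])             # 1 mismatch vs base
three_off = mutate(base, [1, 2, 3])     # 3 mismatches vs base
distant = mutate(base, range(20))       # 20 mismatches vs base
asv_counts = {base: 100, one_off: 40, three_off: 20, distant: 10}

otus_99 = cluster_greedy(asv_counts, 0.99)
otus_97 = cluster_greedy(asv_counts, 0.97)
print(len(asv_counts), len(otus_99), len(otus_97))  # 4 ASVs -> 3 OTUs -> 2 OTUs
```

Lowering the threshold only merges features and sums their counts; it never adds resolution that the ASVs did not already have.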
You do not need a separate rarefying step either. Rarefied tables should only be used as input for alpha/beta diversity methods, and rarefying is already built into the core-metrics pipeline. Do not use rarefied tables for differential abundance methods like ANCOM.
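As a rough illustration of what rarefying does to a single sample, here is a standard-library sketch. It assumes rarefying means random subsampling without replacement to an even depth; QIIME 2's own implementation may differ in detail, and the counts and function names below are made up.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's reads without replacement down to `depth`.
    Returns None if the sample has fewer than `depth` reads (it is dropped)."""
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for feature in picked:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def observed_features(counts):
    """Number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


deep = {"f1": 5000, "f2": 3000, "f3": 50, "f4": 5, "f5": 1}  # hypothetical sample
shallow = {"f1": 60, "f2": 40}                                # only 100 reads

even = rarefy(deep, 100)
print(sum(deep.values()), "reads ->", sum(even.values()), "reads")
print(observed_features(deep), "features ->", observed_features(even), "features")
print(rarefy(shallow, 101))  # None: too few reads, so the sample is dropped
```

Every read beyond the sampling depth is discarded, and rare features can disappear along with them, which is one way to see why a rarefied table is a poor input for differential abundance testing.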
I do have another question regarding these responses. Is there a way to essentially “clean” and trim the sequences without using dada2, so that I can then cluster into OTUs instead of ASVs? I’d like to do both OTUs (at 99% and 97%) and ASVs to see if there are any differences in the final outcome. Thanks!
There will be vast differences. Unless you use a mock community (as the dada2 developers did to benchmark their method), there is no telling which method is "better".
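If you do have a mock community, one simple way to compare pipelines is to score each one's output against the known composition. The sketch below is only an illustration: the member labels, the two pipeline outputs, and the function name are hypothetical, and it compares presence/absence only, not abundances.

```python
def score_against_mock(expected, observed):
    """Compare the members a pipeline reported with the known mock members."""
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)    # known members that were found
    false_pos = len(observed - expected)   # reported, but not in the mock
    false_neg = len(expected - observed)   # in the mock, but missed
    precision = true_pos / len(observed) if observed else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical mock community and two hypothetical pipeline outputs.
mock_members = {"taxonA", "taxonB", "taxonC", "taxonD"}
pipeline_a = {"taxonA", "taxonB", "taxonC", "taxonX"}
pipeline_b = {"taxonA", "taxonB"}

score_a = score_against_mock(mock_members, pipeline_a)
score_b = score_against_mock(mock_members, pipeline_b)
print(score_a["precision"], score_a["recall"])  # 0.75 0.75
print(score_b["precision"], score_b["recall"])  # 1.0 0.5
```

Without a known truth like this, a difference between the ASV and OTU results tells you only that the methods disagree, not which one is closer to reality.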