Which approach would be, in principle, the most appropriate? As I understand it, the ASV concept is somewhat close to 100% OTU clustering, and an ASV sequence is the exact sequence.
In contrast, OTU-clustered sequences are consensus sequences, so they might not be the exact biological sequence. I have read many papers saying that ASVs are more powerful than OTUs.
And my coworker asks: if ASV sequences are 100% OTUs, could I make an ASV table by
vsearch (100% threshold) -> OTU clustering -> diversity analysis?
Here it is in brief.
I think ASVs are more appropriate than OTUs. What would you recommend for making an ASV table: DADA2 or vsearch? (I chose DADA2, but since there are many options, my coworker prefers vsearch.)
Just a very quick answer!
vsearch will group your sequences into clusters, each defined as the set of sequences within a fixed number of differences from a given centroid sequence (this may be a simplified definition, but I hope you get the point; if not, please look at the vsearch documentation). vsearch will then output the centroid sequences, not a consensus, which is a totally different concept to me. The usual difficulty with clusters is that two sequences located at opposite extremities of a cluster are both within the defined difference from the centroid, but it is unclear what the difference between those two is (it may be up to twice the difference used to define the cluster). Hope this makes sense so far.
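To make the "up to twice the difference" point concrete, here is a toy sketch. It assumes equal-length sequences and counts plain mismatches (Hamming distance), which is a simplification: vsearch itself works with pairwise-alignment identity. The sequences and the `radius` value are made up for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Made-up example sequences (not real data).
centroid = "ACGTACGTAC"
seq_a = "TCGTACGTAC"  # differs from the centroid at the first position
seq_b = "ACGTACGTAT"  # differs from the centroid at the last position

radius = 1  # hypothetical cluster radius, in mismatches

# Both members are within the radius of the centroid ...
in_cluster = [s for s in (seq_a, seq_b) if hamming(centroid, s) <= radius]

# ... yet they differ from each other by twice the radius.
print(hamming(centroid, seq_a), hamming(centroid, seq_b), hamming(seq_a, seq_b))
# prints: 1 1 2
```

So both sequences would land in the same cluster even though they are further from each other than either is from the centroid.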
Moreover, clustering at a 100% threshold is not the same as creating ASVs. The main difference is that dada2 error-corrects the sequences, inferring the original amplicon sequences from which the reads were obtained.
Clustering is merely 'sorting sequences out' into groups. Sequencing errors will show up as many spurious clusters.
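A toy illustration of why this matters, assuming that 100% clustering of equal-length reads boils down to exact-match grouping. The reads below are invented: two "true" sequences plus two reads carrying a single sequencing error each. This is not what dada2 or vsearch do internally; it only shows that exact grouping does nothing about errors.

```python
from collections import Counter

# Invented reads: two real templates and two reads with one error each.
reads = (
    ["ACGTACGT"] * 6      # true sequence 1
    + ["ACGTACGA"]        # sequence 1 with an error at the last base
    + ["ACGAACGT"]        # sequence 1 with an error at the fourth base
    + ["TTGTACGT"] * 4    # true sequence 2
)

# Grouping identical reads is all that exact-match (100%) clustering does.
clusters = Counter(reads)
singletons = [seq for seq, count in clusters.items() if count == 1]

print(len(clusters))   # number of 100% clusters
print(singletons)      # the clusters created purely by sequencing errors
```

Only two real sequences went in, but four clusters come out. A denoiser aims to fold those two error-derived singletons back into their parent sequence instead of reporting them as separate features.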
If you want to cluster, you should consider an error-correction step in any case. Some may choose to use dada2 as a denoiser and then apply vsearch to cluster the denoised sequences. An alternative may be to generate ZOTUs (zero-radius OTUs, basically OTUs obtained by denoising; see Generating OTUs and ZOTUs), which are currently not available in qiime2.
Being lazy, I just stick to dada2 (or deblur) only ...
Oh, I had misunderstood: I thought the result might be the same as ASVs if I used vsearch and clustered at a 100% threshold, so I thought I could make an ASV table with either DADA2 or vsearch. But as you mentioned, if I use vsearch or deblur, I might not be able to produce an ASV table, right?
Also, I am trying to use DADA2. However, unlike vsearch, DADA2 has no --p-maxns option (--p-maxns: remove sequences containing Ns), and there is no N-trimming option in DADA2 either.
This is exactly what my coworker was worried about.
If there are sequences containing Ns in the dada2 output (rep-seq.qza), can I ignore them?
My understanding is that dada2 filters out any reads containing Ns, as well as any reads whose number of expected errors is above the threshold (I assume the options above expose the 'maxEE' setting in dada2).
In a normal situation, you most likely have Ns at the tail of the sequences, so the idea is to change the trimming parameters to exclude these Ns from dada2 processing. Then dada2 will retain only reads with an expected error count of at most 2 (if you keep the default maxEE setting).
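For what it is worth, here is a rough sketch of that filtering logic as I understand it, not dada2's actual code. It assumes the usual definition of expected errors (the sum of per-base error probabilities 10^(-Q/10) from the Phred scores). The function names, the example read, and the `trunc_len` value are all hypothetical.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred quality scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(seq, quals, max_ee=2.0, max_n=0):
    """Keep a read only if it has at most max_n Ns and at most max_ee expected errors."""
    if seq.upper().count("N") > max_n:
        return False
    return expected_errors(quals) <= max_ee


# Invented read with two low-quality N calls at its tail.
seq = "ACGTACGTNN"
quals = [35, 35, 34, 33, 30, 30, 28, 25, 2, 2]

trunc_len = 8  # hypothetical truncation length that cuts off the tail

print(passes_filter(seq, quals))                            # False: the read contains Ns
print(passes_filter(seq[:trunc_len], quals[:trunc_len]))    # True: Ns removed, few expected errors
print(passes_filter("ACGT" * 5, [5] * 20))                  # False: too many expected errors
```

The point is that truncating before the Ns lets the read survive the filter, whereas leaving the tail in gets the whole read discarded.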
Hope it makes sense (the last time I looked at the dada2 manuals was a while ago!)
Cheers