Hi. I am going to do a meta-analysis that combines data from the V4, V3-4, V3-5, and unknown regions, so I have decided to do closed-reference OTU picking using q2-vsearch. Here are my questions:
Are denoising and clustering totally separate? I mean, can I use the table.qza and rep-seqs.qza produced by denoising as the input files for q2-vsearch? That is, match the denoised table and rep seqs against the reference, and get a closed-reference feature table and rep seqs.
If I cannot do denoising and clustering together, are the following steps correct for q2-vsearch clustering?
demultiplex---"qiime vsearch join-pairs"---"qiime quality-filter q-score"---"qiime vsearch dereplicate-sequences"---"qiime vsearch cluster-features-closed-reference" ---"chimera filtering"
At which step should I trim all sequences to the same length, and how can I trim them to the same length?
I ask because the Moving Pictures tutorial says: "One situation where you might deviate from that recommendation is when performing a meta-analysis across multiple sequencing runs. In this type of meta-analysis, it is critical that the read lengths be the same for all of the sequencing runs being compared to avoid introducing a study-specific bias."
From a technical perspective, I would say yes, these are two separate steps. You could denoise in one step, to remove noise while capturing as much real biological sequence variation as possible, then cluster in a second step, to match up your reads to database sequences at a specific percent identity (99%, or the classic 97% commonly used for OTUs).
From a practical perspective, these two steps are both ways of summarizing all your reads into a feature table that says 'I saw this thing, this many times, in these samples.' Whether the thing is an ASV, a de novo OTU, or a closed-ref OTU, it is still a feature in a table, so you can get to the ecology.
ASVs and OTUs are the 'things'/features inside of your feature table. Notice how we do not have ASV-tables and OTU-tables, we just have feature tables, because these are all features.
Yes, you can use the denoised table.qza and rep-seqs.qza as input to q2-vsearch! But in this specific case, it's an extra step you can probably skip...
Because your final step is closed-ref clustering, that initial denoising step should not make a difference. During closed-ref clustering, you are just counting hits to your database, so any extra sequence variation you capture during denoising will be lost. This means you can skip ahead to the q2-vsearch steps you listed.
Yep, those steps look right!
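To make the "denoising should not make a difference" point concrete, here is a toy Python sketch. It is not how q2-vsearch is implemented: the reference sequences, the read counts, and the `identity` / `closed_ref_table` helpers are all made up for illustration, and the identity calculation is a simple position-by-position comparison of equal-length sequences.

```python
from collections import Counter

# Hypothetical toy reference database: reference ID -> sequence.
REFERENCE = {
    "ref_A": "ACGTACGTACGTACGTACGT",
    "ref_B": "TTGGCCAATTGGCCAATTGG",
}

def identity(query, ref):
    """Fraction of matching positions for two equal-length sequences."""
    matches = sum(q == r for q, r in zip(query, ref))
    return matches / len(ref)

def closed_ref_table(feature_counts, reference, perc_identity=0.9):
    """Collapse each query feature onto its best reference hit; drop non-hits."""
    table = Counter()
    for seq, count in feature_counts.items():
        best_id, best_score = max(
            ((ref_id, identity(seq, ref_seq)) for ref_id, ref_seq in reference.items()),
            key=lambda hit: hit[1],
        )
        if best_score >= perc_identity:
            table[best_id] += count
    return dict(table)

# Dereplicated reads (no denoising): includes a rare read with a sequencing error.
dereplicated = {
    "ACGTACGTACGTACGTACGT": 50,  # exact match to ref_A
    "ACGTACGTACGTACGTACGA": 30,  # real variant, 1 mismatch to ref_A
    "ACGTACGTACCTACGTACGT": 2,   # sequencing error, 1 mismatch to ref_A
    "TTGGCCAATTGGCCAATTGG": 18,  # exact match to ref_B
}

# The same data after a (hypothetical) denoiser folded the error read into its parent.
denoised = {
    "ACGTACGTACGTACGTACGT": 52,
    "ACGTACGTACGTACGTACGA": 30,
    "TTGGCCAATTGGCCAATTGG": 18,
}

print(closed_ref_table(dereplicated, REFERENCE))
print(closed_ref_table(denoised, REFERENCE))
```

In this toy case both inputs collapse to the same two reference IDs with the same counts, so the extra variants resolved by denoising disappear once everything is counted as database hits. Real data can differ at the margins, for example when a noisy read falls just below the identity threshold.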
Because closed-ref just counts database hits, sequence length should not matter a lot, especially because vsearch does 'glocal' alignments (edit distance excluding terminal gaps).
Keeping read lengths the same is super important when using ASVs or de novo OTUs, because changes in sequence length will result in different features. But with closed-ref, all your features will just be the ones in the database, because that's how closed-ref counting works!
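Here is a small Python sketch of why read length matters less once terminal gaps are excluded. It is a deliberate simplification and not vsearch's aligner: the hypothetical `glocal_identity` helper only slides the read along the reference without internal gaps, and the 40 nt reference is invented for the example.

```python
# Hypothetical 40 nt "full-length" reference sequence.
REF = "ATGCCGTAAGCTTGACCTGAGGTCAATCGGATTCCGAAGT"

def glocal_identity(query, ref):
    """Best ungapped identity of query placed anywhere within ref.

    Only the columns covered by the query are scored, so the parts of the
    reference that overhang the read (terminal gaps) are not penalized.
    """
    if len(query) > len(ref):
        raise ValueError("toy example assumes the read is no longer than the reference")
    best = 0.0
    for offset in range(len(ref) - len(query) + 1):
        window = ref[offset:offset + len(query)]
        matches = sum(q == r for q, r in zip(query, window))
        best = max(best, matches / len(query))
    return best

# Two reads of different lengths, as if amplified from different regions.
short_read = REF[10:25]  # 15 nt
long_read = REF[5:35]    # 30 nt

print(len(short_read), glocal_identity(short_read, REF))
print(len(long_read), glocal_identity(long_read, REF))
print(short_read == long_read)
```

Both reads score as perfect hits to the same reference, so they would be counted under one reference ID. As ASVs, the two different strings would be two different features, which is where a study-specific length bias would come from.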
You are on the right track! Let us know if you have any other questions!