Could anyone help clear up my confusion about sequence clustering in qiime2, please? In mothur, sequence clustering is done in two steps. The first is a preclustering step, where very similar sequences are merged using the pre.cluster command. This further denoises the sequences. It splits the sequences by group, sorts them by abundance, and then goes from most abundant to least abundant, identifying sequences that are within 2 nt of each other (if diffs is set to 2). Sequences within that threshold are merged. This can drastically reduce the number of unique sequences. The second is the actual clustering step with the cluster.split command, where sequences are clustered based on the previously calculated distances (how different they are from each other), for example 3% or 1% different.
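To make the preclustering idea concrete, here is a rough Python sketch of how I picture it for a single group. It is only an illustration under my own assumptions, not mothur's actual implementation: the function names `hamming` and `precluster` are made up, the sequences are assumed to be aligned to the same length, and only mismatches are counted as differences.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def precluster(counts: Dict[str, int], diffs: int = 2) -> Dict[str, int]:
    """Greedy, abundance-ordered merge of near-identical sequences.

    `counts` maps each unique sequence in one group to its abundance.
    Hypothetical helper for illustration only.
    """
    # Most abundant first; ties broken alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    merged: Dict[str, int] = {}
    for seq, n in ordered:
        # Representatives are visited in abundance order (dict insertion order).
        for rep in merged:
            if hamming(seq, rep) <= diffs:
                merged[rep] += n  # absorb the rarer sequence into the representative
                break
        else:
            merged[seq] = n  # nothing close enough: seq becomes a new representative
    return merged


example = {
    "ACGTACGT": 100,
    "TTTTACGT": 40,
    "ACGTACGA": 5,
    "ACGTACCA": 2,
    "TTTTACGG": 1,
}
result = precluster(example, diffs=2)
print(result)
```

In this toy example the five unique sequences collapse to two representatives, which is the kind of reduction in unique sequences I mean above.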
(1) Is clustering a must-have step for general 16S data analysis?
(2) In qiime2, is the clustering already done by the dada2 denoising step, or does dada2 only perform the equivalent of the pre.cluster step described above, so that further clustering is still necessary?
(3) If further clustering is needed, is q2-vsearch the plugin that does a similar job? Are there any other options?
(4) Or is q2-vsearch equivalent to dada2 denoising in terms of sequence clustering?
(5) Dereplicating in qiime2 is actually clustering sequences, right?
Thank you in advance. I look forward to any input.