How can I cluster reads into broader groups than the ones DADA2 produces?


I have analyzed amplicon reads of marker genes (e.g., mcrA, dsrB) using the sequences produced by DADA2. Everything works well for obtaining diversity data and classifying the DADA2 sequences.

One more thing I would like to do is build a phylogenetic tree from the rep-seqs. DADA2 produces too many rep-seqs for me to handle (~1000 seqs for one gene), so I hope to reduce their number by clustering at broader similarity cutoffs (e.g., 95%, 90%, 80%).

Could someone advise me on how to do that? Any suggestions (e.g., plugins, tutorials) would be appreciated.

Best regards,


Hello @baehsung,

Sure! After denoising your reads with DADA2, you can cluster the resulting sequences into OTUs at different levels of similarity using vsearch, as shown here!
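As a rough sketch of that clustering step, the snippet below builds `qiime vsearch cluster-features-de-novo` commands for the three thresholds mentioned in the question. The file names (`table.qza`, `rep-seqs.qza`, the `otu-95` style prefixes) and the helper `build_cluster_command` are assumptions made for illustration, not names from your analysis, so adjust them to your own artifacts.

```python
def build_cluster_command(table, rep_seqs, perc_identity, out_prefix):
    """Build a QIIME 2 command that clusters DADA2 features de novo with vsearch.

    `perc_identity` is a fraction (0.95 means 95% identity), not a percentage.
    File names and the output prefix are placeholders chosen for this sketch.
    """
    if not 0.0 < perc_identity <= 1.0:
        raise ValueError("perc_identity must be a fraction in (0, 1], e.g. 0.95")
    return [
        "qiime", "vsearch", "cluster-features-de-novo",
        "--i-table", table,                # feature table from DADA2
        "--i-sequences", rep_seqs,         # representative sequences from DADA2
        "--p-perc-identity", str(perc_identity),
        "--o-clustered-table", f"{out_prefix}-table.qza",
        "--o-clustered-sequences", f"{out_prefix}-rep-seqs.qza",
    ]


# One command per similarity threshold from the question (95%, 90%, 80%).
commands = [
    build_cluster_command(
        "table.qza", "rep-seqs.qza", identity, f"otu-{round(identity * 100)}"
    )
    for identity in (0.95, 0.90, 0.80)
]

for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell where QIIME 2 is activated, or passed to `subprocess.run(cmd, check=True)` once the input artifacts exist.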

Let us know if you have any questions about this process.

One more thing...

MAFFT supports up to 30k sequences, so you could build an MSA with MAFFT, pass it to FastTree 2, and see how it goes. MAFFT may be able to handle your full data set!
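If you want to try that route inside QIIME 2, one possible sketch is the `align-to-tree-mafft-fasttree` pipeline from the phylogeny plugin, which runs MAFFT, masks the alignment, and builds a tree with FastTree in one go. The input name `rep-seqs.qza`, the output file names, and the helper `build_tree_command` are assumptions for illustration only.

```python
def build_tree_command(rep_seqs, out_dir="."):
    """Build a QIIME 2 command that aligns sequences with MAFFT and infers a
    tree with FastTree. Output file names are placeholders for this sketch."""
    return [
        "qiime", "phylogeny", "align-to-tree-mafft-fasttree",
        "--i-sequences", rep_seqs,
        "--o-alignment", f"{out_dir}/aligned-rep-seqs.qza",
        "--o-masked-alignment", f"{out_dir}/masked-aligned-rep-seqs.qza",
        "--o-tree", f"{out_dir}/unrooted-tree.qza",
        "--o-rooted-tree", f"{out_dir}/rooted-tree.qza",
    ]


tree_cmd = build_tree_command("rep-seqs.qza")
print(" ".join(tree_cmd))
```

The same helper works on the clustered rep-seqs from vsearch, so you can compare a tree built from all DADA2 sequences with one built from the reduced OTU set.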



Thanks, Colin, for telling me about vsearch, which is exactly what I wanted, and about MAFFT.
Regarding MAFFT, could you tell me what MSA is?


MSA stands for Multiple Sequence Alignment. MAFFT is one of many programs that perform MSA.

After you align your multiple sequences, you can infer a phylogenetic tree. This slide deck is a great overview of this whole process.
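To make the idea of an alignment a bit more concrete, here is a toy sketch. The three sequences are invented, and the gaps (`-`) were placed by hand rather than by MAFFT. It also shows one simple way to compute pairwise identity from aligned sequences; real tools such as vsearch use their own identity definitions, so treat `percent_identity` as an illustrative assumption rather than what any plugin computes.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of compared alignment columns in which both sequences match.

    Columns that are gaps in both sequences are skipped; a gap opposite a
    base counts as a mismatch. This is one simple definition among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # gap in both sequences: nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


# A hand-made toy alignment: every row has the same length, and gaps pad
# the sequences so that corresponding positions line up in columns.
alignment = {
    "seq1": "ATGCCGTA-GGT",
    "seq2": "ATGCTGTAAGGT",
    "seq3": "ATG-CGTA-GGT",
}

names = sorted(alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        identity = percent_identity(alignment[first], alignment[second])
        print(f"{first} vs {second}: {identity:.1%}")
```

Tree-building programs work from aligned columns like these, which is why the alignment step comes before inferring the tree.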

Thanks Colin.

I took a look at the slide deck, which explains phylogeny in detail.