I don’t have much experience with statistical and phylogenetic analyses, and I have only recently started using QIIME 2. By following the suggested tutorials, I reached the step of building a phylogenetic tree in a 16S metagenomic analysis. I have the following questions:
For statistical analyses such as alpha and beta diversity, is it always advisable to build a phylogenetic tree? I noticed that there are also commands to compute these metrics without a tree. In my case, I am analyzing the V3–V4 region of the 16S rRNA gene.
To obtain a higher-quality phylogenetic tree, should ASVs that are classified as unassigned be filtered out of both the feature table and the representative sequences before building the tree? That is what I understood from reading posts like this one and the Taxonomy assignment section of the downstream tutorial.
It depends on the metrics you are going to use! If you are not interested in metrics that incorporate phylogeny, such as Faith PD (great one, IMHO) or UniFrac (weighted or unweighted), then you don’t need to construct a tree. I prefer to include them because they help me understand my data better.
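To illustrate why a metric like Faith PD needs a tree, here is a minimal sketch in plain Python. It is not QIIME 2 code: the toy tree, the branch lengths, and the `faith_pd` helper are all made up for illustration. It shows two samples with the same number of ASVs (same richness) but different phylogenetic diversity.

```python
# Toy illustration only: a hypothetical tree stored as child -> (parent, branch_length).
# None of these names or numbers come from QIIME 2 or from real data.
toy_tree = {
    "ASV1": ("n1", 0.1),
    "ASV2": ("n1", 0.2),
    "ASV3": ("root", 0.5),
    "n1": ("root", 0.3),
}


def faith_pd(parents, observed_tips):
    """Sum the branch lengths on the paths from each observed tip to the root,
    counting every branch only once."""
    seen = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk up towards the root; stop early if this branch was already counted.
        while node in parents and node not in seen:
            seen.add(node)
            parent, length = parents[node]
            total += length
            node = parent
    return total


sample_a = ["ASV1", "ASV2"]  # two closely related ASVs
sample_b = ["ASV1", "ASV3"]  # two distantly related ASVs

# Richness (no tree needed) is identical for both samples.
print(len(sample_a), len(sample_b))

# Faith PD (tree needed) is larger for the sample with more distant ASVs.
print(faith_pd(toy_tree, sample_a), faith_pd(toy_tree, sample_b))
```

Metrics such as observed features or Shannon only look at the counts, so they would treat these two samples the same way. The tree is what lets Faith PD and UniFrac tell them apart.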
Once again, it depends on what you are going to do with your data. There is no rule requiring the removal of unassigned ASVs before tree construction. Personally, I do the following:
Filter the feature table (remove rare ASVs, ASVs that were assigned to organelles, and optionally unassigned ASVs).
Filter the representative sequences to retain only ASVs that are present in the filtered table.
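The logic of these two steps can be sketched in plain Python as follows. This is only an illustration of the order of operations, not QIIME 2 code: the dictionaries, the `min_total_count` threshold, and the function names are assumptions chosen for the example. In QIIME 2 itself the equivalent filtering is done with actions from the feature-table and taxa plugins (for example `feature-table filter-features`, `taxa filter-table`, and `feature-table filter-seqs`).

```python
# Hypothetical toy data: ASV -> per-sample counts, ASV -> taxonomy, ASV -> sequence.
table = {
    "ASV1": {"S1": 50, "S2": 30},
    "ASV2": {"S1": 1, "S2": 0},    # rare ASV
    "ASV3": {"S1": 40, "S2": 20},  # organelle
    "ASV4": {"S1": 15, "S2": 5},   # unassigned
}
taxonomy = {
    "ASV1": "d__Bacteria; p__Firmicutes",
    "ASV2": "d__Bacteria; p__Bacteroidota",
    "ASV3": "d__Bacteria; p__Cyanobacteria; o__Chloroplast",
    "ASV4": "Unassigned",
}
rep_seqs = {"ASV1": "ACGT", "ASV2": "ACGA", "ASV3": "ACGC", "ASV4": "ACGG"}


def filter_table(table, taxonomy, min_total_count=10, drop_unassigned=False):
    """Step 1: drop rare ASVs, organelle ASVs, and (optionally) unassigned ASVs.
    The threshold of 10 is an arbitrary example value, not a recommendation."""
    organelles = ("mitochondria", "chloroplast")
    kept = {}
    for asv, counts in table.items():
        taxon = taxonomy.get(asv, "Unassigned")
        if sum(counts.values()) < min_total_count:
            continue  # rare ASV
        if any(name in taxon.lower() for name in organelles):
            continue  # assigned to an organelle
        if drop_unassigned and taxon == "Unassigned":
            continue  # optional removal of unassigned ASVs
        kept[asv] = counts
    return kept


def filter_sequences(rep_seqs, filtered_table):
    """Step 2: keep only the sequences whose ASV is still in the filtered table."""
    return {asv: seq for asv, seq in rep_seqs.items() if asv in filtered_table}


filtered_table = filter_table(table, taxonomy)
filtered_seqs = filter_sequences(rep_seqs, filtered_table)
print(sorted(filtered_table), sorted(filtered_seqs))
```

Filtering the table first and then the sequences keeps the two in sync, so the tree is built only from ASVs that will actually appear in the diversity analyses.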