If I filter a table...

I made a phylogeny with my table (e.g. alldata-table.qza).
However, Unassigned features showed up in the barplot.
So I filtered out Unassigned using qiime taxa filter-table with the --p-exclude Unassigned option,
and got a new table (e.g. filtered-table.qza).

  1. As long as I use this filtered-table.qza, should I make a new phylogeny and use the new rooted-tree.qza as the input to core-metrics-phylogenetic instead of the original one?

  2. I figured out that the sampling depth is the lowest feature count across all sample IDs, and that I can get this information by summarizing filtered-table.qza. Are both of these correct?

  3. So if I use the filtered table (alldata-table: 47000, filtered-table: 674), should I use a sampling depth of 674 for alpha rarefaction and core-metrics-phylogenetic?

Hi @yogurt!
Unless I misunderstand, it seems like these questions are not all directly related. In future, please create separate topics for each unique question or closely-related group of questions.

Whether you want to do this will depend on what questions you’re trying to ask with your study, and may depend on how you’re creating your phylogeny as well. With mafft fasttree, for example, I believe your tree is generated from the aligned sequences themselves, not by referencing an external phylogeny. When you calculate phylogenetic diversity, then, dropping unassigned sequences will impact your alpha/beta diversity results significantly.

Only you would know whether this is the right choice for your study design - it might be worth considering how you would explain the decision to filter out unassigned taxa during paper review.
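To illustrate why dropping features changes phylogenetic diversity, here is a minimal sketch of a Faith's PD style calculation on a toy tree. The tree, tip names, and branch lengths are all made up for illustration, and this is not how QIIME 2 computes the metric internally; it only shows that removing a tip removes its branch length from the total.

```python
# Toy rooted tree: child -> (parent, branch length).
# All names and lengths are hypothetical.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),  # imagine C is a long-branch Unassigned feature
}

def faith_pd(observed_tips, tree=TREE):
    """Sum the lengths of all branches on paths from observed tips to the root."""
    counted = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk up until we reach the root or a branch we already counted.
        while node in tree and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total

print(faith_pd({"A", "B", "C"}))  # all features kept
print(faith_pd({"A", "B"}))       # after dropping C
```

In this toy case the sample loses more than half of its total branch length when C is removed, even though only one of three features was dropped.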

Questions 2 and 3 are pretty unclear. Please clarify if I’ve misinterpreted, but it sounds to me like you are asking how to select a --p-sampling-depth for use in core-metrics-phylogenetic. The basic goal is to preserve as many sequences as possible, while avoiding the loss of samples that are important to your study. The Parkinson’s Mouse and Moving Pictures tutorials both present approaches, and there are many forum posts discussing sampling depth. Please take some time to search :mag: and read these.
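As a rough sketch of that trade-off, the snippet below tabulates how many samples and sequences survive at each candidate depth. The per-sample counts are hypothetical, and `depth_tradeoff` is an illustrative helper, not part of QIIME 2; in practice you would read the counts from the table summary.

```python
def depth_tradeoff(sample_counts, depth):
    """Return (samples kept, sequences kept) when rarefying to `depth`.

    Samples with fewer than `depth` sequences are dropped; every kept
    sample contributes exactly `depth` sequences.
    """
    kept = [s for s, n in sample_counts.items() if n >= depth]
    return len(kept), len(kept) * depth

# Hypothetical per-sample sequence counts.
counts = {"s1": 674, "s2": 5200, "s3": 6100, "s4": 5800, "s5": 7000}

# Try each observed count as a candidate sampling depth.
for depth in sorted(set(counts.values())):
    n_samples, n_seqs = depth_tradeoff(counts, depth)
    print(f"depth={depth}: keeps {n_samples} samples, {n_seqs} sequences")
```

With these made-up numbers, the minimum count keeps every sample but few sequences, while a higher depth drops one sample and keeps far more sequences overall. Whether that sample is expendable is a study-design question.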

If I misunderstood questions 2 and 3, feel free to clarify here. If, after reading, you have specific questions about how to select a sampling depth, please open a new topic for them.

Chris :penguin:

@aeriel.belk, rumor has it you might have done some studies in which you filtered unassigned taxa before running diversity analysis? If so, the OP and others might get some value from a discussion of what your use case was, and how it went. If not, please disregard.

Hey! Usually, I only filter out chloroplast and mitochondria prior to diversity analysis, because I think the unassigned ASVs can still have value for diversity questions. I will sometimes filter them out for taxonomic analysis just because they make the data messy. But I suppose this decision also depends on your system. For example, if you have a really well-described system like the human gut, I think it is easier to justify filtering the unassigned ASVs out before diversity analysis, but if you have a less documented or very complex system I’d leave them in. I hope that makes sense!

In general, though, you don’t need to remake a phylogenetic tree after filtering. It is ok if the phylogenetic tree has extra sequences! You only run into problems if you have a tree with fewer sequences than your table (i.e. if you made a tree from the filtered data, then tried to use that tree with the unfiltered table). In short, I’d say the answer to 1 is no: you are fine with your original tree as long as you use the filtered table in the core metrics command :slight_smile:
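The rule about trees and tables can be expressed as a simple set check: every feature ID in the table needs a matching tip in the tree, while extra tips are harmless. This is a minimal sketch with made-up feature IDs; `features_missing_from_tree` is a hypothetical helper, and loading the real IDs from your .qza files is left out.

```python
def features_missing_from_tree(table_feature_ids, tree_tip_names):
    """Return table feature IDs that have no matching tip in the tree."""
    return sorted(set(table_feature_ids) - set(tree_tip_names))

# Hypothetical IDs: a tree with four tips.
tree_tips = {"f1", "f2", "f3", "f4"}

# A filtered table is a subset of the tree's tips, so nothing is missing.
filtered_table = {"f1", "f3"}
print(features_missing_from_tree(filtered_table, tree_tips))

# A table with a feature the tree lacks is the problem case.
bigger_table = {"f1", "f2", "f3", "f4", "f5"}
print(features_missing_from_tree(bigger_table, tree_tips))
```

An empty result means the tree covers the table; any IDs listed are features the tree cannot place.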


Thank you so much aeriel!
As you mentioned, I had the same opinion about ASVs. Since my samples are from the ocean (some are from 1200 m depth), I thought the reference database might be insufficient, which means I might lose valuable data if I filter out unassigned ASVs. So in my case, do you think I should stick with OTU clustering instead of ASV analysis?

Oh, and also, my work is not on the microbiome. I have eukaryote samples (zooplankton). All the papers using QIIME 2 that I found dealt with the microbiome, not eukaryotes. So I was going to choose the better result of the two (OTU clustering analysis vs. ASV analysis). The beta diversity result I got from ASVs was not very interesting. Maybe I should do this with OTUs, right?

How you decide to approach this depends on your study questions, and best practices in your field. I don’t have any experience with macro-biota analysis, but your impulse to compare OTU and ASV approaches seems reasonable - certainly safer than filtering out 98% of your features before diversity analysis.

If, by clustering, you can find meaningful differences between OTUs, great! Just recognize that clustering, like filtering, may present “clearer” comparisons by reducing the amount of information in your data. This may actually make more sense with macro-biota than using ASVs would - I don’t know. Either way, consider how you will justify your selected approach when writing this study up.
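To make the "reducing information" point concrete, here is a toy greedy clustering sketch on made-up, equal-length sequences. It uses a simple position-by-position identity and is not the algorithm used by any particular QIIME 2 plugin; the sequences, function names, and thresholds are assumptions for illustration only.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    clusters = {}  # centroid sequence -> list of member sequences
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No centroid was close enough, so this sequence starts a new cluster.
            clusters[seq] = [seq]
    return clusters

# Hypothetical sequences, 10 bases each.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTTT", "TTTTGGGGCC"]

for threshold in (1.0, 0.9, 0.8):
    print(threshold, len(greedy_cluster(seqs, threshold)))
```

Lowering the threshold merges distinct sequences into fewer features, which is exactly the information that an ASV-level analysis would have kept separate.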

How much within-species sequence variation do you see with zooplankton? What is a reasonable clustering threshold for those organisms? Literature may already exist on this, but these are interesting questions. What does an ASV mean in this context? And what is the most meaningful way to define an OTU here? Thanks for getting me thinking!

Good luck,
Chris :ram: