Understanding diversity core-metrics-phylogenetic sample filtering

Hi,
I am analysing caecal microbiota. I ran the following command:

qiime diversity core-metrics-phylogenetic --p-n-jobs-or-threads 48 \
  --i-phylogeny 04-taxonomy/20251113_16S_chicken_caeca_v2.rooted-tree.qza \
  --i-table 02-dada2/20251113_16S_chicken_caeca_v2.asv-table-ms2.qza \
  --p-sampling-depth 73000 \
  --m-metadata-file sample-metadata.tsv \
  --output-dir 05-diversity_alpha/diversity-core-metrics-phylogenetic

The dada2 sample stats are the following:

As you can see, only 2 samples are filtered out for visit_num = 2.

When we look at bray_curtis_emperor.qzv generated by the diversity core-metrics-phylogenetic command, only 2 samples from visit_num = 2 are present.

Why are these samples filtered out?
If you require additional information, do not hesitate to ask.

Thanks in advance for your help

After testing this further, I see that if I use a very low sampling depth (e.g., 1000), all 10 samples show up in the Jaccard or Bray-Curtis Emperor plots.

Why does the sampling depth not match the DADA2 plot?

Here is the call with the associated graph:

qiime diversity core-metrics-phylogenetic --p-n-jobs-or-threads 48 --p-ignore-missing-samples \
  --i-phylogeny 04-taxonomy/20251113_16S_chicken_caeca_v2.rooted-tree.qza \
  --i-table 02-dada2/20251113_16S_chicken_caeca_v2.asv-table-ms2.qza \
  --p-sampling-depth 1000 \
  --m-metadata-file sample-metadata.tsv \
  --output-dir 05-diversity_alpha/diversity-core-metrics-phylogenetic

Hi @jflucier,
Based on the filename for your feature table, it looks like you may have done some filtering of features. I'm looking at the filename 02-dada2/20251113_16S_chicken_caeca_v2.asv-table-ms2.qza, and in some of our tutorials we use ms2 (short for min-samples=2) to indicate that features present in fewer than two samples were removed. (This isn't conclusive because this is just a filename - data provenance would tell us for sure.)
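
To see why that kind of filtering matters for the sampling depth: removing features also removes their reads, so per-sample totals in a filtered table can be lower than the totals reported for the unfiltered table. Here is a toy sketch with made-up counts (not your data; the table, the count_retained helper, and the depth of 700 are all hypothetical) that counts how many samples survive a given depth before and after dropping features seen in fewer than two samples:

```shell
# Toy feature table (made-up counts): rows are ASVs, columns are samples.
toy_table=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  feature S1  S2  S3 \
  asv1    500 400 100 \
  asv2    300 300 50 \
  asv3    0   0   900 > "$toy_table"

# Hypothetical helper: count samples whose total reads reach the depth,
# after dropping features present in fewer than min_samples samples.
# $1 = min_samples, $2 = sampling depth
count_retained() {
  awk -F'\t' -v min_samples="$1" -v depth="$2" '
    NR == 1 { ncol = NF; next }              # header row: remember column count
    {
      present = 0
      for (i = 2; i <= NF; i++) if ($i > 0) present++
      if (present < min_samples) next        # feature filtered out
      for (i = 2; i <= NF; i++) total[i] += $i
    }
    END {
      kept = 0
      for (i = 2; i <= ncol; i++) if (total[i] >= depth) kept++
      print kept
    }' "$toy_table"
}

unfiltered=$(count_retained 1 700)   # no feature filtering
filtered=$(count_retained 2 700)     # mimics a min-samples=2 filter
echo "samples retained at depth 700: unfiltered=$unfiltered, filtered=$filtered"
```

In this toy table, S3 gets most of its reads from asv3, which occurs in only one sample. S3 clears a depth of 700 before filtering (1050 reads) but not afterwards (150 reads), so a depth chosen from the unfiltered totals drops more samples than expected.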

If you haven't already, could you run qiime feature-table summarize on the file 02-dada2/20251113_16S_chicken_caeca_v2.asv-table-ms2.qza, and look again at how many samples would be dropped with a sampling depth of 73000 (i.e., as in the first plot you shared)? If that doesn't explain what you're seeing in the ordination plot, could you share the .qzv file that is generated by qiime feature-table summarize, and the .qzv with the Bray-Curtis ordination plot? We can take a look at the data provenance then and try to figure out what's going wrong.
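
For reference, a minimal sketch of that summarize call, reusing the paths from your commands. The output filename is only a placeholder I made up, and the guard simply skips the call when QIIME 2 or the table isn't available in the current directory:

```shell
# Input paths come from the commands earlier in this thread.
table="02-dada2/20251113_16S_chicken_caeca_v2.asv-table-ms2.qza"
# Hypothetical output name; choose whatever fits your layout.
summary="02-dada2/20251113_16S_chicken_caeca_v2.asv-table-ms2.summary.qzv"

# Only run when QIIME 2 is on the PATH and the table exists.
if command -v qiime >/dev/null 2>&1 && [ -f "$table" ]; then
  qiime feature-table summarize \
    --i-table "$table" \
    --m-sample-metadata-file sample-metadata.tsv \
    --o-visualization "$summary"
else
  echo "skipped: qiime or $table not found"
fi
```

The resulting .qzv shows per-sample frequencies for the filtered table, which is the table that core-metrics-phylogenetic actually rarefies.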

Hope this helps!

Hi @gregcaporaso
You pinpointed the error exactly. My cutoff was mistakenly set using the unfiltered table. Now the DADA2 and core-metrics-phylogenetic analyses agree well. Thank you very much for your help.
