Sampling depth decision making question

Hind_Sbihi · February 2, 2018, 1:13pm

Dear Qiime2 community members:

To decide on the sampling depth for rarefaction, I was advised to examine the correlation between my outcome of interest and the feature abundances.
I was advised to adapt a script that provides some guidance on how to do this, and even on how to use a bootstrapping method to get a less biased value:
observation_metadata_correlation.py -i otu_table.biom -m map.txt -c pH -s pearson \
    --pval_assignment_method bootstrapped --permutations 100 -o pearson_bootstrapped.txt

Is there an equivalent in QIIME2?
The Moving Pictures tutorial indicates that the sampling depth should be chosen carefully using the table.qzv generated after the Deblur step.
With that approach, the decision no longer depends on the metadata.

So in summary:

How can we get the same table of bootstrapped Pearson coefficients in QIIME2?
Should the decision be independent or dependent on the metadata variables?
Many thanks for your time and input!

Nicholas_Bokulich · February 2, 2018, 2:43pm

Hi @Hind_Sbihi,

What is your outcome of interest? It sounds like perhaps you are trying to correlate the relative abundance of a particular taxon with something like CFU counts or another method for measuring abundance of a particular organism.

One issue with using these correlation methods is that they are not meant to be used on compositional data (e.g., relative abundances). For discovering associations between metadata values and sequence variants/OTUs/taxa, we instead recommend using q2-gneiss. But I don't think that is necessarily appropriate for the question you are asking here (determining sampling depth). @mortonjt, do you have anything to add on this point?

That is correct. We recommend using alpha rarefaction, which (in my opinion) addresses this question more directly from a diversity perspective (i.e., how does rarefaction affect diversity metrics?).
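To make the idea behind alpha rarefaction concrete, here is a minimal sketch in plain NumPy. It is not the QIIME2 implementation: the function names, the toy counts, and the choice of observed features as the metric are all assumptions made for illustration. It subsamples one sample's counts without replacement at several depths and reports the mean number of features still observed.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Subsample one sample's feature counts to `depth` reads without replacement."""
    counts = np.asarray(counts, dtype=int)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    # Expand counts into one feature index per read, then draw `depth` reads.
    reads = np.repeat(np.arange(counts.size), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=counts.size)

def observed_features_curve(counts, depths, iterations=10, seed=0):
    """Mean number of observed features at each depth over repeated draws."""
    rng = np.random.default_rng(seed)
    curve = {}
    for depth in depths:
        richness = [np.count_nonzero(rarefy(counts, depth, rng))
                    for _ in range(iterations)]
        curve[depth] = float(np.mean(richness))
    return curve

# Toy sample (made-up counts): 6 features, 1000 reads in total.
sample = [500, 300, 150, 40, 9, 1]
curve = observed_features_curve(sample, depths=[10, 100, 500, 1000])
for depth, richness in curve.items():
    print(depth, richness)
```

A depth where the curve has flattened for most samples is a reasonable candidate; samples with fewer reads than the chosen depth would be dropped.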

But I think I may understand the goals of the analysis that you describe. It is unclear to me precisely how the approach you describe (rarefying at different depths and correlating feature abundance with metadata) will be an effective method, unless the values you are correlating are the expected value (in the metadata) and the observed value (in the feature abundances) of some sort of internal standard. If that's the case, please do tell more.

QIIME2 does not yet support a correlation method for feature data akin to qiime1's observation_metadata_correlation.py. The main reason is that it is generally an inappropriate test for the data types that QIIME2 typically supports at the moment (as described above), so that method was dropped in favor of development in other areas. Few (if any) users have reported missing that function. But we have discussed back and forth whether such a correlation plugin would still have merit, e.g., with certain data types we plan to support in the future, and user input would really help us move this forward. If you could describe your use case in more detail (e.g., are you correlating against some type of standard?), this could be very helpful for us in planning future development.

For now, you would need to export your feature table and run that analysis in qiime1 if you still want to get a bootstrapped Pearson correlation coefficient. (But again, describe your use case in more detail and we might be able to add support for this in the future.)
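If you go the export route and just want to see what the calculation involves, here is a rough NumPy sketch of a Pearson correlation with a permutation-based p-value for a single feature. It assumes that the "bootstrapped" option in the qiime1 script is essentially a permutation test; I have not checked it against the qiime1 source, so treat that as an assumption. The function names and the toy pH and abundance values are made up, and the caveat above about compositional data still applies.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def permutation_pvalue(x, y, permutations=100, seed=0):
    """Two-sided permutation p-value for the observed Pearson r."""
    rng = np.random.default_rng(seed)
    observed = pearson_r(x, y)
    hits = 0
    for _ in range(permutations):
        # Shuffle one variable to break any real association.
        if abs(pearson_r(rng.permutation(x), y)) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (hits + 1) / (permutations + 1)

# Toy data (made up): pH per sample and one feature's abundance per sample.
ph = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
abundance = [2, 5, 9, 14, 20, 24, 31, 35]
r, p = permutation_pvalue(ph, abundance, permutations=100)
print(r, p)
```

With 100 permutations the smallest achievable p-value is 1/101, so more permutations are needed if you intend to correct for testing many features.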

Not knowing all the details of your method, I'd say that using alpha rarefaction and looking at the impact on diversity metrics (as described in the Moving Pictures tutorial) is a better way than relying on some metadata variable. But again, if that metadata variable reports the abundance of an internal standard or something to that effect, then I think that could be a better approach. I would need to know more to give a fully informed opinion on this.

I hope that helps!
