Taxonomic rank rather than ASV

Dear all,

I am hoping for some advice and input on how to handle my data when I want to look at taxonomic ranks rather than ASVs.

I have a dataset where a lot of patient data is of interest (information on other diseases, medication, family history, inflammation, you name it…), and the easiest way to analyze these variables would be to export count data to Stata after grouping sequences by taxonomy at different taxonomic levels. (Otherwise, I believe I would have to generate an enormous number of filtered ASV files and run each one, with one metadata category of interest at a time, through the available differential abundance tools.)
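A minimal sketch of what such an export could look like, assuming the counts have already been collapsed to one taxonomic level. It uses only the Python standard library; the function name `to_wide_rows`, the sample IDs, the metadata columns, and the counts are all hypothetical. It produces one row per sample, with metadata columns followed by one raw-count column per taxon, which is a layout Stata can read as CSV.

```python
import csv
import io


def to_wide_rows(collapsed, metadata):
    """Build one row per sample: metadata columns first, then one count column per taxon.

    collapsed: {taxon: {sample_id: raw_count}}
    metadata:  {sample_id: {column_name: value}}
    """
    taxa = sorted(collapsed)
    rows = []
    for sample_id in sorted(metadata):
        row = {"sample_id": sample_id, **metadata[sample_id]}
        for taxon in taxa:
            # A taxon that was never observed in a sample gets a count of 0.
            row[taxon] = collapsed[taxon].get(sample_id, 0)
        rows.append(row)
    return rows


# Hypothetical toy data: two phyla, one patient sampled before and after.
collapsed = {
    "p__Firmicutes": {"P1_before": 120, "P1_after": 80},
    "p__Proteobacteria": {"P1_before": 30, "P1_after": 95},
}
metadata = {
    "P1_before": {"patient": "P1", "timepoint": "before"},
    "P1_after": {"patient": "P1", "timepoint": "after"},
}

rows = to_wide_rows(collapsed, metadata)

# Write to an in-memory buffer here; swap in open("counts.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Keeping a `patient` column next to `timepoint` preserves the pairing, so before/after samples from the same person can still be matched after the export.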

I also have a before/after design, and for differential abundance analysis I have used both ANCOM and DESeq2 on the full dataset to compare before and after. Both methods are great because I can run paired analyses on my strictly paired data. But as far as I understand, DESeq2 is designed for ASV/OTU analysis, and I have seen people advise against running it on higher taxonomic ranks.

A disadvantage of ANCOM is that I cannot find a way to look into the ANCOM objects created in R, so plotting, for instance, is remarkably difficult for an R novice like myself! You do get the ANCOM plot, but I cannot find a way to adjust it and improve the visualization of the results. Since I cannot get to the data behind the ANCOM plot, I am not able to redo it freely with ggplot… I am not able to get to the normalized data either…

DESeq2 can export normalized data, and it should be possible to combine these with taxonomy in Stata, but I cannot help feeling that this is also a poor solution. I would then be applying non-parametric, paired tests to the normalized data, and I am not sure the statistics would be correct. My Illumina MiSeq data are rarefied to 10,000 sequences for diversity analysis (after reading both "Waste Not, Want Not" and Weiss's article…). But rarefying is not appropriate when I want to test whether a phylum differs significantly between before and after in patients who (…and here you can insert any metadata of interest…). And the compositional nature of relative abundances is a shared headache, of course… I would really like to avoid it.

So here I am, hoping for some great advice on how to dig into my data :slight_smile:

Solveig

Hi @stangedal,
Thanks for posting. My initial assessment is that this is a non-QIIME2 question so we cannot provide 100% support (e.g., for how to get certain visuals in R) but it is still relevant to this forum from a conceptual perspective, so I am focusing on that in my response below.

What type of analysis are you performing? Based on your description, it sounds like you are performing differential abundance testing on a feature table. A feature table is a feature table whether the features are ASVs or have been "collapsed" at different taxonomic levels, so I am not sure why you are framing taxonomic collapsing as a necessity.

I would recommend just taking your feature table (collapsed, uncollapsed, or both) and passing it to ANCOM in QIIME2. Since it sounds like you have a complex experimental design, you could also check out q2-gneiss to see if it provides a way to examine differential abundances in the context of multiple metadata columns.

Correct — you definitely should not rarefy prior to differential abundance testing. Use the raw counts and hand off to ANCOM.

This is what ANCOM is designed for — do not use relative abundances, though. Use raw counts. Collapsing your feature table at level X in QIIME2 will give you raw counts, not relative abundances. That table can be passed directly to the q2-composition plugin to perform ANCOM in QIIME2.
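As a rough illustration of what collapsing does to the counts, here is a standard-library Python sketch. It is not the QIIME2 implementation: the function name `collapse_by_level`, the feature and sample IDs, and the taxonomy strings are made up, and padding short lineages with `__` is an assumption of this sketch. The point is that collapsed features are sums of raw counts, so no conversion to relative abundances is involved.

```python
from collections import defaultdict


def collapse_by_level(counts, taxonomy, level):
    """Sum raw counts of all features that share the same lineage up to `level` ranks.

    counts:   {feature_id: {sample_id: raw_count}}
    taxonomy: {feature_id: "k__...; p__...; ..."} (semicolon-separated ranks)
    level:    number of ranks to keep (2 = phylum, 6 = genus for these strings)
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, per_sample in counts.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        # Assumption of this sketch: pad lineages that stop above `level`
        # so that unassigned ranks remain visible in the collapsed label.
        ranks += ["__"] * (level - len(ranks))
        key = ";".join(ranks[:level])
        for sample_id, n in per_sample.items():
            collapsed[key][sample_id] += n
    return {key: dict(per_sample) for key, per_sample in collapsed.items()}


# Hypothetical toy data: two Haemophilus ASVs and one ASV classified only to phylum.
haemophilus = (
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus"
)
counts = {
    "asv1": {"P1_before": 10, "P1_after": 0},
    "asv2": {"P1_before": 5, "P1_after": 7},
    "asv3": {"P1_before": 0, "P1_after": 3},
}
taxonomy = {
    "asv1": haemophilus,
    "asv2": haemophilus,
    "asv3": "k__Bacteria; p__Firmicutes",
}

genus_table = collapse_by_level(counts, taxonomy, level=6)
phylum_table = collapse_by_level(counts, taxonomy, level=2)
```

In this toy example the two Haemophilus ASVs merge into a single genus-level feature, and the zero that `asv1` had in the "after" sample disappears from the summed row.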

I hope that helps in some way!


Thank you for the very quick response! With ANCOM and DESeq2 I have indeed used feature tables, but for ANCOM I collapsed the features so that I could check phylum- and genus-level taxa. The benefit of collapsing at higher taxonomic ranks is that I get rid of some zeros, and in a clinical setting I am more interested in the sum of ASVs identified as, say, Haemophilus than in single ASVs, where one Haemophilus ASV might be significantly different before/after and another might not… And with DESeq2, I think the statistics cannot handle data containing only a few variables, such as a dataset of 10 phyla. Therefore you have to stick to the ASV level.

But I think gneiss just might help me out here, if it is appropriate to enter data after collapsing at the phylum level. My only concern with gneiss is my paired design: how does gneiss deal with that? Do you know?

With regard to ANCOM, I might be mistaken, but I believe the implementation in QIIME2 does not allow you to take paired data into account, while it is possible when you run it in R…? In my case it affects the results a great deal… I tried both in R :wink:

Looking forward to the next release of QIIME2, which will give me even more options with my paired samples!

In QIIME2 you can use add-pseudocount to address that issue prior to feeding to ANCOM.
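To illustrate why a pseudocount helps: ANCOM works with log-ratios between features, and a zero count makes such a ratio undefined. The sketch below is plain Python rather than the QIIME2 action itself; it assumes a pseudocount of 1, and the function names `add_pseudocount` and `log_ratio` and the toy table are hypothetical.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Return a copy of the table with `pseudocount` added to every raw count."""
    return {
        feature_id: {sample_id: n + pseudocount for sample_id, n in per_sample.items()}
        for feature_id, per_sample in counts.items()
    }


def log_ratio(table, numerator, denominator, sample_id):
    """Natural log of the ratio between two features within one sample."""
    return math.log(table[numerator][sample_id] / table[denominator][sample_id])


# Hypothetical toy table with a zero in each feature.
raw = {
    "g__Haemophilus": {"P1_before": 0, "P1_after": 4},
    "g__Streptococcus": {"P1_before": 9, "P1_after": 0},
}

shifted = add_pseudocount(raw)

# With the raw table, log(0 / 9) raises a math domain error and 4 / 0 raises
# ZeroDivisionError; after the shift both ratios are finite.
before = log_ratio(shifted, "g__Haemophilus", "g__Streptococcus", "P1_before")
after = log_ratio(shifted, "g__Haemophilus", "g__Streptococcus", "P1_after")
```

The shift is applied to every cell, not only the zeros, so the ordering of counts within a sample is preserved.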

Ah yes that makes sense.

Oh I am not sure — @mortonjt any thoughts on this?

Arg you are right! It used to be possible but it looks like that function may have been removed in more recent releases. Too bad.

We do have open issues to upgrade to the latest version of ANCOM and improve paired testing support... I do not have an ETA but will post here when there is an update.
