ANCOM tutorial: Error with the relative frequency table required to make the PCoA plot; ANCOM on a rarefied table

Anahita_Bharadwaj · May 9, 2020, 5:23pm

Hi everyone,

I am currently following this tutorial to use ANCOM on my data. At the end, it shows how to make a PCoA plot showing the features that contribute most to the variation.

However, when I follow this code, I get the following error:

('array must not contain infs or NaNs', 'occurred at index b5fec4eff424e376216b2a3dc295f0b7')

I looked at all the different posts regarding this error and finally found that it occurs when I try to convert my abundance table to a relative abundance table. I found that most of my features ended up being "0" after the conversion. Please see my complete code below; I have marked with a comment where I think the error is happening.

I saw in this post that this is something that is being worked on but I was wondering if there are any workarounds. If not, would you be able to suggest other ways that I can depict my data? Basically, I would like to show, visually, the features whose abundances are significantly increasing or decreasing across my samples. I am still new to this and would really appreciate your help and advice.

Here is the full code (very similar to what's in the tutorial):

qiime composition add-pseudocount \
  --i-table ANCOM/rarefied_table.qza \
  --o-composition-table ANCOM/ANCOM_ready_table.qza

qiime composition ancom \
  --i-table ANCOM/ANCOM_ready_table.qza \
  --m-metadata-file ANCOM/metadata.txt \
  --m-metadata-column Description1 \
  --o-visualization ANCOM/ANCOM-Treatment.qzv

qiime feature-table relative-frequency \
  --i-table ANCOM/rarefied_table.qza \
  --o-relative-frequency-table ANCOM/rarefied_table_relative.qza
# I think the error might be here but I could be wrong

qiime diversity pcoa-biplot \
  --i-pcoa ANCOM/unweighted_unifrac_pcoa_results.qza \
  --i-features ANCOM/rarefied_table_relative.qza \
  --o-biplot ANCOM/biplot_matrix_unweighted_unifrac.qza

qiime emperor biplot \
  --i-biplot ANCOM/biplot_matrix_unweighted_unifrac.qza \
  --m-sample-metadata-file ANCOM/metadata.txt \
  --m-feature-metadata-file classified_rep_seqs.qza \
  --o-visualization ANCOM/unweighted_unifrac_emperor_biplot.qzv

llenzi · May 11, 2020, 8:45am

Hi @Anahita_Bharadwaj,

Only a thought on my side: any specific reason why you are using the rarefied table for the ANCOM analysis? My understanding is that, even if this is not wrong, you are applying two 'normalisation' processes to the data: the rarefaction and ANCOM's internal log-transformation.
My suggestion would then be to try ANCOM on the non-rarefied data. I think it would also be helpful in any case to filter out low-abundance/frequency ASVs to reduce the number of comparisons in the ANCOM analysis (I also wonder if these low-abundance/frequency ASVs are the origin of the error after converting to relative frequency!)
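
One way to check that hunch, as a minimal sketch rather than a diagnosis of this particular error: relative frequency is each count divided by its sample total, so a sample whose total is zero gives 0/0, and a feature that is zero in every sample carries no information for later steps. The toy table, the feature and sample names, and the helper functions below are all hypothetical; this is plain Python, not the QIIME 2 API.

```python
# Hypothetical toy table: feature ID -> counts per sample (order matches sample_ids).
sample_ids = ["s1", "s2", "s3"]
counts = {
    "feature_a": [10, 0, 30],
    "feature_b": [0, 0, 0],   # never observed in any sample
    "feature_c": [90, 0, 70],
}


def sample_totals(table, ids):
    """Total count per sample (column sums)."""
    return [sum(row[j] for row in table.values()) for j in range(len(ids))]


def empty_features(table):
    """Features whose counts are zero in every sample."""
    return [fid for fid, row in table.items() if sum(row) == 0]


def empty_samples(table, ids):
    """Samples whose total count is zero (these would give 0/0)."""
    return [sid for sid, total in zip(ids, sample_totals(table, ids)) if total == 0]


def relative_frequency(table, ids):
    """Per-sample proportions, skipping samples with a zero total."""
    totals = sample_totals(table, ids)
    keep = [j for j, total in enumerate(totals) if total > 0]
    kept_ids = [ids[j] for j in keep]
    rel = {fid: [row[j] / totals[j] for j in keep] for fid, row in table.items()}
    return kept_ids, rel


kept_ids, rel = relative_frequency(counts, sample_ids)
print("empty features:", empty_features(counts))
print("empty samples:", empty_samples(counts, sample_ids))
print("samples kept:", kept_ids)
```

If either list comes back non-empty on the real table, removing those features or samples before the relative-frequency step would be the first thing to try.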

Anahita_Bharadwaj · May 11, 2020, 12:10pm

Hi @llenzi. Thanks for your response. As I mentioned, I am new to this and so, if rarefying is a big no-no, I would really appreciate learning why that's the case.

My reasoning for normalizing the data is related to the interpretations that I am hoping to make from my data. I have three experimental treatments and one control treatment, each in quadruplicate, in my experiment. My hope is to compare them to see how the experiment has impacted the populations within the microbiome (if there has been a significant increase or decrease in abundance). Secondarily, I also wish to do pairwise testing against the control to see how much the abundance of populations has changed compared to the control.

So, I figured using the rarefied table would enable me to do this analysis since it would essentially put all the samples I have on a "level playing field". I understand I cannot compare the W values across different ANCOMs but I may be able to do some sort of comparative inference since I started off with rarefied data. I acknowledge that there would be a loss in data and also loss of rare ASVs but that's not of interest to me, at least just yet.

So far, the results I got from the rarefied table ANCOM are similar to what I observe experimentally and with the relative abundance heatmap and so, I didn't think it was odd. Furthermore, this paper says normalization should work for ANCOM and would not be a big problem but I would love to hear your thoughts on the same.

With regard to my original question, I have switched to using volcano plots, which seem to be working better. As you suggested, I filtered the data (to a minimum of 10 reads) and still experienced this issue. I think the excessive zeros came from some glitch when converting the abundance data to relative abundance for the PCoA plots... I am not sure why the tutorial I used suggested this method, and I found a workaround with the original QIIME2 tutorial. Sorry guys, as I mentioned, I am incredibly new to this but enjoying it so much!

llenzi · May 11, 2020, 6:01pm

Hi @Anahita_Bharadwaj,

I think I did not explain myself very well! Normalisation is a must! There are many ways to normalise the data and rarefying is one, as is the log-scale transformation applied behind the scenes by ANCOM.
In my view, I don't see rarefying the data as a big no-no! It is still my preferred way to get 'quick and dirty' diversity for a dataset (and my only option for some analyses but this is an aside ... ).
My only point was that ANCOM is designed with its own internal normalisation, and in your case that would be the only one that really matters because it considers the compositional nature of the data. I would see rarefaction in your case as not harmful but not necessary either. Hence, maybe trying the commands above with the non-normalised abundance table may work and give the information you are asking for!
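
To illustrate why a log-ratio approach copes with different sequencing depths, here is a minimal sketch with made-up counts (the taxon names and the `log_ratio` helper are hypothetical). It shows a single log-ratio between two features; ANCOM itself tests many pairwise log-ratios and summarises them in W, so this is only the underlying idea, not the ANCOM implementation.

```python
import math


def log_ratio(sample, numerator, denominator, pseudocount=0):
    """Log of the ratio between two features within one sample."""
    return math.log(
        (sample[numerator] + pseudocount) / (sample[denominator] + pseudocount)
    )


# Same community sequenced at two depths (the deep sample has 10x the reads).
shallow = {"taxon_x": 10, "taxon_y": 40}
deep = {"taxon_x": 100, "taxon_y": 400}

# Without a pseudocount the ratio is unchanged by sequencing depth.
raw_shallow = log_ratio(shallow, "taxon_x", "taxon_y")
raw_deep = log_ratio(deep, "taxon_x", "taxon_y")
print(raw_shallow, raw_deep)

# A pseudocount (needed when zeros are present) shifts the two values
# slightly apart, and the shift is larger for low counts.
pc_shallow = log_ratio(shallow, "taxon_x", "taxon_y", pseudocount=1)
pc_deep = log_ratio(deep, "taxon_x", "taxon_y", pseudocount=1)
print(pc_shallow, pc_deep)
```

Because the depth cancels inside the ratio, rarefying first mostly changes things through the reads and rare features it discards, not by making the samples more comparable.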

As you mentioned yourself, I would not expect massively different results with or without rarefying the data, and it may even be good to do both as a sanity check.

To do pairwise tests, the easy way may be to select only the groups in the comparison, but the ANCOM including all the data should still give a nice overview of the main changes.
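
As a rough sketch of what "select only the groups in the comparison" means for the table: keep the samples belonging to the two groups, then drop any feature that is left with no counts. Everything here (sample IDs, group labels, counts, and the `subset_for_pair` helper) is hypothetical toy data in plain Python, not the QIIME 2 API.

```python
# Hypothetical metadata: sample ID -> treatment group.
metadata = {
    "c1": "control", "c2": "control",
    "t1a": "treatment_1", "t1b": "treatment_1",
    "t2a": "treatment_2", "t2b": "treatment_2",
}
sample_ids = ["c1", "c2", "t1a", "t1b", "t2a", "t2b"]

# Hypothetical feature table: feature ID -> counts in sample_ids order.
table = {
    "feature_a": [5, 3, 8, 9, 0, 0],
    "feature_b": [0, 0, 0, 0, 7, 6],   # only present in treatment_2
    "feature_c": [1, 2, 0, 0, 4, 4],
}


def subset_for_pair(table, sample_ids, metadata, groups):
    """Keep samples from the given groups and drop features left empty."""
    keep = [j for j, sid in enumerate(sample_ids) if metadata[sid] in groups]
    kept_ids = [sample_ids[j] for j in keep]
    subset = {}
    for fid, row in table.items():
        picked = [row[j] for j in keep]
        if sum(picked) > 0:  # skip features absent from both groups
            subset[fid] = picked
    return kept_ids, subset


pair_ids, pair_table = subset_for_pair(
    table, sample_ids, metadata, {"control", "treatment_1"}
)
print(pair_ids)
print(pair_table)
```

In QIIME 2 itself, metadata-based sample filtering (for example with `qiime feature-table filter-samples`) would play this role before running ANCOM on each pair.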

Luca

Anahita_Bharadwaj · May 11, 2020, 7:15pm

Thanks so much for your response @llenzi. My bad... When I say "normalization", I mean "normalization by rarefaction".

Ahh ok, good to know. I have done this to build some of the alpha and beta diversity metrics according to the QIIME2 tutorial.
I figured if I did this, it would help me understand the differences across my samples (and different ANCOM runs) when I do pairwise ANCOM tests on them. Although the W values might be different (and therefore, not comparable), I felt that it would be more indicative of changes across my samples as they start off at the same sampling depth. Does this logic make sense at all or is the ANCOM math not set up to do this?

Good idea... I will do this and see if I get similar results.

Ya, that's what I figured too. As recommended by another post, what I did was to do an "overall" ANCOM at the phylum level and then select the phyla showing significant changes. I then filtered the rarefied table down to these phyla and performed ANCOM on them (at the genus level) to parse out where the main differences are coming from. Is this a reasonable approach?

Sorry, I only ask because I want to be sure I am doing it correctly. Thank you for all the help and advice. This forum is amazing!

PS: I am editing the title to include this discussion.