Mismatch between qiime2 taxa bar plot and phyloseq taxa bar plot

Hi all,

I know this isn’t 100% related to qiime2 since it also involves phyloseq so I am posting in the general discussion section.

I just discovered that the relative abundance plots I’ve made in phyloseq match none of my qiime2 taxa bar plots.

Here is an example.
Phyloseq:

qiime2:

I trust the qiime2 one more since it makes a lot more sense biologically. It appears that the microbiome diversity is getting “homogenized” after importing into phyloseq.

Here is my phyloseq code. I have tried making the plot pre filtering, post filtering, merging samples by metadata column in qiime2 and then importing, but nothing works.
Any ideas?

setwd(“C:/Users/samde/OneDrive - UCLA IT Services/Fish Project/ISLAND COMPILED PROJECT”)
otu_table=read.csv(“i3otumerged.csv”,sep=",",row.names=1, header=TRUE)
destroyX(otu_table)
otu_table=as.matrix(otu_table)
taxonomy=read.csv(“i3taxonomymerged.csv”,sep=",",row.names=1,header=TRUE)
taxonomy=as.matrix(taxonomy)
#dont forget header=TRUE for metadata
metadata=read.csv(“compiled_island_metadata.csv”,row.names=1,header=TRUE)
phy_tree=read.tree(“i3tree.nwk”)
OTU=otu_table(otu_table, taxa_are_rows=TRUE)
TAX=tax_table(taxonomy)
META=sample_data(metadata)
#tree already imported as a phyloseq object
#merge into phyloseq object
isl=phyloseq(OTU,TAX,META,phy_tree)
isl

im ← merge_samples(islclean, “Tax”)
imr=rarefy_even_depth(im)
imr
imrt=transform_sample_counts(imr, function(x) x/sum(x))
ip=tax_glom(im,taxrank=“Phylum”)
ipt=transform_sample_counts(ip, function(x) x/sum(x))

imP=tax_glom(im,taxrank=“Phylum”)
imPf = filter_taxa(imP, function(x) sum(x > 0) > (0.000001*length(x)), TRUE)
imPf
imPfr=rarefy_even_depth(imPf)
imPfr
imPfrt=transform_sample_counts(imPfr, function(x) x/sum(x))

ip=plot_bar(imPfrt,fill=“Phylum”)
ip2=(ip+geom_bar(aes(fill=Phylum),stat=“identity”))
ip2

1 Like

Hi @Sam_Degregori,

We are missing some information on the QIIME 2 side, prior to making the taxonomy plot:

  • What filtering was performed?
  • Were the data rarefied ?
  • Were unwanted taxa removed?
  • That is, did you perform the same operations in QIIME 2 as you did in phyloseq? If not, you are truely comparing apples to oranges here.

Within R / phyloseq there are several operations that may be contributing to the observed differences:
You’ve:

  • rarefied the data : rarefy_even_depth(im)
  • ran tax_glom(im,taxrank=“Phylum”) w/o setting NArm=False
    • This is likely the biggest issue. Any taxa with ‘NA’ in the column ‘Phylum’ (i.e. missing taxonomy for that rank) will be discarded unless you set this to False. Default is True. Check the help text for this.
  • Finally you are removing low count reads: filter_taxa(imP, function(x) sum(x > 0) > (0.000001*length(x)), TRUE)

-I hope this helps! :slight_smile:
-Mike

3 Likes

Hi Mike,

Thanks for the feedback. So I reran the code on the unfiltered dataset this time to match the one from qiime2. I also confirmed that they both have 44,905 OTUs. I also ommitted the glom function to avoid any Unassigned issues. I get the exact same results. In qiime2 I simply ran the taxa bar plot function on the raw dataset so I think the only thing that occurs is transforming the OTUs to relative abundance correct? The only other thing I did in qiime was merge my samples by a metadata column prior to running the taxa bar plot (and I still get 44,905 OTUs after this).

So my phyloseq code is:

iht=phyloseq(OTU,TAX,META,phy_tree)
iht
pd <- psmelt(iht)

ipd<- ggplot(pd, mapping=aes(x = Sample, y = Abundance,fill=Phylum))
ipd2=( ipd +geom_bar(stat=“identity”)
+theme(legend.position=“none”)
)
ipd2

and my qiime code is

qiime taxa barplot
–i-table table.qza
–i-taxonomy taxonomy.qza
–m-metadata-file sample-metadata.tsv
–o-visualization taxa-bar-plots.qzv

Here are the figures as well:
phyloseq:

qiime:

Note: You can see in the qiime2 figure that there should be some instances where Firmicutes takes up 50% of the sample abundance whereas in the phyloseq one you don’t even get close to that (see first fig). So even if the small reads or Unknowns were causing issues I don’t think there would be that big of a discrepancy right?

Thanks for this @Sam_Degregori, however I think the following command:

ipd<- ggplot(pd, mapping=aes(x = Sample, y = Abundance,fill=Phylum))

may suffer the same problems as the tax_glom command. I think one way to confirm this is to run a forward fill on the taxonomy data prior to plotting any of the data. There are a couple ways to do this, e.g. fill.forward, and fill. There may be others. This is the an optional strategy we use when constructing SILVA databases, see the RESCRIPt SILVA tutorial, and see this related but older post about forward filling taxonomy (i.e. propagation).

-Cheers!
-Mike

3 Likes

Hi Mike,

Apologies for the late reply. I keep forgetting to set my notifications on for this site.

I may be wrong here but wouldn’t this be a major issue with phyloseq? I know of many studies that have analyzed and published their data with phyloseq based on SILVA classified sequences and I would imagine their bar plots would all be wrong? Or is this specific to my data someway?

Also, is there an example of a script where forward filling is performed on a taxonomy file?
I’ve read all the links you sent which were very helpful but I am still a bit confused how this issue would result in the discrepancy I see since my Unassigned Phyla only make up less than 10% of the data. Thanks for all the feedback.
Cheers,
Sam

1 Like

This is why there is a very explicit statement about this in the tax_glom documentation. Which is what I referred to in my initial response.

You can use do the following to obtain a forward-filled taxonomy data frame from your phyloseq object.

Warning: I am not a savvy R user! I am sure others will have a much more efficient and proper way to do this than I ever could.

library(phyloseq)
library(qiime2R)
library(zoo)

# make phyloseq object
ps <- qza_to_phyloseq(
  features = "feature-table.qza",
  taxonomy = "taxonomy.qza",
  metadata = "metadata.tsv")

ps

# make taxonomy dataframe 
tdf <- as.data.frame(tax_table(ps))
tdf

# transpose and forward fill by columns
ttdf <- t(tdf)
ttdf

fft <- na.locf(na.locf(ttdf), fromLast = TRUE)
fft

# transpose back and update phyloseq object
ff_tax <-  t(fft)
tax_table(ps) <- ff_tax 

# sanity-check 
head(tax_table(ps))

This should work. But I defer to the expert R users for this. :slight_smile:

3 Likes

Hi @SoilRotifer,

Thanks for the clarification. So I tried the forward fill example you gave which changed the Unassigned phyla for me but the graph still looked quite similar. I also tried merging the samples on qiime and then importing to R which didn’t do anything.

Interestingly enough when I import the qza files using qiime2R, using the code you provided, and plot the phyla abundance as I did originally: `

ps ← qza_to_phyloseq(
features = “table.qza”,
taxonomy = “taxonomy.qza”,
metadata = “compiled_island_metadata.txt”)

im ← merge_samples(ps, “Tax”)
p=plot_bar(im,fill=“Phylum”)
p
`

I get the correct graph.

I am starting to wonder if importing the files in R manually somehow caused the OTUs to randomly assign to random samples because that would explain the “homogenization” of the taxa bar plot I was getting before.

I will update if I find out the specific reason in case others run into the same problem. But otherwise, thank you for the help, and the forward filling is super helpful as I had no idea tax glom was removing NAs.
Cheers,
Sam

3 Likes

Hi @Sam_Degregori, I’m glad my little bit of R code helped! :relieved:

If you find out what caused the initial issue let us know!

-Happy :qiime2:ing!
-Mike

1 Like