We are missing some information on the QIIME 2 side, prior to making the taxonomy plot:
What filtering was performed?
Were the data rarefied ?
Were unwanted taxa removed?
That is, did you perform the same operations in QIIME 2 as you did in phyloseq? If not, you are truely comparing apples to oranges here.
Within R / phyloseq there are several operations that may be contributing to the observed differences:
rarefied the data : rarefy_even_depth(im)
ran tax_glom(im,taxrank=“Phylum”) w/o setting NArm=False
This is likely the biggest issue. Any taxa with ‘NA’ in the column ‘Phylum’ (i.e. missing taxonomy for that rank) will be discarded unless you set this to False. Default is True. Check the help text for this.
Finally you are removing low count reads: filter_taxa(imP, function(x) sum(x > 0) > (0.000001*length(x)), TRUE)
Thanks for the feedback. So I reran the code on the unfiltered dataset this time to match the one from qiime2. I also confirmed that they both have 44,905 OTUs. I also ommitted the glom function to avoid any Unassigned issues. I get the exact same results. In qiime2 I simply ran the taxa bar plot function on the raw dataset so I think the only thing that occurs is transforming the OTUs to relative abundance correct? The only other thing I did in qiime was merge my samples by a metadata column prior to running the taxa bar plot (and I still get 44,905 OTUs after this).
Note: You can see in the qiime2 figure that there should be some instances where Firmicutes takes up 50% of the sample abundance whereas in the phyloseq one you don't even get close to that (see first fig). So even if the small reads or Unknowns were causing issues I don't think there would be that big of a discrepancy right?
Thanks for this @Sam_Degregori, however I think the following command:
ipd<- ggplot(pd, mapping=aes(x = Sample, y = Abundance,fill=Phylum))
may suffer the same problems as the tax_glom command. I think one way to confirm this is to run a forward fill on the taxonomy data prior to plotting any of the data. There are a couple ways to do this, e.g. fill.forward, and fill. There may be others. This is the an optional strategy we use when constructing SILVA databases, see the RESCRIPt SILVA tutorial, and see this related but older post about forward filling taxonomy (i.e. propagation).
Apologies for the late reply. I keep forgetting to set my notifications on for this site.
I may be wrong here but wouldn’t this be a major issue with phyloseq? I know of many studies that have analyzed and published their data with phyloseq based on SILVA classified sequences and I would imagine their bar plots would all be wrong? Or is this specific to my data someway?
Also, is there an example of a script where forward filling is performed on a taxonomy file?
I’ve read all the links you sent which were very helpful but I am still a bit confused how this issue would result in the discrepancy I see since my Unassigned Phyla only make up less than 10% of the data. Thanks for all the feedback.
Thanks for the clarification. So I tried the forward fill example you gave which changed the Unassigned phyla for me but the graph still looked quite similar. I also tried merging the samples on qiime and then importing to R which didn't do anything.
Interestingly enough when I import the qza files using qiime2R, using the code you provided, and plot the phyla abundance as I did originally: `
I am starting to wonder if importing the files in R manually somehow caused the OTUs to randomly assign to random samples because that would explain the "homogenization" of the taxa bar plot I was getting before.
I will update if I find out the specific reason in case others run into the same problem. But otherwise, thank you for the help, and the forward filling is super helpful as I had no idea tax glom was removing NAs.