Discrepancy between taxonomy.qzv and taxa-bar-plots.qzv


I noticed that my taxonomy.qzv and taxa-bar-plots.qzv have some discrepancies at the phylum level, and I'm wondering where I went wrong.

This is my code to generate the two files:

# Assign taxonomic information to the ASV sequences
qiime feature-classifier classify-sklearn \
  --i-classifier taxonomy_assignment/silva-138-99-nb-classifier.qza \
  --i-reads quality-filtering/rep-seqs.qza \
  --o-classification taxonomy_assignment/taxonomy.qza

# Generate a human-readable summary of the taxonomic annotations
qiime metadata tabulate \
  --m-input-file taxonomy_assignment/taxonomy.qza \
  --o-visualization taxonomy_assignment/taxonomy.qzv

qiime taxa barplot \
  --i-table quality-filtering/table.qza \
  --i-taxonomy taxonomy_assignment/taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxonomy_assignment/taxa-bar-plots.qzv

taxa-bar-plots.qzv (482.3 KB)
taxonomy.qzv (1.3 MB)

When I look at taxa-bar-plots.qzv, I see that Verrucomicrobiota has a much higher frequency than Proteobacteria. However, when I look at taxonomy.qzv, it looks super wrong by comparison (only one instance of Verrucomicrobiota).

In R, I made a phyloseq object (physeq) from the feature table and taxonomy.qza attached below. I have also attached the output of:

plot_taxa_prevalence(physeq, "Phylum")

taxonomy.qza (110.2 KB)
table.qza (82.1 KB)

This data has already been analyzed, and I'm using it for practice. The results from the previous analysis, done by our bioinformatician, show that Verrucomicrobiota should be much more prevalent.

Thank you for all your help!


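I believe the two artifacts show different things and cannot be compared directly. taxonomy.qzv lists one row per ASV with its assigned taxonomy, while taxa-bar-plots.qzv shows how many reads each taxon accounts for in each sample. Even with a single representative in the taxonomy file, a taxon can have very high prevalence and relative abundance across samples.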

It is entirely possible that Verrucomicrobiota is represented by only one or a few ASVs whose prevalence and relative abundance are still higher than those of all the ASVs assigned to Proteobacteria combined. If that doesn't make sense given where your samples come from, I would check the literature, as well as the extraction protocol and the choice of 16S primers. But as it stands, there is no discrepancy between the two files in question.
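
To make the difference concrete, here is a minimal shell sketch with made-up numbers. The ASV IDs and read counts are invented for illustration and are not taken from your files. It first counts ASVs per phylum, which is roughly what you see when scrolling through taxonomy.qzv, and then sums reads per phylum, which is what the bar plot displays.

```shell
# Toy data, invented for illustration: ASV ID, phylum, total reads across all samples.
toy_table=$(printf '%s\t%s\t%s\n' \
  asv1 Proteobacteria 40 \
  asv2 Proteobacteria 30 \
  asv3 Proteobacteria 30 \
  asv4 Verrucomicrobiota 900)

# View 1, like taxonomy.qzv: one row per ASV, so this counts ASVs per phylum.
asv_counts=$(printf '%s\n' "$toy_table" \
  | awk -F'\t' '{ n[$2]++ } END { for (p in n) print p, n[p] }' \
  | sort)

# View 2, like taxa-bar-plots.qzv: reads summed over every ASV in the phylum.
read_totals=$(printf '%s\n' "$toy_table" \
  | awk -F'\t' '{ s[$2] += $3 } END { for (p in s) print p, s[p] }' \
  | sort)

echo "ASVs per phylum:";  echo "$asv_counts"
echo "Reads per phylum:"; echo "$read_totals"
```

In this toy case Proteobacteria has three ASVs and Verrucomicrobiota only one, yet the single Verrucomicrobiota ASV carries most of the reads, so it would dominate a bar plot while appearing only once in the taxonomy listing.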


I don't know if this would be possible, but would you be able to take a look at my files and quickly say which are the top 4 phyla in my samples? I'm just concerned whether I messed up somewhere in my R analysis.

For this, you can check the taxonomy bar plot you already shared with us.
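
If you would rather rank the phyla from numbers than read them off the plot, the idea can be sketched in shell as below. This assumes you already have a phylum-by-sample table of read counts as tab-separated text. The table here is a toy one with invented phyla totals and hypothetical sample columns, not your results.

```shell
# Toy phylum-by-sample read counts, invented for illustration (not the poster's data).
# Columns: phylum, sampleA, sampleB.
toy_phyla=$(printf '%s\t%s\t%s\n' \
  Firmicutes 500 450 \
  Bacteroidota 300 350 \
  Verrucomicrobiota 200 150 \
  Proteobacteria 20 10 \
  Actinobacteriota 5 15)

# Sum each row across samples, sort by total (descending), keep the top 4 names.
top4=$(printf '%s\n' "$toy_phyla" \
  | awk -F'\t' '{ total = 0; for (i = 2; i <= NF; i++) total += $i; print total "\t" $1 }' \
  | sort -k1,1nr \
  | head -n 4 \
  | cut -f2)

echo "$top4"
```

Within QIIME 2, a phylum-level table of this kind can be produced from table.qza and taxonomy.qza with qiime taxa collapse and --p-level 2, since phylum is the second rank in SILVA-style taxonomy strings. The ranking you get should match the order of the largest bars at level 2 in the bar plot.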


