Using data from qiime taxa barplot .CSV

Hi there Q2 Druids:

After running the following command, I do generate a barplot of my samples with the relative frequencies:

qiime taxa barplot --i-table table.qza --i-taxonomy taxonomy.qza --m-metadata-file Metadata_HBC.tsv --o-visualization taxa-bar-plots.qzv

From here I can download the .CSV, which gives the following:
$ head taxa_level-7.csv

index T1A T1B T1C
D_0__Archaea;D_1__Asgardaeota;D_2__Heimdallarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon 0 1 0
D_0__Archaea;D_1__Asgardaeota;D_2__Lokiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon 6 1 2
D_0__Archaea;D_1__Asgardaeota;D_2__Lokiarchaeia;D_3__uncultured crenarchaeote;D_4__uncultured crenarchaeote;D_5__uncultured crenarchaeote;D_6__uncultured crenarchaeote 1 0 0

As you can see, these include the counts for each sample at the end of each taxonomy string.

This might be a naive inquiry, but I assume these are the non-normalized counts for each sample. Since DESeq2 normalization methods are not yet available in QIIME 2, according to this thread:

I am wondering whether loading these results directly into the DESeq2 package in R would be the way to go to normalize by that method?

Many thanks!

Hi @Purrsia_Felidae,

To export your feature table into R for use with DESeq2, you want to take your actual feature table, not the .csv from the bar plots; so in your case, table.qza. Check out the handy qiime2R package, which might help with the process of getting this file into R. Keep in mind that the feature table by itself does not carry taxonomic assignments; the rows will be hashed feature IDs instead. If you want taxonomic assignments, you'll want to use the collapse function first, before exporting. Lastly, keep in mind that if you plan on using something like ANCOM or gneiss in QIIME 2, no prior normalization is needed.
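For reference, here's a minimal sketch of that collapse-then-export route (file names taken from your command above; the qiime tools export syntax assumes a recent QIIME 2 release):

# Collapse the feature table to level 7 so rows carry taxonomy strings instead of hashed IDs
qiime taxa collapse --i-table table.qza --i-taxonomy taxonomy.qza --p-level 7 --o-collapsed-table table-l7.qza

# Export the collapsed table; this writes feature-table.biom into exported-table/
qiime tools export --input-path table-l7.qza --output-path exported-table

From there, feature-table.biom is the file you'd read into R (e.g., with qiime2R).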


Fantastic! Thank you. I will check out qiime2R. Seems super handy.

I ran the collapse function command like you suggested and ended up with a .biom file, which I converted to a .txt with the QIIME 1 biom convert utility, and got this, which looks identical to the .csv file from the taxa barplot:

Constructed from biom file

#OTU ID T1A T1B T1C
D_0__Archaea;D_1__Asgardaeota;D_2__Heimdallarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon 0 1 0
D_0__Archaea;D_1__Asgardaeota;D_2__Lokiarchaeia;D_3__uncultured archaeon;D_4__uncultured archaeon;D_5__uncultured archaeon;D_6__uncultured archaeon 6 1 2
D_0__Archaea;D_1__Asgardaeota;D_2__Lokiarchaeia;D_3__uncultured crenarchaeote;D_4__uncultured crenarchaeote;D_5__uncultured crenarchaeote;D_6__uncultured crenarchaeote 1 0 0

Are these then the raw counts to use as input?

And according to this link: Statistical methods using ANCOM
ANCOM isn't being supported anymore, in favor of gneiss.

Also, I have been using gneiss in QIIME 2 with my data, but it keeps giving me a
"Detected zero variance balances - double check your table for unobserved features" error.

According to this link: Gneiss zero balance error
It's probably because I have a lot of singletons and doubletons in my data, so using QIIME 1, I ran the following command:
filter_otus_from_otu_table.py -i feature-table.biom -o feature-table.N3.biom -n 3 (removing features that are observed fewer than 3 times, i.e., singletons and doubletons)

Then I imported this filtered table back into QIIME 2 and re-ran the gneiss commands, only to get the same error. So I kept filtering with -n 5 and then -n 10, with each step getting rid of an exorbitant number of features. By -n 10, I only have 109 features left out of 23369 and am still getting the zero variance balances error; so I think DESeq2 normalization would be the better choice.

I just wanted to make sure I’m using the correct input.

Many thanks for your reply!! It was very helpful.

Hi @Purrsia_Felidae,

Hmm, my apologies if you had to take the long way around for this. For some reason I had it in my head that the .csv file downloaded from taxa-bar-plots.qzv would contain the relative abundances used to generate the plots, not raw counts. In any case, the .csv file would also attach your metadata columns at the end of the table, which is not what you want for a raw OTU/feature table. But perhaps you didn't have any extra columns in your metadata in the first place. Either way, it sounds like both ways work!

The biom utility actually comes with your QIIME 2 install, so you can do this conversion from within QIIME 2!
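For example, a quick sketch using your file names, run inside an activated QIIME 2 environment:

# Convert the exported BIOM table to tab-separated text with the biom tool bundled with QIIME 2
biom convert -i feature-table.biom -o feature-table.txt --to-tsv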

I'm not sure about the future of ANCOM to be honest, but there is an ANCOM2, which is meant to make its way to QIIME 2 at some point; see this thread for a link and more detail. ANCOM and gneiss use a similar approach to dealing with compositionality, but gneiss' use of balance trees is a bit different from ANCOM's. I'm no expert on this matter, but I think there's room for both analyses since they do differ a bit fundamentally.

You can actually do all sorts of filtering right in QIIME 2, so you don't need to keep switching between QIIME 1 and 2. See this filtering tutorial for details on how.
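For instance, here's a sketch of the QIIME 2 counterpart of your filter_otus_from_otu_table.py call (file names assumed to match your artifacts):

# Keep only features with a total count of at least 3, i.e., drop singletons and doubletons
qiime feature-table filter-features --i-table table.qza --p-min-frequency 3 --o-filtered-table table-min3.qza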

As for your error with gneiss, I think your approach of filtering low-abundance features is right, as this removes a lot of the noise that can mess up ANCOM/gneiss. If the overwhelming majority of your features are present fewer than 10 times, it might be worth double-checking that these are true features and not a product of improper denoising, for example if you forgot to remove your primers/barcodes before denoising. You could try BLASTing some of these features from your rep-seqs to see if they are indeed showing up as true taxa.
Another point to mention: in your little snippet of data I see only 3 samples. Are there only 3 samples in your data? If so, then I imagine this might be to blame as well; you can't really do proper stats with n=3.

DESeq2 normalization wouldn't help with the error you are receiving, nor with the approach you are using to filter. DESeq2 normalization deals with uneven sampling depth (instead of, let's say, rarefying), but you would still want to filter low-abundance features. ANCOM/gneiss use relative abundance data, so that kind of normalization is not necessary.

Look into those points I mentioned, and if you are still having problems with gneiss, please start a new thread and provide the exact commands you are using along with your data (if you can share it), and we will dive into that in more detail there.
I'm also pinging @mortonjt, the creator of gneiss, in case there's anything else I missed here.

P.S. Thanks for looking up and linking other discussions on the forum. Very helpful!


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.