Relative abundances of taxonomy analysis

Benedict · July 9, 2018, 5:06am

Hi,
I would like to ask whether there is any way to generate relative abundances of the microbial community in QIIME 2. So far, I have only been able to download absolute abundances in CSV format from qiime taxa barplot.

Mehrbod_Estaki · July 9, 2018, 7:21am

Hi @Benedict,

I believe you are looking for the relative-frequency action of the feature-table plugin (qiime feature-table relative-frequency).

Benedict · July 10, 2018, 8:26am

Hi,

No, relative-frequency doesn't work for this. The relative-frequency table only gives me the number of sequences per sample and the occurrence of each feature across all samples. I would like to know the bacterial composition of each sample, similar to what qiime taxa barplot shows, yet so far I can only view the composition in terms of absolute abundances, not relative abundances (percentages). My friend told me that QIIME 1 could generate the microbial composition (relative abundances) of each sample at different taxonomic levels just by clicking the download CSV button. Is there a similar way to download this in QIIME 2?

Besides, may I know how to filter out taxa with low relative abundances in order to produce higher-quality taxonomy data? I have read through the 'filtering-data' and 'q2-quality-control' tutorials, but neither of them resolved my doubts.

I have attached the CSV files generated by QIIME 1 (BlastResults_3July18.csv) and QIIME 2 (level-7.csv) for your reference, to make my question clearer.
Thank you in advance!

level-7.csv (84.5 KB)
BlastResults_3July18.csv (13.1 KB)

Mehrbod_Estaki · July 10, 2018, 10:03am

Hi @Benedict,

Hmm, just to clarify, did you run relative-frequency on your feature table and the output did not come out as relative abundance? If so, this is odd. Would you mind sharing your feature table and the exact commands that you used so we can see if there are any issues?

Or are you saying that the script worked but you are just looking for the taxonomic assignments instead of the feature IDs? If your feature table was the output from DADA2/Deblur, then it does not contain taxonomic assignments but rather the feature IDs for those sequence variants. When you run taxa barplot, you provide this feature table and your taxonomy assignments separately to create a new visualization, but your original feature table still only has the feature IDs. To create a new feature table that has taxonomic assignments instead of the feature IDs, you'll want to use the taxa collapse command first. This creates a new feature table with taxonomy collapsed at whatever level you want (e.g. 6 for genus). You can then run relative-frequency to convert this new frequency table to a relative abundance table. That should give you the output you're looking for.
Edit: If you want this new feature table in a different format, let's say .tsv, check out this post with more instructions.
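
As a rough illustration of what collapsing at a taxonomic level means, here is a minimal Python sketch. It is not QIIME 2's implementation: the dictionary layout, the function name `collapse_by_level`, and the "__" padding for missing ranks are assumptions made for the example, and the toy data is made up.

```python
from collections import defaultdict


def collapse_by_level(counts, taxonomy, level):
    """Sum per-sample counts of features that share the first `level` ranks.

    counts:   {feature_id: {sample_id: count}}
    taxonomy: {feature_id: "k__Bacteria; p__Firmicutes; ..."}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, per_sample in counts.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        # Keep only the first `level` ranks; pad missing ranks with "__"
        # (the padding convention is an assumption of this sketch).
        ranks = (ranks + ["__"] * level)[:level]
        label = ";".join(ranks)
        for sample_id, count in per_sample.items():
            collapsed[label][sample_id] += count
    return {label: dict(per_sample) for label, per_sample in collapsed.items()}


# Hypothetical toy data: feature IDs, samples and counts are made up.
counts = {
    "f1": {"s1": 10, "s2": 0},
    "f2": {"s1": 5, "s2": 20},
    "f3": {"s1": 5, "s2": 20},
}
taxonomy = {
    "f1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "f3": "k__Bacteria; p__Proteobacteria",
}
phyla = collapse_by_level(counts, taxonomy, level=2)
print(phyla)
```

With level=2, the two Firmicutes features end up in a single row, which is the kind of table relative-frequency is then run on.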

Filtering based on percentages is not yet available in QIIME 2, but it is on the development team's radar (open issue). For now, you can either filter based on frequency or see this post for an alternative solution within QIIME 2.
Hope this helps!
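
If the table has already been exported out of QIIME 2, one possible way to filter on relative abundance is sketched below in plain Python. This is only an illustration under assumptions: the function name `filter_low_abundance`, the rule "keep a taxon if it reaches the threshold in at least one sample", and the 1% threshold are all choices made for the example, not a QIIME 2 feature or a recommended cutoff.

```python
def filter_low_abundance(counts, min_fraction):
    """Keep taxa that reach `min_fraction` of at least one sample.

    counts: {taxon: {sample_id: count}}; returns the same shape, with the
    original counts (not proportions) for the taxa that pass.
    """
    # Total count per sample, needed to turn counts into fractions.
    totals = {}
    for per_sample in counts.values():
        for sample_id, count in per_sample.items():
            totals[sample_id] = totals.get(sample_id, 0) + count

    kept = {}
    for taxon, per_sample in counts.items():
        fractions = [
            count / totals[sample_id]
            for sample_id, count in per_sample.items()
            if totals[sample_id] > 0
        ]
        if fractions and max(fractions) >= min_fraction:
            kept[taxon] = dict(per_sample)
    return kept


# Hypothetical counts; the 1% threshold is an arbitrary example value.
counts = {
    "p__Firmicutes": {"s1": 600, "s2": 300},
    "p__Proteobacteria": {"s1": 395, "s2": 698},
    "p__Tenericutes": {"s1": 5, "s2": 2},
}
filtered = filter_low_abundance(counts, min_fraction=0.01)
print(sorted(filtered))
```

The sketch returns counts rather than proportions so that the filtered table could still be used wherever a frequency table is expected.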

Kara · July 10, 2018, 6:15pm

We have a related question: we ran taxa collapse at level 2 (phylum), then ran relative-frequency. We can download the .csv table with this data, but what we REALLY want is this relative frequency data separated out by sample, not aggregated together. We'd like to see, for example, that Proteobacteria proportionally makes up 55% (0.55) of sample 1, but 75% (0.75) of sample 2. Is there a way to do that?

The taxa bar plot visualization shows us these relative frequencies of each phylum (for each sample), but when we export the .csv file, it appears to contain absolute counts, so we can't actually get the exact numbers exported to Excel. I'm sure we can convert those absolute counts to percentages on our own, in Excel, but is there a way we can do this in QIIME 2?

Mehrbod_Estaki · July 11, 2018, 12:35am

Hi @Kara,

The instructions and links I provided above will actually create exactly the type of table you are looking for. The important thing to note is that you should not be downloading the .csv table from the taxa barplot visualization.

For clarity, this is what the workflow should look like:
First, I'm going to assume you have already acquired your DADA2/Deblur feature table and that you have run feature-classifier to get your taxonomy assignments as well. If these steps have not been performed, I suggest going back to the tutorials and following along with those steps.

Ok, now we want to create a feature table that has taxonomy instead of feature IDs, basically what former OTU tables were in qiime1:

qiime taxa collapse \
  --i-table feature-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 2 \
  --o-collapsed-table phyla-table.qza

Here, feature-table.qza is the output of DADA2/Deblur, and the taxonomy.qza file comes from the classifier I've linked above.
Now we will convert this new frequency table to relative-frequency:

qiime feature-table relative-frequency \
  --i-table phyla-table.qza \
  --o-relative-frequency-table rel-phyla-table.qza
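
Conceptually, this step divides every count by the total count of its sample. The following is a minimal Python sketch of that arithmetic, not the plugin's actual code: the function name `relative_frequency`, the dictionary layout, and the handling of empty samples (returning 0.0) are assumptions of the example, and the counts are made up.

```python
def relative_frequency(counts):
    """Divide every count by its sample total.

    counts: {taxon: {sample_id: count}} -> {taxon: {sample_id: fraction}}
    """
    totals = {}
    for per_sample in counts.values():
        for sample_id, count in per_sample.items():
            totals[sample_id] = totals.get(sample_id, 0) + count
    return {
        taxon: {
            # An empty sample yields 0.0 here; that is a choice of this sketch.
            sample_id: (count / totals[sample_id] if totals[sample_id] else 0.0)
            for sample_id, count in per_sample.items()
        }
        for taxon, per_sample in counts.items()
    }


# Hypothetical phylum-level counts for two samples.
counts = {
    "p__Proteobacteria": {"sample1": 55, "sample2": 150},
    "p__Firmicutes": {"sample1": 45, "sample2": 50},
}
rel = relative_frequency(counts)
print(rel["p__Proteobacteria"])
```

Each sample is normalized independently, so the values are per-sample proportions rather than one aggregate across the whole table.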

This new artifact now has the relative-abundances we want. To get this into a text file we first export the data which is in biom format:

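# syntax as of mid-2018; later QIIME 2 releases expect
# --input-path rel-phyla-table.qza --output-path rel-table instead
qiime tools export rel-phyla-table.qza \
  --output-dir rel-table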

We now have our new relative-frequency table in .biom format. Let's convert this to a text file that we can open easily:

# first move into the new directory
cd rel-table
# note that the table has been automatically named feature-table.biom
# you might want to change this filename for clarity
biom convert -i feature-table.biom -o rel-phyla-table.tsv --to-tsv

Now we have a text file of our relative frequency table, with one row per taxon and one column per sample, where the sum of each column is 1.
Hope this clarifies things!
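
To double-check an exported table, the column sums can be computed with a few lines of Python. This is a sketch under assumptions: it expects the usual layout written by biom convert --to-tsv (a comment line followed by a header row starting with "#OTU ID"), the function names are hypothetical, and the example contents are made up rather than taken from a real run.

```python
import csv
import io


def read_biom_tsv(handle):
    """Parse a `biom convert --to-tsv` table into {sample: {taxon: value}}."""
    header = None
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if header is None:
            # Skip comment lines until the "#OTU ID" header row is found.
            if row[0].startswith("#OTU ID"):
                header = row[1:]
                table = {sample: {} for sample in header}
            continue
        for sample, value in zip(header, row[1:]):
            table[sample][row[0]] = float(value)
    return table


def column_sums(table):
    """Sum the values of each sample column."""
    return {sample: sum(values.values()) for sample, values in table.items()}


# Hypothetical file contents; with a real export you would instead pass
# open("rel-phyla-table.tsv", newline="") to read_biom_tsv.
example = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tsample1\tsample2\n"
    "k__Bacteria;p__Proteobacteria\t0.55\t0.75\n"
    "k__Bacteria;p__Firmicutes\t0.45\t0.25\n"
)
sums = column_sums(read_biom_tsv(example))
print(sums)
```

If any column sum is far from 1, the table being read is probably the count table rather than the relative frequency table.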

Benedict · July 11, 2018, 3:04am

Thank you, Sir @Mehrbod_Estaki. I got the desired output by following your steps above. I had already performed taxa collapse, converted the frequency table, and viewed it with the feature-table summary. The TSV conversion was the crucial subsequent step that I had missed.

By the way, can the output artifact (.qza) of the relative-frequency table be used in other functions besides viewing, for instance differential abundance testing, qiime taxa barplot, etc.? I've been trying to perform differential abundance testing with ANCOM, yet this plugin doesn't accept the relative-frequency table format.

Mehrbod_Estaki · July 11, 2018, 3:31am

Glad it was helpful @Benedict!

As for what type of data you can use in various plugins, you can simply check their help text to see what type of artifact is accepted. For example, from the qiime taxa barplot plugin page:

 --i-table ARTIFACT PATH FeatureTable[Frequency]
                                  Feature table to visualize at various
                                  taxonomic levels.  [required]

So, this command only accepts a FeatureTable of type Frequency.
This is because taxa barplot automatically converts the counts and shows you relative abundance data in the visualization, so you don't want to perform this step twice.

ANCOM actually requires compositional data, but again you don't need to do this yourself, as it is done for you as part of its pipeline. For example, if you are following the Moving Pictures tutorial, you'll see that in the differential abundance with ANCOM section, the first NOTE box indicates that both ANCOM and gneiss use compositional data. As such, following an initial recommended filtering step, you are required to create this using qiime composition add-pseudocount, which adds pseudocounts and converts the table to composition data. So you don't want to pass a relative frequency table into this, because then the step would be performed twice.
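
To give a feel for why a pseudocount is added before log-ratio based methods such as ANCOM, here is a small Python sketch. It is an illustration only, not the plugin's code: the function names are hypothetical, the pseudocount of 1 is an assumed value for the example, and the counts are made up.

```python
import math


def add_pseudocount(counts, pseudocount=1):
    """Add a constant to every count so that no entry is zero."""
    return {
        taxon: {s: c + pseudocount for s, c in per_sample.items()}
        for taxon, per_sample in counts.items()
    }


def log_ratio(table, taxon_a, taxon_b, sample_id):
    """log(a / b) within one sample; undefined if either count is zero."""
    return math.log(table[taxon_a][sample_id] / table[taxon_b][sample_id])


# Hypothetical counts with a zero that would break a log-ratio.
counts = {
    "p__Firmicutes": {"s1": 99, "s2": 0},
    "p__Proteobacteria": {"s1": 9, "s2": 49},
}
shifted = add_pseudocount(counts)
ratio_s1 = log_ratio(shifted, "p__Firmicutes", "p__Proteobacteria", "s1")
ratio_s2 = log_ratio(shifted, "p__Firmicutes", "p__Proteobacteria", "s2")
print(ratio_s1, ratio_s2)
```

Without the pseudocount, the zero in s2 would make the logarithm undefined, which is why the count table (not a relative frequency table) is the expected starting point for this step.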