What's happening under the hood when running core-metrics-phylogenetic?

pramesh_shakya · February 21, 2019, 7:23pm

Hello,
I’ve been using QIIME 2 2018.11. I ran core-metrics-phylogenetic (passing in a phylogenetic tree) to calculate the various diversity metrics after my count matrix (OTUs, or ASVs in my case, by samples) was rarefied. My confusion arises from the following:

I looked into the rarefied OTU table and calculated observed OTUs in R: if an OTU had a non-zero value for a sample, it was counted as 1, and these counts were summed down each column (OTUs are rows and samples are columns).
Then I looked into the observed-otus.qza produced by QIIME 2.

These two outputs are similar, but the numbers are different (they’re in a close range). What I want to understand is why the numbers differ. Is it because QIIME 2 compares the taxonomic similarity of OTUs and, if they’re similar, counts them as one? This is very confusing, and the lack of documentation is very frustrating. A detailed explanation would be really appreciated.

thermokarst · February 22, 2019, 12:34am

Are you using the same rarefied table when comparing the results from R and QIIME 2?

QIIME 2 uses scikit-bio under the hood to compute alpha diversity.

You will need to provide more details regarding what was done in R in order to compare.

No, the observed OTUs metric is non-phylogenetic --- this algorithm is not capable of utilizing phylogeny.
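
As a rough sketch of what a non-phylogenetic richness count involves: the metric counts, per sample, the features with a non-zero count, and no tree or taxonomy enters into it. The snippet below is plain Python written for illustration, not scikit-bio's actual code, and the table, sample names, and the observed_otus helper are all made up.

```python
# Illustrative OTU-by-sample table (rows = OTUs/ASVs, columns = samples).
# Feature and sample names are invented for this sketch.
table = {
    "otu_a": {"S1": 5, "S2": 0, "S3": 2},
    "otu_b": {"S1": 0, "S2": 0, "S3": 7},
    "otu_c": {"S1": 1, "S2": 3, "S3": 0},
}


def observed_otus(table):
    """Count, for each sample, the features with a non-zero count."""
    richness = {}
    for counts in table.values():
        for sample, count in counts.items():
            # Each feature contributes 1 if present in the sample, else 0.
            richness[sample] = richness.get(sample, 0) + (1 if count > 0 else 0)
    return richness


print(observed_otus(table))
```

If an R calculation on the same rarefied table follows this logic, any disagreement with QIIME 2 would most likely come from how the table was loaded rather than from the metric itself.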

docs.qiime2.org has a lot of information, but we are a community-driven effort --- if you see something you want to change, we encourage you to contribute that change --- that is how these things grow and mature!

pramesh_shakya · February 22, 2019, 1:05am

Thank you for the quick reply. Yes, I found out that QIIME 2 uses scikit-bio under the hood and checked its documentation as well. To compare, I first exported the rarefied table, converted it to TSV, and imported it into R; then, for every sample, I counted each non-zero data point once and summed those counts. The results I obtained this way (i.e. observed OTUs) were different from observed-otus.qza (which was also converted to TSV format). I also find QIIME 2’s documentation inadequate, as details such as how the calculations are done are not easily accessible. I had to go through GitHub to find out that it uses scikit-bio under the hood. I wish I could upload screenshots of the two results I got. So, when calculating the observed-otus metric, QIIME 2 doesn’t take the phylogeny into account?

thermokarst · February 22, 2019, 1:25am

We will need something more concrete from you in order to assist --- an R script of what was done, plus the differing results, would be helpful.

Once your trust level is bumped up you will receive permission to attach files. Until then, countless services exist on the internet that allow you to upload an image --- you can do that and then provide a link here.

Yes, but more specifically --- scikit-bio doesn't.

pramesh_shakya · February 22, 2019, 1:45am

I used the vegan package in R to compute the number of observed OTUs, as follows:

data <- read.table(file.choose(), header=T, sep="\t")

library("vegan")

richness <- specnumber(t(data))

t(data) transposes the rarefied table, since it is OTUs by samples but the calculation needs it in samples-by-OTUs format.
The link for the screenshot of the observed-otus.qza's content is below:

Similarly, the link for the screenshot of the observed OTU numbers I obtained from the R commands is as follows:

thermokarst · February 22, 2019, 10:43pm

I am going to attempt to recreate this using the Moving Pictures tutorial's rarefied table [link] (that way you can follow along, too).

# get data
wget https://docs.qiime2.org/2019.1/data/tutorials/moving-pictures/core-metrics-results/rarefied_table.qza

# compute observed otus in qiime 2
qiime diversity alpha --i-table rarefied_table.qza --p-metric observed_otus --o-alpha-diversity q2-obs-otus.qza
qiime tools export --input-path q2-obs-otus.qza --output-path q2-obs-otus
sed '1 s/^.*$/sample-id\tqiime2/' q2-obs-otus/alpha-diversity.tsv > q2-obs-otus.tsv


# export feature table for use in R
qiime tools export --input-path rarefied_table.qza --output-path rarefied_table
biom convert -i rarefied_table/feature-table.biom -o rarefied-feature-table.tsv --to-tsv
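# clean up the TSV for import in R: drop the "# Constructed from biom file" line
# and strip the "#OTU ID" label from the header row only
# (tr -d '#OTU ID' would delete each of those characters everywhere in the file)
sed -e '1d' -e '2s/^#OTU ID//' rarefied-feature-table.tsv > table.tsv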

# in R:
library('vegan')
table <- read.table('table.tsv', header=TRUE, sep='\t', row.names=1)
richness <- specnumber(table, MARGIN=2)
write.table(as.data.frame(richness), 'r-obs-otus.tsv', sep='\t')

# prep r file for qiime 2 metadata tabulation
sed -i '1 s/^.*$/sample-id\tr/' r-obs-otus.tsv
qiime metadata tabulate --m-input-file q2-obs-otus.tsv --m-input-file r-obs-otus.tsv --o-visualization comparison.qzv

comparison.qzv (1.1 MB) | q2view

As you can see, the values are identical. It looks like the way you were loading and manipulating the dataframe was causing some samples and/or features to be dropped from the dataframe in R.
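
If you would rather check the agreement programmatically than by eye, something like the following could work. It is only a sketch: the inline strings stand in for q2-obs-otus.tsv and r-obs-otus.tsv, and read_alpha_tsv and compare are hypothetical helper names, not part of QIIME 2.

```python
import csv
import io


def read_alpha_tsv(handle):
    """Read a two-column TSV (sample ID, value) with a header row into a dict."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: float(row[1]) for row in reader if row}


def compare(a, b):
    """Return (IDs only in a, IDs only in b, shared IDs whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(s for s in set(a) & set(b) if a[s] != b[s])
    return only_a, only_b, differing


# Made-up inline data standing in for the two exported files.
q2_text = "sample-id\tqiime2\nS1\t2\nS2\t1\nS3\t2\n"
r_text = "sample-id\tr\nS1\t2\nS2\t1\nS3\t2\n"

q2 = read_alpha_tsv(io.StringIO(q2_text))
r = read_alpha_tsv(io.StringIO(r_text))
print(compare(q2, r))
```

Three empty lists mean both files cover the same samples with the same values. A non-empty first or second list points to samples that were dropped on one side, which is the kind of loading problem described above.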

pramesh_shakya · February 23, 2019, 1:55am

Thank you. I’ll try and run it again and see if there’s something I missed.
