Evaluate mock community composition

(Candela) #1

I’m having trouble getting the score plot for my mock sample composition, comparing observed against expected. The QIIME 2 analysis (the taxonomy assignment step) was done against the SILVA database, which formats the taxonomy of a feature like this:
D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus;D_6__Bacillus subtilis
where 0 is domain, 1 is phylum, 2 is class, 3 is order, 4 is family, 5 is genus and 6 is species.

In QIIME 2, level 1 is domain, level 2 is phylum, 3 is class, 4 is order, 5 is family, 6 is genus and 7 is species.

So in my score plot, level 1 in QIIME 2, which corresponds to D_0__Bacteria in SILVA, isn’t being counted (see the uploaded score plot): its R-value and p-value are both zero, while the other levels look fine.

I don’t know whether I’m importing a bad txt file or whether this is a semantic mismatch between the database format and the QIIME 2 nomenclature.

I am also attaching the observed and expected txt.
expected-relative-frequency.txt (1.4 KB)
observed-relative-frequency.txt (1.8 KB)

I’d be grateful if someone knows what the real issue is here.



(Nicholas Bokulich) #2

Hi @Candela,
Thank you for reporting this bug!

This numbering system is not the cause of this issue. That is just the numbering on that score plot. It is not canonical to QIIME 2, and QIIME 2 (and the evaluate-composition action) does not actually use the numbering information in your taxonomy. Levels 1-7 are just the ranks as they appear in your taxonomy files.

Your expected and observed taxonomies use the same syntax (SILVA format) and that is all that matters here.
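To illustrate what "ranks as they appear" means, here is a minimal sketch (not the actual q2-quality-control code) that treats a level as nothing more than a position in the semicolon-separated string. The helper name `truncate_taxonomy` is hypothetical and exists only for this example; the taxonomy string is the one from the first post.

```python
def truncate_taxonomy(taxon: str, level: int) -> str:
    """Keep the first `level` semicolon-separated ranks of a taxonomy string.

    The rank prefixes (D_0__, D_1__, ...) are never parsed: level N simply
    means "the first N fields", whatever they are called.
    """
    ranks = [rank.strip() for rank in taxon.split(";")]
    return ";".join(ranks[:level])


silva = (
    "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;"
    "D_4__Bacillaceae;D_5__Bacillus;D_6__Bacillus subtilis"
)

level_1 = truncate_taxonomy(silva, 1)  # first field only (the SILVA D_0 rank)
level_6 = truncate_taxonomy(silva, 6)  # first six fields (down to the D_5 rank)
level_7 = truncate_taxonomy(silva, 7)  # the full string
```

Under this positional reading, SILVA's D_0 is simply level 1 and D_6 is level 7, so the offset between the two numbering schemes has no effect as long as expected and observed use the same format.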

Your results actually look quite good and accurate if you ignore the R2 value at level 1. The other R2 values, as well as TAR and TDR, look like they are probably correct.

I have raised an issue to investigate the bug that is causing R2 values at level 1 to be miscalculated, and hope to get that fixed in time for this month’s release of QIIME 2.

Until then, I recommend not using the R2 values (at least not at level 1) — use TAR and TDR instead, which all look correct — or calculate R2 independently to confirm the correct values at each level.
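For anyone who wants to calculate R2 independently, the following is a rough sketch of one way to do it for a single sample. It assumes the frequencies are available as plain dictionaries mapping taxonomy strings to relative frequencies, and that taxa missing from one side count as zero; both are assumptions of this example, not a description of how evaluate-composition works internally. The function names and the frequency values are made up for illustration.

```python
from collections import defaultdict

import numpy as np


def collapse(freqs, level):
    """Sum relative frequencies of taxa that share the first `level` ranks."""
    collapsed = defaultdict(float)
    for taxon, freq in freqs.items():
        key = ";".join(taxon.split(";")[:level])
        collapsed[key] += freq
    return dict(collapsed)


def r_squared(expected, observed, level):
    """Squared Pearson correlation at one level, or None when undefined."""
    exp = collapse(expected, level)
    obs = collapse(observed, level)
    taxa = sorted(set(exp) | set(obs))
    x = np.array([exp.get(t, 0.0) for t in taxa])
    y = np.array([obs.get(t, 0.0) for t in taxa])
    # Correlation needs at least two points and some variance on both sides.
    if len(taxa) < 2 or x.max() == x.min() or y.max() == y.min():
        return None
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)


# Made-up frequencies for a single mock sample (illustrative only).
expected = {
    "D_0__Bacteria;D_1__Firmicutes": 0.60,
    "D_0__Bacteria;D_1__Proteobacteria": 0.30,
    "D_0__Bacteria;D_1__Actinobacteria": 0.10,
}
observed = {
    "D_0__Bacteria;D_1__Firmicutes": 0.55,
    "D_0__Bacteria;D_1__Proteobacteria": 0.35,
    "D_0__Bacteria;D_1__Actinobacteria": 0.10,
}

r2_level_2 = r_squared(expected, observed, 2)  # three phyla, so r is defined
r2_level_1 = r_squared(expected, observed, 1)  # one taxon only, so None
```

Returning None at level 1 makes the undefined case explicit instead of reporting a misleading number.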


(Nicholas Bokulich) #3

Could you please provide:

  1. the exact command that you used
  2. all input files you used (QZA files for expected and observed, as well as metadata if you used it)

This will help me debug. Thanks!


(Nicholas Bokulich) #4

Just for more context, it looks like this bug occurred here because scipy’s linear regression function (which is being used to calculate R here) fails if there is only one measurement each in the expected and observed — which is the case in your data since you only have one observation in each sample at domain level: bacteria.

The good news is that the other measurements are valid; only the R and R2 values at level 1 (where you have only bacteria) are affected by this bug.
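As a side note on why a single observation cannot give a meaningful R, the standard Pearson correlation formula (written here as a reminder, not taken from the scipy source) degenerates when n = 1:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
n = 1 \;\Rightarrow\; x_1 = \bar{x},\; y_1 = \bar{y}
\;\Rightarrow\; r = \frac{0}{0}
```

With only one taxon (Bacteria) at level 1, every deviation from the mean is zero, so r is undefined. What a given library reports in that case (zero, NaN, or an error) is implementation-specific; in this thread the plot showed zero.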

This should be an easy fix, so I hope to have it fixed in this month’s release of q2-quality-control.


(Candela) #5

Ok, now I get it. I am glad to hear this, since I was getting kind of obsessed with it. I wasn’t entirely sure about the numbering system thing.

I am attaching what you asked for, but it must be due to the single observation at level 1, since the rest are doing ok.
expected-relative-frequency.qza (5.8 KB)
observed-relative-frequency.qza (6.4 KB)

The commands were the following:

biom convert -i expected-relative-frequency.txt -o expected-relative-frequency.biom --table-type="OTU table" --to-hdf5

biom convert -i observed-relative-frequency.txt -o observed-relative-frequency.biom --table-type="OTU table" --to-hdf5

qiime tools import --input-path expected-relative-frequency.biom --output-path expected-relative-frequency.qza --type 'FeatureTable[RelativeFrequency]'

qiime tools import --input-path observed-relative-frequency.biom --output-path observed-relative-frequency.qza --type 'FeatureTable[RelativeFrequency]'

qiime quality-control evaluate-composition --i-expected-features expected-relative-frequency.qza --i-observed-features observed-relative-frequency.qza --o-visualization mock-comparison.qzv

Thanks for the quick answer and for the efforts to find the problem.
I will keep an eye out for this month’s release of q2-quality-control.


(Candela) #6

mock-comparison.qzv (299.7 KB)
I am also attaching the final visualization file from QIIME 2, as I couldn’t attach everything in the same post.


(Nicholas Bokulich) #7

Thank you @Candela! I have used your files to test and confirm that my changes have fixed this bug; those changes will be available in this month’s release of QIIME 2 (version 2019.4). Here are the results:

mock-comparison.qzv (334.8 KB)


(Candela) #8

Thanks @Nicholas_Bokulich. Now the results look pretty good.