I'm having some issues obtaining the score plot for my mock sample compositions, comparing observed against expected. The thing is that the QIIME 2 analysis (the assign-taxonomy step) is done against the SILVA database, which labels each rank with a numbered prefix when a taxon is assigned to a feature (e.g. D_0__Bacteria),
where 0 is domain, 1 is phylum, 2 is class, 3 is order, 4 is family, 5 is genus and 6 is species.
In QIIME 2, level 1 is domain, level 2 is phylum, 3 is class, 4 is order, 5 is family, 6 is genus and 7 is species.
So in my score plot, level 1 in QIIME 2, which is D_0__Bacteria in SILVA, isn't being counted (see the uploaded score plot): the R value and the p-value are zero at that level, but the rest of the levels are fine.
I don't know whether I am not importing the right .txt file, or whether this is a semantic mismatch between the database format and the QIIME 2 nomenclature.
This numbering system is not the cause of this issue. The level numbers on that score plot are just positions: they are not canonical to QIIME 2, and QIIME 2 (including the evaluate-composition action) does not actually use the numbering information in your taxonomy. Levels 1-7 are simply the ranks in the order they appear in your taxonomy files.
Your expected and observed taxonomies use the same syntax (SILVA format) and that is all that matters here.
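To illustrate what "level" means here, below is a minimal sketch (not q2-quality-control's actual code) that truncates a semicolon-delimited taxonomy string to its first N ranks. The helper name `collapse_to_level` and the example label are hypothetical; the point is that level 1 is simply the first rank, whatever prefix it happens to carry.

```python
def collapse_to_level(taxonomy: str, level: int) -> str:
    """Keep only the first `level` ranks of a semicolon-delimited taxonomy.

    The level is purely positional: prefixes such as D_0__ or D_1__ stay
    part of each rank label and are never parsed as numbers.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return ";".join(ranks[:level])


# Hypothetical SILVA-style label, used only for illustration.
label = "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli"

print(collapse_to_level(label, 1))  # D_0__Bacteria
print(collapse_to_level(label, 2))  # D_0__Bacteria;D_1__Firmicutes
```

Because only the position matters, expected and observed files written in the same syntax collapse to the same labels at every level.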
Your results actually look quite good and accurate if you ignore the R2 value at level 1. The other R2 values, as well as TAR and TDR, appear to be correct.
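For reference, TAR (taxon accuracy rate) is the fraction of observed taxa that were expected, and TDR (taxon detection rate) is the fraction of expected taxa that were observed. Here is a minimal sketch of both at a single level, assuming each composition is given as a set of taxon labels already collapsed to that level; the function names and labels are hypothetical. Note that unlike R, both stay well defined when a sample holds a single taxon.

```python
from __future__ import annotations


def taxon_accuracy_rate(expected: set[str], observed: set[str]) -> float:
    """TAR: fraction of observed taxa that are also expected, TP / (TP + FP)."""
    if not observed:
        raise ValueError("observed must contain at least one taxon")
    return len(expected & observed) / len(observed)


def taxon_detection_rate(expected: set[str], observed: set[str]) -> float:
    """TDR: fraction of expected taxa that were observed, TP / (TP + FN)."""
    if not expected:
        raise ValueError("expected must contain at least one taxon")
    return len(expected & observed) / len(expected)


# Hypothetical genus-level labels for one mock sample.
expected = {"g__A", "g__B", "g__C", "g__D"}
observed = {"g__A", "g__B", "g__C", "g__X", "g__Y"}

print(taxon_accuracy_rate(expected, observed))   # 0.6
print(taxon_detection_rate(expected, observed))  # 0.75
```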
I have raised an issue to investigate the bug that is causing R2 values at level 1 to be miscalculated, and hope to get that fixed in time for this month’s release of QIIME 2.
Until then, I recommend not relying on the R2 values (at least not at level 1). Use TAR and TDR instead, which all look correct, or calculate R2 independently to confirm the correct values at each level.
For more context, this bug appears to occur because SciPy's linear regression function (which is used to calculate R here) fails when the expected and observed compositions each contain only one measurement. That is the case in your data, since each sample has only one taxon at domain level: Bacteria.
The good news is that the other measurements are valid; only the R and R2 values at level 1 (where you have only Bacteria) are affected by this bug.
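If you want to calculate R and R2 independently in the meantime, here is a minimal pure-Python sketch. It computes the Pearson correlation between paired expected and observed relative abundances for one sample at one level, and returns None when the correlation is undefined: fewer than two taxa, or no variance on either side, which is exactly the single-taxon situation at level 1. The function name and abundance values are hypothetical, and this is not the code q2-quality-control uses.

```python
import math
from typing import Optional, Sequence


def pearson_r(expected: Sequence[float], observed: Sequence[float]) -> Optional[float]:
    """Pearson r between paired expected/observed relative abundances.

    Returns None when r is undefined: fewer than two paired values, or
    zero variance on either side (e.g. only Bacteria at level 1).
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    n = len(expected)
    if n < 2:
        return None
    mean_e = sum(expected) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(expected, observed))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_e == 0 or var_o == 0:
        return None
    return cov / math.sqrt(var_e * var_o)


# Level 1: a single taxon per sample, so r is undefined.
print(pearson_r([1.0], [1.0]))  # None

# Hypothetical genus-level relative abundances for one sample.
r = pearson_r([0.4, 0.3, 0.2, 0.1], [0.42, 0.27, 0.21, 0.10])
print(f"r = {r:.3f}, R2 = {r ** 2:.3f}")
```

Returning None keeps the undefined case visible instead of reporting a number that could be mistaken for a real result.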
This should be an easy fix, so I hope to have it fixed in this month's release of q2-quality-control.
Thank you @Candela! I have used your files to test and confirm that my changes have fixed this bug; those changes will be available in this month's release of QIIME 2 (version 2019.4). Here are the results: