quality control evaluate taxonomy - outputs blank file

ahfitzpa · March 25, 2021, 12:18pm

I am running the quality-control plugin on 12 custom mock community samples for norovirus (qiime2/2021.2 on an HPC cluster). When I run the following:

qiime quality-control evaluate-taxonomy \
  --i-expected-taxa ../output/classified_expected_taxonomy.qza \
  --i-observed-taxa ../output/decontam_taxonomy.qza \
  --i-feature-table ../output/rel_decontam_dada2.qza \
  --p-depth 7 \
  --p-no-require-exp-ids \
  --p-no-require-obs-ids \
  --verbose \
  --o-visualization ../output/evaluate-taxonomy.qzv

I obtained the blank evaluate-taxonomy.qzv file attached. I also tried including a sample ID for extracting frequency data and got the following error: 'Cannot retrieve an element from an empty/null table'.

I have viewed the input files in R using qiime2R. The taxonomy files and relative feature table are not empty and contain overlapping features. I also tried the same command with the collapsed relative frequency table, and at various depths, but obtained similar output. I have applied evaluate-seqs and evaluate-composition to some of the same input files and obtained the expected outputs.

Current process: cutadapt demux -> DADA2 denoise -> feature-classifier (sklearn) -> decontam in R -> filter rep-seqs, DADA2 output, and taxonomy by the IDs of the contaminants flagged by decontam -> quality-control plugin.

I would appreciate any guidance on how to make this feature work!

evaluate-taxonomy.qzv (310.5 KB) classified_expected_taxonomy.qza (56.8 KB) rel_decontam_dada2.qza (39.2 KB) rel_decontam_dada2_collapsed.qza (102.8 KB) decontam_taxonomy.qza (80.2 KB)

Thanks,

Amy

Nicholas_Bokulich · March 25, 2021, 12:31pm

Hi @ahfitzpa ,

Sounds very cool!

The output is not actually blank; you are scoring 0 accuracy for all metrics/levels.

This is because the feature IDs do not align between your observed and expected taxonomies.
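One quick way to confirm this kind of mismatch is to compare the feature IDs of the two taxonomies directly. The following is only a minimal sketch, not part of the plugin: the function name `compare_feature_ids` and the toy IDs and taxonomy strings are made up for illustration, and it assumes each taxonomy has already been loaded as a mapping from feature ID to taxonomy string (e.g., from an exported taxonomy table).

```python
def compare_feature_ids(expected, observed):
    """Summarise how the feature IDs of two taxonomy mappings overlap."""
    exp_ids, obs_ids = set(expected), set(observed)
    return {
        "shared": sorted(exp_ids & obs_ids),
        "only_expected": sorted(exp_ids - obs_ids),
        "only_observed": sorted(obs_ids - exp_ids),
    }


# Hypothetical toy data: feature ID -> taxonomy string.
# Expected taxonomy keyed by reference names, observed taxonomy keyed by
# denoised-sequence hashes, so no ID appears in both.
expected = {
    "mock_ref_1": "k__Viruses; f__Caliciviridae; g__Norovirus",
    "mock_ref_2": "k__Viruses; f__Caliciviridae; g__Norovirus",
}
observed = {
    "a1b2c3": "k__Viruses; f__Caliciviridae; g__Norovirus",
    "d4e5f6": "k__Viruses; f__Caliciviridae",
}

report = compare_feature_ids(expected, observed)
print(f"shared IDs: {len(report['shared'])}")
print(f"only in expected: {report['only_expected']}")
print(f"only in observed: {report['only_observed']}")
```

If the number of shared IDs is zero, there is nothing to compare feature by feature, even when both tables contain the same taxa.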

This method is most useful when you have a list of features with known taxonomies (e.g., simulated data). Your data are from a mock community, so you know which taxonomies to expect, but only on a community-wide basis, not for individual sequences.

With that in mind, I think you might want this method instead:
https://docs.qiime2.org/2021.2/plugins/available/quality-control/evaluate-composition/

See also the brief example here:
https://docs.qiime2.org/2021.2/tutorials/quality-control/#evaluating-quality-of-samples-with-known-composition

Hopefully that gets you on the right track! Please let us know if you have more questions.

ahfitzpa · March 25, 2021, 1:00pm

Thank you for such a fast reply. I have used evaluate-composition for the same data and it's incredibly useful, especially for method validation work (so thank you to the creators).

For evaluate-taxonomy, the idea is that the feature IDs match? I think I still need some time/coffee or an example to get my head around the feature's purpose. Perhaps an addition to the tutorial at a future point.

All the best,

Amy

Nicholas_Bokulich · March 25, 2021, 2:14pm

Yes! It is measuring accuracy (as precision, recall, and F-measure) by averaging across many features. The idea is you have a set of input sequences (e.g., simulated sequences or sequences taken from known species). You classify them. Then you compare the expected vs. observed taxonomy for each of these. The expected and observed taxonomies must map to the same feature IDs to perform this comparison. The feature table is just provided if you want to weight these scores using some abundance information.
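To make the per-feature comparison concrete, here is a rough sketch of the idea. It is not the plugin's actual implementation: the scoring rules (exact match at the chosen depth counts as a true positive, a correct but shorter classification as a false negative, anything else as a false positive), the function names, and the toy data are all assumptions made for illustration.

```python
def truncate(taxonomy, depth):
    """Split a ';'-delimited taxonomy string and keep the first `depth` ranks."""
    ranks = [r.strip() for r in taxonomy.split(";") if r.strip()]
    return ranks[:depth]


def score_taxonomy(expected, observed, depth, weights=None):
    """Return (precision, recall, F-measure) over the shared feature IDs.

    `weights` optionally maps feature ID -> abundance; without it every
    feature counts equally. Features missing from `weights` get weight 0.
    """
    tp = fp = fn = 0.0
    for fid in sorted(set(expected) & set(observed)):
        w = 1.0 if weights is None else weights.get(fid, 0.0)
        exp = truncate(expected[fid], depth)
        obs = truncate(observed[fid], depth)
        if obs == exp:
            tp += w  # correct at this depth
        elif len(obs) < len(exp) and obs == exp[:len(obs)]:
            fn += w  # underclassified: correct so far, but stops early
        else:
            fp += w  # misclassified (or classified deeper than expected)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data; both taxonomies are keyed by the same feature IDs.
expected = {
    "f1": "k__A; p__B; c__C",
    "f2": "k__A; p__B; c__D",
    "f3": "k__A; p__E; c__F",
    "f4": "k__A; p__E; c__G",
}
observed = {
    "f1": "k__A; p__B; c__C",  # match
    "f2": "k__A; p__B; c__D",  # match
    "f3": "k__A; p__E",        # underclassified
    "f4": "k__A; p__B; c__C",  # misclassified
}

unweighted = score_taxonomy(expected, observed, depth=3)
weighted = score_taxonomy(expected, observed, depth=3,
                          weights={"f1": 8, "f2": 1, "f3": 1, "f4": 0})
print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

Note that when the two taxonomies share no feature IDs, the loop never runs and every score comes out as 0, which matches the "blank" result described at the top of the thread.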

Yes, sorry, that would be useful (this method was added after that tutorial was written, and it looks like the tutorial was not updated). I will put it on the to-do list.
