Greengenes2 feature in table not found in tree during beta-rarefaction

Good afternoon! I am using QIIME 2 in a conda environment and ran the diversity plugin's beta-rarefaction action:

qiime diversity beta-rarefaction \
--i-table OTU_filtdecontamfilt_long_baseline.qza \
--i-phylogeny 2022.10.phylogeny.asv.nwk.qza \
--p-metric weighted_normalized_unifrac \
--p-clustering-method upgma \
--m-metadata-file HMBpaper_metadata_longitudinalsubset.txt \
--p-sampling-depth 100 \
--p-iterations 100 \
--p-correlation-method spearman \
--p-color-scheme RdGy_r \
--o-visualization JK_w_norm_unif_long.qzv \
--output-dir Jackknife_betaWNU_longitudinal \
--verbose

I got the following error message:
Plugin error from diversity:

Command '['ssu', '-i', '/var/folders/ff/1kk8pp_x6xq9jk3pkmnty9w40000gn/T/qiime2/marlydmejia/data/8332f6cf-72d4-4eca-a8ef-ee85a4516a31/data/feature-table.biom', '-t', '/var/folders/ff/1kk8pp_x6xq9jk3pkmnty9w40000gn/T/qiime2/marlydmejia/data/bc681fc9-3a54-48b7-9143-434a11a5a94c/data/tree.nwk', '-m', 'weighted_normalized', '-o', '/var/folders/ff/1kk8pp_x6xq9jk3pkmnty9w40000gn/T/q2-LSMatFormat-rtnbsu1n']' returned non-zero exit status 1.

With --verbose, we also saw:

Compute failed in one_off: Table observation IDs are not a subset of the tree tips. This error can also be triggered if a node name contains a single quote (this is unlikely).
Traceback (most recent call last):

The table above had been feature-filtered (and also exported and re-imported into QIIME 2), so we tried running with an un-exported, minimally filtered table and still got the error. We then ran a smaller, sample-filtered table (4 samples) that went through the process. When we opened the tree, we saw many single quotes, so we left that alone and pursued computational or mismatch-related causes of the error. We tried "export UNIFRAC_USE_GPU=N" and another computer, to no avail.

Finally, we found that there was a mismatch in features: the ID VOUX01000001 was not in the tree and therefore caused an error when I used the weighted_normalized_unifrac metric. I had created my taxonomy file from my table using:

qiime greengenes2 taxonomy-from-table \
--i-reference-taxonomy 2022.10.taxonomy.md5.nwk.qza \
--i-table mergedstudy_feattable_gg2.qza \
--o-classification mergedstudy.gg2.tabletaxonomymd5.qza

and, upon exporting my taxonomy to a TSV, saw that this feature ID corresponded to an unspecific taxonomic call.

It made sense in my mind that it would not be in the tree. Once I removed this ID, found in 10 samples at less than 0.1 frequency, the phylogenetic tree could be used to generate the appropriate output files. It was odd because I had previously run "qiime diversity core-metrics-phylogenetic" with 2022.10.phylogeny.asv.nwk.qza and got a full set of output files where distances ranged from 0 to the 400s. *I actually started beta-rarefaction with 2022.10.phylogeny.asv.nwk.qza and then thought it wasn't working because it didn't match the md5 taxonomy file I used. Switching between the .asv and .md5 phylogenies still gave me the error.
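In case it helps anyone hitting the same error: I believe the table can also be restricted to tree tips up front with something along these lines (the output filename here is just an example):

qiime phylogeny filter-table \
--i-table OTU_filtdecontamfilt_long_baseline.qza \
--i-tree 2022.10.phylogeny.asv.nwk.qza \
--o-filtered-table OTU_filtdecontamfilt_long_baseline_treetips.qza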

Is this something that can be fixed in the taxonomy so that it doesn't lead to this mismatch? Or could a warning be added telling users to remove calls that would not end up in the tree?

Thank you so much for your time!
Marlyd

Hello @MarlydEMejia. You mentioned that the data was exported then re-imported into QIIME 2. What exactly happened there? Why was it exported then re-imported, and was anything done to it prior to it being re-imported?

Hi @Oddant1, thank you for the reply! Yes, things were definitely altered. I exported the table because I used the R package decontam and had to remove some features. It re-imported okay, and it is something I had done previously. The feature IDs were all maintained: we cross-checked the new feature table against the taxonomy file (which was created from the original, never-exported feature table) and everything matched. That output taxonomy file is the one shown in the previous message. So it isn't that the table got corrupted and led to an incorrect taxonomy output downstream (we thought that might have been the issue, but it turned out okay).
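Roughly, the round trip looked like this (a sketch; the file and directory names are placeholders, and the decontam filtering itself happened in R in between):

qiime tools export \
--input-path original-table.qza \
--output-path exported-table

# decontam filtering in R on exported-table/feature-table.biom, saved as decontam-filtered-table.biom

qiime tools import \
--type 'FeatureTable[Frequency]' \
--input-path decontam-filtered-table.biom \
--input-format BIOMV210Format \
--output-path decontam-filtered-table.qza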

The thing is, when we opened 2022.10.taxonomy.md5.nwk.qza and the .asv.nwk.qza version (not shown), we still saw that VOUX01000001 had no known taxonomy. The only mismatch was between the phylogenetic tree and all the taxonomy files...

Interesting, that taxon is present in the .tsv.gz version but not in the taxonomy or phylogeny QZAs.
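For reference, this kind of check can be done roughly like so (the release filenames and the exported tree filename are assumptions on my part):

zgrep -c VOUX01000001 2022.10.taxonomy.md5.tsv.gz
# at least one hit in the flat taxonomy file

qiime tools export \
--input-path 2022.10.taxonomy.md5.nwk.qza \
--output-path exported-taxonomy

grep -c VOUX01000001 exported-taxonomy/tree.nwk
# no hits in the exported newick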


I don't have an answer for why that is at this time.

Hi @MarlydEMejia,

Thank you for the inquiry, and I apologize for the issue. It looks like you've stumbled on an edge case. Specifically, the record "VOUX01000001" comes from the Living Tree Project, which is actively curated. Our processing assumption is that those records are high-quality 16S. GenBank suggests it may not be a great record: according to the PGAP report, it does not have a complete 16S gene.

Looking closer, the 16S included is only 451 nt in length, far shorter than what is often considered full length (1200 nt). That length placed it below the arbitrary length threshold we use to differentiate ASVs, leading to it being handled differently within the MD5 and ASV artifacts. The record technically is in the phylogeny and taxonomy, but under an arbitrary identifier.

This is a bug. I've opened an issue to address it.

As for a resolution, I'm leaning toward suggesting that you filter VOUX01000001 out of the feature table, as it is an unexpectedly low-quality 16S record. Would that be detrimental to the analysis or interpretation?
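For reference, dropping it could look something like the following (filenames here are placeholders; features-to-drop.tsv would be a metadata file whose ID column contains the single entry VOUX01000001):

qiime feature-table filter-features \
--i-table your-table.qza \
--m-metadata-file features-to-drop.tsv \
--p-exclude-ids \
--o-filtered-table your-table-filtered.qza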

Best,
Daniel


Hi Daniel,

Thank you for this explanation! I'm glad it wasn't all in my head or some unknown, undetermined error I made. And thank you for taking steps to address it.

I had decided to filter it out; it was at most about 0.4% of reads per sample and present in only about 6 of 400+ samples. I'm glad that is your recommendation as well. So no, it is not detrimental, and all our conclusions (non-phylogenetic, since we don't have a pre-filtering phylogenetic analysis) remained the same before and after filtering it out.

Again, thank you for your help.
Marlyd


That's great, @MarlydEMejia!
