Hi,
I am following the information provided here (Introducing Greengenes2 2022.10) to classify my data with Greengenes2. I used the non-V4 region commands because I have forward and reverse reads. I ran denoise with the default parameters and got ~6,000 features.
I used the Greengenes2 backbone full length file to map my representative sequences from the denoise output. I then classified the taxonomy (all with the commands in the Greengenes2 announcement post).
However, when I looked at the FeatureData[Taxonomy] output file, there were only 106 features and they all had a confidence of 1.
I thought I had ~ 6000 features so how have they disappeared down to ~100? And why is the confidence 1 for all of them?
The non-v4-16s action is a thin wrapper around q2-vsearch's closed reference OTU picking. In that mode, the ASVs are collapsed to the backbone features.
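To make the collapse concrete, here is a minimal Python sketch of what closed-reference picking does to a feature table in general. It is not the plugin's code: the function name, the sample and feature IDs, and especially the asv_to_ref mapping are made up for illustration (the real mapping is internal to q2-vsearch). ASVs that hit the same reference feature have their counts summed, and ASVs with no hit are dropped.

```python
from collections import defaultdict


def collapse_to_reference(asv_counts, asv_to_ref):
    """Sum per-sample ASV counts onto the reference feature each ASV matched.

    asv_counts: {asv_id: {sample_id: count}}
    asv_to_ref: {asv_id: reference_id} (hypothetical mapping, for illustration)
    Returns (collapsed_table, unmatched_asv_ids).
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    unmatched = []
    for asv_id, per_sample in asv_counts.items():
        ref_id = asv_to_ref.get(asv_id)
        if ref_id is None:
            # Closed-reference picking discards sequences with no reference hit.
            unmatched.append(asv_id)
            continue
        for sample_id, count in per_sample.items():
            collapsed[ref_id][sample_id] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {ref: dict(samples) for ref, samples in collapsed.items()}, unmatched


# Toy data: four ASVs, two of which hit the same reference feature.
asv_counts = {
    "asv1": {"S1": 10, "S2": 0},
    "asv2": {"S1": 5, "S2": 7},
    "asv3": {"S1": 1, "S2": 2},
    "asv4": {"S1": 3, "S2": 3},
}
asv_to_ref = {"asv1": "refA", "asv2": "refA", "asv3": "refB"}

otu_table, unmatched = collapse_to_reference(asv_counts, asv_to_ref)
print(otu_table)   # two reference features remain
print(unmatched)   # asv4 had no reference hit
```

This is why a table with thousands of ASVs can end up with far fewer features: many ASVs map to the same backbone record.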
The taxonomy of the backbone is invariant. However, the semantic type for FeatureData[Taxonomy] requires (if I recall correctly) a column called confidence. As a workaround, and because the taxonomy of the backbone is fixed, we set the value to 1.0. We could potentially express the confidence based on the underlying mappings, but as far as I know the mappings from q2-vsearch are not presently exposed, which I believe is related to this issue.
That makes perfect sense, thank you. I knew the non-v4-16s action was using closed reference OTU picking but it didn't click in my head.
Just to check I have this right, the ~6000 features have been collapsed into the taxonomy backbone? I can still look at each taxonomic level? Sorry if that's a dumb question.
I'm going to pull the ASV count data out. Will that still be possible with the FeatureData[Taxonomy] table if the taxonomy is collapsed?
Thanks
The taxonomic annotations of the OTUs are what they are. The mapping of ASV to OTU is not exposed in the q2-vsearch plugin yet. It would not be unreasonable to use the OTU-based table for diversity analyses, and to use the taxonomy associated with the OTUs (at least to genus). It would also be fine to apply Naive Bayes classification for the ASVs.
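As a rough illustration of using the OTU taxonomy down to genus, the sketch below truncates lineage strings at the sixth rank and sums OTU counts per genus. The function names, OTU IDs, counts, and lineage strings are invented for the example and only assume semicolon-separated ranks from domain to species. Within QIIME 2 itself, the taxa collapse action serves the same purpose.

```python
def truncate_taxon(taxon, level=6):
    """Keep the first `level` ranks of a semicolon-separated lineage string.

    level=6 corresponds to genus when ranks run domain..species (7 ranks).
    """
    ranks = [rank.strip() for rank in taxon.split(";")]
    return "; ".join(ranks[:level])


def collapse_by_taxon(otu_counts, taxonomy, level=6):
    """Sum OTU counts that share the same lineage at the requested level."""
    collapsed = {}
    for otu_id, count in otu_counts.items():
        key = truncate_taxon(taxonomy[otu_id], level)
        collapsed[key] = collapsed.get(key, 0) + count
    return collapsed


# Toy data (illustrative lineages, not taken from the Greengenes2 release).
lacto = ("d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
         "f__Lactobacillaceae; g__Lactobacillus")
bact = ("d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; "
        "f__Bacteroidaceae; g__Bacteroides")
taxonomy = {
    "refA": lacto + "; s__Lactobacillus gasseri",
    "refB": lacto + "; s__Lactobacillus iners",
    "refC": bact + "; s__Bacteroides fragilis",
}
otu_counts = {"refA": 22, "refB": 3, "refC": 6}

genus_counts = collapse_by_taxon(otu_counts, taxonomy, level=6)
for lineage, count in genus_counts.items():
    print(count, lineage)
```

Changing level to a smaller number gives the same summary at a higher rank, for example level=5 for family.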