Edit the taxonomic classifications of query sequences

microb123 · August 10, 2022, 7:00am

I trained a classifier on the Silva database (138.1 Ref NR99) for V3-V4 using RESCRIPt and ended up with annotations like “g__uncultured; s__uncultured_bacterium”, “s__uncultured_organism”, “s__uncultured_bacterium”, as shown in the uploaded image. Can I remove those unassigned taxonomic levels and keep only the properly annotated levels, for example up to the family level for sq1079 in the uploaded image?

Also, what do those size values (262, 263, 264) represent? Some have a size of a few thousand and some are in the hundreds.
Thank you

cherman2 · August 10, 2022, 4:28pm

Hello @microb123,

I am not quite sure what this screenshot is of, so more context on that would be helpful.

As for the “g__uncultured; s__uncultured_bacterium”, “s__uncultured_organism”, “s__uncultured_bacterium” labels, those are technically saying something different from just cutting off at the family level.

If an annotation has only family-level resolution, that means the classifier only has confidence down to the family level. The ones that have “g__uncultured; s__uncultured_bacterium”, “s__uncultured_organism”, “s__uncultured_bacterium” labels do have resolution down to the genus or species level, but the database doesn't have actual names for that genus or species, if that makes sense.
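To make that distinction concrete, here is a minimal Python sketch. It is not part of QIIME 2 or RESCRIPt; the function names, the list of placeholder labels, and the example taxonomy strings are all assumptions for illustration, so adjust them to what actually appears in your table. It reports two things for a Silva-style taxonomy string: how deep the classifier went, and how deep the informative names go.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]

# Assumed placeholder names; extend this to match the labels in your own data.
PLACEHOLDERS = ("uncultured", "unidentified", "metagenome")


def split_taxon(taxon):
    """Split 'd__Bacteria; p__...' into names with the rank prefix removed."""
    names = []
    for part in taxon.split(";"):
        part = part.strip()
        if not part:
            continue
        # Drop the 'x__' rank prefix if it is present.
        _, sep, name = part.partition("__")
        names.append(name if sep else part)
    return names


def is_placeholder(name):
    """True if the name is empty or starts with one of the assumed placeholders."""
    return name == "" or name.lower().startswith(PLACEHOLDERS)


def describe_resolution(taxon):
    """Return (deepest rank the classifier assigned, deepest rank with a real name)."""
    names = split_taxon(taxon)[: len(RANKS)]
    if not names:
        return None, None
    named_depth = 0
    for depth, name in enumerate(names, start=1):
        if not is_placeholder(name):
            named_depth = depth
    classified = RANKS[len(names) - 1]
    named = RANKS[named_depth - 1] if named_depth else None
    return classified, named


# Illustrative strings, not taken from the screenshot in this thread.
family_only = "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae"
unnamed_species = family_only + "; g__uncultured; s__uncultured_bacterium"

print(describe_resolution(family_only))
print(describe_resolution(unnamed_species))
```

Both strings have their last informative name at the family level, but only the second one was classified all the way down to species, and that difference is lost if you simply trim everything after the family.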

As for the numbers, those look like information from Silva about the sequences. Maybe sequence length? But I am not sure about that.

However if you are not doing anything special to the Silva classifier (I.e. creating a weighted classifier) I would encourage you to use our pre-trained classifier: Data resources — QIIME 2 2022.2.0 documentation

Hope that helps!

cherman2 · August 10, 2022, 7:43pm

Another moderator just pointed out that you are using a V3-V4 classifier, so you cannot use the pre-built classifiers.

microb123 · August 10, 2022, 9:47pm

Hi @cherman2, thank you for the quick reply and the explanation regarding the uncultured annotations.

Sorry, my bad. I just uploaded part of the QIIME 2 View output.

Ohh, that makes a lot of sense. Thank you for the explanation. The reason I was trying to cut off at the family level is that when I used another taxonomy assignment approach, DADA2, it returned "NA" for those "uncultured" taxa.

From your explanation, it is not a good idea to remove them, and assigning "NA" might be a better option. Can I assign "NA" to those "uncultured" taxa? What are your thoughts?

Thank you so much for your time and support.

cherman2 · August 10, 2022, 11:10pm

Hey @microb123

If you want to create your own term for the "uncultured" taxa, that is perfectly fine. NA is fine. I might use the term "unknown" as opposed to NA, because the label is applicable to the sequences; the name is just not known. But that is just my personal opinion.

There is one issue with how you are using NA.
The first sequence in the top photo only has genus resolution. However, the first sequence in your bottom photo has NA for both the genus and the species level. That loses information about the resolution, because now it looks like you have species resolution for a sequence that you only have genus-level resolution for. So I would come up with a scheme that differentiates between a level that is resolved but not named and a level where the resolution is not strong enough.

Here is an example of what I am thinking:
If there is resolution but you don't know the name, call it "unknown", and if there isn't resolution, call it NA.

Or, if there is resolution but you don't know the name, call it NA, and leave the cell blank if there is not enough resolution.
Screen Shot 2022-08-10 at 4.08.12 PM
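If you would rather script this than edit the table by hand, here is a rough Python sketch of both labeling schemes. It is only an illustration under stated assumptions: `relabel`, the `PLACEHOLDERS` tuple, and the example taxonomy string are hypothetical and not something QIIME 2 provides, and you may need to add other placeholder names that show up in your data.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")

# Assumed placeholder names; extend this to match the labels in your own data.
PLACEHOLDERS = ("uncultured", "unidentified", "metagenome")


def relabel(taxon, unnamed="unknown", unresolved="NA"):
    """Map a 'd__...; p__...' string to one label per rank.

    Ranks the classifier reached but that carry a placeholder name get
    `unnamed`; ranks the classifier did not reach get `unresolved`.
    """
    parts = [p.strip() for p in taxon.split(";") if p.strip()]
    labels = {}
    for i, rank in enumerate(RANKS):
        if i >= len(parts):
            # The classifier stopped above this rank.
            labels[rank] = unresolved
            continue
        # Drop the 'x__' rank prefix if it is present.
        _, sep, name = parts[i].partition("__")
        name = name if sep else parts[i]
        if name == "" or name.lower().startswith(PLACEHOLDERS):
            labels[rank] = unnamed
        else:
            labels[rank] = name
    return labels


# Illustrative string: resolved to genus, but the genus has no real name.
taxon = "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__uncultured"

scheme_one = relabel(taxon)                              # unknown / NA
scheme_two = relabel(taxon, unnamed="NA", unresolved="")  # NA / blank
print(scheme_one["genus"], scheme_one["species"])
print(repr(scheme_two["genus"]), repr(scheme_two["species"]))
```

Either scheme works as long as the two cases stay distinguishable in the final table.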

Hope that helps!

cherman2 · August 11, 2022, 3:14pm

Hi again @microb123,
I changed the title of this discussion so that it more accurately reflects your question.

This will help users who have the same question find this thread more easily!

microb123 · August 11, 2022, 5:09pm

Thank you so much @cherman2 !!!
