I trained a classifier on the Silva database (138.1 Ref NR99) for V3-V4 using RESCRIPt and ended up with annotations like “g__uncultured; s__uncultured_bacterium”, “s__uncultured_organism”, and “s__uncultured_bacterium”, as shown in the uploaded image. Can I remove those unassigned taxonomic levels and keep only the properly annotated levels, for example only up to the family level for sq1079 in the uploaded image?
Also, what do those size values (262, 263, 264) represent? Some have a size of a few thousand and some are in the hundreds.
Thank you
I am not quite sure what this screenshot is of, so more context on that would be helpful.
As for the “g__uncultured; s__uncultured_bacterium”, “s__uncultured_organism”, and “s__uncultured_bacterium” labels: those are technically saying something different from an annotation that simply stops at the family level.
If an annotation has only family-level resolution, that means the classifier is only confident down to the family level. The ones with “g__uncultured; s__uncultured_bacterium”, “s__uncultured_organism”, or “s__uncultured_bacterium” labels, on the other hand, do have resolution down to the genus or species level, but the database doesn't have actual names for that genus or species, if that makes sense.
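To make the distinction concrete, here is a minimal Python sketch that tells the two cases apart. It assumes taxonomy strings use the `d__`, `p__`, `c__`, `o__`, `f__`, `g__`, `s__` rank prefixes separated by semicolons, and that unnamed ranks start with "uncultured"; the function names and the example lineages are made up for illustration and are not part of QIIME 2 or RESCRIPt.

```python
# Rank prefixes assumed to follow the Silva / RESCRIPt style (an assumption).
RANKS = ["d", "p", "c", "o", "f", "g", "s"]


def parse_taxon(taxon):
    """Split 'd__Bacteria; p__...' into a {rank_prefix: name} dict."""
    labels = {}
    for part in taxon.split(";"):
        part = part.strip()
        if "__" in part:
            rank, name = part.split("__", 1)
            labels[rank] = name
    return labels


def describe(taxon):
    """Return (deepest rank present, ranks that are present but unnamed)."""
    labels = parse_taxon(taxon)
    present = [r for r in RANKS if labels.get(r)]
    deepest = present[-1] if present else None
    unnamed = [r for r in present if labels[r].lower().startswith("uncultured")]
    return deepest, unnamed


# Illustrative lineages, not taken from the screenshot.
family_only = ("d__Bacteria; p__Firmicutes; c__Clostridia; "
               "o__Oscillospirales; f__Ruminococcaceae")
unnamed_genus = family_only + "; g__uncultured; s__uncultured_bacterium"

print(describe(family_only))    # ('f', [])
print(describe(unnamed_genus))  # ('s', ['g', 's'])
```

The first lineage stops at family because the classifier was not confident any deeper; the second reaches species level, but the genus and species simply have no name in the database.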
As for the numbers, they look like information from Silva about the sequences, maybe sequence length? I am not sure about that, though.
However, if you are not doing anything special to the Silva classifier (i.e. creating a weighted classifier), I would encourage you to use our pre-trained classifier; see the "Data resources" page of the QIIME 2 2022.2.0 documentation.
Oh, that makes a lot of sense. Thank you for the explanation. The reason I was trying to cut off at the family level is that when I used another taxonomy assignment approach, DADA2, it returned "NA" for those "uncultured" taxa.
From your explanation, it is not a good idea to remove them, and assigning "NA" might be the better option. Can I assign "NA" to those "uncultured" taxa? What are your thoughts?
If you want to create your own term for the "uncultured" taxa, that is perfectly fine. NA is fine. I might use the term "unknown" as opposed to NA, because the label does apply to the sequences, it's just that the name is not known. But that is just my personal opinion.
There is one issue with how you are using NA.
The first sequence in the top photo only has genus resolution. However, the first sequence in your bottom photo has NA at both the genus and the species level. That loses information about the resolution, because now it looks like you have species resolution for a sequence that you only have genus-level resolution for. So I would come up with a way to differentiate between a level that is not named and a level where the resolution is not strong enough.
Here is an example of what I am thinking:
If there is resolution but you don't know the name, call it "unknown", and if there isn't resolution, call it NA.
Or, if there is resolution but you don't know the name, call it NA, and leave the cell blank if there is not enough resolution.
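Either scheme could be applied with a small relabeling step like the sketch below. This is only an illustration under some assumptions: the taxonomy strings use the `d__` through `s__` prefixes separated by semicolons, any rank whose name starts with "uncultured" counts as unnamed, and `relabel` is a hypothetical helper, not a QIIME 2 or DADA2 function. Other placeholder names in your data would need to be added to the check.

```python
# Rank prefixes assumed to follow the Silva / RESCRIPt style (an assumption).
RANKS = ["d", "p", "c", "o", "f", "g", "s"]


def relabel(taxon, unnamed="unknown", unresolved="NA"):
    """Return one label per rank, keeping 'no name' and 'no resolution' distinct."""
    found = {}
    for part in taxon.split(";"):
        part = part.strip()
        if "__" in part:
            rank, name = part.split("__", 1)
            found[rank] = name

    labels = []
    for rank in RANKS:
        name = found.get(rank, "")
        if not name:
            # The classifier did not assign this rank at all.
            labels.append(unresolved)
        elif name.lower().startswith("uncultured"):
            # The rank was assigned, but the database has no real name for it.
            labels.append(unnamed)
        else:
            labels.append(name)
    return labels


# Illustrative lineage, not taken from the screenshots.
genus_only = ("d__Bacteria; p__Firmicutes; c__Clostridia; "
              "o__Oscillospirales; f__Ruminococcaceae; g__uncultured")

# First scheme: "unknown" for unnamed ranks, "NA" for missing resolution.
print(relabel(genus_only))
# Second scheme: "NA" for unnamed ranks, blank for missing resolution.
print(relabel(genus_only, unnamed="NA", unresolved=""))
```

With the first scheme, the genus column reads "unknown" and the species column reads "NA", so the table still shows that this sequence was resolved to genus but not to species.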