Missing unclassified taxa after fragment insertion.

Hi
I need some clarification on q2-fragment-insertion. I have 50 microbiome samples from five species (different primers). Initially, I analyzed each species' sequence files (same primer set) separately and classified them against the Greengenes reference. Then I merged the tables and sequences (different primers), ran fragment insertion, filtered the table with the insertion tree, and classified the sequences again with ref-gg-99-taxonomy.
My question: when I run the species individually, the taxonomy file shows a proportion of unclassified taxa. But after fragment insertion and classification with ref-gg-99-taxonomy, the unclassified sequences are missing.
Could you please help me with this?
Is this a mistake on my part in using ref-gg-99-taxonomy, or is it something else?

Sequences are unclassified because they do not resemble anything in the reference database. Sometimes this is due to human error and sometimes this could be because you have a genuinely novel organism! But in the majority of cases this is because the sequences are junk or non-target DNA (e.g., host DNA) and are left unclassified because they really just do not belong.
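
To put a number on how much is unclassified in each run, a minimal sketch along these lines could help. It assumes the taxonomy has been exported to a tab-separated file with `Feature ID` and `Taxon` columns and that unclassified features carry the label `Unassigned`; the function name and the sample rows are hypothetical, so adjust them to match your own export.

```python
import csv
import io


def unclassified_fraction(tsv_text, label="Unassigned"):
    """Return (n_unclassified, n_total, fraction) for a taxonomy TSV.

    Assumes a header row containing a 'Taxon' column and that
    unclassified features are labelled exactly `label`.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    total = 0
    unclassified = 0
    for row in reader:
        total += 1
        if row["Taxon"].strip() == label:
            unclassified += 1
    fraction = unclassified / total if total else 0.0
    return unclassified, total, fraction


# Made-up rows, for illustration only.
example = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; p__Firmicutes\t0.99\n"
    "f2\tUnassigned\t0.42\n"
    "f3\tk__Bacteria; p__Bacteroidetes\t0.97\n"
    "f4\tUnassigned\t0.35\n"
)

print(unclassified_fraction(example))
```

Running this on the taxonomy from each individual analysis and on the taxonomy produced after fragment insertion gives comparable counts, which makes it easier to see whether unclassified features actually disappeared. Note that it counts features, not reads, so it is not weighted by abundance.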

Similarly, fragment insertion will remove sequences that cannot be inserted into the reference tree. Such failures happen for the same reasons as unclassification.

So I would recommend just moving forward without these unclassified/un-inserted sequences.

@Nicholas_Bokulich Thank you
But I doubt that any sequences were removed after fragment insertion, so how come the unclassified sequences were removed? Maybe they got classified after the insertion step :roll_eyes:

None of the classifiers in q2-feature-classifier generate "random" results or give different answers each time you run them, unless you change the reference sequences or there is something wrong with the query sequences, e.g., they are in mixed orientations. This topic links to a few different discussions of taxonomy and unclassification issues; you could browse those if you think that variable classification is the issue.

Evidently something was removed — why do you doubt it? I recommend starting there to verify whether or not any sequences failed to insert with fragment-insertion.
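
One way to run that check, sketched under assumptions: compare the feature IDs in the merged table with the tip IDs in the insertion tree, then see how many of the features that are missing from the tree were unclassified in the earlier runs. The sketch assumes you have already exported those IDs and the taxonomy labels into plain Python collections; the function names, the IDs, and the `Unassigned` label are placeholders rather than anything taken from your data.

```python
def split_by_insertion(table_ids, tree_tip_ids):
    """Split table feature IDs into (inserted, not_inserted) sets.

    Tips that exist only in the tree (reference sequences) are ignored.
    """
    table_ids = set(table_ids)
    tips = set(tree_tip_ids)
    inserted = table_ids & tips
    not_inserted = table_ids - tips
    return inserted, not_inserted


def count_unclassified(taxonomy, feature_ids, label="Unassigned"):
    """Count how many of `feature_ids` carry the unclassified label."""
    return sum(1 for fid in feature_ids if taxonomy.get(fid) == label)


# Placeholder data, for illustration only.
table_ids = ["f1", "f2", "f3", "f4"]
tree_tip_ids = ["f1", "f3", "ref_0001", "ref_0002"]
taxonomy = {
    "f1": "k__Bacteria; p__Firmicutes",
    "f2": "Unassigned",
    "f3": "k__Bacteria; p__Bacteroidetes",
    "f4": "Unassigned",
}

inserted, not_inserted = split_by_insertion(table_ids, tree_tip_ids)
print("Not inserted:", sorted(not_inserted))
print("Unclassified among them:", count_unclassified(taxonomy, not_inserted))
```

If `not_inserted` comes back empty, filtering by the tree removed nothing, and the difference is more likely to come from the classification step. If it contains the previously unclassified features, that would be consistent with them being dropped at the insertion step.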
