I have some 16S data and ran into a problem while trying to build an insertion tree using the qiime fragment-insertion sepp function. Although I know that I have ~18000 unique ASVs, when I tried to build a tree from my representative sequences (filtered to the ASV level and trained on silva-132), I got an insertion tree that only has 196 taxa. My initial thought was that maybe my sequences are just incredibly different from what is found in the silva database, since I know the sepp step gets rid of sequences that are too divergent. But I compared this to a greengenes insertion tree generated by Qiita, and it only had 193 taxa. I’m confident that my sequences are not so novel that over 17,000 of them are less than 70% similar to the databases.
Hi! What is the source of your samples?
A lot of ASVs may be assigned to the same taxon. The ratio of ASVs to resulting taxa is typically higher than, for example, the ratio of 97% OTUs to taxa.
Hi @slgoldman,
Just to clarify a few things: what do you mean by "it only had 196 taxa", and where exactly are you getting this value from? Could you share the resulting artifact with us?
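If it helps to pin down that number, here is a rough sketch of one way to count tips in the insertion tree and see how many of your feature IDs actually appear among them. It assumes you have the tree as a Newick string with unquoted tip labels; the function names and the toy tree below are made up for illustration and are not part of QIIME 2.

```python
import re


def newick_tip_labels(newick: str) -> set[str]:
    """Collect tip labels from a Newick string.

    Assumption: labels are unquoted. A tip label starts right after "(" or ","
    and runs until the next ":", ",", ")", ";" or whitespace. Internal node
    labels follow ")" and are therefore not picked up.
    """
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def insertion_summary(newick: str, feature_ids) -> dict:
    """Compare the tips of a tree against a collection of feature (ASV) IDs."""
    tips = newick_tip_labels(newick)
    features = set(feature_ids)
    return {
        "tips_total": len(tips),                     # reference tips + inserted features
        "features_placed": len(features & tips),     # features found as tips
        "features_missing": sorted(features - tips)  # features absent from the tree
    }


# Toy example (hypothetical IDs): two reference tips and two inserted ASVs.
toy_tree = "((ref1:0.1,asv_a:0.2)n1:0.3,(ref2:0.1,asv_b:0.05)n2:0.2);"
print(insertion_summary(toy_tree, ["asv_a", "asv_b", "asv_c"]))
```

On a real run you would read the Newick text from the exported tree and take the feature IDs from your representative sequences. Note that an insertion tree also contains the reference tips, so the total tip count and the number of placed features are two different numbers.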
It might also be worth double-checking that your features don’t have any lingering non-biological sequence attached to them (such as leftover primers or adapters), as this could cause them to diverge enough from the references to be un-insertable.
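One quick way to look for this kind of carryover is to check whether any representative sequences still begin with the forward primer. The sketch below is only an illustration: the helper names are hypothetical, and the primer shown (a commonly used 515F sequence) is an assumption, since the primers used in this study were not stated. Swap in whatever primer was actually used.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(seq: str, primer: str) -> bool:
    """True if seq begins with primer, honouring degenerate IUPAC codes in the primer."""
    seq, primer = seq.upper(), primer.upper()
    if len(seq) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, primer))


def flag_primer_carryover(seqs: dict, primer: str) -> list:
    """Return the IDs of sequences that still start with the primer."""
    return sorted(fid for fid, s in seqs.items() if starts_with_primer(s, primer))


# Assumed primer for illustration only (515F); replace with your own.
PRIMER = "GTGYCAGCMGCCGCGGTAA"

# Hypothetical sequences: the first still carries the primer, the second does not.
example = {
    "asv_with_primer": "GTGCCAGCAGCCGCGGTAATACG",
    "asv_trimmed": "TACGGAGGGTGCAAGCGTTA",
}
print(flag_primer_carryover(example, PRIMER))
```

If many features are flagged, trimming the primers before denoising and re-running the insertion would be a reasonable next step. This only checks for an exact (degeneracy-aware) match at the 5' end, so it will miss primers with mismatches or primers sitting further into the read.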