Hi
I used the plugin q2-fragment-insertion with QIIME 2 version 2018.11.0.
By mistake, I used 1120 non-16S sequences from amplicon sequencing as input, yet all of them were inserted into the default reference tree (Greengenes 13_8 at 99%).
Here's the command I used:
qiime fragment-insertion sepp \
  --i-representative-sequences groEL.qza \
  --o-tree groEL_tree.qza \
  --o-placements groEL_placements.qza
Take one of the non-16S sequences I used as an example:
Zotu133
GAAGCTGCAGGAGCGTCTGGCCAAGCTGGCTGGCGGCGTGGCTGTCATCAAGGTCGGCGCTGCCACCGAGGTCGAGGCCAAGGAGCGCAAGCACCGCATCGAAGATGCCGTGCGTAACGCCAAGGCCGCCATCGAGGAAGGCCTGCTGCCTGGCGGTGGCGTGGCCCTCGTTCAGGCTGCTGCCAAGGCCGAGAAGACCGAGGCCGTCACCTCCCTGACCGGCGAAGAGGCTACCGGTGCCGCCATCGTGTTCCGCGCCATCGAGGCCCCGATCAAGCAGATCGCCGAGAACGCCGGCGTGTCCGGTGACGTGGTCATCAACACCGTCCGCTCCCTGCCTGATGGCGAAGGCTTCAACGCCGCCACCGACACCTACGAAGACCTGCTGGCCGCCGGTGTGACCGACCCGGTCAAGGTGACCCGCTCCGCTCTGCAGAACGCCGCCTCCATCGCTGGTCT
BLAST shows that it is the groEL gene of Bifidobacterium longum, but it was inserted into the tree close to the following two Greengenes representative sequences:
4345221 k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Sphingomonadales; f__Sphingomonadaceae; g__Kaistobacter; s__
806208 k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Sphingomonadales; f__Sphingomonadaceae; g__Kaistobacter; s__
When I aligned this non-16S sequence to the two Greengenes representative sequences above, I got only 2% query cover (i.e. only 11 bp of the 450 bp query aligned).
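To spell out where that figure comes from: query cover is the aligned length divided by the query length, expressed as a percentage. Here is a minimal sketch in Python. The function name query_cover is my own and is not part of BLAST, SEPP, or QIIME 2; the numbers are the ones from my alignment.

```python
def query_cover(aligned_bp: int, query_len: int) -> float:
    """Return the percentage of the query covered by the alignment.

    Hypothetical helper for illustration only; not a BLAST or QIIME 2 API.
    """
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    return 100.0 * aligned_bp / query_len


# Numbers from my alignment: 11 bp aligned out of a 450 bp query.
print(f"{query_cover(11, 450):.1f}%")  # prints 2.4%
```

That works out to about 2.4%, which matches the roughly 2% query cover reported for the alignment.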
How did this happen? I thought SEPP would reject such non-16S sequences rather than insert them.
If SEPP does insert fragments that are only very remotely related to everything in the reference phylogeny, are there any suggestions on how to use SEPP more reliably?
Thanks!