Non-16S seqs inserted into gg13_8 ref tree by SEPP

Hi

I used the plugin q2-fragment-insertion with QIIME 2 version 2018.11.0.

By mistake, I used 1120 non-16S seqs from amplicon sequencing as input, but they were all inserted into the default reference tree (Greengenes 13_8 at 99%).

Here’s the command I used:

qiime fragment-insertion sepp \
  --i-representative-sequences groEL.qza \
  --o-tree groEL_tree.qza \
  --o-placements groEL_placements.qza

Take one of the non-16S seqs I used as an example.

Zotu133
GAAGCTGCAGGAGCGTCTGGCCAAGCTGGCTGGCGGCGTGGCTGTCATCAAGGTCGGCGCTGCCACCGAGGTCGAGGCCAAGGAGCGCAAGCACCGCATCGAAGATGCCGTGCGTAACGCCAAGGCCGCCATCGAGGAAGGCCTGCTGCCTGGCGGTGGCGTGGCCCTCGTTCAGGCTGCTGCCAAGGCCGAGAAGACCGAGGCCGTCACCTCCCTGACCGGCGAAGAGGCTACCGGTGCCGCCATCGTGTTCCGCGCCATCGAGGCCCCGATCAAGCAGATCGCCGAGAACGCCGGCGTGTCCGGTGACGTGGTCATCAACACCGTCCGCTCCCTGCCTGATGGCGAAGGCTTCAACGCCGCCACCGACACCTACGAAGACCTGCTGGCCGCCGGTGTGACCGACCCGGTCAAGGTGACCCGCTCCGCTCTGCAGAACGCCGCCTCCATCGCTGGTCT

BLAST shows that it is the groEL gene of Bifidobacterium longum, but it was inserted into the tree, close to the following two gg rep seqs:

4345221 k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Sphingomonadales; f__Sphingomonadaceae; g__Kaistobacter; s__

806208 k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Sphingomonadales; f__Sphingomonadaceae; g__Kaistobacter; s__

Aligning this non-16S seq to the two gg rep seqs above, I got only 2% query cover (i.e. 11 bp out of 450 bp aligned).

How did this happen? I thought SEPP would reject the insertion of such non-16S seqs.
If SEPP does insert some fragments that are very remotely related to everything in the reference phylogeny, any suggestions on how to use SEPP in a better way?

Thanks!


pinging @Stefan

Let me try to reproduce your observation with your example sequence. However, Siavash (the main developer of SEPP) is probably the more knowledgeable person to contact, via https://github.com/smirarab/sepp, if this really is a misconfiguration of SEPP itself.

That said, SEPP operates under the assumption that every input sequence has to be placed somewhere in the tree, and it “only” tries to quickly determine the best fit. Using it to filter non-16S sequences out of your input is something of an abuse of the tool. There are certainly better-suited programs for that.

In my hands, too, SEPP produces exactly the wrong insertion you observed. I opened an issue on GitHub to bring it to Siavash’s attention.
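
Since SEPP places whatever it is given, one option is to screen the input before insertion. The following is only a rough sketch of that idea, not part of SEPP or q2-fragment-insertion: it assumes you have already aligned each query against a 16S reference with an external tool (BLAST, for example) and recorded how many query bases aligned. The `Hit` record, the `split_by_coverage` helper, the second example record, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Best alignment of one query against a 16S reference (hypothetical record)."""
    query_id: str
    query_length: int    # total query length in bp
    aligned_length: int  # query bases covered by the best alignment


def query_coverage(hit: Hit) -> float:
    """Return the fraction of the query covered by its best alignment."""
    if hit.query_length <= 0:
        raise ValueError(f"{hit.query_id}: query length must be positive")
    return hit.aligned_length / hit.query_length


def split_by_coverage(
    hits: Iterable[Hit], min_coverage: float = 0.8  # threshold is an assumption
) -> Tuple[List[str], List[str]]:
    """Split query IDs into (keep, reject) by query coverage."""
    keep: List[str] = []
    reject: List[str] = []
    for hit in hits:
        if query_coverage(hit) >= min_coverage:
            keep.append(hit.query_id)
        else:
            reject.append(hit.query_id)
    return keep, reject


# Zotu133 uses the numbers reported above (11 of 450 bp aligned);
# the second record is made up purely to show a passing case.
hits = [
    Hit("Zotu133", 450, 11),
    Hit("hypothetical_16S", 253, 250),
]
keep, reject = split_by_coverage(hits)
print("keep:", keep)
print("reject:", reject)
```

Only the IDs in `keep` would then be passed on to SEPP. A sensible threshold depends on your amplicon and reference, so treat 0.8 as a placeholder to tune.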
