Very long read (16S--ITS--23S) Taxonomy Assignment

vrbana · March 8, 2022, 6:14pm

Hello, I have long-read amplicon data (average length 2,352 bp) that I've denoised with DADA2 in R. I know there are many other Q2 forum posts about PacBio data, but I haven't seen one for an amplicon this size that uses the 27F forward primer with a reverse primer located in the 23S gene (so the amplicon spans the entire ITS). From what I've seen in other forum posts, the full-length SILVA approach uses the 27F and 1492R primers. Because I trimmed my primers before denoising, qiime feature-classifier extract-reads does not work with the 27F/1492R pair (the 27F site is no longer in my reads). I can figure out a way outside of Q2 to trim my rep seqs at the 1492R site only, but I'm wondering whether there is a better suggested approach for assigning taxonomy to these reads, since trimming to the 16S gene throws away the ITS and 23S information.
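For readers with the same problem, the "trim outside of Q2" step could look roughly like the minimal Python sketch below. It assumes the rep seqs have been exported to FASTA and are oriented 5'→3' with 16S first, so the 1492R binding site appears as the reverse complement of the primer. The primer string GGTTACCTTGTTACGACTT is one commonly cited 1492R variant and is only a placeholder; substitute the exact primer you used. IUPAC codes are supported, but matching is exact (no mismatches), which is a simplification compared with real primer-trimming tools. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate IUPAC codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of an (optionally degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a regex that matches the primer, expanding IUPAC codes."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_at_reverse_primer(seq, reverse_primer, keep_primer=False):
    """Truncate seq at the first binding site of reverse_primer.

    The reverse primer binds the opposite strand, so we search the read for
    its reverse complement. Returns None when the site is not found.
    """
    match = primer_regex(revcomp(reverse_primer)).search(seq.upper())
    if match is None:
        return None
    return seq[: match.end() if keep_primer else match.start()]


def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


PRIMER_1492R = "GGTTACCTTGTTACGACTT"  # placeholder; use your exact primer

if __name__ == "__main__":
    demo = [">asv1", "ACGTACGTAAGTCGTAACAAGGTAACCGGGG", ">asv2", "TTTTTTTT"]
    for sid, s in read_fasta(demo):
        print(sid, trim_at_reverse_primer(s, PRIMER_1492R))
```

Reads where the site is not found come back as None; it is probably worth inspecting those rather than silently dropping them, since they may carry a mismatch in the primer region.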

Also, if anyone can suggest a tree-building method that is well suited to this read length, it would be much appreciated! Thank you!

Keegan-Evans · March 14, 2022, 11:42pm

@vrbana,

There has been some behind-the-scenes discussion about your question. The short of it is that something may be in the works, but at the moment we don't have a great answer. Hopefully someone more knowledgeable about this will hop on soon and fill you in.

Nicholas_Bokulich · March 15, 2022, 6:30am

Hi @vrbana ,

Solution: The RESCRIPt plugin has a trim-alignment action that can trim sequences to a position in an alignment, including where a primer aligns to a set of aligned sequences (i.e., the primer only needs to align to one sequence in that alignment to find the position).
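The idea behind trimming by alignment position can be illustrated with the simplified Python sketch below. It is not RESCRIPt's implementation: RESCRIPt aligns the primers to the alignment itself, whereas this sketch substitutes a plain exact string search in a single "anchor" sequence. Once the primer site is located in that one sequence, its ungapped position is mapped to an alignment column, and every sequence in the alignment is cut at that column and degapped. The alignment is assumed to be a dict of ID to gapped sequence (gaps as "-" or "."), the site is given in the same orientation as the aligned sequences, and all names are hypothetical.

```python
import re

GAPS = "-."


def column_of_residue(aligned_seq, residue_index):
    """Map a 0-based position in the ungapped sequence to its alignment column."""
    count = -1
    for col, ch in enumerate(aligned_seq):
        if ch not in GAPS:
            count += 1
            if count == residue_index:
                return col
    raise IndexError("residue_index is beyond the sequence length")


def degap(seq):
    """Remove alignment gap characters."""
    return re.sub(r"[-.]", "", seq)


def trim_alignment_at_site(alignment, anchor_id, site, keep="before"):
    """Cut every aligned sequence at the column where `site` occurs in the anchor.

    keep="before" keeps the columns upstream of the site (e.g. 16S up to a
    reverse-primer site); keep="after" keeps the columns downstream of it
    (e.g. everything after a forward-primer site).
    """
    anchor = alignment[anchor_id]
    start = degap(anchor).upper().find(site.upper())
    if start == -1:
        raise ValueError("site not found in the anchor sequence")
    if keep == "before":
        col = column_of_residue(anchor, start)
        return {sid: degap(s[:col]) for sid, s in alignment.items()}
    end_col = column_of_residue(anchor, start + len(site) - 1) + 1
    return {sid: degap(s[end_col:]) for sid, s in alignment.items()}


if __name__ == "__main__":
    aln = {"a": "AC-GTAAGTC--GTAACC", "b": "ACTGT-AGTCAAGTAACC"}
    print(trim_alignment_at_site(aln, "a", "AAGTCGTA"))
```

This is why the primer only needs to be found in one sequence: the alignment carries that position over to every other sequence, including ones whose primer region differs.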

Sure, you can use a reference database consisting of full rrn operon sequences. We are working on an easy way to make such databases in QIIME 2, but this is not ready yet. For now, I think a few exist out there in the wild if you don't want to build your own.

First, this paper describes a bacterial rrn database that comes with QIIME 2-compatible files, and the use of this database with QIIME 2 (e.g., taxonomic classification with q2-feature-classifier). It looks like the paper uses it for bacterial ITS classification, but as far as I can tell the database itself is full-length rrn:

There are also a few papers that describe other full rrn databases, but I have not used or checked these, so I do not know whether they are compatible with QIIME 2 out of the box:

vrbana · March 15, 2022, 6:08pm

Thanks! I did successfully assign taxonomy using the full-length SILVA classifier and trimming reads outside of QIIME 2. I'll look into those other databases.

system · April 16, 2022, 12:08am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.