How Do I Recover 18S Reads after Filtering in QIIME 2?

AlfalfaResearcher · May 17, 2018, 9:27pm

Hello Everyone,
I am new to QIIME and to this forum, so I hope I am posting this in the right place. My project uses two different primer sets to amplify two different regions of the 16S gene. One primer set comes from Jed Fuhrman's paper "Every Base Matters"; it amplifies the 515 to 926 region of the 16S gene and is degenerate. The other primer set is from Illumina; it amplifies a different region of the 16S gene and is also slightly degenerate. The Fuhrman primers are more universal and amplify both the 16S and 18S genes. I hope to find a way to filter out the fungal 18S reads and analyze them separately from the bacterial 16S reads. Is there a way to make two separate taxonomies, one containing only fungi and the other only bacteria? The samples were run on an Illumina MiSeq, and the reads are demultiplexed and paired-end. Thanks so much!

Nicholas_Bokulich · May 18, 2018, 3:36pm

Welcome! You have definitely come to the right place.

You have a couple of different options, depending on whether you want to separate the reads before or after taxonomy classification:

1. Analyze everything together through taxonomy classification. Use the SILVA 16S + 18S database for classification (we have a pre-trained classifier for full-length SSU with this database, or you can train your own with your primers). Afterwards, you can split your feature table and sequences into separate tables/sequences for bacteria/archaea and for eukaryotes, or whatever taxonomic groups you are interested in, using `qiime taxa filter-table` (and `qiime taxa filter-seqs` for the sequences).
2. After denoising/OTU picking but prior to downstream steps (e.g., taxonomy classification), use `qiime quality-control exclude-seqs` to split your sequences into two sets based on alignment to a reference sequence set (e.g., 18S sequences) within a certain percent similarity. I do not know what % similarity would be suitable for differentiating 16S and 18S sequences. If you know of a non-degenerate form of your primers (and the primer sequences are included in your reads), or an internal sequence within your amplicon that easily differentiates 16S from 18S, you could use exclude-seqs with the `blastn-short` method, which is optimized for searching very short sequence alignments, and set a very high % similarity to differentiate these groups.
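To make "within a certain percent similarity" concrete, here is a minimal Python sketch of splitting sequences into hits and misses against a reference set. It is a toy stand-in, not how exclude-seqs works internally: exclude-seqs hands the search to BLAST or VSEARCH (local alignment with its own identity and coverage settings), whereas this sketch uses a plain Needleman-Wunsch global alignment. The function names, scoring scheme, sequences, and the 97% threshold are all hypothetical; as noted above, a suitable threshold for separating 16S from 18S is not known.

```python
from typing import Dict, List, Tuple

# Hypothetical scoring scheme for a simple Needleman-Wunsch global alignment.
MATCH, MISMATCH, GAP = 1, -1, -1


def global_identity(a: str, b: str) -> float:
    """Percent of columns that are identical in one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)

    # Trace back from the bottom-right corner, counting identical columns.
    i, j, same, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            MATCH if a[i - 1] == b[j - 1] else MISMATCH
        ):
            if a[i - 1] == b[j - 1]:
                same += 1
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * same / columns if columns else 0.0


def split_by_reference(
    queries: Dict[str, str], references: List[str], perc_identity: float
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split queries into (hits, misses) by best % identity to any reference."""
    hits: Dict[str, str] = {}
    misses: Dict[str, str] = {}
    for seq_id, seq in queries.items():
        best = max((global_identity(seq, ref) for ref in references), default=0.0)
        if best >= perc_identity:
            hits[seq_id] = seq
        else:
            misses[seq_id] = seq
    return hits, misses


if __name__ == "__main__":
    refs_18s = ["ACGTACGTTTGACCA"]  # hypothetical 18S reference fragment
    asvs = {
        "asv1": "ACGTACGTTTGACCA",   # identical to the reference
        "asv2": "TTTTGGGGCCCCAAAA",  # unrelated sequence
    }
    hits, misses = split_by_reference(asvs, refs_18s, perc_identity=97.0)
    print(sorted(hits), sorted(misses))  # ['asv1'] ['asv2']
```

This only illustrates the decision rule (best identity to any reference at or above a threshold means "hit"); an exhaustive pairwise alignment like this would be far too slow against a real reference database, which is why the actual tool relies on BLAST or VSEARCH.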

That should do it! I personally might go the former route if there are no concerns about the reference taxonomy, as it would be a little more streamlined and would explicitly identify sequences that cannot be classified as either 16S or 18S. You could always use that approach to split the sequences/tables, and then reclassify those sequences with a different reference database or taxonomy classifier if for some reason you suspect the 16S + 18S classifier would give inadequate results (I have no evidence that it would), or if you have a favorite database that you want to use.
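The taxonomy-based split (the first option) boils down to matching each feature's taxonomy string against search terms and subsetting the feature table. The plain-Python sketch below illustrates only that logic; it does not call QIIME 2, and the function names, feature IDs, counts, and SILVA-style taxonomy strings are hypothetical. The case-insensitive substring match is a choice made for this sketch. Features that match neither group end up in a leftover set, which is where sequences that cannot be classified as 16S or 18S would show up.

```python
from typing import Dict, List, Set

# sample ID -> {feature ID -> count}
Table = Dict[str, Dict[str, int]]


def features_matching(taxonomy: Dict[str, str], terms: List[str]) -> Set[str]:
    """IDs of features whose taxonomy string contains any search term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return {fid for fid, tax in taxonomy.items() if any(t in tax.lower() for t in lowered)}


def subset_table(table: Table, keep: Set[str]) -> Table:
    """Keep only features in `keep`; drop samples that end up with no counts."""
    out: Table = {}
    for sample, counts in table.items():
        kept = {fid: n for fid, n in counts.items() if fid in keep and n > 0}
        if kept:
            out[sample] = kept
    return out


if __name__ == "__main__":
    # Hypothetical SILVA-style taxonomy assignments for four features.
    taxonomy = {
        "f1": "D_0__Bacteria;D_1__Proteobacteria",
        "f2": "D_0__Eukaryota;D_1__Opisthokonta;D_2__Nucletmycea;D_3__Fungi",
        "f3": "D_0__Archaea;D_1__Thaumarchaeota",
        "f4": "Unassigned",
    }
    table = {"s1": {"f1": 10, "f2": 5, "f4": 1}, "s2": {"f3": 7}}

    prok_ids = features_matching(taxonomy, ["D_0__Bacteria", "D_0__Archaea"])
    euk_ids = features_matching(taxonomy, ["D_0__Eukaryota"])
    print(subset_table(table, prok_ids))  # {'s1': {'f1': 10}, 's2': {'f3': 7}}
    print(subset_table(table, euk_ids))   # {'s1': {'f2': 5}}
    # Features assigned to neither group (e.g., "Unassigned").
    print(sorted(set(taxonomy) - prok_ids - euk_ids))  # ['f4']
```

In QIIME 2 itself, the equivalent is to run the taxa filtering actions once per group on the classified data; the sketch just shows why one classification pass is enough to produce both splits plus the leftovers.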

I hope that helps!

AlfalfaResearcher · May 18, 2018, 6:53pm

Thank you so much @Nicholas_Bokulich! I will try out the first method and let you know how it goes.

Update: The first method worked really well! Thank you so much!