Weird Taxonomic classification 28S rRNA

SoilRotifer · February 12, 2021, 4:23pm

I can't really say, it is largely dependent on the sequence lengths present within the SILVA database. You can also simply filter everything to the same length too via filter-seqs-length. Also, one option is to not filter based on length at all until after you've extracted your amplicon region first. That is, you may remove short "full-length" sequences which actually contain your complete amplicon region. We allude to this in the tutorial:

Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt

qiime feature-classifier extract-reads \
    --i-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --p-f-primer GTGYCAGCMGCCGCGGTAA \
    --p-r-primer GGACTACNVGGGTWTCTAAT \
    --p-n-jobs 2 \
    --p-read-orientation 'forward' \
    --o-reads silva-138-ssu-nr99-seqs-515f-806r.qza
Note / Tip : Depending on your goals, it may also be reasonable to use the raw imported sequences, or output from either cull-seqs or filter-seqs-length[-by-taxon] as input into the above extract-reads command. Be aware that you may need to run reverse-transcribe prior to other steps outlined in this tutorial.

As for your next question...

The SILVA 138 LSU was not available at the time we developed this code. However, I see that SILVA now has provided LSU 138 data for us to pull down. We'll work on updating get-silva-data. Note: this pipeline simply wraps all of the steps outlined in the 'hard way' approach. The steps outline under the "hard way", is essentially our "escape hatch" for cases like this.

It seems like the have two folders: 138_1 and 138.1.
I think you'd want to go to the 138.1 folder and download the following:

These match the file naming schema outlined in the tutorial.

Hopefully, there are no formatting issues with these SILVA 138.1 LSU files. Please let us know how it goes.

-Mike