I can't really say; it largely depends on the sequence lengths present within the SILVA database. You can also simply filter everything by the same length threshold via filter-seqs-length. Another option is to not filter based on length at all until after you've extracted your amplicon region. Otherwise, you may remove short "full-length" sequences that actually contain your complete amplicon region. We allude to this in the tutorial.
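To make the ordering issue concrete, here is a small toy sketch in plain Python. It is not RESCRIPt code: the sequences, the primer sites, the 30 bp cutoff, and the helper extract_amplicon are all made up for illustration. It only shows how filtering by length first can drop a short reference that still contains the whole amplicon.

```python
# Toy illustration only: sequences, primer sites, and the cutoff are hypothetical.

def extract_amplicon(seq, fwd, rev_site):
    """Return the region between the forward primer and the reverse primer
    site (primers excluded), or None if either site is missing."""
    start = seq.find(fwd)
    if start == -1:
        return None
    start += len(fwd)
    end = seq.find(rev_site, start)
    if end == -1:
        return None
    return seq[start:end]

FWD, REV_SITE = "ACGT", "TTGA"
MIN_FULL_LENGTH = 30  # hypothetical cutoff for "full-length" references

references = {
    # 40 bp: passes the length cutoff and contains the amplicon
    "long_ref": "GGGGGGGGGG" + "ACGT" + "CCCCCCAAAAAA" + "TTGA" + "GGGGGGGGGG",
    # 24 bp: fails the length cutoff but still contains the same amplicon
    "short_ref": "GG" + "ACGT" + "CCCCCCAAAAAA" + "TTGA" + "GG",
}

# Order A: filter by length first, then extract the amplicon.
length_first = {
    name: extract_amplicon(seq, FWD, REV_SITE)
    for name, seq in references.items()
    if len(seq) >= MIN_FULL_LENGTH
}

# Order B: extract the amplicon first, keep every reference that yields one.
extract_first = {}
for name, seq in references.items():
    amplicon = extract_amplicon(seq, FWD, REV_SITE)
    if amplicon is not None:
        extract_first[name] = amplicon

print(sorted(length_first))
print(sorted(extract_first))
```

In this toy case, order A keeps only long_ref, while order B keeps both references, since short_ref carries the complete amplicon despite being short overall.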
As for your next question...
The SILVA 138 LSU data was not available at the time we developed this code. However, I see that SILVA has now provided LSU 138 data for us to pull down. We'll work on updating get-silva-data. Note: this pipeline simply wraps all of the steps outlined in the 'hard way' approach. The steps outlined under the "hard way" are essentially our "escape hatch" for cases like this.
It seems like they have two folders: 138_1 and 138.1.
I think you'd want to go to the 138.1 folder and download the following:
- taxonomy/tax_slv_lsu_138.1.txt.gz
- taxonomy/taxmap_slv_lsu_ref_nr_138.1.txt.gz
- taxonomy/tax_slv_lsu_138.1.tre.gz
- SILVA_138.1_LSURef_NR99_tax_silva_trunc.fasta.gz
These match the file naming schema outlined in the tutorial.
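If you'd rather script the download than click through the folder, something like the following Python sketch might help. The four file names are the ones listed above; everything else is an assumption for illustration: the helpers build_urls and download_all are hypothetical, and the base URL of the 138.1 folder is left as a parameter because its exact address isn't given here.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

# File names as listed above, relative to the 138.1 folder.
LSU_138_1_FILES = [
    "taxonomy/tax_slv_lsu_138.1.txt.gz",
    "taxonomy/taxmap_slv_lsu_ref_nr_138.1.txt.gz",
    "taxonomy/tax_slv_lsu_138.1.tre.gz",
    "SILVA_138.1_LSURef_NR99_tax_silva_trunc.fasta.gz",
]


def build_urls(base_url):
    """Join the (assumed) base URL of the 138.1 folder with each file name."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, name) for name in LSU_138_1_FILES]


def download_all(base_url, out_dir="silva_138.1_lsu"):
    """Download every file into out_dir and return the local paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in build_urls(base_url):
        target = out / url.rsplit("/", 1)[-1]  # keep only the file name locally
        urlretrieve(url, target)
        paths.append(target)
    return paths
```

You would call download_all with the address of the 138.1 folder as it appears in your browser. The downloaded files can then be fed into the "hard way" steps from the tutorial.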
Hopefully, there are no formatting issues with these SILVA 138.1 LSU files. Please let us know how it goes.
-Mike