Taxonomy assignment using Silva 138

SoilRotifer · March 30, 2023, 7:33pm

Potentially. After extracting your amplicon region did you dereplicate the data as outlined here? If not, this might be partly responsible the excessive memory despite using the extracted amplicon region. That is, many reference sequences will become identical over the smaller span of the amplicon region. Dereplication will remove any redundant sequences with identical taxonomy, which can be substantial. You can also perform other optional filtering of your reference data too.

Alternatively, you may want to consider using feature-classifier classify-consensus-vsearch, just simply use the base sequence and taxonomy files that you'd train your classifier with, as your reference data. Otherwise you may need to find a machine with more RAM.