Creating a V3 classifier: process killed

Hello Mike Robeson, I am getting this error message while trying to use RESCRIPt to create a V3 classifier.

qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-138-ssu-nr99-seqs-388f-518r-uniq.qza \
    --i-reference-taxonomy silva-138-ssu-nr99-tax-388f-518r-derep-uniq.qza \
    --o-classifier silva-138-ssu-nr99-388f-518r-classifier.qza

Killed

Could this be related to my computing power?

Yes. It is likely that you do not have enough memory on your system. In my experience it is not unusual to require anywhere from 24 to 64 GB of RAM. If you do not have access to a machine with more memory, you can try the alternative classification methods available through the feature-classifier plugin.
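
If you want to confirm how much memory the machine has before retrying, a quick check along these lines may help. This is only a minimal sketch: it assumes a Linux system that exposes /proc/meminfo, and the function name total_ram_gb is made up for illustration.

```shell
# Report total RAM in whole GB from a /proc/meminfo-style file.
# Assumption: Linux only; the default path does not exist on macOS.
total_ram_gb() {
    # MemTotal is reported in kB, so divide twice by 1024 to get GB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Only print a value when the file is actually readable on this system.
if [ -r /proc/meminfo ]; then
    echo "Total RAM: $(total_ram_gb) GB"
fi
```

If the number reported is well below the 24 GB mentioned above, running out of memory is a plausible explanation for the "Killed" message.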

You can shrink the memory footprint a couple of ways:

  1. Remove / Filter taxonomic groups you are not interested in (e.g. Eukaryota) prior to making your classifier.
    • You can use qiime taxa filter-seqs for this.
  2. Do not include the organism name as the species label (i.e. do not use the --p-include-species-labels flag when constructing your SILVA reference database). These labels may not be reliable anyway, as SILVA only curates down to genus level.

Thank you for your quick suggestions.

Regarding the removal of Eukaryota and also Archaea:
Can I use the filter-seqs-length-by-taxon action?

For example:
qiime rescript filter-seqs-length-by-taxon \
    --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
    --i-taxonomy silva-138-ssu-nr99-tax.qza \
    --p-labels Bacteria \
    --p-min-lens 1200 \
    --o-filtered-seqs silva-138-ssu-nr99-seqs-filt.qza \
    --o-discarded-seqs silva-138-ssu-nr99-seqs-discard.qza

This action is intended to filter reference sequences based on the length of each sequence given the taxonomy provided, and to leave the rest of the data as is. That is, your command would only filter Bacteria based on the length you specified; all other data would remain in your final output file. This is why I recommended qiime taxa filter-seqs.

If you would like to use qiime rescript filter-seqs-length-by-taxon, you could use the following command, which simply sets impossibly high minimum sequence lengths for the other taxonomic groups. However, this would be much slower than the command I recommended, as it has to check the taxonomy of each entry and then check the sequence length before deciding whether to filter.

qiime rescript filter-seqs-length-by-taxon \
    --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
    --i-taxonomy silva-138-ssu-nr99-tax.qza \
    --p-labels Bacteria Archaea Eukaryota \
    --p-min-lens 1200 9999 9999 \
    --o-filtered-seqs silva-138-ssu-nr99-seqs-filt.qza \
    --o-discarded-seqs silva-138-ssu-nr99-seqs-discard.qza
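
To make the behaviour concrete, here is a toy illustration of the decision rule described above. It is not RESCRIPt's code: the function name length_filter, the tab-separated "id, domain, length" table, and the sequence IDs are all made up, and the rule is only my reading of how labels and minimum lengths pair up.

```shell
# Toy model of per-taxon length filtering (NOT RESCRIPt's implementation).
# Reads "id<TAB>domain<TAB>length" on stdin.
# $1 = comma-separated labels, $2 = comma-separated minimum lengths.
length_filter() {
    awk -F '\t' -v labels="$1" -v mins="$2" '
        BEGIN {
            n = split(labels, l, ",")
            split(mins, m, ",")
            # Pair each label with its minimum length, forced to a number.
            for (i = 1; i <= n; i++) min[l[i]] = m[i] + 0
        }
        {
            # Only records whose domain has a label are length-checked;
            # everything else passes through untouched.
            if (($2 in min) && ($3 + 0 < min[$2])) print "discard\t" $1
            else print "keep\t" $1
        }'
}

# With only Bacteria labelled, the short archaeal record a1 is still kept.
printf 'b1\tBacteria\t1500\nb2\tBacteria\t800\na1\tArchaea\t700\n' |
    length_filter Bacteria 1200
```

Calling it with labels Bacteria,Archaea and minimum lengths 1200,9999 would also discard a1, which is the effect the "impossibly high" lengths in the command above rely on.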

Thanks again for your comment.

I would like to use the qiime taxa filter-seqs command; however, I am not too familiar with it.

Could you please provide me with some tips?

Also, at which step should I run the qiime taxa filter-seqs command?

  1. qiime rescript parse-silva-taxonomy \
    --i-taxonomy-tree taxtree-silva-138-nr99.qza \
    --i-taxonomy-map taxmap-silva-138-ssu-nr99.qza \
    --i-taxonomy-ranks taxranks-silva-138-ssu-nr99.qza \
    --o-taxonomy silva-138-ssu-nr99-tax.qza

  2. qiime rescript cull-seqs \
    --i-sequences silva-138-ssu-nr99-seqs.qza \
    --o-clean-sequences silva-138-ssu-nr99-seqs-cleaned.qza

  3. qiime rescript filter-seqs-length-by-taxon \
    --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
    --i-taxonomy silva-138-ssu-nr99-tax.qza \
    --p-labels Archaea Bacteria Eukaryota \
    --p-min-lens 900 1200 1400 \
    --o-filtered-seqs silva-138-ssu-nr99-seqs-filt.qza \
    --o-discarded-seqs silva-138-ssu-nr99-seqs-discard.qza

  4. qiime rescript dereplicate \
    --i-sequences silva-138-ssu-nr99-seqs-filt.qza \
    --i-taxa silva-138-ssu-nr99-tax.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --o-dereplicated-taxa silva-138-ssu-nr99-tax-derep-uniq.qza

  5. qiime feature-classifier extract-reads \
    --i-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --p-f-primer ACWCCTACGGGWGGCAGCAG \
    --p-r-primer ATTACCGCGGCTGCTGG \
    --p-n-jobs 2 \
    --p-read-orientation 'forward' \
    --o-reads silva-138-ssu-nr99-seqs-388f-518r.qza

  6. qiime rescript dereplicate \
    --i-sequences silva-138-ssu-nr99-seqs-388f-518r.qza \
    --i-taxa silva-138-ssu-nr99-tax-derep-uniq.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-138-ssu-nr99-seqs-388f-518r-uniq.qza \
    --o-dereplicated-taxa silva-138-ssu-nr99-tax-388f-518r-derep-uniq.qza

  7. qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-138-ssu-nr99-seqs-388f-518r-uniq.qza \
    --i-reference-taxonomy silva-138-ssu-nr99-tax-388f-518r-derep-uniq.qza \
    --o-classifier silva-138-ssu-nr99-388f-518r-classifier.qza

The easiest approach, as I suggested earlier, would be to do this just prior to making your classifier. This way you do not have to re-run several steps. So, do this between steps 6 and 7.
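
As a rough sketch, the filtering step between steps 6 and 7 might look like the following. A few assumptions: the taxonomy strings carry the d__ domain prefix that RESCRIPt's SILVA parsing normally produces, the wrapper function name drop_euks_and_archaea is made up, and so is the output filename in the example call. Please check qiime taxa filter-seqs --help against your installed version before relying on it.

```shell
# Wrapper around qiime taxa filter-seqs that removes Eukaryota and Archaea.
# Assumption: domains appear as d__Eukaryota / d__Archaea in the taxonomy.
# $1 = sequences .qza, $2 = taxonomy .qza, $3 = output .qza
drop_euks_and_archaea() {
    qiime taxa filter-seqs \
        --i-sequences "$1" \
        --i-taxonomy "$2" \
        --p-exclude d__Eukaryota,d__Archaea \
        --o-filtered-sequences "$3"
}

# Example call using the step 6 outputs (the output name is hypothetical):
# drop_euks_and_archaea \
#     silva-138-ssu-nr99-seqs-388f-518r-uniq.qza \
#     silva-138-ssu-nr99-tax-388f-518r-derep-uniq.qza \
#     silva-138-ssu-nr99-seqs-388f-518r-uniq-bacteria.qza
```

The filtered sequences would then be passed to step 7 as --i-reference-reads. I would expect the step 6 taxonomy file to still be usable as --i-reference-taxonomy, but that is worth verifying on your data. I used the d__ prefix rather than a bare domain name to reduce the chance of the search term matching part of another taxon's name.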

Additionally, you can check out the QIIME 2 documentation, and work through the tutorials.

You can view a list of all available QIIME 2 plugins and actions. Here is the information for the qiime taxa filter-seqs command. Do not forget you can add --help to any of the qiime commands to read the help text.

Thank you very much for your suggestions. I got it to work.
