Naive Bayes classifiers trained on the V3-V4 region for 16S and 18S data for the newest version of QIIME 2 (qiime2-2020.11)

I am looking for help finding or creating a Naive Bayes classifier trained on the V3-V4 region for 16S and 18S data for the newest version of QIIME 2 (qiime2-2020.11). I would greatly appreciate your help! Thank you in advance.

Hello @Angela1971,

There is a tutorial for training classifiers, although perhaps you have discovered that already.
https://docs.qiime2.org/2020.11/tutorials/feature-classifier/

Let us know if you have any questions related to that tutorial, or if any of the steps do not work well for your database.

Colin

Thank you, Colin! I tried to create and train my own classifier using the SILVA 132 database, but I am suspicious of the result: silva132_V3_V4_2020_11_classifier.qza (5.3 MB). Overall, it is a very small classifier. A classifier is usually around 100 MB; this one is just 5.4 MB.

Yeah, that does seem a bit small compared to the classifiers provided by the QIIME 2 devs… (and the files there use SILVA v138, which might be useful!)

Would you be willing to post all the commands you ran to build that classifier?

Out of all the files SILVA provides for v132, which one did you use?

Actually, I downloaded the SILVA database from the SILVA 132 resources for QIIME:


I then ran the following commands:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path silva_132_99_16S.fna \
  --output-path silva_132_99_16S.qza

Imported silva_132_99_16S.fna as DNASequencesDirectoryFormat to silva_132_99_16S.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path taxonomy_all_levels.txt \
  --output-path 16S-ref-taxonomy.qza

Imported taxonomy_all_levels.txt as HeaderlessTSVTaxonomyFormat to 16S-ref-taxonomy.qza

qiime feature-classifier extract-reads \
  --i-sequences silva_132_99_16S.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-trunc-len 120 \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads 16S-ref-seqs.qza

Saved FeatureData[Sequence] to: 16S-ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads 16S-ref-seqs.qza \
  --i-reference-taxonomy 16S-ref-taxonomy.qza \
  --o-classifier silva132_V3_V4_2020_11_classifier.qza

Saved TaxonomicClassifier to: silva132_V3_V4_2020_11_classifier.qza
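
One thing worth checking when a classifier comes out this small is how many reference sequences survived extract-reads and how long they are, since the truncation and length filters in that command (--p-trunc-len 120, --p-min-length 100, --p-max-length 400) may have shortened or discarded many amplicons. That is an assumption about the cause, not something confirmed in this thread. A minimal sketch of such a check, where fasta_summary is a hypothetical helper and exported-ref-seqs is a hypothetical output directory:

```shell
#!/usr/bin/env bash
# Summarize a FASTA file: number of records plus min/max/mean sequence length.
# Wrapped (multi-line) records are handled by summing the lines of each record.
fasta_summary() {
    awk '
        function record() {
            n++
            total += len
            if (n == 1 || len < min) min = len
            if (len > max) max = len
        }
        /^>/ { if (seen) record(); seen = 1; len = 0; next }
        { len += length($0) }
        END {
            if (seen) record()
            if (n == 0) { print "records=0"; exit }
            printf "records=%d min=%d max=%d mean=%.1f\n", n, min, max, total / n
        }
    ' "$1"
}

# Example usage (requires QIIME 2; the directory name is an assumption):
#   qiime tools export \
#     --input-path 16S-ref-seqs.qza \
#     --output-path exported-ref-seqs
#   fasta_summary exported-ref-seqs/dna-sequences.fasta
```

Running the same summary on the exported full-length reference sequences gives a baseline; a large drop in the record count, or sequences that are all 120 nt long, would point at the extract-reads parameters rather than at the database.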

Hi @Angela1971,

You might make your life much easier by using RESCRIPt. :hourglass:

You can download whichever version of SILVA you’d like and process the database in your own way. This is the tool we use to generate the SILVA reference databases on the Data resources page.

Take it for a spin and let us know how things work out. :racing_car:

-Mike

Yes, indeed, you are right. However, taxonomy classification takes longer when the full-length classifier is used. That is fine for a small dataset, but I have to analyze data from 24 sequencing runs done over 5 years, with more than 1000 samples. We use the 341F and 785R primers to target the V3-V4 region. A classifier trained on the V3-V4 region shortens classification, which is practically the most time-consuming step. Thank you for your response.

RESCRIPt lets you do this :point_up_2:
1.e Make amplicon-region specific classifier

Thanks, Mike!!! I found what I was looking for. You were right, using RESCRIPt was really helpful.

  1. Get SILVA database:

qiime rescript get-silva-data \
  --p-version '138' \
  --p-target 'SSURef_NR99' \
  --p-include-species-labels \
  --o-silva-sequences silva-138-ssu-nr99-seqs.qza \
  --o-silva-taxonomy silva-138-ssu-nr99-tax.qza

  2. “Culling” low-quality sequences with cull-seqs:

qiime rescript cull-seqs \
  --i-sequences silva-138-ssu-nr99-seqs.qza \
  --o-clean-sequences silva-138-ssu-nr99-seqs-cleaned.qza

  3. Filtering sequences by length and taxonomy:

qiime rescript filter-seqs-length-by-taxon \
  --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
  --i-taxonomy silva-138-ssu-nr99-tax.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1200 1400 \
  --o-filtered-seqs silva-138-ssu-nr99-seqs-filt.qza \
  --o-discarded-seqs silva-138-ssu-nr99-seqs-discard.qza

  4. Dereplicating in uniq mode:

qiime rescript dereplicate \
  --i-sequences silva-138-ssu-nr99-seqs-filt.qza \
  --i-taxa silva-138-ssu-nr99-tax.qza \
  --p-rank-handles 'silva' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --o-dereplicated-taxa silva-138-ssu-nr99-tax-derep-uniq.qza

  5. Make a full-length classifier:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy silva-138-ssu-nr99-tax-derep-uniq.qza \
  --o-classifier silva-138-ssu-nr99-classifier.qza

  6. Make amplicon-region-specific classifiers:

16S:
qiime feature-classifier extract-reads \
  --i-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-n-jobs 2 \
  --p-read-orientation 'forward' \
  --o-reads silva138-nr99-seqs-16S-V3-V4.qza

qiime rescript dereplicate \
  --i-sequences silva138-nr99-seqs-16S-V3-V4.qza \
  --i-taxa silva-138-ssu-nr99-tax-derep-uniq.qza \
  --p-rank-handles 'silva' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138-nr99-seqs-16S-V3-V4-uniq.qza \
  --o-dereplicated-taxa silva-138-nr99-tax-16S-V3-V4-derep-uniq.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-nr99-seqs-16S-V3-V4-uniq.qza \
  --i-reference-taxonomy silva-138-nr99-tax-16S-V3-V4-derep-uniq.qza \
  --o-classifier silva-138-nr99-16S-V3-V4-classifier.qza

18S:
qiime feature-classifier extract-reads \
  --i-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer CCAGCASCYGCGGTAATTCC \
  --p-r-primer ACTTTCGTTCTTGATYRA \
  --p-n-jobs 2 \
  --p-read-orientation 'forward' \
  --o-reads silva138-nr99-seqs-18S-V4.qza

qiime rescript dereplicate \
  --i-sequences silva138-nr99-seqs-18S-V4.qza \
  --i-taxa silva-138-ssu-nr99-tax-derep-uniq.qza \
  --p-rank-handles 'silva' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138-nr99-seqs-18S-V4-uniq.qza \
  --o-dereplicated-taxa silva-138-nr99-tax-18S-V4-derep-uniq.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-nr99-seqs-18S-V4-uniq.qza \
  --i-reference-taxonomy silva-138-nr99-tax-18S-V4-derep-uniq.qza \
  --o-classifier silva-138-nr99-18S-V4-classifier.qza
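
Since the 16S and 18S blocks in step 6 differ only in their primers and file names, the three commands could be wrapped in one function. The following is a sketch, not part of the RESCRIPt tutorial: build_region_classifier, run, DRY_RUN, and the output naming scheme are all hypothetical, and the output names differ slightly from the ones used above. It defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute them (requires QIIME 2 with RESCRIPt installed).

```shell
#!/usr/bin/env bash
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: build_region_classifier LABEL FWD_PRIMER REV_PRIMER SEQS_QZA TAXA_QZA
# The body runs in a subshell ( ... ) so its variables stay local.
build_region_classifier() (
    label="$1"; fwd="$2"; rev="$3"; seqs="$4"; taxa="$5"
    prefix="silva-138-nr99"   # hypothetical naming scheme for outputs

    # 1) Extract the amplicon region bounded by the primer pair.
    run qiime feature-classifier extract-reads \
        --i-sequences "$seqs" \
        --p-f-primer "$fwd" \
        --p-r-primer "$rev" \
        --p-n-jobs 2 \
        --p-read-orientation forward \
        --o-reads "${prefix}-seqs-${label}.qza" || exit 1

    # 2) Dereplicate the extracted amplicons together with their taxonomy.
    run qiime rescript dereplicate \
        --i-sequences "${prefix}-seqs-${label}.qza" \
        --i-taxa "$taxa" \
        --p-rank-handles silva \
        --p-mode uniq \
        --o-dereplicated-sequences "${prefix}-seqs-${label}-uniq.qza" \
        --o-dereplicated-taxa "${prefix}-tax-${label}-derep-uniq.qza" || exit 1

    # 3) Train the Naive Bayes classifier on the region-specific reference.
    run qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "${prefix}-seqs-${label}-uniq.qza" \
        --i-reference-taxonomy "${prefix}-tax-${label}-derep-uniq.qza" \
        --o-classifier "${prefix}-${label}-classifier.qza"
)

build_region_classifier 16S-V3-V4 CCTACGGGNGGCWGCAG GACTACHVGGGTATCTAATCC \
    silva-138-ssu-nr99-seqs-derep-uniq.qza silva-138-ssu-nr99-tax-derep-uniq.qza
build_region_classifier 18S-V4 CCAGCASCYGCGGTAATTCC ACTTTCGTTCTTGATYRA \
    silva-138-ssu-nr99-seqs-derep-uniq.qza silva-138-ssu-nr99-tax-derep-uniq.qza
```

Keeping the primers as arguments makes it harder for the 16S and 18S command sets to drift apart, and the dry run lets you review the exact commands before committing to a long training job.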

Thanks a lot for the help. RESCRIPt worked well. I really appreciate it.

Hi @Angela1971,

I am glad we were able to help! Thank you for letting us know how it all worked out. Happy :qiime2:-ing!
