Hello!
I have built a new naive Bayes classifier with RESCRIPt for the V3-V4 variable regions, using the custom primers provided by our sequencing centre and the SILVA 138 reference. We are trying to validate their findings by recreating their classifier, but we are getting very different results.
For context, we are using the same clean.fasta files; the only difference is the aligner.
They directed us to the marker gene sets on the QIIME 2 help page, so I have two key questions:
- I followed the RESCRIPt tutorial to make a custom V3-V4 classifier. As I understand it, the first step downloads the whole database and the later steps narrow it down to a region-specific classifier. Should I be using V4-specific references instead?
- The custom classifier I made produces unexpected OTU assignments, such as a high number of Chloroplast and Unclassified features, which do not appear in their OTU table. Am I missing a clean-up step somewhere?
Get SILVA database:
qiime rescript get-silva-data \
  --p-version '138' \
  --p-target 'SSURef_NR99' \
  --p-include-species-labels \
  --o-silva-sequences outputs/silva-138-ssu-nr99-seqs.qza \
  --o-silva-taxonomy outputs/silva-138-ssu-nr99-tax.qza
“Culling” low-quality sequences with cull-seqs:
qiime rescript cull-seqs \
  --i-sequences outputs/silva-138-ssu-nr99-seqs.qza \
  --o-clean-sequences outputs/silva-138-ssu-nr99-seqs-cleaned.qza
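For reference, the culling step can be pictured with a small Python sketch. It assumes the documented cull-seqs defaults (a sequence is removed if it has 5 or more degenerate bases, or a homopolymer run of 8 or more); the function names are hypothetical and this is not RESCRIPt's own code.

```python
import re

# IUPAC ambiguity codes (anything other than A/C/G/T).
DEGENERATE = set("RYSWKMBDHVN")


def has_homopolymer(seq: str, length: int) -> bool:
    """True if any A/C/G/T base repeats `length` or more times in a row."""
    pattern = r"([ACGT])\1{%d,}" % (length - 1)
    return re.search(pattern, seq) is not None


def cull_seqs(seqs, num_degenerates=5, homopolymer_length=8):
    """Hypothetical sketch: keep sequences passing both quality checks."""
    kept = {}
    for seq_id, seq in seqs.items():
        seq = seq.upper()
        n_degen = sum(1 for base in seq if base in DEGENERATE)
        if n_degen >= num_degenerates:
            continue  # too many ambiguous bases
        if has_homopolymer(seq, homopolymer_length):
            continue  # long homopolymer, likely a sequencing artefact
        kept[seq_id] = seq
    return kept


refs = {
    "ok": "ACGTACGTTTTTTTACGT",       # run of 7 T's: below the threshold
    "homo": "ACGT" + "A" * 8 + "GT",  # run of 8 A's: culled
    "degen": "ACNNRYNACGT",           # 5 ambiguous bases: culled
}
print(sorted(cull_seqs(refs)))  # ['ok']
```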
Filtering sequences by length and taxonomy:
qiime rescript filter-seqs-length-by-taxon \
  --i-sequences outputs/silva-138-ssu-nr99-seqs-cleaned.qza \
  --i-taxonomy outputs/silva-138-ssu-nr99-tax.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1200 1400 \
  --o-filtered-seqs outputs/silva-138-ssu-nr99-seqs-filt.qza \
  --o-discarded-seqs outputs/silva-138-ssu-nr99-seqs-discard.qza
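The per-taxon length filter can be sketched like this, pairing each label with its minimum length as in the command. It assumes (check the RESCRIPt docs) that a sequence is discarded when its taxonomy contains a label and it is shorter than that label's minimum, using the strictest minimum if several labels match, and that sequences matching no label are kept. The function and the toy data are hypothetical.

```python
def filter_length_by_taxon(seqs, taxonomy, labels, min_lens):
    """Hypothetical sketch: split seqs into (kept, discarded) by per-label minimum length."""
    thresholds = dict(zip(labels, min_lens))
    kept, discarded = {}, {}
    for seq_id, seq in seqs.items():
        lineage = taxonomy[seq_id]
        # Minimum lengths of every label found (as a substring) in this lineage.
        applicable = [m for label, m in thresholds.items() if label in lineage]
        if applicable and len(seq) < max(applicable):
            discarded[seq_id] = seq
        else:
            kept[seq_id] = seq
    return kept, discarded


seqs = {
    "bac_long": "A" * 1300,
    "bac_short": "A" * 1000,
    "euk_short": "A" * 1300,
    "arc": "A" * 950,
}
taxonomy = {
    "bac_long": "d__Bacteria; p__Firmicutes",
    "bac_short": "d__Bacteria; p__Firmicutes",
    "euk_short": "d__Eukaryota",
    "arc": "d__Archaea",
}
kept, discarded = filter_length_by_taxon(
    seqs, taxonomy, ["Archaea", "Bacteria", "Eukaryota"], [900, 1200, 1400]
)
print(sorted(kept), sorted(discarded))
```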
Dereplicating in uniq mode:
qiime rescript dereplicate \
  --i-sequences outputs/silva-138-ssu-nr99-seqs-filt.qza \
  --i-taxa outputs/silva-138-ssu-nr99-tax.qza \
  --p-rank-handles 'silva' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences outputs/silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --o-dereplicated-taxa outputs/silva-138-ssu-nr99-tax-derep-uniq.qza
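As a rough illustration of 'uniq' mode: identical sequences are collapsed only when their taxonomy is also identical, so an identical sequence with a different label is kept as a separate entry. In this hypothetical sketch the first-seen ID is retained; which ID RESCRIPt keeps is not assumed here. The same idea applies after extract-reads, where distinct full-length references can collapse to identical V3-V4 amplicons.

```python
def dereplicate_uniq(seqs, taxonomy):
    """Hypothetical sketch: collapse references sharing BOTH sequence and taxonomy."""
    seen = set()
    derep_seqs, derep_tax = {}, {}
    for seq_id, seq in seqs.items():
        key = (seq, taxonomy[seq_id])
        if key in seen:
            continue  # exact duplicate (sequence and label) of an earlier entry
        seen.add(key)
        derep_seqs[seq_id] = seq
        derep_tax[seq_id] = taxonomy[seq_id]
    return derep_seqs, derep_tax


seqs = {"r1": "ACGT", "r2": "ACGT", "r3": "ACGT", "r4": "TTGA"}
taxonomy = {
    "r1": "g__Bacillus",
    "r2": "g__Bacillus",  # same sequence and label as r1: collapsed
    "r3": "g__Listeria",  # same sequence, different label: kept
    "r4": "g__Bacillus",
}
derep_seqs, derep_tax = dereplicate_uniq(seqs, taxonomy)
print(list(derep_seqs))  # ['r1', 'r3', 'r4']
```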
Make a full-length classifier:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads outputs/silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy outputs/silva-138-ssu-nr99-tax-derep-uniq.qza \
  --o-classifier silva-138-ssu-nr99-classifier.qza
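To show what "naive Bayes over reference reads" means in practice, here is a simplified pure-Python sketch: each full taxonomy string is one class, sequences are reduced to k-mer counts, and a query gets the class with the highest multinomial likelihood. q2-feature-classifier actually builds on scikit-learn (a HashingVectorizer plus MultinomialNB), so this toy is only an assumption-laden stand-in; the uniform prior over taxa, the class name and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Hypothetical multinomial naive Bayes over k-mer counts."""

    def __init__(self, k=7, alpha=0.001):
        self.k = k
        self.alpha = alpha  # additive smoothing for unseen k-mers

    def fit(self, seqs, taxonomy):
        self.counts = defaultdict(Counter)  # taxonomy string -> k-mer counts
        for seq_id, seq in seqs.items():
            self.counts[taxonomy[seq_id]].update(kmers(seq, self.k))
        self.vocab = set()
        for c in self.counts.values():
            self.vocab.update(c)
        self.totals = {t: sum(c.values()) for t, c in self.counts.items()}
        return self

    def predict(self, seq):
        """Return (best taxonomy, its posterior), assuming a uniform prior over taxa."""
        v = len(self.vocab)
        query = Counter(kmers(seq, self.k))
        log_post = {}
        for tax, c in self.counts.items():
            denom = self.totals[tax] + self.alpha * v
            log_post[tax] = sum(
                n * math.log((c[km] + self.alpha) / denom) for km, n in query.items()
            )
        best = max(log_post, key=log_post.get)
        # Normalise with the log-sum-exp trick to get a posterior probability.
        m = log_post[best]
        z = sum(math.exp(lp - m) for lp in log_post.values())
        return best, 1.0 / z


refs = {"a": "ACGTACGTACGTACGT", "b": "TTGGCCAATTGGCCAA"}
tax = {"a": "d__Bacteria; g__Bacillus", "b": "d__Bacteria; g__Escherichia-Shigella"}
clf = KmerNaiveBayes(k=4).fit(refs, tax)
label, conf = clf.predict("ACGTACGTACGT")
print(label, round(conf, 3))
```

In classify-sklearn, the `--p-confidence` parameter then decides how deep an assignment is reported; this toy only returns the posterior of the top full label.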
Make an amplicon-region-specific classifier:
16S:
qiime feature-classifier extract-reads \
  --i-sequences outputs/silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer CCTAYGGGRBGCASCAG \
  --p-r-primer GGACTACNNGGGTATCTAAT \
  --p-n-jobs 2 \
  --p-read-orientation 'forward' \
  --o-reads silva138-nr99-seqs-16S-V3-V4.qza
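The extraction step can be illustrated with a simplified sketch: the degenerate primers are expanded into IUPAC regular expressions, the forward primer is searched for directly, the reverse primer is searched for as its reverse complement downstream, and the region in between is returned. This assumes exact IUPAC matches with the primers excluded from the output; the real extract-reads also tolerates mismatches (its identity threshold) and applies length limits, which the sketch omits. The function names and the toy template are hypothetical.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (e.g. R=A/G pairs with Y=C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Turn a degenerate primer into a regular expression."""
    return "".join(IUPAC[b] for b in primer.upper())


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_amplicon(seq, f_primer, r_primer):
    """Region between the forward primer and the reverse-complemented reverse
    primer (primers excluded), or None if either primer is not found."""
    f = re.search(primer_regex(f_primer), seq)
    if f is None:
        return None
    downstream = seq[f.end():]
    r = re.search(primer_regex(revcomp(r_primer)), downstream)
    if r is None:
        return None
    return downstream[:r.start()]


# Toy template: one concrete instance of each primer around a short insert.
template = (
    "GG" + "CCTACGGGAGGCAGCAG" + "TTTTAAAACCCC"
    + revcomp("GGACTACAAGGGTATCTAAT") + "AA"
)
amplicon = extract_amplicon(template, "CCTAYGGGRBGCASCAG", "GGACTACNNGGGTATCTAAT")
print(amplicon)  # TTTTAAAACCCC
```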
qiime rescript dereplicate \
  --i-sequences silva138-nr99-seqs-16S-V3-V4.qza \
  --i-taxa outputs/silva-138-ssu-nr99-tax-derep-uniq.qza \
  --p-rank-handles 'silva' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences outputs/silva-138-nr99-seqs-16S-V3-V4-uniq.qza \
  --o-dereplicated-taxa outputs/silva-138-nr99-tax-16S-V3-V4-derep-uniq.qza
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads outputs/silva-138-nr99-seqs-16S-V3-V4-uniq.qza \
  --i-reference-taxonomy outputs/silva-138-nr99-tax-16S-V3-V4-derep-uniq.qza \
  --o-classifier silva-138-nr99-16S-V3-V4-classifier.qza
I have included my code above.
Thank you so much for your help!