Hello,
First and foremost, thanks a lot for this amazing forum; it is a great resource.
I am new to QIIME 2, and I'm playing around with taxonomic classification of bird cloacal microbiota using the 28S rRNA gene (the D4-D6 region, among others).
I created a primer-region-specific classifier using the RESCRIPt plugin, following the tutorial provided here. I used SILVA 132.
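The core of a primer-region-specific classifier is the in-silico PCR step (`qiime feature-classifier extract-reads` in the commands below), which keeps only the stretch of each reference sequence bounded by the two primers. The Python sketch below illustrates the idea with the GA20F/RM9R primers used here: degenerate IUPAC bases (W = A/T, R = A/G) become regex character classes, and the reverse primer is searched for as its reverse complement. `extract_amplicon` is a hypothetical helper and a simplification: it accepts only exact IUPAC-aware matches in the forward orientation, whereas `extract-reads` tolerates some primer mismatches, so real output can differ.

```python
import re
from typing import Optional

# IUPAC nucleotide codes and the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer: str) -> str:
    """Turn a degenerate primer into a regex, e.g. 'GAW' -> '[G][A][AT]'."""
    return "".join(f"[{IUPAC[base]}]" for base in primer.upper())


def extract_amplicon(ref: str, f_primer: str, r_primer: str,
                     min_len: int, max_len: int) -> Optional[str]:
    """Return the region between the primers (primers excluded), or None.

    Hypothetical helper: exact IUPAC-aware matches, forward orientation only.
    """
    pattern = re.compile(
        primer_pattern(f_primer)
        + "(.*?)"  # shortest stretch between the two primer sites
        + primer_pattern(reverse_complement(r_primer))
    )
    match = pattern.search(ref.upper())
    if match is None:
        return None
    region = match.group(1)
    # Discard amplicons outside the allowed length window.
    return region if min_len <= len(region) <= max_len else None


if __name__ == "__main__":
    fwd = "GTAACTTCGGGAWAAGGATTGGCT"  # GA20F, as in the commands below
    rev = "AGAGTCAARCTCAACAGGGTCTT"   # RM9R
    # Toy reference: concrete primer bases flanking a 10 bp insert.
    toy_ref = ("CCCC" + "GTAACTTCGGGAAAAGGATTGGCT" + "ACGTACGTAC"
               + reverse_complement("AGAGTCAAGCTCAACAGGGTCTT") + "GGGG")
    print(extract_amplicon(toy_ref, fwd, rev, min_len=5, max_len=20))
```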
The taxonomic classification results seem to make sense, apart from one weird assignment:
d__Eukaryota; p__Eukaryota; c__Eukaryota; o__Eukaryota; f__Eukaryota; g__Eukaryota
I was expecting this case to be classified simply as:
d__Eukaryota
In the results I get both:
d__Eukaryota; p__Eukaryota; c__Eukaryota; o__Eukaryota; f__Eukaryota; g__Eukaryota
and
d__Eukaryota
Could this be a problem with the classifier? Is SILVA the best database for this? Any ideas?
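One possible explanation, offered as an assumption to check rather than a diagnosis: when RESCRIPt builds the SILVA taxonomy it can propagate ranks, i.e. fill an empty rank with the name of the rank above it (`get-silva-data` appears to expose this as `--p-rank-propagation / --p-no-rank-propagation`; see `qiime rescript get-silva-data --help`). If so, a reference sequence annotated only as Eukaryota in SILVA would carry the six-rank string, and both outputs would amount to a domain-level assignment. The hypothetical Python helper `strip_propagated_ranks` below (not part of QIIME 2 or RESCRIPt) trims a lineage at the first rank that merely repeats its parent, so the two forms can be pooled before counting.

```python
from collections import Counter


def strip_propagated_ranks(taxonomy: str) -> str:
    """Cut a lineage at the first rank whose name repeats the rank above it.

    Hypothetical heuristic: a repeated name is assumed to come from rank
    propagation, although a real rank could occasionally share its parent's name.
    """
    kept = []
    for rank in (r.strip() for r in taxonomy.split(";")):
        if not rank:
            continue  # tolerate empty fields such as a trailing ';'
        name = rank.split("__", 1)[-1]  # 'p__Eukaryota' -> 'Eukaryota'
        if kept and name == kept[-1].split("__", 1)[-1]:
            break  # under propagation, everything from here down is a copy
        kept.append(rank)
    return "; ".join(kept)


if __name__ == "__main__":
    assignments = [
        "d__Eukaryota; p__Eukaryota; c__Eukaryota; "
        "o__Eukaryota; f__Eukaryota; g__Eukaryota",
        "d__Eukaryota",
    ]
    # Pool the two spellings of a domain-level assignment.
    print(Counter(strip_propagated_ranks(a) for a in assignments))
```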
This is what I did:
- Get SILVA data:
qiime rescript get-silva-data \
  --p-version '132' \
  --p-target 'LSURef' \
  --p-include-species-labels \
  --o-silva-sequences silva-132-LSU-nr99-seqs.qza \
  --o-silva-taxonomy silva-132-LSU-nr99-tax.qza
- Cull low-quality sequences:
qiime rescript cull-seqs \
  --i-sequences silva-132-LSU-nr99-seqs.qza \
  --o-clean-sequences silva-132-LSU-nr99-seqs-cleaned.qza
- Filter sequences by length and taxonomy:
qiime rescript filter-seqs-length-by-taxon \
  --i-sequences silva-132-LSU-nr99-seqs-cleaned.qza \
  --i-taxonomy silva-132-LSU-nr99-tax.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1000 1200 \
  --o-filtered-seqs silva-132-LSU-nr99-seqs-filt.qza \
  --o-discarded-seqs silva-132-LSU-nr99-seqs-discard.qza
- Dereplicate sequences and taxonomy:
qiime rescript dereplicate \
  --i-sequences silva-132-LSU-nr99-seqs-filt.qza \
  --i-taxa silva-132-LSU-nr99-tax.qza \
  --p-rank-handles 'silva' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-132-LSU-nr99-seqs-derep-uniq.qza \
  --o-dereplicated-taxa silva-132-LSU-nr99-tax-derep-uniq.qza
- Extract reads from the primer region:
qiime feature-classifier extract-reads \
  --i-sequences silva-132-LSU-nr99-seqs-derep-uniq.qza \
  --p-f-primer GTAACTTCGGGAWAAGGATTGGCT \
  --p-r-primer AGAGTCAARCTCAACAGGGTCTT \
  --p-min-length 250 \
  --p-max-length 600 \
  --p-n-jobs 2 \
  --p-read-orientation 'forward' \
  --o-reads silva-132-LSU-nr99-seqs-GA20F-RM9R.qza
- Dereplicate the extracted region:
qiime rescript dereplicate \
  --i-sequences silva-132-LSU-nr99-seqs-GA20F-RM9R.qza \
  --i-taxa silva-132-LSU-nr99-tax-derep-uniq.qza \
  --p-rank-handles 'silva' \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-132-LSU-nr99-seqs-GA20F-RM9R-uniq.qza \
  --o-dereplicated-taxa silva-132-LSU-nr99-tax-GA20F-RM9R-derep-uniq.qza
- Build the amplicon-region-specific classifier:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-132-LSU-nr99-seqs-GA20F-RM9R-uniq.qza \
  --i-reference-taxonomy silva-132-LSU-nr99-tax-GA20F-RM9R-derep-uniq.qza \
  --o-classifier silva-132-LSU-nr99-GA20F-RM9R-classifier.qza
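Both dereplication steps above use `--p-mode 'uniq'`. As I read the RESCRIPt documentation, this mode keeps identical sequences as separate entries when their taxonomies differ, whereas modes such as 'lca' or 'majority' merge them under one label. The Python sketch below mimics that behaviour on plain dictionaries; `dereplicate_uniq` is a hypothetical helper, not the plugin's implementation, and the representative ID it keeps (the alphabetically first) is an arbitrary choice.

```python
def dereplicate_uniq(seqs: dict[str, str], taxa: dict[str, str]):
    """Collapse records only when both sequence and taxonomy are identical.

    Hypothetical helper mimicking the idea of RESCRIPt's 'uniq' mode.
    Returns (dereplicated sequences, dereplicated taxonomy), keyed by ID.
    """
    representative: dict[tuple[str, str], str] = {}
    for seq_id in sorted(seqs):
        # Case-insensitive sequence plus exact taxonomy string as the key.
        key = (seqs[seq_id].upper(), taxa[seq_id])
        representative.setdefault(key, seq_id)  # first ID seen wins
    derep_seqs = {sid: seq for (seq, _), sid in representative.items()}
    derep_taxa = {sid: tax for (_, tax), sid in representative.items()}
    return derep_seqs, derep_taxa


if __name__ == "__main__":
    seqs = {"r1": "ACGT", "r2": "ACGT", "r3": "ACGT", "r4": "GGCC"}
    taxa = {
        "r1": "d__Eukaryota; p__Chordata",
        "r2": "d__Eukaryota; p__Chordata",  # exact duplicate of r1: dropped
        "r3": "d__Eukaryota",                # same sequence, other taxonomy: kept
        "r4": "d__Bacteria",
    }
    derep_seqs, derep_taxa = dereplicate_uniq(seqs, taxa)
    print(sorted(derep_seqs))  # ['r1', 'r3', 'r4']
```

Keeping such duplicates means one sequence can reach the classifier under two lineages (for example a full one and a domain-only one); whether that contributes to the Eukaryota labels above is an open question that this sketch does not settle.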
Kind regards,
Hugo Pereira