Hey guys,
I want to understand how naive Bayes classifier training for SILVA 138_99 and Greengenes2 should be performed.
Following the tutorial (Training feature classifiers with q2-feature-classifier — QIIME 2 2023.2.0 documentation), the commands should look like the ones I put below.
I saw in this other post (How to train the classifier for V3-V4 region with 99% identity using full length seuqnces from new relase of GreenGenes-2022?? - #7 by buzic) that for greengenes2 it should be like this:
qiime feature-classifier extract-reads \
  --i-sequences 2022.10.backbone.full-length.fna.qza \
  --p-f-primer GTGGTGGTGGTGGTGGTG \
  --p-r-primer GGACTGGACTGGACTGGA \
  --p-min-length 100 \
  --p-max-length 600 \
  --o-reads gg_12_10_ref_primer_region_seqs.qza
then use your newly trimmed sequence file along with the backbone taxonomy to train your classifier:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads gg_12_10_ref_primer_region_seqs.qza \
  --i-reference-taxonomy 2022.10.backbone.tax.qza \
  --o-classifier gg_12_10_primer_region-classifier.qza
So my questions are these:
- Is there no need to import the reference datasets before extracting reads, as the tutorial does? Like below?
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 85_otus.fasta \
  --output-path 85_otus.qza
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path 85_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza
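My current understanding, which I would like confirmed, is that a file already ending in .qza is a QIIME 2 artifact and can skip the import step, while raw FASTA and TSV files go through qiime tools import first. A tiny sketch of that rule (needs_import is a hypothetical helper of mine, and it only looks at the file name, not at the file contents):

```shell
#!/bin/sh
# Hypothetical helper, not part of QIIME 2: guess from the file name alone
# whether a reference file is already an artifact or still needs importing.
needs_import() {
  case "$1" in
    *.qza) echo "already an artifact: $1" ;;
    *)     echo "needs qiime tools import: $1" ;;
  esac
}

# File names taken from the commands in this post.
needs_import 2022.10.backbone.full-length.fna.qza
needs_import 85_otus.fasta
```

If that is right, it would explain why the Greengenes2 commands start directly at extract-reads.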
- How should it be done for SILVA? Like this? And where do I get the FASTA sequences for SILVA 138_99 to import as .qza?
Import data
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path silva-138-99-seqs.fasta \
  --output-path silva-138-99-seqs.qza
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path silva_138_99_taxonomy.txt \
  --output-path ref-taxonomy.qza
Extract reads
qiime feature-classifier extract-reads \
  --i-sequences silva-138-99-seqs.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --o-reads ref-seqs.qza
Train classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza
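To keep the SILVA file names and primers in one place, I was thinking of wrapping the extract and train steps above in a small script. This is only a sketch: run, DRY_RUN and train_silva are my own hypothetical helpers, not part of QIIME 2, and the flags are just the ones from the commands above. With DRY_RUN=1 (the default here) the commands are printed instead of executed, so I can check them before committing to a long training run.

```shell
#!/bin/sh
# Sketch only: `run`, DRY_RUN and `train_silva` are hypothetical helpers.
# DRY_RUN=1 prints each command; DRY_RUN=0 actually executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Values copied from the SILVA commands above.
SEQS=silva-138-99-seqs.qza
TAX=ref-taxonomy.qza
F_PRIMER=GTGCCAGCMGCCGCGGTAA
R_PRIMER=GGACTACHVGGGTWTCTAAT

train_silva() {
  # Step 1: trim the reference sequences to the primer region.
  run qiime feature-classifier extract-reads \
    --i-sequences "$SEQS" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --o-reads ref-seqs.qza

  # Step 2: train the naive Bayes classifier on the trimmed reads.
  run qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ref-seqs.qza \
    --i-reference-taxonomy "$TAX" \
    --o-classifier classifier.qza
}

train_silva
```

Running it with DRY_RUN=0 would execute the two commands for real, assuming the imported .qza files exist in the working directory.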
And the last ones:
- Is it really necessary to use --p-trunc-len, --p-min-length and --p-max-length if I have already run DADA2?
- What is the difference between Silva SSU with RESCRIPt and training with naive Bayes?
Thank you in advance.