Hey guys,
I want to understand how naive Bayes classification with SILVA 138_99 and Greengenes2 should be performed.
According to the tutorial (Training feature classifiers with q2-feature-classifier — QIIME 2 2023.2.0 documentation), the commands should look like the ones I put below.
I saw in this other post (How to train the classifier for V3-V4 region with 99% identity using full length seuqnces from new relase of GreenGenes-2022?? - #7 by buzic) that for greengenes2 it should be like this:
qiime feature-classifier extract-reads \
  --i-sequences 2022.10.backbone.full-length.fna.qza \
  --p-f-primer GTGGTGGTGGTGGTGGTG \
  --p-r-primer GGACTGGACTGGACTGGA \
  --p-min-length 100 \
  --p-max-length 600 \
  --o-reads gg_12_10_ref_primer_region_seqs.qza
then use your newly trimmed sequence file along with the backbone taxonomy to train your classifier:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads gg_12_10_ref_primer_region_seqs.qza \
  --i-reference-taxonomy 2022.10.backbone.tax.qza \
  --o-classifier gg_12_10_primer_region-classifier.qza
So my questions are these:
-Is there no need to import the reference datasets before extracting reads, as the tutorial does? Like below?
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 85_otus.fasta \
  --output-path 85_otus.qza
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path 85_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza
-How should it be done for SILVA? Like this? Where do I get the FASTA sequences for SILVA 138_99 to import as .qza?
Import data
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path silva-138-99-seqs.fasta \
  --output-path silva-138-99-seqs.qza
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path silva_138_99_taxonomy.txt \
  --output-path ref-taxonomy.qza
Extract reads
qiime feature-classifier extract-reads \
  --i-sequences silva-138-99-seqs.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --o-reads ref-seqs.qza
Train classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza
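To make the import question concrete: my understanding, which may well be wrong, is that `qiime tools import` is only needed for raw files (FASTA, TSV), and that files already distributed as .qza can go straight into extract-reads. Below is a minimal sketch of that rule. `needs_import` is a hypothetical helper I made up for illustration, not a QIIME 2 command, and it only looks at the file extension.

```shell
# Hypothetical helper (not part of QIIME 2): succeeds if a reference file
# would still have to go through `qiime tools import`.
# Assumption: any file name ending in .qza is already a QIIME 2 artifact.
needs_import() {
  case "$1" in
    *.qza) return 1 ;;  # already an artifact
    *)     return 0 ;;  # raw FASTA or TSV, import it first
  esac
}

# File names taken from the commands above; only the names are inspected.
for f in silva-138-99-seqs.fasta 2022.10.backbone.full-length.fna.qza; do
  if needs_import "$f"; then
    echo "$f: import first"
  else
    echo "$f: already an artifact"
  fi
done
```

On a real file, `qiime tools peek <file>.qza` should show the artifact type, which I assume is the more reliable check.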
And the last ones:
-Is it really necessary to use --p-trunc-len, --p-min-length and --p-max-length if I have already run DADA2?
-What is the difference between SILVA SSU with RESCRIPt and training with naive Bayes?
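In case it helps with the length question, this is roughly how I would check the length range of the extracted reference reads before choosing --p-min-length and --p-max-length. It is only a sketch: `fasta_length_range` is a hypothetical helper name, and the small FASTA in the demo is made up.

```shell
# Hypothetical helper: print the shortest and longest sequence length and the
# number of sequences in a FASTA file. Sequences wrapped over several lines
# are handled.
fasta_length_range() {
  awk '
    function record(l) {
      if (n == 0 || l < min) min = l
      if (l > max) max = l
      n++
    }
    /^>/ { if (seen) record(len); seen = 1; len = 0; next }
         { len += length($0) }
    END  { if (seen) record(len); if (n) print "min=" min, "max=" max, "n=" n }
  ' "$1"
}

# Demo on a made-up two-sequence FASTA (not real reference data).
demo=$(mktemp)
printf '>seq1\nACGTAC\nGT\n>seq2\nACG\n' > "$demo"
fasta_length_range "$demo"   # prints: min=3 max=8 n=2
rm -f "$demo"
```

For real data I assume the FASTA would come from something like `qiime tools export --input-path ref-seqs.qza --output-path exported-ref-seqs`, with the function then run on the exported sequence file.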
Thank you in advance.