How to train a classifier for the V3-V4 region with 99% identity using full-length sequences from the new release of GreenGenes-2022?

buzic · October 23, 2023, 2:31pm

Hi,

Judging by the readme file, you'd need the following files:

2022.10.backbone.full-length.fna.qza
2022.10.backbone.tax.qza
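
Before going further, it may be worth confirming that both downloads are the artifact types you expect. A minimal sketch, assuming the two files sit in the current directory under the names listed above (`qiime tools peek` prints an artifact's UUID, type and format):

```shell
# File names as listed above; adjust the paths if the artifacts live elsewhere.
for artifact in 2022.10.backbone.full-length.fna.qza 2022.10.backbone.tax.qza; do
  if command -v qiime >/dev/null 2>&1 && [ -f "$artifact" ]; then
    # Print the UUID, semantic type and data format of the artifact.
    qiime tools peek "$artifact"
  else
    echo "skipping $artifact (qiime or file not available)" >&2
  fi
done
```

I would expect the sequences to show up as FeatureData[Sequence] and the taxonomy as FeatureData[Taxonomy], but check the output rather than taking my word for it.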

The following method should point you in the right direction. First, take the full-length sequence file and trim it to the region amplified by your primers (the primer sequences below are placeholders I made up, so swap in your own!). You can also set truncation and min/max sequence lengths to suit your experimental design, for example:

qiime feature-classifier extract-reads \
  --i-sequences 2022.10.backbone.full-length.fna.qza \
  --p-f-primer GTGGTGGTGGTGGTGGTG \
  --p-r-primer GGACTGGACTGGACTGGA \
  --p-min-length 100 \
  --p-max-length 600 \
  --o-reads gg_12_10_ref_primer_region_seqs.qza
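
Since the question is about V3-V4, here is roughly how the same command might look with real primers. This is only a sketch: the 341F/805R pair below is a commonly used V3-V4 primer set and is my assumption, not something taken from your protocol, and the output file name is hypothetical. Replace both with whatever matches your experiment.

```shell
# ASSUMPTION: commonly cited 341F/805R V3-V4 primers; use the primers from your own protocol.
F_PRIMER="CCTACGGGNGGCWGCAG"
R_PRIMER="GACTACHVGGGTATCTAATCC"
REF_SEQS="2022.10.backbone.full-length.fna.qza"
OUT_READS="gg_2022_10_v3v4_ref_seqs.qza"   # hypothetical output name

if command -v qiime >/dev/null 2>&1 && [ -f "$REF_SEQS" ]; then
  # Extract the region between the two primers from each reference sequence.
  qiime feature-classifier extract-reads \
    --i-sequences "$REF_SEQS" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --p-min-length 100 \
    --p-max-length 600 \
    --o-reads "$OUT_READS"
else
  echo "qiime or $REF_SEQS not available; skipping extract-reads" >&2
fi
```

If you go with a different output name like this one, remember to pass that same file to --i-reference-reads in the training step.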

Then use your newly trimmed sequence file, along with the backbone taxonomy, to train your classifier:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads gg_12_10_ref_primer_region_seqs.qza \
  --i-reference-taxonomy 2022.10.backbone.tax.qza \
  --o-classifier gg_12_10_primer_region-classifier.qza
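
Once training finishes, the classifier can be applied to your own representative sequences. A minimal sketch, assuming your denoised sequences are in a file called rep-seqs.qza and you want the result in taxonomy.qza (both names are hypothetical, so use whatever your pipeline produced):

```shell
CLASSIFIER="gg_12_10_primer_region-classifier.qza"   # output of the training step above
QUERY_SEQS="rep-seqs.qza"                            # hypothetical: your representative sequences
OUT_TAXONOMY="taxonomy.qza"                          # hypothetical output name

if command -v qiime >/dev/null 2>&1 && [ -f "$CLASSIFIER" ] && [ -f "$QUERY_SEQS" ]; then
  # Assign taxonomy to each query sequence with the trained naive Bayes classifier.
  qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads "$QUERY_SEQS" \
    --o-classification "$OUT_TAXONOMY"
else
  echo "qiime, $CLASSIFIER or $QUERY_SEQS not available; skipping classification" >&2
fi
```

Note that a classifier generally needs to be trained with the same scikit-learn version that is used to run it, so train and classify in the same QIIME 2 environment.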

I hope that helps. There are lots of walkthroughs and helpful documents in the QIIME 2 forum and docs, for example here