Hi QIIME Team,
I am running a 16S V3V4 (341F/805R region) analysis with q2cli version 2019.7.0 on AWS. The reads come from a 2x300 bp MiSeq run, and I trimmed them with q2-cutadapt prior to DADA2 denoising. According to the feature table, I get 1255 features. I then trained a classifier on the gg_13_8 99% OTUs as instructed, adjusting the parameters as below:
- Import
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 99_otus.fasta \
  --output-path 99_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path 99_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza
- Extract Ref
qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-trunc-len 430 \
  --p-min-length 300 \
  --p-max-length 500 \
  --o-reads ref-seqs.qza \
  --verbose
I chose these parameters based on the length distribution reported for my features. You can see the details here: trim-rep-seqs1.qzv (425.3 KB), trim-table1.qzv (464.6 KB)
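One way to check what the extraction step actually produced is to export ref-seqs.qza and summarize the sequence lengths. The following is only a rough sketch: the function name `fasta_length_summary` and the directory `ref-seqs-export` are made up for illustration, and it assumes the export contains a file named `dna-sequences.fasta`.

```shell
# Print the number of sequences and the shortest/longest length in a FASTA file.
# Sequences wrapped over several lines are handled.
fasta_length_summary() {
  awk '
    /^>/ { if (seen) record(len); seen = 1; len = 0; next }
         { len += length($0) }
    END  { if (seen) record(len)
           printf "sequences=%d min=%d max=%d\n", n, min, max }
    function record(l) {
      n++
      if (n == 1 || l < min) min = l
      if (n == 1 || l > max) max = l
    }
  ' "$1"
}

# Assumed usage (paths are illustrative):
# qiime tools export --input-path ref-seqs.qza --output-path ref-seqs-export
# fasta_length_summary ref-seqs-export/dna-sequences.fasta
```

If the sequence count is much smaller than the number of sequences in 99_otus.fasta, that would suggest the primer and length settings are discarding a large part of the reference.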
- Training
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier gg-13-8-99-otu-v3v4-illumina-classifier.qza \
  --verbose
I then tested the classifier on my dataset, but it returned only 775 assigned features (out of a total of 1255): metadata_342-805.tsv (106.5 KB). However, when I used the ready-made classifier from your website (Greengenes 13_8 99% OTUs from the 515F/806R region of sequences), it gave me all 1255 assigned features: metadata_515-806.tsv (178.0 KB).
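For anyone who wants to reproduce the comparison from the two attached tables, here is a minimal sketch. It assumes each TSV is tab-separated with the feature ID in the first column and the taxon in the second, has one header line, and may contain a `#q2:types` line. The function names `count_assignments` and `missing_features` are made up for illustration.

```shell
# Count feature rows and "Unassigned" rows in a taxonomy TSV.
# Assumes tab-separated columns (feature ID, taxon, confidence), one header
# line, and an optional "#q2:types" line, which is skipped.
count_assignments() {
  awk -F '\t' '
    NR == 1 || /^#/ { next }            # skip header and directive lines
    { total++; if ($2 == "Unassigned") unassigned++ }
    END { printf "total=%d unassigned=%d\n", total, unassigned }
  ' "$1"
}

# List feature IDs present in the first table but absent from the second.
# Assumes the second table has at least one line.
missing_features() {
  awk -F '\t' '
    FNR == 1 || /^#/ { next }
    NR == FNR { seen[$1] = 1; next }    # first file read by awk: IDs to exclude
    !($1 in seen) { print $1 }
  ' "$2" "$1"
}

# Assumed usage with the attached files:
# count_assignments metadata_342-805.tsv
# count_assignments metadata_515-806.tsv
# missing_features metadata_515-806.tsv metadata_342-805.tsv
```

The second function shows which features are missing from the table produced by my own classifier, so they can be looked up in trim-rep-seqs1.qzv.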
So, my questions are:
- I know that it is better to use a classifier trained for my own data, but why do I get fewer assigned features with my trained classifier? Is there anything wrong with the parameters I used when extracting the reference reads?
- Originally, I wanted to test my classifier on the mockrobiota datasets linked from your website, but I could only find datasets for V4, not for V3V4. Do you have any suggestions on how I can evaluate whether my classifier is performing well or not?
Thank you.