Hello,
This is my first time posting; please let me know if I should format my question differently.
**I suspect that I am not extracting reads properly as I attempt to create a custom SILVA classifier. Some of the taxonomic classifications have confidence values slightly greater than 1.**
Here are the commands I use to import and extract sequences:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna \
  --output-path 99_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_all_levels.txt \
  --output-path ref-taxonomy.qza

qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer AGAGTTTGATCMTGGCTCAG \
  --p-r-primer ACTCCTACGGGAGGCAGC \
  --p-trunc-len 325 \
  --p-trim-left 20 \
  --p-min-length 300 \
  --p-max-length 400 \
  --o-reads extracted_ref-seqs.qza
**Again, I suspect I am not using appropriate parameters in the "feature-classifier extract-reads" command above. Perhaps the trim, trunc, or length parameters?**
Here are the commands I used for training and testing the classifier, just in case:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads extracted_ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier silva_16s_v1-v2_custom_nb-classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier silva_16s_v1-v2_custom_nb-classifier.qza \
  --i-reads merged_SeqRun1and2_rep-seqs.qza \
  --o-classification taxonomy_silva_custom_v1-v2.qza

qiime metadata tabulate \
  --m-input-file merged_SeqRun1and2_rep-seqs.qza \
  --m-input-file taxonomy_silva_custom_v1-v2.qza \
  --o-visualization taxonomy_silva_custom_v1-v2.qzv
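To see exactly which features carry a confidence above 1, the classification could be exported with `qiime tools export` and the resulting table filtered. The sketch below assumes the exported file is tab-separated with one header row and the confidence in the third column (`Feature ID`, `Taxon`, `Confidence`). The function name `list_confidence_over_one` and the example path are made up for illustration.

```shell
# Print the feature ID and confidence for every row whose confidence exceeds 1.
# Assumption: tab-separated input, one header row, confidence in column 3.
list_confidence_over_one() {
  awk -F'\t' 'NR > 1 && $3 + 0 > 1 { print $1 "\t" $3 }' "$1"
}

# Example call (hypothetical path to the exported taxonomy table):
# list_confidence_over_one exported_taxonomy/taxonomy.tsv
```

An empty result would mean no confidence in the table is above 1.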
Background information; sequencing:
- Ion Torrent PGM
- V1 - V2 (27F and 355R)
- Forward reads only
- Ion Torrent adaptor- and barcode-tagged reads, already demultiplexed by the sequencing facility
- Sequences still contain the 27F and 355R primers
Background information; denoising:
qiime dada2 denoise-pyro \
  --i-demultiplexed-seqs SeqRun1_imported.qza \
  --p-trim-left 20 \
  --p-trunc-len 325 \
  --o-representative-sequences SeqRun1_rep-seqs-dada.qza \
  --o-table SeqRun1_table-dada2.qza \
  --o-denoising-stats SeqRun1_stats-dada2.qza
I use the same exact parameters to denoise "SeqRun2". The two sequence runs are different samples (not replicates).
I trim off the 20-nucleotide forward primer. I truncate at position 325 in both denoising runs because (1) this removes the 18-nucleotide reverse primer sequence, and (2) both runs show a drop-off in sequence quality at that common position.
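To make the arithmetic explicit: in DADA2's filtering step, a read is cut at the truncation position and the left trim is then removed from its start, so reads that pass should all have length trunc-len minus trim-left. A small sketch with the values used above (the variable names are only for illustration):

```shell
trunc_len=325   # value passed to --p-trunc-len
trim_left=20    # value passed to --p-trim-left (length of the 27F primer)

# Reads are cut at position trunc_len, then trim_left bases are removed from the 5' end.
denoised_len=$((trunc_len - trim_left))
echo "expected denoised read length: ${denoised_len} nt"
```

With these settings the representative sequences should be 305 nt long, which is the length any reference reads would need to be comparable to.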
Background information; merging and grouping:
I run "feature-table merge" and "feature-table merge-seqs" to merge the feature-tables and rep seqs from both sequence/denoising runs.
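For completeness, the merge step looks roughly like the sketch below. The SeqRun1 file names and `merged_SeqRun1and2_rep-seqs.qza` come from the commands in this post. The SeqRun2 file names and the merged table name are assumed to follow the same pattern, and the wrapper function `merge_runs` is only there for illustration.

```shell
# Merge the per-run feature tables and representative sequences.
# Assumed names: SeqRun2_* inputs and merged_SeqRun1and2_table.qza.
merge_runs() {
  qiime feature-table merge \
    --i-tables SeqRun1_table-dada2.qza \
    --i-tables SeqRun2_table-dada2.qza \
    --o-merged-table merged_SeqRun1and2_table.qza

  qiime feature-table merge-seqs \
    --i-data SeqRun1_rep-seqs-dada.qza \
    --i-data SeqRun2_rep-seqs-dada.qza \
    --o-merged-data merged_SeqRun1and2_rep-seqs.qza
}
```

Calling `merge_runs` runs both commands in turn; the two `qiime` calls can equally be pasted directly into the terminal.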
Above, I stated that the "two sequence runs are different samples (not replicates)". This is true, but I did have a few same-sample replicates within each of the two runs.
- For example, in SeqRun1, I have microbiomeSample1, microbiomeSample1_again, microbiomeSample2, microbiomeSample3, etc.
- Then, in SeqRun2, I have microbiomeSample101, microbiomeSample102, microbiomeSample103, microbiomeSample103_again.
I run "feature-table group" to group these replicates. The inputs include the merged feature table and a custom metadata file to facilitate groupings.
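The grouping step is roughly as sketched below, using the option names from recent QIIME 2 releases. The table name, the metadata file name `replicate-groups.tsv`, the column name `GroupID`, the `sum` mode, and the wrapper function `group_replicates` are all placeholders for illustration. The idea is that the metadata column maps each replicate (for example microbiomeSample1 and microbiomeSample1_again) to one shared group label.

```shell
# Collapse replicate samples into one sample per group label.
# Hypothetical names: input table, metadata file, and the "GroupID" column.
# Assumption: replicates are combined by summing their feature counts.
group_replicates() {
  qiime feature-table group \
    --i-table merged_SeqRun1and2_table.qza \
    --p-axis sample \
    --m-metadata-file replicate-groups.tsv \
    --m-metadata-column GroupID \
    --p-mode sum \
    --o-grouped-table grouped_SeqRun1and2_table.qza
}
```

The grouped table then uses the group labels as its sample IDs.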