Hi there!
I looked for similar posts but didn't find any (apologies if this has already been answered; I don't want to waste your time!). Briefly, I have three sequencing datasets from river water samples, all using the same primer region (515yF and 926pfR) but generated in three different sequencing runs: two from one company (Mr. DNA) and one from another (RTL Genomics). Two of these datasets appear to be "properly classified" (i.e., they had hits past "Domain Bacteria" and contained taxa similar to prior river samples), while the third fails to be "properly classified". Classification was carried out with a self-trained q2-feature-classifier classifier built from SILVA v138 and trimmed to the primer-specific region with the RESCRIPt plugin (following the tutorial's instructions). This classifier did not show this issue with a previous, more complex dataset. Everything was run with QIIME 2 v2022.2, installed with conda on Ubuntu v 20.24.
Details of analysis:
Demux quality plots for each sequencing run:
mr.dna-aug-demux.qzv (315.9 KB)
mr.dna-dec-demux.qzv (313.6 KB)
rtl-aug-demux.qzv (316.6 KB)
#I used DADA2 (denoise-single) separately for each set of samples, with the same trim and truncation parameters, to follow the assumptions of the DADA2 error model. Ex.:
qiime dada2 denoise-single \
  --i-demultiplexed-seqs mr.dna-aug-demux.qza \
  --p-trim-left 9 \
  --p-trunc-len 294 \
  --o-representative-sequences mr.dna.aug.rep-seqs.qza \
  --o-table mr.dna.aug.table.qza \
  --o-denoising-stats mr.dna.aug.stats.qza
Output from all three runs:
mr.dna.aug.stats.qzv (1.2 MB)
mr.dna.dec.stats.qzv (1.2 MB)
rtl.aug.stats.qzv (1.2 MB)
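For reference, the per-run denoising above can be wrapped so that every run demonstrably gets identical parameters. This is only a sketch: the function name `denoise_run` and the `QIIME` override variable are made up for illustration, and the output naming (dashes in the run name become dots) is inferred from the example file names above rather than confirmed.

```shell
# Executable to call. Defaults to the real "qiime"; the override exists only
# so the command can be previewed (QIIME=echo) without running anything.
QIIME="${QIIME:-qiime}"

# Run DADA2 denoise-single for one sequencing run with fixed parameters.
# $1 is the run name as used in the demux file names, e.g. "mr.dna-aug".
denoise_run() {
    run="$1"
    # Assumed naming convention: "mr.dna-aug" -> "mr.dna.aug" for outputs.
    prefix=$(printf '%s' "$run" | tr '-' '.')
    "$QIIME" dada2 denoise-single \
        --i-demultiplexed-seqs "${run}-demux.qza" \
        --p-trim-left 9 \
        --p-trunc-len 294 \
        --o-representative-sequences "${prefix}.rep-seqs.qza" \
        --o-table "${prefix}.table.qza" \
        --o-denoising-stats "${prefix}.stats.qza"
}
```

With QIIME 2 on the PATH, `for run in mr.dna-aug mr.dna-dec rtl-aug; do denoise_run "$run"; done` would process all three runs; setting `QIIME=echo` beforehand prints the commands instead of running them.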
#All three DADA2 output files were merged using:
qiime feature-table merge \
  --i-tables ./mr.dna-aug-seqs/mr.dna.aug.table.qza \
  --i-tables ./mr.dna-dec-seqs/mr.dna.dec.table.qza \
  --i-tables ./rtl-aug-seqs/rtl.aug.table.qza \
  --o-merged-table dada2.merged.table.qza
qiime feature-table merge-seqs \
  --i-data ./mr.dna-aug-seqs/mr.dna.aug.rep-seqs.qza \
  --i-data ./mr.dna-dec-seqs/mr.dna.dec.rep-seqs.qza \
  --i-data ./rtl-aug-seqs/rtl.aug.rep-seqs.qza \
  --o-merged-data merged.rep-seqs.qza
Output from merging:
merged.rep-seqs.qzv (1.1 MB)
dada2.merged.table.qzv (646.8 KB)
#All well and smooth. Now here comes the fun part...
#I ran the RESCRIPt-trained q2-feature-classifier classifier (mentioned above):
qiime feature-classifier classify-sklearn \
  --p-n-jobs -1 \
  --p-reads-per-batch 5000 \
  --i-classifier silva-138-ssu-nr99-515f-926r-classifier.qza \
  --i-reads merged.rep-seqs.qza \
  --o-classification bonita.taxonomy.qza
#Filter the output
qiime taxa filter-table \
  --i-table dada2.merged.table.qza \
  --i-taxonomy bonita.taxonomy.qza \
  --p-exclude mitochondria,chloroplast,eukaryota \
  --o-filtered-table dada2.merged.filtered.table.qza
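To see how many features this filter would touch, the exclusion terms can be previewed against an exported taxonomy file. This is a rough sketch, not what `filter-table` does internally: it assumes the taxonomy was exported to a tab-separated file with a header line and the taxon string in the second column, and it assumes case-insensitive substring matching on the three terms. The function name `count_excluded` is hypothetical.

```shell
# Count features whose taxon string contains any of the excluded terms.
# $1: tab-separated taxonomy file (header line, taxon string in column 2).
# Matching is done on a lowercased copy of the taxon string, so
# "Mitochondria" and "mitochondria" are treated the same (an assumption).
count_excluded() {
    awk -F'\t' '
        NR > 1 && tolower($2) ~ /mitochondria|chloroplast|eukaryota/ { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    ' "$1"
}
```

An unexpectedly large count for one run would suggest the filter, rather than the classifier, is what changes that run's barplot.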
#And generate barplots
qiime taxa barplot \
  --i-table dada2.merged.filtered.table.qza \
  --i-taxonomy bonita.taxonomy.qza \
  --m-metadata-file bonita.metadata.tsv \
  --o-visualization filtered-taxa-bar-plots.qzv
#Barplots
filtered-taxa-bar-plots.qzv (2.0 MB)
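To put a number on "properly classified" outside the barplot, the depth of each assignment could be tallied from the exported taxonomy. A minimal sketch, assuming `bonita.taxonomy.qza` has been exported with `qiime tools export` to a tab-separated `taxonomy.tsv` (header line, then feature ID, taxon string, and confidence), and that ranks in the taxon string are separated by `;`. The function name `rank_depth_summary` is hypothetical.

```shell
# Tally features by how many taxonomic ranks were assigned.
# $1: tab-separated taxonomy file (header line, taxon string in column 2).
# Prints "<number of ranks><TAB><number of features>", sorted by rank count.
# A taxon string of exactly "Unassigned" is counted as depth 0.
rank_depth_summary() {
    awk -F'\t' '
        NR > 1 {
            if ($2 == "Unassigned") depth = 0
            else depth = split($2, ranks, ";")   # ranks are ";"-separated
            count[depth]++
        }
        END {
            for (d in count) printf "%d\t%d\n", d, count[d]
        }
    ' "$1" | sort -n
}
```

A run in which most features sit at depth 1 (domain only) would match the pattern described for the December samples. Splitting the tally by run would additionally require knowing which features occur in which samples, which this sketch does not attempt.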

So something is suspect about the December sequencing run (mr.dna-dec-demux.qza) compared to the other two, even though all three were processed identically. I am not sure whether I made an error earlier on or am overlooking something, but I am quite confused. Below are the counts/taxonomic classifications I got back from Mr. DNA; I wanted to run all the datasets through the same pipeline so the comparisons are realistic. I am not sure whether this has something to do with the primer-specific region used for the RESCRIPt-trained classifier, but that shouldn't really affect the results if the samples were sequenced with that same primer region.
FullTaxa.genus.counts.txt (55.7 KB)
Thank you for your help!