I was also thinking of writing to @gmdouglas, since all of these amplicon pools were sequenced at the IMR, to see whether he or anyone there has run into this. I've pretty much followed the SOP you wrote (Amplicon SOP v2 (qiime2 2019.7) · LangilleLab/microbiome_helper Wiki · GitHub), which is awesome. Thank you.
For example, here is what I ran for one particular run.
Import reads:
```shell
mkdir reads_qza
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path demux/ \
  --output-path reads_qza/kellogg_16S_IMR4sequences.qza \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt
```
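Since this import path relies on the Casava 1.8 file naming (e.g. `SampleID_S1_L001_R1_001.fastq.gz`), a quick bash sanity check on the files in `demux/` can rule out naming problems before import. This is a sketch: the regex is my approximation of the pattern that format expects, and `is_casava_name` is a hypothetical helper, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a filename looks like the Casava 1.8
# per-sample pattern SampleID_S1_L001_R1_001.fastq.gz (approximate regex).
is_casava_name() {
  [[ "$1" =~ ^.+_.+_L[0-9]{3}_R[12]_001\.fastq\.gz$ ]]
}

# Report any file in demux/ that does not match (assumes the layout used above).
for f in demux/*.fastq.gz; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  name=$(basename "$f")
  is_casava_name "$name" || echo "unexpected filename: $name"
done
```

If nothing is printed, every file follows the expected pattern.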
How the imported data look:
```shell
qiime demux summarize \
  --i-data reads_qza/kellogg_16S_IMR4sequences.qza \
  --o-visualization reads_qza/kellogg_16S_IMR4reads_untrimmed_summary.qzv
```
Trim primers:
```shell
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences reads_qza/kellogg_16S_IMR4sequences.qza \
  --p-cores 4 \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r CCGYCAATTYMTTTRAGTTT \
  --p-discard-untrimmed \
  --p-no-indels \
  --o-trimmed-sequences reads_qza/kellogg_16S_IMR4reads_trimmed.qza
```
How the data look after trimming (reads are 19-20 bp shorter, which is good news!):
```shell
qiime demux summarize \
  --i-data reads_qza/kellogg_16S_IMR4reads_trimmed.qza \
  --o-visualization reads_qza/kellogg_16S_IMR4reads_trimmed_summary.qzv
```
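The 19-20 bp drop lines up with the lengths of the two primers passed to cutadapt (the 515F/926R V4-V5 pair, if I have the names right). A trivial bash check, nothing QIIME-specific:

```shell
#!/usr/bin/env bash
# Primer sequences as passed to --p-front-f / --p-front-r above
FWD=GTGYCAGCMGCCGCGGTAA
REV=CCGYCAATTYMTTTRAGTTT

# ${#VAR} expands to the string length in characters
echo "forward primer: ${#FWD} bp"
echo "reverse primer: ${#REV} bp"
```

This prints 19 bp for the forward primer and 20 bp for the reverse, so reads that lost exactly that much were trimmed as intended.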
Then I ran DADA2 (I've tried a whole bunch of settings here to find a compromise between my old runs and these new ones; here is one of many):
```shell
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs reads_qza/kellogg_16S_IMR4reads_trimmed.qza \
  --p-trim-left-f 30 \
  --p-trunc-len-f 270 \
  --p-trim-left-r 30 \
  --p-trunc-len-r 210 \
  --p-max-ee-f 3 \
  --p-max-ee-r 5 \
  --p-n-threads 8 \
  --output-dir dada2_output
```
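When comparing settings across runs, it may help to estimate how much the read pairs still overlap after trimming and truncation. The sketch below is plain bash arithmetic under stated assumptions: `AMPLICON_LEN` (about 370 bp between the primers) is an assumed value, not something measured from this run, and the 12 bp minimum is the default `minOverlap` of DADA2's `mergePairs`; the q2-dada2 version in use may apply a different threshold.

```shell
#!/usr/bin/env bash
# ASSUMPTION: length of the region between the primers (primers already removed
# by cutadapt). Roughly 370 bp is assumed here; real lengths vary by taxon.
AMPLICON_LEN=370

# Settings from the denoise-paired command above
TRUNC_F=270
TRUNC_R=210
TRIM_F=30
TRIM_R=30

# Bases kept per read: trim-left removes from the 5' end, trunc-len cuts at that position
KEPT_F=$(( TRUNC_F - TRIM_F ))
KEPT_R=$(( TRUNC_R - TRIM_R ))

# trim-left removes bases from the outer ends of the amplicon, so it shortens the
# merged sequence but leaves the overlap unchanged (overlap = TRUNC_F + TRUNC_R - AMPLICON_LEN)
MERGED_LEN=$(( AMPLICON_LEN - TRIM_F - TRIM_R ))
OVERLAP=$(( KEPT_F + KEPT_R - MERGED_LEN ))

MIN_OVERLAP=12  # DADA2 mergePairs default (assumed to apply here)
echo "kept: ${KEPT_F} + ${KEPT_R} bp, merged length: ${MERGED_LEN} bp, overlap: ${OVERLAP} bp"
if (( OVERLAP < MIN_OVERLAP )); then
  echo "warning: overlap is below ${MIN_OVERLAP} bp; many pairs may fail to merge"
fi
```

With these settings and the assumed amplicon length, each pair keeps 240 + 180 bp and overlaps by about 110 bp, so merging should not be the limiting factor unless the true amplicon is much longer.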
Any thoughts would be much appreciated! This workflow seems to work fine for the old runs (though I import and demultiplex those differently, using `qiime demux emp-paired`), but it yields many more features for the dual-indexed libraries sequenced a few weeks ago.