Hello Forum!
I have been running into this issue and have yet to find a good solution. Here is an outline:
Seq specs: MiSeq, V4 region (515F-806R primers), PE 250 bp.
QIIME 2 specs: conda installation, version 2019.4, run on a computing cluster.
Our sequencing core demultiplexes the data before returning the reads, and it uses heterogeneity spacers. The reads we receive therefore still contain both the spacers and the primers. Each read looks roughly like this:
R1: 5' Spacer - 5' Primer Sequence (bacterial 16S sequence) - sequence of interest
R2: 3' Spacer - 3' Primer Sequence (bacterial 16S sequence) - sequence of interest
Following the advice of the forum, I used the cutadapt plugin to remove the primers (and with them the spacers), but the DADA2 step then returns essentially 0 reads (commands below):
# trim primers and spacers
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ./raw_artifacts/demux.qza \
  --p-cores 20 \
  --p-front-f GAGTGCCAGCMGCCGCGGTAA \
  --p-front-r ACGGACTACHVGGGTWTCTAAT \
  --p-error-rate 0.1 \
  --p-discard-untrimmed False \
  --o-trimmed-sequences ./primer_trimmed_seqs/trimmed-seqs.qza \
  --verbose
# truncate and denoise with DADA2
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ./primer_trimmed_seqs/trimmed-seqs.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 245 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 220 \
  --p-n-threads 0 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats ./dada2-denoise-dir/stats-dada2.qza \
  --verbose
And here is the head of the output I get from the DADA2 denoising:
And here is the quality visualization:
Now, here is the issue: I have heard conflicting advice from people at my university and here on the forum. Is it best to trim the primers off before DADA2 denoising, or should I leave them on? The consensus on the forum seems to be to remove them. The consensus at my university is to leave them on to help align the sequences: because the primers sit in a conserved region, aligning on that conserved portion highlights the differences in the variable regions.
Should I leave the primers on and trim only the spacers, or do you think something else is going on? Getting effectively zero sequences back after denoising and chimera removal tells me something isn't right.
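One check that may be relevant is whether the reads are still long enough for the truncation settings once the primers are gone. The sketch below is only arithmetic on the numbers from the commands above. It assumes that raw reads are at most 250 nt and that cutadapt removes each primer in full. It also relies on the documented DADA2 behaviour that reads shorter than the truncation length are discarded. The function names are made up for illustration.

```python
READ_LEN = 250                         # PE 250 bp run
FWD_PRIMER = "GAGTGCCAGCMGCCGCGGTAA"   # from --p-front-f
REV_PRIMER = "ACGGACTACHVGGGTWTCTAAT"  # from --p-front-r
TRUNC_LEN_F = 245                      # from --p-trunc-len-f
TRUNC_LEN_R = 220                      # from --p-trunc-len-r


def max_len_after_trim(read_len, primer, spacer_len=0):
    """Longest possible read once the spacer and primer are removed."""
    return read_len - spacer_len - len(primer)


def survives_truncation(length, trunc_len):
    """DADA2 drops reads that are shorter than the truncation length."""
    return length >= trunc_len


# Best case: spacer length 0. Any real spacer makes the reads shorter still.
fwd_max = max_len_after_trim(READ_LEN, FWD_PRIMER)
rev_max = max_len_after_trim(READ_LEN, REV_PRIMER)

print(fwd_max, survives_truncation(fwd_max, TRUNC_LEN_F))  # 229 False
print(rev_max, survives_truncation(rev_max, TRUNC_LEN_R))  # 228 True
```

Under these assumptions no forward read can reach 245 nt after trimming, which would be consistent with nearly every read pair being dropped at the filtering step. The denoising stats would show whether that is where the reads are lost.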
Removing just the spacers is tricky because the sequencing core uses 4 different spacers. I would have to run cutadapt four separate times, once on each set of reads sharing a spacer, and then run DADA2 four times as well. I would expect this to interfere with DADA2's modeling, since the error rates, filtering, chimera removal, etc. would each be fit on a subset of the data rather than on the entire dataset.
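To illustrate the read layout, here is a minimal Python sketch of trimming that is keyed off the primer position rather than off each spacer. It is not cutadapt's algorithm: it matches the primer exactly, with no error rate. The spacers and the 16S insert are hypothetical, and `trim_front` and `keep_primer` are names invented for this example. It mirrors the documented behaviour of a 5' adapter passed via `--p-front-f`, where the adapter and any bases preceding it are removed, so a single pass covers every spacer length.

```python
import re

# IUPAC degenerate base codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}

FWD_PRIMER = "GAGTGCCAGCMGCCGCGGTAA"  # from --p-front-f above


def trim_front(read, primer, keep_primer=False):
    """Cut at the first exact primer match; return the read unchanged if none."""
    pattern = "".join(IUPAC[base] for base in primer)
    match = re.search(pattern, read)
    if match is None:
        return read
    # keep_primer=True drops only the bases 5' of the primer (the spacer).
    cut = match.start() if keep_primer else match.end()
    return read[cut:]


# Hypothetical spacers and 16S insert, invented for illustration only.
SPACERS = ["", "T", "AC", "CGA"]
INSERT = "TACGGAGGGTGCAAGCGTTAATCGG"
PRIMER_SITE = FWD_PRIMER.replace("M", "A")  # one concrete primer variant

reads = [spacer + PRIMER_SITE + INSERT for spacer in SPACERS]
trimmed = [trim_front(read, FWD_PRIMER) for read in reads]
spacer_only = [trim_front(read, FWD_PRIMER, keep_primer=True) for read in reads]

print(all(t == INSERT for t in trimmed))                    # True
print(all(s == PRIMER_SITE + INSERT for s in spacer_only))  # True
```

In this toy case all four spacer variants collapse to the same trimmed read in one pass, so the whole dataset could go through DADA2 together. Whether spacer-only trimming (the `keep_primer=True` case) is available in the cutadapt plugin itself is not something this sketch shows.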
Any help is greatly appreciated!