Greetings Qiime2 community,
I am running qiime2 amplicon 2026.1 in a conda environment.
I am new to qiime2, and learning a lot using the great tutorials to analyze mock libraries before analyzing our lab's experimental 16S data.
Currently I am analyzing a MiSeq mock library. My understanding of MiSeq read structure is that the PCR primers used to generate the amplicons are directly adjacent to the insert (grey), and the Illumina indices, sequencing primers, etc. are distal to the insert, as shown in this image:
To analyze sequencing data, I could trim the PCR primers used to generate the amplicons (such as 515f and 806r) and that would remove the sequencing primers (orange and blue), the indices (green), and the adapters (?) in red and black.
As indicated in this q2 link, it would be important to remove non-biological sequence by trimming before analyzing reads. That makes sense to me, because taxonomy classifiers would frequently misclassify reads that include non-biological sequence.
My question is about primer trimming in the qiime2 16S tutorials. I'm afraid I'm missing something obvious, due to my inexperience with bioinformatics / q2. The trimming suggestions in the tutorials seem like they would be leaving a lot of non-biological sequence (untrimmed primers/adapters/etc).
For example, in the gut-to-soil tutorial: "16S rRNA gene was amplified using the F515-R806 primers .... Paired-end sequencing was performed on an Illumina MiSeq". The tutorial recommends using the quality scores viewed in demux.qzv to choose trimming/truncation values, and the suggested values are as below:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux.qza \
--p-trim-left-f 0 \
--p-trunc-len-f 250 \
--p-trim-left-r 0 \
--p-trunc-len-r 250 \
This would leave a lot of non-biological sequence on the 5' end of the forward and reverse reads, if I'm understanding correctly.
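To make my reading of these parameters concrete, here is a minimal sketch of positional trimming on a single read. It assumes the behaviour described in the DADA2 documentation: trim-left drops the first N bases, trunc-len cuts the read at position N, and reads shorter than trunc-len are discarded. The function name `positional_trim` is my own and is not part of q2 or DADA2.

```python
from typing import Optional


def positional_trim(read: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Toy model of position-based trimming (hypothetical helper, not a q2 API).

    trunc_len == 0 is treated as "no truncation".
    Returns None when the read would be discarded.
    """
    if trunc_len:
        if len(read) < trunc_len:
            return None           # too short to truncate: read is dropped
        read = read[:trunc_len]   # keep positions 0 .. trunc_len-1
    return read[trim_left:]       # drop the first trim_left bases


# Nothing here looks at the sequence itself, only at positions.
kept = positional_trim("ACGTACGTAC", trim_left=3, trunc_len=8)
dropped = positional_trim("ACGT", trim_left=0, trunc_len=8)
```

The point of the sketch is that these options are not sequence-aware: with trim-left set to 0, whatever is at the 5' end of the read stays there.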
I checked some of the other tutorials to see if I could learn more suggestions about primer trimming with MiSeq data.
The Atacama soil tutorial also uses MiSeq data, and suggests trimming only the first 13 nt on the 5' ends, which is insufficient to remove all non-biological sequence, if I'm understanding correctly.
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux.qza \
--p-trim-left-f 13 \
--p-trim-left-r 13 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
The Parkinson's mouse tutorial also uses 16S MiSeq data, and does not trim any 5' primer sequence. The Moving Pictures tutorial uses HiSeq data, which according to the Illumina image link above, has the same read structure as MiSeq. That tutorial also does not trim any nucleotides from the 5' end of reads.
All of the tutorials are using DADA2 during the read truncation step, and the sequences have already been demultiplexed. Is it possible that indices etc are trimmed during the demux step when raw data is imported into q2? I see that if cutadapt is used to demux, then primers are automatically removed, but it seems like none of the q2 tutorials I looked at are using cutadapt to demux. Also, the DADA2 16S tutorial indicates that data should already have primers removed, so I would assume that DADA2 is not automatically removing primers during the denoising step in the q2 tutorials either.
I could use grep to check my raw seq data before and after importing to check for primer seq etc., but it would be reassuring to have some experienced folks chime in (ha) so that I don't just rely on my own command line experiments to make sense of things.
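For reference, this is roughly the check I have in mind, sketched in Python rather than grep so that the degenerate bases (Y, M, etc.) in the primer are handled. It tallies the offset at which the primer first appears in each read, so a pile-up at offset 0 would mean the reads start with the primer, and a pile-up further in would mean something precedes it. The helper names and the toy records are my own; exact matching only, no mismatches allowed.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_offsets(fastq_lines, primer):
    """Count reads by the 0-based offset of the first primer hit (None = no hit)."""
    pattern = primer_regex(primer)
    counts = {}
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:            # the sequence is the 2nd line of each 4-line record
            continue
        hit = pattern.search(line.strip().upper())
        offset = hit.start() if hit else None
        counts[offset] = counts.get(offset, 0) + 1
    return counts


# Toy records (made up for illustration): primer at offset 0, absent, and at offset 4.
seqs = [
    "GTGCCAGCAGCCGCGGTAATACGGAGG",
    "TACGGAGGGTGCAAGCGTTA",
    "ACGTGTGTCAGCCGCCGCGGTAATAC",
]
toy_fastq = []
for n, seq in enumerate(seqs):
    toy_fastq += [f"@read{n}", seq, "+", "I" * len(seq)]

counts = primer_offsets(toy_fastq, "GTGYCAGCMGCCGCGGTAA")
```

On real data the lines would come from something like `gzip.open(path, "rt")` on the raw files, or on the per-sample fastq.gz files written by `qiime tools export` from demux.qza.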
I see lots of folks in the forum using cutadapt in qiime2 in order to trim primers in a way that makes more sense to me, for example:
qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux.qza \
--p-cores 4 \
--p-front-f GTGYCAGCMGCCGCGGTAA \
--p-front-r CCGYCAATTYMTTTRAGTTT \
--p-match-adapter-wildcards \
--p-match-read-wildcards \
--p-discard-untrimmed \
--o-trimmed-sequences demux_trimmed.qza
This would remove the amplicon PCR primers like 515f and 806r, and also the distal sequences mentioned above, leaving only biological sequence for downstream steps.
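Here is a toy model of what I understand a 5' ("front") adapter search to do: find the primer, remove it together with everything upstream of it, and drop reads where it is not found when discard-untrimmed is set. This is only a sketch under that assumption. It uses exact matching with degenerate bases, whereas cutadapt itself also tolerates some mismatches and partial matches, and `trim_front` is a made-up name, not a cutadapt or q2 function.

```python
import re
from typing import Optional

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def trim_front(read: str, primer: str, discard_untrimmed: bool = True) -> Optional[str]:
    """Remove the first primer hit and everything before it (hypothetical helper).

    Returns None when the primer is absent and discard_untrimmed is True.
    """
    pattern = re.compile("".join(IUPAC[base] for base in primer.upper()))
    hit = pattern.search(read.upper())
    if hit is None:
        return None if discard_untrimmed else read
    return read[hit.end():]       # keep only what follows the primer


primer_f = "GTGYCAGCMGCCGCGGTAA"
# Made-up read: 4 upstream bases, then the primer, then the insert.
trimmed = trim_front("ACGT" + "GTGTCAGCCGCCGCGGTAA" + "TACGGAGG", primer_f)
no_primer = trim_front("TACGGAGG", primer_f)
```

Unlike positional trimming, this depends on the sequence, so it does not matter how many bases sit upstream of the primer.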
My questions are:
- Does the process of importing/demuxing of samples into q2 as shown in the tutorials (e.g. NOT demux using cutadapt) also remove common Illumina adapters and I just didn't realize it?
- Is that why in the tutorials, the demuxed reads are not trimmed, and primer sequence removal is not described?
- If the above is true, then why are folks advised to also use cutadapt to remove primers from data that has already been imported and demuxed by q2, for example here? Seems like it should be one or the other; not both.
Thanks for any info or suggestions you might have!


