I have a 16S dataset consisting of demultiplexed Illumina paired-end reads, each 250 bp, generated with primers 341F CCTACGGGNGGCWGCAG (17 nt) and 805R GACTACHVGGGTATCTAATCC (21 nt). I successfully imported the data using a manifest and was able to visualize quality scores for the samples.
Initially I planned to use dada2 without any truncation, but after reading further about the truncation parameters and looking at my data, I now think some truncation might be a good idea. However, with reads of only 250 bp each (500 bp combined) and a total amplicon length of approx. 464 bp, the reads overlap by only 36 bp. Assuming a minimum of 20 bp is needed for overlap, this leaves only 16 bp to truncate across both reads.
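To spell out that arithmetic, here is a minimal Python sketch. The function names are hypothetical, and the 20 bp minimum overlap is my own assumption rather than a value taken from the dada2 documentation.

```python
def expected_overlap(read_len_f, read_len_r, amplicon_len):
    """Bases covered by both the forward and the reverse read of one amplicon."""
    return read_len_f + read_len_r - amplicon_len


def truncation_budget(read_len_f, read_len_r, amplicon_len, min_overlap=20):
    """Total bases (forward + reverse combined) that could be truncated
    while still keeping min_overlap bases of overlap.
    min_overlap=20 is an assumed value, not a documented requirement."""
    return expected_overlap(read_len_f, read_len_r, amplicon_len) - min_overlap


# Numbers from my dataset: 2 reads of 250 bp, amplicon of approx. 464 bp
overlap = expected_overlap(250, 250, 464)
budget = truncation_budget(250, 250, 464)

print(f"expected overlap: {overlap} bp")
print(f"truncation budget: {budget} bp")
```

With my numbers this gives an overlap of 36 bp and a budget of 16 bp, and it does not account for length variation between taxa, so the real margin could be smaller for longer amplicons.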
I also realized that the sequencing service didn't trim the primers from the 5' end of the reads, so the 22 nt forward sequence (the 17 nt primer plus what seems to be a 5 nt adapter before it) and the 21 nt reverse primer are included in the 250 bp of each read. I decided to trim these anyway using the following cutadapt command:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences G1_pairedend.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-match-adapter-wildcards \
  --o-trimmed-sequences G1_cut_pairedend.qza
But I'm wondering if I can even use these trimmed sequences at all, since I think there will now be a gap between the forward and reverse reads when merging. It also seems that using cutadapt reduced sequence quality to some extent, based on the interactive quality plot, but maybe I am misinterpreting this.
Here are some images of data quality scores:
So how much wiggle room do I have? I don't know if I can remove the adapters with cutadapt or do much truncation in dada2 without hindering sample merging across the board. Some advice would be appreciated!