Dada2 Low Feature Counts with Paired-End Reads

Hi, the typical length of the 16S rRNA V4 region is about 250 bp. We usually sequence the amplicon on Illumina platforms, such as MiSeq, using paired-end sequencing. After sequencing, we merge the forward and reverse reads. Check this thread for a graphical illustration.

You should have an overlapping region of at least 20 bp when merging your forward and reverse reads. In practice, you should try to maximize the length of the overlapping region, as a longer overlap reduces the sequencing error rate in your merged sequences. To calculate the overlap length, add the retained lengths of the two reads and subtract the expected amplicon length: overlap = (trunc-len-f - trim-left-f) + (trunc-len-r - trim-left-r) - expected amplicon length. For the numbers that Ben suggested, the calculation is: (180 - 40) + (180 - 40) - 250 = 30 bp.
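If you want to try different truncation/trim settings quickly, the same arithmetic can be written as a small Python helper. This is just a sketch of the formula above; the function name `merged_overlap` and the constant `MIN_OVERLAP` are my own and are not part of QIIME 2 or DADA2.

```python
MIN_OVERLAP = 20  # rule-of-thumb minimum overlap (bp) mentioned above


def merged_overlap(trunc_len_f, trim_left_f, trunc_len_r, trim_left_r, amplicon_len):
    """Expected overlap (bp) between forward and reverse reads.

    Each read keeps (trunc_len - trim_left) bases; whatever the two
    retained reads cover beyond the amplicon length is the overlap.
    """
    kept_f = trunc_len_f - trim_left_f
    kept_r = trunc_len_r - trim_left_r
    return kept_f + kept_r - amplicon_len


# Ben's suggested settings with a ~250 bp V4 amplicon
overlap = merged_overlap(180, 40, 180, 40, 250)
print(overlap, "bp overlap; enough to merge:", overlap >= MIN_OVERLAP)
```

A negative result would mean the retained reads do not reach each other at all, so merging cannot work with those settings.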

Your forward and reverse reads are of high quality. For the truncation lengths, I suggest using 240 bp for both forward and reverse reads (--p-trunc-len-f 240, --p-trunc-len-r 240). The first 35 bp of your reads look peculiar to me. That is longer than the typical length of the universal primer sets targeting the V4 region. You can inspect your reads using FastQC to find out whether the adapter/primer sequences have already been trimmed. After that, just use --p-trim-left-f and --p-trim-left-r to trim off your adapter/primer sequences.
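To show how these options fit together, here is a rough Python sketch that assembles a `qiime dada2 denoise-paired` command line. The file names (`demux.qza`, `table.qza`, `rep-seqs.qza`, `denoising-stats.qza`) and the trim-left values of 20 are placeholders I made up for illustration; replace the trim values with the adapter/primer lengths you actually find in your reads.

```python
import shlex


def denoise_paired_command(demux, trunc_len_f, trunc_len_r, trim_left_f, trim_left_r):
    """Build the argument list for `qiime dada2 denoise-paired`.

    Output file names are hypothetical placeholders.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--p-trim-left-f", str(trim_left_f),
        "--p-trim-left-r", str(trim_left_r),
        "--o-table", "table.qza",
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-denoising-stats", "denoising-stats.qza",
    ]


# 240 bp truncation as suggested above; trim-left values are placeholders only
cmd = denoise_paired_command("demux.qza", 240, 240, 20, 20)
print(shlex.join(cmd))  # copy the printed line into your shell
```

Printing the command rather than running it lets you check the values before committing to a long denoising run.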

Hope that helps.
-Yanxian
