uBiome paired-end reads will not join!

Hi all, I'm running into similar issues when running uBiome data through DADA2. The .fastq files were already demultiplexed, and to construct each F and R file I concatenated the .fastq files from four separate lanes. Here's what my quality plots looked like after import:

I'm new to this, but the sequence quality looks good, no? I read that uBiome trims barcodes, primers and linkers, but I couldn't find whether they do any additional quality filtering before packaging the .fastq files for download.

Here's what I ran:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trim-left-f 1 \
--p-trim-left-r 1 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza

And this is what the output looked like:

Any idea what happened here? Should I be setting the trim and trunc parameters differently? The amplified region is ~254bp.

Thanks and all the best,

I forgot to include the demultiplexed sequence counts summary. Here it is:

Let me know if any additional information would be helpful.

Hi Charlie,
Your reads are clearly not long enough to successfully join.

I am guessing you may be targeting the V4 region? 150 + 150 nt is not quite enough to span this ~290 nt amplicon and still leave 20 nt of overlap (the default requirement for dada2 to join the reads).
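
To make the arithmetic explicit, here is a rough sketch using the numbers from this thread (150 nt truncation on each read, a ~290 nt amplicon, and a 20 nt minimum overlap). The variable names are just for illustration. Note that trimming from the left does not change the result, since it removes bases from the outer ends of the amplicon rather than from the overlapping region.

```shell
# Numbers taken from this thread; adjust for your own data.
TRUNC_F=150        # --p-trunc-len-f
TRUNC_R=150        # --p-trunc-len-r
AMPLICON_LEN=290   # approximate amplicon length discussed above
MIN_OVERLAP=20     # minimum overlap quoted above

# Expected overlap = combined read length minus amplicon length.
OVERLAP=$(( TRUNC_F + TRUNC_R - AMPLICON_LEN ))
echo "expected overlap: ${OVERLAP} nt (need >= ${MIN_OVERLAP} nt)"

if [ "$OVERLAP" -ge "$MIN_OVERLAP" ]; then
  VERDICT="should join"
else
  VERDICT="too short to join"
fi
echo "$VERDICT"
```

With these values the expected overlap is 10 nt, which falls short of the 20 nt threshold.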

Your only option now will be to use the forward reads and proceed as if you have single-end data.
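
For reference, the forward-only route might look roughly like the sketch below. It assumes the same paired-end-demux.qza artifact as input (denoise-single accepts paired-end data and uses only the forward reads) and simply reuses the trim and truncation values from the original command. The output file names are placeholders, not anything from the original post.

```shell
# Trim/truncation values copied from the paired-end command in this thread.
TRIM_LEFT=1
TRUNC_LEN=150

# Only run if QIIME 2 is on the PATH (i.e. the environment is activated).
if command -v qiime >/dev/null 2>&1; then
  qiime dada2 denoise-single \
    --i-demultiplexed-seqs paired-end-demux.qza \
    --p-trim-left "$TRIM_LEFT" \
    --p-trunc-len "$TRUNC_LEN" \
    --o-table table-single.qza \
    --o-representative-sequences rep-seqs-single.qza \
    --o-denoising-stats denoising-stats-single.qza
else
  echo "qiime not found; activate your QIIME 2 environment first"
fi
```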

Good luck!

Hi Nicholas,

Thanks for the quick reply. Yes, it’s the V4 region (515F/806R). Out of curiosity, where are you getting the 290bp number from? ~250bp is the number I’ve seen before, but browsing the literature a bit, I’m seeing both ~250bp and 300+bp. Why the discrepancy - what am I missing?

That makes sense that dada2 wouldn’t be able to join the reads if the overlap is so small, but if that’s the case, any idea how uBiome joins reads for its in-house analysis?


From the positions of those primers on the E. coli 16S rRNA reference sequence (that's what the numbers 515 and 806 refer to): 806 - 515 = 291

But that's just the reference, and there is a little bit of variation to either side for different species.

~250 is after the primers have been removed, but again there is still length variation to either side. Hence all the variation in the literature.

As for how uBiome joins the reads in-house: probably by permitting a lower minimum overlap? (This could lead to inappropriate joining.)

You can also try qiime vsearch join-pairs to see if you get more successful read joining.

Please note though, if you go this route, you will not be able to use q2-dada2 for denoising, since the joined reads' quality scores invalidate the error model.
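
A minimal sketch of the vsearch route, in case it helps. It assumes a QIIME 2 release in which the action is still called join-pairs (I believe later releases rename it, so check qiime vsearch --help for your version). The minimum overlap value and the output file name are placeholders for illustration.

```shell
# Minimum overlap to require when joining; 10 is only an example value.
MIN_OVERLAP=10

# Only run if QIIME 2 is on the PATH (i.e. the environment is activated).
if command -v qiime >/dev/null 2>&1; then
  qiime vsearch join-pairs \
    --i-demultiplexed-seqs paired-end-demux.qza \
    --p-minovlen "$MIN_OVERLAP" \
    --o-joined-sequences joined.qza
else
  echo "qiime not found; activate your QIIME 2 environment first"
fi
```

Keep in mind that a low minimum overlap makes spurious joins more likely, so it is worth checking how many reads join and whether the joined lengths look sensible.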

Thank you both for your responses, that all makes sense! I think I’ll proceed with just the forward reads.
