Trouble merging paired-end reads

Dear QIIME 2 users, I'm new to the community and to the software itself, so I have a bunch of questions that the tutorials could not answer. When I heard about the denoising algorithms, I wanted to use them on my data, because I think that clustering into OTUs wastes information and, now that we have new strategies, they should be used and popularized.

Well, I just started a microbiome project. It was sequenced with a V3 2x250 library on the Illumina MiSeq platform. I've imported my demultiplexed FASTQs, already trimmed of adapters and PhiX sequences, into a QIIME 2 artifact. The overall sequence quality of my samples is:

I think the quality of my R2 reads is not that good, but I used the following command to denoise my data with DADA2:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs Bacteria-paired.qza \
  --p-trunc-len-f 220 \
  --p-trunc-len-r 200 \
  --p-n-threads 5 \
  --p-n-reads-learn 100000 \
  --o-table Dada2/Bacteria_table2.qza \
  --o-representative-sequences Dada2/Bacteria_repseq2.qza \
  --o-denoising-stats Dada2/Bacteria_denoising2.qza

But I got the following results:

Although DADA2 was able to denoise my data, it was not able to merge my R1/R2 reads. Does anyone here know how to proceed?

Thank you for your time.

Hi @Bruno_Andrade,

As far as I know, the V3 region is about 150 to 170 bp long, which makes me wonder how your reads can be 250 bp long. The combined R1 and R2 reads would be longer than the fragment you sequenced.

Are you sure you trimmed all adapters properly? Which method did you use for trimming? And which primer pair did you use for your library preparation? Could it be that you actually sequenced V3 and V4?

Also, I found that if I don't trim the forward primers from R1 and the reverse primers from R2, DADA2 removes a large portion of my reads as chimeric. So it might be a good idea to add --p-trim-left-f and --p-trim-left-r to your command to prevent future problems.

Kind regards,
Roger


Hi @Bruno_Andrade,
Welcome aboard!
Thanks for providing us with the images and your initial investigation; that always helps with troubleshooting.

I think @Roger_Huerlimann is right on point with his question:

I believe when you said V3 2x250 you meant the V3 Illumina chemistry kit and not the V3 region, correct?
And if you did indeed target a long region such as V3-V4, it is going to be rather difficult to merge the reads with the 2x250 bp method, since that would give you only ~500 - 460 (V3-V4 amplicon) = 40 bp of overlap. With your existing truncation parameters you are already cutting 80 bp in total (30 bp from R1 and 50 bp from R2), which leaves you with no overlap, hence the failure to merge.
With DADA2 you are going to need a minimum of 20 bp of overlap plus whatever natural length variation you expect in your target amplicon; let's say another 10 bp.
If your data can't meet the minimum overlap requirement using paired ends, your best option is simply to use your forward reads.
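The overlap arithmetic above can be checked with a small shell sketch. The function names `expected_overlap` and `can_merge` are hypothetical helpers, not part of QIIME 2, and the ~460 bp amplicon length and the 20 bp + ~10 bp overlap requirement are the estimates from this reply, not measured values.

```shell
#!/usr/bin/env bash
# Rough merge-feasibility check for paired-end amplicons.
# Assumptions: ~460 bp V3-V4 amplicon (estimate from this thread) and a required
# overlap of 20 bp minimum + ~10 bp buffer for natural length variation.

# expected_overlap TRUNC_F TRUNC_R AMPLICON_LEN
# Prints the overlap (bp) left after truncation; a negative value means the reads never meet.
expected_overlap() {
  local trunc_f=$1 trunc_r=$2 amplicon=$3
  echo $(( trunc_f + trunc_r - amplicon ))
}

# can_merge TRUNC_F TRUNC_R AMPLICON_LEN [MIN_OVERLAP] [LENGTH_VARIATION]
# Exit status 0 if the remaining overlap covers the minimum plus the variation buffer.
can_merge() {
  local overlap
  overlap=$(expected_overlap "$1" "$2" "$3")
  local min_overlap=${4:-20} variation=${5:-10}
  (( overlap >= min_overlap + variation ))
}

# Untruncated 2x250 reads: 250 + 250 - 460 = 40 bp of overlap.
echo "untruncated overlap: $(expected_overlap 250 250 460) bp"
# Truncation used in the original command (220 + 200): 420 - 460 = -40 bp, i.e. no overlap.
echo "truncated overlap: $(expected_overlap 220 200 460) bp"

if can_merge 220 200 460; then
  echo "paired-end merging looks feasible"
else
  echo "not enough overlap: consider denoising the forward reads only"
fi
```

With the original 220/200 truncation the check fails, which matches the merge failure, while untruncated 2x250 reads would leave about 40 bp, just above the ~30 bp target. If the check fails, my understanding is that `qiime dada2 denoise-single` accepts the same paired-end artifact and uses only the forward reads, but it is worth confirming against the q2-dada2 help for your installed version.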

Also a good point! Though if your adapters have already been trimmed prior to DADA2 (as you mentioned they have), you don't need to do anything there. FYI, DADA2 has its own PhiX filter, so in the future you won't have to worry about that either.


Thank you for your reply @Mehrbod_Estaki.
Yes, when I said V3 2x250 I was talking about the V3 Illumina kit.
My amplicon library contains portions of the V3-V4 region of the bacterial 16S gene. I was afraid that the overlapping region would not be enough to merge the reads after truncation and, yes, when I used the forward reads as single-end I was able to recover a huge amount of ASVs. It's a pity not to use the full amplicon fragment, though…


Thank you for your help @Roger_Huerlimann.

When I said V3 2x250 I was talking about the Illumina sequencing kit/library preparation kit and not about the hypervariable regions. My primer set targets the V3-V4 region, and the sequences are:

Forward - CCTACGGGNGGCWGCAG
Reverse - GACTACHVGGGTATCTAATCC

I removed adapters from my reads using cutadapt, and I could not find any of them in the FastQC reports afterwards.
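Following the earlier suggestion to add `--p-trim-left-f` and `--p-trim-left-r`, the trim values for these primers are simply their lengths. This is a minimal sketch, assuming the primers are still present at the 5' end of each read (cutadapt was used here for adapters, so it is worth checking whether the primers were removed too); the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Derive --p-trim-left-f / --p-trim-left-r values from the primer sequences.
# Assumption: each primer sits at the very start of its read and was not already
# removed, so trimming exactly the primer length removes it.
# Degenerate IUPAC codes (N, W, H, V) still occupy one position each,
# so the plain string length is the right count.

FWD_PRIMER="CCTACGGGNGGCWGCAG"
REV_PRIMER="GACTACHVGGGTATCTAATCC"

trim_left_f=${#FWD_PRIMER}   # forward primer length
trim_left_r=${#REV_PRIMER}   # reverse primer length

printf '%s\n' "--p-trim-left-f ${trim_left_f} --p-trim-left-r ${trim_left_r}"
```

For these primers this gives 17 and 21, which could be added to the denoise-paired command alongside the truncation options.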

Kind regards,
Bruno.


Hi @Bruno_Andrade
Glad you're able to get the data to work.

Indeed, though the drop in resolution will not be as bad as you may think. The trick with longer regions is that you want to make sure you are sequencing 2x300 for proper overlap; otherwise it is actually a waste of money (imo).
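The point about 2x300 can be made concrete with the same overlap arithmetic: the total number of bases that can be truncated from R1 and R2 combined is 2 * read length - amplicon length - required overlap. This sketch assumes the thread's ~460 bp amplicon and ~30 bp required overlap; `truncation_budget` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# How many bases can be truncated in total (R1 + R2) while the reads can still merge?
# budget = 2 * read_length - amplicon_length - required_overlap
# Assumptions: ~460 bp V3-V4 amplicon and 30 bp required overlap (20 bp + ~10 bp buffer),
# both taken from the estimates earlier in this thread.

truncation_budget() {
  local read_len=$1 amplicon=$2 required=$3
  echo $(( 2 * read_len - amplicon - required ))
}

# Compare a 2x250 run with a 2x300 run.
for read_len in 250 300; do
  printf '2x%s: %s bp of total truncation allowed\n' \
    "$read_len" "$(truncation_budget "$read_len" 460 30)"
done
```

Under these assumptions a 2x250 run leaves only about 10 bp of truncation room (the original command truncated 80 bp), while 2x300 leaves about 110 bp for removing low-quality read tails.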
Good luck!
