merging R1 and R2 reads with mixed-orientation files

azel2 · July 9, 2019, 5:48pm

Hello qiime2 community,

I have a couple of questions regarding paired-end reads and the joining/merging step with dada2.

I am working with fastq files from a paired-end MiSeq 16S amplicon run (V4 region). These files have a format I have not seen before: the R1.fastq and R2.fastq files seem to contain a mixture of orientations (the R1.fastq file contains both forward and reverse reads, as does the R2.fastq file).

For example, within the R1.fastq file, the first sequence ID is

@D00420:170:HYC5WBCXY:1:1101:1457:2198 1:N:0:AGTCAA

And the second sequence ID is

@D00420:170:HYC5WBCXY:1:1101:1616:2223 2:N:0:AGTCAA
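For anyone checking their own files, here is a small sketch. It assumes the standard Illumina CASAVA 1.8+ header layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`), in which the first field after the space is the read number (1 or 2). The Python code pulls that field out of the two headers above. The function name `parse_illumina_header` is just illustrative, and some pipelines rewrite headers, so treat the field layout as an assumption to verify against your own data. The two headers also have different x:y coordinates, so they come from different clusters rather than being mates of each other.

```python
def parse_illumina_header(header: str) -> dict:
    """Split a CASAVA 1.8+ style FASTQ header into named fields (assumed layout)."""
    # Drop the leading '@' and split the read ID from the description after the space.
    read_id, description = header.lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read_number, is_filtered, control, index = description.split(":")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read_number": int(read_number),  # 1 = first sequencing read, 2 = second
        "is_filtered": is_filtered == "Y",
        "control": int(control),
        "index": index,
    }


headers = [
    "@D00420:170:HYC5WBCXY:1:1101:1457:2198 1:N:0:AGTCAA",
    "@D00420:170:HYC5WBCXY:1:1101:1616:2223 2:N:0:AGTCAA",
]
for h in headers:
    fields = parse_illumina_header(h)
    print(f"x:y = {fields['x']}:{fields['y']}, read number = {fields['read_number']}")
```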

I have searched the qiime2 forum, and the closest thread to my issue was one posted by Martin in December 2017. I am wondering if we used the same sequencing center.

However, his main problem was that he still had barcodes in his sequences, which made it difficult for him to import the data into qiime2. Mine do not seem to have any non-biological sequences (barcodes, primers, or linkers) within the sequences themselves.

I went ahead and imported and ran a complete analysis with these files using qiime2, version 2019.4, and everything ran smoothly. I obtained a feature table that seems consistent with what I was expecting from the samples I worked with.

I have two main questions:

1. Based on the sequence IDs, am I correct in interpreting that these files contain both forward and reverse reads?
2. If the files are mixed, will the joining step in dada2 (the denoiser I used for this analysis) correctly pair sequences from the R1.fastq and R2.fastq files, even though each file contains a mix of forward and reverse reads?

I have always thought that R1.fastq must contain only forward reads, that R2.fastq must contain only reverse reads, and that all sequences in the R2.fastq file are reverse complemented during the joining step. I have never heard of mixed files like the ones I am working with, and I am hoping someone out there has experience with this type of data and can provide some clarification.
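The joining step described above can be pictured with a simplified sketch. Reverse-complement R2, then look for an exact overlap between the end of R1 and the start of the reverse-complemented R2. This is only an illustration under idealized assumptions: reads are error-free, matching is exact, and `merge_pair` and `min_overlap` are hypothetical names. DADA2 itself merges denoised sequences with its own alignment-based step, so this is not its implementation.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 12) -> Optional[str]:
    """Join R1 with reverse-complemented R2 by exact end-to-start overlap (toy model)."""
    r2_rc = reverse_complement(r2)
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for overlap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-overlap:] == r2_rc[:overlap]:
            return r1 + r2_rc[overlap:]
    return None  # no acceptable overlap: the pair cannot be joined


# Toy 22 bp "amplicon": R1 reads its start, R2 reads the opposite strand from its end.
amplicon = "GATTACACCGGTAGCTTAGCCA"
r1 = amplicon[:16]
r2 = reverse_complement(amplicon[6:])
print(merge_pair(r1, r2, min_overlap=8) == amplicon)  # True
```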

Thank you for any info you can provide!
Anna

Nicholas_Bokulich · July 11, 2019, 12:17pm

Yes, that is my interpretation as well.

As for the joining step, it should work as long as the "forward" and "reverse" files contain the correct pairs. If you did not get an error, I would not worry about it.
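One quick way to confirm that the two files contain the correct pairs is to walk both files in lockstep and compare the read IDs, meaning the header text before the space. Under the CASAVA 1.8+ layout these IDs should be identical for mates even when the read-number field after the space differs. The sketch below assumes uncompressed FASTQ with exactly four lines per record; for `.fastq.gz` files you could pass handles from `gzip.open(path, "rt")`. `fastq_ids` and `check_pairs` are hypothetical helper names, and the records in the example are made up.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO


def fastq_ids(handle: TextIO) -> Iterator[str]:
    """Yield each record's read ID: header text after '@' and before the first space."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 0:  # lines 0, 4, 8, ... are record headers
            yield line.rstrip("\n")[1:].split(" ", 1)[0]


def check_pairs(r1_handle: TextIO, r2_handle: TextIO) -> int:
    """Count record pairs, raising ValueError at the first ID mismatch or length difference."""
    count = 0
    # zip_longest pads the shorter file with None, so unequal record counts also fail.
    for id1, id2 in zip_longest(fastq_ids(r1_handle), fastq_ids(r2_handle)):
        count += 1
        if id1 != id2:
            raise ValueError(f"record {count}: R1 id {id1!r} != R2 id {id2!r}")
    return count


# Made-up records: mates share the ID but differ in the read-number field.
r1 = io.StringIO("@pair1 1:N:0:AGTCAA\nACGT\n+\nIIII\n@pair2 2:N:0:AGTCAA\nTTGC\n+\nIIII\n")
r2 = io.StringIO("@pair1 2:N:0:AGTCAA\nGGCA\n+\nIIII\n@pair2 1:N:0:AGTCAA\nCCAA\n+\nIIII\n")
print(check_pairs(r1, r2))  # 2
```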

Perhaps your sequencing center is finding reads in the forward-read file that are actually in the reverse orientation and swapping them with their mates in the reverse-read file? Otherwise, I am not sure why your forward and reverse files would contain a mix of forward and reverse reads. If they are doing that, perfect! You should find out what they did, because it may be useful to others on this forum: every now and then, mixed-orientation reads cause problems for forum users.
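Here is a hypothetical sketch of that swapping idea. It compares the k-mers of each R1 read with a reference amplicon and with the reference's reverse complement. When R1 shares more k-mers with the reverse strand, it swaps the mates. This only illustrates the idea; we don't know what the sequencing center actually does. `orient_pairs`, the toy reference, and `k=8` are all assumptions. The toy reference uses only A and C so that its two strands share no k-mers, which keeps the example unambiguous.

```python
from typing import List, Set, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_pairs(
    pairs: List[Tuple[str, str]], reference: str, k: int = 8
) -> List[Tuple[str, str]]:
    """Return pairs with R1 in the reference's orientation, swapping mates where needed."""
    fwd = kmers(reference, k)
    rev = kmers(reverse_complement(reference), k)
    oriented = []
    for r1, r2 in pairs:
        r1_kmers = kmers(r1, k)
        # R1 matches the reverse strand better: treat this pair as flipped.
        if len(r1_kmers & rev) > len(r1_kmers & fwd):
            r1, r2 = r2, r1
        oriented.append((r1, r2))
    return oriented


reference = "ACCAACACCCAAACACACCAACCCAAACAC"  # toy, hypothetical reference
good = (reference[:12], reverse_complement(reference[18:]))
flipped = (good[1], good[0])
print(orient_pairs([good, flipped], reference) == [good, good])  # True
```

Real amplicons can share some k-mers between strands, so a practical version would need a margin or a proper alignment rather than a bare count comparison.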

azel2 · July 11, 2019, 5:28pm

Thank you, Nicholas! Yes, I was able to run everything without errors, so qiime2/dada2 must be able to merge these reads appropriately.

I’m not exactly sure how the center generates these files. They sent me a protocol with the following passage, which is supposed to explain why this happens, but I don’t fully understand it. Perhaps others can clarify? I asked the sequencing center for more detail, but they simply referred me back to the written protocol, so unfortunately I wasn’t able to learn more from them.

To keep amplification bias to a minimum, we do not use long concatemer primers as part of Illumina data (i.e., 50 bp of linker and barcode plus a 20 bp primer). Instead, we create actual libraries out of each of our individual amplicons. This results in the amplicons being found in both the usual 5’-3’ orientation and the 3’-5’ orientation in the R1 and R2 files; this is normal for ligated libraries. Note that R1 and R2 are both in the 5’-3’ orientation as raw files.

Nicholas_Bokulich · July 11, 2019, 5:33pm

Sounds like they may be doing what I suspected.

What a helpful sequencing center! They delivered your data in a way that let everything run seamlessly. Usually, mixed-orientation reads create havoc with denoising/clustering, and sometimes with taxonomy classification.
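Mixed orientations cause trouble because exact-sequence methods treat a read and its reverse complement as two different sequences, so one amplicon can be split into two features. The small illustration below counts distinct sequences as they are, then counts them again after mapping each read to a canonical orientation: the lexicographically smaller of the sequence and its reverse complement. The helper names are hypothetical, and this is not part of DADA2 or QIIME 2.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one fixed representative for a sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


# Toy reads: the same amplicon read twice forward and once in reverse orientation.
amplicon = "GATTACACCGGTAGC"
reads = [amplicon, amplicon, reverse_complement(amplicon)]

raw_features = Counter(reads)
canonical_features = Counter(canonical(r) for r in reads)
print(len(raw_features), len(canonical_features))  # 2 1
```

The canonical form only helps with counting. For taxonomy you would still want every sequence in the same biological orientation, for example by orienting reads against a reference before classification.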

azel2 · July 11, 2019, 8:20pm

Thank you so much for your help, Nicholas! I appreciate the clarification and now feel confident that my run went well. Thank you!

system · August 12, 2019, 2:31am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.