Analyzing only the forward sample with mixed orientation reads

aiaoz · November 7, 2022, 4:25pm

Hi all,
I have paired-end sequencing data for the V4 region. The reads are too short to be merged, with an overlap of at most 4 bp. So I took this forum's advice and tried to analyze the forward reads only.
However, it turns out that I have both forward and reverse orientations in the same file. Moreover, they appear to come from both ends of the V4 region, i.e. the forward reads cover the first half of the region and the reverse reads cover the second half.
I have a third file for each sample, called an index file. I hope it might be helpful, but I haven't figured out how yet. It looks like this:

@M01759:102:000000000-KCKNN:1:1102:17856:1694 2:N:0:CTTAGGAC+GTAGGAGT
GTTCTACGC
+
BBBFGGGGG
@M01759:102:000000000-KCKNN:1:1102:12654:1695 2:N:0:CTTAGGAC+GTAGGAGT
GTCTGTTAA
+
CCCFGGGGG

The files were demultiplexed by the sequencing center, and the adapters etc. were removed as well.

I'm at a bit of a loss as to what to do and would appreciate any help,
Thanks,
Aia

gregcaporaso · November 21, 2022, 11:54pm

Hi @aiaoz, and welcome to the QIIME 2 forum! Sorry for the slow reply - we just realized that no one had responded to this message.

The mixture of forward and reverse reads would cause problems with the typical workflows for getting started with QIIME 2, but we should be able to figure something out. How did you determine that you have both forward and reverse reads in the same file? And have you identified a pattern in how they are structured in that file (e.g., all forward reads, then all reverse reads; alternating forward and reverse reads; randomly mixed forward and reverse reads)?

For one of your samples, can you provide the first ~25 lines of each of the three files you have? You could do this using the head -n 25 command on each of your files. For example:

$ head -n 25 sample1_R1.fastq
...
$ head -n 25 sample1_R2.fastq
...
$ head -n 25 sample1_I1.fastq
...

As for the third file for each sample, those are your index (barcode) reads. It was most likely written out during demultiplexing. Typically it isn't delivered alongside the demultiplexed reads, since it's only used for demultiplexing, but it doesn't hurt to have it.
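For reference, the barcode pair is also recorded in each FASTQ header line. Here is a minimal Python sketch of pulling it out, assuming the headers follow the standard Illumina (Casava 1.8+) layout seen in the example above. `IlluminaHeader` and `parse_header` are illustrative names, not part of QIIME 2.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    read_id: str         # instrument:run:flowcell:lane:tile:x:y
    read_number: int     # which read of the run this record belongs to
    is_filtered: bool    # True if the header flag is "Y"
    control_number: int
    index: str           # barcode field, e.g. "CTTAGGAC+GTAGGAGT"


def parse_header(line: str) -> IlluminaHeader:
    """Split a header of the form '@<read id> <read>:<filtered>:<control>:<index>'."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    read_id, _, comment = line[1:].strip().partition(" ")
    read_number, is_filtered, control_number, index = comment.split(":", 3)
    return IlluminaHeader(
        read_id=read_id,
        read_number=int(read_number),
        is_filtered=(is_filtered == "Y"),
        control_number=int(control_number),
        index=index,
    )


header = parse_header(
    "@M01759:102:000000000-KCKNN:1:1102:17856:1694 2:N:0:CTTAGGAC+GTAGGAGT"
)
# Dual-index headers join the two barcodes with '+'.
barcodes = header.index.split("+")
```

The part before the space (`read_id`) is what ties a record in one file to its mate in the others, which is useful if records ever need to be re-paired.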

Thanks!

aiaoz · November 26, 2022, 11:34am

We realized that the forward and reverse reads are mixed by running a multiple sequence alignment: we aligned several reads from the R1 file along with several reads from a previous sequencing session, and it was obvious that they clustered into two groups. Half of the sequences from the mixed R1 file aligned well with the good R1 sequences, while the other half showed a different pattern.
In addition, we ran a simple BLAST of the forward and reverse reads, which shows which strand each read belongs to. It was clear that the pairings are good but the sequences are mixed. The reads are mixed randomly into forward and reverse, so there was no easy fix for it.
We are actually working on a tool to fix this problem by sorting the pairs into the correct files. While googling, I saw that this can happen when the sequencing center uses the wrong library preparation method.
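
The sorting idea can be sketched roughly as follows. This is not the tool mentioned above, only a hypothetical illustration. It assumes a reference V4 sequence in forward orientation is available, that R1 and R2 list the same reads in the same order, and that shared k-mers are a good enough signal for strand. The function names, `k=8`, and `min_ratio=4` are all made up for the example.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Set of all k-length substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def orientation(seq, fwd_kmers, rev_kmers, k=8, min_ratio=4):
    """Return '+', '-' or '?' depending on which reference strand seq resembles."""
    read_kmers = kmers(seq, k)
    fwd_hits = len(read_kmers & fwd_kmers)
    rev_hits = len(read_kmers & rev_kmers)
    if fwd_hits > 0 and fwd_hits >= min_ratio * rev_hits:
        return "+"
    if rev_hits > 0 and rev_hits >= min_ratio * fwd_hits:
        return "-"
    return "?"


def sort_pairs(r1_records, r2_records, reference, k=8):
    """Yield (forward_record, reverse_record, call) for each read pair.

    If the R1 record looks like the reverse strand, the mates are swapped.
    Pairs called '?' are passed through unchanged so the caller can set them aside.
    """
    fwd_kmers = kmers(reference, k)
    rev_kmers = kmers(reverse_complement(reference), k)
    for r1, r2 in zip(r1_records, r2_records):
        if r1[0].split()[0] != r2[0].split()[0]:
            raise ValueError(f"mates out of sync: {r1[0]!r} vs {r2[0]!r}")
        call = orientation(r1[1], fwd_kmers, rev_kmers, k)
        if call == "-":
            yield r2, r1, call
        else:
            yield r1, r2, call
```

In use, the two inputs would be `read_fastq(open(...))` on the R1 and R2 files, with the first element of each result written to a new forward file and the second to a new reverse file. Note that swapped records keep their original header, so the read number in the header no longer matches the file they end up in.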

gregcaporaso · November 27, 2022, 6:36pm

That makes sense @aiaoz. If you develop a good way to address this, we'd love to hear about it here in case others run into the same issue.

Good luck!

Nicholas_Bokulich · November 28, 2022, 10:21am

Hi @aiaoz,

We already have a method in the QIIME 2 plugin RESCRIPt for reorienting mixed-orientation reads, but it currently only works on FASTA data, not FASTQ. Extending RESCRIPt to handle FASTQ might be easier than writing a de novo method. VSEARCH also has an orient method that operates on FASTQ, so this should work out of the box.
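
As a rough illustration of the VSEARCH route, the sketch below only assembles a `vsearch --orient` command line from Python. The file names are placeholders, and `v4_reference.fasta` is an assumed reference database in forward orientation. VSEARCH orients one file at a time, so keeping R1 and R2 in sync afterwards is left to the user.

```python
import shlex
import subprocess


def vsearch_orient_command(reads_fastq, reference_fasta,
                           oriented_fastq, unmatched_fastq, report_tsv):
    """Build the argument list for a single-file `vsearch --orient` run."""
    return [
        "vsearch",
        "--orient", reads_fastq,         # input reads (FASTA or FASTQ)
        "--db", reference_fasta,         # reference sequences, forward orientation
        "--fastqout", oriented_fastq,    # reads after orientation
        "--notmatched", unmatched_fastq, # reads whose strand could not be decided
        "--tabbedout", report_tsv,       # per-read orientation report
    ]


def run(cmd):
    """Execute the command; requires vsearch to be installed and on PATH."""
    subprocess.run(cmd, check=True)


cmd = vsearch_orient_command(
    "sample1_R1.fastq",        # placeholder input
    "v4_reference.fasta",      # hypothetical reference file
    "sample1_R1.oriented.fastq",
    "sample1_R1.notmatched.fastq",
    "sample1_R1.orient.tsv",
)
print(shlex.join(cmd))  # inspect the command before calling run(cmd)
```

The command is printed rather than executed, so it can be checked against the installed VSEARCH version's help output first.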

QIIME 2's q2-cutadapt plugin also allows demultiplexing/re-orientation of mixed-orientation barcoded reads. I think this should work for your case. You can see some relevant discussion here:
