How to import these reads: mixed-oriented or not?

Hi, I received an old multiplexed data set. I imported, demultiplexed, and denoised the reads, but classifying with the sklearn classifier didn't work: the job was killed. As I read in several posts that mixed-orientation reads are a problem for sklearn, I looked back at my fastq files and found both the forward and the reverse primer (marked in yellow) in the 2.1 file (which I renamed to forward when importing) as well as in the 2.2 fastq file.
image

As I understood it, 2:N:0 indicates reverse reads, and this is in the header of all sequences in the 2.2 file. But as you can see, both primers occur there. When I tried to use Cutadapt with the --p-mixed-orientation flag, it responded that this could not be used with dual-indexed reads. I was not aware that I had those?
So my first question is: are these mixed-orientation reads?
And my second one is: what is the right way to process these?

Thx in advance Wieneke
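
One quick way to look at the first question is to count, per fastq file, how many reads carry the forward primer near their start and how many carry the reverse primer. The sketch below is only an illustration and not a QIIME 2 feature. The primer sequences are the common 515F/806R V4 pair and are an assumption (the primers actually used are only visible in the screenshot), and the function names and the 40-nt search window are hypothetical choices. A file that reports a sizeable share of both "forward" and "reverse" would point towards mixed orientation.

```python
import re
from collections import Counter

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# ASSUMPTION: common V4 primers (515F/806R). Replace with the primers
# that were really used for this data set.
FWD_PRIMER = "GTGCCAGCMGCCGCGGTAA"
REV_PRIMER = "GGACTACHVGGGTWTCTAAT"


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def classify_read(seq, fwd_re, rev_re, window=40):
    """Return 'forward', 'reverse', or 'none' for the primer found near the read start."""
    head = seq.upper()[:window]
    fwd_hit = fwd_re.search(head)
    rev_hit = rev_re.search(head)
    if fwd_hit and rev_hit:
        # Both found: report whichever primer starts first.
        return "forward" if fwd_hit.start() <= rev_hit.start() else "reverse"
    if fwd_hit:
        return "forward"
    if rev_hit:
        return "reverse"
    return "none"


def tally_orientation(fastq_lines, window=40):
    """Count primer orientation over the sequence lines of a FASTQ file."""
    fwd_re, rev_re = primer_regex(FWD_PRIMER), primer_regex(REV_PRIMER)
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each 4-line FASTQ record
            counts[classify_read(line.strip(), fwd_re, rev_re, window)] += 1
    return counts
```

For compressed input, the lines can come from something like `bz2.open(path, "rt")` or `gzip.open(path, "rt")`. The window has to be wide enough to cover the 8-nt barcode plus the primer.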

Should I use this first, then?

This was recommended by Mike in this post: Discarded results produced by q2-cutadapt's trim-paired when working with mixed-orientation reads

Hello Wieneke,

I'm not sure I have a perfect solution to this problem...

Are these reads in a random orientation, or is this a pair of R1 and R2 files that have been 'interleaved' so the one fastq file includes both the forward and reverse Illumina reads?

If so, you can use the reformat.sh tool from the BBTools package to deinterleave a fastq file, and perhaps your fasta file too.
reformat.sh in=reads.fq out1=read1.fq out2=read2.fq

This should fix interleaved reads, but will not help with mixed orientation reads.

The big question here is 'how did my reads get like this,' which we have to answer before we can move on to 'how do I convert them back.'
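
To tell the interleaved case apart from the others, one option is to look at the read headers: in an interleaved file, consecutive records share the same read ID and alternate between read 1 and read 2. The following is a minimal sketch under the assumption that the headers use the newer Illumina layout (`@id 1:N:0:...`), as in the screenshots; older `/1` and `/2` suffixes are not handled. The function names are hypothetical.

```python
from itertools import islice


def read_id_and_number(header):
    """Split an Illumina-style FASTQ header into (read id, read number).

    For '@INSTRUMENT:1:FLOWCELL:1:1101:1000:2000 2:N:0:1' the read number
    is '2'. Returns (id, None) when no '1' or '2' flag is present.
    """
    fields = header.strip().split(None, 1)
    read_id = fields[0].lstrip("@")
    if len(fields) < 2:
        return read_id, None
    number = fields[1].split(":", 1)[0]
    return read_id, (number if number in ("1", "2") else None)


def looks_interleaved(fastq_lines, max_pairs=1000):
    """True if the first records form (read 1, read 2) pairs that share one id."""
    nonblank = (line for line in fastq_lines if line.strip())
    headers = (line for i, line in enumerate(nonblank) if i % 4 == 0)
    parsed = [read_id_and_number(h) for h in islice(headers, 2 * max_pairs)]
    if len(parsed) < 2 or len(parsed) % 2:
        return False
    return all(
        first[0] == second[0] and first[1] == "1" and second[1] == "2"
        for first, second in zip(parsed[0::2], parsed[1::2])
    )
```

If this returns False for both files and each file contains only one read number, the files are probably not interleaved, and deinterleaving would not be expected to change much.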

Hi Colin, thanks for your reply. I'm not sure I can answer the question of whether these are interleaved or mixed-orientation reads. This is what I know:
This is a dataset which was sequenced in 2017 on the Illumina HiSeq PE300 platform with V4 primers. As usual, all the people who did the analysis at the time have moved on in science and are no longer available. I got the files (4 libraries, each with 2 files named 2.1 and 2.2) and a metadata file containing 1 barcode per sample (the first 8 nucleotides of each read). These are fastq files, but I deleted the lines with quality information in the post to give a better overview.
I really hope that this is enough information to work out which type of problem it is.



image

Classic. Good luck to them! :clinking_glasses:

Good, fastq files are the ideal input for our read repair tools. But I'm not sure that's needed.

As far as I can tell, these are normal paired-end Illumina sequencing reads.
Files ending in _1.fastq.bz2 are your forward reads (confirmed by 1:N: in the read header), and files ending in _2.fastq.bz2 are your reverse reads (confirmed by 2:N: in the header).

When processing these reads, did you use DADA2 or vsearch or deblur? Were you able to get them to pair?

Maybe I'm getting ahead of myself. How did you import these demultiplexed reads into QIIME 2?

I'm thinking the reads themselves may be fine, but there is some other issue in the pipeline to resolve, and I'm looking for clues!

But how can it be, then, that there are reverse primers in the 1:N:0 reads and forward primers in the 2:N:0 reads?
image

I used the "standard" pipeline with DADA2 (version 2021.8), renaming the sequence files to forward and reverse in the directory muxed-pe-barcode-in-seq-L26:
qiime tools import --type MultiplexedPairedEndBarcodeInSequence --input-path muxed-pe-barcode-in-seq-L26 --output-path multiplexed-seqsL26.qza
then demultiplexing
demultiplexed-seqsL26.qzv (317.4 KB)

qiime cutadapt demux-paired --i-seqs multiplexed-seqsL26.qza --m-forward-barcodes-file metadata_library26.tsv --m-forward-barcodes-column forward-barcode-seq --m-reverse-barcodes-file metadata_library26.tsv --m-reverse-barcodes-column reverse-barcode-seq --o-per-sample-sequences demultiplexed-seqsL26.qza --o-untrimmed-sequences untrimmedL26.qza --verbose
with barcode file
image

denoising-statsL26.qzv (1.2 MB)

Yes, it's strange that both your 1:N:0 forward and 2:N:0 reverse reads include both the forward and reverse primers in them. This is what made me think these were 'interleaved' by some upstream program.

Given that you can't get the original BCL files and demultiplex again, my only idea would be to run that reformat.sh tool and see what it can produce.

I think I may be stumped! Have any other @moderators seen this before?

Hi @Wieneke, I think you can go ahead and import them as you normally would and then follow these two posts from the same thread.

Let us know if this works.


Thanks a lot to both of you for your help. I'm going to try it and report back.

Hi Mike, I was going to try your solution, but it is still not so clear to me. (I'm not that experienced with QIIME. :grimacing:)
In the protocol described in the solution, it seems that demultiplexed reads are used.
image
But after importing, my reads are still multiplexed. Isn't demultiplexing already the step where things go wrong? Otherwise, wouldn't demux keep only the reads with the barcodes belonging to the forward file in the forward.fastq.gz file, and vice versa in the reverse file?

Is there a way to filter or separate forward and reverse reads based on the primers before the demultiplexing step? That would have to be done without trimming, otherwise you would lose the barcodes. But then again, how do you know which forward and reverse reads belong together once you start changing the order of the reads?

A lot of questions for one post, maybe, but I hope it makes clear what I can't get my head around. I hope it makes sense. Wieneke
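
On the question of separating reads by primer before demultiplexing without losing the pairing: one possible approach is to walk through both files in parallel, one pair at a time, and swap the two records of a pair whenever the forward primer is found in the second file. Because each pair is handled as a unit and nothing is trimmed, the mates stay together and the barcodes stay in the reads. The sketch below is untested on real data and is not the procedure from the linked posts. The primers are again the assumed 515F/806R pair, the names are hypothetical, and pairs whose primers cannot be identified are simply dropped.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# ASSUMPTION: common V4 primers (515F/806R); replace with the real ones.
FWD_PRIMER = "GTGCCAGCMGCCGCGGTAA"
REV_PRIMER = "GGACTACHVGGGTWTCTAAT"


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def read_records(lines):
    """Group FASTQ lines into (header, sequence, separator, quality) tuples."""
    stripped = (line.rstrip("\n") for line in lines)
    return zip(*[stripped] * 4)


def orient_pair(rec1, rec2, fwd_re, rev_re, window=40):
    """Return (forward_record, reverse_record), or None if the primers are unclear."""
    head1 = rec1[1].upper()[:window]
    head2 = rec2[1].upper()[:window]
    if fwd_re.search(head1) and rev_re.search(head2):
        return rec1, rec2  # already in the expected orientation
    if rev_re.search(head1) and fwd_re.search(head2):
        return rec2, rec1  # swap the mates, sequences left untouched
    return None


def reorient(r1_lines, r2_lines, window=40):
    """Yield (forward_record, reverse_record) for every pair that can be resolved."""
    fwd_re, rev_re = primer_regex(FWD_PRIMER), primer_regex(REV_PRIMER)
    for rec1, rec2 in zip(read_records(r1_lines), read_records(r2_lines)):
        oriented = orient_pair(rec1, rec2, fwd_re, rev_re, window)
        if oriented is not None:
            yield oriented
```

Each yielded record can be written back with `"\n".join(record) + "\n"` to a new forward and a new reverse file. Note that swapped records keep their original 1:N:0 or 2:N:0 header, and whether downstream tools mind that is something to check.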

Hi @Wieneke,

Running that cutadapt command only removes the primers from the sequences; it does not demultiplex them, though cutadapt does have some demultiplexing options. If those are not appropriate for your data, you can try this approach:

-Mike
