Importing from a single file containing forward and reverse reads from different samples

Hello! I have downloaded a single fastq.gz file that contains both forward and reverse reads, from different samples, of paired-end runs. Is there a way to import these into QIIME 2 without having R1 and R2 in different files, or should I run some other software first?

Thanks a lot

How are the reads differentiated? Do the FASTQ headers include information on which direction the read was sequenced in? Can you share the first 10-20 lines of the file? Thanks!

This is what they look like.
Thanks a lot

@SRR11113593.1.1 1 length=251
AAGGTGGGGATGACGTGCCCAGGAATGATGAGTACGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAGTAATCCAATTTAGGAGTATGACAGAAAGCATAAAAATAAAAAAAATAAAAGTCAGACTACAACAGCAGTGGAGTAAGTAGTCAACATTGACACAATACACGGAATTATATCAGATAATGACATGAAAGCATGTAACATAACAATAAATCACGACAAATAACATATTTAGAAACAAG
+SRR11113593.1.1 1 length=251
BCCBAFBBCCCCGGGGGGGGGFFGHHHHHHHHHHHGGHCGHHGGHHGHGGGHGHHHGHHHHHHHGGFEFGE>EE/344B?43?4?44?322B2222B22B2222=22?2220?2DD2/<-0<11111111>110/0<0</./<00/0.0/000;00;00::;00000:::00009.--..0;000000:00009000000000/000:900;090009:00000;0..---/0:00900000000;0/;./
@SRR11113593.1.2 1 length=251
GGGTATCTAATCCGGTTACGAGGGGGGTACACGTCGTGTGTGGATATCGGTGGTAGCGGGGTGCGTGAAAAAAAAGAGATGTACTAACAAAAAATAGAGCGCTAGAAAGTAAAAAAACCAACGAGACACCTTGACCCAAACCCCAAACTAACAATCAAACACTAAACAAACACCCTCCACACAACTAAAACCAATAAAACAATATCCAGCAGATCCTCCAACAAACACACAACCACAGAAACTGCAACCAA
+SRR11113593.1.2 1 length=251
AAAAAFFFFFF4GEGGGEGGEEAECGGGEGHHGFHEC///00/2?222///?B/221///---.-<...;0:---..//00900000//;...-.00000--.-.//////////9-.;..;....;.../9/9/:.9......;.../////9//9/;;.9//9///;A......9...9.;..//////.../9////9./9//9///;.//9////9/;A9..9.;9.9..:.;.//9///////...

Thanks @botellaflotante! It looks like you have what is known as “interleaved” files. Unfortunately, we don’t have support for this directly in QIIME 2 (it has become a bit more uncommon these days, at least from what I have seen here on this forum).

It does however look like cutadapt has support for this: https://cutadapt.readthedocs.io/en/stable/guide.html#interleaved-paired-end-reads

Your QIIME 2 environment already comes with cutadapt, so you could give it a try there. Once the reads are demultiplexed and split up, you can follow a traditional QIIME 2 import.
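
A minimal sketch of that step with standalone cutadapt (not the q2-cutadapt plugin), run inside the activated QIIME 2 environment. The function name and file names are placeholders. It assumes the installed cutadapt version accepts a run with no adapter options; older releases may insist on at least one. Here --interleaved tells cutadapt that the single input file holds alternating mates, while -o and -p name the forward and reverse outputs.

```shell
# Sketch only: the function name and paths are hypothetical placeholders.
# Reads an interleaved paired-end FASTQ and writes the mates to two files.
deinterleave_with_cutadapt() {
  interleaved="$1"   # e.g. interleaved.fastq.gz
  forward_out="$2"   # e.g. forward.fastq.gz
  reverse_out="$3"   # e.g. reverse.fastq.gz
  # Assumption: no adapter options are needed just to split the mates.
  cutadapt --interleaved -o "$forward_out" -p "$reverse_out" "$interleaved"
}

# Example call (uncomment and adjust the paths):
# deinterleave_with_cutadapt interleaved.fastq.gz forward.fastq.gz reverse.fastq.gz
```

No trimming options are passed, so the reads should come out unchanged apart from being split.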

Keep us posted! :qiime2:

@botellaflotante I think @thermokarst’s advice to use cutadapt directly is the best way to accomplish this.

But I just want to share what I’ve done in the past to quickly split apart interleaved fastq files, in case this serves as a useful backup, or perhaps some inspiration to learn how to use grep!

grep '@.*\.1\.1' -A 3 path-to-interleaved-seqs.fastq | grep -v '^--$' > forward-seqs.fastq
grep '@.*\.1\.2' -A 3 path-to-interleaved-seqs.fastq | grep -v '^--$' > reverse-seqs.fastq

:arrow_up: In each command, the first grep grabs the headers indicating forward/reverse, plus the 3 lines that follow each header.
The second grep removes the spacer lines ("--") that grep inserts between matches when the -A option is used.
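
If the header patterns turn out to be fiddly, another option is to split purely by line position. This is a rough sketch, assuming the file is uncompressed, every record is exactly four lines, and the mates strictly alternate (forward, then reverse). The function and file names are made up for illustration.

```shell
# Hypothetical helper: split an interleaved FASTQ by line position alone.
# Assumes 4-line records that strictly alternate forward, reverse.
deinterleave_by_position() {
  interleaved="$1"; forward_out="$2"; reverse_out="$3"
  # Lines 1-4 of every block of 8 belong to the forward read...
  awk '(NR - 1) % 8 < 4' "$interleaved" > "$forward_out"
  # ...and lines 5-8 of the block belong to the reverse read.
  awk '(NR - 1) % 8 >= 4' "$interleaved" > "$reverse_out"
}

# Example call (placeholder paths):
# deinterleave_by_position interleaved.fastq forward-seqs.fastq reverse-seqs.fastq
```

Because this never looks at the header text, it is not affected by how the read IDs are formatted, but it will silently mis-assign reads if any mate is missing from the file.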

Thank you! I think qiime cutadapt does not allow the interleaved option, but the grep did the job! Also, the file was interleaved because I downloaded many reads from SRA, and that download produced a single interleaved file. I am assuming that there is no need to cut any adapter from these…

Correct. Please see my comment above:

cutadapt supports interleaved demux, q2-cutadapt doesn’t (yet).

I actually found an error with this. I think the pattern in the first grep should be '@.*\.1 ' (with a trailing space); otherwise the output is weird and doesn't work. Also, '.1' alone would result in headers like '.1.2' or '.10.2' being included in the forward file…
Thanks again

Sure, you may need to adjust to optimize for the patterns that actually occur in your sequences… grep is a powerful tool, but not an intelligent one!

This pattern will be better: '^@.*\.1 '
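
Putting those corrections together, here is a sketch of both greps with a tighter pattern. It assumes SRA-style headers like the ones posted earlier in the thread, where the read ID ends in .1 (forward) or .2 (reverse) and is followed by a space. The function name and paths are placeholders. Requiring the trailing space also keeps quality lines that happen to start with @ from matching, since a space is not a valid Phred+33 quality character.

```shell
# Hypothetical helper: split by header suffix, assuming IDs such as
# "@SRR11113593.1.1 1 length=251" (forward) and "@SRR11113593.1.2 ..." (reverse).
split_by_header() {
  interleaved="$1"; forward_out="$2"; reverse_out="$3"
  # ^@      only lines that start with @
  # [^ ]*   the rest of the read ID (no spaces allowed)
  # \.1     the mate number at the end of the ID, then a literal space
  grep -A 3 '^@[^ ]*\.1 ' "$interleaved" | grep -v '^--$' > "$forward_out"
  grep -A 3 '^@[^ ]*\.2 ' "$interleaved" | grep -v '^--$' > "$reverse_out"
}

# Example call (placeholder paths):
# split_by_header path-to-interleaved-seqs.fastq forward-seqs.fastq reverse-seqs.fastq
```

Using [^ ]* instead of .* stops the match from running past the read ID into the rest of the header, so spot numbers such as 10 or 11 are not mistaken for mate numbers.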
