Importing from a single file containing both forward and reverse reads from different samples

botellaflotante · July 8, 2020, 3:18pm

Hello! I have downloaded a single fastq.gz file that contains both forward (Fw) and reverse (Rv) reads from different samples of paired-end runs. Is there a way to import these into QIIME 2 without having R1 and R2 in different files, or should I run some other software first?

Thanks a lot

thermokarst · July 9, 2020, 2:48pm

How are the reads differentiated? Do the FASTQ headers include information on which direction the read was sequenced in? Can you share the first 10-20 lines of the file? Thanks!

botellaflotante · July 9, 2020, 6:45pm

This is what they look like.
Thanks a lot

@SRR11113593.1.1 1 length=251
AAGGTGGGGATGACGTGCCCAGGAATGATGAGTACGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAGTAATCCAATTTAGGAGTATGACAGAAAGCATAAAAATAAAAAAAATAAAAGTCAGACTACAACAGCAGTGGAGTAAGTAGTCAACATTGACACAATACACGGAATTATATCAGATAATGACATGAAAGCATGTAACATAACAATAAATCACGACAAATAACATATTTAGAAACAAG
+SRR11113593.1.1 1 length=251
BCCBAFBBCCCCGGGGGGGGGFFGHHHHHHHHHHHGGHCGHHGGHHGHGGGHGHHHGHHHHHHHGGFEFGE>EE/344B?43?4?44?322B2222B22B2222=22?2220?2DD2/<-0<11111111>110/0<0</./<00/0.0/000;00;00::;00000:::00009.--..0;000000:00009000000000/000:900;090009:00000;0..---/0:00900000000;0/;./
@SRR11113593.1.2 1 length=251
GGGTATCTAATCCGGTTACGAGGGGGGTACACGTCGTGTGTGGATATCGGTGGTAGCGGGGTGCGTGAAAAAAAAGAGATGTACTAACAAAAAATAGAGCGCTAGAAAGTAAAAAAACCAACGAGACACCTTGACCCAAACCCCAAACTAACAATCAAACACTAAACAAACACCCTCCACACAACTAAAACCAATAAAACAATATCCAGCAGATCCTCCAACAAACACACAACCACAGAAACTGCAACCAA
+SRR11113593.1.2 1 length=251
AAAAAFFFFFF4GEGGGEGGEEAECGGGEGHHGFHEC///00/2?222///?B/221///---.-<...;0:---..//00900000//;...-.00000--.-.//////////9-.;..;....;.../9/9/:.9......;.../////9//9/;;.9//9///;A......9...9.;..//////.../9////9./9//9///;.//9////9/;A9..9.;9.9..:.;.//9///////...

thermokarst · July 13, 2020, 4:34pm

Thanks @botellaflotante! It looks like you have what is known as an "interleaved" file. Unfortunately, QIIME 2 does not support this format directly (it has become less common these days, at least from what I have seen here on this forum).

It does however look like cutadapt has support for this: User guide — Cutadapt 3.7 documentation

Your QIIME 2 environment already comes with cutadapt, so you could give it a try there, then, once demultiplexed and split up, you could follow a traditional QIIME 2 import.

Keep us posted!

Nicholas_Bokulich · July 13, 2020, 5:01pm

@botellaflotante I think @thermokarst’s advice to use cutadapt directly is the best way to accomplish this.

But I just want to share what I’ve done in the past to quickly split apart interleaved fastq files, in case this serves as a useful backup, or perhaps some inspiration to learn how to use grep!

grep '@.*\.1\.1' -A 3 path-to-interleaved-seqs.fastq | grep -v '^--$' > forward-seqs.fastq
grep '@.*\.1\.2' -A 3 path-to-interleaved-seqs.fastq | grep -v '^--$' > reverse-seqs.fastq

In each command, the first grep grabs the headers indicating forward/reverse reads, plus the 3 lines that follow each header.
The second grep removes the spacer lines ("--"), which grep inserts when the -A option is used.
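
To see what the two stages do in isolation, here is a minimal sketch on a made-up four-read file. The file names and the /1, /2 header suffixes are assumptions for illustration only; they are not the SRA-style headers from this thread, so the pattern differs from the commands above.

```shell
# Build a tiny interleaved FASTQ (hypothetical data): two pairs, forward/reverse alternating.
printf '%s\n' \
  '@r1/1' 'ACGT' '+' 'IIII' \
  '@r1/2' 'TTTT' '+' 'IIII' \
  '@r2/1' 'GGGG' '+' 'IIII' \
  '@r2/2' 'CCCC' '+' 'IIII' > toy.fastq

# Stage 1 only: each forward header plus the 3 lines after it.
# grep prints a "--" spacer between groups of matches that are not adjacent.
grep -A 3 '^@.*/1$' toy.fastq > with-spacers.txt

# Stage 1 + stage 2: same selection, with the "--" spacer lines dropped.
grep -A 3 '^@.*/1$' toy.fastq | grep -v '^--$' > toy-forward.fastq

# Compare line counts of the two outputs.
wc -l with-spacers.txt toy-forward.fastq
```

On this toy input the first file should have 9 lines (two 4-line records plus one "--" spacer) and the second 8, which is why the second grep is needed before the result is a valid FASTQ file.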

botellaflotante · July 17, 2020, 7:20pm

Thank you! I think qiime cutadapt does not allow the interleaved option, but the grep did the job! Also, the file was interleaved because these reads were downloaded from SRA, and the download generated a single interleaved file. I am assuming that there is no need to cut any adapters from these…

thermokarst · July 17, 2020, 7:56pm

Correct. Please see my comment above:

cutadapt supports interleaved demux, q2-cutadapt doesn't (yet).

botellaflotante · July 17, 2020, 8:58pm

I actually found an error with this. I think the pattern in the first grep should be '@.*\.1 ' (with a trailing space); otherwise the output is weird and doesn't work. Also, '.1' alone will result in sequences like '.1.2' or '.10.2' being included in the forward file…
Thanks again
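
A rough way to check this kind of pattern problem is to count how many header lines each candidate pattern matches on a small test file. The sketch below uses made-up reads with SRA-style headers for spots 1, 2, 10 and 11; the accession SRR0 and the file name pattern-toy.fastq are hypothetical.

```shell
# Toy SRA-style records for spots 1, 2, 10 and 11 (made-up data, 8 reads, 4 forward).
for spot in 1 2 10 11; do
  for read in 1 2; do
    printf '@SRR0.%s.%s %s length=4\nACGT\n+SRR0.%s.%s %s length=4\nIIII\n' \
      "$spot" "$read" "$spot" "$spot" "$read" "$spot"
  done
done > pattern-toy.fastq

# Count the lines matched by each candidate "forward" pattern (4 would be correct).
orig=$(grep -c '@.*\.1\.1' pattern-toy.fastq)   # original pattern from the earlier post
loose=$(grep -c '@.*\.1' pattern-toy.fastq)     # ".1" with nothing required after it
spaced=$(grep -c '@.*\.1 ' pattern-toy.fastq)   # ".1" followed by a space
echo "original=$orig loose=$loose spaced=$spaced"
```

On this toy file the original pattern matches only the forward read of spot 1, the loose pattern also picks up reverse reads such as '.1.2' and '.10.2', and only the version with the trailing space matches exactly the four forward headers.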

Nicholas_Bokulich · July 17, 2020, 9:58pm

Sure, you may need to adjust to optimize for the patterns that actually occur in your sequences… grep is a powerful tool, but not an intelligent one!

This pattern will be better: '^@.*\.1 '
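
Putting the anchored pattern together with the earlier commands might look like the sketch below. It runs on a made-up toy file (the accession SRR0 and the file name interleaved-toy.fastq are hypothetical). The dot before the read number is escaped here on the assumption that a literal dot is intended; left unescaped, it matches any character, so a header like '@SRR0.1.2 1 length=4' would also match through the ' 1 ' that follows the read ID.

```shell
# Toy interleaved FASTQ with SRA-style headers for spots 1 and 11 (made-up data).
printf '%s\n' \
  '@SRR0.1.1 1 length=4' 'ACGT' '+SRR0.1.1 1 length=4' 'IIII' \
  '@SRR0.1.2 1 length=4' 'TTTT' '+SRR0.1.2 1 length=4' 'IIII' \
  '@SRR0.11.1 11 length=4' 'GGGG' '+SRR0.11.1 11 length=4' 'IIII' \
  '@SRR0.11.2 11 length=4' 'CCCC' '+SRR0.11.2 11 length=4' 'IIII' \
  > interleaved-toy.fastq

# "\.1 " and "\.2 " mean: literal dot, read number, then the space that ends the read ID.
# "^@" anchors the match to the start of a header line.
grep -A 3 '^@.*\.1 ' interleaved-toy.fastq | grep -v '^--$' > forward-seqs.fastq
grep -A 3 '^@.*\.2 ' interleaved-toy.fastq | grep -v '^--$' > reverse-seqs.fastq

# Sanity check: both files should hold the same number of records (lines / 4).
echo "forward records: $(( $(wc -l < forward-seqs.fastq) / 4 ))"
echo "reverse records: $(( $(wc -l < reverse-seqs.fastq) / 4 ))"
```

If the two record counts differ on real data, the pattern is probably still catching or missing some headers and is worth rechecking before importing.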
