Importing paired-end reads that have been both paired and demultiplexed

ariel · July 25, 2018, 7:43pm

Hello,

Thank you in advance for your help! I was given some paired-end 16S sequences, run on a MiSeq, that have already been paired and demultiplexed, so there is only one file per sample (no R1/R2).

When looking at the different import possibilities, this one seems to be appropriate:
Casava 1.8 single-end demultiplexed fastq
Format description

In this format, there is one fastq.gz file for each sample in the study, and the file name includes the sample identifier. The file name for a single sample might look like L2S357_15_L001_R1_001.fastq.gz. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.
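As a rough illustration of that naming convention, the sketch below splits a Casava 1.8 style file name into the five fields listed in the description. The helper name `parse_casava_filename` is hypothetical. Splitting from the right, so that only the last four underscores act as separators, is an assumption made here, not QIIME 2's documented parsing rule.

```python
from typing import Dict


def parse_casava_filename(filename: str) -> Dict[str, str]:
    """Split a Casava 1.8 style name such as L2S357_15_L001_R1_001.fastq.gz
    into its underscore-separated fields (hypothetical helper)."""
    suffix = ".fastq.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a {suffix} file: {filename!r}")
    stem = filename[: -len(suffix)]
    # Split from the right so only the last four underscores are treated as
    # field separators (assumption: the sample ID is everything before them).
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"not a Casava 1.8 style name: {filename!r}")
    sample_id, barcode, lane, read, set_number = parts
    return {
        "sample_id": sample_id,
        "barcode": barcode,
        "lane": lane,
        "read": read,
        "set_number": set_number,
    }


print(parse_casava_filename("L2S357_15_L001_R1_001.fastq.gz"))
# {'sample_id': 'L2S357', 'barcode': '15', 'lane': 'L001', 'read': 'R1', 'set_number': '001'}
```

The `read` field (R1 or R2) is the part a single joined file per sample has no natural value for.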

However, it states that each file name should include an R1/R2 read number, and the format seems intended for unjoined data. There is also a single-end import option, but these were originally paired-end sequences, so I wasn't sure whether that would be appropriate.

Is there an appropriate way for me to import the files that I have? It may be difficult for me to obtain the data at a different stage of processing.

Thank you!
Ariel

colinbrislawn · July 26, 2018, 7:41pm

Hello Ariel,

Great question!

Because your data has already been joined, you can treat it as single-end data. My suggestion is to use the fastq manifest format, although many options would work. My manifest file would look like this:

sample-id,absolute-filepath,direction
sample-1,$PWD/some/filepath/sample1_R1.fastq,forward
sample-1,$PWD/some/filepath/sample1_R2.fastq,reverse

And then you could use --type 'SampleData[PairedEndSequencesWithQuality]'

Let me know if this helps!

Colin

P.S. Some plugins (most notably dada2) will want you to have un-joined fastq files. Passing in reads before joining is another good option.

ariel · July 26, 2018, 9:27pm

Hi Colin,

Thank you so much for getting back to me! A couple of follow-up questions:

colinbrislawn:

My manifest file would look like this:

sample-id,absolute-filepath,direction
sample-1,$PWD/some/filepath/sample1_R1.fastq,forward
sample-1,$PWD/some/filepath/sample1_R2.fastq,reverse

--I don't have separate R1/R2 files for the samples; there is just one file per sample.

--I specifically want to use dada2. Does this mean I cannot proceed with the files I've been given, and that I need to obtain the unjoined fastq files?

I really appreciate your help, it's very enlightening!

thermokarst · July 27, 2018, 12:29am

In QIIME 2 we call these sequences "Joined."

Correct: DADA2 uses the quality scores when processing your reads, and the quality scores of joined reads are a bit of a mystery. Some joining algorithms might sum the scores of the overlapping nucleotides, others might average them, and others might use a placeholder value.
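To illustrate why joined quality scores are hard to interpret, here is a small sketch of two hypothetical ways a joiner might combine the Phred+33 scores of the overlapping bases. The function names, the cap of 41, and the assumption that both reads agree at every overlapping position are all illustrative; this is not the behaviour of any specific joining tool.

```python
from typing import List


def phred33_decode(qual: str) -> List[int]:
    """Convert a Phred+33 quality string to integer scores."""
    return [ord(c) - 33 for c in qual]


def phred33_encode(scores: List[int]) -> str:
    """Convert integer scores back to a Phred+33 quality string."""
    return "".join(chr(s + 33) for s in scores)


def merge_by_sum(fwd: List[int], rev: List[int], cap: int = 41) -> List[int]:
    """Hypothetical rule: add the two scores, capped at `cap`."""
    return [min(a + b, cap) for a, b in zip(fwd, rev)]


def merge_by_mean(fwd: List[int], rev: List[int]) -> List[int]:
    """Hypothetical rule: take the floored mean of the two scores."""
    return [(a + b) // 2 for a, b in zip(fwd, rev)]


# Scores for the same five overlapping bases. The reverse-read scores are
# assumed to be already reoriented to match the forward read.
fwd = phred33_decode("IIII5")  # [40, 40, 40, 40, 20]
rev = phred33_decode("5555I")  # [20, 20, 20, 20, 40]

print(phred33_encode(merge_by_sum(fwd, rev)))   # JJJJJ  (score 41 everywhere)
print(phred33_encode(merge_by_mean(fwd, rev)))  # ?????  (score 30 everywhere)
```

The same bases end up with scores of 41 or 30 depending on the rule, and DADA2 learns its error rates as a function of quality score, so it would be modelling very different data in each case.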

If you don't have access to the unjoined reads, you can proceed with denoising with deblur, which does not use the quality scores.

Keep us posted!

colinbrislawn · July 27, 2018, 4:28pm

Hello Ariel,

Whoops! I meant to post this:

sample-id,absolute-filepath,direction
sample-1,$PWD/some/filepath/sample1_joined.fastq,forward
sample-2,$PWD/some/filepath/sample2_joined.fastq,forward
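A manifest like this one can also be generated for many joined files at once. The sketch below assumes a hypothetical `<sample>_joined.fastq` naming convention and a hypothetical helper name, writes absolute paths directly instead of `$PWD`, and marks every row as `forward` because the joined reads are being treated as single-end data. Rejecting duplicate sample IDs is a precaution added in this sketch.

```python
import csv
import io
import os
from typing import Iterable


def single_end_manifest(paths: Iterable[str], suffix: str = "_joined.fastq") -> str:
    """Build a comma-separated single-end manifest with one row per joined
    file. Sample IDs are the file names minus `suffix` (hypothetical
    naming convention)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    seen = set()
    for path in paths:
        name = os.path.basename(path)
        if not name.endswith(suffix):
            raise ValueError(f"unexpected file name: {name!r}")
        sample_id = name[: -len(suffix)]
        if sample_id in seen:
            raise ValueError(f"duplicate sample-id: {sample_id!r}")
        seen.add(sample_id)
        # Joined reads are treated as single-end data, so each row is "forward".
        writer.writerow([sample_id, os.path.abspath(path), "forward"])
    return out.getvalue()


print(single_end_manifest(["/data/sample1_joined.fastq", "/data/sample2_joined.fastq"]), end="")
```

On a POSIX system this prints the header followed by one `forward` row each for `sample1` and `sample2`.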

Based on feedback from Matt, I wonder if we should replace forward with joined, like this:

sample-1,$PWD/some/filepath/sample1_joined.fastq,joined

Of course, if you want to use dada2, you can simply follow the examples online that use the unjoined forward and reverse reads.

Sorry to confuse you!

Colin

ariel · August 22, 2018, 4:04am

Hi Colin,

Luckily I was able to (slowly!) obtain the un-joined files. Thank you! Now I am working on importing them!

Unfortunately I am getting an error!

An unexpected error has occurred:

Forward and reverse reads must be provided exactly one time each for each sample. The following samples had forward but not reverse read fastq files: Sample1_R1

It then lists what looks like all of my samples. However, when I search my manifest for some of the samples listed as having only forward reads, they are there and have reverse reads as well. So I'm confused about how to proceed.

Please advise! This is fairly urgent.

Thank you so much; I sincerely appreciate that you offer this forum for help.
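The error message describes a per-sample rule: each sample-id needs exactly one forward row and one reverse row. The sketch below approximates that rule so a manifest can be inspected before running `qiime tools import`; it is not the q2-types implementation. The function name is hypothetical, it assumes the comma-separated header used in this thread, and it compares paths as literal strings without expanding `$PWD`.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def find_unpaired_samples(manifest_text: str) -> Dict[str, List[str]]:
    """Report sample-ids that lack exactly one forward and one reverse row,
    or whose forward and reverse rows point to the same file."""
    paths = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(manifest_text)):
        paths[row["sample-id"]][row["direction"]].append(row["absolute-filepath"])

    problems: Dict[str, List[str]] = {}
    for sample_id, by_direction in paths.items():
        fwd = by_direction.get("forward", [])
        rev = by_direction.get("reverse", [])
        if len(fwd) != 1 or len(rev) != 1:
            problems[sample_id] = [f"{len(fwd)} forward / {len(rev)} reverse rows"]
        elif fwd[0] == rev[0]:
            problems[sample_id] = ["forward and reverse point to the same file"]
    return problems


example = """sample-id,absolute-filepath,direction
Sample1,$PWD/Sample1.fastq.gz,forward
Sample1,$PWD/Sample1.fastq.gz,reverse
"""
print(find_unpaired_samples(example))
# {'Sample1': ['forward and reverse point to the same file']}
```

If every sample-id shows up as "1 forward / 0 reverse rows", the reverse rows are most likely filed under a different sample-id.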

ariel · August 22, 2018, 4:05am

By the way, here is the command I used and part of the rest of the extremely long error:

compressed-demux]$ qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest.txt --output-path ../paired-end-demux.qza --source-format PairedEndFastqManifestPhred33
Traceback (most recent call last):
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 219, in import_data
    return cls.from_view(type, view, view_type, provenance_capture)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 244, in _from_view
    result = transformation(view)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 338, in _8
    single_end=False)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 268, in _fastq_manifest_helper
    absolute=True)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 158, in _parse_and_validate_manifest
    _validate_paired_end_fastq_manifest_directions(manifest)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 240, in _validate_paired_end_fastq_manifest_directions
    % ', '.join(forward_but_no_reverse))

And here is what my manifest file looks like (I had to change the sample name for privacy reasons):

sample-id,absolute-filepath,direction
Sample1,$PWD/Sample1.fastq.gz,forward
Sample1,$PWD/Sample1.fastq.gz,reverse

Mehrbod_Estaki · August 22, 2018, 4:24am

@ariel,

This error is popping up because the file paths for the forward and reverse reads both lead to the exact same file, and the same file can't be both forward and reverse. If you got the original unjoined reads, you should have 2 files per sample.
You just need to make sure those 2 files are provided in your manifest, as in @colinbrislawn's example above.

ariel · August 22, 2018, 4:54am

Hi Mehrbod,

Thank you so much for your quick response! I accidentally changed it when I edited the names for privacy, but here is what it actually looks like:

(with the names changed less this time)

sample-id,absolute-filepath,direction
HMP2_J008_R1,$PWD/HMP2_J008_R1.fastq.gz,forward
HMP2_J008_R2,$PWD/HMP2_J008_R2.fastq.gz,reverse

To me, this looks the same as the example. Also, is the .gz a problem? Should I unzip the files? In the example (Importing data, QIIME 2 2018.6.0 documentation) they appear to be gzipped. Thank you for your help.

Mehrbod_Estaki · August 22, 2018, 6:09am

No problem @ariel! We've all been there: when you have the least amount of time, the most basic things won't work.
You're almost there, but now your sample-ids are different for the two files of the same pair.
Have a close look at the example again: the sample-id must be identical on both lines, while the absolute-filepath for your forward and reverse reads must lead to their respective files. That way the script will know to take $PWD/HMP2_J008_R1.fastq.gz (forward) and $PWD/HMP2_J008_R2.fastq.gz (reverse) and pair them under the sample-id HMP2_J008 (or whatever you want to call it, it doesn't matter).
And there's no need to unzip your files; they will be fine the way they are.
Hope that helps!
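With many samples, writing these paired rows by hand is error-prone. The sketch below derives one shared sample-id per pair by stripping `_R1`/`_R2` from each file name. The `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` pattern and the helper name are assumptions based on the file names in this thread. It writes absolute paths directly instead of `$PWD`, and the gzipped files are referenced as-is.

```python
import csv
import io
import os
import re
from collections import defaultdict
from typing import Iterable

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def paired_end_manifest(paths: Iterable[str]) -> str:
    """Build a paired-end manifest in which a sample's R1 and R2 files
    share one sample-id (the file name minus _R1/_R2)."""
    pairs = defaultdict(dict)
    for path in paths:
        match = READ_PATTERN.match(os.path.basename(path))
        if match is None:
            raise ValueError(f"unexpected file name: {path!r}")
        direction = "forward" if match["read"] == "1" else "reverse"
        if direction in pairs[match["sample"]]:
            raise ValueError(f"two {direction} files for {match['sample']!r}")
        pairs[match["sample"]][direction] = os.path.abspath(path)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sample_id in sorted(pairs):
        files = pairs[sample_id]
        if set(files) != {"forward", "reverse"}:
            raise ValueError(f"{sample_id!r} is missing a read file")
        # Same sample-id on both rows; only the file path and direction differ.
        writer.writerow([sample_id, files["forward"], "forward"])
        writer.writerow([sample_id, files["reverse"], "reverse"])
    return out.getvalue()


print(paired_end_manifest(["/data/HMP2_J008_R2.fastq.gz", "/data/HMP2_J008_R1.fastq.gz"]), end="")
```

Because the rows are grouped by the derived sample-id, the input order of the R1 and R2 files does not matter.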

ariel · August 22, 2018, 6:10am

Thank you!! You are so right!