Thank you in advance for your help! I was given some paired-end 16S sequences from a MiSeq run that have already been paired and demultiplexed, so there is only one file per sample (no R1, R2).
In this format, there is one fastq.gz file for each sample in the study, and the file name includes the sample identifier. The file name for a single sample might look like L2S357_15_L001_R1_001.fastq.gz. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.
However, it states that there should be an R1/R2 for each sample, and it seems to be for unpaired data. There is also the import option for single-end data, but these were initially paired-end sequences, so I wasn’t sure whether that would be appropriate.
Is there an appropriate way for me to import the files that I have? It may be difficult for me to obtain the data at a different stage of processing.
Because your data has already been joined, you can treat it as single-end data. My suggestion is to use the fastq manifest format, although many options would work. My manifest file would look like this:
Thank you so much for getting back to me! A couple of follow-up questions:
--I don't have any R1/R2 for the samples, there is just one file for each
--I specifically want to use DADA2. Does this mean I cannot proceed with the current files I've been given, and that I need to obtain the un-joined fastq files?
I really appreciate your help, it's very enlightening!
Correct - DADA2 uses the quality scores when processing your reads. Quality scores of joined reads are a bit of a mystery: some joining algorithms might sum the overlapping nucleotide scores, others might average them, while others might use a placeholder value.
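To make that ambiguity concrete, here is a minimal sketch of the three possibilities mentioned above, applied to one overlapping stretch of Phred scores. The function name, the cap of 41, and the placeholder value of 40 are assumptions chosen for illustration; they are not taken from any particular joining tool.

```python
def merge_overlap_quality(q_fwd, q_rev, strategy, cap=41, placeholder=40):
    """Combine two Phred scores for one overlapping base (illustrative only)."""
    if strategy == "sum":
        # Add the two scores, capped at an assumed maximum.
        return min(q_fwd + q_rev, cap)
    if strategy == "mean":
        # Integer average of the two scores.
        return (q_fwd + q_rev) // 2
    if strategy == "placeholder":
        # Ignore both inputs and emit a fixed value.
        return placeholder
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical scores for three overlapping positions.
fwd = [38, 30, 12]
rev = [10, 30, 2]

results = {
    strategy: [merge_overlap_quality(f, r, strategy) for f, r in zip(fwd, rev)]
    for strategy in ("sum", "mean", "placeholder")
}

for strategy, scores in results.items():
    print(strategy, scores)
```

The same pair of reads ends up with three different quality profiles depending on the strategy, so scores in a joined file may not mean what a quality-aware denoiser expects.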
If you don't have access to the unjoined reads, you can proceed with denoising with deblur, which does not use the quality scores.
Luckily I was able to (slowly!) obtain the un-joined files. Thank you! Now I am working on importing them!
Unfortunately I am getting an error!
An unexpected error has occurred:
Forward and reverse reads must be provided exactly one time each for each sample. The following samples had forward but not reverse read fastq files: Sample1_R1
And then it lists what looks like all of them. However when I search for some of the samples that are listed as only having forward reads within my manifest, they are there and have reverse reads as well. So I’m confused as to how to proceed.
Please advise! If possible it is fairly urgent.
Thank you so much. I sincerely appreciate that you offer this forum for help.
By the way, here is the command I used and some of the rest of the extremely long error:
compressed-demux]$ qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest.txt --output-path …/paired-end-demux.qza --source-format PairedEndFastqManifestPhred33
Traceback (most recent call last):
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 219, in import_data
    return cls.from_view(type, view, view_type, provenance_capture)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 244, in _from_view
    result = transformation(view)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 338, in _8
    single_end=False)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 268, in _fastq_manifest_helper
    absolute=True)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 158, in _parse_and_validate_manifest
    _validate_paired_end_fastq_manifest_directions(manifest)
  File "/home/abganz/.conda/envs/qiime2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 240, in _validate_paired_end_fastq_manifest_directions
    % ', '.join(forward_but_no_reverse))
And here is what my manifest file looks like (I had to change the sample names here for privacy reasons):
This error is popping up because the file paths for the forward and reverse reads both lead to the exact same file. The same file can't be both forward and reverse. If you got the original unjoined reads, you should have 2 files per sample.
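As a quick way to spot this situation, here is a minimal sketch that scans a manifest for any file path listed under both directions. It assumes the comma-separated manifest layout with a sample-id,absolute-filepath,direction header; the function name, sample names, and paths are hypothetical, and this is not part of QIIME 2 itself.

```python
import csv
import io
from collections import defaultdict


def paths_used_in_both_directions(manifest_text):
    """Return file paths that are listed as both forward and reverse."""
    directions_by_path = defaultdict(set)
    for row in csv.DictReader(io.StringIO(manifest_text)):
        directions_by_path[row["absolute-filepath"]].add(row["direction"])
    return sorted(
        path
        for path, directions in directions_by_path.items()
        if {"forward", "reverse"} <= directions
    )


# Hypothetical manifest: sampleA points both directions at a single file.
example_manifest = """sample-id,absolute-filepath,direction
sampleA,/data/sampleA.fastq.gz,forward
sampleA,/data/sampleA.fastq.gz,reverse
sampleB,/data/sampleB_R1.fastq.gz,forward
sampleB,/data/sampleB_R2.fastq.gz,reverse
"""

duplicated = paths_used_in_both_directions(example_manifest)
print(duplicated)
```

Any path that shows up in the result needs to be replaced by the separate R1 and R2 files for that sample.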
You just need to make sure those 2 files are both provided in your manifest, as in @colinbrislawn's example from above:
Thank you so much for your quick response! I accidentally changed it when I tried to change the name for privacy, but here is what it actually looks like:
To me, this looks the same as the example. Thank you for your help. Also, is the .gz a problem? Should I unzip them? In the example they look like they can be zipped. https://docs.qiime2.org/2018.6/tutorials/importing/
No problem @ariel! We’ve all been there: when you have the least amount of time, the most basic things won’t work.
You’re almost there. Now, though, your sample-ids are different for the same pair.
Have a close look at the example again: the sample-id must be identical for both rows of a pair, while the absolute-filepath entries for your forward and reverse reads must lead to their respective files. That way the script will know to take $PWD/HMP2_J008_R1.fastq.gz (forward) and $PWD/HMP2_J008_R2.fastq.gz (reverse) and pair them under the sample-id HMP2_J008 (or whatever you want to call it, doesn’t matter).
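To illustrate why mismatched sample-ids trigger the "forward but not reverse" message, here is a simplified re-creation of that kind of check. It is a sketch only, not QIIME 2's actual code; the function name is hypothetical, and the manifest layout (a sample-id,absolute-filepath,direction header) is assumed.

```python
import csv
import io
from collections import defaultdict


def unpaired_samples(manifest_text):
    """Return (forward_only, reverse_only) sample-ids found in a manifest."""
    directions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(manifest_text)):
        directions[row["sample-id"]].append(row["direction"])
    forward_only = sorted(
        s for s, d in directions.items() if "forward" in d and "reverse" not in d
    )
    reverse_only = sorted(
        s for s, d in directions.items() if "reverse" in d and "forward" not in d
    )
    return forward_only, reverse_only


# Sample-ids differ within the pair, so each id has only one direction.
broken = """sample-id,absolute-filepath,direction
HMP2_J008_R1,$PWD/HMP2_J008_R1.fastq.gz,forward
HMP2_J008_R2,$PWD/HMP2_J008_R2.fastq.gz,reverse
"""

# Same sample-id on both rows, with each row pointing at its own file.
fixed = """sample-id,absolute-filepath,direction
HMP2_J008,$PWD/HMP2_J008_R1.fastq.gz,forward
HMP2_J008,$PWD/HMP2_J008_R2.fastq.gz,reverse
"""

broken_result = unpaired_samples(broken)
fixed_result = unpaired_samples(fixed)
print(broken_result)
print(fixed_result)
```

In the broken manifest, HMP2_J008_R1 is treated as its own sample with a forward read and no reverse read, which matches the shape of the error reported earlier in this thread.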
And no need to unzip your files, they will be fine the way they are.
Hope that helps