Cannot import fastq manifest file

jhines1 · May 19, 2019, 9:37pm

Hello all (again)! I had to take some time away from my analyses, but I'm back with a fresh new set of issues.

I have a FASTQ file of Sanger sequences I'm trying to import into q2-2019.4. I have made a manifest file (.tsv) that has been verified by Keemei as a valid q2 manifest file. When I go to import, using this code:

qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path Hines-q2-manifest-v2.tsv \
--output-path single-end-demux.qza \
--input-format SingleEndFastqMainfestPhred33V2

I get the following error:

File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/util.py", line 89, in parse_format
format_record = pm.formats[format_str]
KeyError: 'SingleEndFastqMainfestPhred33V2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 152, in import_data
view_type=input_format)
File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/result.py", line 206, in import_data
view_type = qiime2.sdk.parse_format(view_type)
File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/util.py", line 91, in parse_format
raise TypeError("No format: %s" % format_str)
TypeError: No format: SingleEndFastqMainfestPhred33V2

An unexpected error has occurred:

No format: SingleEndFastqMainfestPhred33V2

See above for debug info.

I'm stumped on why the manifest file isn't considered a Phred33v2 file. As it refers to Sanger sequences, I know that this is the correct format/offset as per this post (referring to this explanation).

Here is my manifest file:
Hines-q2-manifest-v2.tsv (42.6 KB)

The file path should be correct. I have the manifest file & the FASTQ file in the same working directory. I did check into BOM issues when importing data, so I downloaded BBEdit & changed my encoding from UTF-8 to UTF-16. The line breaks are in CR format (legacy Mac).

Any ideas?

thermokarst · May 19, 2019, 10:06pm

Hi there @jhines1,

Yep, the format name is misspelled. You provided a command above that says: SingleEndFastqMainfestPhred33V2, but the format name is SingleEndFastqManifestPhred33V2. In particular, look at the word "Manifest" in that format name.

Hope that helps! :qiime2:
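A typo like this can be caught before importing by comparing the name against the formats your installed release reports. This is only a sketch: it assumes `qiime tools import --show-importable-formats` prints one format name per line (which I believe is the case for 2019.4), and `format_is_known` is a hypothetical helper, not a QIIME 2 command.

```shell
# format_is_known: hypothetical helper (not part of QIIME 2).
# Reads a list of format names on stdin and succeeds only if the
# name given as $1 appears in that list as a whole word.
format_is_known() {
    grep -qwF -- "$1"
}

# Intended use (needs an activated QIIME 2 environment):
#   qiime tools import --show-importable-formats \
#       | format_is_known SingleEndFastqMainfestPhred33V2 \
#       || echo "no such format; check the spelling"
```

Matching whole words means `SingleEndFastqManifestPhred33` is not mistaken for `SingleEndFastqManifestPhred33V2`.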

jhines1 · May 20, 2019, 2:29pm

So, I'm an idiot. I fixed that & am now getting this error:

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path Hines-q2-manifest-v2.tsv --output-path single-end-demux.qza --input-format SingleEndFastqManifestPhred33V2
There was a problem importing Hines-q2-manifest-v2.tsv:

Hines-q2-manifest-v2.tsv is not a(n) SingleEndFastqManifestPhred33V2 file:

Filepath on line 1 and column "absolute-filepath" could not be found ($PWD/Desktop/Hines/Data/Hines_AllSeqData_PrimerIntact.fastq/HCl-1_R1.fastq) for sample "HCl-1".

I'm not understanding why my manifest file path isn't linking up with the actual FASTQ file. Does every sample need its own FASTQ file? Or, is it the way I have my file path constructed? Again, the manifest file & the FASTQ sequence file are in the same directory.

thermokarst · May 21, 2019, 9:42pm

Is there a file at that path on your machine? You can check by navigating to the same directory that you ran the import command from, then running:

ls $PWD/Desktop/Hines/Data/Hines_AllSeqData_PrimerIntact.fastq/HCl-1_R1.fastq
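To run the same check for every row of the manifest at once, something like the sketch below may help. It assumes a two-column, tab-separated manifest with one header row, and paths that may begin with a literal `$PWD`. `check_manifest` is a hypothetical helper, not a QIIME 2 command.

```shell
# check_manifest: hypothetical helper (not part of QIIME 2).
# Prints one "missing" line for every manifest row whose file does not exist.
# Usage: check_manifest Hines-q2-manifest-v2.tsv
check_manifest() {
    # Turn CR into LF so CRLF and legacy-Mac (CR-only) files are read
    # line by line, then drop the header row.
    tr '\r' '\n' < "$1" | tail -n +2 |
    while IFS="$(printf '\t')" read -r sample path || [ -n "$sample" ]; do
        # Skip blank lines (left behind when CRLF is converted).
        [ -n "$sample" ] || continue
        # Replace a leading literal $PWD with the current directory.
        case "$path" in
            '$PWD'*) resolved="$PWD${path#'$PWD'}" ;;
            *)       resolved="$path" ;;
        esac
        [ -f "$resolved" ] || printf 'missing\t%s\t%s\n' "$sample" "$resolved"
    done
}
```

Run it from the same directory as the import command, since `$PWD` is resolved relative to where you are. No output means every listed file was found.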

jhines1 · May 23, 2019, 12:07am

@thermokarst Ok, so I've tried a few things & it still doesn't work.

For starters, I've changed my absolute file path from

$PWD/Hines/Data/Hines_AllSeqData_PrimerIntact.fastq/R1_sample name.fastq

to

$PWD/Hines_AllSeqData_PrimerIntact.fastq/R1_sample name.fastq

and that worked when I tried

ls $PWD/Hines_AllSeqData...

However, it still cannot find any of my individual samples in that .fastq file. I keep getting this error:

There was a problem importing Hines-q2-manifest-v4.tsv:
Hines-q2-manifest-v4.tsv is not a(n) SingleEndFastqManifestPhred33V2 file:
Filepath on line 1 and column "absolute-filepath" could not be found ($PWD/Hines_AllSeqData_PrimerIntact.fastq/HCl-1_R1.fastq) for sample "HCl-1".

Do I need to make a .fastq file for each individual sample? I thought the point of the manifest file was to direct q2 to a specific file & have that read each unique sample name/path in that file. Perhaps I'm understanding this incorrectly. I feel like I'm missing something extremely simple that will make sense to me once I get this imported. Is it because I have ".fastq" in the file path both in the actual path as well as the sample name (e.g. ...PrimerIntact**.fastq**/R1_HCl-1**.fastq**)?

Sorry for all the trouble. This is my first foray into programming & bioinformatics, & I feel it is more difficult having to work full time on something completely different (other than this project) while still trying to figure all of this out. I've learned a lot through this journey, but obviously still have a long way to go.

Thank you for your guidance & patience.

thermokarst · May 23, 2019, 9:03pm

Ah, yeah, that is the problem --- the manifest is for importing already demultiplexed data, that is, data that has one (or two for PE) fastq files per sample. If your reads are multiplexed, with all samples in one file (or two for PE), then you will need to import and demux. Can you tell me more about where your barcodes are w.r.t. your sequences? That is, are they inline, or are they in a separate fastq file? Thanks!

jhines1 · May 29, 2019, 1:56pm

Ahh ok, yes they are in a separate .fastq file. Looking at the Moving Pictures Tutorial, it looks like I import both the sequences & the barcodes. That's what I was missing.

A quick question about that importing step: since these aren't EMP sequences, would I use

qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path whatever-file.fastq \
--output-path whatever-file.qza \
--input-format SingleEndFastqManifestPhred33

then demux with the manifest file? Just want to make sure I'm understanding this correctly.

thermokarst · May 29, 2019, 9:18pm

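We can't really do anything until you have a clearer idea of what you have in hand.

Maybe you missed my question above, so I will repeat it here:

"Can you tell me more about where your barcodes are w.r.t. your sequences? That is, are they inline, or are they in a separate fastq file?"

As for importing with the manifest format and then demuxing: no --- as I mentioned above, the manifest format is only for demultiplexed data --- not multiplexed data.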

jhines1 · May 31, 2019, 2:04pm

I'm working with a .fastq file from Sanger sequencing, so I have one file (.fastq) with sample IDs, sequences, & quality scores. I suppose they would be inline as I have only one file, though I'm not really sure I have barcodes, per se. At least, not with the file that I have now.

I'm uploading my file to get your take on this:
Hines_AllSeqData_PrimerIntact.fastq (316.9 KB)

thermokarst · May 31, 2019, 9:29pm

Thanks for sharing, @jhines1. Unfortunately, we do not have a way for you to work with these data in QIIME 2 (this is the first time I have seen this format in the wild in a few years). Maybe you can demux with some other tool (mothur? QIIME 1?), and then import the demuxed reads into QIIME 2? Sorry I can't be of more help.
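If no dedicated tool fits, the "demux elsewhere, then import" route can be roughed out with standard shell tools. The sketch below is an unofficial illustration under stated assumptions: 4-line FASTQ records, read headers shaped like `@PREFIX_SAMPLE_NUMBER`, and an empty output directory. `split_fastq_by_sample` is a hypothetical helper, not a QIIME or mothur command. It writes one FASTQ per sample and prints a two-column manifest to standard output.

```shell
# split_fastq_by_sample: hypothetical helper (not a QIIME or mothur command).
# Assumes 4-line FASTQ records and headers shaped like @PREFIX_SAMPLE_NUMBER.
# Files are opened in append mode, so start with an empty output directory.
# Usage: split_fastq_by_sample reads.fastq outdir > manifest.tsv
split_fastq_by_sample() {
    fastq="$1"
    mkdir -p "$2" || return 1
    outdir=$(cd "$2" && pwd)    # manifest paths must be absolute
    awk -v dir="$outdir" '
        NR % 4 == 1 {
            sample = substr($0, 2)          # drop the leading "@"
            sub(/^[^_]*_/, "", sample)      # drop the PREFIX_ part
            sub(/_[0-9]+$/, "", sample)     # drop the _NUMBER part
            out = dir "/" sample ".fastq"
            if (!(sample in seen)) { seen[sample] = 1; order[++n] = sample }
        }
        { print >> out }                    # append the line to the sample file
        NR % 4 == 0 { close(out) }          # keep the number of open files low
        END {
            print "sample-id\tabsolute-filepath"
            for (i = 1; i <= n; i++)
                print order[i] "\t" dir "/" order[i] ".fastq"
        }
    ' "$fastq"
}
```

If each read is really its own sample (plausible for Sanger data), remove the line that drops the `_NUMBER` part so every read lands in its own file. The resulting manifest could then be tried with the manifest import command from earlier in the thread.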

jhines1 · June 1, 2019, 5:49pm

Well shucks. Thanks for the response. I'll see what I can do to get it imported.

Thanks for the help!

colinbrislawn · June 2, 2019, 5:22pm

Hello @jhines1

I took a look at your fastq file, and I have a workaround to consider.

For ref, here is what your reads look like:

@JH_HCl_1
AGCCTCCGC...
+
NUXUUUNNU...
@JH_HCl_2
AGCCTCCGCT...
+
UUXUUUIIUUU...

So it's a normal fastq file, with read names in this format:
@JH_SampleName_Read Number
That's super close to the Qiime 1 format for fasta files:
>SampleName_Read Number

So, my first thought is to trim / filter low quality reads using a third party program (maybe you have already done this), then convert them into fasta files.
You can make the read names Qiime 1 compatible using
sed 's/JH_//g' Hines.fasta > Hines.q1.fasta
then import into Qiime 2 :qiime2: as shown here.

This is not an officially supported method, but I think it will get the job done. If you feel comfortable trying sed then I think it's worth a try.
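For anyone who wants to try this without a third party converter, here is a rough sketch of the convert-and-rename step in plain shell. It assumes 4-line FASTQ records (no wrapped sequences) and read names that start with `JH_`, as in the excerpt above. It does no quality filtering, and both function names are hypothetical helpers rather than QIIME commands.

```shell
# fastq_to_q1_fasta: hypothetical helper (not a QIIME command).
# Assumes 4-line FASTQ records and read names starting with "JH_".
# Usage: fastq_to_q1_fasta reads.fastq > reads.q1.fasta
fastq_to_q1_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # @name becomes >name
         NR % 4 == 2 { print }                     # keep the sequence line
    ' "$1" | sed 's/^>JH_/>/'                      # >JH_HCl_1 becomes >HCl_1
}

# reads_per_sample: tally FASTA headers by sample, where the sample is
# taken to be everything before the trailing _NUMBER.
# Usage: reads_per_sample reads.q1.fasta
reads_per_sample() {
    awk '/^>/ { name = substr($0, 2); sub(/_[0-9]+$/, "", name); n[name]++ }
         END  { for (s in n) print s "\t" n[s] }' "$1" | sort
}
```

Working by line position rather than by a leading `@` matters because a quality line can also start with `@`. The tally is a quick sanity check that the sample names came out as expected before importing. For the import itself, I believe the QIIME 2 importing tutorial handles Qiime 1-style demultiplexed fasta with `--type 'SampleData[Sequences]'`, but please check the tutorial for your release.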

Let me know what you try next!

Colin