Hello all (again)! I had to take some time away from my analyses, but I'm back with a fresh new set of issues.
I have a FASTQ file of Sanger sequences I'm trying to import into q2-2019.4. I have made a manifest file (.tsv) that has been verified by Keemei as a valid q2 manifest file. When I go to import, using this code:
File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/util.py", line 89, in parse_format
format_record = pm.formats[format_str]
KeyError: 'SingleEndFastqMainfestPhred33V2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 152, in import_data
view_type=input_format)
File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/result.py", line 206, in import_data
view_type = qiime2.sdk.parse_format(view_type)
File "/Users/haselkornlab/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/util.py", line 91, in parse_format
raise TypeError("No format: %s" % format_str)
TypeError: No format: SingleEndFastqMainfestPhred33V2
An unexpected error has occurred:
No format: SingleEndFastqMainfestPhred33V2
See above for debug info.
I'm stumped on why the manifest file isn't considered a Phred33v2 file. As it refers to Sanger sequences, I know that this is the correct format/offset as per this post (referring to this explanation).
The file path should be correct. I have the manifest file & the FASTQ file in the same working directory. I did check into BOM issues when importing data, so I downloaded BBEdit & changed my encoding from UTF-8 to UTF-16. The line breaks are in CR format (legacy Mac).
Yep, the format name is misspelled. You provided a command above that says: SingleEndFastqMainfestPhred33V2, but the format name is SingleEndFastqManifestPhred33V2. In particular, look at the word "Manifest" in that format name.
So, I'm an idiot. I fixed that & am now getting this error:
qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path Hines-q2-manifest-v2.tsv --output-path single-end-demux.qza --input-format SingleEndFastqManifestPhred33V2
There was a problem importing Hines-q2-manifest-v2.tsv:
Hines-q2-manifest-v2.tsv is not a(n) SingleEndFastqManifestPhred33V2 file:
Filepath on line 1 and column "absolute-filepath" could not be found ($PWD/Desktop/Hines/Data/Hines_AllSeqData_PrimerIntact.fastq/HCl-1_R1.fastq) for sample "HCl-1".
I'm not understanding why my manifest file path isn't linking up with the actual FASTQ file. Does every sample need its own FASTQ file? Or, is it the way I have my file path constructed? Again, the manifest file & the FASTQ sequence file are in the same directory.
However, it still cannot find any of my individual samples in that .fastq file. I keep getting this error:
There was a problem importing Hines-q2-manifest-v4.tsv:
Hines-q2-manifest-v4.tsv is not a(n) SingleEndFastqManifestPhred33V2 file:
Filepath on line 1 and column "absolute-filepath" could not be found ($PWD/Hines_AllSeqData_PrimerIntact.fastq/HCl-1_R1.fastq) for sample "HCl-1".
Do I need to make a .fastq file for each individual sample? I thought the point of the manifest file was to direct q2 to a specific file & have that read each unique sample name/path in that file. Perhaps I'm understanding this incorrectly. I feel like I'm missing something extremely simple that will make sense to me once I get this imported. Is it because I have ".fastq" in the file path both in the actual path as well as the sample name (e.g. ...PrimerIntact**.fastq**/R1_HCl-1**.fastq**)?
Sorry for all the trouble. This is my first foray into programming & bioinformatics, & I feel it is more difficult having to work full time on something completely different (other than this project) while still trying to figure all of this out. I've learned a lot through this journey, but obviously still have a long way to go.
Ah, yeah, that is the problem: the manifest is for importing already demultiplexed data, that is, data that has one fastq file per sample (or two for PE). If your reads are multiplexed, with all samples in one file (or two for PE), then you will need to import and demux. Can you tell me more about where your barcodes are w.r.t. your sequences? That is, are they inline, or are they in a separate fastq file? Thanks!
Ahh ok, yes they are in a separate .fastq file. Looking at the Moving Pictures Tutorial, it looks like I import both the sequences & the barcodes. That's what I was missing.
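Since the manifest expects every row to point at a real per-sample file, a quick pre-check along these lines may help show which rows do not resolve. This is only a sketch: the function name `missing_manifest_paths` is made up for illustration, the column names are taken from the error message above, and replacing a literal `$PWD` with a directory you pass in is an assumption about how your manifest is written. It ignores any comment lines.

```python
import csv
import io
import os


def missing_manifest_paths(manifest_text, working_dir):
    """Return (sample_id, resolved_path) pairs whose file does not exist.

    Hypothetical helper, not part of QIIME 2. Assumes a tab-separated
    manifest with 'sample-id' and 'absolute-filepath' columns.
    """
    missing = []
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    for row in reader:
        # Assumption: a literal $PWD in the manifest stands for working_dir.
        path = row["absolute-filepath"].replace("$PWD", working_dir)
        # isfile() is False both for missing paths and for directories, so a
        # row like all_reads.fastq/HCl-1_R1.fastq is reported as missing.
        if not os.path.isfile(path):
            missing.append((row["sample-id"], path))
    return missing
```

Calling it with the text of the manifest and the directory you run the import from gives back the sample IDs whose paths do not point at an actual file. A path that treats one combined .fastq file as if it were a folder will always show up in that list.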
A quick question about that importing step: since these aren't EMP sequences, would I use
I'm working with a .fastq file from Sanger sequencing, so I have one file (.fastq) with sample IDs, sequences, & quality scores. I suppose they would be inline as I have only one file, though I'm not really sure I have barcodes, per se. At least, not with the file that I have now.
Thanks for sharing, @jhines1. Unfortunately, we do not have a way for you to work with these data in QIIME 2 (this is the first time I have seen this format in the wild in a few years). Maybe you can demux with some other tool (mothur? QIIME 1?), and then import the demuxed reads into QIIME 2? Sorry I can't be of more help.
So it's a normal fastq file, with read names in this format: @JG_SampleName_Read Number
That's super close to the QIIME 1 format for fasta files: >SampleName_Read Number
So, my first thought is to trim / filter low quality reads using a third party program (maybe you have already done this), then convert them into fasta files.
You can make the read names QIIME 1 compatible using sed 's/JG_//g' Hines.fasta > Hines.q1.fasta
then import into QIIME 2 as shown here.
This is not an officially supported method, but I think it will get the job done. If you feel comfortable trying sed then I think it's worth a try.
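If sed feels unfamiliar, the conversion and renaming could also be sketched in Python. This is only an illustration, not a supported tool: the function name `fastq_to_qiime1_fasta` is made up, it assumes plain four-line FASTQ records (no wrapped sequences), and it does no quality filtering, so that step would still need to happen first.

```python
def fastq_to_qiime1_fasta(fastq_lines, prefix="JG_"):
    """Yield FASTA lines with the prefix stripped from each read label.

    Hypothetical helper. Assumes every record is exactly four lines:
    @label, sequence, '+', quality.
    """
    lines = [line.rstrip("\r\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"record at line {i + 1} does not start with '@'")
        label = header[1:]
        # Strip the prefix only at the start of the label. The sed command
        # above removes every occurrence of JG_ instead.
        if label.startswith(prefix):
            label = label[len(prefix):]
        yield ">" + label
        yield sequence
```

You would pass it the lines of the FASTQ file (for example an open file handle) and write each yielded line, followed by a newline, to the new fasta file.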