Importing demultiplexed fastq data into qiime2-2017.7


I did MiSeq paired-end sequencing and used other software to merge and demultiplex the reads. Now I am trying to import the sequence data (one fastq file per sample) into QIIME 2.

My command:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /Volumes/xxxx/V4_QIIME2test \
  --output-path /Volumes/xxxx/V4_single-end-demux.qza \
  --source-format SingleEndFastqManifestPhred33

The manifest file:

ValueError: InPath('/var/folders/zr/vqccwhr90rl1hdcykhvdly100000gp/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-x_5h6kva/BS_0_2_2_4_L001_R1_001.fastq.gz') is not formatted as a FastqGzFormat file.

All my sequence files are in fastq format. Does anybody know how to fix this? Thanks.

Hi @eDNA! The file that QIIME 2 is complaining about is named BS_0_2_2_4_L001_R1_001.fastq.gz, which looks pretty different from the naming scheme in the one example sample you posted above (BS_0_1_1.fastq). We typically see filenames that are consistently named. Are you sure your manifest file has the right filenames in it? Can you please post your complete manifest file and a listing of the files in the data directory (a screenshot, or the output of ls or tree)? That will help us troubleshoot this problem. Thanks!
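
One way to rule out a filename mismatch is to compare each manifest entry against what is on disk. Below is a minimal sketch, not part of QIIME 2. It assumes the manifest is a comma-separated file with the `sample-id,absolute-filepath,direction` header described in the Fastq manifest tutorial, and that paths may contain environment variables such as $PWD. The function name `missing_manifest_files` is made up for illustration.

```python
import csv
import os


def missing_manifest_files(manifest_path):
    """Return (sample_id, filepath) pairs whose file does not exist on disk."""
    missing = []
    with open(manifest_path, newline="") as handle:
        # Drop blank lines and '#' comment lines before handing rows to csv.
        rows = (line for line in handle
                if line.strip() and not line.startswith("#"))
        for row in csv.DictReader(rows):
            # Expand environment variables such as $PWD (assumed to be allowed).
            path = os.path.expandvars(row["absolute-filepath"])
            if not os.path.isfile(path):
                missing.append((row["sample-id"], path))
    return missing
```

An empty list means every path in the manifest points to an existing file, so the filenames themselves are probably not the cause.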

Yes, my manifest file has the right filenames in it.

Here is the listing of files in the data directory:
$ ls
BS_0_1_1.fastq BS_15_1_3.fastq BS_30_2_1.fastq
BS_0_1_2.fastq BS_15_2_1.fastq BS_30_2_2.fastq
BS_0_1_3.fastq BS_15_2_2.fastq BS_30_2_3.fastq
BS_0_2_1.fastq BS_15_2_3.fastq BS_500_1_1.fastq
BS_0_2_2.fastq BS_30_1_1.fastq BS_500_1_2.fastq
BS_0_2_3.fastq BS_30_1_2.fastq BS_500_1_3.fastq
BS_15_1_1.fastq BS_30_1_2_2.fastq BS_Blank1.fastq
BS_15_1_2.fastq BS_30_1_3.fastq BS_NC1.fastq

I followed the “Fastq manifest” formats tutorial. “BS_0_2_2_4_L001_R1_001.fastq.gz” looks like a name from the “Casava 1.8 single-end demultiplexed fastq” tutorial. Did the command think I was following that tutorial? The error mentions “q2-SingleLanePerSampleSingleEndFastqDirFmt”.

Hi @eDNA, can you please also post your complete manifest file? Thanks!

Hi Matthew,

Could the sequences themselves be the cause? My sequences are in lower case. Thanks.

Hello Matthew,

I used one sample to test. I think the error was caused by the sequence data (lowercase letters in the DNA sequences).
--------------- 1st test ---------------------
My command:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path ImportTest2 \
  --output-path Test2_single-end-demux.qza \
  --source-format SingleEndFastqManifestPhred33

ImportTest2 file:

The 1st sequence in VP_15_2_3.fastq:
@M_M03670:6:000000000-BCN4V:1:1101:17050:1464_SUB_SUB_CMP reverse_score=80.0; forward_score=76.0; direction=reverse; avg_quality=39.7032967033; forward_tag=gcatcatc; seq_length_ori=182; seq_length=124; sample=VP_15_2_3; forward_match=aagggcaccacaagaacgc; experiment=eDNA; status=full; location=VP_15; tail_quality=35.4; reverse_match=ccacctatcacacaatcatg; reverse_tag=gcatcatc; forward_primer=aagggcaccacaagaacgc; grab=VP_15_2; head_quality=35.2; reverse_primer=ccacctatcacayaatcatg; mid_quality=40.2469135802; 1:N:0:4

Error: ValueError: InPath('/var/folders/zr/vqccwhr90rl1hdcykhvdly100000gp/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-87too7ky/VP_15_2_3_0_L001_R1_001.fastq.gz') is not formatted as a FastqGzFormat file.

-------------- 2nd test-----------
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path ImportTest1 \
  --output-path Test1_single-end-demux.qza \
  --source-format SingleEndFastqManifestPhred33


The 1st sequence in the sequence file:
@M_M03670:6:000000000-BCN4V:1:1101:15780:1331_SUB_SUB_CMP seq_length=132; sample=VP_500_1_1; experiment=eDNA; location=VP_500; grab=VP_500_1; forward_score=76.0; forward_tag=cgatgaca; reverse_tag=cgatgaca; forward_match=aagggcaccacaagaacgc; forward_primer=aagggcaccacaagaacgc; reverse_score=80.0; status=full; direction=reverse; reverse_match=ccacctatcacataatcatg; reverse_primer=ccacctatcacayaatcatg; head_quality=34.6; avg_quality=39.5052083333; seq_length_ori=192; tail_quality=34.6; mid_quality=40.0755813953; 1:N:0:4

The command ran successfully.
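
To confirm that case is what differs between a file that imports and one that fails, the sequence lines can be scanned for lowercase characters. This is a rough sketch that assumes the common four-line FASTQ layout (header, sequence, `+`, quality) with no wrapped sequences. The function name `count_lowercase_reads` is hypothetical.

```python
def count_lowercase_reads(fastq_lines):
    """Count records whose sequence line contains a lowercase character.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    count = 0
    for index, line in enumerate(fastq_lines):
        # The sequence is the second line of each four-line record.
        if index % 4 == 1 and any(ch.islower() for ch in line):
            count += 1
    return count
```

It accepts any iterable of lines, so an open file handle works, for example `count_lowercase_reads(open("VP_15_2_3.fastq"))`. A nonzero count means the file contains lowercase bases.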

@jairideout I found that you mentioned the lowercase issue in DNA sequences in other posts.

My demultiplexed data (upper case) is in one fastq file (e.g. the VP_P1_assigned.fastq I used for the 2nd test). I split the file on the “sample” attribute in each sequence header to get one fastq file per sample for import into QIIME 2, but the resulting sequences are in lower case. Does anyone have an idea how to import my data into QIIME 2? I am very keen to use QIIME 2 to analyze my data. Thank you.
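
For reference, splitting on the “sample” attribute can be done without touching the case of the bases. The sketch below is illustrative only and is not the tool used in this thread. It assumes four-line FASTQ records and a `sample=NAME;` field in the header like the ones shown above. The names `SAMPLE_RE` and `group_records_by_sample` are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "sample=VP_15_2_3;" and captures "VP_15_2_3".
SAMPLE_RE = re.compile(r"\bsample=([^;\s]+)")


def group_records_by_sample(fastq_lines):
    """Group four-line FASTQ records by the 'sample=' attribute in the header."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    groups = defaultdict(list)
    for start in range(0, len(lines) - 3, 4):
        header, sequence, plus, quality = lines[start:start + 4]
        match = SAMPLE_RE.search(header)
        if match is None:
            continue  # no sample attribute; skipped in this sketch
        # The sequence is stored exactly as read, so its case is preserved.
        groups[match.group(1)].append((header, sequence, plus, quality))
    return dict(groups)
```

Each group can then be written to its own per-sample file. The sketch reads the whole input into memory, which may not suit very large files.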


I did another test: I converted the sequences in VP_15_2_3.fastq to upper case and the file could be imported. I will have to convert each of my 280 fastq files before using QIIME 2.
My problem is solved for now. Looking forward to getting results from QIIME 2.
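
For anyone with the same problem, the conversion can be scripted. This is a minimal sketch of the workaround, assuming uncompressed four-line FASTQ records. Only the sequence line is changed, so headers (which contain lowercase tags) and quality strings stay as they are. The function names are hypothetical.

```python
def uppercase_sequences(fastq_lines):
    """Yield FASTQ lines with every sequence line converted to upper case."""
    for index, line in enumerate(fastq_lines):
        # Only the second line of each four-line record holds bases;
        # headers and quality strings are passed through unchanged.
        yield line.upper() if index % 4 == 1 else line


def convert_file(src, dst):
    """Write an upper-cased copy of the FASTQ file src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(uppercase_sequences(fin))
```

To process a whole directory, loop over `glob.glob("*.fastq")` and call `convert_file` on each path, writing the output to a separate directory so the originals are kept.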


Great, thanks for following up @eDNA. We have an open issue on one of our bug trackers about the lowercase sequence problem. We will update this thread when a solution has been implemented, but at this time we have no ETA for that (I would assume before the end of 2018). Sounds like you have a workaround in place to keep you moving forward. Thanks!
