Importing demultiplexed fastq data into qiime2-2017.7

eDNA · August 21, 2017, 6:55pm

Hello,

I did MiSeq paired-end sequencing and used other software to merge and demultiplex the sequences. Now I am trying to import the sequence data (one fastq file per sample) into QIIME 2.

My command:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /Volumes/xxxx/V4_QIIME2test \
  --output-path /Volumes/xxxx/V4_single-end-demux.qza \
  --source-format SingleEndFastqManifestPhred33

The manifest file:

sample-id,absolute-filepath,direction
BS_0_1_1,/Volumes/xxxx/BS_P1_V4/BS_0_1_1.fastq,forward
......

Error:
ValueError: InPath('/var/folders/zr/vqccwhr90rl1hdcykhvdly100000gp/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-x_5h6kva/BS_0_2_2_4_L001_R1_001.fastq.gz') is not formatted as a FastqGzFormat file.

All my sequence files are in fastq format. Does anybody know how to fix this? Thanks.

thermokarst · August 22, 2017, 5:09pm

Hi @eDNA! The file that QIIME 2 is complaining about is named BS_0_2_2_4_L001_R1_001.fastq.gz, which looks like a pretty different naming scheme from the one example sample you posted above (BS_0_1_1.fastq). We typically see filenames that are named pretty consistently. Are you sure your manifest file has the right filenames in it? Can you please post your complete manifest file and a listing of the files in the data directory (a screenshot, or the output of ls or tree)? That will help us troubleshoot this problem. Thanks!
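
In case it is useful, here is a minimal Python sketch of one way to check that every path listed in a manifest actually exists on disk. It is not part of QIIME 2; the function name `missing_manifest_files` is made up for illustration, and it assumes the manifest is a comma-separated file with the `sample-id,absolute-filepath,direction` header shown above.

```python
import csv
import os


def missing_manifest_files(manifest_path):
    """Return (sample-id, filepath) pairs whose file does not exist on disk.

    Hypothetical helper, not a QIIME 2 function. Assumes a CSV manifest
    with 'sample-id' and 'absolute-filepath' columns.
    """
    missing = []
    with open(manifest_path, newline="") as handle:
        # Skip blank lines and '#' comment lines before the CSV header is parsed.
        rows = (line for line in handle
                if line.strip() and not line.startswith("#"))
        for row in csv.DictReader(rows):
            path = row["absolute-filepath"]
            if not os.path.isfile(path):
                missing.append((row["sample-id"], path))
    return missing
```

Calling `missing_manifest_files("/path/to/manifest")` returns an empty list when every listed file is present, so any entries in the result point at rows worth double-checking.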

eDNA · August 22, 2017, 5:46pm

Hey

Yes, my manifest file has the right filenames in it.

The listing of files in the data directory
$ ls
BS_0_1_1.fastq BS_15_1_3.fastq BS_30_2_1.fastq
BS_0_1_2.fastq BS_15_2_1.fastq BS_30_2_2.fastq
BS_0_1_3.fastq BS_15_2_2.fastq BS_30_2_3.fastq
BS_0_2_1.fastq BS_15_2_3.fastq BS_500_1_1.fastq
BS_0_2_2.fastq BS_30_1_1.fastq BS_500_1_2.fastq
BS_0_2_3.fastq BS_30_1_2.fastq BS_500_1_3.fastq
BS_15_1_1.fastq BS_30_1_2_2.fastq BS_Blank1.fastq
BS_15_1_2.fastq BS_30_1_3.fastq BS_NC1.fastq

I followed the "Fastq manifest" formats tutorial. The name "BS_0_2_2_4_L001_R1_001.fastq.gz" looks like one from the "Casava 1.8 single-end demultiplexed fastq" tutorial. Did the command think I was following the "Casava 1.8 single-end demultiplexed fastq" tutorial? The error mentions "q2-SingleLanePerSampleSingleEndFastqDirFmt".

thermokarst · August 22, 2017, 6:53pm

Hi @eDNA, can you please also post your complete manifest file? Thanks!

eDNA · August 22, 2017, 7:02pm

Hi Matthew,

Could the cause be the sequences themselves? My sequences are in lower case. Thanks.

eDNA · August 23, 2017, 4:33pm

Hello Matthew,

I tested with a single sample. I think the error was caused by the sequence data (lowercase letters in the DNA sequences).
--------------- 1st test ---------------------
My command:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path ImportTest2 \
  --output-path Test2_single-end-demux.qza \
  --source-format SingleEndFastqManifestPhred33

ImportTest2 file:
sample-id,absolute-filepath,direction
VP_15_2_3,/Users/xxx/Test2/VP_15_2_3.fastq,forward

The 1st sequence in VP_15_2_3.fastq:
@M_M03670:6:000000000-BCN4V:1:1101:17050:1464_SUB_SUB_CMP reverse_score=80.0; forward_score=76.0; direction=reverse; avg_quality=39.7032967033; forward_tag=gcatcatc; seq_length_ori=182; seq_length=124; sample=VP_15_2_3; forward_match=aagggcaccacaagaacgc; experiment=eDNA; status=full; location=VP_15; tail_quality=35.4; reverse_match=ccacctatcacacaatcatg; reverse_tag=gcatcatc; forward_primer=aagggcaccacaagaacgc; grab=VP_15_2; head_quality=35.2; reverse_primer=ccacctatcacayaatcatg; mid_quality=40.2469135802; 1:N:0:4
gtggagcatgtggcttaatttgactcaacgcagggaatcttaccgggtccggacacactgaggattgacagattaaagcggttgtcagtcttatgactggctccgttgaaagttacagctcttt
+
GGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHH

Error: ValueError: InPath('/var/folders/zr/vqccwhr90rl1hdcykhvdly100000gp/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-87too7ky/VP_15_2_3_0_L001_R1_001.fastq.gz') is not formatted as a FastqGzFormat file.

-------------- 2nd test-----------
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path ImportTest1 \
  --output-path Test1_single-end-demux.qza \
  --source-format SingleEndFastqManifestPhred33

ImportTest1:
sample-id,absolute-filepath,direction
VP_P1_all,/Users/xxx/Test/VP_P1_assigned.fastq,forward

The 1st sequence in the sequence file:
@M_M03670:6:000000000-BCN4V:1:1101:15780:1331_SUB_SUB_CMP seq_length=132; sample=VP_500_1_1; experiment=eDNA; location=VP_500; grab=VP_500_1; forward_score=76.0; forward_tag=cgatgaca; reverse_tag=cgatgaca; forward_match=aagggcaccacaagaacgc; forward_primer=aagggcaccacaagaacgc; reverse_score=80.0; status=full; direction=reverse; reverse_match=ccacctatcacataatcatg; reverse_primer=ccacctatcacayaatcatg; head_quality=34.6; avg_quality=39.5052083333; seq_length_ori=192; tail_quality=34.6; mid_quality=40.0755813953; 1:N:0:4
GTGGAGCATGTGGCTTAATTTGACTCAACGTGGGAAATCTTACCGGGTCCGGACATACTGAGGATTGACAGGCAATTGATGATTGCTTCGGTGTTAAAACCAGGCTTTCATCGCTAAATATGCTAGTCCTTT
+
GGGGGHHHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHHHHHH

The command ran successfully.

@jairideout I found that you mentioned the lowercase DNA sequence issue in other posts.

My demultiplexed data (upper case) is in one fastq file (e.g. the VP_P1_assigned.fastq I used for the 2nd test). I split that file on the "sample" attribute in each sequence header to obtain one fastq file per sample for import into QIIME 2, but the sequences in the split files are in lower case. Does anyone have any idea how to import my data into QIIME 2? I am very keen to use QIIME 2 to analyze my data. Thank you.
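
A minimal Python sketch of one way to do such a split while keeping the sequences in upper case. It is not the tool used above; the function name `split_fastq_by_sample` is made up for illustration. It assumes each record is exactly four lines (no wrapped sequences), that the header carries a `sample=<name>;` attribute as in the example reads, and that sample names are safe to use as file names.

```python
import os
import re
from itertools import islice

# Matches the 'sample=<name>' attribute seen in the example headers.
SAMPLE_RE = re.compile(r"\bsample=([^;\s]+)")


def split_fastq_by_sample(fastq_path, out_dir):
    """Write <out_dir>/<sample>.fastq per sample; return {sample: read count}.

    Hypothetical helper. Sequence lines are written in upper case;
    header and quality lines are copied unchanged.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    with open(fastq_path) as fastq:
        while True:
            record = list(islice(fastq, 4))  # one four-line FASTQ record
            if len(record) < 4:
                break  # end of file (or a truncated trailing record)
            header, seq, plus, qual = (line.rstrip("\n") for line in record)
            match = SAMPLE_RE.search(header)
            if match is None:
                continue  # read has no sample attribute; skip it
            sample = match.group(1)
            # Truncate on the first read of a sample, append afterwards.
            # Reopening per read is slow but avoids holding hundreds of
            # file handles open at once.
            mode = "a" if sample in counts else "w"
            out_path = os.path.join(out_dir, sample + ".fastq")
            with open(out_path, mode) as out:
                out.write("\n".join([header, seq.upper(), plus, qual]) + "\n")
            counts[sample] = counts.get(sample, 0) + 1
    return counts
```

The returned counts make it easy to compare read numbers per sample against whatever the original demultiplexing step reported.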

eDNA · August 23, 2017, 8:50pm

Hello,

I did another test. I converted the sequences in VP_15_2_3.fastq to uppercase, and the file could then be imported. I will have to convert each of my 280 fastq files before I use QIIME 2.
My problem is solved for now. Looking forward to getting results from QIIME 2.
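
For anyone who hits the same error, here is a minimal Python sketch of one way to do this kind of conversion across a whole directory (not necessarily the tool used above). The function names `uppercase_fastq` and `uppercase_directory` are made up for illustration, and the sketch assumes uncompressed `*.fastq` files with exactly four lines per record. Only the sequence line of each record is changed, because quality characters are case-sensitive and must stay as they are.

```python
import glob
import os


def uppercase_fastq(in_path, out_path):
    """Copy a four-line-per-record FASTQ file, uppercasing sequence lines only.

    Hypothetical helper. Returns the number of records written.
    """
    records = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for index, line in enumerate(src):
            if index % 4 == 1:  # second line of each record is the sequence
                line = line.upper()
                records += 1
            dst.write(line)
    return records


def uppercase_directory(in_dir, out_dir):
    """Convert every *.fastq file in in_dir into out_dir; return output paths."""
    if os.path.abspath(in_dir) == os.path.abspath(out_dir):
        # Writing in place would truncate each file while it is being read.
        raise ValueError("out_dir must differ from in_dir")
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for in_path in sorted(glob.glob(os.path.join(in_dir, "*.fastq"))):
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        uppercase_fastq(in_path, out_path)
        written.append(out_path)
    return written
```

Because the converted files land in a separate directory, the manifest needs to point at the new paths before running the import.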

Thanks.
Tom.

thermokarst · August 24, 2017, 3:15am

Great, thanks for following up @eDNA. We have an existing open issue on one of our bug trackers about the lowercase sequence issue. We will update this thread when a solution has been implemented, but at this time we have no ETA on when that will happen (I would assume before the end of 2018). Sounds like you have a workaround in place to get you moving forward. Thanks!
