Data importing problems

Hi,

I am trying to import fastq files into QIIME 2 (qiime2-2018.6), but the import fails. The data are for the V4 region of 16S. For each patient, I have two files, one for forward reads and one for reverse reads. All files are uncompressed. There are no barcode files. The files are stored in a folder called Yang/Yang2, and the file names look like:

AB2S74_01_L001_R1_001.fastq
AB2S76_02_L001_R1_001.fastq
AB2S74_01_L001_R2_001.fastq
AB2S76_02_L001_R2_001.fastq

Part of the content of the first file (AB2S74_01_L001_R1_001.fastq) is shown below:

@M00307:30:000000000-B3PKC:1:1101:14934:1406 1:N:0:GGAGCTAC+GAGCCTTA
ATTGGGCGTAAAGTGAGCGTAGACGGACTTGCAAGTCTGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTGGAAACTGTAGGTCTAGAGTGCTGGAGAGGTAAGTGGAATTCCTCGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACAGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTA
+
AABBBBFBBBBBGGGGFGEEFE2FEG2AAGGGH5GFGFFGFFGGGHHFHHFFGGGEEHFHHHGGFGCEGHHHHHGHHEFFGHGH4GFBFG3G?GFGFHHGFG3?EBGFHGHHFHHHHHH3CFGGBGHCGGCEDGGGGGDFGFGHHG1FB<GDECG<GGEHHGHGGGGCGHGGAFG/BBFBBFCEFGGFGGGGFE…;FFFFA;9…AB.;;;;DEA.AFF?.;?AFFFFF9;BFFFFEBF
@M00307:30:000000000-B3PKC:1:1101:15666:1410 1:N:0:GGAGCTAC+GAGCCTTA
ATTGGGTTTAAAGTGCATGTAGGCGGTTATCTAAGCTTGGTGTGAAAGGCAGGGGCTCCACTCCTGGACTGCATTGAGAACTGGATGACTAGAGTTACTGAAGTGAAATCAGAATTCCAGGTGTAGGGGTGAAATCTGTAGATATCTGGAAGAATACCAATGGCGAAGGCAGGTTTCAGGCAGATAACTGACGCTGAGGTGCGAAGGTGCGGGGAGCAAACGGGATTAGATACCCGCGTA
+
AABBBBAABFFFG[email protected][email protected]@[email protected]/<AEFCDGCGHE==GGFFAGFFHGFC;AED?.CFHHFGE-;.;;9BFAD;BFEBB9DDFDFFFBBF/BBFDBAFA
@M00307:30:000000000-B3PKC:1:1101:17304:1441 1:N:0:GGAGCTAC+GAGCCTTA
ACTGGGCGTAAAGAGAGCGTAGACGGCAGAGCAAGTCTGATGTGAAAGGCAGGGGCTCAACCCCTGGACTGCATTGGAAACTGTTCGGCTTGAGTGCCGGAGAGGGAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTCCTGGACGGCAACTGACGTTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCAGTA

I am trying to import the data as Casava 1.8 paired-end demultiplexed fastq, following the QIIME 2 tutorial, with the following command:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path Yang/Yang2 --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza

However, I got the following error message:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'

So my questions are:

  1. How to fix this problem?
  2. I expected the reads to be around 150 bp for both forward and reverse because these are paired-end data (V4 region). However, I found that the length is around 250 bp. I don’t know why.

Any suggestions are greatly appreciated!

James

To import as Casava 1.8 format, all files must be gzip-compressed (names ending in .fastq.gz), which is what the pattern in the error message requires. If you would rather not compress each file, you can import with a manifest format instead.
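
For the compression route, here is a minimal sketch. It assumes `gzip` is installed and that the files sit in Yang/Yang2 as described in the question. The helper name `compress_fastq_dir` is made up for illustration and is not part of QIIME 2.

```shell
# Hypothetical helper: gzip every uncompressed *.fastq file in a directory
# so the names end in .fastq.gz, as the Casava 1.8 pattern requires.
compress_fastq_dir() {
    dir="$1"
    for f in "$dir"/*.fastq; do
        # If nothing matches, the glob is left unexpanded; skip it.
        [ -e "$f" ] || continue
        gzip "$f"    # replaces file.fastq with file.fastq.gz
    done
}

# Usage with the directory from the question:
#   compress_fastq_dir Yang/Yang2
# then re-run the same `qiime tools import` command as before.
```

Note that `gzip` replaces each file with its compressed version, so keep a backup if you also need the uncompressed copies.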

As for the read length, that’s a question for your sequencing center. You get what you pay for: the most probable answer is that you purchased 2x250 bp paired-end sequencing for your samples!
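
If you want to confirm what the sequencer delivered, you can tally the read lengths in one of the files. This is a minimal sketch, assuming an uncompressed FASTQ file with four lines per record like the one shown in the question; `read_length_counts` is a made-up helper name.

```shell
# Hypothetical helper: print "length count" for each distinct read length
# in an uncompressed FASTQ file (assumes 4 lines per record, no wrapping).
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Usage with a file from the question:
#   read_length_counts Yang/Yang2/AB2S74_01_L001_R1_001.fastq
```

Each output line gives a read length followed by the number of reads of that length, so you can see directly how long the reads in that file are.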

I hope that helps!

Thanks @Nicholas Bokulich!
