How to process NCBI public demultiplexed fastq files using QIIME2

Zhang_Wang · March 7, 2019, 8:45am

Hello,
I used to use QIIME1 and am new to QIIME2. I wanted to use QIIME2 to process some public 16S datasets from NCBI SRA. When I downloaded the files, they were already demultiplexed and named as follows:

SRR000001_1.fastq
SRR000001_2.fastq
SRR000002_1.fastq
SRR000002_2.fastq
SRR000003_1.fastq
SRR000003_2.fastq

I wanted to process them through the DADA2 denoising and OTU picking steps, but realized that I first have to import them into a .qza file. I read through the tutorial and was unable to find an intuitive way to do this. I tried:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path raw_reads_paired/ --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza

with the error message:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'

Should I rename my fastq files to match this naming pattern and then rerun this command? Or is there an easier way to do this?
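If you did want to take the renaming route, a minimal sketch is below. It gzips each file into a new directory under a name matching the pattern in the error message, using a placeholder barcode field (`S1`) and lane (`L001`), because SRA downloads do not record those values. As far as I know, QIIME 2 takes the sample ID from the text before the first underscore, so accession numbers such as SRR000001 should work as IDs. The function name `to_casava` is hypothetical, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a QIIME 2 command): copy SRR*_1.fastq / SRR*_2.fastq
# into DEST as gzipped files named <id>_S1_L001_R<1|2>_001.fastq.gz, which fits
# the pattern '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz' from the error.
# "S1" and "L001" are placeholders; the real barcode/lane values are unknown.
to_casava() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  local f base id mate
  for f in "$src"/*_[12].fastq; do
    [ -e "$f" ] || continue          # glob matched nothing: skip the literal pattern
    base="$(basename "$f" .fastq)"   # e.g. SRR000001_1
    id="${base%_*}"                  # e.g. SRR000001 (sample ID)
    mate="${base##*_}"               # 1 = forward reads, 2 = reverse reads
    gzip -c "$f" > "$dest/${id}_S1_L001_R${mate}_001.fastq.gz"
  done
}
```

For example, `to_casava raw_reads_paired casava_reads`, then the same import command as above with `--input-path casava_reads/`. The original files are left untouched.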

Thanks very much for any help.
Zhang

jwdebelius · March 7, 2019, 8:47am

Hi @Zhang_Wang,

Welcome to QIIME 2!

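I think you'll be a lot happier with the manifest format. A manifest file lists each sample ID next to the absolute paths of its forward and reverse read files, so you can import the samples without having to change the file names. You can even rename the samples if you want to, since the IDs are whatever you write in the manifest.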
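As a rough sketch of that route: the helper below writes a tab-separated manifest for files named like SRR000001_1.fastq / SRR000001_2.fastq. It assumes the `PairedEndFastqManifestPhred33V2` layout (header `sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath`), which I believe was introduced around QIIME 2 2019.4; earlier releases expected a comma-separated manifest with a `direction` column instead. It also assumes Phred+33 quality scores, which is typical for SRA downloads but worth checking. The function name `make_manifest` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a QIIME 2 command): write a paired-end manifest.
# Assumes the tab-separated "V2" manifest layout and files named
# <accession>_1.fastq (forward) / <accession>_2.fastq (reverse) in DIR.
make_manifest() {
  local dir="$1" out="$2"
  local abs_dir
  abs_dir="$(cd "$dir" && pwd)"   # manifest paths must be absolute
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  local fwd id rev
  for fwd in "$abs_dir"/*_1.fastq; do
    [ -e "$fwd" ] || continue            # glob matched nothing: skip the literal pattern
    id="$(basename "$fwd" _1.fastq)"     # accession becomes the sample ID
    rev="$abs_dir/${id}_2.fastq"
    if [ ! -e "$rev" ]; then
      echo "warning: no reverse read file for $id, skipping" >&2
      continue
    fi
    printf '%s\t%s\t%s\n' "$id" "$fwd" "$rev" >> "$out"
  done
}
```

For example, `make_manifest raw_reads_paired manifest.tsv`, followed by `qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest.tsv --input-format PairedEndFastqManifestPhred33V2 --output-path demux-paired-end.qza`. Because the sample IDs live in the first column, you can edit that column to rename samples without touching the files.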

Best,
Justine

Zhang_Wang · March 7, 2019, 12:45pm

Got it. This is really helpful. Thanks!