Importing paired-end demultiplexed multi-lane sequences

Andrew_Mead · June 18, 2019, 11:03am

Hello, I have multiple sample folders that each contain 8 files, representing the 4 lanes for each of the forward and reverse sequences. I'm running qiime2-2019.4 and need to import the data.

I wasn't sure if there was an easy way to do this, so I used the cat command to concatenate the data into a single forward and a single reverse fastq.gz file (each sample now has 2 files). These are in the folder Concatonated-sequences.
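A minimal sketch of that concatenation step in Python, assuming the per-lane files follow the usual Illumina naming (for example `SampleA_S1_L001_R1_001.fastq.gz`). The function name `merge_lanes` and the output naming `<sample>_R1.fastq.gz` are hypothetical choices for illustration, not something QIIME 2 requires. Gzip files can be joined byte-for-byte, which is what `cat` does as well.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed Illumina-style name, e.g. SampleA_S1_L001_R1_001.fastq.gz
LANE_FILE = re.compile(r"(?P<sample>.+)_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$")


def merge_lanes(sample_dir, out_dir):
    """Concatenate per-lane files into one file per sample and read direction."""
    groups = defaultdict(list)
    # Sorting keeps lanes in the same order (L001, L002, ...) for R1 and R2,
    # so forward and reverse reads stay paired after merging.
    for path in sorted(Path(sample_dir).glob("*.fastq.gz")):
        match = LANE_FILE.match(path.name)
        if match:
            groups[(match["sample"], match["read"])].append(path)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for (sample, read), lanes in sorted(groups.items()):
        target = out_dir / f"{sample}_{read}.fastq.gz"
        with open(target, "wb") as dst:
            for lane in lanes:
                with open(lane, "rb") as src:
                    shutil.copyfileobj(src, dst)  # byte-level append, like cat
        merged.append(target)
    return merged
```

Calling `merge_lanes("SampleA", "Concatonated-sequences")` once per sample folder would leave two files per sample in the output folder.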

I tried to import these using the following script but got an error.

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path Concatonated-sequences --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza

There was a problem importing Concatonated-sequences:
Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'
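One plausible cause of this error is that the concatenated files were given names that no longer fit the pattern quoted in the message. A small sketch for checking names against that pattern follows; the file names in it are hypothetical, and whether QIIME 2 anchors the pattern exactly as `fullmatch` does is an assumption.

```python
import re

# Pattern copied verbatim from the error message.
CASAVA = re.compile(r".+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz")


def non_matching(filenames):
    """Return the names that the Casava 1.8 pattern would reject."""
    return [name for name in filenames if not CASAVA.fullmatch(name)]


# Hypothetical file names, for illustration only.
names = [
    "SampleA_S1_L001_R1_001.fastq.gz",  # typical per-lane name from the sequencer
    "SampleA_R1.fastq.gz",              # a possible name after concatenation
]
print(non_matching(names))
```

Any name this reports would need to be renamed to the Casava style, or the data imported another way.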

This is my first time using QIIME, and having gone through the tutorials and Illumina import documents I'm not sure where I'm going wrong.

Thank you for your help.

Andrew

jwdebelius · June 18, 2019, 1:26pm

Hi @Andrew_Mead,

Welcome to microbiome analysis and the QIIME 2 forum!

My best recommendation is to use a manifest format rather than the Casava format. The pro is that it gives you more control. The con is that you have to build the manifest file (not so bad in your case, I suspect).
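Building the manifest can be scripted. Here is a rough sketch, assuming the concatenated files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (the thread does not show the real names) and assuming the comma-separated manifest layout with `sample-id`, `absolute-filepath` and `direction` columns. The function name `write_manifest` is hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed naming of the concatenated files: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
READ_FILE = re.compile(r"(?P<sample>.+)_(?P<read>R[12])\.fastq\.gz$")
DIRECTION = {"R1": "forward", "R2": "reverse"}


def write_manifest(seq_dir, manifest_path):
    """Write a CSV manifest listing every matching fastq.gz file in seq_dir."""
    rows = []
    for path in sorted(Path(seq_dir).glob("*.fastq.gz")):
        match = READ_FILE.match(path.name)
        if match:
            # The manifest needs absolute paths, so resolve each file.
            rows.append((match["sample"], str(path.resolve()), DIRECTION[match["read"]]))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
    return rows
```

After running `write_manifest("Concatonated-sequences", "manifest.csv")`, the import would point `--input-path` at `manifest.csv` and use `--input-format PairedEndFastqManifestPhred33`, assuming the reads use Phred33 quality scores.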

Do you know what you plan to do with the data after you get it loaded? Denoising, OTU picking? Something else? If you're planning to run DADA2, you should do that on a per-sequencing-run level, but I think there's a weird scaling thing around depth and sample size. If you're doing open-reference or de novo OTUs, you should import all the data and cluster it together. Deblur doesn't really care.

Best,
Justine

Andrew_Mead · June 18, 2019, 6:06pm

Hi Justine,

Thank you for your feedback. I have used the manifest format as you advised and was able to import my data. I have produced a summary which shows the number of samples, the number of sequences and the interactive quality plot, so I think I'm on the right track.

I was planning to denoise, and work my way through to alpha and beta diversity and hopefully taxonomy so I can compare the different samples. I will take a look at using Deblur for denoising and go from there.

Thank you very much for your help.

Andrew
