Import per-sample paired-end fastq files

Hi,

(I am new to qiime/qiime2, but I am able to run the “Moving Pictures” tutorial.)
My data consists of a directory with two files per sample: the R1 and the R2 file. The fastq files are thus already demultiplexed. What are the steps to get these data into an artifact?

I tried (among other things):

qiime tools import --type "SampleData[PairedEndSequencesWithQuality]" --input-path ./../raw_data/ --output-path ./raw-sequences.qza

and

qiime tools import --type RawSequences --input-path ./../raw_data/ --output-path ./raw-sequences.qza

I get errors like:
ValueError: Missing one or more files for EMPMultiplexedDirFmt: 'sequences.fastq.gz'

Any help is appreciated. This should be basic, but I have been googling for 3 hours now… Thanks!

Kind regards,
Filip

Hi @fvnieuwe, thanks for posting to the forum! We just published a new tutorial to the QIIME 2 documentation site:

https://docs.qiime2.org/2.0.6/tutorials/import-sequence-data/

We can help you import your paired-end data into an artifact, but unfortunately we don’t have any methods that support operations on paired-end sequences yet (the next release, currently scheduled for Q1 2017, will support this). If this is holding you up right now, we recommend that you import either the R1 or the R2 reads, using the single-end Casava format in the tutorial linked above. It is important to note that the Casava format has specific naming requirements, so you may need to rename your files. The following bit of discussion covers those requirements:

An example filename: L2S357_15_L001_R1_001.fastq.gz.

The underscore-separated fields in this file name are the sample identifier (you should make sure this matches your sample ID), the barcode sequence or a barcode identifier (this shouldn’t matter, you can put 01 here if you want), the lane number (this shouldn’t matter, you can put L001 here, as in the example), the read number (R1 or R2, matching whichever read direction you choose), and the set number (must be 001).
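
As a rough illustration of the naming scheme described above, here is a minimal Python sketch that builds a Casava-style file name from a sample ID and read number, and plans renames for a list of existing files. The input pattern (`sampleA_R1.fastq.gz`) and the function names `casava_name` and `plan_renames` are assumptions made for this example, not part of QIIME 2; adjust the pattern to however your files are actually named. It only computes the new names and does not touch any files.

```python
import re

# Hypothetical pattern for the *current* file names, e.g. "sampleA_R1.fastq.gz".
# This is an assumption; adjust it to match how your files are actually named.
INPUT_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def casava_name(sample_id, read, barcode_id="01", lane=1):
    """Build a name of the form <sample>_<barcode>_L<lane>_R<read>_001.fastq.gz."""
    if read not in (1, 2):
        raise ValueError("read must be 1 or 2")
    # The set number is fixed at 001, as the format requires.
    return f"{sample_id}_{barcode_id}_L{lane:03d}_R{read}_001.fastq.gz"


def plan_renames(filenames):
    """Map each matching input file name to its Casava-style name.

    File names that do not match INPUT_PATTERN are skipped.
    """
    plan = {}
    for name in filenames:
        match = INPUT_PATTERN.match(name)
        if match is None:
            continue
        plan[name] = casava_name(match["sample"], int(match["read"]))
    return plan


if __name__ == "__main__":
    # Dry run: print the planned renames without changing anything on disk.
    files = ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"]
    for old, new in plan_renames(files).items():
        print(f"{old} -> {new}")
```

Once the printed plan looks right, the renames could be applied with something like `os.rename(old, new)` inside the raw data directory, ideally on a copy of the files. For a single-end import, keep only the R1 (or only the R2) files.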

We are also working on adding a new sequences directory format that doesn’t have these strict requirements, and we will make an announcement when that becomes available.

If you need any help during this process, feel free to reach out here again. Thanks!
