Import Paired End Illumina Data

Lexie_Keding · June 12, 2017, 3:16pm

We are trying to import data from a paired-end Illumina HiSeq run, which is already demultiplexed and in fastq format. We have followed the tutorial; however, each of our samples is distributed over two lanes of the sequencer.

So for example, we have files such as

Sample1_id1_L001_R1_001.fastq.gz
Sample1_id1_L001_R2_001.fastq.gz
Sample1_id1_L002_R1_001.fastq.gz
Sample1_id1_L002_R2_001.fastq.gz
Sample2_id2_L001_R1_001.fastq.gz
Sample2_id2_L001_R2_001.fastq.gz
Sample2_id2_L002_R1_001.fastq.gz
Sample2_id2_L002_R2_001.fastq.gz

etc.

The tutorial uses a source-format of CasavaOneEightSingleLanePerSampleDirFmt which doesn't seem to fit here, as "SingleLanePerSample" doesn't describe the setup.

Is there an appropriate source-format option to use here?
More generally, is there documentation for the available source-format options and how they are used?

pjtorres · June 12, 2017, 9:47pm

Hi @Lexie_Keding ,

Have you tried using this command

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path casava-18-paired-end-demultiplexed \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza

from the Importing data tutorial (QIIME 2 2017.5.0 documentation)?

gregcaporaso · June 13, 2017, 1:33pm

@pjtorres, that format won't work here unfortunately, since @Lexie_Keding's data isn't single lane.

gregcaporaso · June 13, 2017, 2:09pm

Hi @Lexie_Keding,

each of our samples is distributed over two lanes of the sequencer

It may not currently be possible to do what you want to do with QIIME 2, unfortunately. I'll explain how you can proceed with this, where the issue is, and then a couple of possible work-arounds.

I'm assuming that you're going to denoise these data with DADA2 as a next step in the process, and DADA2 works on a single lane at a time. You'll therefore need to create two SampleData[PairedEndSequencesWithQuality] objects, one per lane, and run dada2 denoise-paired twice. You'll then end up with two FeatureTable[Frequency] objects, which is where the issue arises. At the moment, we don't support merging of tables with overlapping sample ids (see q2-feature-table issue #86). We plan to add this functionality in the July release.

So, to proceed...

First, it looks like your file names are formatted correctly to be used with the CasavaOneEightSingleLanePerSampleDirFmt, once you split the files by lane. Alternatively, you could use the FastqManifest format with two manifests, one per lane - this would let you avoid having to move files around prior to importing. That should get you past the importing step.
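The lane split described above could be scripted along these lines. This is only a minimal sketch: it assumes the file names follow the pattern shown in the question (for example Sample1_id1_L001_R1_001.fastq.gz), and the function name split_by_lane and the directory names raw and by-lane are hypothetical. It copies files rather than moving them, so the originals are left untouched.

```shell
#!/bin/sh
# Sketch: copy demultiplexed FASTQ files into one directory per lane.
# Assumes names like Sample1_id1_L001_R1_001.fastq.gz (pattern from the question).
set -eu

split_by_lane() {
  src=$1     # directory holding all *.fastq.gz files (hypothetical: raw)
  dest=$2    # directory that will receive L001/, L002/, ... (hypothetical: by-lane)
  for f in "$src"/*_L[0-9][0-9][0-9]_R[12]_001.fastq.gz; do
    [ -e "$f" ] || continue                # skip when the glob matches nothing
    base=${f##*/}                          # file name without the directory
    stem=${base%_R[12]_001.fastq.gz}       # e.g. Sample1_id1_L001
    lane=${stem##*_}                       # e.g. L001
    mkdir -p "$dest/$lane"
    cp "$f" "$dest/$lane/$base"            # copy, leaving the original in place
  done
}

# Example call (hypothetical paths):
# split_by_lane raw by-lane
```

Each resulting directory (by-lane/L001, by-lane/L002, and so on) then holds a single lane per sample, so it should be usable as the --input-path for the qiime tools import command quoted earlier in this thread, run once per lane.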

Next, to work around the issue with not being able to merge the tables after DADA2, you have a few options, but none are great.

1. Treat the samples in different lanes as replicates, rather than as the same sample. You could do this by creating different names for the samples on a per-lane basis and then merging your tables. See the FMT tutorial for an example of how to merge tables without overlapping sample ids.
2. Export your FeatureTable[Frequency] objects, merge them with QIIME 1.9.1's collapse_samples.py and then import the resulting merged table. This will work for you now, but the disadvantage is that the imported FeatureTable object won't have its provenance tracked prior to the import (i.e., the object won't have the full history of how it was created recorded since you lose that on export, so you'll need to track that yourself - the provenance concept is described at a high level here).
3. Wait for this functionality to be present in QIIME 2 (in the July release).
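For the first option combined with the manifest approach mentioned above, the per-lane manifests with lane-specific sample names could be generated with something like the sketch below. Several things here are assumptions rather than anything stated in this thread: the manifest header sample-id,absolute-filepath,direction and the forward/reverse values are my understanding of the fastq manifest format from the 2017 importing documentation, so please check them against the tutorial for your release. The function name write_lane_manifests, the output file names, and the choice of building sample names as prefix-lane (for example Sample1_id1-L001) are all hypothetical.

```shell
#!/bin/sh
# Sketch: write one manifest per lane, giving each sample a lane-specific name.
# ASSUMPTION: manifest columns are sample-id,absolute-filepath,direction.
set -eu

write_lane_manifests() {
  src=$1    # directory holding the *.fastq.gz files
  out=$2    # directory to write <lane>-manifest.csv files into
  mkdir -p "$out"
  rm -f "$out"/*-manifest.csv              # start fresh so reruns do not append
  src_abs=$(cd "$src" && pwd)              # manifests need absolute paths
  for f in "$src_abs"/*_L[0-9][0-9][0-9]_R[12]_001.fastq.gz; do
    [ -e "$f" ] || continue                # skip when the glob matches nothing
    base=${f##*/}
    stem=${base%_001.fastq.gz}             # Sample1_id1_L001_R1
    read_tag=${stem##*_}                   # R1 or R2
    stem=${stem%_*}                        # Sample1_id1_L001
    lane=${stem##*_}                       # L001
    sample=${stem%_*}                      # Sample1_id1
    case $read_tag in
      R1) direction=forward ;;
      R2) direction=reverse ;;
      *)  continue ;;
    esac
    manifest="$out/$lane-manifest.csv"
    [ -e "$manifest" ] || echo 'sample-id,absolute-filepath,direction' > "$manifest"
    # Lane-specific sample name, so tables from different lanes do not overlap.
    echo "$sample-$lane,$f,$direction" >> "$manifest"
  done
}

# Example call (hypothetical paths):
# write_lane_manifests raw manifests
```

Because every sample name carries its lane, the two tables produced after DADA2 would have no sample ids in common, which is the situation the existing merge supports.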

Sorry to not have a better solution for you at this time!

More generally, is there documentation for the available source-format options and how they are used?

There isn't at the moment. It's one of the areas of the documentation that we're planning to improve while we're in our alpha release stage.

*Edited to note that we'll implement the missing functionality described here for the July 2017 release.

Lexie_Keding · June 15, 2017, 3:35pm

Hi @gregcaporaso,

Thanks for providing us a solution so quickly. We feel confident proceeding forward with the data!

Sincerely,
Lexie

gregcaporaso · June 15, 2017, 6:05pm

No problem - thanks for your interest in QIIME 2!

ebolyen · September 29, 2017, 7:21pm

Just to update: 2017.9 is out, and we support merging on both axes with feature-table merge!
(feature-table group is also a useful command for this situation)
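To make concrete what combining two per-lane tables with overlapping sample ids means, here is a small illustration that sums counts for each feature/sample pair. It is only a sketch of the idea and not how QIIME 2 implements it: the tab-separated long format (feature, sample, count) is a hypothetical layout chosen to keep the example short, it is not a QIIME 2 or BIOM export format, and the function name sum_tables is made up.

```shell
#!/bin/sh
# Sketch: sum counts for matching (feature, sample) pairs across per-lane tables.
# HYPOTHETICAL input format: feature<TAB>sample<TAB>count, one row per pair.
set -eu

sum_tables() {
  # Accepts any number of input files; prints the combined rows, sorted.
  awk -F '\t' -v OFS='\t' '
    { total[$1 OFS $2] += $3 }                       # accumulate per feature/sample
    END { for (key in total) print key, total[key] }
  ' "$@" | sort
}

# Example call (hypothetical files):
# sum_tables lane1.tsv lane2.tsv
```

A pair that appears in both lanes ends up with the sum of its two counts, while pairs seen in only one lane pass through unchanged.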