We are trying to import data from a paired-end Illumina HiSeq run, which is already demultiplexed and in FASTQ format. We have followed the tutorial; however, each of our samples is distributed over two lanes of the sequencer.
So for example, we have files such as
The tutorial uses a source-format of CasavaOneEightSingleLanePerSampleDirFmt, which doesn’t seem to fit here, as “SingleLanePerSample” doesn’t describe the setup.
- Is there an appropriate source-format option to use here?
- More generally, is there documentation for the available source-format options and how they are used?
Hi @Lexie_Keding ,
Have you tried using this command
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path casava-18-paired-end-demultiplexed \
--source-format CasavaOneEightSingleLanePerSampleDirFmt \
from the [Importing Data](https://docs.qiime2.org/2017.5/tutorials/importing/) tutorial?
@pjtorres, that format won’t work here unfortunately, since @Lexie_Keding’s data isn’t single lane.
each of our samples are distributed over two lanes of the sequencer
It may not currently be possible to do what you want to do with QIIME 2, unfortunately. I’ll explain how you can proceed with this, where the issue is, and then a couple of possible work-arounds.
I’m assuming that you’re going to denoise these data with DADA2 as a next step in the process, and DADA2 works on a single lane at a time. You’ll therefore need to create two SampleData[PairedEndSequencesWithQuality] objects, one per lane, and run dada2 denoise-paired twice. You’ll then end up with two FeatureTable[Frequency] objects, which is where the issue arises. At the moment, we don’t support merging of tables with overlapping sample ids (see q2-feature-table issue #86). We plan to add this functionality in the July release.
So, to proceed…
First, it looks like your file names are formatted correctly to be used with the CasavaOneEightSingleLanePerSampleDirFmt, once you split the files by lane. Alternatively, you could use the FastqManifest format with two manifests, one per lane - this would let you avoid having to move files around prior to importing. That should get you past the importing step.
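If you go the split-by-lane route, something like the following might save some manual sorting. It is only a sketch: split_by_lane is a hypothetical helper (not part of QIIME 2), the example file name is made up, and it assumes Casava 1.8 style names where the lane appears as a token such as L001 before the read number. It copies files rather than moving them, so the originals are left alone.

```shell
#!/usr/bin/env bash
# Sketch: sort Casava 1.8 style FASTQ files into one directory per lane.
# Assumed name layout: <sample>_<barcode>_L<lane>_R<read>_<set>.fastq.gz
# e.g. sampleA_S1_L001_R1_001.fastq.gz (hypothetical file name).
# split_by_lane is a hypothetical helper, not a QIIME 2 command.
split_by_lane() {
  local src_dir="$1" dest_dir="$2"
  local path name lane
  for path in "$src_dir"/*.fastq.gz; do
    [ -e "$path" ] || continue   # nothing matched the glob
    name="$(basename "$path")"
    # Pull out the lane token (L001, L002, ...) that precedes the read number.
    lane="$(printf '%s\n' "$name" |
      sed -n 's/.*_\(L[0-9][0-9][0-9]\)_R[12]_[0-9]*\.fastq\.gz$/\1/p')"
    if [ -z "$lane" ]; then
      echo "skipping $name: no lane token found" >&2
      continue
    fi
    mkdir -p "$dest_dir/$lane"
    cp "$path" "$dest_dir/$lane/$name"   # copy, so the originals stay in place
  done
}
```

Calling split_by_lane raw-fastq by-lane would leave by-lane/L001 and by-lane/L002, and each of those could then be given as the --input-path of a separate import.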
Next, to work around the issue with not being able to merge the tables after DADA2, you have a few options, but none are great.
- Treat the samples in different lanes as replicates, rather than as the same sample. You could do this by creating different names for the samples on a per-lane basis and then merging your tables. See the FMT tutorial for an example of how to merge tables without overlapping sample ids.
- Export the two FeatureTable[Frequency] objects, merge them with QIIME 1.9.1’s collapse_samples.py, and then import the resulting merged table. This will work for you now, but the disadvantage is that the imported FeatureTable object won’t have its provenance tracked prior to the import (i.e., the object won’t have the full history of how it was created recorded, since you lose that on export, so you’ll need to track that yourself - the provenance concept is described at a high level here).
- Wait for this functionality to be present in QIIME 2 (in the July release).
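For the first option, one place to give samples per-lane names is the manifest itself. The sketch below is an illustration only: write_lane_manifest is a hypothetical helper, the header sample-id,absolute-filepath,direction is my assumption about the CSV manifest layout (please check it against the importing tutorial for your release), and it assumes sample names contain no underscores.

```shell
#!/usr/bin/env bash
# Sketch: write a FASTQ manifest for one lane, tagging each sample ID with the lane
# so that the two lanes end up with non-overlapping sample ids.
# Assumptions (not confirmed in this thread):
#   - file names look like <sample>_<barcode>_L<lane>_R<read>_<set>.fastq.gz
#   - sample names contain no underscores
#   - the manifest is a CSV with header sample-id,absolute-filepath,direction
# write_lane_manifest is a hypothetical helper, not a QIIME 2 command.
write_lane_manifest() {
  local fastq_dir="$1" lane="$2" manifest="$3"
  local abs_dir path name sample direction
  abs_dir="$(cd "$fastq_dir" && pwd)"   # resolve to an absolute path
  echo "sample-id,absolute-filepath,direction" > "$manifest"
  for path in "$abs_dir"/*_"$lane"_R[12]_*.fastq.gz; do
    [ -e "$path" ] || continue          # nothing matched the glob
    name="$(basename "$path")"
    sample="${name%%_*}"                # text before the first underscore
    case "$name" in
      *_"$lane"_R1_*) direction="forward" ;;
      *)              direction="reverse" ;;
    esac
    # Per-lane sample ID, e.g. sampleA-L001
    echo "${sample}-${lane},${path},${direction}" >> "$manifest"
  done
}
```

Running it once with L001 and once with L002 would give two manifests whose sample ids differ only by the lane suffix, so the resulting tables should not have overlapping ids.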
Sorry to not have a better solution for you at this time!
More generally, is there documentation for the available source-format options and how they are used?
There isn’t at the moment. It’s one of the areas of the documentation that we’re planning to improve while we’re in our alpha release stage.
*Edited to note that we’ll implement the missing functionality described here for the July 2017 release.
Thanks for providing us a solution so quickly. We feel confident proceeding forward with the data!
No problem - thanks for your interest in QIIME 2!
Just to update: 2017.9 is out, and we support merging on both axes with feature-table merge (feature-table group is also a useful command for this situation).