It appears that qiime dada2 denoise-paired can incorrectly pair the read files of two samples whose names are identical except that one ends in "_[0-9]" (e.g., "V130" and "V130_2"). For instance, a dada2 job I recently ran generated the following error:
# excerpt
1) Filtering Error in filterAndTrim(unfiltsF, filtsF, unfiltsR, filtsR, truncLen = c(truncLenF, :
These are the errors (up to 5) encountered in individual cores...
Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
Mismatched forward and reverse sequence files: 6069, 295.
...and exporting the demultiplexed sequences .qza file (the input to the dada2 job) produced the following sequence files:
# file_name: number_of_sequences
./V130_166_L001_R1_001.fastq.gz: 295
./V130_2_167_L001_R1_001.fastq.gz: 6069
./V130_2_743_L001_R2_001.fastq.gz: 6069
./V130_742_L001_R2_001.fastq.gz: 295
So it appears that dada2 matched the forward and reverse files by sort order, which incorrectly paired "V130 R1" with "V130_2 R2" and "V130_2 R1" with "V130 R2". Because the number after the sample name differs between the R1 files (166, 167) and the R2 files (742, 743), sorting each set by file name puts V130 first among the R1 files but V130_2 first among the R2 files. I would have expected this command to match files by the "sample-id" value listed in the MANIFEST. I checked the manifest, and the "V130" and "V130_2" samples appear to be labeled with the correct read files. So, does dada2 simply match read files by sort order? If so, this would cause an incorrect sample <--> read_file mapping.
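To make the hypothesis concrete, here is a minimal sketch (not q2-dada2's actual code) contrasting two ways of pairing files: sorting the forward and reverse file names separately and pairing them by position, versus pairing them by a shared sample-id. The MANIFEST text is hypothetical, reconstructed from the exported file names above, and its "sample-id,filename,direction" header is an assumption about the export layout.

```python
import csv
import io

# Hypothetical MANIFEST reconstructed from the exported V130 file names;
# the "sample-id,filename,direction" header is an assumption.
MANIFEST = """sample-id,filename,direction
V130,V130_166_L001_R1_001.fastq.gz,forward
V130,V130_742_L001_R2_001.fastq.gz,reverse
V130_2,V130_2_167_L001_R1_001.fastq.gz,forward
V130_2,V130_2_743_L001_R2_001.fastq.gz,reverse
"""


def pair_by_sort_order(manifest_text):
    """Sort forward and reverse names separately, then pair them by position."""
    rows = list(csv.DictReader(io.StringIO(manifest_text)))
    fwd = sorted(r["filename"] for r in rows if r["direction"] == "forward")
    rev = sorted(r["filename"] for r in rows if r["direction"] == "reverse")
    return list(zip(fwd, rev))


def pair_by_sample_id(manifest_text):
    """Pair forward and reverse files that share the same sample-id."""
    by_sample = {}
    for r in csv.DictReader(io.StringIO(manifest_text)):
        by_sample.setdefault(r["sample-id"], {})[r["direction"]] = r["filename"]
    return {sid: (d["forward"], d["reverse"]) for sid, d in by_sample.items()}


if __name__ == "__main__":
    print("sort order:")
    for fwd, rev in pair_by_sort_order(MANIFEST):
        print("  ", fwd, "<->", rev)
    print("sample-id:")
    for sid, (fwd, rev) in pair_by_sample_id(MANIFEST).items():
        print("  ", sid, ":", fwd, "<->", rev)
```

With these names, sort-order pairing matches V130_166 (R1) with V130_2_743 (R2) and V130_2_167 (R1) with V130_742 (R2), which is the mispairing described above. Pairing by sample-id keeps each sample's files together.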
Another section of the dada2 error was:
# excerpt
Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
Mismatched forward and reverse sequence files: 2768, 7554.
...and this corresponded to the following files:
# file_name: number_of_sequences
./V378_1016_L001_R2_001.fastq.gz: 7554
./V378_2_1017_L001_R2_001.fastq.gz: 2768
./V378_2_441_L001_R1_001.fastq.gz: 2768
./V378_3_1018_L001_R2_001.fastq.gz: 12390
./V378_3_442_L001_R1_001.fastq.gz: 12390
./V378_440_L001_R1_001.fastq.gz: 7554
So, it seems to be another case of sample <--> read_file mismatch.
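The V378 case can be checked the same way. The sketch below is a hypothetical helper, not part of QIIME 2 or dada2. It uses the read counts from the listing above, derives the sample name from each file name with a regex, and flags pairs whose forward and reverse read counts differ, which is roughly what the "Mismatched forward and reverse sequence files" error reports. The regex assumes names of the form <sample>_<number>_L<lane>_R<read>_001.fastq.gz, as in the exported files.

```python
import re

# Read counts copied from the exported V378 files listed above.
READ_COUNTS = {
    "V378_1016_L001_R2_001.fastq.gz": 7554,
    "V378_2_1017_L001_R2_001.fastq.gz": 2768,
    "V378_2_441_L001_R1_001.fastq.gz": 2768,
    "V378_3_1018_L001_R2_001.fastq.gz": 12390,
    "V378_3_442_L001_R1_001.fastq.gz": 12390,
    "V378_440_L001_R1_001.fastq.gz": 7554,
}

# Assumed naming pattern: <sample>_<number>_L<lane>_R<read>_001.fastq.gz
NAME_RE = re.compile(r"^(?P<sample>.+)_\d+_L\d{3}_R(?P<read>[12])_\d{3}\.fastq\.gz$")


def sample_and_read(filename):
    """Split a file name into (sample name, read number '1' or '2')."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return m.group("sample"), m.group("read")


def pairs_by_sort_order(counts):
    """Sort R1 and R2 names separately and pair them by position."""
    fwd = sorted(f for f in counts if sample_and_read(f)[1] == "1")
    rev = sorted(f for f in counts if sample_and_read(f)[1] == "2")
    return list(zip(fwd, rev))


def pairs_by_sample_name(counts):
    """Pair R1 and R2 files that share the same parsed sample name."""
    by_sample = {}
    for f in counts:
        sample, read = sample_and_read(f)
        by_sample.setdefault(sample, {})[read] = f
    return [(d["1"], d["2"]) for _, d in sorted(by_sample.items())]


def mismatched_pairs(pairs, counts):
    """Return (fwd, rev, fwd_count, rev_count) for pairs with unequal counts."""
    return [(f, r, counts[f], counts[r]) for f, r in pairs if counts[f] != counts[r]]


if __name__ == "__main__":
    for f, r, nf, nr in mismatched_pairs(pairs_by_sort_order(READ_COUNTS), READ_COUNTS):
        print(f"{f} <-> {r}: {nf}, {nr}")
    print(mismatched_pairs(pairs_by_sample_name(READ_COUNTS), READ_COUNTS))
```

Under sort-order pairing, all three V378 pairs have unequal counts, and the first one (V378_2 R1 with V378 R2) gives the same 2768 vs. 7554 counts as the error above. Pairing by sample name leaves no mismatches.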
$ qiime info
System versions
Python version: 3.5.5
QIIME 2 release: 2018.6
QIIME 2 version: 2018.6.0
q2cli version: 2018.6.0
Installed plugins
alignment: 2018.6.0
composition: 2018.6.0
cutadapt: 2018.6.0
dada2: 2018.6.0
deblur: 2018.6.0
demux: 2018.6.0
diversity: 2018.6.0
emperor: 2018.6.0
feature-classifier: 2018.6.0
feature-table: 2018.6.0
gneiss: 2018.6.0
longitudinal: 2018.6.0
metadata: 2018.6.0
phylogeny: 2018.6.0
quality-control: 2018.6.1
quality-filter: 2018.6.0
sample-classifier: 2018.6.0
taxa: 2018.6.0
types: 2018.6.0
vsearch: 2018.6.0
Application config directory
/ebio/abt3/nyoungblut/.config/q2cli
Getting help
To get help with QIIME 2, visit https://qiime2.org