Discrepency in read counts between QIIME2 Import and Dada2 in R

I'm working on a small dataset from a Miseq run and have some questions. I used the standard import function in QIIME2 for demultiplexed paired-end reads and it worked fine with no errors. However, ended up with really low read counts. This surprised my colleague who used dada2 in R to join the reads and we discovered a discrepancy. My read counts are almost 10,000 less per sample than his and thats before using the Dada2 plugin in QIIME2 for trimming and chimera removal etc. I can't see any information anywhere about changing the import parameters or if there is a standard quality setting for the import step. It just seems very strange that we would have such large differences between the two methods.

This are the import commands I used in QIIME2:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path /RawReadsSponge/reads/ --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path SpongeVacuSIP_demux.qza

qiime demux summarize --i-data SpongeVacuSIP_demux.qza --o-visualization SpongeVacuSIP_demux.qzv

qiime tools export SpongeVacuSIP_demux.qzv --output-dir demultiplexed.sequences.per.sample.view

The demux visualization file shows good quality on the paired reads (>28 until 245 bp) and I would only trim off 5 to 10 bp at the end of the reverse read in the dada plugin step.

My colleague is using the following R commands:

library(dada2); packageVersion("dada2")
#File parsing
pathF <- "FWD" # CHANGE ME to the directory containing your demultiplexed forward-read fastqs
pathR <- "REV" # CHANGE ME ...
filtpathF <- file.path(pathF, "filtered") # Filtered forward files go into the pathF/filtered/ subdirectory
filtpathR <- file.path(pathR, "filtered") # ...
fastqFs <- sort(list.files(pathF, pattern="fastq"))
fastqRs <- sort(list.files(pathR, pattern="fastq"))
if(length(fastqFs) != length(fastqRs)) stop("Forward and reverse files do not match.")
filterAndTrim(fwd=file.path(pathF, fastqFs), filt=file.path(filtpathF, fastqFs),
rev=file.path(pathR, fastqRs), filt.rev=file.path(filtpathR, fastqRs),
truncLen=c(240,150), maxEE=2, truncQ=11, maxN=0, rm.phix=TRUE,
compress=TRUE, verbose=TRUE, multithread=TRUE, matchIDs = TRUE)

#File parsing
filtpathF <- "FWD2/filtered"
filtpathR <- "REV2/filtered"
filtFs <- list.files(filtpathF, pattern="fastq.gz", full.names = TRUE)
filtRs <- list.files(filtpathR, pattern="fastq.gz", full.names = TRUE)
sample.names <- sapply(strsplit(basename(filtFs), ""), [, 1)
sample.namesR <- sapply(strsplit(basename(filtRs), "
"), [, 1)
if(!identical(sample.names, sample.namesR)) stop("Forward and reverse files do not match.")
names(filtFs) <- sample.names
names(filtRs) <- sample.names
set.seed(100)

#Learn forward error rates
errF <- learnErrors(filtFs, nbases=1e8, multithread=TRUE)
#Learn reverse error rates
errR <- learnErrors(filtRs, nbases=1e8, multithread=TRUE)

#Sample inference and merger of paired-end reads
mergers <- vector("list", length(sample.names))
names(mergers) <- sample.names
for(sam in sample.names) {
cat("Processing:", sam, "\n")
derepF <- derepFastq(filtFs[[sam]])
ddF <- dada(derepF, err=errF, multithread=TRUE)
derepR <- derepFastq(filtRs[[sam]])
ddR <- dada(derepR, err=errR, multithread=TRUE)
merger <- mergePairs(ddF, derepF, ddR, derepR)
mergers[[sam]] <- merger
}
rm(derepF); rm(derepR)

#Construct sequence table and remove chimeras
seqtab <- makeSequenceTable(mergers)
saveRDS(seqtab, "seqtab_K1.rds")

Any help you can offer on why we would get such huge read count differences between the two methods would be really helpful.

Thank you!

Hi @reige012!

All importing does is simply copy the files into a QZA file. If you didn't run any additional filtering or other qa/qc steps then the QZA file should be identical to what was imported.

You can confirm this by running the following command in your QIIME 2 environment and comparing it to the per-sample read counts in the viz generated by demux summarize:

cd /RawReadsSponge/reads
for f in *.fastq.gz; do r=$(( $(gunzip -c $f | wc -l | tr -d '[:space:]') / 4 )); echo $r $f; done

If you're still observing a discrepancy with your colleague's observation, my suggestion is to double-check their method for computing the raw reads is comparing the same thing as you, at the same steps.

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.