I'm working on a small dataset from a MiSeq run and have some questions. I used the standard import function in QIIME 2 for demultiplexed paired-end reads and it completed without errors, but I ended up with really low read counts. This surprised my colleague, who used DADA2 in R to join the reads, and when we compared results we found a discrepancy: my read counts are almost 10,000 fewer per sample than his, and that's before running the DADA2 plugin in QIIME 2 for trimming, chimera removal, etc. I can't find any documentation on changing the import parameters, or on whether a default quality threshold is applied at the import step. It just seems very strange that the two methods would differ this much.
These are the import commands I used in QIIME 2:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /RawReadsSponge/reads/ \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path SpongeVacuSIP_demux.qza
qiime demux summarize --i-data SpongeVacuSIP_demux.qza --o-visualization SpongeVacuSIP_demux.qzv
qiime tools export SpongeVacuSIP_demux.qzv --output-dir demultiplexed.sequences.per.sample.view
The demux visualization shows good quality on both paired reads (Phred >28 out to 245 bp), so I would only trim 5 to 10 bp off the end of the reverse read in the DADA2 plugin step.
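One sanity check worth doing: as far as I know, the import step just repackages the FASTQs and should not drop any reads, so counting records directly in the raw files and comparing against what demux summarize reports would show whether the import itself is losing anything. A minimal sketch, assuming gzipped FASTQs; the glob in the usage comment is hypothetical apart from the path taken from the import command above:

```shell
# count_reads: number of records in a gzipped FASTQ (4 lines per record)
count_reads() { echo $(( $(zcat "$1" | wc -l) / 4 )); }

# Usage (hypothetical glob, path from the import command above):
#   for f in /RawReadsSponge/reads/*_R1_001.fastq.gz; do
#     printf '%s\t%s\n' "$(basename "$f")" "$(count_reads "$f")"
#   done
```

If these raw counts already match the demux summary, the import step is not where the reads are going.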
My colleague is using the following R commands:
library(dada2); packageVersion("dada2")
#File parsing
pathF <- "FWD" # CHANGE ME to the directory containing your demultiplexed forward-read fastqs
pathR <- "REV" # CHANGE ME ...
filtpathF <- file.path(pathF, "filtered") # Filtered forward files go into the pathF/filtered/ subdirectory
filtpathR <- file.path(pathR, "filtered") # ...
fastqFs <- sort(list.files(pathF, pattern="fastq"))
fastqRs <- sort(list.files(pathR, pattern="fastq"))
if(length(fastqFs) != length(fastqRs)) stop("Forward and reverse files do not match.")
filterAndTrim(fwd=file.path(pathF, fastqFs), filt=file.path(filtpathF, fastqFs),
              rev=file.path(pathR, fastqRs), filt.rev=file.path(filtpathR, fastqRs),
              truncLen=c(240,150), maxEE=2, truncQ=11, maxN=0, rm.phix=TRUE,
              compress=TRUE, verbose=TRUE, multithread=TRUE, matchIDs=TRUE)
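One thing that may matter here: matchIDs=TRUE in filterAndTrim silently drops any read whose ID is absent from the other file of the pair, so mismatched forward/reverse files alone could move the counts. A quick way to check whether a pair actually has identical ID sets (a sketch; the file names in the usage comment are hypothetical):

```shell
# fastq_ids: print the sorted read IDs (first header field) of a gzipped FASTQ
fastq_ids() { zcat "$1" | awk 'NR % 4 == 1 { print $1 }' | sort; }

# Usage (hypothetical file names):
#   comm -3 <(fastq_ids s1_R1.fastq.gz) <(fastq_ids s1_R2.fastq.gz)
# Any lines printed are reads present in only one file of the pair.
```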
#File parsing
filtpathF <- "FWD2/filtered"
filtpathR <- "REV2/filtered"
filtFs <- list.files(filtpathF, pattern="fastq.gz", full.names = TRUE)
filtRs <- list.files(filtpathR, pattern="fastq.gz", full.names = TRUE)
# Extract sample names, assuming filenames have format: SAMPLENAME_XXX.fastq.gz
sample.names <- sapply(strsplit(basename(filtFs), "_"), `[`, 1)
sample.namesR <- sapply(strsplit(basename(filtRs), "_"), `[`, 1)
if(!identical(sample.names, sample.namesR)) stop("Forward and reverse files do not match.")
names(filtFs) <- sample.names
names(filtRs) <- sample.names
set.seed(100)
#Learn forward error rates
errF <- learnErrors(filtFs, nbases=1e8, multithread=TRUE)
#Learn reverse error rates
errR <- learnErrors(filtRs, nbases=1e8, multithread=TRUE)
#Sample inference and merger of paired-end reads
mergers <- vector("list", length(sample.names))
names(mergers) <- sample.names
for(sam in sample.names) {
  cat("Processing:", sam, "\n")
  derepF <- derepFastq(filtFs[[sam]])
  ddF <- dada(derepF, err=errF, multithread=TRUE)
  derepR <- derepFastq(filtRs[[sam]])
  ddR <- dada(derepR, err=errR, multithread=TRUE)
  merger <- mergePairs(ddF, derepF, ddR, derepR)
  mergers[[sam]] <- merger
}
rm(derepF); rm(derepR)
#Construct sequence table and remove chimeras
seqtab <- makeSequenceTable(mergers)
saveRDS(seqtab, "seqtab_K1.rds")
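To narrow down where the discrepancy arises, it may also help to total the reads per directory at each stage on both sides (raw input, filtered output) before comparing final tables. A minimal sketch, again assuming gzipped FASTQs; the directory names in the usage comment come from the R script above:

```shell
# total_reads: sum FASTQ records across all gzipped files in a directory
total_reads() { echo $(( $(zcat "$1"/*.fastq.gz | wc -l) / 4 )); }

# Usage (directory names from the script above):
#   total_reads FWD; total_reads FWD/filtered
# If the raw totals already differ between the two pipelines' input
# directories, the discrepancy predates DADA2 entirely.
```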
Any insight into why the two methods would produce such huge read-count differences would be really helpful.
Thank you!