Denoise-single error associated with --p-trunc-len values

Hi, I am getting different error messages depending on the --p-trunc-len value. My data set contains sequences up to 240 bp long. When I specify --p-trunc-len 240, the command runs successfully but keeps only 11 of more than 317,000 sequences.
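
A likely reason so few reads are kept: DADA2 discards reads shorter than the truncation length, so --p-trunc-len 240 retains only reads that are at least 240 bases long. A minimal Python sketch, assuming an uncompressed FASTQ with four lines per record (the helper names and toy data are hypothetical), that counts how many reads would survive a given truncation length:

```python
from typing import Iterable, Iterator


def read_lengths(fastq_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.strip())


def surviving_reads(lengths: Iterable[int], trunc_len: int) -> int:
    """Count reads at least trunc_len long (0 means no truncation, keep all)."""
    return sum(1 for n in lengths if trunc_len == 0 or n >= trunc_len)


# Toy records standing in for a real file: read lengths 240, 100 and 80
toy = ["@r1", "ACGT" * 60, "+", "I" * 240,
       "@r2", "ACGT" * 25, "+", "I" * 100,
       "@r3", "ACGT" * 20, "+", "I" * 80]
lengths = list(read_lengths(toy))
print(surviving_reads(lengths, 240))  # only the full-length read remains
print(surviving_reads(lengths, 80))   # all three reads remain
```

For real data, the list of lines could be replaced by an open file handle (or gzip.open(path, "rt") for a compressed file).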

Here is the full generic command, where "variable" stands for the truncation length I change between runs:

qiime dada2 denoise-single --i-demultiplexed-seqs single-end-demux.qza --p-trim-left 0 --p-trunc-len "variable" --p-chimera-method consensus --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --verbose

With --p-trunc-len 0, DADA2 completes step 1 (Filtering) and then stops in step 2 (Learning Error Rates), repeating the error message:
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.

Because reads shorter than 80 bp are not useful to me, I tried --p-trunc-len 80. DADA2 again completes step 1 (Filtering) and then stops in step 2 (Learning Error Rates), this time with the error message:

Error in dada(drps[1:i], err = NULL, selfConsist = TRUE, multithread = multithread) :
derep$quals matrix has an invalid maximum Phred Quality Scores of 63
Execution halted

To give more information about the data set: the data were originally paired-end. I merged and demultiplexed them in OBItools (illuminapairedend and ngsfilter, respectively) and then imported the result into QIIME 2. I have no idea what the OBItools commands have done to the quality scores. When I look at the summary table for the imported data, it does warn that some of the PHRED quality values are out of range. The raw data were encoded with Phred33, but I suspect that OBItools changes the quality scores during these commands. I used OBItools because the demultiplexing was complicated: it relies on both inline barcodes and primers as MIDs.
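
The reported maximum quality of 63 and the out-of-range warning can be checked directly on the FASTQ file. A minimal sketch, assuming Phred+33 encoding and four lines per record (the function name and toy record are hypothetical), that reports the lowest and highest decoded quality score:

```python
from typing import Iterable, Tuple


def phred_range(fastq_lines: Iterable[str], offset: int = 33) -> Tuple[int, int]:
    """Return (min, max) decoded quality over all records of a 4-line FASTQ."""
    lo = hi = None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the fourth line of each record holds the qualities
            continue
        scores = [ord(ch) - offset for ch in line.strip()]
        if not scores:
            continue
        lo = min(scores) if lo is None else min(lo, min(scores))
        hi = max(scores) if hi is None else max(hi, max(scores))
    if lo is None:
        raise ValueError("no quality lines found")
    return lo, hi


# Toy record: 'I' decodes to 40, '5' to 20, and '`' to 63 under Phred+33
toy = ["@r1", "ACGT", "+", "II5`"]
print(phred_range(toy))
```

Running this on the file before and after the OBItools steps would show whether those steps raised the maximum score.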

Hey @kmw,

I’m not familiar with OBItools, but it does seem that, with the --fastq-output option supplied, it uses the Sanger Phred+33 offset, so my guess is that it is simply adding together the quality scores where the reads overlap (I could be entirely wrong here).

Of greater concern: I don’t think using DADA2 on this data really makes sense, because it expects raw sequence data, which is what allows it to infer where sequencing errors may have occurred. @benjjneb, is this a fundamental expectation of DADA2, or would you still get “reasonable”, but not ideal, results if merging was done beforehand?

Perhaps we can focus on getting your paired-end data into QIIME 2 as SampleData[PairedEndSequencesWithQuality] (skipping OBItools)? Then dada2 denoise-paired can take you the rest of the way.

Otherwise, another option might be to do your own quality filtering and then import an OTU table and representative sequences, using QIIME 2 only for the “downstream” analysis.

Short answer: Merging before DADA2 is not recommended.

Long answer: You can get reasonable results on previously merged reads if you used a read-merging program that assigns quality scores to the merged bases that are consistent with the quality scores of the unmerged bases in the non-overlap regions. But many (even most) read-merging programs don't, hence the blanket recommendation not to do this.
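
To illustrate the consistency point, suppose (purely as an assumption for this sketch, not as a description of any particular tool's exact algorithm) that a merger sets the quality of each overlapping base to the sum of the two reads' scores. The overlap then carries values far above anything in the non-overlap regions:

```python
from typing import List


def merge_quals_by_sum(fwd: List[int], rev: List[int], overlap: int) -> List[int]:
    """Merge two quality lists whose last/first `overlap` positions align.

    `rev` is assumed to be already oriented in the forward direction.
    Overlapping positions get the SUM of both scores (illustrative only).
    """
    if overlap <= 0 or overlap > min(len(fwd), len(rev)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    head = fwd[:-overlap]                                   # forward-only bases
    mid = [a + b for a, b in zip(fwd[-overlap:], rev[:overlap])]
    tail = rev[overlap:]                                    # reverse-only bases
    return head + mid + tail


# Hypothetical decoded Phred scores for two short reads overlapping by 2 bases
fwd = [38, 38, 40, 40]
rev = [23, 30, 37, 37]
merged = merge_quals_by_sum(fwd, rev, overlap=2)
print(merged)
print(max(merged[:2] + merged[4:]), max(merged[2:4]))  # non-overlap vs overlap maximum
```

An error model learned from scores like these would treat the overlap as a different quality regime from the rest of the read, which is the inconsistency described above.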

Hi @ebolyen and @benjjneb

Thanks for the quick and detailed responses! I didn’t know that merging before DADA2 was not ideal, and I can perform the demultiplexing in OBItools without merging beforehand. I have since learned that the merging function in OBItools adds the Phred scores of overlapping bases together, which is what caused these error messages. I am now running raw demultiplexed single-end sequences successfully in DADA2 and will extend this to importing raw demultiplexed paired-end data as SampleData[PairedEndSequencesWithQuality]. Thanks again, you’ve been a great help!
