Very low denoising stats from DADA2 using Ion Torrent data


@Jen_S and I were able to successfully run vsearch for taxonomic classification in our pipeline after running the DADA2 denoise-pyro option, since we are using Ion Torrent data. When we examined the results in the taxa barplot, one of the regions looked decent (not great) and the other barely classified anything. Since we are using mock samples, we can compare our results against the expected composition. See below:

V2 region (green is unassigned)

V4 region (a much lower percentage unassigned, but Staphylococcus genus-level taxa are hugely overrepresented)

This was concerning, so we worked backwards to find where the problem was. We had no issues importing the files as a QIIME 2 artifact, and we then ran cutadapt to remove adapters and separate the reads by V region.

We used DADA2 denoise-pyro, and when we looked at the denoising stats we realized this is where our problem lay. For the V2 region (the one with many unclassified taxa), less than 1% of input passed the filter. See example below (this is for only 2 samples from 1 run, but the other V2 runs looked similar):

Our better-performing region still had only about a 20-30% pass rate (see below):

Comparing this to the results from the Parkinson’s tutorial, it is clear why there is such a discrepancy in our taxonomic classification.
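With several runs and two regions, it may help to pull the pass rate for every sample out of the stats files rather than reading each table by eye. A minimal sketch, assuming the stats artifact has been exported (for example with qiime tools export) to a tab-separated file whose first three columns are sample-id, input and filtered; low_pass_samples is a hypothetical helper, not part of QIIME 2, and the column layout should be checked against your own export.

```shell
#!/bin/sh
# Hypothetical helper (not part of QIIME 2).
# Prints "sample<TAB>percent passed" for every sample whose filtered/input
# percentage is below MIN_PCT.
# ASSUMPTION: tab-separated columns are sample-id, input, filtered, ...
# Usage: low_pass_samples STATS_TSV MIN_PCT
low_pass_samples() {
  awk -F '\t' -v min_pct="$2" '
    # Skip header lines (non-numeric input column) and samples with 0 input
    $2 !~ /^[0-9]+$/ || $2 == 0 { next }
    {
      pct = 100 * $3 / $2
      if (pct < min_pct) printf "%s\t%.2f\n", $1, pct
    }
  ' "$1"
}
```

For example, low_pass_samples stats.tsv 50 would list every sample where fewer than half of the input reads passed the filter.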

We also compared the results from qiime dada2 denoise-pyro and qiime dada2 denoise-single, just out of curiosity, and the denoising stats were the same.

Below is our DADA2 script:

qiime dada2 denoise-pyro \
--p-trim-left 15 \
--p-trunc-len 250 \
--i-demultiplexed-seqs ./v2f/run01_v2f_trimmed.qza \
--o-table ./v2f/dada2_pyro/dada2_pyro_run01_v2f_table.qza \
--o-representative-sequences ./v2f/dada2_pyro/dada2_pyro_run01_v2f_rep_seq.qza \
--o-denoising-stats ./v2f/dada2_pyro/dada2_pyro_run01_v2f_stats.qza

Do you have any suggestions for changes we can make to get more reads through the filter?

For our V2 region, so little input passes the filter that we would have to improve it by more than 95% to match the Parkinson’s denoising stats. Can we trust the data from this region?

Thank you!

Hello Katherine,

Good detective work! 🕵️‍♀️ I concur with your conclusions. There has got to be a way to get more reads past the filter.

DADA2 trims reads based on the trim-* and trunc-* settings, then filters them by expected errors. So by trimming off additional low-quality bases, which lowers each read's expected errors, or by relaxing the maximum expected error threshold (--p-max-ee), you should keep more reads.
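To see how each setting affects the pass rate, here is a simplified re-implementation of that filter, run on a subsample of reads. It is a sketch, not DADA2's code. It assumes Phred+33 qualities and 4-line FASTQ records, and follows the order we understand DADA2's filterAndTrim to use: truncate at the first base with quality at or below trunc-q, discard reads shorter than trunc-len and cut the rest to that length, trim trim-left bases from the 5' end, then drop reads whose expected errors (the sum of 10^(-Q/10) over the remaining bases) exceed max-ee. count_passing is a hypothetical helper; N filtering and DADA2's other checks are left out, so its counts will not match DADA2 exactly.

```shell
#!/bin/sh
# Hypothetical helper: approximate DADA2-style read filtering (simplified sketch).
# ASSUMPTIONS: Phred+33 qualities, 4-line FASTQ records, no N/maxN filtering.
# Usage: count_passing FASTQ TRIM_LEFT TRUNC_LEN MAX_EE TRUNC_Q
count_passing() {
  awk -v trim_left="$2" -v trunc_len="$3" -v max_ee="$4" -v trunc_q="$5" '
    # Lookup table from quality character to its ASCII code
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 0 {
      total++
      qual = $0
      n = length(qual)
      # 1. Truncate at the first base with Q <= trunc_q
      for (i = 1; i <= n; i++) {
        if (ord[substr(qual, i, 1)] - 33 <= trunc_q) { n = i - 1; break }
      }
      # 2. Discard reads shorter than trunc_len, then truncate (0 disables this)
      if (trunc_len > 0) {
        if (n < trunc_len) next
        n = trunc_len
      }
      # 3. Trim trim_left bases from the 5 prime end; at least one base must remain
      if (n <= trim_left) next
      # 4. Expected errors over the remaining bases
      ee = 0
      for (i = trim_left + 1; i <= n; i++)
        ee += 10 ^ (-(ord[substr(qual, i, 1)] - 33) / 10)
      if (ee <= max_ee) passed++
    }
    END { printf "%d/%d reads passed\n", passed + 0, total + 0 }
  ' "$1"
}
```

One thing this makes visible: reads shorter than trunc-len are thrown out entirely, not just left short. Comparing, say, count_passing reads.fastq 15 250 2 2 with count_passing reads.fastq 15 0 2 2 on a subsample of V2 reads could show whether length, rather than quality, is removing most of them, which may matter for variable-length Ion Torrent reads.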

Because the regions are different lengths and have different quality, you might want to trim at different levels for each of them. How do the quality score plots look for each of your regions?
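The quality plots themselves come from qiime demux summarize. As a rough numeric complement, this sketch reports the first position at which the mean Phred score across reads drops below a chosen cutoff, which could be one starting point for picking a separate trunc-len per region. first_low_position and the cutoff value are hypothetical, and it assumes Phred+33 qualities in 4-line FASTQ records. A mean can hide a long low-quality tail, and later positions only average over the reads that reach them, so the box plots are still worth checking.

```shell
#!/bin/sh
# Hypothetical helper: first read position whose mean Phred score < MIN_Q.
# Prints 0 if no position falls below the cutoff.
# ASSUMPTIONS: Phred+33 qualities, 4-line FASTQ records.
# Usage: first_low_position FASTQ MIN_Q
first_low_position() {
  awk -v min_q="$2" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 0 {
      n = length($0)
      if (n > max_len) max_len = n
      # Accumulate per-position quality sums and read counts
      for (i = 1; i <= n; i++) { sum[i] += ord[substr($0, i, 1)] - 33; cnt[i]++ }
    }
    END {
      for (i = 1; i <= max_len; i++) {
        if (sum[i] / cnt[i] < min_q) { print i; exit }
      }
      print 0
    }
  ' "$1"
}
```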