Challenges encountered with the denoise-ccs plugin: Very few reads successfully passed denoising

Hi, thanks to anyone willing to help me.

I am analyzing a batch of single-end third-generation 16S sequencing data, which has been demultiplexed.
The Interactive Quality Bar Plot and the demux.qzv file obtained after import are shown below:


demux.qzv (309.7 KB)
(I have some concerns here because, based on the script below, the quality encoding for this batch of data should be Phred33.

zcat "$1" | head -n 1000 | awk 'NR % 4 == 0 { printf("%s", $0) }' |
  od -A n -t u1 -v |
  awk 'BEGIN { min = 100; max = 0 }
       # track the smallest and largest ASCII code seen in the quality lines
       { for (i = 1; i <= NF; i++) { if ($i > max) max = $i; if ($i < min) min = $i } }
       END {
         if (max <= 126 && min < 59) print "Phred33";
         else if (max > 73 && min >= 64) print "Phred64";
         else if (min >= 59 && min < 64 && max > 73) print "Solexa64";
         else print "Unknown score encoding";
       }'

However, both the bar plot and the seven-number summary suggest that the quality scores for these reads are much higher than 40. Is this normal?)

I consistently observe a low percentage of input passed filter when using denoise-ccs for data denoising.
I have experimented with adjusting several parameters, including setting --p-trunc-len 1520 and --p-trim-left 10, but the filter percentage hasn't increased.
The command I used and the denoising statistics are as follows:

qiime dada2 denoise-ccs \
  --i-demultiplexed-seqs demux.qza \
  --p-front AGRGTTTGATYNTGGCTCAG \
  --p-adapter RGYTACCTTGTTACGACTT \
  --verbose \
  --p-min-len 1000 \
  --p-max-len 1600 \
  --p-n-threads 8 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza


May I ask what possible adjustments can be made to increase the final non-chimeric percentage?


Hi @jjj33

Those quality scores are a little higher than usual. Usually if it was the wrong Phred score you will get a warning on the demux.qzv about it. Did you get that warning?

Check out --p-min-fold-parent-over-abundance. Maybe that will help you! A reminder, though: we don't want chimeras in our sequences, so be aware that as you adjust these parameters you are most likely making it easier for chimeras to slip through! :mag:

  --p-min-fold-parent-over-abundance NUMBER
                         The minimum abundance of potential parents of a
                         sequence being tested as chimeric, expressed as a
                         fold-change versus the abundance of the sequence
                         being tested. Values should be greater than or equal
                         to 1 (i.e. parents should be more abundant than the
                         sequence being tested). Suggest 3.5. This parameter
                         has no effect if chimera-method is "none".
                                                                [default: 3.5]
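
To make the help text above concrete, here is a minimal sketch of the abundance test it describes. The helper name `can_be_parent` and the abundances in the usage line are hypothetical, and treating the boundary value itself as eligible is an assumption based on the word "minimum" in the help text. This is not QIIME 2 or DADA2 code.

```shell
# Illustrative helper (not part of QIIME 2): succeeds when a sequence is abundant
# enough to be considered as a possible parent of a candidate chimera.
# Arguments: parent abundance, candidate abundance, fold threshold (default 3.5).
can_be_parent() {
  parent_abundance="$1"
  candidate_abundance="$2"
  min_fold="${3:-3.5}"
  # Assumption: the threshold is inclusive (parent >= fold * candidate).
  awk -v p="$parent_abundance" -v c="$candidate_abundance" -v f="$min_fold" \
    'BEGIN { exit !(p >= f * c) }'
}
```

For example, `can_be_parent 400 100 3.5 && echo eligible` prints `eligible`, while the same call with a threshold of 8 prints nothing. Raising the threshold leaves fewer sequences eligible as parents, so fewer reads can be flagged as chimeric, which is why higher values let more chimeras through.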

I also notice from your screenshot that the percentage of reads passing the filter seems low as well. Do the majority of your samples have 50% or fewer of their sequences passing the filtering step? If so, I might try experimenting more with --p-trunc-len to see if you can get more sequences past the filtering step.
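
To see which samples sit at or below that 50% mark, a small sketch along these lines could help. It assumes the denoising stats have been exported to a tab-separated file (for example with `qiime tools export`) whose header row contains columns named `input` and `filtered`. The function name `low_filter_samples` and the file name `stats.tsv` are hypothetical.

```shell
# List samples whose share of reads passing the filter is at or below a threshold.
# Assumption: tab-separated input with a header row containing "input" and "filtered".
# Argument: percentage threshold (default 50). Reads the table from stdin.
low_filter_samples() {
  threshold="${1:-50}"
  awk -F '\t' -v threshold="$threshold" '
    NR == 1 {
      # map column names to positions so column order does not matter
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("input" in col) || !("filtered" in col)) {
        print "expected columns \"input\" and \"filtered\"" > "/dev/stderr"
        exit 1
      }
      next
    }
    /^#/ { next }   # skip comment or type rows such as "#q2:types"
    $(col["input"]) > 0 {
      pct = 100 * $(col["filtered"]) / $(col["input"])
      if (pct <= threshold) printf("%s\t%.1f\n", $1, pct)
    }
  '
}
```

Usage would look like `low_filter_samples 50 < stats.tsv`, which prints one line per affected sample with its sample ID and the percentage of reads that passed the filter.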
I hope that helps!
:turtle:


Hi, thank you for helping me.

"Usually if it was the wrong Phred score you will get a warning on the demux.qzv about it."

Actually, I tried to import the data with the setting --input-format SingleEndFastqManifestPhred64V2, and I got the following error:

ValueError: Decoded Phred score is out of range [0, 62].

An unexpected error has occurred:

  Decoded Phred score is out of range [0, 62].

This suggests to me that the quality encoding should indeed be Phred33.
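
As a rough cross-check on the offset, here is a sketch that decodes the quality lines directly and reports the lowest and highest score under a chosen offset. It is not QIIME 2's own validation logic. The function name `phred_range` and the file name `reads.fastq.gz` are hypothetical, and it assumes plain four-line FASTQ records. If these are PacBio CCS reads (an assumption, since the thread only says third-generation), scores up to Q93 would be expected. That corresponds to ASCII 126 under Phred+33, and 126 - 64 = 62 matches the upper bound in the error above.

```shell
# Report the lowest and highest decoded quality score in a FASTQ stream.
# Argument: ASCII offset to subtract (default 33). Reads FASTQ from stdin.
# Assumption: four lines per record, quality string on every fourth line.
phred_range() {
  offset="${1:-33}"
  awk -v offset="$offset" '
    BEGIN {
      # lookup table from printable character to its ASCII code
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
      min = 1000; max = -1000
    }
    NR % 4 == 0 {
      n = length($0)
      for (i = 1; i <= n; i++) {
        q = ord[substr($0, i, 1)] - offset
        if (q < min) min = q
        if (q > max) max = q
      }
    }
    END { printf("min=%d max=%d\n", min, max) }
  '
}
```

Running `zcat reads.fastq.gz | head -n 4000 | phred_range 33` and then again with `64` shows both interpretations. A negative minimum under offset 64 would mean the file contains characters that cannot be valid Phred64 scores, which would be consistent with the import failure.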

I will also try the parameters you suggested to see whether the results improve.

Cheers,

Jason

