Hi @Shuxian_LI,
This is actually normal. Many sequencing providers deliver PacBio CCS reads whose quality is not good enough, so DADA2 drops those "bad" CCS reads.
Ask your sequencing provider which parameters they used in ccs, especially minfullpass and minPredictedAccuracy. I bet minPredictedAccuracy was set below 0.99.
I guess the reason those providers do not deliver better CCS reads is sequencing cost: more raw reads give better quality, but they also cost more, which results in a higher service price.
Check out @benjjneb's repo: you will find a set of high-quality CCS reads that were delivered by PacBio itself and work just fine with DADA2.
For now I suggest using OTU clustering with --p-perc-identity 0.99 instead of DADA2 denoising.
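In case it helps, here is a rough sketch of what the clustering could look like with the q2-vsearch plugin: dereplicate first, then cluster de novo at 99% identity. All file names are placeholders I made up, and I am assuming your primer-trimmed CCS reads are already imported as a .qza artifact. By default the script only prints the commands; set RUN=1 to actually execute them.

```shell
set -eu

# Hypothetical artifact names (assumptions); replace with your own files.
SEQS="${SEQS:-ccs-trimmed.qza}"          # imported, primer-trimmed CCS reads
PERC_IDENTITY="${PERC_IDENTITY:-0.99}"   # identity threshold suggested above

# Print the command by default; execute it only when RUN=1.
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Step 1: collapse identical reads into a feature table + representative sequences.
dereplicate() {
  run_or_print qiime vsearch dereplicate-sequences \
    --i-sequences "$SEQS" \
    --o-dereplicated-table table.qza \
    --o-dereplicated-sequences rep-seqs.qza
}

# Step 2: de novo OTU clustering at the chosen identity.
cluster_otus() {
  run_or_print qiime vsearch cluster-features-de-novo \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --p-perc-identity "$PERC_IDENTITY" \
    --o-clustered-table table-otu99.qza \
    --o-clustered-sequences rep-seqs-otu99.qza
}

dereplicate
cluster_otus
```

Run it once as-is to check the printed commands, then again with RUN=1 inside your QIIME 2 environment.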
Before clustering, do not forget to remove the primers and re-orient the CCS reads with dada2::removePrimers in R.
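A minimal sketch of the primer-removal step, written as a small shell wrapper that generates the R script. The file names are hypothetical, and the primers shown are the full-length 16S pair (27F/1492R) from the DADA2 PacBio tutorial, which is only my assumption about your amplicon; swap in your own. As written it just writes and prints the R script; set RUN=1 to run it with Rscript.

```shell
set -eu

# Hypothetical inputs (assumptions); replace all of these with your own values.
IN_FASTQ="${IN_FASTQ:-sample1.ccs.fastq.gz}"
OUT_FASTQ="${OUT_FASTQ:-sample1.noprimers.fastq.gz}"
PRIMER_FWD="${PRIMER_FWD:-AGRGTTYGATYMTGGCTCAG}"   # 27F (assumed)
PRIMER_REV="${PRIMER_REV:-RGYTACCTTGTTACGACTT}"    # 1492R (assumed)
R_SCRIPT="${R_SCRIPT:-remove_primers.R}"

# Write the R script; shell variables are expanded into it.
cat > "$R_SCRIPT" <<EOF
library(dada2)
# Keep reads containing both primers, trim the primers off,
# and flip reverse-complemented reads (orient = TRUE).
removePrimers("$IN_FASTQ", "$OUT_FASTQ",
              primer.fwd = "$PRIMER_FWD",
              primer.rev = dada2:::rc("$PRIMER_REV"),
              orient = TRUE, verbose = TRUE)
EOF

# Print the script by default; execute it only when RUN=1.
if [ "${RUN:-0}" = "1" ]; then Rscript "$R_SCRIPT"; else cat "$R_SCRIPT"; fi
```

The reverse primer is passed reverse-complemented because that is how it appears at the end of a correctly oriented read.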
Kind regards,
Sixvable