When processing PacBio full-length 16S data, the number of sequences retained in the OTU table or ASV table is too small

Hi friends:
I'm not sure if this is the right place to post this question, but this forum is the most professional one I could find. Thanks in advance for any replies!
I'm working on a batch of PacBio full-length 16S data from soil samples. Primer removal and the QC phase went well, with an average of tens of thousands of CCS sequences retained in each sample.
My installation of QIIME 2 has not been going well because of our college's restrictive network firewall, so I started with the DADA2 R package instead. I followed the official tutorial to build the pipeline, and the early steps went smoothly until I generated the ASV table. The number of sequences in the ASV table is too low, with just over two hundred sequences in the smallest sample. I checked the inputs and outputs of each part of the pipeline, as shown in the figure. The tens of thousands of input sequences per sample are drastically reduced after the dada step, and the resulting ASV table retains fewer than 10% of the sequences.
I then turned to usearch to process the data, with a similar result: the unoise3 command identified 1397 amplicons, but when I generated the OTU table with the otutab command, it reported just 164054 / 836493 mapped to OTUs (19.6%).
I also tried the traditional OTU clustering method with a 97% threshold, which produced 6098 OTUs, but again, when I generated the OTU table with the otutab command, it reported only 279385 / 836493 mapped to OTUs (33.4%).
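For anyone who wants to check the mapping rates quoted above, here is a minimal Python sketch of the arithmetic. The function name `mapped_percent` and the run labels are hypothetical and chosen only for illustration; the read counts are the ones reported by otutab in this post.

```python
def mapped_percent(mapped: int, total: int) -> float:
    """Return the percentage of input reads that were mapped to OTUs."""
    if total <= 0:
        raise ValueError("total must be a positive read count")
    return 100.0 * mapped / total


# Read counts copied from the otutab messages quoted in this post.
total_reads = 836493
runs = {
    "unoise3 (denoised)": 164054,   # hypothetical label
    "97% OTU clustering": 279385,   # hypothetical label
}

for label, mapped in runs.items():
    pct = mapped_percent(mapped, total_reads)
    print(f"{label}: {mapped} / {total_reads} mapped ({pct:.1f}%)")
```

The same calculation could be applied per sample (reads in the final table divided by reads entering the pipeline) to see at which step most sequences are lost.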
I have never seen this in any of the NGS data I have processed. Is there an error somewhere in my processing, or is something else going on? Thanks for any replies, this is important to me.

For future reference, please see the discussion of this issue at the DADA2 issues forum: "When processing full-length 16S data, the ASV table output by makeSequenceTable has an unusually low number of sequences in the sample" (Issue #1975, benjjneb/dada2, GitHub)