After closed-reference OTU picking, nearly half of the data is discarded. Is this normal?
Welcome to the forum!
Can you explain a bit more about your target gene and region, your reference database, how you pre-processed the sequences, and what environment they came from? It's hard to make any determination without that information.
Thank you for your attention.
The 16S sequencing data was downloaded from NCBI, and the samples came from human gastric mucosa. After quality control with Deblur, I performed closed-reference OTU picking against the Greengenes database.
Greengenes has fairly good coverage of the human gut, so I don’t think this is an issue of the sequences failing to hit the reference.
Instead, I would check your Deblur statistics. I suspect that's where you're losing the data, rather than at the clustering step. Deblur tends to be quite conservative, so it's not uncommon to lose half your sequences in that step. But I would verify that.
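One way to verify this is to compare per-sample read counts before and after each step and see which step drops the most. Below is a minimal sketch in Python. It assumes you have already pulled those counts (for example from your Deblur statistics and the final feature table) into a plain dict. The stage names and numbers are hypothetical placeholders, not real Deblur or QIIME 2 output.

```python
# Hypothetical pipeline stages, in order. These names are placeholders;
# map them to whatever columns your own stats/feature tables provide.
STAGES = ["raw", "after_deblur", "after_closed_ref"]


def step_retention(counts):
    """Fraction of reads kept by each step, relative to the step before it."""
    retention = {}
    for prev, curr in zip(STAGES, STAGES[1:]):
        retention[curr] = counts[curr] / counts[prev] if counts[prev] else 0.0
    return retention


def overall_retention(counts):
    """Fraction of the raw reads that survive the whole pipeline."""
    first, last = STAGES[0], STAGES[-1]
    return counts[last] / counts[first] if counts[first] else 0.0


def lossiest_step(counts):
    """Name of the step that kept the smallest fraction of its input."""
    retention = step_retention(counts)
    return min(retention, key=retention.get)


if __name__ == "__main__":
    # Made-up counts for a single sample, for illustration only.
    sample = {"raw": 20000, "after_deblur": 11000, "after_closed_ref": 10500}
    for step, frac in step_retention(sample).items():
        print(f"{step}: kept {frac:.1%} of the previous step's reads")
    print(f"overall: kept {overall_retention(sample):.1%}")
    print("largest loss at:", lossiest_step(sample))
```

With these made-up numbers the overall retention is about 52.5%, and almost all of the loss happens at the Deblur step rather than at closed-reference picking. That is the pattern you would expect if Deblur, not the reference database, is where the data goes.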