I more or less understand each column in the deblur-stats.qzv file.
But I don't know what proportion of raw reads being discarded in the deblur step is normal/acceptable.
For example, say the sampling depth chosen in the deblur step is 120.
In sample A: raw reads 10000; reads-derep 7000; reads-deblur 6060; reads-artifact 60; reads-chimeric 2000; reads-missed-reference 3000; reads-hit-reference 1000.
In this example, lots of reads are flagged as chimeric or miss the reference, and only 10% of the raw reads (reads-hit-reference / raw reads) are left after deblur. I would suspect that there is something wrong with this sample.
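To make the arithmetic in this example explicit, here is a minimal Python sketch that expresses each count for sample A as a percentage of either the raw reads or reads-deblur. The counts are copied from the example above; the helper name percent_of and the dictionary keys are only illustrative (the raw read count is labelled reads-raw here), and the sketch does not say which denominator is the right one to use.

```python
# Counts copied from the sample A example in the post (made-up example data).
sample_a = {
    "reads-raw": 10000,
    "reads-derep": 7000,
    "reads-deblur": 6060,
    "reads-artifact": 60,
    "reads-chimeric": 2000,
    "reads-missed-reference": 3000,
    "reads-hit-reference": 1000,
}


def percent_of(counts, denominator_key):
    """Return every other count as a percentage of counts[denominator_key].

    Hypothetical helper for illustration; not part of QIIME 2 or deblur.
    """
    denominator = counts[denominator_key]
    return {
        name: round(100 * value / denominator, 1)
        for name, value in counts.items()
        if name != denominator_key
    }


vs_raw = percent_of(sample_a, "reads-raw")
vs_deblur = percent_of(sample_a, "reads-deblur")

for name in ("reads-chimeric", "reads-missed-reference", "reads-hit-reference"):
    print(f"{name}: {vs_raw[name]}% of reads-raw, {vs_deblur[name]}% of reads-deblur")
```

With these numbers, reads-hit-reference is 10% of the raw reads, which is the "only 10% left" figure, while the same counts look noticeably larger when divided by reads-deblur.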
So what proportion of reads-hit-reference, reads-artifact, reads-chimeric, and reads-missed-reference in the deblur step is normal/acceptable?
I think these questions depend a lot on the microbial community in question and how the samples were prepared.
Having ~30% of your reads be chimeric is a little high, but that could be expected if you had to use a lot of PCR cycles to get a signal from low-biomass samples.
Having ~50% of your reads miss the database is quite high for a human microbiome project, but might be expected if you are working with samples from a novel environment made up of understudied taxa.
Have you tried processing your data with a database-independent method like DADA2? What percentage of your reads does DADA2 think are chimeric?
The extreme example above is not my real data. I made the proportion of chimeric and missed reads too high on purpose, to give a "bad" example.
You said that ~30% chimeric reads is a little high and ~50% of reads missing the database is quite high. What proportion of chimeric reads and reads missing the reference is then acceptable for a human microbiome project, such as a gut microbiome study?
And most importantly, what proportion of reads hitting the reference is acceptable in a normal situation? Is 30% too low? I understand that the proportion depends on the study, but is there a recommended range?
Another question: should I compare the chimeric/missed/hit reads with reads-deblur rather than with reads-raw?