I have been looking at the first sequencing run of my data, which comes from chronic wound infection swabs. The data are Illumina 2x300 paired-end reads, and samples were prepared following the Human Microbiome Project protocol using primers 515F and 806R. As far as I know, no QC was performed by the sequencing center, but I did receive the reads demultiplexed. Based on the summary statistics, I used the following commands with DADA2:
Initially I thought this might be down to not trimming my reverse reads enough to remove all the poor-quality bases. So I repeated the denoising with the following commands:
So I guess my question is, do you think that I am losing too many reads during the chimera removal stage? If so, is there anything I can do to improve on this?
Let me know if I have left out some vital information
This is more than I typically see, but chimeras can be prevalent, and how many you get depends largely on the PCR amplification protocol, etc. Notably, you do still have plenty of good-quality merged sequences after chimera filtering, so I would personally just move on and accept that dada2 is probably correct. Your trimming parameters, etc., look appropriate, and I do not think they would be likely to produce chimeras, so I'm inclined to agree with dada2 on this one.
You can get a second opinion by using a different chimera filter (e.g., export your data and use another tool, or use q2-vsearch to dereplicate your data and then remove chimeras).
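Before reaching for a second tool, it can help to put a number on the loss per sample. Below is a minimal Python sketch, assuming the denoising stats have been exported as a tab-separated table with `merged` and `non-chimeric` columns (the column names used in the q2-dada2 stats output). The sample IDs and counts are made up for illustration, the function name `chimera_loss` is hypothetical, and a real exported file may carry an extra `#q2:types` line that would need to be skipped first.

```python
import csv
import io

# Hypothetical excerpt of a DADA2 denoising stats table (tab-separated).
# Column names follow the q2-dada2 stats output; the counts are invented.
STATS_TSV = """sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric
sampleA\t50000\t40000\t39000\t30000\t18000
sampleB\t60000\t48000\t47000\t40000\t39500
"""


def chimera_loss(stats_text):
    """Return {sample-id: fraction of merged reads removed as chimeric}."""
    losses = {}
    for row in csv.DictReader(io.StringIO(stats_text), delimiter="\t"):
        merged = int(row["merged"])
        kept = int(row["non-chimeric"])
        # Guard against samples with no merged reads at all.
        losses[row["sample-id"]] = (merged - kept) / merged if merged else 0.0
    return losses


for sample, loss in chimera_loss(STATS_TSV).items():
    print(f"{sample}: {loss:.1%} of merged reads flagged as chimeric")
```

Comparing this fraction across samples and runs shows whether the chimera loss is uniform or concentrated in a few samples or one pool.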
Thanks for your reply. Sorry I was called away to another project. I thought it was a bit strange as we are basically following the HMP protocol. The only difference is that we are using a touchdown PCR for the first 10 cycles as we were getting some non-specific banding, but only 25 cycles in total.
I now have all my data for this project, so I reassessed the truncation parameters based on all four sequencing runs, which means I am now running:
Looking at the statistics now, I am losing hardly any reads during the chimera step, literally <1000 reads for most of the samples (at least in the first pool).
Fingers crossed that the other 3 pools look the same
Not sure why there is such a dramatic difference though