Losing almost all reads after running deblur on paired-end sequences

Mehrbod_Estaki · August 9, 2018, 11:58pm

Thanks for the detailed explanation and your awesome detective work up to this point!
There's a combination of things that may driving what you see so let's take a closer look.

First I should mention that the loss of sequences with merged reads in deblur compared to just the forward reads has been observed and discussed before. A good discussion was had here and here. The latter link having a good calculation example of how many reads you may expect to lose with deblur, especially with longer reads. Personally I've also found this to be true when comparing DADA2 paired-end workflow to merge-->deblur approach. So the loss of reads is not unheard of, that being said I think you're still losing a bit more than expected, which just may be rooted in the nature of your data.

I know this sounds a bit confusing but the "fraction-artifact-with-minsize" doesn't have to do with the length-read but rather the --p-min-size parameter which sets the min abundance of a read to be retained, meaning most of your reads are singletons and so are discarded. It's hard to know why this is but a few possibilities are improper merging, not having removed primer/barcodes from your reads, or that there just is that many singletons (though unlikely imo). I would look at those points first before we do further troubleshooting. Can you also give us a bit more info about the nature of your run like sample type, primer sets, your run's cycles numbers?