Sampling depth - Is my sampling depth ideal

ChrisKeefe · May 11, 2020, 10:52pm

Hi again @ptalebic!
These are good questions, and have been discussed at length here on the forum. I've done my best to answer them again here, but you'll really want to dig into the resources at the end of this response.

Your sampling depth is quite good, so you have more wiggle room than most studies. You're mostly right that the goal here is to find a balance between "the most sequences" and "the most samples". Considering how metadata categories are impacted is critical.
For example:
You can keep all of your samples if your depth is 9634. That's still a fairly large frequency per sample, and if sample collection is expensive, you don't have many samples, or if the samples you might lose are critical to your study, then this might be a great choice.
image1258×427 30.6 KB

If these things don't impact your study particularly, then you might be able to crank up the depth without sacrificing important samples. At 20825, you've lost quite a few samples. You happen to have many samples to lose, and most of the loss is evenly distributed across bins, or comes from heavily-sampled bins. This depth may be OK for you.

Of note, though: at this depth you've lost most samples from hs60 and hs61:

... and you may have disproportionately dropped samples from older patients, which could introduce bias if not handled carefully:

.

I'd recommend taking a long look at your data using these tables and alpha-rarefaction curves, and making your choices based on your study's unique needs.

Sequence-count differences may be related to how the two tools handle error, but could also be a byproduct of the parameters you chose. Some parameter choices will yield more sequences, and some will yield fewer. If you haven't already, you can use some combination of trial-and-error and the data in your denoising stats (both tools produce something like this) to optimize your parameters.

As for which tool to choose, the answer is again, "what does your study need?" DADA2 provides some really nice features - quality filtering, read joining, and sequence repair. A look at the resources below and some searching should give you the background you need to start making the best choices for your study.

Tutorials - the QIIME 2 team, with the help of some awesome community members, has built a solid collection of tutorials. They're all worth working through, but you'll probably get the most value right now out of this bit on alpha rarefaction, and the rest of the moving pictures tutorial. This section on denoising may also be useful. Let me know here if there's anything unclear.
The docs - In addition to useful details on how commands work, plugin documentation includes citation information, so you can look at the papers that describe DADA2 and Deblur directly, and see how they differ.
Existing forum posts can be super useful in figuring out how things work, and why people prefer one tool over another. Plus, the answers are already there, so you won't get stuck waiting for a response. The magnifying-glass icon at the top right of the screen will take you amazing places.

Good luck,
Chris