P sampling depth

Mehrbod_Estaki · October 11, 2018, 11:49pm

Hi @Fatemah,
Sorry about the delay on this!

Glad we were able to increase your coverage and avoid having to drop a bunch of samples. Always important to play with the data a bit! Depending on your expected community, the minimum depth of 2,635 could be enough to give a good depiction of it. I've certainly seen publications with fewer.

So, we did this not because you had primers left at the beginning of your reads but because of the mystery N assignments. It's hard for me to say whether we should keep this cutoff. You could try re-running with and without the Ns and compare. Unfortunately the head command only shows the first few lines of the file, so it doesn't give a full picture of what is going on in your sample; your best bet is to just re-run it if you want. I wouldn't worry too much about the extra coverage though, those 20 bp are not going to reduce your resolution too much.
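If you want something more complete than head, here is a minimal sketch that scans every read and counts how often an N shows up at each of the first few positions. The function name `count_n_by_position` and the `max_pos` parameter are my own, and the example reads are made up; it also assumes plain 4-line FASTQ records with no wrapped sequence lines.

```python
from collections import Counter
from typing import Iterable


def count_n_by_position(fastq_lines: Iterable[str], max_pos: int = 30) -> Counter:
    """Count how often 'N' appears at each of the first max_pos read positions.

    Assumes 4-line FASTQ records (header, sequence, '+', qualities).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq[:max_pos]):
            if base == "N":
                counts[pos] += 1
    return counts


# Tiny made-up example standing in for a real file.
example = [
    "@read1", "NNACGTACGT", "+", "##IIIIIIII",
    "@read2", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read3", "NACGTNCGTA", "+", "#IIII#IIII",
]
n_counts = count_n_by_position(example)
for pos in sorted(n_counts):
    print(f"position {pos}: {n_counts[pos]} reads with N")
```

For a real file you could pass an open file handle instead of the list (for a gzipped FASTQ, `gzip.open(path, "rt")` from the standard library). If the Ns pile up in the first handful of positions across the whole file, that supports keeping the trim.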

Yes! It may seem counter-intuitive at first but this is actually what we expected. As you feed more/longer reads into DADA2, you also bring in more of those poor-quality bases at the 3' end. DADA2 will discard those reads instead of using them if the quality score drops below a certain threshold (see the --help output for more info on these). So while you are keeping longer reads, you are in fact causing more of those reads to be discarded. When you trim those poor-quality tails, more reads make it past this filtering step, thus more reads! Hope that makes sense.
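To make that concrete, here is a toy model of an expected-errors style filter. This is not DADA2 itself, just a sketch of the idea: the expected number of errors in a read is the sum of 10^(-Q/10) over its bases, and a read is dropped if that sum exceeds a cutoff. The function names, the cutoff of 2.0, and the quality values are all assumptions for illustration.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(quals, trunc_len, max_ee=2.0):
    """Truncate to trunc_len, then keep the read only if its expected
    errors stay at or below max_ee. Reads shorter than trunc_len fail."""
    if len(quals) < trunc_len:
        return False
    return expected_errors(quals[:trunc_len]) <= max_ee


# Made-up read: 170 good bases (Q35) followed by an 80-base poor tail (Q8).
read = [35] * 170 + [8] * 80

kept_short = passes_filter(read, trunc_len=170)  # tail trimmed off
kept_long = passes_filter(read, trunc_len=250)   # tail included
print(kept_short, kept_long)
```

With the tail trimmed the read sails through; with the tail included the low-quality bases push it over the cutoff and the whole read is lost. That is the same pattern as longer truncation giving you fewer reads.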

Yes, the dark black line in the middle of your reads is the median. You can also zoom in on that graph by drawing a box on the image if you want to see more detail. And remember that what I said is not a rule or a one-size-fits-all recommendation, it is just a starting point. You will have to play around with your data to find what works best. For that reason I can't tell you what to pick or how to pick it, that is just something you have to figure out. I suggested 170 and 190 for your reads purely based on a rough visual inspection, just sort of where I noticed the reads were dropping in quality. More important, though, is the fact that you have a region with lots of overlap, which means we could actually truncate even earlier (as discussed earlier) without losing information. So to be safe I recommended being way more stringent than the min 20 median rule, which seemed to work better for your data! Again, you just need to figure out what works for your data :P
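If you would rather not eyeball it, the "median of at least 20" starting point can be sketched in a few lines. This is only an illustration: the function names are mine, the quality values are made up, and the amplicon length of 300 is a placeholder, not something from your data.

```python
from statistics import median


def first_position_below(quality_rows, min_median=20):
    """Return the first 0-based position whose median quality falls
    below min_median, or None if the median never drops that low."""
    n_pos = min(len(row) for row in quality_rows)
    for pos in range(n_pos):
        if median(row[pos] for row in quality_rows) < min_median:
            return pos
    return None


def overlap_after_truncation(trunc_f, trunc_r, amplicon_len):
    """Bases of overlap left between forward and reverse reads after
    truncation, assuming the reads together span one amplicon."""
    return trunc_f + trunc_r - amplicon_len


# Made-up per-read quality scores (one row per read, 5 positions each).
rows = [
    [38, 36, 30, 22, 12],
    [37, 35, 28, 18, 10],
    [38, 34, 25, 15, 8],
]
cut = first_position_below(rows)

# 170 and 190 are the values suggested above; 300 is hypothetical.
overlap = overlap_after_truncation(170, 190, amplicon_len=300)
print(cut, overlap)
```

The first function gives you a candidate truncation point, and the second is a quick sanity check that the forward and reverse reads would still overlap enough to be merged. Treat both as a starting point and then compare how many reads survive at a few different settings.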

Sorry there isn't a clear cut answer, but that's just the nature of this beast. Good luck!
