QIIME 2 diversity core-metrics-phylogenetic sampling depth

I was wondering how you would choose the sampling depth when there is no even sampling.
I was following the tutorial, but I can't work out why they chose 1109.
Here is my table.qzv from Deblur:
invertsdeblur-table.qzv (1.3 MB)

Thanks,
Imee19

Hi @Imee19!

From the tutorial:

Here we set the --p-sampling-depth parameter to 1109. This value was chosen based on the number of sequences in the L3S341 sample because it’s close to the number of sequences in the next few samples that have higher sequence counts, and because it is considerably higher (relatively) than the number of sequences in the one sample that has fewer sequences. This will allow us to retain most of our samples. The one sample that has fewer sequences will be dropped from the core-metrics-phylogenetic analyses and anything that uses these results.

This does not appear to be the Moving Pictures tutorial dataset, is this your own data?

Hi Matt,
Yes, I attached my data. I did the tutorial and attended one of the QIIME 2 workshops last year. I tried to make sense of this step, but I'm not sure I have it right. For example, in my data, anything below 2000 sequences would be dropped; is that the correct way to read it? Or should I just be conservative and use, say, 1000?
Thanks,
Imelda

Thanks for clarifying --- your post above seemed to imply otherwise, I just wanted to double-check.

Yep, any samples with fewer than 2000 sequences will be dropped, and every remaining sample will be subsampled down to 2000 sequences.
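To make the drop-then-subsample behaviour concrete, here is a minimal Python sketch. It is only an illustration of the idea, not QIIME 2's actual implementation; the `rarefy` function, the `toy_table` layout, and all sample and feature names and counts are made up for the example.

```python
import random


def rarefy(table, depth, seed=0):
    """Drop samples below `depth`, then subsample the rest without replacement.

    `table` maps sample ID -> {feature ID: sequence count}.
    Hypothetical helper for illustration only; not a QIIME 2 API.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        if total < depth:
            continue  # too few sequences: the sample is dropped entirely
        # Expand the counts into one entry per sequence, then draw `depth` of them.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        drawn = rng.sample(pool, depth)
        subsampled = {}
        for feature in drawn:
            subsampled[feature] = subsampled.get(feature, 0) + 1
        rarefied[sample_id] = subsampled
    return rarefied


# Hypothetical toy table (counts are invented).
toy_table = {
    "sample_a": {"feat1": 1500, "feat2": 900, "feat3": 100},  # 2500 sequences
    "sample_b": {"feat1": 1200, "feat2": 948},                # 2148 sequences
    "sample_c": {"feat1": 300, "feat3": 200},                 # 500 sequences
}

rarefied = rarefy(toy_table, depth=2000)
for sample_id, counts in rarefied.items():
    print(sample_id, sum(counts.values()))
```

In this toy case `sample_c` is dropped, and both remaining samples end up with exactly 2000 sequences each.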

You could try a few even sampling depths (also, check out alpha-rarefaction and beta-rarefaction for assistance). I like to pick an even sampling depth that corresponds to a sample-defined threshold, that is, the sequence count of an actual sample, like 2148 for your dataset (instead of 2000); otherwise you are just throwing away reads for no real reason. The same is true for 1000: you would lose the same samples, but your remaining samples would have less than half as many reads compared to a 2148 even sampling depth. Make sense?
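As a rough way to compare candidate depths, the sketch below counts how many samples and sequences survive at each one. The per-sample totals are hypothetical (only the 2148 value comes from this thread), and `summarize_depth` is an illustrative helper rather than anything provided by QIIME 2.

```python
def summarize_depth(sample_totals, depth):
    """Count the samples and sequences retained at a given even sampling depth."""
    kept = [s for s, total in sample_totals.items() if total >= depth]
    return {
        "depth": depth,
        "samples_kept": len(kept),
        # Every retained sample is subsampled down to exactly `depth` sequences.
        "sequences_kept": depth * len(kept),
    }


# Hypothetical per-sample sequence totals (invented for illustration).
sample_totals = {
    "s1": 310,
    "s2": 640,
    "s3": 2148,
    "s4": 2900,
    "s5": 15000,
    "s6": 42000,
}

summaries = [summarize_depth(sample_totals, d) for d in (1000, 2000, 2148)]
for row in summaries:
    print(row)
```

With these made-up totals, all three depths keep the same four samples, so the only difference is how many sequences per sample are thrown away. That is why the depth that matches the lowest retained sample is the least wasteful of the three.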

With that said, you are going to have to make a tough decision here. Your sequencing depth distribution for these samples is a bit extreme, and, judging by the metadata included in that file, the lower-depth samples correspond to potentially lower-biomass samples. That means rarefaction can preferentially eliminate the lower-biomass samples from your study, which might not be ideal for you.

Hi Matt,
Thank you for helping me out. I will take your suggestions into consideration.
