Qiime 2 diversity core-metrics-phylogenetic sampling depth

Imee19 · February 26, 2019, 4:13am

I was wondering how you would choose the sampling depth when the samples don't have even sequencing depth.
I was following the tutorial, but I can't work out why they chose 1109.
Here is my table.qzv from Deblur:
invertsdeblur-table.qzv (1.3 MB)

Thanks,
Imee19

thermokarst · February 26, 2019, 1:31pm

Hi @Imee19!

From the tutorial:

Here we set the --p-sampling-depth parameter to 1109. This value was chosen based on the number of sequences in the L3S341 sample because it’s close to the number of sequences in the next few samples that have higher sequence counts, and because it is considerably higher (relatively) than the number of sequences in the one sample that has fewer sequences. This will allow us to retain most of our samples. The one sample that has fewer sequences will be dropped from the core-metrics-phylogenetic analyses and anything that uses these results.
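
The tutorial's reasoning comes down to sorting the per-sample sequence counts, picking a candidate depth, and checking which samples fall below it. Here is a minimal Python sketch of that check. The sample IDs and counts are made up (only L3S341 = 1109 comes from the tutorial text), and `split_by_depth` is a hypothetical helper, not part of QIIME 2:

```python
# Made-up per-sample sequence counts; only L3S341 (1109) is from the tutorial text.
sample_counts = {
    "sample-A": 35,
    "L3S341": 1109,
    "sample-B": 1195,
    "sample-C": 1237,
    "sample-D": 2089,
}


def split_by_depth(counts, depth):
    """Return (retained, dropped) sample IDs for an even sampling depth.

    A sample is kept if its total count is at least `depth`.
    """
    retained = sorted(s for s, n in counts.items() if n >= depth)
    dropped = sorted(s for s, n in counts.items() if n < depth)
    return retained, dropped


if __name__ == "__main__":
    # Print counts in ascending order to see where the gaps between samples are.
    for sample, n in sorted(sample_counts.items(), key=lambda item: item[1]):
        print(f"{sample}: {n}")
    retained, dropped = split_by_depth(sample_counts, 1109)
    print("retained:", retained)
    print("dropped:", dropped)
```

In this toy data, raising the depth above 1109 starts dropping L3S341 and its neighbours, while lowering it (down to 35) keeps no additional samples. That is the trade-off the tutorial describes.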

This does not appear to be the Moving Pictures tutorial dataset, is this your own data?

Imee19 · February 27, 2019, 1:12am

Hi Matt,
Yes, I attached my data. I did the tutorial, and I attended one of the Qiime 2 workshops last year. I tried to make sense of this step, but I'm not sure I'm reading it correctly: in my data, will anything below 2000 sequences be dropped? Or should I just be conservative and use, say, 1000?
Thanks,
Imelda

thermokarst · February 27, 2019, 2:44pm

Thanks for clarifying. Your post above seemed to imply otherwise, so I just wanted to double-check.

Yep, any samples with fewer than 2000 sequences (total feature counts) will be dropped, and every remaining sample will be randomly subsampled (rarefied) down to 2000 sequences.
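
To make "dropped" and "subsampled" concrete, here is a minimal sketch of even-depth rarefaction in plain Python. It assumes a toy feature table stored as nested dicts (sample → feature → count) and draws reads without replacement. It only illustrates the idea and is not QIIME 2's implementation; the function name `rarefy` and all sample/feature IDs are hypothetical.

```python
import random
from collections import Counter


def rarefy(table, depth, seed=None):
    """Rarefy a toy feature table to an even depth.

    table: dict of sample ID -> dict of feature ID -> count (hypothetical layout).
    Samples whose total count is below `depth` are dropped; every other sample
    is randomly subsampled, without replacement, to exactly `depth` sequences.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample, features in table.items():
        if sum(features.values()) < depth:
            continue  # too shallow: excluded from all downstream results
        # One list entry per sequence, so sampling is over individual reads.
        pool = [feature for feature, n in features.items() for _ in range(n)]
        rarefied[sample] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


# Made-up example table: s3 has only 1200 sequences in total.
table = {
    "s1": {"asv1": 1500, "asv2": 900},  # 2400 sequences
    "s2": {"asv1": 300, "asv3": 1700},  # 2000 sequences
    "s3": {"asv2": 800, "asv3": 400},   # 1200 sequences
}

if __name__ == "__main__":
    result = rarefy(table, 2000, seed=42)
    for sample, features in result.items():
        print(sample, sum(features.values()), features)
```

Because the subsampling is random, the per-feature counts for `s1` change with the seed, but its total is always exactly 2000. `s2` already has exactly 2000 sequences, so all of them are kept, and `s3` is dropped.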

You could try a few even sampling depths (also, check out alpha-rarefaction and beta-rarefaction for assistance). I like to pick an even sampling depth that corresponds to a sample-defined threshold, i.e. the exact sequence count of one of your samples, like 2148 for your dataset (instead of 2000). Otherwise you are just throwing away reads for no real reason, since in your data a depth of 2000 drops exactly the same samples as 2148. The same is true for 1000: you would lose the same samples, but your remaining samples would have less than half as many reads compared to a 2148 even sampling depth. Make sense?
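
The "sample-defined threshold" advice can be sketched as follows: take the depth you had in mind, snap it up to the smallest sample total at or above it, and compare how many reads survive. The counts below are invented (only 2148 comes from this thread) and are chosen so that, as described above, no sample falls between 1000 and 2148. The helper names are hypothetical.

```python
# Invented per-sample totals; only 2148 comes from the thread.
counts = {"a": 312, "b": 645, "c": 2148, "d": 5230, "e": 12004}


def snap_to_sample_total(counts, candidate):
    """Smallest observed sample total that is >= `candidate`.

    Any depth from `candidate` up to this value drops exactly the same
    samples, so using the observed total keeps the most reads per sample.
    """
    eligible = [n for n in counts.values() if n >= candidate]
    if not eligible:
        raise ValueError("no sample reaches the candidate depth")
    return min(eligible)


def reads_retained(counts, depth):
    """Total sequences left across all samples after rarefying to `depth`."""
    kept = sum(1 for n in counts.values() if n >= depth)
    return depth * kept


if __name__ == "__main__":
    depth = snap_to_sample_total(counts, 2000)
    print("snapped depth:", depth)  # 2148
    for d in (1000, 2000, depth):
        print(d, "->", reads_retained(counts, d), "reads retained")
```

With these numbers, all three depths keep the same three samples, but 1000 retains 3000 reads versus 6444 at 2148, which is under half.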

With that said, you are going to have to make a tough decision here. Your sequencing depth distribution for these samples is a bit extreme, and, judging by the metadata included in that file, the lower-depth samples correspond to potentially lower-biomass samples. That means rarefaction can preferentially eliminate those low-biomass samples from your study, which might not be ideal for you.
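
One way to see whether a given depth would preferentially remove one kind of sample is to tabulate dropped samples against a metadata column. Here is a minimal sketch, assuming a hypothetical biomass column with "low"/"high" labels and invented counts (the thread doesn't show the actual metadata values):

```python
from collections import Counter

# Invented data: per-sample totals and a hypothetical "biomass" metadata column.
counts = {"s1": 312, "s2": 645, "s3": 2148, "s4": 5230, "s5": 12004}
biomass = {"s1": "low", "s2": "low", "s3": "low", "s4": "high", "s5": "high"}


def dropped_by_group(counts, groups, depth):
    """Count, per metadata group, samples below `depth` and samples overall."""
    total = Counter(groups[s] for s in counts)
    dropped = Counter(groups[s] for s, n in counts.items() if n < depth)
    return dropped, total


if __name__ == "__main__":
    dropped, total = dropped_by_group(counts, biomass, 2148)
    for group in sorted(total):
        # Counter returns 0 for groups with no dropped samples.
        print(f"{group}: {dropped[group]} of {total[group]} samples dropped")
```

If the dropped fraction is very uneven across groups, as in this toy output, that is the situation described above, and it's worth weighing before settling on a depth.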

Imee19 · February 28, 2019, 1:14am

Hi Matt,
Thank you for helping me out. I will take your suggestions into consideration.

thermokarst · February 28, 2019, 2:45pm

An off-topic reply has been split into a new topic: Help with picking an even sampling depth

Please keep replies on-topic in the future.

system · March 31, 2019, 8:45pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.