Choosing an appropriate sampling depth when sequence counts differ greatly between groups

Hello everyone,

I have a question about choosing an appropriate sampling depth in a special case. I'm comparing larvae and adults of the same species. As is known from the literature, one particular bacterium makes up about 80% of the adult microbiota, while it is scarce in larvae.
Because of this, when I exclude this bacterium in QIIME2, most reads are removed from the adult samples, so there is a large imbalance in the number of sequences between the two groups (adults and larvae).

How can I choose an appropriate sampling depth in this case? I tried the lowest value in my feature table, but it seems too low for rarefaction, and I'm not sure whether it could still lead to significant results.

I'm running QIIME2 (2022.11.1) within the Nextflow pipeline ampliseq (release 2.5.0).

Here's the table:

sample DADA2_input filtered denoisedF denoisedR merged nonchim input_tax_filter filtered_tax_filter lost retained_percent lost_percent
Adult_1 152855 127357 126201 126198 113773 93474 93474 67063 26411 71.74508419 28.25491581
Adult_2 163191 146676 146117 146293 145041 138309 138309 7811 130498 5.64749944 94.35250056
Adult_3 161331 144498 144126 144210 143583 138738 138738 2240 136498 1.614554052 98.38544595
Adult_4 169671 147911 147439 147648 146810 134383 134383 31565 102818 23.48883415 76.51116585
Adult_5 167978 150744 150234 150408 149521 142514 142514 12492 130022 8.765454622 91.23454538
Adult_6 158778 143335 143073 143058 142435 137380 137380 4254 133126 3.0965206 96.9034794
Adult_7 129341 112680 112336 112485 111733 108275 108275 5557 102718 5.132302009 94.86769799
Adult_8 170857 153491 153093 153177 152154 142846 142846 7738 135108 5.417022528 94.58297747
Adult_9 166493 149215 148910 148732 147943 142271 142271 12146 130125 8.537228247 91.46277175
Adult_10 138279 116174 115522 115465 112174 89940 89940 34177 55763 37.99977763 62.00022237
Adult_11 144995 130107 129894 129958 129475 125832 125832 1628 124204 1.293788543 98.70621146
Larvae_1 133345 111436 110632 111012 108694 84321 84321 82264 2057 97.5605128 2.439487198
Larvae_2 144530 123233 122742 122969 122190 106975 106975 106839 136 99.87286749 0.127132508
Larvae_3 135974 112743 112156 112393 111053 90134 90134 90134 0 100 0
Larvae_4 94754 70744 70172 70436 69124 55900 55900 55830 70 99.87477639 0.125223614
Larvae_5 139360 116427 116035 116119 115368 104819 104819 103389 1430 98.63574352 1.36425648
Larvae_6 149082 128228 127699 127838 126821 109922 109922 109896 26 99.97634686 0.023653136
Larvae_7 133744 113178 112628 112877 111642 93885 93885 93885 0 100 0
Larvae_8 142614 120690 120308 120441 119418 105092 105092 101903 3189 96.96551593 3.034484071
Larvae_9 136987 117459 116822 117101 115672 93840 93840 93748 92 99.90196078 0.098039216
Larvae_10 143251 125898 125122 125559 123729 102846 102846 101427 1419 98.6202672 1.379732804
Larvae_11 129573 107129 106779 106917 106251 84668 84668 84658 10 99.98818916 0.011810838

Any suggestions? Thanks in advance.

Hi @STE40,

I'm not seeing any issues with the depths. You're at well over 50K reads (which would be excessive in my environment, but I tend to go shallow and high sample number over ultra deep sequencing on a few samples). I think you've exceeded expectations on your ability to do high abundance profiling, and you're likely picking up a lot of reads on the low abundance profiling.

My recommendation would be to pick a round number below the lowest depth. You can do multiple rarefactions and average them if you want to make sure the models are stabilized (I tend to just run about 10 tables at the same depth, and then use q2-feature-table merge with "average" overlap to get to where I need). You may find you want to adjust for sequencing depth, particularly on your unweighted/richness metrics, such as Unweighted UniFrac, Jaccard, Observed Features, Faith's PD, or Chao1 (although Chao1 is inappropriate for denoised data), or on zero-sensitive metrics like Aitchison.
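
As a rough illustration of the two steps above (picking a round depth, then averaging repeated rarefactions), here is a minimal NumPy sketch. It is not the QIIME 2 implementation. The function names `round_depth_below` and `averaged_rarefaction`, the rounding step of 500, and the toy table are all assumptions made for the example. Samples are rows and features are columns.

```python
import numpy as np


def round_depth_below(min_depth, step=500):
    """Largest multiple of `step` that does not exceed `min_depth`.

    Hypothetical helper; the step size is an arbitrary choice.
    """
    return (min_depth // step) * step


def averaged_rarefaction(table, depth, n_iter=10, seed=0):
    """Rarefy every sample to `depth` reads `n_iter` times and average.

    Subsampling is done without replacement. Samples with fewer than
    `depth` reads are dropped. Returns the averaged table and a boolean
    mask marking which input samples were kept.
    """
    rng = np.random.default_rng(seed)
    table = np.asarray(table, dtype=np.int64)
    keep = table.sum(axis=1) >= depth
    kept = table[keep]
    total = np.zeros(kept.shape, dtype=float)
    for _ in range(n_iter):
        # One rarefied table: draw `depth` reads from each sample.
        total += np.vstack(
            [rng.multivariate_hypergeometric(row, depth) for row in kept]
        )
    return total / n_iter, keep


# 1628 is the lowest filtered_tax_filter value in the posted table (Adult_11).
depth_example = round_depth_below(1628)

# Toy counts (made up): 3 samples x 3 features, rarefied to 50 reads.
toy = np.array([[50, 30, 20], [5, 3, 1], [400, 0, 100]])
averaged, kept_mask = averaged_rarefaction(toy, depth=50)
```

In this toy run the second sample has only 9 reads, so it falls below the depth and is dropped. Each remaining row of `averaged` still sums to the chosen depth, because every individual rarefied table does.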

You may also want to consider how the depth impacts filtering and try to filter out your low prevalence reads based on the depth.
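
One possible reading of depth-based filtering, sketched in NumPy under stated assumptions: a feature counts as "present" in a sample only if it reaches a minimum count derived from the sampling depth, and it is kept only if it is present in enough samples. The function name `filter_low_prevalence` and both thresholds (`min_rel_abund`, `min_samples`) are hypothetical and would need tuning for real data.

```python
import numpy as np


def filter_low_prevalence(table, depth, min_rel_abund=0.001, min_samples=2):
    """Drop features that are rarely present at a depth-scaled count.

    `table` has samples as rows and features as columns. Both thresholds
    are illustrative assumptions, not recommended values.
    """
    table = np.asarray(table)
    # Translate the relative-abundance cutoff into a read count at this depth.
    min_count = max(1, int(np.ceil(min_rel_abund * depth)))
    present = table >= min_count
    keep_features = present.sum(axis=0) >= min_samples
    return table[:, keep_features], keep_features


# Toy counts (made up): 3 samples x 3 features at a depth of 1000 reads.
toy = np.array([[10, 0, 1], [12, 1, 0], [8, 0, 0]])
filtered, keep_features = filter_low_prevalence(
    toy, depth=1000, min_rel_abund=0.005, min_samples=2
)
```

Here only the first feature clears the count cutoff in at least two samples, so the other two are removed. Tying the cutoff to the depth means a shallower rarefaction automatically uses a lower minimum count.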

I would also recommend being aware of potential contamination (reagent and splashover) in your larval samples.

Best,
Justine
