Technical replicates - different faith_pd index

thedam · February 25, 2021, 1:51am

Hi All,

I sequenced the same batch of 30 samples twice.
Each sample from Batch_1 is covered with ~160k reads.
Each sample from Batch_2 is covered with ~40k reads.

Then I tried two QIIME 2 analysis approaches:

1. I put all .fastq files into the same input manifest (PairedEndFastqManifestPhred33V2) and analysed them together.
2. I analysed the two batches separately.

In both cases, the replicate samples clustered perfectly together with Weighted UniFrac, and I am happy with that. With other beta diversity indices the samples don't always cluster together, so I don't consider them.

But I am very unhappy with the alpha diversity indices: every sample from Batch_1 (the batch with higher raw coverage) has a higher index than the corresponding sample from Batch_2.

For some tests, I removed from each batch the features that make up less than 0.1% of the total features, but the alpha diversity values are still not close to each other.
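A minimal sketch of that relative-abundance filter, on a hypothetical table of per-feature totals for one batch (the table, feature IDs, and the `filter_rare_features` helper are illustrative, not QIIME 2 code). One thing it makes visible: the 0.1% cutoff is relative, so it scales with depth. 0.1% of ~160k reads is about 160 reads, while 0.1% of ~40k is about 40, so the filter does not put the two batches on an equal footing.

```python
from typing import Dict

def filter_rare_features(totals: Dict[str, int], min_fraction: float = 0.001) -> Dict[str, int]:
    """Keep features whose share of the batch's total count is at least min_fraction.

    `totals` is a hypothetical {feature_id: total_count_in_batch} mapping;
    0.001 corresponds to the 0.1% cutoff described above.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {fid: n for fid, n in totals.items() if n / grand_total >= min_fraction}

# Hypothetical per-feature totals for one batch (60,000 reads in total).
batch = {"ASV_a": 50_000, "ASV_b": 9_000, "ASV_c": 950, "ASV_d": 40, "ASV_e": 10}
kept = filter_rare_features(batch)
print(sorted(kept))  # ASV_d and ASV_e fall below 60 reads (0.1% of 60,000) and are dropped
```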

Is there a way to make these analyses comparable in terms of alpha diversity? I especially mean Faith_PD, as I trust it the most.
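For reference, Faith's PD on a rooted tree is the total branch length of the part of the tree that connects a sample's observed features to the root. The sketch below computes that on a tiny hypothetical tree (node names, branch lengths, and the `faith_pd` helper are made up for illustration; QIIME 2 uses its own implementation). It also shows why the metric is depth-sensitive: detecting an extra feature can only add branch length, never remove it.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
# The root has parent None.
Tree = Dict[str, Tuple[Optional[str], float]]

def faith_pd(tree: Tree, observed_tips: Iterable[str]) -> float:
    """Sum the branch lengths of the subtree joining the observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk towards the root, stopping early once we reach a branch already counted.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            if parent is not None:
                total += length
            node = parent
    return total

tree: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 0.5),
    "B": ("n1", 0.7),
    "C": ("root", 2.0),
}
print(faith_pd(tree, ["A"]))            # branches A + n1
print(faith_pd(tree, ["A", "B"]))       # adds only branch B; n1 is shared
print(faith_pd(tree, ["A", "B", "C"]))  # adds branch C
```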

Thanks for any suggestions!

P.S. I use vsearch, q-score, Deblur and a SEPP tree.

jwdebelius · February 25, 2021, 5:32am

Hi @thedam,

This is the hazard of differences in sequencing depth. The absolute value of a richness metric is quite sensitive to your sequencing depth, your feature generation method (denoising algorithm, OTU clustering), and your rarefaction depth. Because the absolute value isn't externally valid (as you're observing), I think looking at relationships after some sort of normalization of richness can help. I tend to z-normalize my richness metrics (I've found they're asymptotically normal), so I subtract the mean and divide by the standard deviation. Then your question becomes whether the relationships and relative diversity are consistent, not whether the absolute values match. In the same vein, you could also check the direct correlation: do the values correlate, even if Batch_1 is higher?
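A minimal sketch of the z-normalization and correlation check, assuming each batch is normalized separately and both lists hold the same samples in the same order (the Faith's PD values below are hypothetical placeholders, and the helper names are illustrative). Pearson correlation is unchanged by adding a constant or multiplying by a positive constant, so a batch that is uniformly higher can still correlate perfectly with the other.

```python
import statistics
from typing import List, Sequence

def z_normalize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the (sample) standard deviation within one batch."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of paired values, computed from their z-scores."""
    if len(x) != len(y):
        raise ValueError("x and y must hold the same samples in the same order")
    zx, zy = z_normalize(x), z_normalize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical Faith's PD values for the same five samples in each batch.
batch_1 = [12.0, 15.5, 9.8, 20.1, 13.3]
batch_2 = [8.1, 10.9, 6.7, 14.6, 9.0]

z1, z2 = z_normalize(batch_1), z_normalize(batch_2)
for i, (a, b) in enumerate(zip(z1, z2), start=1):
    print(f"sample {i}: z_batch1={a:+.2f}  z_batch2={b:+.2f}")
print(f"Pearson r = {pearson(batch_1, batch_2):.3f}")
```

Samples whose z-scores differ a lot between the two batches are the ones whose relative diversity is not reproduced.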

You could also try an approach that is less sensitive to, or better able to handle, differences in depth. Shannon diversity will be less sensitive than the pure richness metrics (observed features, Faith's PD) and tends to saturate. Do you see behavior like that in your data? I think there's an evenness-weighted variant of Faith's PD that would be analogous, but I don't know the name or citation off the top of my head. (One of the other mods might, though, and can hopefully jump in.)
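One way to check for that saturation is to subsample (rarefy) a sample to several depths and track both metrics. The sketch below does this on a hypothetical toy sample with three dominant features and 100 rare ones; the sample, the depths, and the helper names are assumptions for illustration, and Shannon entropy here uses log base 2. With a profile like this you would typically expect observed features to keep climbing with depth while Shannon changes much less, because the rare features add to the count but contribute little to the entropy.

```python
import math
import random
from collections import Counter
from typing import Dict, Mapping

def rarefy(counts: Mapping[str, int], depth: int, seed: int = 0) -> Counter:
    """Randomly draw `depth` reads without replacement from one sample's feature counts."""
    reads = [fid for fid, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))

def observed_features(counts: Mapping[str, int]) -> int:
    """Number of features with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

def shannon(counts: Mapping[str, int]) -> float:
    """Shannon entropy in bits (log base 2) of the relative abundances."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)

# Hypothetical sample: 3 dominant features plus 100 rare ones (10,000 reads in total).
sample: Dict[str, int] = {"dom1": 5000, "dom2": 3000, "dom3": 1500}
sample.update({f"rare{i}": 5 for i in range(100)})

for depth in (500, 2000, 5000, 10000):
    sub = rarefy(sample, depth)
    print(f"depth={depth:>6}  observed={observed_features(sub):>4}  shannon={shannon(sub):.2f}")
```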

You could explore an option like breakaway, which accounts for sampling depth and provides uncertainty estimates (although you may have better luck with the R version; I'm not sure when q2-breakaway was last updated).

Finally, as a general observation: when I model richness, I often incorporate the sequencing depth into my model, either as a linear or a log term (whichever makes my model fit best). If I were analysing your data in a linear mixed effects model accounting for the replicate, I might do that.
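A minimal sketch of choosing between a linear and a log depth term by comparing residual sums of squares from ordinary least squares, in plain Python (the data and helper names are hypothetical). It deliberately leaves out the random intercept per sample that a linear mixed effects model would add for the replicates (for example via statsmodels' `mixedlm`), so treat it only as a way to see which form of the depth term fits better. One caveat specific to this design: with ~160k reads in every Batch_1 sample and ~40k in every Batch_2 sample, depth and batch are almost the same variable, so a model may not be able to separate their effects.

```python
import math
from typing import Dict, Sequence, Tuple

def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rss(x: Sequence[float], y: Sequence[float], a: float, b: float) -> float:
    """Residual sum of squares of the fitted line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def compare_depth_terms(depth: Sequence[float], richness: Sequence[float]) -> Dict[str, float]:
    """RSS for richness ~ depth versus richness ~ log(depth); lower means a better fit.

    Both models have two parameters, so their RSS values are directly comparable.
    """
    log_depth = [math.log(d) for d in depth]
    a_lin, b_lin = fit_line(depth, richness)
    a_log, b_log = fit_line(log_depth, richness)
    return {
        "linear": rss(depth, richness, a_lin, b_lin),
        "log": rss(log_depth, richness, a_log, b_log),
    }

# Hypothetical per-sample read depths and Faith's PD values.
depths = [38_000, 41_000, 45_000, 60_000, 90_000, 120_000, 158_000, 165_000]
faith = [9.1, 9.4, 9.8, 10.9, 12.2, 13.0, 13.9, 14.0]
print(compare_depth_terms(depths, faith))
```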

Best,
Justine