Help on differential abundance analysis of different library sizes (>60x)


I’m trying to do differential abundance analysis between gut samples and mesenteric lymph nodes. Understandably, the relative abundance in lymph nodes is very low. The challenge I’m having is how to do the differential abundance analysis between those groups, given that:

  1. The mean library size differs between the groups by roughly 70x (lymph node = 995, gut = 70,000).
  2. The smallest library size is 311, the second smallest is 492.
  3. The sample size of the groups is small: some groups have 4 samples and some have 5.

If I rarefy, I would lose two samples in the group of 4, leaving only two. I’ve read in the literature that DESeq2 is sensitive to small sample sizes. However, I don’t know how to handle these enormous differences in library size, or whether rarefying to 492 would make any sense given that the other libraries have > 60,000 reads.
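To make the rarefying concern concrete, here is a minimal sketch in Python of what subsampling to a depth of 492 does. The per-taxon counts and sample names are made up for illustration. Only the library sizes (311, 492 and 70,000) are taken from the numbers above, and `rarefy` is a hypothetical helper, not a function from any particular package.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Subsample per-taxon read counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, which is
    how rarefying ends up discarding shallow samples.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        return None
    # One entry per read, labelled with the index of its taxon.
    reads = np.repeat(np.arange(counts.size), counts)
    kept = rng.choice(reads, size=depth, replace=False)
    return np.bincount(kept, minlength=counts.size)

# Hypothetical per-taxon counts; only the library sizes are from the post.
samples = {
    "lymph_a": [200, 100, 11, 0],          # 311 reads
    "lymph_b": [300, 150, 40, 2],          # 492 reads
    "gut_a": [30000, 25000, 10000, 5000],  # 70,000 reads
}

rng = np.random.default_rng(0)
rarefied = {name: rarefy(c, 492, rng) for name, c in samples.items()}

for name, result in rarefied.items():
    if result is None:
        print(name, "dropped (fewer than 492 reads)")
    else:
        print(name, "kept", int(result.sum()), "of", sum(samples[name]), "reads")
```

At a depth of 492, the 311-read sample is discarded and a 70,000-read gut sample keeps only about 0.7% of its reads, which is the trade-off I’m unsure about.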

Thank you very much for your help on this issue.

Hi @claudia,

I think comparing the lymph nodes and gut quantitatively will be really difficult. (And I'm one of the first people to say "do it quantitatively".) So, you may want to think about your goal in this analysis and what your hypothesis is. Do you want to figure out if the taxa in the lymph nodes are a subset of the gut organisms? If the lymph node looks more similar to the gut? Something else? Because all of those might need different approaches.
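As one example, the "subset" question could be asked on presence/absence alone, which sidesteps the library size problem to some degree. A rough sketch in Python, assuming per-sample counts keyed by taxon; the taxon names, counts, `min_reads` threshold and the `shared_fraction` helper are all hypothetical and only there for illustration:

```python
def shared_fraction(lymph_counts, gut_counts, min_reads=1):
    """Fraction of lymph node taxa also detected in the gut sample.

    A taxon counts as present if it has at least `min_reads` reads.
    Also returns the set of taxa seen only in the lymph node.
    """
    lymph_taxa = {t for t, n in lymph_counts.items() if n >= min_reads}
    gut_taxa = {t for t, n in gut_counts.items() if n >= min_reads}
    if not lymph_taxa:
        return 0.0, set()
    only_lymph = lymph_taxa - gut_taxa
    return len(lymph_taxa & gut_taxa) / len(lymph_taxa), only_lymph

# Made-up counts for a single lymph node / gut pair.
lymph = {"Bacteroides": 120, "Lactobacillus": 40, "Ralstonia": 15}
gut = {"Bacteroides": 30000, "Lactobacillus": 5000, "Akkermansia": 800}

fraction, only_lymph = shared_fraction(lymph, gut)
print(f"{fraction:.2f} of lymph node taxa also found in gut")
print("lymph node only:", sorted(only_lymph))
```

Keep in mind that a taxon missing from a 300-read library may just be undersampled, so this kind of check is more informative in the lymph node to gut direction than the other way around.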

Additionally, because I have to be that person... how do you know the sequences in your lymph nodes are real sequences and not external contamination?

I think there's good evidence (including from the original DESeq paper) that you don't want to rarefy the data. However, DESeq actually isn't the best option either (see Weiss et al., Gloor et al.).

Sorry for (probably) creating more questions than answers.