OTU sample depth presence/absence matrix

Hi all!

I have an OTU table of 1495 fungal OTUs for 481 wood samples. These are the result of ITS2 Illumina sequencing. I’m planning to perform alpha and beta diversity analyses with a presence / absence matrix, then a NMDS. I already removed OTUs with fewer than 10 reads. However, I want to make sure that I’m using a reasonable and biologically meaningful number of OTUs for the following analyses. Since I am quite new to NGS, I am not sure how to proceed after this step.

I was wondering what sequencing depth you would recommend as a reliable cutoff for removing samples that could cause significant noise in the downstream analyses.

I see in my summary that the minimum count (1) is very low compared to the maximum count (104940). Most papers suggest rarefying the OTU table; however, the criteria for how to do it seem to vary.

Do I need any type of rarefaction?

Thanks for any help you can provide

.OTU-summary.txt (7.4 KB)

Hi @patoan5,
You will definitely need to rarefy your data to use most alpha and beta diversity metrics as many of these assume there are an equal number of sequences per sample. There is not a generally applicable rule for what depth to choose though, so this is a common question.

Since the range is so large in your data, I might suggest trying a couple of rarefaction depths, maybe 1000 and 5000, and then comparing the results to see if you observe any important differences at these depths (e.g., do you observe similar patterns in your NMDS plots at the two rarefaction depths? If so, either is probably OK). I chose these values by looking at the summary you gave me, but it is definitely subjective.
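To make the idea of comparing two depths concrete, here is a minimal sketch in plain Python. It is not the QIIME implementation: the `rarefy` helper, the toy `table`, and the sample names are all hypothetical, and the subsampling is simply random draws of reads without replacement. Samples with fewer reads than the chosen depth are dropped.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, meaning the
    sample would be dropped at this depth.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw `depth` reads.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = {}
    for otu in picked:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied

# Hypothetical toy table: sample id -> {OTU id: read count}
table = {
    "wood_A": {"OTU1": 6000, "OTU2": 1500, "OTU3": 20},
    "wood_B": {"OTU1": 900, "OTU3": 400},
    "wood_C": {"OTU2": 3},
}

for depth in (1000, 5000):
    rarefied = {s: rarefy(c, depth) for s, c in table.items()}
    kept = {s: c for s, c in rarefied.items() if c is not None}
    # Report which samples survive and how many OTUs each one retains.
    print(depth, sorted(kept), {s: len(c) for s, c in kept.items()})
```

The trade-off shows up directly: a higher depth keeps more reads per sample but discards more samples, so it is worth checking how many samples remain at each depth before comparing the ordinations.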

One point based on your description:

I’m planning to perform alpha and beta diversity analyses with a presence / absence matrix, then a NMDS.

Are you sure that you want to perform this on a presence / absence matrix, rather than on the matrix that includes counts (i.e., the standard output from OTU picking or dereplication workflows)? Many of the diversity metrics themselves ignore the counts (i.e., they’re qualitative, so they effectively treat the matrix as presence / absence data), but generally we leave the matrix as-is so we have the option to use either qualitative or quantitative metrics.
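As a small illustration of that point, the sketch below uses Jaccard as an example of a qualitative metric and Bray-Curtis as an example of a quantitative one. The helper functions and the two toy samples are hypothetical and written from the textbook definitions, not taken from any particular package.

```python
def jaccard(a, b):
    """Jaccard distance: uses only which OTUs are present (count > 0)."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: uses the actual counts."""
    otus = set(a) | set(b)
    diff = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    total = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return diff / total if total else 0.0

def to_presence_absence(sample):
    """Replace every non-zero count with 1 and drop absent OTUs."""
    return {otu: 1 for otu, n in sample.items() if n > 0}

# Hypothetical toy samples: {OTU id: read count}
x = {"OTU1": 50, "OTU2": 5, "OTU3": 0}
y = {"OTU1": 1, "OTU2": 40, "OTU4": 9}
x_pa, y_pa = to_presence_absence(x), to_presence_absence(y)

# The qualitative metric gives the same answer on either matrix.
print(jaccard(x, y), jaccard(x_pa, y_pa))
# The quantitative metric changes once the counts are thrown away.
print(bray_curtis(x, y), bray_curtis(x_pa, y_pa))
```

Keeping the count matrix therefore loses nothing for qualitative metrics, while converting to presence / absence up front removes the option of running quantitative ones later.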

Does this help?


Thanks for your response. I thought my reply had already been sent, but it seems it was not: I just found out that I hit Enter but not “Reply”. I highly appreciate your advice. I used the count data, as you suggested, for the diversity analyses. Then I performed NMDS with both matrices: presence/absence and log(x+1)-transformed counts. It seemed to work well…
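For anyone following along, the two input matrices mentioned above can be derived from a count table roughly as in this sketch. The toy `counts` table and its sample and OTU names are hypothetical, and the NMDS step itself is left to whichever ordination tool is in use.

```python
import math

# Hypothetical toy count table: one row per sample, one column per OTU.
otu_ids = ["OTU1", "OTU2", "OTU3"]
counts = {
    "wood_A": [10000, 10, 0],
    "wood_B": [200, 0, 3],
}

# Presence/absence: 1 where an OTU was detected, 0 otherwise.
presence_absence = {
    s: [1 if n > 0 else 0 for n in row] for s, row in counts.items()
}

# log(x + 1): zeros stay zero and very large counts are compressed,
# so dominant OTUs weigh less in count-based distances.
log_counts = {s: [math.log1p(n) for n in row] for s, row in counts.items()}

for sample in counts:
    print(sample, presence_absence[sample],
          [round(v, 2) for v in log_counts[sample]])
```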

