Unifrac distance matrix contains 0s between unalike samples

IT_grace · October 25, 2022, 5:32pm

Hello! I have run into an interesting problem with my data that I cannot seem to find an answer for. For context:
I have 16S rRNA sequences that I have run through DADA2. The data I collected comes from two sources: algae and water from an aquaculture system. All samples (1203 total) were processed in DADA2 in qiime2 v2021.8 using a workflow that my lab has used repeatedly with success for a variety of datasets. For the downstream analysis, I intend to use unweighted unifrac.

The problem I found is that the unifrac distance matrix (calculated from a subset of the water samples) contains many 0 values between different samples. It's my understanding that a 0 indicates two samples are identical, which makes sense when comparing a sample to itself. But this matrix contains many sporadically placed 0s. My first thought was that I encountered an issue when filtering the data post-DADA2 into two datasets (algae with 402 samples and water with 801 samples), but the matrix containing all 1203 samples also contains 0s. There does not seem to be a pattern to which samples were deemed identical; the problem does not appear isolated to specific samples or subsets of samples.

I was wondering if anyone else has run into the same issue with unifrac and, if so, how did you troubleshoot? I am currently rerunning DADA2 with just the water samples to see if the algal samples were the cause of the problem. However, my labmates and I are confused given how often we use this method/workflow without encountering this issue.

cherman2 · October 27, 2022, 6:47pm

@IT_grace,
Welcome to the QIIME 2 forum!
This is weird enough that we should investigate a little more.

Since unweighted unifrac only looks at presence/absence, it is possible that these samples have the same features in different abundances, which unweighted unifrac ignores. Also, since unweighted unifrac is phylogenetic, they don't even have to share the same features, just features at the same place on the tree.

First, I would run this using Jaccard (presence/absence but not phylogenetic) and Weighted Unifrac (abundance and phylogenetic). If those don't have any weird 0s, then I would say this is weird but probably correct.
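
To illustrate the presence/absence point, here is a minimal Python sketch with made-up counts. The two helper functions and the sample vectors are hypothetical and are not QIIME 2 code; they only show that two samples with the same features in very different abundances get a Jaccard distance of 0 while an abundance-aware metric such as Bray-Curtis does not.

```python
def jaccard_distance(a, b):
    """Presence/absence Jaccard distance between two count vectors."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Abundance-based Bray-Curtis dissimilarity between two count vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical samples: same three features present, very different abundances.
sample_a = [90, 5, 5, 0]
sample_b = [5, 5, 90, 0]

jd = jaccard_distance(sample_a, sample_b)  # identical presence/absence
bc = bray_curtis(sample_a, sample_b)       # abundances differ

print(f"Jaccard: {jd}, Bray-Curtis: {bc}")
```

So a 0 in a presence/absence matrix can be legitimate even when the samples look quite different by abundance.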

Also, can you DM me the .qza of the matrix? I would be interested in seeing the provenance!

Let me know if you have any questions!

IT_grace · October 31, 2022, 4:11pm

Hello! Thank you for reaching out to me!

Firstly, apologies for my typo. I am using weighted unifrac, and it does contain the weird 0s.

As an update on my troubleshooting: when running just the water samples, the matrix did not contain the weird 0s. When running the algal + water samples and just algal samples, the matrix had the 0s.

Also, I am new to the qiime forum, so I am not sure how to DM the .qza of the matrix.

cherman2 · November 4, 2022, 6:27pm

@IT_grace,
This is really interesting and not what I was expecting. I looked at the provenance of your distance matrix and it looks reasonable.
Have you looked at other distance matrices? It could give us information about why these samples are the same according to weighted Unifrac. Let me know if Unweighted, Jaccard, and Bray Curtis also have these weird zeros. That could give us more info!
Thanks!

IT_grace · November 7, 2022, 4:25pm

Hello! I think a colleague and I have located the issue, so I wanted to pass this along.

We had set the sampling depth to 1 when running the core metrics step. Here's the script used:
qiime diversity core-metrics-phylogenetic --i-phylogeny {rooted_tree.qza file} --i-table {~/table.qza} --p-sampling-depth 1 --output-dir {~/10_core_metrics} --m-metadata-file {metadata.tsv file} --verbose
We mistakenly interpreted "sampling depth" to be a cutoff parameter, as in: keep all samples with more than 1 sequence, thereby keeping all samples. In troubleshooting, we realized that sampling depth actually refers to the number of sequences each sample is subsampled (rarefied) to, so the script was telling qiime to keep only a single sequence per sample. Please correct me if I am wrong in this understanding of sampling depth. Regardless, when setting the depth much higher, the issue with the matrix was resolved.

To answer your question, yes, I was also looking at the Bray Curtis matrix since I could readily compare it to the BC matrix Primer calculates. Strangely, with a sampling depth of 1, the Bray Curtis matrix has nothing but 1s and 0s as if it was binary. Setting a higher depth solved this issue as well.
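
One way to see why a depth of 1 would give only 0s and 1s is the rough Python sketch below. It assumes that `--p-sampling-depth` is the number of sequences each sample is rarefied to, and that rarefying means random subsampling without replacement. The `rarefy` and `bray_curtis` helpers and the toy table are hypothetical, not QIIME 2's implementation.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample a count vector to `depth` reads without replacement."""
    reads = [feature for feature, n in enumerate(counts) for _ in range(n)]
    if len(reads) < depth:
        return None  # too few reads: the sample would be dropped
    out = [0] * len(counts)
    for feature in rng.sample(reads, depth):
        out[feature] += 1
    return out


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical feature table: sample name -> counts per feature.
table = {
    "water_1": [120, 30, 0, 50],
    "water_2": [10, 200, 40, 0],
    "algae_1": [300, 5, 5, 90],
    "algae_2": [0, 60, 60, 80],
}

# Rarefy every sample to a depth of 1, i.e. keep one read per sample.
rarefied = {name: rarefy(counts, 1, rng) for name, counts in table.items()}

names = sorted(rarefied)
distances = {
    (a, b): bray_curtis(rarefied[a], rarefied[b])
    for a in names
    for b in names
    if a < b
}

for pair, d in distances.items():
    print(pair, d)
```

With one read left per sample, two samples either kept the same feature (distance 0) or different features (distance 1), which would match the binary-looking Bray Curtis matrix.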

Thank you for your communications! Talking it out here on the forum helped me think through the issue!

system · December 8, 2022, 10:26pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.