Beta diversity on non-normalised samples

Hi all,

I’d like to ask for some advice. I’m dealing with a dataset which includes a few negative controls.
Unfortunately, some of these come up with a number of sequences not far from that of the real samples (I denoised with dada2; the run is 2x300bp on HiSeq). I’m trying to figure out whether these controls are similar in composition to the rest of the samples or taxonomically distinct. I’m approaching this in two ways: plotting the taxonomy and running a PCoA.
Now, I have a problem running the ‘diversity beta’ plugin. I would like to do the analysis without any kind of normalisation, as a bit of a diagnostic on the data.

When I run:
qiime diversity beta \
  --i-table …/ASVs/table-dada2.qza \
  --p-metric braycurtis \
  --p-n-jobs 4 \
  --o-distance-matrix dada2.notNormalised.diversity.bc.qza

I got the following error:

Plugin error from diversity:
Data must be symmetric and cannot contain NaNs.
Debug info has been saved to /tmp/qiime2-q2cli-err-emezfloe.log

The log does not give much more explanation than that.

The data contain 292 samples and 18,016 ASVs, and I’m running on a machine with enough RAM, I think.
If I try the equivalent script from QIIME 1.9.1, I get a segmentation fault, which suggests a memory issue, but this is on a machine with 2 TB of RAM.

Any suggestions?
(Also, any suggestions on how to deal with not-so-negative control samples?)
The dataset is not mine, so I’m afraid I cannot share the data atm.

Many thanks,

Hi @llenzi,

There can be many reasons for negative controls containing a substantial number of reads, e.g., reagent contamination, cross-contamination, and index jumping. There are lots of other discussions on the forum about how to handle negative controls and contaminants. The short story is that it’s a very open area of research, but these posts may be useful to you.

Now on to your problem:

I suspect you have one or more samples with zero reads. Since you are not doing any rarefaction, these will be retained (they are dropped automatically during single rarefaction).
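
To see why empty samples would produce that error, here is a minimal plain-Python sketch of the Bray-Curtis formula. It is only an illustration, not QIIME 2's actual code path, and the sample names and counts are made up. The dissimilarity between two samples that both have zero reads is 0/0, which is undefined and is shown here as NaN:

```python
import math
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero samples give 0/0, which is undefined; report it as NaN.
    return numerator / denominator if denominator else math.nan


# Hypothetical toy counts: each list holds one sample's counts per ASV.
samples = {
    "sample_a": [10, 0, 5],
    "sample_b": [3, 7, 0],
    "empty_1": [0, 0, 0],
    "empty_2": [0, 0, 0],
}

# Print the dissimilarity for every pair of distinct samples.
for first, second in combinations(samples, 2):
    print(first, second, bray_curtis(samples[first], samples[second]))
```

In this toy case only the pair of empty samples yields NaN; an empty sample compared with a non-empty one is simply maximally dissimilar.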

You can use qiime feature-table filter-samples to drop any samples with 0 sequences (or, better yet, set a higher minimum threshold).
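
The idea behind that filter can be sketched in plain Python as below. This is only a toy illustration of thresholding on total reads per sample, with a hypothetical table and a hypothetical helper named filter_samples; in QIIME 2 itself the minimum is set with the --p-min-frequency parameter of filter-samples.

```python
def filter_samples(table, min_frequency=1):
    """Keep only samples whose total read count is at least min_frequency."""
    return {
        sample_id: counts
        for sample_id, counts in table.items()
        if sum(counts) >= min_frequency
    }


# Hypothetical toy table: sample ID -> counts per ASV.
table = {
    "sample_a": [120, 30, 0],
    "sample_b": [0, 45, 60],
    "neg_ctrl_1": [0, 0, 0],   # empty after denoising
    "neg_ctrl_2": [2, 0, 1],   # only a handful of reads
}

# A threshold of 1 drops only the completely empty sample.
non_empty = filter_samples(table, min_frequency=1)
print(sorted(non_empty))

# A higher threshold also drops samples with very few reads.
well_covered = filter_samples(table, min_frequency=10)
print(sorted(well_covered))
```

Which threshold is sensible depends on the read depth of the real samples, so the value of 10 here is arbitrary.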

See the discussion link above. I suspect the majority of this is cross-contamination from real samples, in which case you cannot simply drop all of the taxa observed in the negatives. In general that’s a bad idea and one I never recommend, as you will see if you read some of the other forum discussions.

I hope that helps!

Hi Nicholas,
Yes, there are a few samples with 0 reads after dada2 correction. I will discard these samples for a quick assessment.

