qiime2R: samples with 0 reads become duplicates filled with data from other samples in read_qza function?

Hi all and @jbisanz,

As noted in the title, I noticed some very odd behavior in how qiime2R handles samples with zero counts. I was hoping to get some insight into why this is happening.

Package versions: Qiime 2019.10, R version 3.6.1, qiime2R 0.99.11.

I am characterizing microbial isolates, so my data is much, much simpler (and lower coverage) than complex communities. I used Qiime2 to process and classify my data, and then used qiime2R to import my data into R, where I do downstream filtering/analysis in phyloseq. However, in my finalized data I noticed that there were a large number of samples that had identical read counts and compositions. Here's a snippet of that dataset:

Since this seemed unlikely to be real, I manually checked these samples on Qiime2 View and in an exported BIOM file. I found that most of these "duplicate" samples actually had 0 total reads in the original file. Indeed, they do appear to be duplicates. Take P7-D9 and P7-E1 (highlighted in yellow) for example: one of them is the real sample with that composition, and the other is a 0 count sample that somehow was filled with that data. Even stranger are the samples in red: none of them are real, they are all 0 count samples. After poking around in my data, my best guess is that each is a partial copy of another sample, given their read counts and ASV classifications.
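
For anyone who wants to repeat this kind of check without reading the table line by line, here is a minimal Python sketch. It assumes the feature table has been exported to TSV (for example with `biom convert --to-tsv`), with features in rows, samples in columns, and a header line starting with `#OTU ID`. The function names and the toy table are illustrative only and are not taken from the dataset discussed here.

```python
import csv
import io
from collections import defaultdict


def read_feature_table(tsv_text):
    """Parse a TSV feature table into {sample_id: [counts in feature order]}.

    Assumption: the header line starts with '#OTU ID'; any other line
    starting with '#' is treated as a comment and skipped.
    """
    samples = None
    columns = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            if row[0].startswith("#OTU ID"):
                samples = row[1:]
                columns = {s: [] for s in samples}
            continue
        if samples is None:
            raise ValueError("header line not found before data rows")
        for sample, value in zip(samples, row[1:]):
            # Counts may be written as '12.0', so go through float first.
            columns[sample].append(int(float(value)))
    return columns


def zero_count_samples(columns):
    """Return the sorted IDs of samples whose total count is zero."""
    return sorted(s for s, counts in columns.items() if sum(counts) == 0)


def duplicate_sample_groups(columns):
    """Group samples with identical, nonzero count profiles."""
    groups = defaultdict(list)
    for sample, counts in columns.items():
        if sum(counts) > 0:
            groups[tuple(counts)].append(sample)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Toy table (made-up values): S2 has no reads, S1 and S3 are identical.
EXAMPLE_TSV = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\tS3\tS4\n"
    "asv_a\t120.0\t0.0\t120.0\t5.0\n"
    "asv_b\t3.0\t0.0\t3.0\t0.0\n"
)

table = read_feature_table(EXAMPLE_TSV)
print("zero-count samples:", zero_count_samples(table))
print("identical nonzero samples:", duplicate_sample_groups(table))
```

Run against the exported table, the first list gives the samples that should stay empty after import; any of them showing reads in R is suspect. The second list shows which duplicates already exist in the source data, and so are not import artifacts.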

I suspect that the conversion of 0 read samples into false samples takes place in the qiime2R command read_qza, because the false read counts are present in the $data section of my read_qza object. The nonzero samples all look fine, as far as I can tell.

I saw in a couple of threads that there is an issue with qiime2R and samples with 0 total reads, such as "Qiime2r file read issue". But unlike in that thread, read_qza did not throw an error or warning, so I didn't realize there was a problem until I was looking at my output data, line by line.

Is it supposed to do this, or am I making a mistake somewhere? My current plan is to simply remove 0 count samples in Qiime2 before exporting them to R, but if this is the intended behavior of this function, could the qiime2R tutorial state explicitly that 0 count samples must be removed before exporting?

Thanks,
Caroline

Hi Caroline, this is an issue with biomformat, but it has not been patched. A more robust method to catch this case is needed, or perhaps I will try to cook something up myself.
Jordan


Hi Caroline,

I just changed the method of biom import, and in testing with my own files it is faithfully importing qza/biom files with 0-count samples. Please test this new version (v0.99.3) and let me know if it works for you!

Jordan


Hi Jordan,

I just tried the new version – and it worked! My zero count samples now have zero counts.

Thanks so much for the speedy reply and fix, I really appreciate it.

Caroline
