Hi all and @jbisanz,
As noted in the title, I noticed some very odd behavior in how qiime2R handles samples with zero counts. I was hoping to get some insight into why this is happening.
Package versions: Qiime 2019.10, R version 3.6.1, qiime2R 0.99.11.
I am characterizing microbial isolates, so my data is much, much simpler (and lower coverage) than that of complex communities. I used Qiime2 to process and classify my data, and then used qiime2R to import it into R, where I do downstream filtering/analysis in phyloseq. However, in my finalized data I noticed a large number of samples with identical read counts and compositions. Here's a snippet of that dataset:
Since this seemed unlikely to be real, I manually checked these samples in Qiime2 View and in an exported BIOM file. I found that most of these "duplicate" samples actually had 0 total reads in the original file, so they really are duplicates rather than genuine data. Take P7-D9 and P7-E1 (highlighted in yellow), for example: one of them is the real sample with that composition, and the other is a 0 count sample that somehow was filled with that data. Even stranger are the samples in red. None of them are real; they are all 0 count samples. After poking around in my data, my best guess is that each is a partial copy of another sample, given their read counts and ASV classifications.
I suspect that the conversion of 0 read samples into false samples takes place in the qiime2R function read_qza, because the false read counts are already present in the $data element of the object that read_qza returns. The nonzero samples all look fine, as far as I can tell.
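To make the check concrete, here is a minimal sketch of the comparison described above, written in Python with only the standard library. It assumes both versions of the table (the one exported from Qiime2 and the one obtained after import) have already been loaded as dictionaries mapping sample ID to per-feature counts. The function names, sample IDs, and counts are hypothetical and only for illustration; this is not qiime2R code.

```python
def find_phantom_samples(original, imported):
    """Return IDs of samples with 0 total reads in `original` but > 0 in `imported`.

    Both arguments are dicts: sample ID -> {feature ID -> count}.
    """
    phantoms = []
    for sample_id, counts in original.items():
        original_total = sum(counts.values())
        imported_total = sum(imported.get(sample_id, {}).values())
        if original_total == 0 and imported_total > 0:
            phantoms.append(sample_id)
    return sorted(phantoms)


def find_identical_samples(table):
    """Return groups of sample IDs that share exactly the same nonzero counts."""
    groups = {}
    for sample_id, counts in table.items():
        # Ignore zero entries so sparse and dense encodings compare equal.
        key = frozenset((f, c) for f, c in counts.items() if c != 0)
        groups.setdefault(key, []).append(sample_id)
    # Skip the empty key: all-zero samples are not "duplicates" of interest here.
    return sorted(sorted(ids) for key, ids in groups.items() if key and len(ids) > 1)


# Hypothetical toy data: S2 is empty in the original table but filled after import.
original = {
    "S1": {"asv_a": 120, "asv_b": 3},
    "S2": {"asv_a": 0, "asv_b": 0},
    "S3": {"asv_a": 40, "asv_b": 0},
}
imported = {
    "S1": {"asv_a": 120, "asv_b": 3},
    "S2": {"asv_a": 120, "asv_b": 3},
    "S3": {"asv_a": 40, "asv_b": 0},
}

phantoms = find_phantom_samples(original, imported)
duplicates = find_identical_samples(imported)
```

With this toy data, `phantoms` contains only `S2`, and `duplicates` groups `S1` with `S2`, which is the same pattern as the yellow-highlighted pair.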
I saw in a couple of threads that there is an issue with qiime2R and samples with 0 total reads, such as "Qiime2r file read issue". But unlike in that thread, read_qza did not throw an error or warning for me, so I didn't realize there was a problem until I was looking at my output data line by line.
Is it supposed to do this, or am I making a mistake somewhere? My current plan is to simply remove the 0 count samples in Qiime2 before exporting the data to R. If this is the expected behavior of read_qza, could the qiime2R tutorial state explicitly that 0 count samples must be removed before exporting?
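As a rough illustration of that workaround, the sketch below drops every sample whose total count falls below a threshold. It is plain Python on a hypothetical in-memory table (sample ID to per-feature counts), not the Qiime2 or qiime2R implementation; within Qiime2, the equivalent step would be `qiime feature-table filter-samples` with `--p-min-frequency`.

```python
def drop_low_count_samples(table, min_total=1):
    """Return a copy of `table` keeping only samples with at least `min_total` reads.

    `table` is a dict: sample ID -> {feature ID -> count}.
    """
    return {
        sample_id: dict(counts)
        for sample_id, counts in table.items()
        if sum(counts.values()) >= min_total
    }


# Hypothetical toy table: "empty" has 0 total reads and should be removed.
toy_table = {
    "kept_1": {"asv_a": 120, "asv_b": 3},
    "empty": {"asv_a": 0, "asv_b": 0},
    "kept_2": {"asv_a": 0, "asv_b": 7},
}

filtered = drop_low_count_samples(toy_table)
```

The default `min_total=1` removes only truly empty samples; a higher value would also remove very low coverage samples, which may or may not be desirable for isolate data.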
Thanks,
Caroline