Discrepancy between feature-table QZA and QZV

brfrancis · July 25, 2019, 12:12pm

Good Morning All,

I've recently been using Qiime2 Studio and wanted to do some calculations myself over in R to familiarise myself with the data before pipelining the study analysis.

I have a file "Feature Table 1.qza" that I ran through the feature-table > Summarize table menu. In the "Interactive Sample Detail" tab, I found that there were five samples with zero counts.

S7 0
S78 0
S51 0
S36 0
S9 0

Using the qiime2R package in R, I read the same "Feature Table 1.qza" file and ran colSums on the zero-count sample IDs listed above:

S7 18150
S78

Nicholas_Bokulich · July 25, 2019, 12:48pm

Hi @brfrancis,
Welcome to the forum!

Would you mind sharing “Feature Table 1.qza” so that we could take a look? You can send it in a private message to me (click on my icon and hit the "message" button) if you don't want to post the data publicly.

jbisanz · July 25, 2019, 9:04pm

This behaviour is interesting as at last check there was an underlying issue with read_biom (the function used by qiime2R) importing tables that had samples with 0 counts. Sharing your artifact would be really helpful to see what is going on!

brfrancis · July 26, 2019, 11:33am

Feature Table 1.qza (303.4 KB)

I don't think I have messaging privileges, so attaching with this reply. Thank you for any help you can give.

brfrancis · July 26, 2019, 11:34am

Thanks both. Apologies, the second half of my message seems to have been cut off! I'll send the QZA along to Nicholas, really appreciate the help!

Nicholas_Bokulich · July 26, 2019, 11:57am

Thanks for sharing, @brfrancis,

The feature table you sent definitely has no reads for those samples you listed:

>>> import qiime2
>>> import pandas as pd
>>> tab = qiime2.Artifact.load('FeatureTable1.qza')
>>> df = tab.view(pd.DataFrame)
>>> df = df.loc[['S7', 'S78', 'S51', 'S36', 'S9']]
>>> df.sum(axis=1)
S7     0.0
S78    0.0
S51    0.0
S36    0.0
S9     0.0

So this looks like it is probably an issue with read_biom, as @jbisanz mentioned.

If you have any familiarity with Python, you could use pandas to explore your data in R-like dataframes. Otherwise, you could bypass the buggy read_biom and instead:

1. export your feature table to a biom file
2. use biom convert --to-tsv to convert it to a TSV file
3. load that TSV into R
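
For anyone who would rather stay in Python, the per-sample totals in that TSV can also be cross-checked with the standard library alone. This is only a rough sketch: the function name sample_totals_from_biom_tsv and the toy table are made up for illustration, and it assumes the usual biom convert --to-tsv layout (a comment line, then a "#OTU ID" header row with one column per sample) with no extra metadata columns such as taxonomy.

```python
import csv
import io

# The TSV is typically produced with something like:
#   qiime tools export --input-path table.qza --output-path exported
#   biom convert -i exported/feature-table.biom -o feature-table.tsv --to-tsv
# (file names here are placeholders, not the ones from this thread).


def sample_totals_from_biom_tsv(tsv_text):
    """Return {sample_id: total_count} for a biom-convert style TSV string."""
    header = None
    totals = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        if row[0].startswith("#"):
            # Comment lines have a single field; the header line starts
            # with "#OTU ID" and is followed by the sample IDs.
            if len(row) > 1:
                header = row[1:]
                totals = {sample: 0.0 for sample in header}
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        # Assumes every remaining column is a numeric count.
        for sample, value in zip(header, row[1:]):
            totals[sample] += float(value)
    return totals


# Toy table (hypothetical values): S7 has no reads at all.
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tS7\tS1\tS2\n"
    "featA\t0.0\t12.0\t3.0\n"
    "featB\t0.0\t5.0\t0.0\n"
)
totals = sample_totals_from_biom_tsv(example)
zero_samples = [sample for sample, total in totals.items() if total == 0]
print(totals)
print(zero_samples)
```

With a real export you would read the file contents into a string first and pass that in; any sample that shows up in zero_samples here but has counts after another import route is worth a closer look.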

jbisanz · July 26, 2019, 3:33pm

I was able to reproduce this behavior:

  SampleID read_biom_count `biom-convert_count` `feature-table_count`
  <chr>              <dbl>                <dbl>                 <dbl>
1 S36                   22                    0                     0
2 S51               500667                    0                     0
3 S7                 18150                    0                     0
4 S78                  883                    0                     0
5 S9                  1193                    0                     0

The good news is that the per-feature counts for the non-zero samples are faithfully imported.

The counts assigned to the zero-count samples are being put into only 2 features, which would hopefully tip off most users that there is an issue, as these samples would be extreme outliers in every analysis. I was able to reproduce this behavior in a second dataset.
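
A cheap way to catch this kind of thing is to compare per-sample totals from two independent import routes before doing any analysis. Below is a minimal sketch of that check, assuming the totals have already been collected into plain dicts; compare_sample_totals is a hypothetical helper, not part of qiime2R or QIIME 2. The five problem samples use the totals from the table above, and S1 is an invented sample added only to show a matching case.

```python
def compare_sample_totals(reference, candidate):
    """Return {sample: (reference_total, candidate_total)} where they differ.

    A sample missing from one side is reported with None for that side.
    """
    mismatches = {}
    for sample in sorted(set(reference) | set(candidate)):
        ref_total = reference.get(sample)
        cand_total = candidate.get(sample)
        if ref_total != cand_total:
            mismatches[sample] = (ref_total, cand_total)
    return mismatches


# Totals reported by feature-table summarize / biom convert.
reference = {"S36": 0, "S51": 0, "S7": 0, "S78": 0, "S9": 0, "S1": 100}
# Totals seen after importing with read_biom (S1 is invented).
candidate = {"S36": 22, "S51": 500667, "S7": 18150, "S78": 883,
             "S9": 1193, "S1": 100}

mismatches = compare_sample_totals(reference, candidate)
for sample, (ref_total, cand_total) in mismatches.items():
    print(sample, ref_total, cand_total)
```

Because missing samples are reported as None, the same check would also flag an importer that silently drops zero-count samples rather than inventing counts for them.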

I have found that rbiom (cmmr/rbiom on GitHub) does not have this behavior; instead, it simply does not import zero-count samples at all. I will switch the function used in qiime2R in the very near future.

Jordan

brfrancis · July 26, 2019, 3:34pm

Thanks so much for your reply. I'll work in Python instead, makes far more sense!

Seems like odd behaviour from the qiime2R package, but I imagine these translators are tough to create!

system · August 26, 2019, 9:34pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.