Corrupted BIOM export?

Hey!

I am having trouble getting my feature table out of QIIME 2.

The version of qiime2 I am using is 24.10-amplicon.

The problem appears to be a corrupted or otherwise flawed feature table file. I have a feature table that I have filtered into sub-cohorts, and I have been developing my analysis scripts on a small sub-cohort without issues.

The filtering was done with:

!qiime feature-table filter-samples \
  --i-table qiime/cohort/table.qza \
  --m-metadata-file qiime/cohort/BCA/bca_samples.tsv \
  --o-filtered-table qiime/cohort/BCA/table.qza

!qiime feature-table filter-seqs \
  --i-data qiime/cohort/rep-seqs.qza \
  --i-table qiime/cohort/BCA/table.qza \
  --o-filtered-data qiime/cohort/BCA/rep-seqs.qza

I then move to an R environment and proceed. However, applying the same filtering to a different subset of samples produces a table.qza that fails to read into my analysis pipeline.

I have also tried:

!qiime tools export \
  --input-path qiime/cohort/BCA/table.qza \
  --output-path qiime/cohort/BCA/exported-table

This gives a .biom file, which I then read into R with the biomformat package. With the smaller subset there are no issues and everything works. But when I try to read in the feature table for the larger subset, I get an error:

biomFile <- read_biom(biom_file_path)
Error in read_biom(biom_file_path) : Both attempts to read input file:
qiime/cohort/CC/exported-table/feature-table.biom
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.
In addition: Warning messages:
1: In strsplit(conditionMessage(e), "\n") :
unable to translate 'lexical error: invalid char in json text.
<89>HDF (right here) ------^
' to a wide string
2: In strsplit(conditionMessage(e), "\n") : input string 1 is invalid
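
A note on the warning above: the `<89>HDF` fragment matches the first bytes of the HDF5 file signature, which would suggest the exported file is in BIOM v2 (HDF5) format and that the JSON attempt was expected to fail. The open question would then be why the HDF5 read also failed. A minimal sketch for checking the file type outside of R, using only the Python standard library, might look like this. The helper name `sniff_biom_format` is hypothetical, and the check only looks at the leading bytes; it does not validate the rest of the file.

```python
from pathlib import Path

# The 8-byte HDF5 format signature, normally found at offset 0 of the file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_biom_format(path):
    """Guess whether a .biom file is HDF5 (BIOM v2) or JSON (BIOM v1).

    Hypothetical helper: it inspects only the leading bytes of the file,
    so a truncated or damaged HDF5 file can still be reported as "hdf5".
    """
    with open(path, "rb") as handle:
        head = handle.read(64)
    if not head:
        return "empty"
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.lstrip().startswith(b"{"):
        return "json"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage (paths are examples): python sniff.py small.biom large.biom
    for name in sys.argv[1:]:
        size = Path(name).stat().st_size
        print(f"{name}: {sniff_biom_format(name)} ({size} bytes)")
```

Running this on both the working and the failing feature-table.biom would show whether the two files are at least the same format, and whether the failing one has a plausible size.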

I am perplexed as to why the smaller subset works without issues: both the R packages 'mia' and 'biomformat' read it in from the .qza and .biom files. The other filtered feature table, however, cannot be read either as a .qza or as an exported .biom. I have tried in multiple environments and the problem persists: the smaller subset works downstream, while the larger subset produces errors on read.
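
Since a .qza is a zip archive with its payload stored under a `<uuid>/data/` directory, one way to test the "corrupted file" hypothesis without involving R is to run a zip integrity check on the artifact itself. The sketch below uses only Python's standard `zipfile` module. The helper name `check_qza` is hypothetical, and a clean result only means the archive's checksums are intact; it says nothing about whether the table inside is readable by a given R package.

```python
import zipfile


def check_qza(path):
    """Return (first_corrupt_member, data_members) for a QIIME 2 artifact.

    Hypothetical helper. It assumes the usual artifact layout, where the
    payload files sit under <uuid>/data/ inside the zip archive.
    """
    with zipfile.ZipFile(path) as archive:
        # testzip() re-reads every member and verifies its CRC; it returns
        # the name of the first bad member, or None if all members pass.
        first_bad = archive.testzip()
        data_members = [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if "/data/" in info.filename and not info.is_dir()
        ]
    return first_bad, data_members


if __name__ == "__main__":
    import sys

    # Usage (paths are examples): python check_qza.py BCA/table.qza CC/table.qza
    for name in sys.argv[1:]:
        bad, members = check_qza(name)
        print(name, "-> first corrupt member:", bad)
        for member, size in members:
            print("   ", member, size, "bytes")
```

If both artifacts pass and each contains a feature-table.biom of sensible size, corruption of the archive becomes a less likely explanation than something about the table contents.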

Sadly, the data is sensitive and I cannot provide any files. But I hope someone can offer some insight or suggestions on how to troubleshoot this.

Understood. Yeah, I also work with data I can't post publicly.

Do you think it would be okay to privately share the problem file with just the qiime2 devs? You can send us a private message by clicking on my profile image then selecting 'Message.'

This is the most direct way forward, though I understand if it's not possible.

I've been advised to rerun the analysis with a sanitized metadata file to be on the safe side.

If the problem replicates, I will send the sanitized .qza file(s) causing issues to you privately.

I'll get it done within this week.

