Hey!
I am experiencing some issues getting my feature table out of QIIME 2.
The version I am using is 24.10-amplicon.
The issue seems to be a corrupted or flawed feature table file. I have a feature table that I have filtered into sub-cohorts, and I have been developing my analysis scripts with a small sub-cohort without issues.
The filtering was done with:
!qiime feature-table filter-samples \
  --i-table qiime/cohort/table.qza \
  --m-metadata-file qiime/cohort/BCA/bca_samples.tsv \
  --o-filtered-table qiime/cohort/BCA/table.qza

!qiime feature-table filter-seqs \
  --i-data qiime/cohort/rep-seqs.qza \
  --i-table qiime/cohort/BCA/table.qza \
  --o-filtered-data qiime/cohort/BCA/rep-seqs.qza
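For reference, a .qza artifact is just a zip archive, so one basic sanity check I can run on the suspect table (before blaming the filtering itself) is whether the archive is intact. A minimal sketch, using the path from above as an illustration:

```python
import zipfile

def qza_is_intact(path):
    """Return True if the .qza (a zip archive) passes its CRC checks."""
    with zipfile.ZipFile(path) as zf:
        # testzip() re-reads every member and returns the first corrupt
        # filename, or None when all CRC checksums pass.
        return zf.testzip() is None

# Illustrative path from the filtering step above:
# qza_is_intact("qiime/cohort/BCA/table.qza")
```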
I then move to an R environment and proceed. For some reason, doing the same filtering for a different subset of samples results in a table.qza that fails to load in my analysis pipeline.
I have also tried:
!qiime tools export \
  --input-path qiime/cohort/BCA/table.qza \
  --output-path qiime/cohort/BCA/exported-table
to get a .biom file, which I then read into R with the biomformat package. The smaller subset works fine, but when trying to read in the larger subset's feature table I get an error:
biomFile <- read_biom(biom_file_path)
Error in read_biom(biom_file_path) : Both attempts to read input file:
qiime/cohort/CC/exported-table/feature-table.biom
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.
In addition: Warning messages:
1: In strsplit(conditionMessage(e), "\n") :
unable to translate 'lexical error: invalid char in json text.
<89>HDF (right here) ------^
' to a wide string
2: In strsplit(conditionMessage(e), "\n") : input string 1 is invalid
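One thing the warning does suggest: <89>HDF is the start of the HDF5 file signature, so the exported file is at least nominally BIOM v2 (HDF5) rather than JSON. To rule out a truncated or corrupted header, one quick check (a sketch; the path is illustrative) is whether the full 8-byte HDF5 signature is present:

```python
# The HDF5 format specification defines this 8-byte file signature.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

def has_hdf5_signature(path):
    """Return True if the file begins with the HDF5 magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_MAGIC)) == HDF5_MAGIC

# Illustrative path from the error above:
# has_hdf5_signature("qiime/cohort/CC/exported-table/feature-table.biom")
```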
I am perplexed as to why the smaller subset works without issues: both the R packages 'mia' and 'biomformat' read it in from the .qza and .biom files just fine. But the other filtered feature table fails both as a .qza and as an exported .biom. I have tried in multiple environments and the problem persists: the smaller subset works downstream, while the larger subset throws errors on read.
Sadly, the data is sensitive and I cannot share any files, but I hope someone can offer some insight or suggestions on how to troubleshoot this?