How to import a BIOM file containing sequences?

peterjc · March 1, 2024, 5:31pm

I have been trying qiime2-amplicon-2024.2 on Linux, which I hope to use to analyse a dataset in BIOM 2.1 "OTU table" format (ASV vs sample counts, with the sequences as metadata). I can import the BIOM file fine with:

$ qiime tools import --input-format BIOMV210Format \
    --type "FeatureTable[Frequency]" \
    --input-path asv_table.biom \
    --output-path asv_table.qza

However, this does not seem to handle the sequence information, as trying to run a classifier on it fails:

$ qiime feature-classifier classify-sklearn \
    --i-classifier classifier.qza \
    --i-reads asv_table.qza \
    --o-classification taxonomy_results.qza
...
(1/1) Invalid value for '--i-reads': Expected an artifact of at least type
FeatureData[Sequence]. An artifact of type FeatureTable[Frequency] was
provided.

I've seen the topic "Import Dada2 ASV feature table and add taxonomy", which suggests that to make a FeatureData[Sequence] QZA file I need to import a FASTA file. Is that the only way?

i.e. BIOM "OTU table" with sequences --> FASTA file --> Qiime2 FeatureData[Sequence] QZA --> Qiime2 classifier.

And separately, BIOM "OTU table" --> Qiime2 FeatureTable[Frequency], which can be combined with the classifier output later?

Or, to rephrase: is there anything built in that combines FeatureTable[Frequency] and FeatureData[Sequence] into one QZA data file? The "Semantic types" page of the QIIME 2 2024.2.0 documentation suggests not.

Thanks all!

cherman2 · March 1, 2024, 5:43pm

Hi @peterjc

This is how we typically import sequences.

I am not really sure how your sequences are organized in your BIOM table. Could you summarize your feature table and post it here so I can take a look?

The q2-clawback (used for making bespoke env weights for classifiers) tutorial has a command called

qiime clawback sequence-variants-from-samples

This could help if your raw sequences are the column names in your table.

I hope that helps!

peterjc · March 1, 2024, 6:28pm

Thanks @cherman2 for your reply.

It is useful to know that FASTA import is the norm.

The sequences are a column of BIOM sequence metadata under the name "Sequence", e.g.

$ biom summarize-table -i example.biom
Num samples: 122
Num observations: 99
Total count: 531,728
Table density (fraction of non-zero values): 0.039

Counts/sample summary:
 Min: 0.000
 Max: 17,239.000
 Median: 4,036.500
 Mean: 4,358.426
 Std. dev.: 2,455.195
 Sample Metadata Categories: None provided
 Observation Metadata Categories: Sequence

Counts/sample detail:
...

i.e. $ biom export-metadata -i /tmp/thapbi_pict/woody_hosts/woody_hosts.tally.biom --observation-metadata-fp /dev/stdout outputs two columns: my ASV identifiers, and their sequence.

I can modify the metadata name from "Sequence" if something slightly different would facilitate Qiime2 import.

colinvwood · March 4, 2024, 5:31pm

Hello @peterjc,

The issue isn't getting the qiime import to work--you've already done that--but having the right type of thing to perform classification on. In qiime we store the asv sequences separately from the feature table.

It sounds like in your case your ASVs are labeled by their actual DNA sequences, is that correct? If you're unsure, you can use the qiime feature-table summarize command as Chloe suggested. If this is indeed the case, then you can use the qiime clawback sequence-variants-from-samples command she mentioned to separate the sequences from the feature table and allow them to be classified.

peterjc · March 4, 2024, 6:44pm

Thanks @colinvwood for your reply.

No, the ASV sequences have names (something based on the MD5 checksum), with the actual sequence recorded as observation metadata (see Observation Metadata Categories: Sequence in the snippet of output from biom summarize-table shared above).

So the problem is that while I can import the BIOM file as FeatureTable[Frequency] this ignores the sequence information in the BIOM file.

I can make a matching FASTA file, and am exploring using Qiime by importing that as FeatureData[Sequence]. It just seems like it would be more elegant if I could import the sequences as part of importing this annotated BIOM file.
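
For anyone landing here later, the "make a matching FASTA file" step could look roughly like the sketch below. It assumes the biom export-metadata output is a tab-separated table with the ASV identifier in column one and the sequence in column two, possibly preceded by a header row naming the "Sequence" column. The helper name tsv_to_fasta and the file name asv_seqs.fasta are made up for illustration.

```shell
# Hypothetical helper: convert "ASV ID <tab> sequence" rows on stdin to FASTA.
# Assumption: any header row has "Sequence" in column two, so it is skipped,
# as are rows with an empty second column.
tsv_to_fasta() {
    awk -F '\t' '
        NF >= 2 && $2 != "" && $2 != "Sequence" {
            print ">" $1   # FASTA header line: the ASV identifier
            print $2       # the ASV sequence itself
        }
    '
}

# Possible usage (needs the biom CLI; file names are placeholders):
#   biom export-metadata -i asv_table.biom \
#       --observation-metadata-fp /dev/stdout | tsv_to_fasta > asv_seqs.fasta
```

Keeping the BIOM observation IDs as the FASTA record IDs matters here, since the same identifiers should appear in both the sequences and the feature table.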

colinvwood · March 4, 2024, 11:18pm

Hello @peterjc,

I see. That's a cool idea, you could open a GitHub issue if interested. Let us know if you're still having trouble after importing the sequences.

peterjc · March 6, 2024, 9:46am

Issue filed as FEAT: Importing sequence data from BIOM files · Issue #315 · qiime2/q2-types · GitHub

I was able to use the FASTA to FeatureData[Sequence] and BIOM to FeatureTable[Frequency] import route successfully, thank you.
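
In case it helps others, here is a rough sketch of that two-import route wrapped in a shell function. The function name import_and_classify and the output file name asv_seqs.qza are placeholders of mine. The import and classify commands themselves are the ones already shown in this thread, plus the usual FASTA import with the FeatureData[Sequence] type.

```shell
# Sketch only: the function name and output file names are placeholders.
import_and_classify() {
    biom_file=$1    # BIOM 2.1 table of ASV counts per sample
    fasta_file=$2   # FASTA whose record IDs match the BIOM observation IDs
    classifier=$3   # pre-trained classifier artifact (.qza)

    # Counts are imported as a FeatureTable[Frequency] artifact.
    qiime tools import --input-format BIOMV210Format \
        --type "FeatureTable[Frequency]" \
        --input-path "$biom_file" \
        --output-path asv_table.qza || return 1

    # Sequences are imported separately as FeatureData[Sequence].
    qiime tools import --type "FeatureData[Sequence]" \
        --input-path "$fasta_file" \
        --output-path asv_seqs.qza || return 1

    # The classifier reads the sequence artifact, not the feature table.
    qiime feature-classifier classify-sklearn \
        --i-classifier "$classifier" \
        --i-reads asv_seqs.qza \
        --o-classification taxonomy_results.qza || return 1
}

# Possible usage (file names are placeholders):
#   import_and_classify asv_table.biom asv_seqs.fasta classifier.qza
```

The feature table and the taxonomy results stay as separate artifacts and are only brought together by later commands that accept both.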
