I have been trying qiime2-amplicon-2024.2 on Linux, which I hope to use to analyse a dataset in BIOM 2.1 "OTU table" format (ASV vs sample counts, with the sequence as metadata). I can import the BIOM file fine as FeatureTable[Frequency].
However, this does not seem to handle the sequence information, as trying to run a classifier on it fails:
$ qiime feature-classifier classify-sklearn \
--i-classifier classifier.qza \
--i-reads asv_table.qza \
--o-classification taxonomy_results.qza
...
(1/1) Invalid value for '--i-reads': Expected an artifact of at least type
FeatureData[Sequence]. An artifact of type FeatureTable[Frequency] was
provided.
I am not really sure how your sequences are organized in your biom table. Could you summarize your feature table and post it here so I could take a look?
The q2-clawback (used for making bespoke env weights for classifiers) tutorial has a command called
qiime clawback sequence-variants-from-samples
This could help if your raw sequences are the column names in your table.
It is useful to know that FASTA import is the norm.
The sequences are a column of BIOM sequence metadata under the name "Sequence", e.g.
$ biom summarize-table -i example.biom
Num samples: 122
Num observations: 99
Total count: 531,728
Table density (fraction of non-zero values): 0.039
Counts/sample summary:
Min: 0.000
Max: 17,239.000
Median: 4,036.500
Mean: 4,358.426
Std. dev.: 2,455.195
Sample Metadata Categories: None provided
Observation Metadata Categories: Sequence
Counts/sample detail:
...
i.e. running $ biom export-metadata -i /tmp/thapbi_pict/woody_hosts/woody_hosts.tally.biom --observation-metadata-fp /dev/stdout outputs two columns: my ASV identifiers and their sequences.
I can modify the metadata name from "Sequence" if something slightly different would facilitate Qiime2 import.
The issue isn't getting the qiime import to work--you've already done that--but having the right type of thing to perform classification on. In qiime we store the asv sequences separately from the feature table.
It sounds like in your case your ASVs are labeled by their actual DNA sequences, is that correct? If you're unsure, you can use the qiime feature-table summarize command, as Chloe suggested. If this is indeed the case, then you can use the qiime clawback sequence-variants-from-samples command she mentioned to separate the sequences from the feature table and allow them to be classified.
No, the ASV sequences have names (something based on the MD5 checksum), with the actual sequence recorded as observation metadata (see Observation Metadata Categories: Sequence in the snippet of output from biom summarize-table shared above).
So the problem is that, while I can import the BIOM file as FeatureTable[Frequency], this ignores the sequence information in the BIOM file.
I can make a matching FASTA file, and am exploring using Qiime by importing that as FeatureData[Sequence]. It just seems like it would be more elegant if I could import the sequences as part of importing this annotated BIOM file.
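For reference, a minimal sketch of that FASTA route, under stated assumptions rather than as a tested recipe. It assumes the two-column output of biom export-metadata is tab-separated (ASV identifier, then sequence) and that any header line starts with '#'. If the export looks different, the awk filter needs adjusting. The function name tsv_to_fasta and the file names asv_sequences.tsv, asv_sequences.fasta and asv_sequences.qza are placeholders, not anything QIIME 2 defines.

```shell
# Convert a two-column TSV (ASV ID <tab> sequence) into FASTA records.
# Assumption: header/comment lines start with '#'; blank lines are skipped.
# Reads the named files, or standard input when no file is given.
tsv_to_fasta() {
    awk -F'\t' '!/^#/ && NF >= 2 && $2 != "" { print ">" $1; print $2 }' "$@"
}

# Hypothetical end-to-end usage (file names are placeholders):
#
#   biom export-metadata -i example.biom \
#       --observation-metadata-fp asv_sequences.tsv
#   tsv_to_fasta asv_sequences.tsv > asv_sequences.fasta
#   qiime tools import \
#       --type 'FeatureData[Sequence]' \
#       --input-path asv_sequences.fasta \
#       --output-path asv_sequences.qza
#   qiime feature-classifier classify-sklearn \
#       --i-classifier classifier.qza \
#       --i-reads asv_sequences.qza \
#       --o-classification taxonomy_results.qza
```

Because the FASTA headers are the same ASV identifiers used as observation IDs in the BIOM table, the resulting FeatureData[Sequence] artifact should line up with the FeatureTable[Frequency] artifact imported from the same file.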