FeatureTable[Frequency] and [Sequence] from external DADA2 output

Hello all, I'm having some issues importing feature data given a DADA2.txt file from outside of QIIME (qiime2/2021.2).

Aim

I'd like to run feature-classifier to classify the sequences found in the headers of my count table, using Greengenes as a reference, and then build a tree.

Input

My file is a count file (as seen below), and has a corresponding metadata file that pairs "Run" labels with phenotype information.

I am able to import this file as a FeatureTable[Frequency] table, no problem (called asvTable.qza). I am also able to upload greengenes taxonomy and access the provided gg 13-8 classifier used in qiime.

Problem

Obviously, running feature-classifier with asvTable.qza for --i-reads does not work.

qiime feature-classifier classify-sklearn --i-classifier greengene/gg-13-8-99-nb-classifier.qza --i-reads asvTable.qza --o-classification gg_refTaxonomy.qza

yields error: 'Invalid value for '--i-reads': Expected an artifact of at least type
FeatureData[Sequence]. An artifact of type FeatureTable[Frequency] was
provided.'

But I have no current files that are compatible with the required input.

Question

Using the input I have and my reference files, is it possible to generate a FeatureData[Sequence] artifact? I have tried making an admittedly redundant .fa file from my column headers, reading the semantic types page, and consulting the Overview and Moving Pictures tutorials. I have also read a handful of other forum topics, and I am still struggling to find a solution (if one even exists). Thanks in advance for any feedback.

Would you mind sharing the imported FeatureTable[Frequency] file?

The problem is that the classification step does not care about the counts of each OTU in each sample, which is what FeatureTable[Frequency] stores. It wants the FeatureData[Sequence] file, which provides the actual sequences to be classified, mapped to the OTU ids.

Usually both of these files are output by the denoising step. If you share your .qza, we can figure out which ids are being used for the OTUs and possibly make a FeatureData[Sequence] artifact.

Thanks.

Thanks so much for the response and the context. Please find attached my FeatureTable[Frequency] artifact.

freqTable.qza (2.0 MB)

@AttilaTheBun,

Okay, there are two things you should do.

Your FeatureTable[Frequency] artifact is improperly structured (not your fault, it happened during the import process). Its columns need to be the rows, and its rows the columns. You need to transpose it with qiime feature-table transpose. This won't fix the classifier problem, but it will make the next steps in the QIIME 2 ecosystem easier.

To fix the classification step problem, you need a FeatureData[Sequence] artifact, as discussed. You can make one of these yourself. All you need to do is make a fasta file where the headers are your sequences and the sequences are your sequences (yes, you read that correctly).

The structure of this .fasta file will then be:

>ACTGTG...
ACTGTG...
>GATTAC...
GATTAC...
(...)

Then you can import this as a FeatureData[Sequence] artifact and rerun the step you got hung up on.
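If the table only exists as a text file, the FASTA file described above could be produced with something like the following. This is only a rough sketch: it assumes a tab-delimited count table whose first column is the sample/run label and whose remaining column headers are the sequences. The function name `headers_to_fasta` and the example table are made up for illustration.

```python
import csv
import io


def headers_to_fasta(table_text, delimiter="\t"):
    """Build FASTA text in which each record's ID and sequence are the same string.

    Assumption: the first header cell is a sample/run label, and every other
    header cell is an ASV sequence.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter=delimiter)
    header = next(reader)
    sequences = header[1:]  # skip the assumed sample/run label column

    lines = []
    seen = set()
    for seq in sequences:
        seq = seq.strip()
        # Skip empty cells and repeated headers so each record appears once.
        if not seq or seq in seen:
            continue
        seen.add(seq)
        lines.append(">" + seq)  # header line: the sequence used as its own ID
        lines.append(seq)        # sequence line
    return "\n".join(lines) + "\n"


# Hypothetical two-sample, two-sequence table for illustration only.
example = "Run\tACTGTG\tGATTAC\nSRR001\t10\t0\nSRR002\t3\t7\n"
fasta_text = headers_to_fasta(example)
print(fasta_text, end="")
```

For a real file you would read its contents into a string, pass that in, and write the result to something like rep-seqs.fna before importing it.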

If any of this didn't make sense, just reach out again!

Thanks.

To follow up on @AttilaTheBun's suggestion:
Assuming you are using DADA2 in R, you can create this representative sequence file with DADA2's own commands:

uniquesToFasta(asvTable, fout = 'rep-seqs.fna', ids = colnames(asvTable))

This assumes, of course, that you haven't changed the structure of the DADA2 output table itself (that is, asvTable still has the sequences as its column names).

Then to get that new .fna file into QIIME 2:

# First, import the rep-seqs file.
qiime tools import \
  --input-path rep-seqs.fna \
  --output-path rep-seqs.qza \
  --type "FeatureData[Sequence]"
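Since the sequence artifact is only useful if its IDs match the feature IDs in the table, it may be worth a quick sanity check on the plain-text files before importing. The sketch below is one hypothetical way to do that in plain Python. The helper names and example data are invented here, and it assumes each FASTA header is the full ID with nothing else on the line.

```python
def fasta_ids(fasta_text):
    """Return the record IDs (the text after '>') found in FASTA text."""
    return [
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]


def compare_ids(fasta_text, table_feature_ids):
    """Report feature IDs that appear in only one of the two inputs."""
    ids = set(fasta_ids(fasta_text))
    table_ids = set(table_feature_ids)
    return {
        "missing_from_fasta": sorted(table_ids - ids),
        "missing_from_table": sorted(ids - table_ids),
    }


# Hypothetical data: the table has one sequence the FASTA file lacks.
fasta_text = ">ACTGTG\nACTGTG\n>GATTAC\nGATTAC\n"
table_feature_ids = ["ACTGTG", "GATTAC", "TTTTTT"]
report = compare_ids(fasta_text, table_feature_ids)
print(report)
```

If both lists in the report come back empty, the FASTA records and the table headers use the same set of IDs.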

Success! Thank you so much for the help, I appreciate it.

