FeatureData[Sequence] import?

hharder · April 5, 2023, 3:52pm

I am trying to run picrust2. I ran the following code:

qiime picrust2 full-pipeline --i-table sequence_variants.qza --i-seq taxonomy_2.qza --output-dir q2-picrust2_output --p-threads 1 --p-hsp-method mp --p-max-nsti 2 --verbose

I got an error stating "Invalid value for "--i-seq": Expected an artifact of at least type FeatureData[Sequence]. An artifact of type FeatureData[Taxonomy] was provided." I tried to reimport my data as a FeatureData[Sequence] using this code:

qiime tools import --type FeatureData[Sequence] --input-path taxon.tsv --output-path taxonomy_2.qza

But it said it was not a valid FASTA file. I have also attached the .tsv file that I am trying to convert.

taxon.tsv (1.2 MB)

How can I create this FeatureData[Sequence] data to use in Picrust2?

SoilRotifer · April 5, 2023, 4:12pm

Hi @hharder, the error message is correct. It is not a FASTA sequence (i.e. FeatureData[Sequence]) file. The tsv file you shared should be imported as a FeatureData[Taxonomy] file.
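To illustrate the difference: a FASTA file has header lines that start with ">" followed by the sequence itself, while your tsv file has tab-separated ID and taxon columns. Here is a minimal sketch of a quick check you could run before importing. The helper name looks_like_fasta is made up for this example, and it only inspects the first non-empty line, so treat it as a rough sanity check and not a full validator.

```shell
# Hypothetical helper (not part of QIIME 2): succeeds if the first
# non-empty line of the file given as $1 starts with ">".
looks_like_fasta() {
    # grep -m 1 stops after the first matching (non-blank) line.
    first_line=$(grep -m 1 -v '^[[:space:]]*$' "$1") || return 1
    case "$first_line" in
        '>'*) return 0 ;;   # looks like a FASTA header
        *)    return 1 ;;   # anything else, e.g. a TSV header row
    esac
}
```

Something like `looks_like_fasta taxon.tsv && echo "FASTA" || echo "not FASTA"` should report "not FASTA" for the taxonomy file you attached.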

As outlined in the --help text you can run these commands to determine the available types and formats you can import:

qiime tools import --show-importable-types 
qiime tools import --show-importable-formats

Try importing like this:

qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-path taxon.tsv \
    --output-path taxonomy_2.qza

If this fails, you can try adding the flag: --input-format 'HeaderlessTSVTaxonomyFormat' to the above command.

Also, are you sure that the sequence_variants.qza file you are passing into --i-table is actually your feature table and not your sequence file? The file name suggests to me that you are passing in a sequence file.

hharder · April 5, 2023, 4:40pm

My sequence_variants.qza file looks like this:

sequence_variants.qza (187.4 KB)

I am not really clear on what needs to go into the --i-table and --i-seq sections. I thought that --i-table was my abundance values, which is in the sequence-variants file, and that --i-seq is the translation from the codes to ASVs, which is my taxon.tsv file. I have been working on already outputted data for downstream processing so I've been kinda guessing my way along.

SoilRotifer · April 5, 2023, 5:03pm

Okay, that is correct. Viewing the file in QIIME 2 View shows that it is a table of type FeatureTable[Frequency]. A file name containing the word "sequence" is often assumed to hold sequence data, whereas most people put the term "table" in the file name of a feature table. I just wanted to confirm.
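The same type information can also be read on the command line with qiime tools peek, which prints the UUID, type, and data format of an artifact. A small sketch, assuming an active QIIME 2 environment; the wrapper name artifact_type is made up for this example, and it assumes the peek output contains a line beginning with "Type:".

```shell
# Hypothetical wrapper: print only the semantic type that
# `qiime tools peek` reports for the artifact given as $1.
artifact_type() {
    # Split each line on the colon plus any following spaces and
    # print the value from the "Type:" line.
    qiime tools peek "$1" | awk -F': *' '/^Type:/ { print $2 }'
}
```

Running `artifact_type sequence_variants.qza` on your file should print the table type, and whatever you pass to --i-seq should report FeatureData[Sequence].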

Simply call qiime picrust2 full-pipeline --help. This will bring up the help documentation that explicitly defines the files and parameters you can input or set. Do you have the sequence file?

I suggest working through the general QIIME 2 documentation and tutorials. Have you walked through the q2-picrust2 tutorial?

hharder · April 5, 2023, 5:13pm

I was following an Aldex2 tutorial and some of the naming conventions got a little mixed up. I think that's why my titles are a bit off.

--i-seq ARTIFACT FeatureData[Sequence]
Sequences (e.g. ASVs or representative OTUs)
corresponding to the abundance table given.

I don't think I have a file that looks like this. I have the file that says the OTU ID (ex. 04e96f8e3e9a8aaea0cf71feba8b0e16) and then the taxon identifier (k__Bacteria; p__Firmicutes; c__Erysipelotrichi; o__Erysipelotrichales; f__Erysipelotrichaceae; g__Allobaculum; s__). This would be a FASTA type? I don't have any file types from my collaborator that are labeled FASTA. I have three biom tables but those all seem to be abundance data. Would it be exported anywhere as a qza file in the Qiime pipeline?

I have read through the tutorials and they have been super helpful. The problem I have mostly been having is just finding the appropriate files (since I didn't generate them).

SoilRotifer · April 5, 2023, 5:21pm

This is the taxonomy that was assigned to the sequence with the feature ID 04e96f8e3e9a8aaea0cf71feba8b0e16. The taxonomy is not needed for the q2-picrust2 command you are running.

No, that file is not FASTA. As I explained in one of my earlier responses, FASTA is a sequence file format. See this wiki.

No worries. I think, in this case, you should ask your collaborator for the representative sequence file, which should correspond to the features in your feature table. That is, the IDs in the table should match those in the FASTA headers. PICRUSt2 requires the actual sequences to perform its processing.
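Once you have that file, the two steps could look roughly like the sketch below: first confirm that your feature IDs appear in the FASTA headers, then import the FASTA as FeatureData[Sequence]. The function names and the idea of a plain-text ID list (one feature ID per line) are assumptions made for this example. Only the import function needs an active QIIME 2 environment.

```shell
# Hypothetical helper: print every feature ID listed in $1 (one per line)
# that has no matching header in the FASTA file $2.
ids_missing_from_fasta() {
    awk '
        NR == FNR {                       # first file read: the FASTA
            if (/^>/) {
                id = substr($1, 2)        # drop ">" and keep text up to the first space
                seen[id] = 1
            }
            next
        }
        NF && !($1 in seen) { print $1 }  # second file read: the ID list
    ' "$2" "$1"
}

# Hypothetical helper: import the FASTA file $1 as a
# FeatureData[Sequence] artifact written to $2.
import_rep_seqs() {
    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path "$1" \
        --output-path "$2"
}
```

If the first function prints nothing, every listed ID has a sequence. The artifact written by the second function is what --i-seq expects in your original full-pipeline command.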

hharder · April 5, 2023, 5:30pm

Thank you - after looking at the wiki I don't think I have that file. I have emailed my collaborator - hopefully he can share these sequences for me so I can keep going!
