fasta file analysis in QIIME2

Hi, developer,

Appreciate your time.
I have multiple FASTA files, one per sample. I want to pick OTUs/ASVs and check for chimeras in QIIME 2. What is the procedure?
Sincerely.

Brandon

Hi, I had a similar issue and resolved it by importing my samples using a manifest. I believe you are looking for this:
https://docs.qiime2.org/2019.4/tutorials/importing/?highlight=manifest

“Fastq manifest” formats

If you don’t have either EMP or Casava format, you need to import your data into QIIME 2 manually by first creating a “manifest file” and then using the qiime tools import command with different specifications than in the EMP or Casava import commands.

Format description

First, you’ll create a text file called a “manifest file”, which maps sample identifiers to fastq.gz or fastq absolute filepaths that contain sequence and quality data for the sample (i.e. these are FASTQ files). The manifest file also indicates the direction of the reads in each fastq.gz or fastq file. The manifest file will generally be created by you, and it is designed to be a simple format that doesn’t put restrictions on the naming of the demultiplexed fastq.gz / fastq files, since there is no broadly used naming convention for these files. You can call the manifest file whatever you want. As well, the manifest format is Metadata-compatible, so you can re-use the manifest file to bootstrap your Sample Metadata, too.

The manifest file is a tab-separated (i.e., .tsv) text file. The first column defines the Sample ID, while the second (and optional third) column defines the absolute filepath to the forward (and optional reverse) reads. All of the rules and behavior of this format are inherited from the QIIME 2 Metadata format.
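For anyone who does have FASTQ files, here is a rough sketch of how a single-end manifest like the one described above could be generated. The function name `write_manifest` is made up for this example, the sample ID is simply guessed from the file name, and the column headers `sample-id` and `absolute-filepath` are my assumption of what the importing tutorial expects, so please check them against the tutorial for your QIIME 2 version.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path):
    """Write a tab-separated manifest: one row per FASTQ file in fastq_dir.

    Assumptions (not from the tutorial text quoted above):
    - single-end reads, one file per sample
    - the sample ID is the file name up to the first dot
    - the header names are 'sample-id' and 'absolute-filepath'
    """
    fastq_dir = Path(fastq_dir).resolve()
    rows = []
    # Matches both *.fastq and *.fastq.gz
    for fq in sorted(fastq_dir.glob("*.fastq*")):
        sample_id = fq.name.split(".")[0]
        rows.append((sample_id, str(fq)))  # absolute path, as the format requires

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(("sample-id", "absolute-filepath"))
        writer.writerows(rows)
    return rows
```

Calling `write_manifest("reads/", "manifest.tsv")` would list every FASTQ file under the hypothetical `reads/` folder. Paired-end data would need a third column for the reverse reads, which this sketch does not handle.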


Hi, @ErikaGanda,
Thank you for the help. It is helpful.
May I ask for some further help? Can I analyze my imported demux.qza with DADA2 or Deblur, or do I need to pick OTUs and check for chimeras?
Really appreciate it!

Sincerely.
Brandon

Hi @Brandon,
The manifest approach @ErikaGanda points you towards is the correct format if you have FASTQ files. In your original inquiry you mentioned you had FASTA files, which are currently not supported by manifest importing.

Other FASTA formats like FASTA files with differently formatted sequence headers or per-sample demultiplexed FASTA files (i.e. one FASTA file per sample) are not currently supported.

Unfortunately it sounds like you fall into this category, if you indeed have separate FASTA files for each sample. I would recommend either starting with raw FASTQ files in QIIME 2 if you have access to these files, or, if that is not an option, perhaps you can try combining your files elsewhere (e.g. QIIME 1) and then importing your combined .fna file into QIIME 2 as per the importing tutorial mentioned.

If you only have FASTA files without quality scores you will not be able to perform DADA2 or Deblur and would have to use OTU picking methods.


Hi @Mehrbod_Estaki,
Thank you for your info.
My data are in this format
>A1-22751
GGTACCAGCAGCCGCGGTAATACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAA
AGAGTGCGTAGGCGGTTTAGTAAGTTGGAAGTGAAAGCCCGGGGCTTAACCTCGGAATTG
CTTTCAAAACTACTAATCTAGAGTGTAGTAGGGGATGATGGAATTCCTAGTGTAGAGGTG
AAATTCTTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGTCATCTGGGCTACAACTGA
CGCTGATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGT
AAACGATGAGTGCTAGATATCGGAAGATTCTCTTTCGGTTTCGCAGCTAACGCATTAAGC
ACTCCGCCTGGGGAGTACGGTCGCAAGATTAAACCTCAAAGGAATTGACGGAGTCTC
>A1-25524
GGTACCAGCAGCCGCGGTAATTCGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAA
AGAGTGCATAGGCGGTTTAGTAAGTTGGAAGTGAAAGCCCGGGGCTTAACCTCGGAATTG
CTTTCAAAACTACTAATCTAGAGTGTAGTAGGGGATGATGGAATTCCTAGTGTAGAGGTG
AAATTCTTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGTCATCTGGGCTACAACTGA
CGCTGATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGT
AAACGATGAGTGCTAGATATCGGAAGATTCTCTTTCGGTTTCGCAGCTAACGCATTAAGC
ACTCCGCCTGGGGAGTACGGTCGCAAGATTAAACCTCAAAGGAATTGACGGAGTCTC

I tried
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv
But when I use
qiime vsearch dereplicate-sequences --i-sequences seqs.qza --o-dereplicated-table table.qza --o-dereplicated-sequences rep-seqs.qza
it gives me this error:

Plugin error from vsearch:

Parameter 'sequences' received an argument of type FeatureData[Sequence]. An argument of subtype SampleData[JoinedSequencesWithQuality] | SampleData[SequencesWithQuality] | SampleData[Sequ$

See above for debug info.

Is there any other way for me to analyze this data?

Thanks so much.

Hi @Brandon,
It looks like you do indeed have FASTA files, but have a look at the importing tutorial regarding the required formatting to make sure your FASTA files follow it.

The ID in each header must follow the format <sample-id>_<seq-id> . <sample-id> is the identifier of the sample the sequence belongs to, and <seq-id> is an identifier for the sequence within its sample.
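As a quick way to check headers against this rule, here is a minimal sketch. The helper name `split_header_id` is hypothetical, and splitting on the last underscore is my assumption about how the ID is parsed, not something stated in the tutorial text above.

```python
def split_header_id(header_line):
    """Return (sample_id, seq_id) for a '>sample_seq' header, else None.

    Assumption: the ID is the first whitespace-delimited token after '>',
    and it is split into sample and sequence parts at the LAST underscore.
    """
    if not header_line.startswith(">"):
        return None
    fields = header_line[1:].split()
    if not fields:
        return None
    sample_id, sep, seq_id = fields[0].rpartition("_")
    # No underscore, or an empty part on either side, means the rule is not met
    if not sep or not sample_id or not seq_id:
        return None
    return sample_id, seq_id
```

With this check, a header such as `>A1-22751` returns `None` because it contains no underscore, whereas `>A1_22751` would be read as sample `A1`, sequence `22751`.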

Can you tell us how you actually imported your separately demultiplexed FASTA files into QIIME 2 initially? That is to say, how did you end up with your seqs.qza?

Did that work?

Also, your error message from vsearch seems to have been cut off in your paste. Could you please re-run the command with the --verbose flag added and share the full error message with us?

Hi, @Mehrbod_Estaki
I used the command below, run in qiime2-2019.1:
qiime tools import \
  --input-path seqs.fna \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'
About the vsearch error:
Traceback (most recent call last):
  File "/home/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "</home/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-128>", line 2, in dereplicate_sequences
  File "/home/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 199, in bound_callable
    self.signature.check_types(**user_input)
  File "/home/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/type/signature.py", line 301, in check_types
    name, kwargs[name].type, spec.qiime_type))
TypeError: Parameter 'sequences' received an argument of type FeatureData[Sequence]. An argument of subtype SampleData[JoinedSequencesWithQuality] | SampleData[SequencesWithQuality] | SampleData[Sequences] is required.

Plugin error from vsearch:

  Parameter 'sequences' received an argument of type FeatureData[Sequence]. An argument of subtype SampleData[JoinedSequencesWithQuality] | SampleData[SequencesWithQuality] | SampleData[Sequences] is required.

See above for debug info.

Thank you for your kindness. :star_struck:

Hi @Brandon,

So, as the error message implies, the artifact you imported from seqs.fna is of type FeatureData[Sequence], which dereplicate-sequences does not accept; it requires SampleData[Sequences] (or one of the other SampleData types listed in the error). This happened because you set the type incorrectly when you imported your file. See the OTU clustering tutorial for an example of this workflow.

That being said, your FASTA files actually are not in the right format either, see my previous comment about the header ID requirements. You’ll have to change these files so that they follow the <sample-id>_<seq-id> header format prior to importing in order for this to work.
Good luck!
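If it helps, the header change could be scripted along these lines. This is only a sketch under a few assumptions: the function name `combine_fastas` is made up, each file holds exactly one sample, the sample ID is taken from the file name (without extension), and the original read IDs are replaced by a running counter.

```python
from pathlib import Path


def combine_fastas(fasta_paths, out_path):
    """Merge per-sample FASTA files into one file with '>sample_N' headers.

    Assumptions: one sample per input file; the sample ID is the file stem;
    the original sequence IDs are discarded in favour of a per-sample counter.
    Returns a dict mapping each sample ID to its number of sequences.
    """
    counts = {}
    with open(out_path, "w") as out:
        for path in fasta_paths:
            path = Path(path)
            sample_id = path.stem
            n = 0
            with open(path) as handle:
                for line in handle:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    if line.startswith(">"):
                        n += 1
                        out.write(f">{sample_id}_{n}\n")
                    else:
                        out.write(line + "\n")
            counts[sample_id] = n
    return counts
```

For example, `combine_fastas(["A1.fasta", "B2.fasta"], "seqs.fna")` (hypothetical file names) would give headers like `>A1_1`, `>A1_2`, `>B2_1`. The combined seqs.fna should then be imported with `--type 'SampleData[Sequences]'`, the type the error message asks for, rather than `FeatureData[Sequence]`.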


Thank you @Mehrbod_Estaki I see it. Appreciate your help.
