Importing demultiplexed FASTA file & Mapping file

Hello,

I have sequences that are already demultiplexed; they were generated on an Illumina sequencer.
The file looks like this:
seqs.fna

>3056G_401 M02997:72:000000000-AW89G:1:1101:13437:1777 1:N:0:1 orig_bc=TTGCCAAGAGTC new_bc=TTGCCAAGAGTC bc_diffs=0
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGTTAAGCGAGATGTGAAAGCCCTGGGCTCAACCTGGGAACTGCATTTCGAACTGGCAGGCTAGAGTACAAGAGAGGGTGGTAGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAATACCAGTGGCGAAGGCGGCCACCTGGCTTGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCGAACAGG
>3056G_850 M02997:72:000000000-AW89G:1:1101:18541:1889 1:N:0:1 orig_bc=TTGCCAAGAGTC new_bc=TTGCCAAGAGTC bc_diffs=0
TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGTTAAGCGAGATGTGAAAGCCCTGGGCTCAACCTGGGAACTGCATTTCGAACTGGCAGGCTAGAGTACAAGAGAGGGTGGTAGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAATACCAGTGGCGAAGGCGGCCACCTGGCTTGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCGAACAGG

I also have a mapping file & it looks like this:
Fasting_Map.txt

I want to import both and then remove chimeras from the FASTA file.
I do not have a qual file for the FASTA file.

Now my questions are:

  1. Based on how the sequences in seqs.fna look, which data format do they match? I am asking because I am not sure which import instructions to follow. Should I follow this:

Per-feature unaligned sequence data (i.e., representative sequences)
Unaligned sequence data is imported from a fasta formatted file containing DNA sequences that are not aligned (i.e., do not contain - or . characters). The sequences may contain degenerate nucleotide characters, such as N, but some QIIME 2 actions may not support these characters. See the scikit-bio fasta format description for more information about the fasta format.
Importing data
qiime tools import \
  --input-path sequences.fna \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

or should I follow this:
Per-feature aligned sequence data (i.e., aligned representative sequences)
Aligned sequence data is imported from a fasta formatted file containing DNA sequences that are aligned to one another. All aligned sequences must be exactly the same length. The sequences may contain degenerate nucleotide characters, such as N, but some QIIME 2 actions may not support these characters. See the scikit-bio fasta format description for more information about the fasta format.
Importing data
qiime tools import \
  --input-path aligned-sequences.fna \
  --output-path aligned-sequences.qza \
  --type 'FeatureData[AlignedSequence]'

I looked at the scikit-bio FASTA format description, and I am still lost on how to import my data.

I downloaded the sample sequences provided for both tutorials:
Per-feature unaligned sequence data (i.e., representative sequences)
& Per-feature aligned sequence data (i.e., aligned representative sequences)

I then compared these to my FASTA file, which is already demultiplexed.
Based on my comparison, I imported my data as FeatureData[Sequence].

I then tried to dereplicate my data using:

(qiime2-2018.2) Joans-MacBook-Air:~ joanglenny$ qiime vsearch dereplicate-sequences \
  --i-sequences sequences.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza
Plugin error from vsearch:
Argument to parameter 'sequences' is not a subtype of SampleData[JoinedSequencesWithQuality] | SampleData[SequencesWithQuality] | SampleData[Sequences].

Debug info has been saved to /var/folders/nk/qmrj4r6d3jl131r7r4h959rw0000gn/T/qiime2-q2cli-err-jw1kfhiw.log

How do I fix this error?

Hi @JoanGalarza,

Great job looking at what was available and trying to pick the right types. You are very close; however, the axis (SampleData vs. FeatureData) is wrong. You want to import as SampleData[Sequences] instead of FeatureData[Sequence].

The reason it’s SampleData[Sequences] instead of FeatureData[Sequence] is that we haven’t selected those reads as features (and counted them up) yet. You should be able to follow along with this tutorial which starts with seqs.fna, dereplicates, and then does OTU picking.
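One way to see why the SampleData axis applies: in a demultiplexed seqs.fna, each read still carries its sample ID in the header. The sketch below counts reads per sample with awk. It assumes the headers follow the >SAMPLEID_SEQNUM pattern shown in the question (the leading > is standard FASTA), and the function name reads_per_sample is only an illustrative choice, not part of QIIME 2.

```shell
# Count reads per sample in a QIIME 1-style demultiplexed FASTA file.
# Assumption: every header looks like ">SAMPLEID_SEQNUM ...".
reads_per_sample() {
  awk '/^>/ {
         id = substr($1, 2)        # drop the leading ">"
         sub(/_[^_]*$/, "", id)    # drop the trailing "_SEQNUM"
         count[id]++
       }
       END { for (id in count) print id "\t" count[id] }' "$1" | LC_ALL=C sort
}

# Usage (file name taken from this thread):
#   reads_per_sample seqs.fna
```

If this prints one line per expected sample, the file is organized per sample, which is what SampleData[Sequences] describes.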

Hope that helps!

Hello Ebolyen,

I followed your suggestion, but I still get an error:

(qiime2-2018.2) Joans-MacBook-Air:~ joanglenny$ qiime tools import \
  --input-path sequences.fna \
  --output-path sequences.qza \
  --type 'SampleData[Sequences]'
(qiime2-2018.2) Joans-MacBook-Air:~ joanglenny$ qiime vsearch dereplicate-sequences \
  --i-sequences sequences.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza
Plugin error from vsearch:

Command '['vsearch', '--derep_fulllength', '/var/folders/nk/qmrj4r6d3jl131r7r4h959rw0000gn/T/qiime2-archive-c3jg5v37/2b53ea80-d389-42bb-b0ce-cde71f1e785f/data/seqs.fna', '--output', '/var/folders/nk/qmrj4r6d3jl131r7r4h959rw0000gn/T/q2-DNAFASTAFormat-99qukbrx', '--relabel_sha1', '--relabel_keep', '--uc', '/var/folders/nk/qmrj4r6d3jl131r7r4h959rw0000gn/T/tmpz2azm5k6', '--qmask', 'none', '--xsize']' returned non-zero exit status 1

Debug info has been saved to /var/folders/nk/qmrj4r6d3jl131r7r4h959rw0000gn/T/qiime2-q2cli-err-21hvz62b.log

Hey @JoanGalarza,

Assuming the file still exists, could you attach this file: /var/folders/nk/qmrj4r6d3jl131r7r4h959rw0000gn/T/qiime2-q2cli-err-21hvz62b.log to your reply? Otherwise, try rerunning with --verbose and paste the output.

Unfortunately it isn’t clear what is going wrong yet, but at least you have the types worked out!
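In the meantime, a couple of generic checks can be run from the shell. This is only a sketch, not a diagnosis of the vsearch failure: the helper names show_debug_log and fasta_summary are made up for illustration, and the only inputs assumed are the log path and the FASTA file mentioned in this thread.

```shell
# Print the saved debug log if it still exists; otherwise say so.
show_debug_log() {
  if [ -f "$1" ]; then
    cat "$1"
  else
    echo "log not found: $1"
  fi
}

# Generic FASTA sanity check: count header lines and non-empty sequence lines.
fasta_summary() {
  awk '/^>/ { headers++; next }
       NF   { seq_lines++ }
       END  { printf "headers=%d sequence_lines=%d\n", headers, seq_lines }' "$1"
}

# Usage (paths and file names taken from this thread):
#   show_debug_log /var/folders/nk/qmrj4r6d3jl131r7r4h959rw0000gn/T/qiime2-q2cli-err-21hvz62b.log
#   fasta_summary sequences.fna
#   qiime vsearch dereplicate-sequences \
#     --i-sequences sequences.qza \
#     --o-dereplicated-table table.qza \
#     --o-dereplicated-sequences rep-seqs.qza \
#     --verbose
```

A header count of zero, or a header count that does not match the number of sequence lines in a one-line-per-read file, would be worth mentioning in the reply.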
