Importing FASTA file

sbrifkin27 · March 22, 2018, 12:47pm

I am just getting started, so I apologize if this is a silly question.

I am trying to import data that is in FASTA format but has no line breaks.

For example:

S034NL_917TGGGGAATTTAGG>S034TL_42718
TGGGGAA

I used this command from the QIIME 2 tutorial for unaligned sequence data stored in FASTA files:

qiime tools import \
  --input-path sequences.fna \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

but I get an error stating that the file is not a DNAFASTAFormat file. Is that because it is a text file formatted as a single line with no line breaks? If I add line breaks, can I import it with this command?

Thank you for your help!
Samara

ebolyen · March 22, 2018, 9:14pm

Hi @sbrifkin27!

Everyone has to start somewhere!

Yes and yes. How did you end up with a FASTA file without line breaks? I've never seen such a thing!

Otherwise your command looks good, so once you format the file like this:

>S034NL_917
TGGGGAATTTAGG
>S034TL_42718
TGGGGAA

You'll be all set.
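If the file has more than a handful of records, fixing it by hand gets tedious. Below is a minimal Python sketch (not part of QIIME 2) that rewrites a one-line FASTA string into the layout above. It assumes every sequence is uppercase A/C/G/T/N and that no sequence ID ends in one of those letters (true for IDs like S034NL_917, which end in digits); otherwise the boundary between ID and sequence is ambiguous. The helper name `split_one_line_fasta` is hypothetical.

```python
import re

# Hypothetical helper. Assumes sequences are uppercase A/C/G/T/N only and that
# sequence IDs never end in one of those letters, so the lazy ID group stops
# exactly where the sequence begins.
RECORD = re.compile(r">?([^>]*?)([ACGTN]+)(?=>|$)")


def split_one_line_fasta(text: str) -> str:
    """Rewrite a FASTA file that lost its line breaks as one header line
    followed by one sequence line per record."""
    flat = "".join(text.split())  # drop any stray whitespace or newlines
    records = []
    for match in RECORD.finditer(flat):
        seq_id, seq = match.groups()
        records.append(f">{seq_id}\n{seq}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    raw = "S034NL_917TGGGGAATTTAGG>S034TL_42718\nTGGGGAA"
    print(split_one_line_fasta(raw), end="")
```

Write the result to a new file and rerun the same `qiime tools import` command. Records that violate the assumptions (for example lowercase bases) are silently skipped by `finditer`, so it is worth comparing the number of `>` characters before and after.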

sbrifkin27 · April 10, 2018, 1:48pm

Thanks so much for your help! I was able to import the data after I fixed the format of the FASTA file. Sorry to bombard you with questions, but I am now trying to demultiplex the data with the following command:

qiime demux emp-single --i-seqs sequences.qza --m-barcodes-file sample-metadata --m-barcodes-column PatientID --o-per-sample-sequences demux.qza

but I keep getting the following error:
Plugin error from demux:

Argument to parameter 'seqs' is not a subtype of EMPPairedEndSequences | EMPSingleEndSequences | RawSequences.

Is that because the data have already been demultiplexed, so I should skip the demultiplexing step?

Thank you so much for your help!
Samara

ebolyen · April 13, 2018, 2:40pm

Hi @sbrifkin27,

Sorry for the delayed response.

Could you elaborate on the purpose of this FASTA file? There are two ways to represent it in QIIME 2: as SampleData[Sequences] or as FeatureData[Sequence]. You've imported it as the latter, which means you've already selected your "features" and are associating sequence information with them. Demultiplexing doesn't really make sense in that context, which is why demux emp-single rejects the artifact: it only accepts the raw, multiplexed types listed in the error.

But if these are sequences for your samples, then you are probably dealing with QIIME 1 post-split-libraries output, in which case you need to import them as SampleData[Sequences] instead and use qiime vsearch to perform OTU clustering / feature selection.
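One quick way to check whether the reads are already demultiplexed is to look at the headers: QIIME 1 post-split-libraries FASTA files label each read as `>{sample-id}_{sequence-id}`, optionally followed by a space and extra annotations, so the sample assignment is already in the file. The hypothetical helper below is a minimal Python sketch (not a QIIME 2 API) that counts reads per sample under that header assumption.

```python
from collections import Counter


def reads_per_sample(fasta_text: str) -> Counter:
    """Count reads per sample ID, assuming QIIME 1 post-split-libraries headers
    of the form '>{sample-id}_{sequence-id}' (optionally followed by a space
    and extra annotations). Hypothetical helper, not part of QIIME 2."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, nothing to count
        fields = line[1:].split()
        read_id = fields[0] if fields else ""
        # Everything before the last underscore is taken as the sample ID.
        sample_id, sep, seq_id = read_id.rpartition("_")
        if not sep or not sample_id or not seq_id:
            raise ValueError(f"header does not look like <sample>_<seq>: {line!r}")
        counts[sample_id] += 1
    return counts


if __name__ == "__main__":
    fasta = ">S034NL_917\nTGGGGAATTTAGG\n>S034TL_42718\nTGGGGAA\n"
    for sample, n in sorted(reads_per_sample(fasta).items()):
        print(sample, n)
```

If every header parses and the per-sample counts look plausible, the reads already carry their sample labels, which supports skipping `demux` and importing as SampleData[Sequences].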

Let me know if you need more specific details!

system · May 14, 2018, 8:41pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.