FASTA files importing

mc92 · August 2, 2018, 1:33pm

Hi,

I'm trying to import several FASTA files, one per sample, to generate a feature table. I know how to do this with FASTQ files, using a mapping file to import all the FASTQ files at once, but I couldn't find anything similar for FASTA files. Is there a way to do that?

Thank you very much in advance.
Cheers,

willowblade · August 2, 2018, 8:01pm

Hi,

I'm not part of the QIIME 2 team, but I am curious as to why you want to use FASTA files. Do you also have the quality score files that go with the FASTA files? If so, it might be more efficient to convert the FASTA and quality score files into FASTQ files and then import those. Also, how do you plan to generate a feature table? I am less familiar with Deblur, but I know DADA2 uses quality scores as part of the denoising process, and I am pretty certain Deblur uses the quality scores to define an upper bound on error rates. This means that you probably can't generate a feature table from a FASTA file using either DADA2 or Deblur.
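In case it helps, here is a minimal sketch of the FASTA plus quality file to FASTQ conversion mentioned above. It assumes the quality file uses the common QUAL layout (the same `>id` headers as the FASTA, followed by space-separated integer Phred scores) and that Phred+33 encoding is wanted in the output. The function names `parse_records` and `fasta_qual_to_fastq` are made up for illustration and are not part of QIIME 2.

```python
def parse_records(text):
    """Yield (record_id, body_lines) pairs from FASTA- or QUAL-style text."""
    record_id, body = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, body
            # Use the first whitespace-delimited token of the header as the ID.
            record_id, body = line[1:].split()[0], []
        else:
            body.append(line)
    if record_id is not None:
        yield record_id, body


def fasta_qual_to_fastq(fasta_text, qual_text):
    """Combine FASTA and QUAL text into FASTQ text (assumes Phred+33 output)."""
    # Map each record ID to its list of integer quality scores.
    quals = {
        rid: [int(q) for q in " ".join(body).split()]
        for rid, body in parse_records(qual_text)
    }
    out = []
    for rid, body in parse_records(fasta_text):
        seq = "".join(body)
        if rid not in quals:
            raise ValueError(f"no quality scores found for {rid}")
        scores = quals[rid]
        if len(scores) != len(seq):
            raise ValueError(f"sequence/quality length mismatch for {rid}")
        qual_str = "".join(chr(q + 33) for q in scores)
        out.append(f"@{rid}\n{seq}\n+\n{qual_str}\n")
    return "".join(out)
```

You would read each sample's two files into strings, call `fasta_qual_to_fastq`, and write the result to a `.fastq` file before importing as usual.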

This paper compares DADA2, Deblur, and UNOISE3: https://peerj.com/preprints/26566.pdf Maybe it will be of some use to you?

thermokarst · August 10, 2018, 12:42am

Hey there @mc92!

@willowblade raises a lot of good points, so please take the time to review their post. For the sake of completeness, if you did want to proceed with these FASTA files, you could convert the FASTA to the QIIME 1 post split_libraries.py seqs.fna format and import as SampleData[Sequences]. Then, you could use vsearch dereplicate-sequences to dereplicate (as @willowblade mentioned, DADA2 is out of the question here, since that tool needs quality scores). Hope that helps!
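A rough sketch of that conversion step, for anyone landing here later: it merges per-sample FASTA text into a single seqs.fna-style file. It assumes the headers should look like `>SAMPLEID_SEQNUMBER`, which is my understanding of the post split_libraries.py convention, so please check the QIIME 2 importing docs, especially if your sample IDs contain underscores. The function names and the example sample IDs are hypothetical.

```python
def parse_fasta(text):
    """Yield each sequence in FASTA text as one string, dropping the headers."""
    seq, seen_header = [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                yield "".join(seq)
            seen_header, seq = True, []
        else:
            seq.append(line)
    if seen_header:
        yield "".join(seq)


def merge_to_seqs_fna(samples):
    """Merge {sample_id: fasta_text} into one seqs.fna-style string.

    Each record is relabeled as >SAMPLEID_N (assumed header convention),
    where N counts the sequences within that sample, starting at 1.
    """
    lines = []
    for sample_id, fasta_text in samples.items():
        for n, seq in enumerate(parse_fasta(fasta_text), start=1):
            lines.append(f">{sample_id}_{n}")
            lines.append(seq)
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical usage: in practice you would read one file per sample from disk.
example = {
    "sampleA": ">read1\nAC\nGT\n>read2\nTT\n",
    "sampleB": ">read1\nGG\n",
}
seqs_fna = merge_to_seqs_fna(example)
```

After writing the merged text to seqs.fna, the import would be something like `qiime tools import --input-path seqs.fna --output-path seqs.qza --type 'SampleData[Sequences]'`, followed by `qiime vsearch dereplicate-sequences --i-sequences seqs.qza --o-dereplicated-table table.qza --o-dereplicated-sequences rep-seqs.qza`. Double-check the flags against `--help` for your QIIME 2 version.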