Creating FeatureTable from Illumina FASTA data

jairideout · October 10, 2017, 9:21pm

Hi @marselr2! To import your FASTA file of demultiplexed and quality-controlled sequences, you'll need to make sure it's in the QIIME 1 "demux" format, then run the following command:

qiime tools import --type 'SampleData[Sequences]' --input-path seqs.fna --output-path seqs.qza

Note: you'll need to replace seqs.fna in the above command with the filepath of your FASTA file. The output filename (seqs.qza in the example above) can be named whatever you want.
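
If you're not sure whether your file follows that format, here's a rough way to check before importing. As I understand the QIIME 1 convention, the first word of each header line is the sample ID, an underscore, and a sequence number (e.g. >sampleA_1), optionally followed by extra text. The sketch below counts headers that don't match that pattern. The function name count_bad_headers and the example file are hypothetical and only for illustration, and the pattern is an assumption about the format rather than an official validator:

```shell
# Hypothetical helper (not part of QIIME): count FASTA header lines whose
# first whitespace-delimited word does not look like ">sampleid_number".
count_bad_headers() {
    # $1: path to a FASTA file
    awk '/^>/ && $1 !~ /^>.+_[0-9]+$/ { bad++ } END { print bad + 0 }' "$1"
}

# Small made-up example file to try the check on.
example_fasta=$(mktemp)
cat > "$example_fasta" <<'EOF'
>sampleA_1
ACGT
>sampleA_2 some description
GGCC
>sampleB_1
TTAA
>read42
ACGT
EOF

# Prints 1: only ">read42" lacks the "_number" suffix.
count_bad_headers "$example_fasta"
rm -f "$example_fasta"
```

A result of 0 on your own file suggests the headers at least have the expected shape. It doesn't guarantee the import will succeed.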

Once you have the SampleData[Sequences] artifact, you're currently pretty limited in what you can do with it in QIIME 2. You can use qiime vsearch dereplicate-sequences to dereplicate your sequences and continue analyses like @thermokarst described above. After dereplicating, you can optionally cluster those sequences into OTUs using qiime vsearch cluster-features-de-novo or qiime vsearch cluster-features-closed-reference.
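
For reference, the dereplication and de novo clustering steps would look roughly like this. The output filenames and the 0.99 identity threshold are placeholders I picked for the example, and the run wrapper is a hypothetical helper that just prints each command when qiime isn't installed, so you can preview the commands safely. Option names can differ between releases, so check qiime vsearch --help on your install:

```shell
# Hypothetical wrapper: run the command if QIIME 2 is on PATH,
# otherwise print it (dry run).
run() {
    if command -v qiime >/dev/null 2>&1; then
        "$@"
    else
        echo "$*"
    fi
}

# Dereplicate the imported sequences into a feature table plus
# the matching representative sequences.
run qiime vsearch dereplicate-sequences \
    --i-sequences seqs.qza \
    --o-dereplicated-table table.qza \
    --o-dereplicated-sequences rep-seqs.qza

# Optional: de novo OTU clustering at 99% identity (placeholder threshold).
run qiime vsearch cluster-features-de-novo \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --p-perc-identity 0.99 \
    --o-clustered-table table-dn-99.qza \
    --o-clustered-sequences rep-seqs-dn-99.qza
```

Closed-reference clustering additionally needs a reference sequence artifact, so I've left it out of this sketch.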

It is not currently possible to supply a SampleData[Sequences] artifact to q2-deblur. I think this could be hooked up in the future, since Deblur doesn't require quality scores (Deblur assumes the sequences have already been quality filtered). I created an issue to get this data type hooked up to q2-deblur. We'll follow up here when it's available in a release (no ETA at this point, perhaps @wasade can provide one).

This type of data won't work with DADA2 because it does not have quality scores associated with the sequences, and DADA2 requires quality scores. If you wish to use DADA2, or Deblur (for now at least), you'll need to obtain the FASTQ files from the sequencing center and analyze those. If you can get FASTQ files that have already been demultiplexed, that's probably the easiest way forward. If you end up going in this direction and run into issues with your FASTQ data, please create a new forum topic and we can help you out. Thanks!