Creating FeatureTable from Illumina FASTA data

Hi!
I have the same problem as @liubaily.

I want to create a FeatureTable, but my Illumina sequences are in FASTA format, since the sequencing service denoised them with QIIME 1. Then, I could construct my BIOM table directly.

I know that QIIME 2 2017.9 now supports OTU picking, but I would prefer to work with the actual sequence variants instead of OTUs.
I have not found a way to build a FeatureTable from sequences. The only option I found is the Deblur pipeline in the “Moving Pictures” tutorial, but it doesn’t work for me.

I have imported my sequences as seqs.qza (--type SampleData[Sequences]), but I don’t know how to continue with the analysis from there.

Could you help me with that?

Thank you!

Hi @marselr2! Thanks for reaching out!

Have you had a chance to check out vsearch dereplicate-sequences? This method accepts SampleData[Sequences] and will output a FeatureTable[Frequency] and FeatureData[Sequence]! It doesn’t do any OTU clustering, it just dereplicates your sequences. Then, if you are following along with the Moving Pictures tutorial, you could pick up after the denoising step, using those two new Artifacts! Hope that helps, and please let us know if you get stuck or have any additional questions!
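
To make "just dereplicates" concrete, here is a toy sketch in plain shell. It is not what vsearch runs internally; it only illustrates the idea of counting identical sequences per sample. The function name derep_counts is made up for this example, and it assumes a FASTA file with exactly two lines per record and headers of the form >SampleID_SeqNumber.

```shell
# Toy illustration only (not what vsearch runs internally): count how often
# each distinct sequence occurs in each sample of a two-line-per-record FASTA
# whose headers look like ">SampleID_SeqNumber".
derep_counts() {
  awk '
    NR % 2 == 1 {                       # header line
      sample = substr($1, 2)            # drop the leading ">"
      sub(/_[^_]*$/, "", sample)        # drop the trailing "_<seq-id>"
    }
    NR % 2 == 0 { n[sample "\t" $0]++ } # sequence line: tally per sample
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | sort
}
```

Each output row is a tab-separated (sample, sequence, count) triple, which is roughly the information a FeatureTable[Frequency] plus a FeatureData[Sequence] carry together after dereplication.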

Hi,

I tried vsearch dereplicate-sequences, but the input for that is a SampleData[Sequences] artifact.

The only way I have found to import my sequences into QIIME 2 is the one in the importing data tutorial, as FeatureData[Sequence]:

qiime tools import \
  --input-path sequences.fna \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

When I tried to run vsearch dereplicate-sequences, it gave me an error because my input is not in the correct format.

Do you know of a way to import my data as SampleData?

Thank you so much for your help!!!

Hi @marselr2! To import your FASTA file of demultiplexed and quality-controlled sequences, you’ll need to make sure it’s in the QIIME 1 “demux” format, then run the following command:

qiime tools import --type 'SampleData[Sequences]' --input-path seqs.fna --output-path seqs.qza

Note: you’ll need to replace seqs.fna in the above command with the filepath of your FASTA file. The output filename (seqs.qza in the example above) can be named whatever you want.
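
Before importing, it can help to sanity-check the file layout. The sketch below is a rough check, not an official validator: it assumes the "demux" layout means two lines per record (one header, one unwrapped sequence line) with header IDs shaped like SampleID_SeqID. The helper name check_demux_fasta is hypothetical.

```shell
# Hypothetical helper: print the number of problems found in a FASTA file
# that is expected to have ">sample_seq" headers and one sequence line each.
check_demux_fasta() {
  awk '
    NR % 2 == 1 && $1 !~ /^>.+_.+$/ { bad++ }          # header must be ">sample_seq"
    NR % 2 == 0 && ($0 == "" || $0 ~ /^>/) { bad++ }   # sequence must be one plain line
    END { if (NR % 2) bad++; print bad + 0 }           # odd line count: dangling record
  ' "$1"
}

# Usage: check_demux_fasta seqs.fna
```

A result of 0 means no problems were spotted by this check. Anything else suggests wrapped sequences or headers without a sample prefix, which would need fixing before the import command above.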

Once you have the SampleData[Sequences] artifact, you’re currently pretty limited with what you can do in QIIME 2. You can use qiime vsearch dereplicate-sequences to dereplicate your sequences and continue analyses like @thermokarst described above. After dereplicating, you can optionally cluster those sequences into OTUs using qiime vsearch cluster-features-de-novo or cluster-features-closed-reference.
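
For reference, the dereplication step might look like the sketch below. It is wrapped in a small shell function (the name dereplicate is just for this example) so the three file names are easy to swap. table.qza and rep-seqs.qza are placeholder output names, and the command needs a QIIME 2 environment with the q2-vsearch plugin available.

```shell
# Sketch of the dereplication step.
#   $1: SampleData[Sequences] artifact (e.g. seqs.qza from the import above)
#   $2: output FeatureTable[Frequency] (placeholder name, e.g. table.qza)
#   $3: output FeatureData[Sequence]   (placeholder name, e.g. rep-seqs.qza)
dereplicate() {
  qiime vsearch dereplicate-sequences \
    --i-sequences "$1" \
    --o-dereplicated-table "$2" \
    --o-dereplicated-sequences "$3"
}

# Usage: dereplicate seqs.qza table.qza rep-seqs.qza
```

The two outputs are the artifacts you would carry into the downstream steps of the tutorial.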

It is not currently possible to supply a SampleData[Sequences] artifact to q2-deblur. I think this could be hooked up in the future, since Deblur doesn’t require quality scores (Deblur assumes the sequences have been quality filtered already). I created an issue to get this new data type hooked up to q2-deblur. We’ll follow up here when it’s available in a release (no ETA at this point, perhaps @wasade can provide one).

This type of data won’t work with DADA2 because it does not have quality scores associated with the sequences, and DADA2 requires quality scores. If you wish to use DADA2, or Deblur (for now at least), you’ll need to obtain the FASTQ files from the sequencing center and analyze those. If you can get FASTQ files that have already been demultiplexed that’s probably the easiest way forward. If you end up going this direction and run into issues with your FASTQ data, please create a new forum topic and we can help you out. Thanks!

Thank you, @jairideout! I don’t think it would be an issue to add in support for this semantic type for 2017.10, although I haven’t worked yet with supporting multiple or nested types – will follow up if and as needed.

Best,
Daniel
