I want to create a FeatureTable, but my Illumina sequences are in FASTA format because the sequencing service already denoised them with QIIME 1. That way, I could construct my BIOM table directly.
I know that QIIME 2 2017.9 now supports OTU picking, but I would prefer to work with the actual sequence variants instead of OTUs.
I have not found a way to build a FeatureTable from sequences. The only route I found is the Deblur pipeline in the “Moving Pictures” tutorial, but it doesn’t work for me.
I have imported my sequences as seqs.qza with --type SampleData[Sequences], but I don’t know how to continue the analysis from there.
Have you had a chance to check out vsearch dereplicate-sequences? This method accepts SampleData[Sequences] and will output a FeatureTable[Frequency] and FeatureData[Sequence]! It doesn’t do any OTU clustering, it just dereplicates your sequences. Then, if you are following along with the Moving Pictures tutorial, you could pick up after the denoising step, using those two new Artifacts! Hope that helps, and please let us know if you get stuck or have any additional questions!
Note: you’ll need to replace seqs.fna in the above command with the filepath of your FASTA file. The output filename (seqs.qza in the example above) can be named whatever you want.
Once you have the SampleData[Sequences] artifact, you’re currently pretty limited in what you can do in QIIME 2. You can use qiime vsearch dereplicate-sequences to dereplicate your sequences and continue your analyses as @thermokarst described above. After dereplicating, you can optionally cluster those sequences into OTUs using qiime vsearch cluster-features-de-novo or cluster-features-closed-reference.
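To make it concrete what dereplication produces, here is a rough, standalone Python sketch of the idea. It is not the q2-vsearch implementation and does not call QIIME 2. It assumes QIIME 1-style headers of the form `<sample-id>_<n>` and uses the SHA-1 hash of each sequence as the feature ID. The names `read_fasta` and `dereplicate` are made up for illustration.

```python
import hashlib
from collections import Counter, defaultdict


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(fasta_text):
    """Collapse identical sequences and count them per sample.

    Returns (table, rep_seqs):
      table    maps feature ID -> Counter of {sample ID: count}
      rep_seqs maps feature ID -> sequence
    """
    table = defaultdict(Counter)
    rep_seqs = {}
    for header, seq in read_fasta(fasta_text):
        # Assumption: the record ID looks like "<sample-id>_<n>".
        sample_id = header.split()[0].rsplit("_", 1)[0]
        # Assumption: feature ID is the SHA-1 hex digest of the sequence.
        feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
        table[feature_id][sample_id] += 1
        rep_seqs[feature_id] = seq
    return table, rep_seqs


if __name__ == "__main__":
    demo = ">S1_0\nACGT\n>S1_1\nACGT\n>S2_0\nACGT\n>S2_1\nTTGA\n"
    table, rep_seqs = dereplicate(demo)
    for feature_id, counts in table.items():
        print(feature_id[:8], rep_seqs[feature_id], dict(counts))
```

The `table` dictionary plays the role of the FeatureTable[Frequency] (counts of each unique sequence per sample) and `rep_seqs` the role of the FeatureData[Sequence]. No clustering happens: two sequences only share a feature if they are identical.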
It is not currently possible to supply a SampleData[Sequences] artifact to q2-deblur. I think this is possible to hook up in the future since Deblur doesn’t require quality scores (Deblur assumes the sequences have been quality filtered already). I created an issue to get this new data type hooked up to q2-deblur. We’ll follow up here when it’s available in a release (no ETA at this point, perhaps @wasade can provide one).
This type of data won’t work with DADA2 because it does not have quality scores associated with the sequences, and DADA2 requires quality scores. If you wish to use DADA2, or Deblur (for now at least), you’ll need to obtain the FASTQ files from the sequencing center and analyze those. If you can get FASTQ files that have already been demultiplexed that’s probably the easiest way forward. If you end up going this direction and run into issues with your FASTQ data, please create a new forum topic and we can help you out. Thanks!
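If you are not sure whether a file from the sequencing center carries quality scores, a quick check like the one below can help. This is only a sketch: it looks at the first non-blank line and assumes FASTQ records start with `@` and FASTA records start with `>`. The function name `has_quality_scores` is made up for illustration.

```python
import gzip


def has_quality_scores(path):
    """Return True if the file looks like FASTQ, False if it looks like FASTA.

    Only the first non-blank line is inspected, so this is a heuristic,
    not a format validator.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue
            if line.startswith("@"):
                return True   # FASTQ: sequences come with quality scores
            if line.startswith(">"):
                return False  # FASTA: sequences only
            raise ValueError("unrecognized first record: %r" % line[:20])
    raise ValueError("file is empty")
```

A file for which this returns False is the kind of data discussed above: it can be dereplicated, but it cannot go through DADA2.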
Thank you, @jairideout! I don’t think it would be an issue to add in support for this semantic type for 2017.10, although I haven’t worked yet with supporting multiple or nested types – will follow up if and as needed.