I am trying to use vsearch de novo otu picking in qiime2 to create a reference database of v4_16S rRNA reads. I want to gather all unique reads at 99% ID. I have 3158266 reads after QC from all my samples, so it's taking a very long time to run.
So, to reduce the initial data, I first ran vsearch open otu picking on each individual sample. I then exported the representative sequences from each sample and concatenated them to run vsearch open otu picking for a second round.
However, when I try to import this fasta file back into a qza, I receive an error:
"There was a problem importing PensacolaCat_round1_dn99.fa.mod3.fa:
PensacolaCat_round1_dn99.fa.mod3.fa is not a(n) QIIME1DemuxFormat file"
How are you importing your sequences? They look like a FeatureData[Sequence] semantic type to me. Have you tried importing them that way? There are a fair number of good tutorials on importing sequences, so maybe have a look?
After I exported the representative sequences from each sample and concatenated them, I can import the result as FeatureData[Sequence]. The problem is that vsearch does not take FeatureData[Sequence] as an input type. So when I run vsearch on that artifact, I get:
(1/1) Invalid value for "--i-sequences": Expected an artifact of at least
type SampleData[Sequences] | SampleData[SequencesWithQuality] |
SampleData[JoinedSequencesWithQuality]. An artifact of type
FeatureData[Sequence] was provided.
I thought I could just edit the fasta headers from this >cf5c43911866d34cdb99ff57b6dbce4bc1a3 to this >PensacolaCatRound1dn99.1_1 to trick qiime2 into thinking I had provided SampleData[Sequences], but the import tool saw through my deception and gave me the error I first posted: "PensacolaCat_round1_dn99.fa.mod3.fa is not a(n) QIIME1DemuxFormat file".
Maybe I’m missing something, but is there a motivation for renaming the centroids from their hashes for this application? If you just merge your sequences with feature-table merge-seqs, I think it should dereplicate them, because the hashes should be consistent across all your OTUs. So, then, you’ll have a consistent reference set.
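To illustrate why merging on hash IDs dereplicates, here is a rough Python sketch using only the standard library. It is not what merge-seqs does internally. It assumes each feature ID is a hash of the sequence itself, and SHA-1 is only a guess here, since the hash your IDs were built with may differ. The helper names `read_fasta` and `merge_by_hash` and the toy sequences are made up for the example.

```python
import hashlib
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_by_hash(handles):
    """Merge FASTA inputs, keeping one record per sequence hash."""
    merged = {}
    for handle in handles:
        for _, seq in read_fasta(handle):
            seq = seq.upper()
            # Assumption: the feature ID is a hash of the sequence (SHA-1 here).
            feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
            # Identical sequences produce identical IDs, so duplicates collapse.
            merged.setdefault(feature_id, seq)
    return merged


# Toy per-sample representative sequences; ACGTACGT appears in both samples.
sample_a = StringIO(">a1\nACGTACGT\n>a2\nTTTTGGGG\n")
sample_b = StringIO(">b1\nACGTACGT\n>b2\nCCCCAAAA\n")

merged = merge_by_hash([sample_a, sample_b])
print(len(merged))  # 3 unique sequences out of 4 records
```

Because the ID depends only on the sequence, the same centroid picked in two different samples ends up under one key, which is why renaming the headers would work against you here.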
A second question… why not use a denoiser? Then, you could avoid the picking and re-picking?