Hello, everyone. While reading about vsearch clustering today, I ran into a question.
When we run:
qiime vsearch dereplicate-sequences \
--i-sequences seqs.qza \
--o-dereplicated-table table.qza \
--o-dereplicated-sequences rep-seqs.qza
We will get a table.qza and a rep-seqs.qza. The tutorial seems to ask us to do subsequent clustering.
We need to do Clustering of
And after this command (taking de novo clustering as an example):
qiime vsearch cluster-features-de-novo \
--i-table table.qza \
--i-sequences rep-seqs.qza \
--p-perc-identity 0.99 \
--o-clustered-table table-dn-99.qza \
--o-clustered-sequences rep-seqs-dn-99.qza
We will get table-dn-99.qza, and rep-seqs-dn-99.qza.
My question is: what is the difference between table.qza and table-dn-99.qza, and between rep-seqs.qza and rep-seqs-dn-99.qza? And what if I stop the analysis after vsearch dereplicate-sequences, that is, use table.qza and rep-seqs.qza for the taxonomy analysis?
Thank you for your reply!
Haha, I found the solution in a useful topic.
Now I know that vsearch just does classic OTU clustering, which means I need to do clustering followed by chimera filtering and aggressive OTU filtering. Just like the picture shows:
But for DADA2 results, using table.qza and rep-seqs.qza directly for taxonomy analysis and diversity analysis is recommended!
This forum is really useful!
Looks like you figured this one out by yourself! This was a good question about vsearch, so I thought I would summarize your findings for new users.
table.qza and table-dn-99.qza: they are both tables of reads, grouped at either 100% identity (and no gaps!) or 99% identity.
rep-seqs.qza and rep-seqs-dn-99.qza: these are the representative sequences from the 100% and 99% tables.
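To make the "100% identical" case concrete, here is a rough Python sketch of what dereplication does conceptually. It is only a toy illustration, not vsearch's or QIIME 2's actual code; the reads and the `dereplicate` helper are made up for the example.

```python
from collections import Counter

# Hypothetical toy reads from a single sample (not real data).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGTACGT", "TTGGCCAA"]

def dereplicate(reads):
    """Collapse reads that are exactly identical (100% identity, no gaps)."""
    counts = Counter(reads)
    # "Table": each unique sequence mapped to how many reads it represents.
    table = dict(counts)
    # "Rep seqs": one copy of each unique sequence, most abundant first.
    rep_seqs = [seq for seq, _ in counts.most_common()]
    return table, rep_seqs

table, rep_seqs = dereplicate(reads)
for seq in rep_seqs:
    print(seq, table[seq])
```

Note that the read differing by a single base stays as its own feature here, because dereplication only merges exact matches.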
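You pose another great question: what if you stop after vsearch dereplicate-sequences and use table.qza and rep-seqs.qza for the taxonomy analysis?
You could! But vsearch dereplication does not remove noisy reads the way dada2 or clustering does, so you would likely end up with thousands of extra low-abundance reads that are just errors of real reads. But you could.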
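Here is a small Python sketch of why clustering helps with those low-abundance error reads. It is a simplified greedy, abundance-sorted clustering that only counts mismatches between equal-length sequences. Real vsearch uses pairwise alignment and its own heuristics, so treat the `identity` and `cluster_greedy` helpers and the toy sequences as assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, ungapped sequences."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_greedy(table, threshold):
    """Merge each sequence into the first more abundant centroid it matches."""
    clustered = {}
    # Visit sequences from most to least abundant, so real reads become centroids.
    for seq, count in sorted(table.items(), key=lambda kv: -kv[1]):
        for centroid in clustered:
            if identity(seq, centroid) >= threshold:
                clustered[centroid] += count
                break
        else:
            # No centroid was similar enough: this sequence starts a new cluster.
            clustered[seq] = count
    return clustered

# Hypothetical dereplicated table with 100-nt sequences (not real data).
real = "ACGT" * 25
error = real[:-1] + "A"            # one mismatch, so 99% identical to `real`
other = "TTGGCCAA" * 12 + "TTGG"   # an unrelated sequence
derep_table = {real: 50, error: 1, other: 20}

clustered_99 = cluster_greedy(derep_table, 0.99)
print(len(derep_table), "features after dereplication")
print(len(clustered_99), "features after 99% clustering")
```

In this toy case the single error read is absorbed into its parent at 99%, while it would remain a separate feature if you stopped at dereplication.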
Happy new year!
Thank you for your warm answers! They were helpful!