Question about vsearch dereplicate

Mavis · January 6, 2020, 2:14pm

Hello, everyone. While I was reading about vsearch clustering today, a question came up.

When we run:

qiime vsearch dereplicate-sequences \
   --i-sequences seqs.qza \
   --o-dereplicated-table table.qza \
   --o-dereplicated-sequences rep-seqs.qza

We get a table.qza and a rep-seqs.qza. The tutorial then seems to ask us to do a subsequent clustering step.

We need to do clustering of FeatureTable[Frequency] and FeatureData[Sequence].

And after running these commands (taking de novo clustering as an example):

qiime vsearch cluster-features-de-novo \
      --i-table table.qza \
      --i-sequences rep-seqs.qza \
      --p-perc-identity 0.99 \
      --o-clustered-table table-dn-99.qza \
      --o-clustered-sequences rep-seqs-dn-99.qza

We get table-dn-99.qza and rep-seqs-dn-99.qza.

My question is: what is the difference between table.qza and table-dn-99.qza, and between rep-seqs.qza and rep-seqs-dn-99.qza? And what if I stop after vsearch dereplicate-sequences, meaning I use table.qza and rep-seqs.qza directly for the taxonomy analysis?

Thank you for your reply!

Mavis

Mavis · January 7, 2020, 2:46am

Haha, I found the solution in a useful topic.

Now I know that vsearch just does classic OTU clustering, which means I need to run clustering, followed by chimera filtering and aggressive OTU filtering, just as the picture shows.

But for DADA2 results, using table.qza and rep-seqs.qza directly for taxonomy analysis and diversity analysis is recommended!
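For anyone following the vsearch route, here is a rough sketch of what the steps after clustering (chimera filtering, then abundance filtering) could look like. It assumes the table-dn-99.qza and rep-seqs-dn-99.qza files from above. The output file names, the filter_otus helper, and the minimum frequency of 10 are hypothetical choices for illustration, not recommended values.

```shell
# Sketch only: chimera filtering and abundance filtering of the 99% OTUs.
# Output names and the MIN_FREQ default are assumptions, adjust to your data.
filter_otus() {
  # De novo chimera detection on the clustered features
  qiime vsearch uchime-denovo \
    --i-table table-dn-99.qza \
    --i-sequences rep-seqs-dn-99.qza \
    --output-dir uchime-dn-out || return 1

  # Keep only the features that uchime flagged as non-chimeric
  qiime feature-table filter-features \
    --i-table table-dn-99.qza \
    --m-metadata-file uchime-dn-out/nonchimeras.qza \
    --o-filtered-table table-dn-99-nonchimeric.qza || return 1
  qiime feature-table filter-seqs \
    --i-data rep-seqs-dn-99.qza \
    --m-metadata-file uchime-dn-out/nonchimeras.qza \
    --o-filtered-data rep-seqs-dn-99-nonchimeric.qza || return 1

  # Drop low-abundance OTUs (threshold is illustrative)
  qiime feature-table filter-features \
    --i-table table-dn-99-nonchimeric.qza \
    --p-min-frequency "${MIN_FREQ:-10}" \
    --o-filtered-table table-dn-99-filtered.qza || return 1

  # Keep the sequences in sync with the filtered table
  qiime feature-table filter-seqs \
    --i-data rep-seqs-dn-99-nonchimeric.qza \
    --i-table table-dn-99-filtered.qza \
    --o-filtered-data rep-seqs-dn-99-filtered.qza
}

# Run only when QIIME 2 and the clustered table are available
if command -v qiime >/dev/null 2>&1 && [ -f table-dn-99.qza ]; then
  filter_otus
else
  echo "QIIME 2 or table-dn-99.qza not found; skipping" >&2
fi
```

The filtered table and sequences would then be the inputs to taxonomy and diversity analysis. How aggressive the abundance filter should be depends on sequencing depth, so treat the threshold as something to tune.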

This forum is really useful!

colinbrislawn · January 8, 2020, 7:28pm

Hello Mavis,

Looks like you figured this one out by yourself! This was a good question about vsearch, so I thought I would summarize your finding for new users.

table.qza and table-dn-99.qza are both tables of reads. In table.qza, reads are grouped only if they are 100% identical (and no gaps!); in table-dn-99.qza, reads are grouped if they are 99% identical.

rep-seqs.qza and rep-seqs-dn-99.qza are the sequences from the 100% and 99% tables, respectively.
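One way to see the difference for yourself is to summarize both pairs of artifacts and compare the number of features in each. This is a minimal sketch, assuming the four file names used earlier in this thread; the compare_artifacts helper and the .qzv output names are just illustrative.

```shell
# Sketch only: build summaries of the 100% (dereplicated) and 99% (clustered)
# artifacts so their feature counts can be compared side by side.
compare_artifacts() {
  # Feature tables: table.qza (100%) vs table-dn-99.qza (99%)
  for name in table table-dn-99; do
    qiime feature-table summarize \
      --i-table "${name}.qza" \
      --o-visualization "${name}.qzv" || return 1
  done

  # Representative sequences for each table
  for name in rep-seqs rep-seqs-dn-99; do
    qiime feature-table tabulate-seqs \
      --i-data "${name}.qza" \
      --o-visualization "${name}.qzv" || return 1
  done
}

# Run only when QIIME 2 and the input artifacts are available
if command -v qiime >/dev/null 2>&1 && [ -f table.qza ]; then
  compare_artifacts
else
  echo "QIIME 2 or table.qza not found; skipping" >&2
fi
```

Opening the resulting .qzv files (for example with qiime tools view) should show fewer features in the 99% table than in the 100% table, with the same total number of reads.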

You pose another great question:

You could! But vsearch dereplication does not remove noisy reads the way dada2 or clustering does, so you would likely end up with thousands of extra, low-abundance reads that are just erroneous copies of real reads. But you could.

Happy new year!

Colin

Mavis · January 9, 2020, 12:28pm

Thank you for your warm answers! That's helpful!

Best wishes!
