Many questions about clustering in the qiime vsearch plugin

colinbrislawn · November 27, 2023, 9:20pm

P1: on dereplication and removing features

qiime vsearch dereplicate-sequences \
  --i-sequences total.qza \
  --p-min-seq-length 50 \
  --p-min-unique-size 10 \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza

Plugin error from vsearch:
Mapping not provided for observation identifier: D1_10122. If this identifier should not be updated, pass strict=False.
Debug info has been saved to /tmp/qiime2-q2cli-err-62rjm8jn.log

Dereplication is used to reduce data size before clustering.
Typically, this is a lossless process for features; all unique reads become unique features in the output table.
In this example, it's a lossy process; features are discarded if they are too short (<50 bp) or too rare (total count <10). This causes the error about missing feature identifiers.
Keeping all features, i.e. running the command without --p-min-seq-length and --p-min-unique-size, fixes the error.
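A toy model may help show why the filters break the ID mapping. The shell sketch below is not how q2-vsearch is implemented. It assumes a hypothetical tab-separated input of read ID and sequence, collapses identical sequences into one feature, and lists the input IDs whose sequence is dropped by the length or abundance filter. `MIN_LEN`, `MIN_SIZE` and `derep_report` are hypothetical names standing in for --p-min-seq-length and --p-min-unique-size. Every ID it prints has no feature left to map to, which is the same kind of gap the error message reports.

```shell
#!/usr/bin/env bash
# Toy model of lossy dereplication; NOT the q2-vsearch implementation.
# Input (hypothetical format): one read per line, "read_id<TAB>sequence".
set -euo pipefail

# Hypothetical stand-ins for --p-min-seq-length and --p-min-unique-size
MIN_LEN=5
MIN_SIZE=2

# derep_report MIN_LEN MIN_SIZE < reads.tsv
# Prints "unmapped <read_id>" for every read whose unique sequence is
# discarded, i.e. reads that would have no feature left to map to.
derep_report() {
  awk -v min_len="$1" -v min_size="$2" -F '\t' '
    {
      seq_of[$1] = $2   # remember which sequence each read carries
      count[$2]++       # abundance of each unique sequence
    }
    END {
      for (id in seq_of) {
        s = seq_of[id]
        if (length(s) < min_len || count[s] < min_size)
          print "unmapped", id
      }
    }' | sort
}

# Toy data: one abundant sequence, one rare sequence, one short sequence
printf 'r1\tACGTACGT\nr2\tACGTACGT\nr3\tACGTACGT\nr4\tTTGCA\nr5\tACG\nr6\tACG\n' |
  derep_report "$MIN_LEN" "$MIN_SIZE"
# prints:
# unmapped r4   (TTGCA seen once, below MIN_SIZE)
# unmapped r5   (ACG shorter than MIN_LEN)
# unmapped r6
```

With both thresholds set to 1, nothing is filtered and the function prints nothing, matching the lossless case where every unique read becomes a feature.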

P2: pickling??

This pickling error can happen when the device runs out of space. Databases are pretty big, so make sure you have plenty of free space on the computer or worker node!
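Before rerunning, it may be worth checking how much space is actually free. This sketch assumes QIIME 2 writes its temporary files under `$TMPDIR`, falling back to `/tmp` when that is unset, and also checks the current working directory where outputs land. The `free_gib` helper is a hypothetical name; only `df` and `awk` are standard tools.

```shell
#!/usr/bin/env bash
# Report free disk space (whole GiB) for the places QIIME 2 is likely to write.
# Assumption: temporary files go to $TMPDIR, or /tmp when TMPDIR is unset.
set -euo pipefail

# free_gib DIR: print the free space of the filesystem holding DIR, in whole GiB
free_gib() {
  # -P: one line per filesystem (POSIX format), -k: sizes in 1024-byte blocks;
  # the 4th column of the data row is the available space
  local avail_kb
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  echo $(( avail_kb / 1024 / 1024 ))
}

for dir in "${TMPDIR:-/tmp}" "$PWD"; do
  echo "$dir: $(free_gib "$dir") GiB free"
done
```

If either number is small compared with the size of the reference database, freeing space or pointing TMPDIR at a larger disk (e.g. `export TMPDIR=/path/with/space`, a placeholder path) is a reasonable next step.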
To investigate further, can you post the log file from the line that starts with "Debug info has been saved to..."?