Vsearch plugin error

ebolyen · June 10, 2019, 8:09pm

Thank you very much for sharing your data (I have a copy now so you may delete the drive folder if you wish).

I think I know what is happening. Your representative sequences from DADA2 are not only very few (which suggests something may have gone wrong there), but also very short. In fact, a few of them (at least 318a6e710e5f1b26a6b467882529143e and f5799ce61d170ba63291b8a68ad67fd3) are shorter than 32 bases.

This is important because, when clustering, dereplicating, or searching, vsearch automatically filters out any sequences shorter than 32 bases:

  --minseqlength INT          min seq length (clust/derep/search: 32, other:1)
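If you want to check your own representative sequences against that threshold, here is a minimal Python sketch. It is not part of QIIME 2 or vsearch; the function names `read_fasta` and `short_sequences` and the `hypothetical_long_read` record are made up for illustration, and it assumes the sequences have been exported to a plain FASTA file. The only real data in it is the short read from this thread.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Default taken from the vsearch help text quoted above (clust/derep/search).
MIN_SEQ_LENGTH = 32


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted lines."""
    seq_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def short_sequences(lines: Iterable[str],
                    min_length: int = MIN_SEQ_LENGTH) -> Dict[str, int]:
    """Map the ID of every sequence shorter than min_length to its length."""
    return {seq_id: len(seq)
            for seq_id, seq in read_fasta(lines)
            if len(seq) < min_length}


# The first record is the read from this thread; the second is invented.
example = [
    ">318a6e710e5f1b26a6b467882529143e",
    "TCCGTAGGTGAAGAACGCAGC",
    ">hypothetical_long_read",
    "ACGT" * 10,
]
print(short_sequences(example))
# {'318a6e710e5f1b26a6b467882529143e': 21}
```

For a real file, passing an open file handle works the same way, for example `short_sequences(open("dna-sequences.fasta"))`, where the filename is whatever your export produced.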

In QIIME 2, we don't have this parameter exposed, and we've never run into this situation before, so our code expects all of the reads we provide to vsearch to be returned (in a clustered state). Since some of your reads are too short, vsearch removes them, and when QIIME 2 tries to find them again, we get that strange KeyError you keep seeing.
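To make the failure mode concrete, here is a toy Python sketch of the pattern described above. It is not the actual QIIME 2 plugin code and it does not call vsearch; `toy_cluster`, `relabel_strict`, and `missing_ids` are hypothetical names, and the "clustering" step simply drops short inputs the way a length filter would.

```python
MIN_SEQ_LENGTH = 32  # same default as in the vsearch help text


def toy_cluster(seqs, min_length=MIN_SEQ_LENGTH):
    """Hypothetical stand-in for a clustering tool.

    Silently drops inputs shorter than min_length and maps every
    surviving ID to itself as its own cluster.
    """
    return {seq_id: seq_id
            for seq_id, seq in seqs.items()
            if len(seq) >= min_length}


def relabel_strict(seqs, clusters):
    """Assume every input ID comes back; raises KeyError if one does not."""
    return {seq_id: clusters[seq_id] for seq_id in seqs}


def missing_ids(seqs, clusters):
    """Report which input IDs never came back from clustering."""
    return sorted(set(seqs) - set(clusters))


seqs = {
    "318a6e710e5f1b26a6b467882529143e": "TCCGTAGGTGAAGAACGCAGC",
    "hypothetical_long_read": "ACGT" * 10,  # invented record
}
clusters = toy_cluster(seqs)

try:
    relabel_strict(seqs, clusters)
except KeyError as err:
    print("KeyError for", err)

print(missing_ids(seqs, clusters))
```

Comparing the input and output IDs up front, as `missing_ids` does, is one way such a mismatch could be reported with a clearer message than a bare KeyError.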

I think this is something we need to fix in QIIME 2, but fixing it won't help with your larger problem of very short reads (and unusually few unique ASVs).

Could you elaborate on your sequencing technology, and where these samples come from? Perhaps this is expected, and clustering is just not the best approach here, but the read in question:

>318a6e710e5f1b26a6b467882529143e
TCCGTAGGTGAAGAACGCAGC

seems, at only 21 bases, a little too short for me to believe. Does anyone else have suggestions for running DADA2 on ITS data?