Which version of the UNITE database should I choose?

sixvable · October 19, 2019, 6:47am

Recently I have been trying to analyze my ITS amplicon sequence data, and I found something a little confusing about the UNITE database.
UNITE distributes two different types of its fungal database in every format, not only the QIIME pre-formatted release:

Includes singletons set as RefS (in dynamic files).
Includes global and 97% singletons.

I have read the README documents in UNITE, but they do not mention why two versions are posted.
I do know what singleton, RefS, and dynamic files mean individually, but I cannot figure out what they mean together.
Is anyone familiar with UNITE? Which one should I normally choose?

yangzy51 · February 19, 2024, 9:16am

May I ask whether you figured this out? I ran into the same problem.

colinbrislawn · February 23, 2024, 8:03pm

Here is my understanding.
(I am not part of the UNITE dev team, so my understanding may be incomplete.)

Context: the UNITE database is clustered

The UNITE database is distributed at three clustering levels:

99%
97%
'dynamic'

Some clusters in UNITE are chosen manually by the devs and others are included automatically.

refs = this is a manually designated RefS
(reps = this is an automatically chosen RepS)
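To see how many sequences in a given file are manually designated versus automatically chosen, you could count the refs/reps labels in the FASTA headers. This is only a rough sketch: it assumes the headers are pipe-delimited and carry a literal `refs` or `reps` field, and the example headers and the `count_designations` helper are made up for illustration. Check the headers in your own download before relying on it.

```python
from collections import Counter


def count_designations(headers):
    """Count 'refs' vs 'reps' labels in pipe-delimited FASTA headers.

    Assumption: each header contains a field that is exactly 'refs' or
    'reps'. Headers without such a field are counted as 'unknown'.
    """
    counts = Counter()
    for header in headers:
        fields = header.lstrip(">").strip().split("|")
        for field in fields:
            if field in ("refs", "reps"):
                counts[field] += 1
                break
        else:
            # No refs/reps field found in this header.
            counts["unknown"] += 1
    return counts


# Hypothetical headers, not copied from a real UNITE release.
example_headers = [
    ">Species_a|ACC0001|SH0000001.00FU|refs|k__Fungi",
    ">Species_b|ACC0002|SH0000002.00FU|reps|k__Fungi",
    ">Species_c|ACC0003|SH0000003.00FU|reps|k__Fungi",
]

designation_counts = count_designations(example_headers)
print(dict(designation_counts))
```

Running the same count on both downloads would show whether the extra sequences in the larger file are mostly automatic RepS.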

The problem: 'singleton' clusters

Most output clusters represent multiple input reads, but some 'singleton' output clusters represent only one input read!

(Do you want this in your database? If a word is spelled diffffffferently, is it wrong or novel?)

The choice: do you want to include singleton clusters?

There are a bunch of automatic RepS clusters that represent a single read, and you may not want to include these 'singleton' clusters.

"Includes singletons set as RefS (in dynamic files)."
Singletons have been removed from 99% and 97%.

"Includes global and 97% singletons."
Includes all the singletons!
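Here is a toy sketch of how I read the two options. It is not UNITE's actual pipeline: the `Cluster` class, the `keep_cluster` function, and the example clusters are all hypothetical, and the rule "keep a singleton only if it is a manually designated RefS" is my interpretation of the first file description.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    n_members: int      # number of input sequences in the cluster
    designation: str    # "refs" (manual) or "reps" (automatic)


def keep_cluster(cluster, include_all_singletons):
    """Decide whether a cluster stays in the release (assumed rule)."""
    if cluster.n_members > 1:
        return True                      # non-singletons are always kept
    if include_all_singletons:
        return True                      # second option: keep every singleton
    return cluster.designation == "refs"  # first option: only manual RefS singletons


# Hypothetical clusters for illustration only.
clusters = [
    Cluster("c1", 12, "reps"),
    Cluster("c2", 1, "refs"),
    Cluster("c3", 1, "reps"),
]

refs_only = [c.name for c in clusters if keep_cluster(c, include_all_singletons=False)]
everything = [c.name for c in clusters if keep_cluster(c, include_all_singletons=True)]
print(refs_only)
print(everything)
```

The only cluster that differs between the two lists is the automatic singleton, which is exactly the kind of sequence you need to decide whether to trust.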

I would also love the UNITE team to clarify this.