Hello @Rob_DNA ,
There is a lot to unpack here. I'm not an expert, but my first steps in metataxonomy have been precisely with ITS sequences, so I will share my thoughts on the points where I feel "confident" enough to discuss them.
Agreed. Since ITS sequences vary notably in length (they are indel-rich regions), I use ITSxpress for two purposes:
- The original purpose described in the tutorial:
- My other purpose: dynamic quality filtering. Because ITS sequences vary in length, I don't want to truncate them to a fixed length. Instead, I take advantage of the fact that ITSxpress trims away the sections that do not belong to the ITS region, which turn out to be the low-quality sections. As a result, DADA2 works much better without me having to specify any further truncation or trimming parameters. I don't know if this is the best way to do things, but it is the way that works best for me.
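
To make the point about fixed-length truncation concrete, here is a toy Python sketch. It is not part of ITSxpress or DADA2: the function `truncate_fixed` and the read lengths are made up for illustration. It only mimics the documented DADA2 behaviour that reads shorter than the truncation length are discarded and that a truncation length of 0 disables truncation.

```python
# Toy illustration only: hypothetical read lengths, not real data.
# truncate_fixed is a made-up helper that mimics fixed-length truncation.

def truncate_fixed(read_lengths, trunc_len):
    """Return the lengths of the reads that survive truncation.

    Reads shorter than trunc_len are discarded; longer reads are cut
    to exactly trunc_len. A trunc_len of 0 means "do not truncate".
    """
    if trunc_len == 0:
        return list(read_lengths)
    return [trunc_len for n in read_lengths if n >= trunc_len]


# Hypothetical lengths of ITS reads after the flanking regions were trimmed.
its_lengths = [150, 180, 220, 260, 310]

kept_fixed = truncate_fixed(its_lengths, 240)  # fixed truncation at 240 nt
kept_none = truncate_fixed(its_lengths, 0)     # no truncation at all

print(len(kept_fixed), "of", len(its_lengths), "reads survive truncation at 240")
print(len(kept_none), "of", len(its_lengths), "reads survive with no truncation")
```

With these made-up lengths, the fixed cut-off throws away every read from a short ITS variant, whereas skipping truncation keeps them all. That is why, in this workflow, the trimming is left to ITSxpress.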
I am an advocate of the idea that for ITS sequences we should use ASVs without further clustering. On this note, I really like Colin's answer in the post you cite.
Although I also use Colin's pre-trained classifiers, I'm afraid I cannot help here, since I assign taxonomy to unclustered ASVs. However, in case it is useful, I'm currently using the 99%, all-eukaryotes database, in the version without "s"¹.
--
¹ Even after reading the release descriptions on the UNITE webpage, I'm not sure what the difference is between the versions with and without "s".