Dear QIIME developers, other QIIME users and @Adam_Rivers!
We have some fungal data from human samples, and we are trying to compare different trimming methods. ITSxpress seems to do well in terms of the number of ASVs and taxonomic assignment. However, some ASVs are assigned only to Fungi at kingdom level with nothing below that, and some are assigned to Fungi at kingdom level and then "unidentified". We blasted these ASVs using https://blast.ncbi.nlm.nih.gov/Blast.cgi and https://www.ncbi.nlm.nih.gov/sites/batchentrez, and found some worrying results. For instance, Candida (albicans, dubliniensis, tropicalis) and Cyberlindnera were present. In the Trimmomatic run, these were assigned down to species level. We therefore compared the sequences from ITSxpress and Trimmomatic; the BLAST result is shown below (a screenshot):
“full” is Trimmomatic, and “itsxpress” is ITSxpress. 18S should be from <1…190, ITS1 from 191…330, and 5.8S from 331…487.
The Trimmomatic sequence spans the whole ITS1 region. ITSxpress seems to trim too much of the ITS1 region, resulting in a sequence ranging from position 60 to 255. When we aligned the two sequences, the ITSxpress sequence matched only the first half of the Trimmomatic sequence; the second half had been trimmed away.
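To make the coordinate comparison concrete, here is a minimal Python sketch of the overlap arithmetic. It assumes that the BLAST hit positions (60 to 255) are in the same 1-based, inclusive coordinate system as the region annotation quoted above, and it treats the partial start "<1" as position 1. The names `REGIONS`, `overlap` and `coverage` are only illustrative and are not part of ITSxpress or QIIME 2.

```python
# Annotated regions (1-based, inclusive), as quoted in the post.
# Assumption: the partial start "<1" of 18S is treated as position 1.
REGIONS = {"18S": (1, 190), "ITS1": (191, 330), "5.8S": (331, 487)}


def overlap(a_start, a_end, b_start, b_end):
    """Number of positions shared by two inclusive, 1-based intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def coverage(hit_start, hit_end, regions=REGIONS):
    """Map each region name to (positions covered by the hit, region length)."""
    result = {}
    for name, (start, end) in regions.items():
        shared = overlap(hit_start, hit_end, start, end)
        result[name] = (shared, end - start + 1)
    return result


if __name__ == "__main__":
    # Assumption: the ITSxpress sequence aligns to positions 60..255.
    for name, (shared, length) in coverage(60, 255).items():
        print(f"{name}: {shared} of {length} positions ({shared / length:.0%})")
```

Under these assumptions, the ITSxpress sequence covers 65 of the 140 annotated ITS1 positions and 131 positions of 18S, which matches what we saw in the alignment.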
We used ITSxpress in QIIME 2 with taxa F, region ITS1, and the default cluster ID (0.995). We are now rerunning with cluster ID 1.0.
Why does ITSxpress trim so much of the ITS1 end?
Please also explain the difference between cluster ID 0.995 and 1.0.