I noticed that sometimes, depending on the number of reads, the plugin “qiime feature-classifier classify-sklearn” produces an output in which most of the features are “unidentified” and the few that are classified fall into only two alternative classifications.
In particular, I get very different results when I use reads from different dada2 denoising runs. For example, denoised reads with a total of 1585260 reads (for 90 samples) and 1157 different features were classified as follows:
Ah, this is becoming more clear. While your --i-classifier is the same, changes in your dada2 --p-trunc-len-r will cause changes in truncation, which can cause changes in joining, that propagate downstream.
Have you compared your two --o-denoising-stats files to see how many reads were able to join, and how long, on average, the joined reads were?
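If it helps, here is a minimal Python sketch of that comparison. It assumes each stats artifact has been exported to a TSV and that the columns are named `sample-id`, `input` and `merged` (that is how I remember the exported file, so check your own header). The numbers in `RUN_A` and `RUN_B` are made up purely for illustration. Note that the stats table only tells you how many reads joined; the lengths of the joined reads would have to come from the representative sequences instead.

```python
import csv
import io


def merge_summary(stats_tsv):
    """Summarise a DADA2 stats table given as TSV text.

    Returns (total input reads, total merged reads, percent merged).
    """
    reader = csv.DictReader(io.StringIO(stats_tsv), delimiter="\t")
    total_input = 0
    total_merged = 0
    for row in reader:
        # Skip directive rows such as "#q2:types" found in QIIME 2 metadata files.
        if row["sample-id"].startswith("#"):
            continue
        total_input += int(float(row["input"]))
        total_merged += int(float(row["merged"]))
    pct = 100.0 * total_merged / total_input if total_input else 0.0
    return total_input, total_merged, pct


# Made-up tables for two hypothetical runs that differ only in --p-trunc-len-r.
RUN_A = (
    "sample-id\tinput\tmerged\n"
    "#q2:types\tnumeric\tnumeric\n"
    "s1\t1000\t800\n"
    "s2\t1000\t700\n"
)
RUN_B = (
    "sample-id\tinput\tmerged\n"
    "#q2:types\tnumeric\tnumeric\n"
    "s1\t1000\t400\n"
    "s2\t1000\t500\n"
)

for name, table in (("run A", RUN_A), ("run B", RUN_B)):
    total_in, total_merged, pct = merge_summary(table)
    print(f"{name}: {total_merged}/{total_in} reads merged ({pct:.1f}%)")
```

A large drop in the merged percentage between the two runs would suggest that one truncation setting leaves too little overlap for the reads to join.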
I compared the two --o-denoising-stats files. After dada2 denoising and filtering, I obtained the following numbers of non-chimeric sequences. It seems that truncating my reverse reads at 170 yields more sequences, perhaps because this keeps only the better-quality portion of the reverse reads.
This is not a bug; it is most likely due to mixed read orientations. Trimming to different lengths is probably changing which sequences are included, or the order of the sequences, that are used for read orientation prediction. See here for an explanation:
Some of your reads are being poorly classified no matter the direction, so it could also be that there are many non-target reads in your samples that are interfering with the orientation detector (since the orientation is chosen based on match to a reference sequence).
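To make the idea concrete, here is a toy Python sketch of how an orientation vote over the first few reads can flip depending on which reads are included. This is not the plugin's actual detector: k-mer overlap with a single reference is a stand-in for however classify-sklearn really scores a match, and the function names, sequences and the `n` cutoff are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def guess_orientation(reads, reference, k=4, n=100):
    """Vote on orientation using only the first n reads (toy stand-in)."""
    ref_kmers = kmers(reference, k)
    score = 0
    for read in reads[:n]:
        same = len(kmers(read, k) & ref_kmers)
        rc = len(kmers(reverse_complement(read), k) & ref_kmers)
        # Positive values favour "same", negative favour "reverse-complement".
        score += same - rc
    return "same" if score >= 0 else "reverse-complement"


# Hypothetical reference and reads: one read matches as-is, two match only
# after reverse-complementing.
REFERENCE = "AAAACCCCAAAACCCC"
READS = ["AAAACCCC", "GGGGTTTT", "GGGGTTTT"]

print(guess_orientation(READS, REFERENCE, n=1))
print(guess_orientation(READS, REFERENCE, n=3))
```

With only the first read considered the vote goes one way, and with all three it goes the other, which is the kind of sensitivity to inclusion and order described above. Reads that match the reference poorly in both directions would contribute little or noisy signal to such a vote.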
You can also specify the read orientation yourself with --p-read-orientation (same or reverse-complement) if this is known and you do not want classify-sklearn to choose for you.