Hi @Nicholas_Bokulich and @arwqiime,
I finally got some time to look at your original problem. I answered some of your smaller questions earlier, when I didn't have access to my computer, which is why I initially skipped over your first set of questions. Sorry for the confusion.
Anyway, Nick was right that the first 100 reads are used to guess the read orientation. Most of the time this works, but clearly we haven't tested it with your custom reference data set. The outcome looks random because the first 100 reads of the combined data set (set C) are not the same reads as the first 100 of set A or set B. As it happens, they are just the right reads to fool the auto-orientation heuristic. The workaround is to force the orientation, as you have discovered.
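For example, here is a minimal sketch of what forcing the orientation looks like on the command line (the file names are placeholders, and it's worth checking `qiime feature-classifier classify-sklearn --help` in your version for the exact option name and choices):

```bash
# Force the orientation rather than letting the first-100-reads heuristic
# guess it; use 'reverse-complement' instead if your reads are flipped
# relative to the reference.
qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --p-read-orientation same \
  --o-classification taxonomy.qza
```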
As we found in our benchmarks, trimming the UNITE reads does not work spectacularly well. So my first recommendation is to stop trimming. In our tests the algorithm was quite robust to extraneous sequence from outside your primers being included in the reference data, and leaving it in should not add much to your classification time or memory overhead.
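For reference, training on the untrimmed, full-length reference would look roughly like this (a sketch only; unite-seqs.qza and unite-taxonomy.qza are placeholder names for your imported UNITE sequences and taxonomy):

```bash
# Train the naive Bayes classifier on the full-length UNITE sequences,
# without extracting the region between your primers first.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads unite-seqs.qza \
  --i-reference-taxonomy unite-taxonomy.qza \
  --o-classifier unite-classifier.qza
```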
My second recommendation is to use the full 99% UNITE data set. Is there a reason that you've restricted your data set to the 97% OTUs?
My third recommendation only occurred to me this afternoon, when I too ran into memory issues while running classify-sklearn. Try setting --p-reads-per-batch to a small number, say 1000. When your samples contain many reads (I hit this problem with > 500,000), large batches can exhaust memory. This is in addition to making sure that --p-n-jobs is not set too high.
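Putting the last two points together, the classification step would look something like this (again with placeholder file names; start with a single job and only raise --p-n-jobs if memory allows):

```bash
# Classify in small batches to keep memory use down; each extra job
# roughly multiplies the memory footprint, so keep --p-n-jobs modest.
qiime feature-classifier classify-sklearn \
  --i-classifier unite-classifier.qza \
  --i-reads rep-seqs.qza \
  --p-reads-per-batch 1000 \
  --p-n-jobs 1 \
  --o-classification taxonomy.qza
```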
Hope that helps,
Ben