Hi Everyone!
@cjone228 and I want to clarify this point - we just realized that using the full-length sklearn classifier is not advised, because the pre-trained sklearn classifier doesn't do well with mixed-orientation reads (see post). We were getting only ~60-70% classification with both Silva and GG, and mixed-orientation reads are likely the culprit. We will therefore try to move forward with classify-consensus-vsearch. Just found this post - super helpful! @KMaki, have you gotten the vsearch consensus classifier to work? If so, do you mind sharing your final code?
Investigating this hurdle raised another question: do we need to deal with our mixed-orientation reads before the taxonomy steps? (Just a reminder that IT produces single-end reads, but each sample includes reads from 12 different pooled amplicons: 6 F primers and 6 R primers.) @Jen_S posted on this topic, and @Nicholas_Bokulich recommended separating the F reads from the R reads prior to denoising, denoising them separately, and then merging. Is this something we should do? Could mixed-orientation reads be part of why only ~70% of our sequences are passing the dada2 filter? (See the discussion between @cjone228 and @Jen_S on that below.)
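To make the question concrete, here is a rough sketch of what we understand "separating F from R reads" to mean. It is only an illustration, not a tested pipeline step: the function names and the primer sequences are hypothetical placeholders (not the real kit primers), it only checks for an exact primer match at the start of each read, and it does not handle degenerate bases or sequencing errors.

```python
# Hypothetical sketch: bin single-end reads by the primer they start with.
# All names and primer sequences are placeholders for illustration only.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_by_orientation(reads, fwd_primers, rev_primers):
    """Sort (name, sequence) pairs into forward, reverse, and unknown bins.

    A read goes to "forward" if it starts with any forward primer, to
    "reverse" if it starts with any reverse primer, and to "unknown"
    otherwise. Matching is exact, so real data would need fuzzier matching.
    """
    bins = {"forward": [], "reverse": [], "unknown": []}
    fwd = tuple(fwd_primers)
    rev = tuple(rev_primers)
    for name, seq in reads:
        if seq.startswith(fwd):
            bins["forward"].append((name, seq))
        elif seq.startswith(rev):
            bins["reverse"].append((name, seq))
        else:
            bins["unknown"].append((name, seq))
    return bins


# Toy example with made-up reads and primers.
example_reads = [
    ("read1", "ACGTTTGA"),
    ("read2", "GGCCAATT"),
    ("read3", "TTTTTTTT"),
]
example_bins = split_by_orientation(example_reads, ["ACGT"], ["GGCC"])
```

If this is the right idea, each bin would be denoised on its own. Whether the reverse bin should also be reverse-complemented (for example with a helper like `reverse_complement` above) before merging is part of what we are unsure about.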
Thanks much!
Lauren