Q2-clawback questions

hugh · November 15, 2018, 9:44pm

Hi,

Thanks heaps for putting up this tutorial! I have been keen to set taxonomic weights for my own projects. This is super helpful. I understand this is a work in progress, but I am a bit confused by some of the components.

First, in 'Assembling Weights', the fit-classifier-naive-bayes command takes ref-seqs-v4.qza as its reference reads, but the previous command outputs ref-seqs-150-v4.qza. I changed the fit-classifier input to ref-seqs-150-v4.qza and it worked. However, the same thing happens under 'Retrain the Classifier', where fit-classifier-naive-bayes again takes ref-seqs-v4.qza, which as far as I can tell is not created anywhere. In this section I assumed I should use ref-seqs-120-v4.qza, but then I got this error: Number of priors must match number of classes.

My guess is that the ref-seqs used here should match the ones used to create the uniform-classifier, and thus the weights; otherwise the taxa included will differ, hence the above error. If this is right, then I am not sure why the tutorial has a second extract-reads command for 120 bp, as the weights were derived from the 150 bp extract.

I apologize if I missed something and got it totally wrong. I have been very happy with Qiime2, and using Naive-Bayes, even without bespoke weights, has improved my results.

BenKaehler · November 20, 2018, 6:08am

Hi @hugh,

Thanks very much for your question. I'm really glad that you've had a good experience with the feature classifier.

Your guess was correct, and the mismatch was caused by a very slightly different primer pair being used to extract the 120 bp reference sequences.
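To make the mismatch concrete, here is a minimal sketch in plain Python, not the plugin's own code. It assumes the weights are keyed by taxonomy label and that the classifier's classes are the taxonomy labels of whichever reference reads survived extraction. The function name and the toy labels are hypothetical and only illustrate how a slightly different primer pair can drop a taxon, so that the number of priors no longer equals the number of classes.

```python
def compare_weights_to_classes(class_weights, reference_taxonomy):
    """Report taxa that appear on only one side.

    class_weights: dict mapping taxonomy label -> weight (the priors).
    reference_taxonomy: dict mapping reference sequence ID -> taxonomy label
        for the reads that were actually extracted.
    Returns (weighted taxa missing from the reference,
             reference taxa missing from the weights).
    """
    classes = set(reference_taxonomy.values())
    weighted = set(class_weights)
    return sorted(weighted - classes), sorted(classes - weighted)


# Hypothetical toy data: weights derived from one extraction (three taxa) ...
weights_from_first_extract = {
    "k__Bacteria; g__A": 0.5,
    "k__Bacteria; g__B": 0.3,
    "k__Bacteria; g__C": 0.2,
}
# ... and a second extraction in which the reference for g__C did not match
# the (slightly different) primers and so was dropped.
taxonomy_of_second_extract = {
    "seq1": "k__Bacteria; g__A",
    "seq2": "k__Bacteria; g__B",
}

missing_from_reference, missing_from_weights = compare_weights_to_classes(
    weights_from_first_extract, taxonomy_of_second_extract
)
n_priors = len(weights_from_first_extract)
n_classes = len(set(taxonomy_of_second_extract.values()))
print("priors:", n_priors, "classes:", n_classes)
print("weighted taxa absent from reference reads:", missing_from_reference)
print("reference taxa absent from weights:", missing_from_weights)
```

If either list is non-empty, the weights and the reference reads came from different sets of taxa, which is the situation that triggers the error above.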

For the purposes of the tutorial, and your purposes if you're just using V4, it isn't absolutely essential to truncate to 150 or 120 nt, so I have modified the tutorial to take out the truncation and only use one set of trimmed reference sequences.

It does, however, highlight a problem that I was about to encounter in a separate project anyway: how can we use weights derived from a dataset with one set of primers to classify reads derived from a different set of primers? I have raised an issue that I will close when I've written a method to automate a solution to that question. At that time I will also reintroduce the slightly different primers into the tutorial to demonstrate how it works.
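As a rough picture of what such a method has to do (this is only an illustrative assumption, not the automated solution referred to above), one naive approach would be to re-key the weights onto the taxa present in the new reference reads, give unseen taxa a small pseudo-weight, and renormalize. The function name, the `pseudo_weight` parameter, and the toy labels below are all hypothetical.

```python
def reindex_weights(class_weights, target_classes, pseudo_weight=1e-6):
    """Naively map weights onto a different set of classes.

    class_weights: dict mapping taxonomy label -> weight from the first primer set.
    target_classes: iterable of taxonomy labels present for the second primer set.
    pseudo_weight: assumed small weight for taxa that had no weight before.
    Taxa that are weighted but absent from target_classes are dropped.
    Returns a dict over target_classes whose values sum to 1.
    """
    target_classes = list(target_classes)
    if not target_classes:
        raise ValueError("target_classes must not be empty")
    raw = {taxon: class_weights.get(taxon, pseudo_weight) for taxon in target_classes}
    total = sum(raw.values())
    return {taxon: weight / total for taxon, weight in raw.items()}


# Hypothetical toy data: g__C is lost and g__D is gained with the new primers.
weights_primer_a = {"g__A": 0.5, "g__B": 0.3, "g__C": 0.2}
classes_primer_b = ["g__A", "g__B", "g__D"]

reindexed = reindex_weights(weights_primer_a, classes_primer_b)
print(reindexed)
```

After this step there is exactly one weight per class, so the counts agree, although whether dropping and padding taxa like this gives sensible priors is a separate question.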

Thanks,
Ben