Taxonomy assignment using bespoke weights on V3-V4

Ravenclaw · February 21, 2019, 12:28pm

Hi
I'd like to try using q2-clawback to assemble taxonomic weights. In the tutorial you assemble weights from Qiita. But my Illumina sequence data is from the V3-V4 region, and I don't see that in Qiita. What do you recommend I use to assemble weights? Or should I trim my data to just V4?

Nicholas_Bokulich · February 21, 2019, 2:13pm

Hi @Ravenclaw,

You can use q2-clawback's summarize-Qiita-metadata-category-and-contexts action, or query Qiita directly using redbiom, to see which "contexts" are available (these detail the sequence domains available). It looks like there are several thousand V3-V5 samples present, and a smaller number of V3-V4 samples. So you have a few options:

Use the V3-V5 samples from Qiita (if they are appropriate sample types), and train a V3-V5 classifier.
Use a custom collection of samples (e.g., from outside of Qiita) to assemble taxonomic weights.
Trim your reads to V4, though I agree that is a very unappealing option.
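
For the first option, a minimal Python sketch of picking out the contexts for one region from a list of context names (for example, the list that `redbiom summarize contexts` reports). The context names below are made up for illustration, and the sketch assumes names contain a "16S-<region>-<length>nt" field; `contexts_for_region` is a hypothetical helper, not part of redbiom or q2-clawback.

```python
import re

# Illustrative context names only (hypothetical IDs); replace them with the
# real list reported by redbiom or the q2-clawback summary.
EXAMPLE_CONTEXTS = [
    "Deblur-Illumina-16S-V4-150nt-aaaaaa",
    "Deblur-Illumina-16S-V3-V4-150nt-bbbbbb",
    "Pick_closed-reference_OTUs-Greengenes-Illumina-16S-V3-V5-100nt-cccccc",
    "Pick_closed-reference_OTUs-Greengenes-Illumina-16S-V4-100nt-dddddd",
]


def contexts_for_region(contexts, region):
    """Return the context names whose region field equals `region`.

    Assumes (not guaranteed) that each name contains "16S-<region>-<N>nt".
    The match is case-insensitive.
    """
    # Anchor the region between "16S-" and the trim length so that "V4"
    # does not also match "V3-V4".
    pattern = re.compile(r"16S-" + re.escape(region) + r"-\d+nt", re.IGNORECASE)
    return [name for name in contexts if pattern.search(name)]


if __name__ == "__main__":
    for region in ("V3-V4", "V3-V5", "V4"):
        print(region, contexts_for_region(EXAMPLE_CONTEXTS, region))
```

Anchoring on the trim length is a deliberate choice here: a plain substring test for "V4" would also pick up V3-V4 contexts.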

We are working on some solutions to make this easier in the future, e.g., use V4 class weights for any other domain, but right now it's complicated.

cc: @BenKaehler

Ravenclaw · February 23, 2019, 5:57pm

I assume the samples from Qiita have to have been processed the same way as my samples (i.e., Greengenes 97% OTUs vs. Deblur), is that correct?

Nicholas_Bokulich · February 23, 2019, 7:06pm

That sounds about right. The Qiita context info gives some of those details. You can also check those Qiita studies manually to see how the reads were processed (though theoretically that should be unnecessary; the context is all you need).
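
As a rough illustration of reading those details off a context name, here is a small Python sketch. It assumes names look like "<method>-Illumina-16S-<region>-<length>nt-<id>", which may not hold for every context; the example name and the `parse_context` helper are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ContextInfo(NamedTuple):
    method: str  # processing method, e.g. "Deblur"
    region: str  # variable region, e.g. "V3-V4"
    trim: str    # trim length, e.g. "150nt"


# Assumed naming pattern (an assumption, not a documented guarantee):
#   <method>-Illumina-16S-<region>-<length>nt-<id>
_CONTEXT_RE = re.compile(
    r"^(?P<method>.+?)-Illumina-16S-(?P<region>V\d(?:-V\d)?)-(?P<trim>\d+nt)-",
    re.IGNORECASE,
)


def parse_context(name: str) -> Optional[ContextInfo]:
    """Split a context name into method, region, and trim length.

    Returns None when the name does not follow the assumed pattern.
    """
    match = _CONTEXT_RE.match(name)
    if match is None:
        return None
    return ContextInfo(match["method"], match["region"].upper(), match["trim"])


if __name__ == "__main__":
    # Hypothetical context name, for illustration only.
    print(parse_context("Deblur-Illumina-16S-V3-V4-150nt-bbbbbb"))
```

Comparing the `method` field against how your own reads were processed (Deblur vs. closed-reference OTU picking) is then a quick sanity check before assembling weights.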