I’m trying to build a bespoke classifier using V3-V4 primers. Because of a short timeline and a philosophy that it’s usually better to get someone else to do complex computation for you, I was hoping to use some of the weights from the Ready to Wear repository.
Will the full-length weights work if I trim to another overlapping region, or do I need to train my own weights?
Hey @jwdebelius,
I think you’ll want weights specific to your region.
On that note (though not directly answering your question), a while back I had a chat with @BenKaehler about adding more weights to readytowear, including V3-V4. Here is a summary of that conversation in case it helps or you find yourself in a contributing mood (and also as a public shaming of myself for not getting my portion done):
- Adding V3-V4 weights: Since we can’t grab denoised/processed V3-V4 data from Qiita, we would have to manually find public V3-V4 samples and denoise them with DADA2 before they can be used in clawback. Ben mentioned that at least ~120 samples per environment would be needed to see positive gains. They shouldn’t be biased samples from a specific disease either. A large heterogeneous mix would probably be ideal.
- There was a mention of automating clawback to work with SRA queries as well, though I’m guessing that needs some external momentum to get going.
- The existing V4 primers used for readytowear are based on the old EMP primers. It might be worth adding weights for the updated EMP primers (though I doubt this will have a big impact overall).
- Those primers were used because the pre-cooked classifiers on the QIIME 2 resource page also use the old EMP primers. @Nicholas_Bokulich and @SoilRotifer, any thoughts on updating the resource page or adding the new EMP primers to it (using RESCRIPt)?
- Update to include the newest GTDB (two new releases exist since release 89; the latest is Release 06-RS202, as of April 27, 2021).
Since the EMP website seems to be down at the moment, these are the old and new V4 primers I mentioned above (copied from the EMP website):
Updated sequences: 515F (Parada)–806R (Apprill), forward-barcoded:
Original sequences: 515F (Caporaso)–806R (Caporaso), reverse-barcoded:
I think then the specific answer to my question is to use the full-length classifier, because training region-specific weights seems like more than I want to do for a one-off project (see aforementioned laziness). I wondered whether one solution for other regions, which might at least represent some kind of average or midpoint, would be to filter the weights at a minimum, so that you could have a different set or subset of weights based on what was amplified.
Thanks very much @Mehrbod_Estaki for remembering that conversation. I note that the Caporaso primers are still used for the pretrained classifiers. Should we recommend that they update those?
@jwdebelius, there are two options I can think of that won’t take months of development. In my experience they will probably give fairly similar results.
- Just use full length sequences and full length readytowear weights. That is, don’t trim anything.
- Trim your reference db using your V3V4 primers then use it to build new weights for your habitat of choice.
I know you said you were too lazy for the second option, but it is probably just as easy as downloading weights from readytowear. You can probably do it with a single call to `clawback assemble-weights-from-Qiita`. You would have already done most of the steps in the tutorial.
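For what it’s worth, the trimming step in the second option boils down to finding the primer sites in each reference sequence and keeping what lies between them. Below is a minimal Python sketch of that idea, not the code any plugin actually runs. The function names are hypothetical, the 341F/805R primer pair is only an assumed example of a V3-V4 pair (substitute your own), and it uses exact matching with no mismatch tolerance. In practice a tool such as `qiime feature-classifier extract-reads` would normally handle this for you.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate bases.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(seq, fwd_primer, rev_primer):
    """Return the region between the primers (primers removed),
    or None if either primer site is not found."""
    seq = seq.upper()
    fwd = re.search(primer_regex(fwd_primer), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    # The reverse primer anneals to the opposite strand, so search for
    # its reverse complement downstream of the forward primer.
    rev = re.search(primer_regex(reverse_complement(rev_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Assumed example primers (a commonly used V3-V4 pair); use your own here.
FWD_341F = "CCTACGGGNGGCWGCAG"
REV_805R = "GACTACHVGGGTATCTAATCC"

# Toy "reference" sequence: flank + forward site + insert + reverse site + flank.
toy = (
    "AAAA"
    + "CCTACGGGAGGCAGCAG"
    + "TTGACCTTGG"
    + reverse_complement("GACTACAAGGGTATCTAATCC")
    + "CCCC"
)
print(extract_amplicon(toy, FWD_341F, REV_805R))  # prints TTGACCTTGG
```

Sequences where either primer site is missing return `None`, which mirrors the fact that a trimmed reference db usually ends up with fewer sequences than the full-length one.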
It’s up to you, though, and it probably won’t make much difference.
I hope that helps.