Hey @jwdebelius,
I think you’ll want weights specific to your region.
On that note (though not directly answering your question), a while back I had a chat with @BenKaehler about adding more weights to readytowear, including V3-V4. Here is a summary of that conversation in case it helps, or in case you find yourself in a contributing mood (it also serves as a public shaming of myself for not getting my portion done):
- Adding V3-V4 weights: Since we can’t grab denoised/processed V3-V4 data from Qiita, we would have to manually find public V3-V4 samples and denoise them with DADA2 before they can be used in clawback. Ben mentioned that at least ~120 samples per environment would be needed to see positive gains. The samples also shouldn’t be biased toward a specific disease; a large, heterogeneous mix would probably be ideal.
- There was a mention of automating clawback to work with SRA queries as well, though I’m guessing that needs some external momentum to get going.
- The existing V4 primers used for readytowear are based on the old EMP primers, so it might be worth adding additional ones built with the updated EMP primers (though I doubt this will have a big impact overall).
- Those primers were used because the pre-cooked classifiers on the QIIME 2 resource page also use the old EMP primers. @Nicholas_Bokulich and @SoilRotifer, any thoughts on updating the resource page or adding the new EMP primers to it (using RESCRIPt)?
- Update to include the newest GTDB (two new releases exist since release 89; the latest is Release 06-RS202, as of April 27, 2021).
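For the V3-V4 point above, here is a rough sketch of how one might check which environments have enough public samples before bothering to denoise them. This is not part of clawback or readytowear; the function name, the `environment` field, and the toy data are all made up for illustration, and the 120 cutoff is just the ballpark Ben mentioned.

```python
from collections import Counter

# Ballpark threshold from the conversation above, not a hard rule.
MIN_SAMPLES_PER_ENV = 120


def environments_with_enough_samples(samples, min_samples=MIN_SAMPLES_PER_ENV):
    """Return {environment: sample_count} for environments at or above the threshold.

    `samples` is an iterable of dicts with a hypothetical 'environment' key,
    e.g. rows parsed from whatever metadata you collected for public V3-V4 runs.
    """
    counts = Counter(sample["environment"] for sample in samples)
    return {env: n for env, n in counts.items() if n >= min_samples}


# Toy metadata: 150 stool samples and 40 soil samples (invented numbers).
samples = [{"environment": "stool"}] * 150 + [{"environment": "soil"}] * 40
usable = environments_with_enough_samples(samples)
print(usable)
```

With the toy data only the stool set clears the cutoff. The check says nothing about whether the samples are a heterogeneous mix, so disease-specific cohorts would still need to be screened by hand.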
Since the EMP website seems to be down at the moment, these are the old and new V4 primers I mentioned above (copied from the EMP website):
Updated sequences: 515F (Parada)–806R (Apprill), forward-barcoded:
Original sequences: 515F (Caporaso)–806R (Caporaso), reverse-barcoded: