Reference db compatibility SILVA 138 - 138.1

Hello,
SILVA v138.1 differs from v138 in only a few entries (1333 added or removed, according to the diff file). Since this is only a tiny fraction of the SILVA references, would it be OK to retrain the v138.1 sequences and taxonomies (prepared following the RESCRIPt tutorial) with the ready-made taxonomic weights from the inventory on GitHub? I assume the inventory data were based on v138.
Thanks for this great addition!

Hi @arwqiime, I'm not sure I understand the question - are you asking if we can update some resource for you? Or are you looking for specific instructions on how to do something? Can you please clarify?

Hi @thermokarst, no, I was not asking for updated resources or for instructions on RESCRIPt and clawback. Your tutorials are great. Maybe I should add @BenKaehler here.
Let me clarify: I have processed the SILVA SSU 138.1 NR99 (trunc) reference sequences as described in the RESCRIPt tutorial and obtained a 515f-806r classifier. Now I want to train a weighted classifier with
qiime feature-classifier fit-classifier-naive-bayes (...) --i-class-weight (some readytowear tax weights) (...).

Since the ready-made weights at readytowear are labelled as SILVA 138 (not 138.1), I was wondering whether this would be a problem.

Thanks for your comments!

Hi @arwqiime ,

No, these are probably not compatible. Even though the differences are minor, any mismatch between the 138 and 138.1 taxonomies could still (and probably would) lead to an error; for example, a taxonomic group in the 138.1 taxonomy might be missing from the 138 weights.
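
To see whether a mismatch of this kind actually exists, one option is to compare the taxon labels of the two releases directly. The sketch below is only an illustration, not part of readytowear or clawback: it assumes both taxonomies have been exported to tab-separated files with a header row and the taxon string in the second column, and the function name `taxa_only_in_new` and the file paths are made up for this example.

```shell
# Print taxon labels that occur in the new taxonomy TSV but not in the old one.
# Assumption: tab-separated input, one header row, taxon string in column 2
# (e.g. a taxonomy.tsv produced with:
#   qiime tools export --input-path <taxonomy>.qza --output-path <dir>).
taxa_only_in_new() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  # Drop the header, keep the taxon column, and de-duplicate in byte order.
  tail -n +2 "$1" | cut -f2 | LC_ALL=C sort -u > "$old_sorted"
  tail -n +2 "$2" | cut -f2 | LC_ALL=C sort -u > "$new_sorted"
  # comm -13 keeps only the lines unique to the second (new) file.
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# Hypothetical usage:
# taxa_only_in_new silva-138/taxonomy.tsv silva-138.1/taxonomy.tsv
```

Every line printed is a label found in the 138.1 taxonomy but not in the 138 one. Empty output only means the two label sets agree under this simple comparison; it does not by itself guarantee that a given weights artifact will be accepted.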

The scripts used to generate the weights are available on readytowear so you could use these to generate weights for the 138.1 release (you are also very welcome to contribute these to readytowear with a pull request). It is on our very long to-do list!

Hi everyone, there are two issues here: @arwqiime's immediate needs and longer term plans regarding readytowear.

@arwqiime, @Nicholas_Bokulich is correct, 138 weights won't work with 138.1 reference data. Fortunately, you can generate the weights you want with a single call to a clawback method:

qiime clawback assemble-weights-from-Qiita \
  --i-classifier <the classifier you generated from the 138.1 reference>.qza \
  --i-reference-taxonomy <your 138.1 reference taxonomy>.qza \
  --i-reference-sequences <your 138.1 reference sequences>.qza \
  --p-metadata-key empo_3 \
  --p-metadata-value "<the particular habitat you're interested in - probably Animal distal gut>" \
  --p-context Deblur_2021.09-Illumina-16S-V4-150nt-ac8c0b \
  --o-class-weight my-138-1-weights.qza \
  --p-n-jobs <as many cores as you are allowed on a single machine>

That call will take at least overnight to run. Perhaps try it first with "Plant surface" as the metadata-value, as a smoke test.

Some background on that command is available in the tutorial.
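
Once the weights artifact exists, the retraining step from the original question might look roughly like the sketch below. The wrapper function `retrain_with_weights`, the `QIIME` variable, and all file names are assumptions added for illustration; the flags are those of `qiime feature-classifier fit-classifier-naive-bayes`.

```shell
# Retrain a naive Bayes classifier with taxonomic class weights.
# QIIME defaults to the real CLI; set QIIME=echo to print the arguments
# instead of running the (slow) training step.
# Arguments (all hypothetical file names):
#   $1 reference reads, $2 reference taxonomy, $3 class weights, $4 output classifier
retrain_with_weights() {
  "${QIIME:-qiime}" feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$1" \
    --i-reference-taxonomy "$2" \
    --i-class-weight "$3" \
    --o-classifier "$4"
}

# Hypothetical usage:
# retrain_with_weights silva-138.1-seqs-515f-806r.qza silva-138.1-tax.qza \
#   my-138-1-weights.qza silva-138.1-weighted-classifier.qza
```

The reference reads and taxonomy passed here should be the same 138.1 artifacts that were used to assemble the weights, so that all three inputs describe the same set of taxa.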

Now, @thermokarst and @Nicholas_Bokulich, I think that it is pretty important to make weights available for the latest versions of SILVA. The current script makes use of the SILVA 138 references that are made available on the QIIME 2 data resources pages.

Is there any intention to update the QIIME 2 data resources any time soon, or should we go it alone? RESCRIPT would make it pretty easy (if slow) to create our own standard references.

Hi @BenKaehler
Thank you for your comments and the clawback call.
Best regards

Hi @BenKaehler
I tried the clawback example to test it with "Plant surface" on a q2-2021.11 installation, but I got this error:

Plugin error from clawback:
Parameter 'reads_per_batch' received 0 as an argument, which is incompatible with parameter type: Int % Range(1, None) | Str % Choices('auto')

The command and the full error message are in the attached file.
20220121_clawback-error.txt (1.8 KB)

Best,

Hi @arwqiime ,
Thanks for giving it a try! Good thing you tested first with "Plant surface" :grin:

Run the same command as above, but add the following parameter:
--p-reads-per-batch 'auto'

Hi, I tried it but got this error message:
(1/1) Invalid value for '--p-reads-per-batch': received as an
argument, which is incompatible with parameter type: Int

Thanks for trying! It looks like an update to q2-feature-classifier a little while back caused this action to break.

I have submitted a fix for this in the source code. Once that PR is merged, you can re-install clawback and this should be fixed.

Or as a current workaround, try this:
--p-reads-per-batch 2000
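
For reference, the smoke test with this workaround added could be assembled as in the sketch below. The wrapper function `smoke_test_weights`, the `QIIME` variable, and the file names are assumptions for illustration; the flags and values are the ones already given in this thread.

```shell
# "Plant surface" smoke test with the explicit reads-per-batch workaround.
# QIIME defaults to the real CLI; set QIIME=echo to preview the arguments.
# Arguments (hypothetical file names):
#   $1 classifier, $2 reference taxonomy, $3 reference sequences,
#   $4 output weights, $5 number of jobs (defaults to 1)
smoke_test_weights() {
  "${QIIME:-qiime}" clawback assemble-weights-from-Qiita \
    --i-classifier "$1" \
    --i-reference-taxonomy "$2" \
    --i-reference-sequences "$3" \
    --p-metadata-key empo_3 \
    --p-metadata-value "Plant surface" \
    --p-context Deblur_2021.09-Illumina-16S-V4-150nt-ac8c0b \
    --p-reads-per-batch 2000 \
    --p-n-jobs "${5:-1}" \
    --o-class-weight "$4"
}

# Hypothetical usage:
# smoke_test_weights classifier-138-1.qza tax-138-1.qza seqs-138-1.qza \
#   plant-surface-weights.qza 4
```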

Great, it worked for "Plant surface"! (And I corrected the output name to 'weight', not 'classifier' :slight_smile:)

Thanks for confirming! And thanks for catching this bug; it will be fixed soon (probably along with the release of all pre-generated EMPO 3 weights on readytowear and SILVA 138.1 pre-trained classifiers on the QIIME 2 data resources page).

Ok, q2-clawback is now fixed on conda, pip, and GitHub, so no-one should see that plugin error from now on.

Hi @BenKaehler
Thanks for updating q2-clawback. It completed without errors!
Best,
