RESCRIPt forward filling

owlpen · May 25, 2023, 1:34pm

Hello, me again!

I have a question about forward filling of taxonomic ranks, as explained in the RESCRIPt Tutorial.

Everything in this tutorial seemed to work well for me, and I've had a few goes at training an amplicon specific classifier to use with my own small dataset.
I recently ran a small set of samples (DNA extracted from boar semen) with the aim of getting this pipeline working properly before moving on to human samples.
I used a mock community that I'd prepared myself, one that is relevant to the community I've found so far in semen, and in this sequencing run in particular I see evidence of contaminants and index-hopping. My next step was to experiment with some tools outside of QIIME 2 to identify and quantify these, and as part of that I wanted to create a table of read counts in CSV format that I could then work into an unspread.py script.

I have followed the advice here about exporting my feature table and taxonomy as a tsv, but have not been successful at merging the taxonomic metadata to the biom-tsv file using the command below:

'biom add-metadata -i exported/feature-table.biom -o table-with-taxonomy.biom --observation-metadata-fp biom-taxonomy.tsv --sc-separated taxonomy'

I have searched the forum and found posts from others about this issue, none of which seem to have been solved. I was trying to figure out why this step might not have worked, and was wondering if it is because I have some incomplete strings in my final taxonomic annotation file. E.g.:
'd__Bacteria;k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;;'

I hadn't really considered these gaps before, but now I went back to the RESCRIPt tutorial to try and figure out where I might have gone wrong.
I've gone through all the steps about 3 times now.
The steps which I figured were most important, and hence I've played around with are:
'qiime rescript parse-silva-taxonomy \
  --i-taxonomy-tree taxtree-silva-138.1-nr99.qza \
  --i-taxonomy-map taxmap-silva-138.1-ssu-nr99.qza \
  --i-taxonomy-ranks taxranks-silva-138.1-ssu-nr99.qza \
  --o-taxonomy silva-138.1-ssu-nr99-tax.qza'
Here I've added --p-rank-propagation TRUE (although I'm aware this is the default setting), and I've also tried this both with and without specifying ranks via the '--p-ranks' option.

The other command which I've tried various combinations of is dereplicate, where initially I selected '--p-mode "super"' and have since tried '"uniq"' just in case this was messing with forward filling.

Ultimately, each time I make a visualization of the final taxonomy.qza I get the same blank ranks, with no forward filling.

I'm wondering if I am misunderstanding something here, or missing an important step, or I dunno - I am a bit of a novice at everything.

Anyway, there are a lot of commands I've run, so have only selected those I thought most relevant for now, but obviously will post more if needed.
I am running qiime2-2023.2 in a conda environment, and just want to say that both the quality of the tools you folks have created and the level of support here have been amazing. Thanks for everything.

Richard

Nicholas_Bokulich · May 25, 2023, 2:41pm

Hi @owlpen ,
Thanks for the kind words and for using RESCRIPt and QIIME 2!

The problem at the end of the day is with biom add-metadata, and is related to how the taxonomic classification occurs. It is not coming from RESCRIPt. So I can explain the taxonomic annotations and where these empty ranks come in... but I encourage you to open a NEW separate topic with the full error message that you are receiving from biom add-metadata so that someone more familiar with biom-format can help — if you want to troubleshoot that error.

Let's look at the taxonomy:

These missing ranks are due to incomplete classification of your query sequences: they do not have a single good hit in the database (or, more specifically, they match multiple different families of Enterobacterales closely enough that the sequence cannot be confidently classified to one family).
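
As a toy illustration of why an annotation can stop at order level, the sketch below keeps only the ranks that a set of candidate lineages agree on. This is not how the classifier actually works (it relies on confidence scores, not a simple common prefix), and the function name and the two candidate lineages are made up for the example.

```python
def shared_lineage(lineages):
    """Return the leading ranks on which all candidate lineages agree."""
    split = [lineage.split(";") for lineage in lineages]
    shared = []
    # Walk down the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return ";".join(shared)


# Hypothetical candidates: same order, different families.
candidates = [
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae",
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Yersiniaceae",
]
print(shared_lineage(candidates))
```

With these two candidates the agreement ends at o__Enterobacterales, so nothing below order can be reported.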

These gaps should not appear in the reference database (at least when you are using RESCRIPt with rank propagation). When a reference sequence has an incomplete taxonomic annotation you would instead see an annotation like this:

d__Bacteria;k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__;g__;s__
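
To check which of the two forms a given annotation has, something like the following might help. It is only a sketch: the helper names are hypothetical, and it assumes ranks are separated by ";" and labelled with a "x__" prefix, as in the strings quoted in this thread.

```python
def has_blank_ranks(taxon):
    """True if any rank is completely empty (no 'x__' prefix at all)."""
    return any(label.strip() == "" for label in taxon.split(";"))


def deepest_assigned_rank(taxon):
    """Return the last rank label that carries a name, or None."""
    deepest = None
    for label in taxon.split(";"):
        label = label.strip()
        # "f__" has a prefix but no name; "" is a fully blank rank.
        _prefix, sep, name = label.partition("__")
        if sep and name:
            deepest = label
    return deepest


# The query annotation from the original post (blank trailing ranks).
query = "d__Bacteria;k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;;"
# The reference-style annotation (empty ranks keep their prefixes).
reference = "d__Bacteria;k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__;g__;s__"

for taxon in (query, reference):
    print(has_blank_ranks(taxon), deepest_assigned_rank(taxon))
```

Both strings resolve to the same deepest rank; only the first contains fully blank ranks.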

You also ran this with and without rank propagation, so clearly the issue did not originate with RESCRIPt (but nice troubleshooting, thanks for that!)

It sounds like you don't even need to run biom add-metadata anyway, as you are not trying to get a biom table; you said you want a table of read counts in CSV format.

There is an easier way! You can merge these using metadata tabulate then download as a CSV. See some instructions here:
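
Separately from those instructions, if you would rather do the merge in your own script, here is a minimal standard-library sketch. It assumes the feature table was converted to TSV with a "#OTU ID" header row (preceded by a "# Constructed from biom file" comment line) and that the exported taxonomy TSV has "Feature ID" and "Taxon" columns; check your own files before relying on that. The function name, the output column names, and the "Unassigned" fallback are my own choices.

```python
import csv
import io


def merge_counts_with_taxonomy(table_tsv, taxonomy_tsv, out_csv):
    """Write one CSV row per feature: ID, per-sample counts, taxon."""
    # Map feature ID -> taxonomy string (assumed column names).
    tax_reader = csv.DictReader(taxonomy_tsv, delimiter="\t")
    taxonomy = {row["Feature ID"]: row["Taxon"] for row in tax_reader}

    # Drop "# ..." comment lines; the "#OTU ID" header row is kept.
    lines = [line for line in table_tsv if not line.startswith("# ")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)

    writer = csv.writer(out_csv)
    writer.writerow(["feature_id"] + header[1:] + ["taxon"])
    for row in reader:
        if not row:
            continue  # skip empty lines
        # Features without a taxonomy entry get a placeholder label.
        writer.writerow(row + [taxonomy.get(row[0], "Unassigned")])


# Tiny in-memory example standing in for the real exported files.
table = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tsampleA\tsampleB\n"
    "feat1\t10.0\t0.0\n"
    "feat2\t3.0\t7.0\n"
)
taxonomy_file = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\td__Bacteria;p__Proteobacteria\t0.98\n"
)
out = io.StringIO()
merge_counts_with_taxonomy(table, taxonomy_file, out)
print(out.getvalue())
```

With real files you would pass handles from open(...) instead of the io.StringIO objects, opening the output with newline="" as the csv module recommends.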

Good luck!

owlpen · May 25, 2023, 3:02pm

Thanks for the link to the solution to obtain a csv, and for explaining the missing ranks to me.
It makes perfect sense now, but sometimes these things are not as intuitive to me as I would like them to be! My poor old brain has been quite challenged in the last week.

Best wishes,
Richard
