I have performed a metagenomic classification on samples from several extreme environments (halophilic, acidophilic, radioactive and mining). I have even trained a classifier on the SILVA database, but the taxonomic resolution I obtain is very low for some of the samples (for instance, in the halophilic medium I get D_0__Eukaryota;;;;;; | 100.000%).
I was wondering if there is ANY way of improving these results, either with qiime2 or with other external software.
Hey @jose_gacia
I am sorry for the late response. I am not entirely sure that I understand your issue. You trained your own classifier using the SILVA database, and your taxonomic resolution is low in some of your samples. Is that correct?
Thank you
My problem is that in the taxa bar plot I have samples where I get no taxonomic identification at all, as I said before.
So my objective was to identify which OTUs were present in that sample, in order to obtain their sequences and run BLAST on them. Since that information, to my understanding, is not directly available in qiime .qzv objects, I tried the following:
Later I exported it with qiime tools export and converted it to TSV with biom convert.
What I finally obtain is a table that, as I understand it, lists every OTU ID whose taxonomy contains “D_0__Eukaryota” and the frequency of that OTU in every sample.
With that information I was later able to export taxonomy.qza and join the taxa for each OTU, and later export the rep-seq.qza and join the sequence.
So finally I obtain a table with the following information for each column: OTU ID, frequency of that ID in every sample, taxa and sequence.
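For reference, a minimal Python sketch of that kind of join, using only the standard library. It is not necessarily how the table above was built. It assumes the usual export layouts: a feature table TSV from biom convert with a "# Constructed from biom file" comment line followed by a "#OTU ID" header, a taxonomy TSV with "Feature ID" and "Taxon" columns, and a FASTA file of representative sequences. The function names are made up for illustration.

```python
import csv
import io


def read_fasta(text):
    """Parse FASTA text into {sequence_id: sequence}."""
    seqs, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the ID
            current = line[1:].split()[0]
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def join_tables(table_tsv, taxonomy_tsv, fasta_text):
    """Return one record per OTU: ID, per-sample counts, taxon, sequence."""
    # Skip the leading comment line (assumed to start with "# ");
    # the "#OTU ID" header line is kept.
    lines = [l for l in table_tsv.splitlines()
             if l.strip() and not l.startswith("# ")]
    rows = list(csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE))
    samples = rows[0][1:]

    tax_reader = csv.DictReader(io.StringIO(taxonomy_tsv),
                                delimiter="\t", quoting=csv.QUOTE_NONE)
    taxonomy = {r["Feature ID"]: r["Taxon"] for r in tax_reader}
    sequences = read_fasta(fasta_text)

    joined = []
    for row in rows[1:]:
        otu_id = row[0]
        joined.append({
            "otu_id": otu_id,
            "counts": {s: float(v) for s, v in zip(samples, row[1:])},
            "taxon": taxonomy.get(otu_id, ""),
            "sequence": sequences.get(otu_id, ""),
        })
    return joined
```

Each argument is the text content of the corresponding exported file, so the function can be called on files read with open(...).read().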
With that information I am manually curating all the unidentified OTUs by running BLAST searches.
My question is the following: is my strategy correct, and what exactly does the frequency from qiime taxa filter-table mean?
Hello @jose_gacia
Here are my suggestions. I think that you are getting low resolution because your classifier isn’t doing a great job of representing your sequences. You could try a different one, which might give you a better taxonomy result. One that might work is the SILVA 138 DB available on the 2020.8 data resources page. It has better rank prefixes (i.e. d__ instead of D_0__, etc.) and is generally a smaller database.
However, if you are getting successful results by running BLAST on the rep-seq.qza sequences that didn’t get identified, that is awesome and a great strategy. There are a couple of things that could make this easier. 1) You might be able to download the BLAST database and query against it locally, which could save time compared to submitting sequences one by one. 2) You could also look into the command qiime feature-classifier classify-consensus-blast. Either way, I think your strategy is good. Hopefully my suggestions can help you save some time.
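If you go the local route, it may help to collect all the poorly resolved sequences into one query file first. Here is a rough Python sketch of that idea. The helper names and the "fewer than two named ranks" cutoff are assumptions for illustration, not anything defined by QIIME 2, and the inputs are plain dictionaries mapping feature ID to taxonomy string and to sequence.

```python
def is_unresolved(taxon, min_ranks=2):
    """True when fewer than min_ranks ranks carry an actual name.

    Empty ranks ("") and bare prefixes such as "D_1__" or "p__"
    are treated as unnamed.
    """
    named = [r for r in taxon.split(";")
             if r.strip() and not r.strip().endswith("__")]
    return len(named) < min_ranks


def unresolved_fasta(taxonomy, sequences, min_ranks=2):
    """Build FASTA text for features whose taxonomy is poorly resolved.

    taxonomy:  {feature_id: taxon_string}
    sequences: {feature_id: sequence}
    """
    parts = []
    for feature_id, taxon in taxonomy.items():
        if is_unresolved(taxon, min_ranks) and feature_id in sequences:
            parts.append(f">{feature_id}\n{sequences[feature_id]}\n")
    return "".join(parts)
```

The returned text can be written to a file and used as a single batch query instead of pasting sequences individually.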
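Finally, you asked what the frequency from the table.qzv means. Frequency here refers to sequence counts: each cell of the table is the number of times a feature was observed in a sample. The frequency per sample is the total count across all features in that sample, and the frequency per feature is the total count of that feature across all samples.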
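To make the different per-sample and per-feature numbers concrete, here is a small Python sketch on a toy table. The table values and the function name are invented for illustration. It computes both the summed counts and the simple presence counts, so you can compare them against what your own summary shows.

```python
def summarize(table):
    """Summarize a feature table given as {feature_id: {sample_id: count}}."""
    summary = {
        "sample_total": {},      # summed counts per sample
        "sample_features": {},   # number of features present per sample
        "feature_total": {},     # summed counts per feature
        "feature_samples": {},   # number of samples containing each feature
    }
    for feature, counts in table.items():
        summary["feature_total"][feature] = sum(counts.values())
        summary["feature_samples"][feature] = sum(
            1 for c in counts.values() if c > 0)
        for sample, c in counts.items():
            summary["sample_total"][sample] = (
                summary["sample_total"].get(sample, 0) + c)
            summary["sample_features"][sample] = (
                summary["sample_features"].get(sample, 0) + (1 if c > 0 else 0))
    return summary


# Toy table: two OTUs, two samples
toy = {
    "otu1": {"S1": 10, "S2": 0},
    "otu2": {"S1": 5, "S2": 7},
}
result = summarize(toy)
# result["sample_total"]    -> {"S1": 15, "S2": 7}
# result["sample_features"] -> {"S1": 2, "S2": 1}
# result["feature_total"]   -> {"otu1": 10, "otu2": 12}
# result["feature_samples"] -> {"otu1": 1, "otu2": 2}
```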
I hope this helps!