The consensus in that post was to try using the SILVA 132 release. Why not do that? As noted in that post by @SoilRotifer, you can use RESCRIPt to download and automatically format the SILVA 132 release.
If you want to switch to RDP, why not import the RDP sequences/taxonomy into QIIME 2 and classify with q2-feature-classifier? This might be an easier task (re-formatting RDP to fit the standard taxonomy format accepted by QIIME 2) than parsing the RDP classifier outputs.
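As a rough illustration of that re-formatting step, here is a minimal Python sketch. It assumes each RDP record carries a lineage string of alternating name/rank fields (something like `Root;rootrank;Bacteria;domain;...`); please check that against your actual download, since I have not verified it for every RDP release. The function name and the rank-prefix table are hypothetical, and the prefixes simply follow the `d__` style seen elsewhere in this thread.

```python
# Hypothetical helper: turn an RDP-style lineage string into a
# rank-prefixed taxonomy string. The input layout (alternating
# name;rank fields) is an assumption, not a documented guarantee.
RANK_PREFIXES = {
    "domain": "d__",
    "phylum": "p__",
    "class": "c__",
    "order": "o__",
    "family": "f__",
    "genus": "g__",
}


def lineage_to_taxonomy(lineage: str) -> str:
    # Split on ';', dropping a trailing ';' and any quotes around names.
    fields = [f.strip().strip('"') for f in lineage.strip().rstrip(";").split(";")]
    # Even positions are names, odd positions are the rank of that name.
    names_by_rank = {rank: name for name, rank in zip(fields[0::2], fields[1::2])}
    # Emit every expected rank in order; a missing rank becomes an empty label.
    return "; ".join(
        prefix + names_by_rank.get(rank, "") for rank, prefix in RANK_PREFIXES.items()
    )


example = (
    'Root;rootrank;Bacteria;domain;"Proteobacteria";phylum;'
    "Betaproteobacteria;class;Burkholderiales;order;"
    "Burkholderiaceae;family;Burkholderia;genus;"
)
print(lineage_to_taxonomy(example))
```

Each converted string would then go into a two-column, tab-separated taxonomy file (sequence ID and taxonomy string) before importing, with the sequence IDs matching those in the FASTA file.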
Thanks for your attention.
First, I don't use SILVA 132 because I looked up some articles and Burkholderia-Caballeronia-Paraburkholderia still appears when using the 132 release, for example in this article: https://doi.org/10.1016/j.foodres.2021.110241.
Second, I did overlook the second approach you suggested, because I browsed the forum and importing RDP seemed difficult. I will try it.
Okay, I am not sure which release that label first appeared in. It just indicates that the genus is unresolvable/contested between those 3 genera, so they concatenated the labels.
If you just want a different database, not specifically RDP, you could use NCBI RefSeq 16S; see the tutorial on this forum for using RESCRIPt to automatically download the RefSeq 16S database. This route should avoid import errors.
We do also plan to add support in RESCRIPt for automated download/formatting of the RDP database in the future, but I do not have an ETA on that.
However, I'd advise against removing these groups, as you'll want to keep off-target / outgroup taxa in your database. Otherwise you'll likely identify many taxa incorrectly as "d__Bacteria; ;", when in fact they may be Archaea or Eukaryota.
Thanks, sir. I'm still a little confused. Even if I retain Archaea and Eukaryota, I would still focus only on the bacteria in the downstream analysis, so I would still filter out the archaea downstream (I used phyloseq). How is this different from deleting them from the database now?
The idea is to correctly classify what is and is not Bacteria. Again, if there are no outgroup taxa in your reference database, then you might incorrectly over-classify sequences as being Bacteria when in fact they are not Bacteria. With outgroup taxa in the reference, you will have greater confidence that you are removing (or retaining) the appropriate data.
This is no different than removing chloroplast and mitochondria sequences after you classify your reads, as shown in the filtering tutorial. It is usually better to identify everything and then filter your table based on what you need for your analyses.
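To make the "classify everything, then filter" idea concrete, here is a small Python sketch. It is not the QIIME 2 implementation; the feature IDs, taxonomy strings, counts, and the `keep_bacteria` function are all made up for illustration, and the `d__` labels assume SILVA-style output.

```python
# Hypothetical classification results from a reference that still
# contains outgroups (Archaea, Eukaryota, organelles).
taxonomy = {
    "asv1": "d__Bacteria; p__Proteobacteria",
    "asv2": "d__Archaea; p__Crenarchaeota",
    "asv3": "d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast",
    "asv4": "d__Eukaryota",
    "asv5": "Unassigned",
}
counts = {"asv1": 120, "asv2": 15, "asv3": 40, "asv4": 7, "asv5": 3}


def keep_bacteria(taxonomy, counts, exclude=("chloroplast", "mitochondria")):
    """Keep features labelled Bacteria, minus organelle-derived labels."""
    kept = {}
    for feature_id, n in counts.items():
        label = taxonomy.get(feature_id, "Unassigned").lower()
        is_bacteria = label.startswith("d__bacteria")
        is_excluded = any(term in label for term in exclude)
        if is_bacteria and not is_excluded:
            kept[feature_id] = n
    return kept


filtered = keep_bacteria(taxonomy, counts)
print(filtered)
```

Because the outgroups were in the reference, `asv2` and `asv4` received their own labels and could be dropped on purpose. Had they been missing from the reference, those reads might instead have been forced into a poorly resolved Bacteria label and kept by mistake.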
If you read through this tutorial, you'll see that this focuses on downloading from the RefSeq target loci data. Specifically, if you read under the "Bacteria and Archaea: 16S ribosomal RNA project" section, you'll see that this data only contains sequences from "...bacteria and archaea type materials."
This is unlike the other, larger reference databases (e.g. SILVA, GTDB, RDP, ...), which contain a mix of environmental sequence data, type material, etc.
Not necessarily. It depends on what your goals are. But in my general experience you might identify a broader range of taxa with SILVA and GTDB. Or you can simply use all of the databases and see if there is a general consensus on which taxa you can consistently identify.
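One way to check for that kind of consensus is sketched below in Python. This is only an illustration under assumptions: the database names, taxonomy strings, and the `min_agree` threshold are hypothetical, and it assumes each classifier output carries a `g__` genus label. Genus names are not always directly comparable between databases (GTDB, for example, adds suffixes to some genus names), so treat exact string matching as a first pass.

```python
from collections import Counter


def genus_of(taxon):
    """Return the genus name from a rank-prefixed taxonomy string, or None."""
    for level in taxon.split(";"):
        level = level.strip()
        if level.startswith("g__") and len(level) > 3:
            return level[3:]
    return None


def consensus_genus(assignments, min_agree=2):
    """assignments maps database name -> taxonomy string for ONE feature.

    Returns the most common genus if at least `min_agree` databases
    agree on it, otherwise None.
    """
    genera = [g for g in (genus_of(t) for t in assignments.values()) if g]
    if not genera:
        return None
    genus, votes = Counter(genera).most_common(1)[0]
    return genus if votes >= min_agree else None


# Hypothetical assignments for a single feature.
one_feature = {
    "silva": "d__Bacteria; f__Bacillaceae; g__Bacillus",
    "gtdb": "d__Bacteria; f__Bacillaceae; g__Bacillus",
    "refseq": "d__Bacteria; f__Paenibacillaceae; g__Paenibacillus",
}
print(consensus_genus(one_feature))
```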
Thanks, sir. I wanted to use the RDP and SILVA databases. But, as I said above, Burkholderia-Caballeronia-Paraburkholderia appears in SILVA, and QIIME 2 does not seem to support RDP, so I used NCBI. If I classify 16S sequences from soil against this database and write an article, will reviewers question it?
Thanks. But I still have some questions.
My previous analysis procedure was to remove all categories other than bacteria after classification, as shown in the code below. But after reading your explanation, I am still confused and don't know how to proceed. I want to do downstream analysis in QIIME 2, such as alpha diversity, so I should focus on k_Bacteria. If I don't delete Archaea and Eukaryota, I will get an incorrect result. What should I do? I also filter the sequences according to the filtering tutorial after classification. So what should I do to filter the sequences correctly?
My understanding is that it has something to do with the categories contained in the database. If the database contains more categories, is it more reliable to remove the unwanted categories afterwards? And do current databases that usually contain only bacteria and archaea together, such as the RDP and NCBI ones I used, fail to ensure that the removal is correct?