What are the differences between the files in the SILVA database that are used to train the classifier?

Lei · October 15, 2019, 1:21am

Thank you for your detailed explanation. I now have a better understanding of my questions.
1.

After you gave me these specific examples, I was able to understand the differences completely. It looks like the pre-trained 16S classifiers provided on the QIIME 2 data resources page use the "taxonomy_7_levels" file rather than "consensus_taxonomy_7_levels", and I have used them for taxonomic classification of my 16S data. To be consistent, I might keep using my old 18S classifier, which was trained with "consensus_taxonomy_7_levels". Does that make sense to you?

2.

I am working on both 16S and 18S data. I am satisfied with the 16S results from the "7_levels" files. For 18S, however, I am really not sure how eukaryote taxonomy works or what the best way to present the results is. It has so many more taxonomic ranks than bacteria, so I thought that putting all of the results at the same level might be better for interpretation. I want to create a phyloseq object, which requires specifying the name of each taxonomic rank. If I use the 7-level file, I do not know what names to give the ranks beyond level 3 for the 18S data.

3.

Several months ago, I followed the classifier tutorial to train my Bayesian classifier for 18S data. After refreshing my memory, I was able to remember how to import the files as .qza files.

4.

That makes sense to me. DADA2 took an extra long time to process all 10 samples because of the orientation problems. I understand that other downstream analyses, such as beta diversity, can also be affected, since the distance matrices might not be correct under these conditions.
You mentioned that I can either split my samples into two sets and run classify-sklearn on each set separately, or use classify-consensus-vsearch, to solve this issue. However, I still cannot run the beta diversity analysis if I do not correct the orientation problem, right?

5.

Do you mean the vsearch plugin for clustering and dereplicating the sequences? Are you suggesting that I use this method instead of DADA2? If I use vsearch for clustering, do I still need to worry about downstream taxonomic classification and beta diversity issues caused by the orientation problems? For example, can I use classify-sklearn to assign taxonomy after using the vsearch method?

6.

If I still want to use DADA2 for denoising, as you recommend, I need to reverse the orientation of the reads in the raw FASTQ files. But how can I do that for a FASTQ file? Could you please let me know which tools I can use to change the orientation of the reads?
Thank you so much for taking the time to help me.
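
To make point 6 concrete: as far as I understand it, changing the orientation of a read means reverse-complementing its sequence and reversing its quality string so that each score stays with its base. Below is a minimal sketch of that idea in plain Python. It is not a QIIME 2 command, the function names are hypothetical, and it assumes plain four-line FASTQ records. It also flips every read it is given, so it would only be appropriate for a file in which all reads are in the wrong orientation.

```python
import gzip
from typing import Iterable, Iterator

# Complement table for DNA, including common IUPAC ambiguity codes.
# Characters not listed here (e.g. S, W) are their own complement and pass through.
_COMPLEMENT = str.maketrans(
    "ACGTUNRYKMBVDHacgtunrykmbvdh",
    "TGCAANYRMKVBHDtgcaanyrmkvbhd",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines (without newlines) with every read flipped.

    Assumes four lines per record: header, sequence, '+', quality.
    An incomplete trailing record is silently dropped.
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n")
        yield reverse_complement(seq.rstrip("\n"))
        yield plus.rstrip("\n")
        # Reverse the quality string so each score still matches its base.
        yield qual.rstrip("\n")[::-1]


def reorient_file(in_path: str, out_path: str) -> None:
    """Flip every read in a gzipped FASTQ file (hypothetical helper)."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for line in reorient_fastq(fin):
            fout.write(line + "\n")
```

For example, a read with sequence AACG and qualities IIH# would come out as CGTT with qualities #HII. I would still like to know which existing tool is recommended for this step.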