Annotate Bacterial Genes Amplicon data

Keegan-Evans · July 28, 2021, 5:24pm

Thanks for providing the details about your analysis! Which genes are you targeting in your study? It could simply be that SILVA is not the right database for your project: SILVA only covers ribosomal RNA genes (SSU and LSU), so amplicons of other genes would not be annotated with it.

However, given the extremely low annotation rate that you ended up with, I suspect that you still have "garbage" data in your samples. This often happens when primer sequences are not removed before the other analysis steps are performed. Making sure that all of your primer sequences have been removed is where I would start. You might take a look at this forum post.
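
As a quick sanity check before re-running anything, you could look at how many reads still begin with your primer. Here is a minimal Python sketch of that idea. The function names and the toy reads are made up for illustration, and the primer shown is only an example (the commonly used 515F 16S primer), so substitute your own. It only does exact prefix matching with IUPAC degenerate bases and does not tolerate mismatches or indels.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a (possibly degenerate) primer into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def fraction_with_primer(reads, primer):
    """Return the fraction of reads that start with the primer."""
    if not reads:
        return 0.0
    pattern = primer_regex(primer)
    # re.match anchors at the start of the read, which is where an
    # untrimmed forward primer would sit.
    hits = sum(1 for read in reads if pattern.match(read.upper()))
    return hits / len(reads)


# Example primer (assumption: replace with the primer you actually used).
example_primer = "GTGYCAGCMGCCGCGGTAA"

# Toy reads, invented for illustration only.
toy_reads = [
    "GTGCCAGCAGCCGCGGTAATACGGAG",  # starts with the primer
    "GTGTCAGCCGCCGCGGTAATACGAAG",  # starts with the primer
    "TACGGAGGGTGCAAGCGTTAATCGGA",  # primer already removed
]

print(fraction_with_primer(toy_reads, example_primer))
```

If a large fraction of your reads still start with the primer, trimming has not happened yet. For the actual removal, a dedicated tool such as cutadapt (available in QIIME 2 through the q2-cutadapt plugin) is a better choice than a hand-rolled script like this one.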

You say that you use DADA2 for denoising, but you do not say which sequencing technology was used to generate your data. I bring this up because DADA2 was designed primarily for Illumina-sequenced data.

Also, DADA2 generates ASVs, which are a more precise, modern equivalent to OTUs. While it is possible to cluster again afterwards, take a look at the following resources and consider not clustering after denoising.

closed-reference-otu-picking-vs-taxonomic-annotation
Exact sequence variants should replace operational taxonomic units in marker-gene data analysis
ESVs vs OTUs
To Cluster Or Not To Cluster

If you specifically want PCA, you might want to check out DEICODE. However, PCA is just PCoA on Euclidean distances, and PCoA with other distance metrics is more commonly used. You might want to check out this forum thread for a more in-depth discussion.
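
To make the relationship between PCA and PCoA on Euclidean distances concrete, here is a small NumPy sketch on random toy data. The function names are made up for illustration, and this is not what DEICODE or any QIIME 2 plugin runs internally; it only shows that classical PCoA on a Euclidean distance matrix gives the same sample coordinates as PCA, up to the sign of each axis.

```python
import numpy as np


def pca_scores(X, k):
    """Sample scores on the first k principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def pcoa_coords(D, k):
    """Classical PCoA coordinates from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]   # k largest eigenvalues
    # Clip tiny negative eigenvalues caused by floating-point error.
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


# Toy data: 6 samples, 4 features (invented for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Pairwise Euclidean distances between samples.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# Axis signs are arbitrary, so compare absolute values.
same = np.allclose(np.abs(pca_scores(X, 2)), np.abs(pcoa_coords(D, 2)))
print(same)
```

Swapping the Euclidean distance matrix for a different one (for example Bray-Curtis or UniFrac) is what turns this into the more general PCoA.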

Hopefully this will help you move forward with your analysis! If it doesn't, or if you have more questions, let me know.