Hi! I'm so sorry this is long, but I don't know what's relevant for you to know and what isn't.
I'm sorry if this is a dumb question, but I am very lost and hoping for some guidance from more experienced bioinformatics folks. I'm doing my first 16S amplicon microbiome analysis from start to finish; I've previously only worked on the wet-lab side. My primary question is when to agglomerate relative to handling my NAs.
My second question is broader: I'm struggling to figure out the right order of steps for wrangling my 16S data in R with the following packages: phyloseq, vegan, microbiome, and microViz (DESeq2 isn't available for my version of R, so I haven't used it). I have coding experience with ggplot2, dplyr, tidyverse, and plotly (mostly for data viz), which I was hoping to use for the alpha diversity, relative abundance, beta diversity, and heat map plots, and for any relevant plots after the ANCOM and mixed-model analyses.
Background on the data: I successfully performed all the denoising in QIIME 2 with deblur and generated an OTU feature table. I did not convert it to relative abundance before exporting to R, because I figured it was better to clean up the data in R first and then transform the absolute counts to relative abundance. I also generated my .qza files for taxonomy and my phylogenetic tree, and imported everything, along with my metadata, into RStudio. (Before importing, I manually verified that the sample order in my metadata matched the OTU table columns in my Excel files.)
I moved my OTU IDs from a column into the row names of both my OTU table and my taxonomy table, converted each into a matrix, and created a phyloseq object containing the OTU table, taxonomy table, metadata, and tree. From that initial object I created two objects. In the first, I dropped all unclassified entries at every rank (e.g. k__, s__, NA, unknowns). In the second, I used tax_fix to rename anything unknown after its nearest known classification instead of dropping it, because I'd like to see whether there's anything worth noting in the unclassified organisms between my control group's samples and my treatment group's samples.
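A simplified sketch of the kind of setup described above, on hypothetical toy data (the counts, taxa, sample names and the `group` column are all made up, and the phylogenetic tree is left out), assuming phyloseq and microViz are installed. It treats NA as "unclassified"; a real QIIME 2 taxonomy may instead carry placeholder strings such as `s__`, which would need converting to NA before the first filter catches them (tax_fix has its own rules for recognising placeholders; see its documentation).

```r
# Sketch on hypothetical toy data; assumes phyloseq (Bioconductor) and microViz
library(phyloseq)
library(microViz)

# Hypothetical counts with taxa as rows (OTU IDs already moved into row names)
otu_mat <- matrix(
  c(10, 0, 5,
     3, 7, 0,
     0, 0, 9,
     4, 4, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3))
)

# Hypothetical taxonomy; NA marks a rank that could not be classified
tax_mat <- matrix(
  c("Bacteria", "Firmicutes",   "Bacilli",
    "Bacteria", "Bacteroidota", NA,
    "Bacteria", NA,             NA,
    "Bacteria", "Firmicutes",   "Clostridia"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), c("Kingdom", "Phylum", "Class"))
)

# Hypothetical metadata; row names must match the OTU table's sample names
meta <- data.frame(group = c("control", "treatment", "treatment"),
                   row.names = paste0("S", 1:3))

ps <- phyloseq(otu_table(otu_mat, taxa_are_rows = TRUE),
               tax_table(tax_mat),
               sample_data(meta))

# Object 1: keep only taxa that are classified (non-NA) at every rank
tt <- as(tax_table(ps), "matrix")
ps_classified <- prune_taxa(rowSums(is.na(tt)) == 0, ps)

# Object 2: keep every taxon, filling unknown ranks from the nearest known rank
ps_fixed <- tax_fix(ps)
```

Here `ps_classified` keeps OTU1 and OTU4 only, while `ps_fixed` keeps all four taxa with no NA left in its taxonomy table.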
HERE'S where I'm lost! Now that I've made these two objects, I don't know what the next step should be, or whether the order even matters. Should I convert my absolute counts to relative abundance now, then agglomerate at each rank level, and then start plotting alpha diversity, beta diversity, heat maps, and the phylogenetic tree at different rank levels? Should NAs be dropped for these visualizations to be accurate? Should I subset my data by rank level, or is agglomerating equivalent (or better)?
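The two operations in question can be sketched with phyloseq's `tax_glom` (agglomeration to a rank) and `transform_sample_counts` (counts to per-sample proportions). This is a minimal sketch on made-up toy data, not a recommended pipeline. It does show one order-dependence worth knowing: with `NArm = TRUE`, `tax_glom` drops taxa that are NA at the chosen rank, so transforming first leaves proportions that no longer sum to 1 (sample S3 here sums to 0.4), whereas agglomerating first and then transforming renormalizes over the taxa that remain.

```r
# Sketch on hypothetical toy data; assumes phyloseq (Bioconductor) is installed
library(phyloseq)

# Hypothetical counts (taxa as rows) and a two-rank taxonomy with one NA phylum
otu_mat <- matrix(c(10, 0, 5,  3, 7, 0,  0, 0, 9,  4, 4, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))
tax_mat <- matrix(c("Bacteria", "Firmicutes",
                    "Bacteria", "Bacteroidota",
                    "Bacteria", NA,
                    "Bacteria", "Firmicutes"),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("OTU", 1:4), c("Kingdom", "Phylum")))
ps <- phyloseq(otu_table(otu_mat, taxa_are_rows = TRUE), tax_table(tax_mat))

# Agglomerate raw counts to Phylum; NArm = TRUE drops taxa with NA at that rank
ps_phylum <- tax_glom(ps, taxrank = "Phylum", NArm = TRUE)

# Then convert the agglomerated counts to per-sample proportions
ps_phylum_rel <- transform_sample_counts(ps_phylum, function(x) x / sum(x))

# Reverse order for comparison: proportions first, then agglomerate with NArm = TRUE
ps_rel_then_glom <- tax_glom(
  transform_sample_counts(ps, function(x) x / sum(x)),
  taxrank = "Phylum", NArm = TRUE
)

sample_sums(ps_phylum_rel)     # every sample sums to 1
sample_sums(ps_rel_then_glom)  # S3 loses the share held by the NA-phylum taxon
```

Setting `NArm = FALSE` instead keeps the NA taxa, grouped together at that rank, so the choice between the two is really a choice about whether unclassified reads stay in the denominator.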
Other than plotting the relative abundances at each rank level to show compositional shifts across my variables of interest and groups, should I use relative abundances for all the other plots and analyses, or go back to absolute counts for the heat maps and alpha/beta metrics? Is there a separate step to calculate alpha and beta diversity and THEN feed the results into the code for the respective plots, or does the plotting code do the calculations in the background? Some examples go straight from merging the initial object to plotting alpha diversity, while others have an intermediate step of calculating indices that is separate from visualization??
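For reference, phyloseq supports both patterns mentioned above: `estimate_richness` returns alpha diversity indices as a data frame that can be joined to the metadata and plotted with ggplot2, while `plot_richness` computes the indices internally and plots them directly. For beta diversity, `distance` returns the distance matrix and `ordinate` plus `plot_ordination` project and plot it. A minimal sketch on hypothetical toy data (the `group` column is made up); it runs on raw counts because phyloseq's documentation for `estimate_richness` advises using untrimmed, non-normalized count data.

```r
# Sketch on hypothetical toy data; assumes phyloseq (Bioconductor) is installed
library(phyloseq)

otu_mat <- matrix(c(10, 0, 5,  3, 7, 0,  0, 0, 9,  4, 4, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))
meta <- data.frame(group = c("control", "treatment", "treatment"),
                   row.names = paste0("S", 1:3))
ps <- phyloseq(otu_table(otu_mat, taxa_are_rows = TRUE), sample_data(meta))

# Route 1: calculate indices explicitly, one row per sample,
# then attach metadata so the table can go into ggplot2 or a model
alpha <- estimate_richness(ps, measures = c("Observed", "Shannon"))
alpha$group <- sample_data(ps)$group

# Route 2: plot_richness calculates the same indices internally before plotting
p_alpha <- plot_richness(ps, x = "group", measures = c("Observed", "Shannon"))

# Beta diversity: explicit Bray-Curtis distance matrix (namespaced in case
# another loaded package also defines distance())
bc <- phyloseq::distance(ps, method = "bray")

# Ordination (PCoA on Bray-Curtis) and its plot
ord    <- ordinate(ps, method = "PCoA", distance = "bray")
p_beta <- plot_ordination(ps, ord, color = "group")
```

The explicit route is the one that leaves a table or distance matrix to reuse in statistical tests; the plotting-only route is shorter but hides the numbers.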
All the workflows differ from one another, and many don't explain the in-between steps: they jump between the code for the plots without specifying what my data needs to look like (absolute or relative counts, low read counts trimmed off or kept in, an object with all the taxonomic ranks or one filtered so that only a single rank remains).
I realize I've packed a LOT of questions in here (which probably shows how lost I am right now). If you're able to help with even just one of them, I'd be grateful! Ultimately, I'm stuck after the step of "dealing" with my NAs and need to produce the standard visualizations and diversity metrics I've listed. Do all plots need both agglomeration and pruning beforehand? Are these two functions the same thing? Do any of these steps require NAs to be removed, or will they work both with and without NAs?
I don't know where to go from here; thank you in advance for any guidance!! (and for reading all of this)