Hello! I have been having difficulty with taxonomic names in R and wanted to ask for people's suggestions.
For context, I have been working with phyloseq objects built from QIIME 2 exports (feature table.qza, tree.qza, taxonomy.qza, and metadata), which I then use for downstream analyses (e.g., differential abundance testing). My problem is that graphs show the full taxonomic name (everything from phylum to species), and I want to be able to abbreviate it to a taxonomic rank of my choosing.
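One way to get a label at a rank of your choosing is to build it from the tax_table columns rather than from the full string. The sketch below uses a hypothetical helper, `short_tax_label()` (not part of phyloseq), that takes a character matrix with one row per taxon and one column per rank. It assumes QIIME-style prefixes such as `g__` may be present, and when the requested rank is unassigned it falls back to the lowest assigned rank above it, noting which rank that was:

```r
# Hypothetical helper (not part of phyloseq): one short label per taxon.
# tax_mat: character matrix, rows = taxa, columns = ranks (e.g. a tax_table).
short_tax_label <- function(tax_mat, rank = "Genus") {
  ranks <- colnames(tax_mat)
  stopifnot(rank %in% ranks)
  idx <- seq_len(match(rank, ranks))  # columns from the top rank down to `rank`
  labs <- character(nrow(tax_mat))
  for (i in seq_len(nrow(tax_mat))) {
    # Strip QIIME-style prefixes ("g__Blautia" -> "Blautia"); a bare "g__" becomes ""
    vals <- sub("^\\s*[a-z]__", "", tax_mat[i, idx])
    ok <- which(!is.na(vals) & nzchar(vals))
    if (length(ok) == 0) {
      labs[i] <- "Unassigned"
      next
    }
    j <- max(ok)  # lowest assigned rank at or above the requested one
    # If the requested rank itself is unassigned, say which rank the label came from
    labs[i] <- if (j == length(idx)) vals[j] else paste0(vals[j], " (", ranks[j], ")")
  }
  names(labs) <- rownames(tax_mat)
  labs
}
```

With a phyloseq object `ps` (a placeholder name), the input would be `as(tax_table(ps), "matrix")`, and the rank names must match your own tax_table columns. If you also want counts summed at that rank, phyloseq's `tax_glom(ps, taxrank = "Genus")` merges taxa before you build the labels.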
The resulting phyloseq object has an otu_table (the sequences and their counts in each sample) and a tax_table (which links each sequence to its taxonomic name, with each rank in its own column). Should I merge the otu_table and tax_table? My concern is that the combined table would lack metadata until I merged that in as well, which seems to defeat the purpose of building a phyloseq object in the first place.
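You may not need to merge anything by hand: phyloseq's `psmelt()` flattens the otu_table, sample_data and tax_table into one long data.frame (one row per taxon and sample), while the phyloseq object itself stays intact for other analyses. A minimal sketch on the `GlobalPatterns` example dataset that ships with phyloseq (not your data; `SampleType` and `Genus` are columns of that dataset):

```r
library(phyloseq)
data(GlobalPatterns)

# Keep a small subset of taxa so the example runs quickly
gp <- prune_taxa(taxa_names(GlobalPatterns)[1:20], GlobalPatterns)

# psmelt() joins counts, sample metadata and taxonomy into one long data.frame;
# `gp` itself is not modified
df <- psmelt(gp)

# Taxonomy ranks and sample metadata are now ordinary columns
head(df[, c("OTU", "Sample", "Abundance", "SampleType", "Genus")])
```

Because each rank becomes its own column, a ggplot can map `Genus` (or any other rank) to an axis directly instead of the full name.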
Or should I split the taxonomic names before running any downstream analyses? The problem there is that some rank names use the same delimiter internally as the one between ranks (e.g., family_genus_species_group10).
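If the names arrive as single QIIME 2 taxonomy strings, the ranks are usually separated by `;` (often `; `) and each rank carries a prefix such as `g__`, so splitting on `;` rather than `_` leaves underscores inside a name alone. Check this against your own taxonomy.qza before relying on it. The helper `split_qiime_taxonomy()` and its default rank names are hypothetical (the qiime2R package's `parse_taxonomy()` addresses the same task):

```r
# Hypothetical helper: split QIIME 2-style taxonomy strings into rank columns.
# Splits on ";" only, so names such as "Lachnospiraceae_NK4A136_group" stay intact.
split_qiime_taxonomy <- function(taxon,
                                 ranks = c("Domain", "Phylum", "Class", "Order",
                                           "Family", "Genus", "Species")) {
  parts <- strsplit(taxon, ";", fixed = TRUE)
  out <- t(vapply(parts, function(p) {
    p <- trimws(p)                # drop the space after each ";"
    p <- sub("^[a-z]__", "", p)   # drop rank prefixes such as "g__"
    p[p == ""] <- NA              # a bare prefix means the rank is unassigned
    length(p) <- length(ranks)    # pad missing lower ranks with NA
    p
  }, character(length(ranks))))
  colnames(out) <- ranks
  out
}
```

Each row of the result is one taxon and each column one rank, which is the same shape as a phyloseq tax_table.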
Since I use ggplot2, someone suggested that I define a vector of the preferred taxonomic names and pass it to the plot as axis labels. However, this seems tedious to repeat for every plot and error-prone if I mislabel or misremember a name:
```r
library(ggplot2)
labels <- c("A", "B", "C")  # must match the x-axis order exactly
p + scale_x_discrete(labels = labels)  # p: an existing ggplot
```
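An alternative to typing labels by hand is to give `scale_x_discrete()` a function: ggplot2 calls it on the axis breaks, so each label is derived from its own break and cannot end up in the wrong order. The sketch below assumes the long names are `;`-separated QIIME-style strings; the data frame `res`, its values and the function `last_rank()` are made up for illustration:

```r
library(ggplot2)

# Toy data standing in for differential-abundance output (values are made up)
res <- data.frame(
  taxon = c("d__Bacteria; p__Firmicutes; f__Lachnospiraceae; g__Blautia",
            "d__Bacteria; p__Bacteroidota; f__Bacteroidaceae; g__Bacteroides"),
  log2FC = c(1.8, -0.9)
)

# Keep only the text after the last ";" and strip its rank prefix
last_rank <- function(x) sub("^[a-z]__", "", trimws(sub(".*;", "", x)))

p <- ggplot(res, aes(x = taxon, y = log2FC)) +
  geom_col() +
  scale_x_discrete(labels = last_rank)
```

The same function can be reused in every plot, so the abbreviation rule lives in one place instead of in hand-typed label vectors.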
I would appreciate any advice and suggestions on how to shorten/abbreviate taxonomic names when using phyloseq objects in R. Thank you!