Visualize abundance of ASVs after taxonomic analysis

georgia · January 29, 2018, 12:52am

Hi!

I have been working on a project where I analyze the abundance of specific bacteria (down to the genus level, e.g., Pseudomonas, Bacillus, Halomonas) in over 150 water samples spread across sites. I am presenting my findings soon, and am not sure of the best way to explain to my audience what ASVs represent and what my graph shows.
I am counting the total number of ASV hits (is that correct?) at each site, then dividing that number by the number of samples at each site to determine the abundance of specific bacteria. I am currently displaying the results in a bar chart. I have not yet found any papers or methods that follow this approach, and am wondering whether this is the best way to visualize my findings.

I’m sorry if this is unclear, but I am not sure how to correctly display and explain this.

Thank you!

Nicholas_Bokulich · January 29, 2018, 1:24am

Hi @georgia,
Thanks for posting! Let me see if I can help.

If you are focusing on the relative abundance of each genus at each site, I would recommend that you steer clear of ASVs and of explaining what they are (unless you are trying to give an overview of the methodology or present results that deal with ASVs directly).

Instead, explain that you are measuring the relative abundance of genera X, Y, and Z based on detection of 16S rRNA genes (I assume) that are indicative of each site.

If you must explain ASVs, you could expand from the concept of 16S rRNA gene sequencing. When sequencing the 16S rRNA genes amplified from a mixed microbial community, you will generate a mixture of sequence variants. The ASVs are the set of unique sequence variants observed in this mixture, each of which may thus represent a distinct taxon. (I'd certainly steer clear of explaining the "actual" part of ASV, i.e., that a method like DADA2 or Deblur detects and removes sequence noise and chimeric sequences. That's just getting too far into the details.)

These ASVs are used to determine the similarity between samples/groups by determining the similarity of presence/abundance patterns of ASVs in each sample and using that to calculate the pairwise distance between samples. We can also count the number of ASVs ("observed OTUs") and/or the branch length that they cover on a tree ("phylogenetic diversity") in each sample as metrics of alpha diversity.
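To make those two ideas concrete, here is a minimal sketch in plain Python. The sample names and counts are made up for illustration, and Bray-Curtis is just one common choice of abundance-based dissimilarity that I am assuming here; it is not the only metric you could use.

```python
# Hypothetical ASV count table: sample -> {ASV id -> number of sequences}
counts = {
    "s1": {"asv_a": 10, "asv_b": 0, "asv_c": 5},
    "s2": {"asv_a": 2, "asv_b": 8, "asv_c": 5},
}


def observed_asvs(sample_counts):
    """Alpha diversity as richness: number of ASVs seen at least once."""
    return sum(1 for n in sample_counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (0 = identical counts)."""
    asvs = set(a) | set(b)
    diff = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in asvs)
    total = sum(a.get(k, 0) + b.get(k, 0) for k in asvs)
    return diff / total if total else 0.0


richness = {sample: observed_asvs(c) for sample, c in counts.items()}
distance = bray_curtis(counts["s1"], counts["s2"])
```

With these toy numbers, `s1` has 2 observed ASVs, `s2` has 3, and the dissimilarity between them is 16/30. In practice you would normalize for sequencing depth (e.g., by rarefying) before comparing samples.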

Finally, we predict the taxonomic affiliation of each ASV by comparison to a set of reference sequences with known taxonomy.

Does the barplot show each individual sample or each individual site (averaged across multiple samples)? The answer to your question depends on how the barplot was generated...

The barplot is showing the relative abundance of taxa X, Y, and Z in each sample. This is the number of times that any ASV that is predicted to belong to a given taxon is observed in that sample, divided by the total number of sequences observed in that sample. Hence, the barplot illustrates the predicted taxonomic composition of each sample, or the fraction of observed sequences belonging to each taxon.
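As a rough sketch of that calculation (the sample names, ASV ids, counts, and genus assignments below are invented for illustration, not taken from your data):

```python
from collections import defaultdict

# Hypothetical ASV counts: sample -> {ASV id -> number of sequences}
asv_counts = {
    "sample1": {"asv_a": 30, "asv_b": 10, "asv_c": 60},
    "sample2": {"asv_a": 5, "asv_b": 0, "asv_c": 15},
}

# Hypothetical taxonomy assignments: ASV id -> predicted genus
asv_genus = {
    "asv_a": "Pseudomonas",
    "asv_b": "Pseudomonas",
    "asv_c": "Bacillus",
}


def relative_abundance(counts, taxonomy):
    """Return sample -> {genus -> fraction of that sample's sequences}."""
    result = {}
    for sample, per_asv in counts.items():
        total = sum(per_asv.values())
        # Sum the counts of all ASVs assigned to the same genus
        per_genus = defaultdict(int)
        for asv, n in per_asv.items():
            per_genus[taxonomy[asv]] += n
        # Divide by the total number of sequences in this sample
        result[sample] = (
            {genus: n / total for genus, n in per_genus.items()} if total else {}
        )
    return result


rel = relative_abundance(asv_counts, asv_genus)
```

Here `rel["sample1"]["Pseudomonas"]` is 0.4, because 40 of the 100 sequences in that sample come from ASVs assigned to Pseudomonas. Note that the denominator is the number of sequences in the sample, not the number of samples.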

If you have used a method like feature-table group to group your samples by metadata categories (e.g., site) and then display the average taxonomic composition in a barplot, then take the definition that I have given above and replace each occurrence of "sample" with "site".
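If you want to see what the per-site version looks like numerically, here is a minimal sketch that averages per-sample relative abundances within each site. The sample-to-site mapping and the fractions are hypothetical, and this is only one way to aggregate; I am not claiming it reproduces exactly what feature-table group does internally.

```python
from collections import defaultdict

# Hypothetical per-sample relative abundances (each sample sums to 1.0)
rel_abund = {
    "s1": {"Pseudomonas": 0.40, "Bacillus": 0.60},
    "s2": {"Pseudomonas": 0.20, "Bacillus": 0.80},
    "s3": {"Pseudomonas": 0.90, "Bacillus": 0.10},
}

# Hypothetical metadata: sample -> site
sample_site = {"s1": "siteA", "s2": "siteA", "s3": "siteB"}


def mean_by_site(rel, site_of):
    """Return site -> {genus -> mean relative abundance across its samples}."""
    sums = defaultdict(lambda: defaultdict(float))
    n_samples = defaultdict(int)
    for sample, fractions in rel.items():
        site = site_of[sample]
        n_samples[site] += 1
        for genus, frac in fractions.items():
            sums[site][genus] += frac
    # Dividing by the number of samples at the site means a genus that is
    # missing from a sample contributes 0 to that site's mean.
    return {
        site: {genus: total / n_samples[site] for genus, total in genera.items()}
        for site, genera in sums.items()
    }


site_means = mean_by_site(rel_abund, sample_site)
```

Averaging relative abundances gives every sample equal weight regardless of sequencing depth, whereas summing raw counts per site and then normalizing lets deeply sequenced samples dominate. Whichever you choose, say so in the figure legend.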

Papers often use barplots, heatmaps, or similar plots to provide an overview of the taxonomic composition; this is often very informative, e.g., to show readers the most abundant taxa observed in each sample. However, these plots are qualitative and do not really give a sense of whether samples/groups are significantly different from each other, or which ASVs/taxa are significantly different between groups. For that reason, many papers leave barplots/heatmaps out altogether. It all depends on personal preference and the goals of the analysis.

I frequently use barplots or heatmaps in my papers when it is important to show readers the most abundant taxa and the "landscape" of microbiota present; I have also frequently omitted such qualitative plots in my work, particularly when there is a large degree of heterogeneity between samples, when the level of diversity is too high to be captured by a barplot, and/or when the overall composition is less important than showing which taxa/ASVs differ significantly between groups of samples.

Not knowing much about your project or the goals of your analysis, my guess is that barplots (possibly grouped by site) may be a useful way for you to discuss overall trends in taxonomic composition and tell readers/audience members about the most abundant taxa in each sample type (if this is important), then dig into statistical tests comparing/contrasting sample types and the taxa that differentiate them.

I hope that helps!
