Number of samples and frequency per sample vs. number of features and frequency per feature

(Meha) #1

Dear Admins and Colleagues,
After running qiime dada2 denoise-paired and visualizing the result, I got two graphs in two separate PDF files. One PDF shows the number of features and the frequency per feature, and the other shows the number of samples and the frequency per sample.
First, I am unable to interpret the two graphs. What is the difference between samples and features? In other words, what does each graph tell me?


(Meha) #2

Also, the first graph has a few tiny, dark blue vertical lines at the bottom of the histogram. Could you explain what they are? Are they normal? Are they important?

(Meha) #3

To make the question easier to follow, here is some extra information. This library has two types of samples: 1. treated and 2. untreated.

By the way, the library is composed of 16S rRNA PCR amplicon reads.

(Meha) #4

This graph shows my demultiplexed data.
I do not know whether the DADA2 results are correct given the demultiplexed data. The graph shows that I have 120 000 reads, but I cannot compare that with the DADA2 output (graphs). I am confused!

Just help!

(Meha) #5

Also, in the second graph there are two tiny blue histogram bars. What are they?

(Meha) #6

I am wondering why my sample names are not written under each histogram bar, even though I included a description column for my samples.

I found another QIIME 2 user’s graphs that show his sample names.

How does QIIME 2 name samples? What should I do so that QIIME 2 names my samples automatically?


(Matthew Ryan Dillon) split this topic #7

An off-topic reply has been split into a new topic: Interactive plots missing in demux summarize plot

Please keep replies on-topic in the future.

(Matthew Ryan Dillon) #8

These are called histograms, a tool for evaluating distributions.

In QIIME 2, samples and features are orthogonal axes of your dataset — samples are the biological samples (or replicates, or whatever is most appropriate there), while features can represent ASVs, OTUs, Species, Metabolites, Proteins, whatever! So, the features are the “things” present in your samples.
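To make the samples/features distinction concrete, here is a minimal Python sketch of a tiny feature table. The sample names, ASV names, and counts are all made up for illustration; they are not taken from the data in this thread, and this is not how QIIME 2 stores its tables internally.

```python
# Hypothetical feature table: one row per sample, one column per feature (ASV).
# Every name and count here is invented for illustration.
table = {
    "treated-1":   {"ASV_a": 120, "ASV_b": 0,  "ASV_c": 30},
    "treated-2":   {"ASV_a": 80,  "ASV_b": 5,  "ASV_c": 0},
    "untreated-1": {"ASV_a": 0,   "ASV_b": 45, "ASV_c": 10},
}

# Frequency per sample: total number of sequences observed in each sample.
frequency_per_sample = {s: sum(counts.values()) for s, counts in table.items()}

# Frequency per feature: total number of times each feature was observed
# across all samples.
frequency_per_feature = {}
for counts in table.values():
    for feature, n in counts.items():
        frequency_per_feature[feature] = frequency_per_feature.get(feature, 0) + n

# Number of distinct features present (count > 0) in each sample.
features_per_sample = {
    s: sum(1 for n in counts.values() if n > 0) for s, counts in table.items()
}

print(frequency_per_sample)
print(frequency_per_feature)
print(features_per_sample)
```

A "frequency per sample" histogram would be drawn from the first dictionary's values, and a "frequency per feature" histogram from the second.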

The first plot is showing you that most of your samples only have 0-25 features (ASVs if you ran DADA2).

The second plot is telling a similar story — most of your samples only have 0-25 features present.

That is a rug plot. It is a 1D scatterplot, showing the same data as found in the histogram, sans binning.
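As a rough sketch of the difference, the snippet below takes the same list of values and produces both the binned counts a histogram would draw and the raw positions a rug plot would mark. The values and the bin width of 25 are assumptions chosen for illustration.

```python
# Hypothetical per-sample values (invented for illustration).
values = [12, 18, 23, 40, 95, 110]
bin_width = 25  # assumed bin width

# Histogram: count how many values fall into each bin
# [start, start + bin_width). Bar height = count in that bin.
histogram = {}
for v in values:
    start = (v // bin_width) * bin_width
    histogram[start] = histogram.get(start, 0) + 1

# Rug: one tick per raw value, no binning at all.
rug = sorted(values)

print(histogram)
print(rug)
```

The histogram collapses three values into the single 0-25 bar, while the rug still shows each of them as its own tick.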

No, that isn’t quite right — that part of the plot is saying that you have 3 samples that have between ~90,000 and 120,000 sequences (features). The total number of sequences is listed elsewhere in the same visualization.

I think I answered that above.

Please see my link at the top of my post about histograms — I think you might be confusing that with a bar chart — these are two different plots (that sort of look the same!).

People usually declare the sample names on import. Then, in your metadata, you will have a sample-id column with the same names.
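One way to check that the names line up is sketched below in plain Python. The metadata content, the sample names, and the set of IDs standing in for the imported data are all hypothetical; only the sample-id column name comes from the reply above.

```python
import csv
import io

# Hypothetical tab-separated metadata with a sample-id column.
metadata_tsv = (
    "sample-id\tdescription\n"
    "treated-1\ttreated\n"
    "untreated-1\tuntreated\n"
)

# Hypothetical sample names as they were declared on import.
imported_sample_ids = {"treated-1", "untreated-1"}

# Collect the IDs listed in the metadata's sample-id column.
reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
metadata_ids = {row["sample-id"] for row in reader}

# Any name that appears on only one side is a mismatch to fix.
missing_from_metadata = imported_sample_ids - metadata_ids
missing_from_import = metadata_ids - imported_sample_ids

print(missing_from_metadata, missing_from_import)
```

If both sets are empty, the names used on import and the names in the metadata agree.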