The number of features is normal but the number of observed_otus is too low?

sandro.goforit · September 23, 2019, 1:44am

I generated my "alpha-rarefaction.qzv" using "qiime diversity alpha-rarefaction". The observed_otus curve plateaued at 60-100. Compared with other papers, this number of observed_otus seems too low.
This is my feature_table summary:

Table summary

Number of samples |110|
Number of features |4,594|
Total frequency |1,498,609|

Frequency per sample

Minimum frequency |4,998.0|
1st quartile |9,167.5|
Median frequency |13,016.0|
3rd quartile |16,425.5|
Maximum frequency |25,567.0|
Mean frequency |13,623.718181818182|

Frequency per feature

Minimum frequency |1.0|
1st quartile |42.0|
Median frequency |107.0|
3rd quartile |284.75|
Maximum frequency |23,618.0|
Mean frequency |326.2100565955594|

My feature_table looks normal, so how can I get so few observed_otus?

sandro.goforit · September 23, 2019, 5:07am

Hi~
I know that table.qza is equivalent to the OTU table.
In my analysis, the mean frequency per sample is 4,526.0. However, the number of observed_otus in the α diversity analyses is 60-100. Why is my observed_otus so low?
This is my table.qzv summary👇

This is my rarefaction curve👇

timanix · September 23, 2019, 5:11am

Hi! What kind of samples are you processing? Which region and primers did you use for amplification? Also, you are showing two different tables in the post and the comment.

sandro.goforit · September 23, 2019, 7:37am

Hi!
My samples are stool samples from cancer patients treated with chemoradiotherapy. The amplified region is V3-4; the forward primer is 5′-CCTACGGRRBGCASCAGKVRVGAAT-3′ and the reverse primer is 5′-GGACTACNVGGGTWTCTAATCC-3′.
Really sorry for the confusing tables I uploaded!!! I will upload the correct pictures below👇
This is my table.qzv-summarize👇

This is my Alpha rarefaction👇

Sorry again!

timanix · September 23, 2019, 8:40am

I have never worked with fecal samples, but in several articles I am familiar with, other researchers usually get from 50 up to 200 observed OTUs in 16S libraries, so your samples look good enough, in my opinion. I am sure that somebody with more experience with such samples will clarify this here soon.

sandro.goforit · September 23, 2019, 11:15am

Thanks
But I'm very confused about👉 why the number of observed_otus is lower than the number of features in table.qza. Is a "Feature" not the same as an "OTU"? If "Feature" ≠ "OTU", can you show me the process for converting "Features" to "OTUs"? Are "Features" clustered to become "OTUs"?

timanix · September 23, 2019, 11:44am

Actually, if you have not clustered your table with VSEARCH at 97% similarity, you have ASVs instead of OTUs, and ASVs have higher resolution than OTUs.
You can have high frequencies in your samples and a low number of ASVs or OTUs at the same time, simply because the same ASVs/OTUs are present in very large amounts. But I think that in QIIME 2, ASV or OTU = feature.
You are showing the number of observed OTUs for the treatments column, which does not reflect the total richness of each sample in the treatment. If you change that column to one that is unique to each sample, you will see samples with a lot of features. That can mean that you have several very rich samples with unique features that do not appear often in other samples. As you can see from my data, the differences are quite big as well.
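
To illustrate the point about grouped columns, here is a minimal Python sketch. It assumes a hypothetical metadata column called "treatment" and made-up per-sample richness values (nothing here comes from the data in this thread), and it only shows how a group-level summary such as a median can hide one very rich sample:

```python
from statistics import median

# Hypothetical per-sample observed-feature counts and a hypothetical
# metadata column "treatment". None of these values come from the thread.
richness = {"s1": 62, "s2": 71, "s3": 240, "s4": 80, "s5": 95, "s6": 68}
treatment = {"s1": "pre", "s2": "pre", "s3": "pre",
             "s4": "post", "s5": "post", "s6": "post"}

def summarize_by_group(values, groups):
    """Return {group: (median, max)} of the per-sample values."""
    by_group = {}
    for sample, value in values.items():
        by_group.setdefault(groups[sample], []).append(value)
    return {g: (median(v), max(v)) for g, v in by_group.items()}

summary = summarize_by_group(richness, treatment)
print(summary)
```

In this toy case the "pre" group has a median of 71 even though one of its samples has 240 features, which is the kind of difference you only see when the column is unique per sample.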

sandro.goforit · September 23, 2019, 1:13pm

Thank you for your response:grin:
Yes, like you said, I haven't used vsearch, I only used dada2 to get a table.qza. I think I got so many features and frequencies because of this.
This is my data processed with vsearch👇. The number of features is lower: 4000+ 👉 1000+

I changed the column for the α-rarefaction, which can almost reflect the total richness of each sample👇. However, the number of observed_otus is still ≠ the number of features. Why is this?

jwdebelius · September 23, 2019, 1:58pm

Hi @sandro.goforit,

A couple of things... first, your number of OTUs looks within a reasonable range, maybe a little bit low for fecal samples, but plausible. Keep in mind that you only have about 100 samples, and the total number of features (OTUs, ASVs, etc.) is often proportional to the total number of samples. So more samples -> more observed ASVs because of noise, sequencing, real data, etc.

I'm a little bit more concerned that your data is plateauing at 100 ASVs. It seems like a very shallow depth, and I'd expect your observed ASVs to look like @timanix's bottom curve. I'm not saying this is wrong; it's just not behaviour you might expect, like your title says. My takeaway might be that you just have low-complexity samples... how do they look when you use other methods of examination?

Best,
Justine

ben · September 23, 2019, 2:13pm

Can you show me your DADA2 de-noise code? Ben

timanix · September 23, 2019, 2:15pm

You have, for example, 4000 features in 100 samples, and it's fine for the curve to show 50-150 OTUs, since the OTUs may differ from sample to sample. 4000 is the total number of OTUs across all your samples together. Some OTUs are shared by every sample, others are not. Or am I misunderstanding what you are asking?
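
A small Python sketch of this difference, using a made-up three-sample table (the sample and feature names are hypothetical). Per-sample richness counts the nonzero features in one sample, while the table-wide number of features is the union over all samples:

```python
# Toy feature table: sample -> {feature ID: count}. All values are made up.
table = {
    "sample_1": {"asv_a": 120, "asv_b": 30, "asv_c": 5},
    "sample_2": {"asv_a": 80, "asv_d": 40, "asv_e": 12},
    "sample_3": {"asv_a": 200, "asv_f": 9, "asv_g": 3, "asv_h": 1},
}

def observed_features(counts):
    """Number of features with a nonzero count in one sample."""
    return sum(1 for c in counts.values() if c > 0)

# Per-sample richness: one value per sample.
per_sample = {s: observed_features(c) for s, c in table.items()}

# Table-wide feature count: the union of features over all samples.
total_features = len({f for c in table.values() for f, n in c.items() if n > 0})

print(per_sample)      # {'sample_1': 3, 'sample_2': 3, 'sample_3': 4}
print(total_features)  # 8
```

Only asv_a is shared here, so the table has 8 features while no single sample has more than 4. The same logic explains thousands of features in the table alongside roughly 100 per sample.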

sandro.goforit · September 24, 2019, 1:57am

Thank you so much!!! Now I understand

sandro.goforit · September 24, 2019, 3:03am

Hi Ben,
This is what I've done👇

Best,
Sandro

sandro.goforit · September 24, 2019, 3:27am

Hi Justine,
Thank you. Your explanation helped me a lot. My samples came from cancer patients. When I check the number of observed_otus in other papers which used the same kind of samples, I found the numbers of observed_otus are around 150-250.
You mentioned other methods of examination; this is what I've done, in two ways.
Do you have some suggestions about methods of examination?

Best,
Sandro

jwdebelius · September 24, 2019, 7:45am

Hi @sandro.goforit,

First, that is a beautiful graphic and/or your handwriting is lovely.

If the curves (if not the numbers) are consistent between the two methods, then I would stick with that. It's a bit odd to me because I would expect your curves to continue to grow, but if this is your data, this is your data.

The actual alpha diversity number presented in papers is a function of their rarefaction depth, sequencing protocol, metric... and not as externally valid as you'd like. Depending on their depth, it seems low to me, and so your low depth may correlate. How do these numbers compare to your healthy controls?
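
As a rough illustration of why the observed number depends on rarefaction depth, here is a minimal Python sketch. It is not how QIIME 2 implements rarefaction; `rarefy` is a hypothetical helper and the sample is made up, with a few dominant features and many rare ones. It subsamples reads without replacement and counts how many features remain:

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    `counts` maps feature ID -> read count. Returns the same kind of mapping.
    """
    # Expand the counts into one list entry per read.
    reads = [f for f, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the sample's total frequency")
    rng = random.Random(seed)
    picked = rng.sample(reads, depth)
    out = {}
    for f in picked:
        out[f] = out.get(f, 0) + 1
    return out

# Made-up sample: two dominant features plus 100 rare ones (1000 reads total).
sample = {"dominant_1": 500, "dominant_2": 300}
sample.update({f"rare_{i}": 2 for i in range(100)})

total_reads = sum(sample.values())
for depth in (50, 200, total_reads):
    # Observed features at this depth; shallow depths miss rare features.
    print(depth, len(rarefy(sample, depth)))
```

At the full depth all 102 features are observed, and at any shallower depth the count can only be equal or lower, so richness values from papers rarefied at a different depth are not directly comparable.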

My recommended next step would be to look at things like beta diversity to see what your overall patterns look like. I'd (personally) stick with the dada2 table without clustering, because you've done all the work to get there and it seems wasteful to then collapse it back again.

Best,
Justine

ben · September 24, 2019, 1:13pm

Can we see the denoising stats and the cut adapt stats? Thank you. Ben

sandro.goforit · September 25, 2019, 1:28am

Hi Justine,
Thank you.
Actually, I haven't sequenced the samples from healthy people. This is a great suggestion.
And the numbers of observed_otus are consistent between the two methods. I think I will accept this result. Here is a result of beta_diversity. The samples are divided into 6 groups according to different treatment phases. I think there is no obvious difference between these groups👇.

jwdebelius · September 25, 2019, 7:28am

Hi @sandro.goforit,

I think at this point, it looks like you're okay to move forward. I would recommend looking at some of the tutorials to see what pipelines for statistics, etc are. For base QIIME 2, I think the moving pictures tutorial is a good start, and the Parkinson's mice is a bit more comprehensive. (It's new for this release, and covers some more methods and hopefully a bit of interpretation). You also might want to look into q2-longitudinal, if your data is a time series. If you're working in R, there are probably a lot of really good tutorials there (I'm just less familiar). But, of course, these are all just starting places and there are a lot more good options for exploring and analysing your data.

Best,
Justine

sandro.goforit · September 25, 2019, 7:35am

Hi Ben~
This is my stats.dada2.qzv👇

I got a trimmed-sequences.qza from cutadapt.
And this is the "qiime demux summarize" of it.
trimmed_sequences.qzv👇

Is there something wrong? Thank you.

Best,
Sandro

sandro.goforit · September 25, 2019, 7:37am

Hi Justine~
Thank you for your suggestions. I will read these tutorials one by one:grinning:

Best,
Sandro