I generated my “alpha-rarefaction.qzv” with “qiime diversity alpha-rarefaction”. The observed_otus curve plateaus at 60-100. Compared with other papers, this number of observed_otus seems too low.
This is my feature_table summary:
Table summary
  Number of samples: 110
  Number of features: 4,594
  Total frequency: 1,498,609
Frequency per sample
  Minimum frequency: 4,998.0
  1st quartile: 9,167.5
  Median frequency: 13,016.0
  3rd quartile: 16,425.5
  Maximum frequency: 25,567.0
  Mean frequency: 13,623.718181818182
Frequency per feature
  Minimum frequency: 1.0
  1st quartile: 42.0
  Median frequency: 107.0
  3rd quartile: 284.75
  Maximum frequency: 23,618.0
  Mean frequency: 326.2100565955594
Does my feature_table look normal? If so, how can I get so few observed_otus?
As I understand it, table.qza is equivalent to an OTU table.
In my analysis, mean frequency per sample is 4,526.0. However, the number of observed_otus in the α diversity analyses is 60-100. Why is my observed_otus so low?
this is my table.qzv-summarise👇
this is my rarefaction curve👇
Hi! What kind of samples are you processing? And which region and primers did you use for amplification? Also, you are showing two different tables in the post and in a comment.
My samples are stool samples from cancer patients treated with chemoradiotherapy. The amplified region is V3-V4; the forward primer is 5′-CCTACGGRRBGCASCAGKVRVGAAT-3′ and the reverse primer is 5′-GGACTACNVGGGTWTCTAATCC-3′.
Really sorry for the confusing tables I uploaded!!! I will upload the correct pictures below👇
This is my table.qzv-summarize👇
This is my Alpha rarefaction👇
I have never worked with fecal samples, but in the articles I am familiar with, other researchers usually get from 50 up to 200 observed OTUs from 16S libraries, so your samples look good enough in my opinion. I am sure that somebody with more experience with such samples will clarify this here soon.
But I’m very confused about👉 why the number of observed_otus is lower than the number of features in table.qza. Is a “feature” not the same as an “OTU”? If “feature” ≠ “OTU”, can you show me the process for converting features to OTUs? Are features clustered to become OTUs?
Actually, if you have not clustered your table with VSEARCH at 97% similarity, you have ASVs instead of OTUs, and ASVs have higher resolution than OTUs.
You can have a lot of reads (frequency) in your samples and a small number of ASVs or OTUs at the same time, simply because the same ASVs/OTUs are present in very large amounts. As far as I know, in QIIME 2 a “feature” is just an ASV or an OTU.
You are showing the number of observed OTUs for the treatments column, which does not reflect the total richness of each sample in the treatment. If you change that column to one that is unique to each sample, you will see samples with a lot of features. That can mean that you have several very rich samples with unique features that do not pop up much in other samples. As you can see from my data, the differences are quite big as well.
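To illustrate the point about grouping, here is a minimal sketch with made-up richness values. The sample IDs, group names and numbers are hypothetical, and the median is only an assumed stand-in for how a grouped curve summarizes its samples. It shows how a group-level summary can hide a few very rich samples:

```python
from statistics import median

# Hypothetical observed-feature counts per sample (not real data).
observed = {
    "S1": ("before", 62), "S2": ("before", 75), "S3": ("before", 310),
    "S4": ("after", 58), "S5": ("after", 91), "S6": ("after", 88),
}

# Collect per-sample richness by treatment group.
by_group = {}
for sample_id, (group, richness) in observed.items():
    by_group.setdefault(group, []).append(richness)

# One summary value per group next to the richest sample in that group.
group_summary = {g: (median(v), max(v)) for g, v in by_group.items()}
for group, (med, top) in group_summary.items():
    print(f"{group}: median={med}, richest sample={top}")
```

Plotting by a column that is unique to each sample is what exposes a sample like the hypothetical S3 here.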
Thank you for your response:grin:
Yes, like you said, I haven't used vsearch; I only used dada2 to get table.qza. I think that is why I got so many features and such high frequencies.
This is my data processed with vsearch👇. The number of features dropped from 4000+ to 1000+.
I changed the column for the α-rarefaction so that it more or less reflects the total richness of each sample👇. However, the number of observed_otus is still not equal to the number of features. Why is this?
A couple of things. First, your number of OTUs looks reasonable: maybe a little low for fecal samples, but plausible. Keep in mind that you only have about 100 samples, and the total number of features (OTUs, ASVs, etc.) often grows with the total number of samples. So more samples -> more observed ASVs, because of noise, sequencing, real data, and so on.
I’m a little more concerned that your data is plateauing at 100 ASVs. It seems like a very shallow depth, and I’d expect your observed ASVs to look like @timanix’s bottom curve. I’m not saying this is wrong - it’s just not the behaviour you might expect, as your title says. My takeaway might be that you just have low-complexity samples… how do they look when you go to other methods of examination?
Can you show me your DADA2 de-noise code? Ben
You have, for example, 4000 features in 100 samples, and it’s OK for the curve to sit at 50-150 OTUs, since the OTUs may differ from sample to sample. 4000 is the total number of OTUs distributed across all your samples together. Some OTUs are shared by every sample, others are not. Or am I misunderstanding what you are asking?
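A toy example of this, with invented counts (the feature and sample names are hypothetical): each sample carries only some of the features, so the per-sample observed count is well below the table-wide total.

```python
# Toy feature table: sample -> {feature: read count}. Invented numbers.
table = {
    "sampleA": {"f1": 500, "f2": 30, "f3": 4},
    "sampleB": {"f1": 420, "f4": 12, "f5": 9},
    "sampleC": {"f1": 610, "f2": 25, "f6": 7, "f7": 1},
}

# Observed features per sample: features with a non-zero count in that sample.
per_sample = {
    sample: sum(1 for count in counts.values() if count > 0)
    for sample, counts in table.items()
}

# Total features in the table: union of feature IDs across all samples.
total_features = len(set().union(*(counts.keys() for counts in table.values())))

print("observed features per sample:", per_sample)
print("total features in the table:", total_features)
```

Only f1 is shared by all three samples here; the other features each add to the table total without raising the richness of most samples.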
Thank you so much!!! Now I understand
This is what I've done👇
Thank you. Your explanation helped me a lot. My samples came from cancer patients. When I checked the number of observed_otus in other papers that used the same kind of samples, I found that the numbers are around 150-250.
You mentioned other methods of examination, and this is what I've done, in two ways.
Do you have some suggestions about methods of examination?
First, that is a beautiful graphic and/or your handwriting is lovely.
If the curves (if not the numbers) are consistent between the two methods, then I would stick with that. It’s a bit odd to me because I would expect your curves to continue to grow, but if this is your data, this is your data.
The actual alpha diversity number presented in papers is a function of their rarefaction depth, sequencing protocol, metric… and not as externally valid as you’d like. Depending on their depth, it seems low to me, and so your low depth may correlate. How do these numbers compare to your healthy controls?
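The dependence on rarefaction depth can be sketched with a toy subsampling experiment. This is only an illustration with an invented community, not the QIIME 2 implementation: it draws reads without replacement at several depths and counts the distinct features seen.

```python
import random

def observed_features(reads, depth, rng):
    """Count distinct features in a random subsample of `depth` reads."""
    return len(set(rng.sample(reads, depth)))

# Invented sample: 5 dominant features (1,900 reads each) plus
# 100 rare features (5 reads each), 10,000 reads in total.
reads = [f"dom{i}" for i in range(5) for _ in range(1900)]
reads += [f"rare{i}" for i in range(100) for _ in range(5)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
depths = [100, 1000, 5000, len(reads)]
curve = {depth: observed_features(reads, depth, rng) for depth in depths}

for depth, richness in curve.items():
    print(f"depth {depth:>6}: {richness} observed features")
```

The same sample gives a different observed-feature count at each depth, which is one reason numbers reported in other papers are hard to compare directly.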
My recommended next step would be to look at things like beta diversity to see what your overall patterns look like. I’d (personally) stick with the dada2 table without clustering, because you’ve done all the work to get there and it seems wasteful to then collapse it back again.
Can we see the denoising stats and the cutadapt stats? Thank you. Ben
Actually, I haven't sequenced the samples from healthy people. This is a great suggestion.
And the numbers of observed_otus are consistent between the two methods. I think I will accept this result. Here is a result of beta_diversity. The samples are divided into 6 groups according to different treatment phases. I think there is no obvious difference between these groups👇.
I think at this point it looks like you’re okay to move forward. I would recommend looking at some of the tutorials to see what the pipelines for statistics and so on look like. For base QIIME 2, I think the Moving Pictures tutorial is a good start, and the Parkinson’s Mouse tutorial is a bit more comprehensive. (It’s new for this release, and covers some more methods and hopefully a bit of interpretation.) You also might want to look into q2-longitudinal if your data is a time series. If you’re working in R, there are probably a lot of really good tutorials there (I’m just less familiar). But, of course, these are all just starting places, and there are a lot more good options for exploring and analysing your data.
This is my stats.dada2.qzv👇
I got a trimmed-sequences.qza from cutadapt.
And this is the "qiime demux summarize" of it.
Is there something wrong? Thank you.
Thank you for your suggestions. I will read these tutorials one by one:grinning: