How do I choose my sampling depth from the alpha rarefaction plot?

Hello, everyone,
I'd like to ask about the sampling depth.
In my feature table, the minimum frequency is 162 and the maximum is 2387, but most samples are below 1000. I therefore generated the alpha-rarefaction plot with --p-max-depth set to the median frequency (--p-max-depth 540).
In the alpha-rarefaction plot, the Downstream curve is not stable. To avoid losing samples, can I rarefy at the minimum frequency of 162?

Hi there,

Normally, for diversity analyses, I set my rarefaction depth to the number of sequences in the shallowest sample so I don't lose samples. But I only do this if all the curves plateau. I can see your light-blue curve doesn't plateau as it should. I think this is because your sequencing depth is really low. In my limited experience, if the sequencing depth is acceptable, Shannon rarefaction curves plateau really fast (and by really fast, I mean that the first point already reaches the plateau).

Observed features curves normally plateau more slowly (you can check this by changing the rarefaction metric in the Metric drop-down menu). I think we should take a look at the Observed features curves, but without the sample grouping (you can disable it in the Sample Metadata Column drop-down menu). As an initial guess, though, I think at least the light-blue samples have insufficient sequencing depth.
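To illustrate why Shannon tends to settle earlier than Observed features, here is a minimal sketch in plain Python (not QIIME 2 code). It subsamples one hypothetical sample, made of a few dominant features and many rare ones, at increasing depths. The sample composition, the depths and the helper names are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

def shannon(counts):
    """Shannon entropy (log base 2) of a list of feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def subsample(reads, depth, rng):
    """Draw `depth` reads without replacement and return per-feature counts."""
    return Counter(rng.sample(reads, depth))

# Hypothetical sample: 5 dominant features (400 reads each)
# plus 200 rare features (2 reads each), 2400 reads in total.
reads = [f"dom{i}" for i in range(5) for _ in range(400)]
reads += [f"rare{i}" for i in range(200) for _ in range(2)]

rng = random.Random(0)  # fixed seed so the run is repeatable
full_shannon = shannon(list(Counter(reads).values()))

curve = []  # (depth, observed features, Shannon)
for depth in (100, 500, 1000, 2400):
    sub = subsample(reads, depth, rng)
    curve.append((depth, len(sub), shannon(list(sub.values()))))
    print(depth, len(sub), round(curve[-1][2], 2))
```

In this toy sample, Shannon at the shallowest depth is already much closer to its full-depth value than Observed features is, because Shannon is driven mostly by the abundant features while Observed features keeps picking up rare ones.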

Anyway, a forum senior may provide you with better advice :slight_smile:

As a side comment, when I run alpha rarefaction I normally set --p-max-depth to the number of sequences in the biggest sample, so I can see full curves in the plot. It does no harm, since shallower samples are not removed from the rarefaction plot.
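A rough sketch of that behaviour, in plain Python rather than QIIME 2 itself: with the maximum depth set to the deepest sample, a shallow sample still contributes points up to its own read count, and its curve simply stops there. The two samples below are randomly generated and hypothetical (only their sizes, 162 and 2387, come from this thread), and the 25-step depth grid is an assumption.

```python
import random

def depth_grid(min_depth, max_depth, steps):
    """Evenly spaced integer depths from min_depth to max_depth inclusive."""
    span = max_depth - min_depth
    return [min_depth + round(i * span / (steps - 1)) for i in range(steps)]

def observed_features_curve(reads, depths, rng):
    """Observed features at each depth; None where the sample has too few reads."""
    curve = []
    for depth in depths:
        if depth > len(reads):
            curve.append(None)  # no point at this depth: the curve has ended
        else:
            curve.append(len(set(rng.sample(reads, depth))))
    return curve

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical samples sized like the shallowest and deepest in this thread.
samples = {
    "shallow": [f"f{rng.randrange(40)}" for _ in range(162)],
    "deep": [f"f{rng.randrange(300)}" for _ in range(2387)],
}

depths = depth_grid(1, 2387, 25)
curves = {
    name: observed_features_curve(reads, depths, rng)
    for name, reads in samples.items()
}
for name, curve in curves.items():
    print(name, curve)
```

The shallow sample gets values only at depths up to 162 and None afterwards, while the deep sample has a value at every depth.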

Best wishes,



Disclaimer: I'm only another forum user, just like you. Please don't take my answer as a ground truth. A Forum Moderator would probably provide you with a more accurate answer.


Hi @Chnyng,
I think @salias gave a great answer!

I personally don't prioritize setting my rarefaction depth to the shallowest sample. I typically prioritize having more features for each sample. But as @salias mentioned, as long as the curves plateau, you are golden :star:

I think you should follow @salias's advice and look at Observed features per sample. I would also suggest setting your max-depth to the number of sequences in the biggest sample so we can see all the patterns in your data.

Hope this helps!



Thank you for your answers.

I have set my max-depth to the number of sequences in the biggest sample (2387), and here is my result. It seems that the sequencing depth of the Downstream samples is too low. What can I do?

Thank you so much :grinning:

Hi again @Chnyng ,

In my opinion, both Midstream and Upstream samples (light-blue and orange curves) don't have enough sequences to confidently perform further analyses on them. Downstream samples seem to plateau (although the final rise of the curve puzzles me a bit). Anyway, there may be individual samples within Downstream (I assume SamplePosition is one of your metadata columns) with not enough sequences. To check that, select the metadata column containing your sample IDs in the drop-down menu. Then you'll be able to see individual curves per sample, and you can decide whether or not individual samples should be ruled out.
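As a rough aid for reading per-sample curves, here is a small Python sketch of one possible plateau check. This is not a QIIME 2 feature: the function name, the 2% tolerance and the three-point tail are arbitrary assumptions, and the curves are made-up values. Visual inspection of the plot remains the real test.

```python
def has_plateaued(curve, tail=3, tolerance=0.02):
    """Heuristic: True if the metric grew by less than `tolerance` (relative
    to its final value) across the last `tail` points of the curve."""
    points = [v for v in curve if v is not None]
    if len(points) < tail:
        return False  # too few points to judge
    start, end = points[-tail], points[-1]
    if end == 0:
        return False  # nothing observed, cannot call it a plateau
    return (end - start) / end < tolerance

# Hypothetical per-sample Observed features curves (None = sample ran out of reads).
curves = {
    "sampleA": [10, 35, 48, 52, 53, 53, 53],
    "sampleB": [8, 20, 31, 40, 48, None, None],
}

verdict = {sample_id: has_plateaued(curve) for sample_id, curve in curves.items()}
print(verdict)
```

Here sampleA is flagged as plateaued, while sampleB is still climbing when its reads run out.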



Thank u!

I have done what you said.

It seems that some individual samples have low sequencing depth, but I am not sure what to do with them. Can I simply discard these samples?

I want to rarefy at the minimum frequency, but that retains only 2,916 (23.72%) features in 18 (100.00%) samples at the specified sampling depth.
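For context, the retained count in that summary can be reproduced by hand: rarefying keeps exactly the sampling depth from every sample that has at least that many sequences, so 18 samples at a depth of 162 gives 18 × 162 = 2,916. The sketch below shows the same trade-off for any depth. The per-sample frequencies are hypothetical (only the 18 samples, the minimum of 162, the maximum of 2387 and the median of 540 come from this thread), so the percentage it prints will not match the real table.

```python
def rarefy_summary(frequencies, depth):
    """Return (samples kept, sequences kept, fraction of sequences kept)
    when rarefying to `depth`."""
    kept = [f for f in frequencies if f >= depth]
    sequences_kept = depth * len(kept)
    return len(kept), sequences_kept, sequences_kept / sum(frequencies)

# Hypothetical per-sample frequencies: 18 samples, min 162, max 2387, median 540.
frequencies = [
    162, 210, 250, 300, 340, 400, 450, 500, 530,
    550, 600, 650, 700, 800, 900, 1000, 1500, 2387,
]

for depth in (162, 540):
    n_samples, n_sequences, fraction = rarefy_summary(frequencies, depth)
    print(f"depth {depth}: {n_samples} samples, {n_sequences} sequences ({fraction:.1%})")
```

A higher depth keeps more sequences per sample but drops every sample below it, which is the choice being weighed here.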

I am so confused...

This is the rarefaction plot

This is the feature table

I'm afraid I cannot help you directly with the decision to discard samples, as that is completely up to you and depends heavily on your experiment details, biological question, etc. However, what I can say about your curves is that I think the sequencing depth is not enough to obtain reliable downstream analyses.

Regarding that really low sequencing depth, I think we may have been missing something here all this time. Did you create your feature table with DADA2? If so, would you mind sharing the denoising stats QZV file with us? I suspect that the sequencing depth may be fine, but DADA2 is discarding too many sequences.

Thank you!

This is the DADA2 result; I don't think it's bad.

Good news: sequencing depth is not that bad!

You have many sequences for each sample before (and during) DADA2. Note that you have far more sequences during the DADA2 steps than in your final feature table: the shallowest sample in the DADA2 table has 45k (in the merged column), while the biggest sample in your feature table has only 2387. So one of two things may be happening here:

  1. DADA2 chimera removal is discarding too many sequences. Can you also share the last two columns of the DADA2 denoising stats table? Those hold the chimera removal stats.

  2. (I think this is more likely) You are performing some aggressive filtering after DADA2. What do you do between DADA2 and the rarefaction curves?
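To put a number on that gap, here is a short worked calculation in Python. It uses only the two figures quoted above and treats "45k" as 45,000, which is an approximation; the function name is made up for illustration.

```python
def max_fraction_retained(max_final, min_merged):
    """Upper bound on the share of merged reads that reach the final table:
    no sample ends above `max_final`, and none had fewer than `min_merged`."""
    return max_final / min_merged

# 2387 = biggest sample in the final feature table,
# 45_000 = approximate shallowest sample in the DADA2 merged column.
bound = max_fraction_retained(2387, 45_000)
print(f"At most {bound:.1%} of merged reads survive in any sample")
```

So whatever happens after merging removes roughly 95% or more of the reads in every sample, which is why the later steps are worth checking.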

Thank u! You are so kind. :blush:

Here is my non-chimeric data; I think it's not bad.

My answer to question 2:

After DADA2, I separated the archaea and bacteria in my samples. What I showed you before is the feature table for my archaea data. The bacteria data look normal, so I have already rarefied them. In the archaea data, however, there is a big difference between the minimum and maximum frequencies (162 and 2387).

I did try rarefying at a depth of 162. Although that retains only about 20% of the frequency, I still found a difference in alpha diversity.

Can I just rarefy like that? What's your advice?

Thank you so much! Best wishes! :star_struck:

You're right, DADA2 is not discarding too many sequences. So DADA2 is not the problem.

Okay, now we have something! It is quite likely that the environment your samples come from has a very low number of Archaea compared to Bacteria. That's why you have too few sequences.

I don't think the problem here is the difference between samples. In all your samples, the number of sequences classified as Archaea is small. The problem is that with so few sequences, subsequent analyses may not be very reliable.

What can you do? Well, it depends on your biological question. You can analyze both Bacteria and Archaea inside the same feature table. If you are not really interested in Archaea, you might want to rule out Archaea ASVs and focus on Bacteria. It's up to you.
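To make the effect of that split concrete, here is a minimal Python sketch of separating a feature table by domain and totalling the sequences per sample. It is not QIIME 2 code: the table, the ASV IDs, the `d__` taxonomy labels and the Bacteria counts are hypothetical, and only the Archaea totals of 162 and 2387 echo the figures from this thread.

```python
def split_by_domain(table, taxonomy):
    """Split per-sample feature counts into one table per domain.

    table:    {sample_id: {feature_id: count}}
    taxonomy: {feature_id: "domain; phylum; ..."}
    """
    per_domain = {}
    for sample_id, counts in table.items():
        for feature_id, count in counts.items():
            domain = taxonomy[feature_id].split(";")[0].strip()
            per_domain.setdefault(domain, {}).setdefault(sample_id, {})[feature_id] = count
    return per_domain

# Hypothetical taxonomy assignments and counts.
taxonomy = {
    "asv1": "d__Bacteria; p__Proteobacteria",
    "asv2": "d__Bacteria; p__Firmicutes",
    "asv3": "d__Archaea; p__Crenarchaeota",
}
table = {
    "S1": {"asv1": 30000, "asv2": 14000, "asv3": 162},
    "S2": {"asv1": 25000, "asv2": 20000, "asv3": 2387},
}

per_domain = split_by_domain(table, taxonomy)
totals = {
    domain: {sample_id: sum(counts.values()) for sample_id, counts in samples.items()}
    for domain, samples in per_domain.items()
}
print(totals)
```

In this made-up example the Bacteria table keeps tens of thousands of sequences per sample, while the Archaea table is left with a few hundred to a few thousand, which is the situation described above.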



But I want to analyze archaea and bacteria separately, which is why I posted this question.

If I want to analyze archaea, can I rarefy at 162?

In your opinion, the number of archaea is too low, so it might not make sense to continue analyzing the archaea, right?

Yes, you technically can rarefy at any depth.

I cannot tell you whether or not you should analyse your Archaea data¹. What I can tell you is that if I had to analyse these data, I would probably go with one of the options I outlined in my last post. But it's up to you to perform the analysis, and you should take these potential limitations (e.g. low number of sequences) into account when discussing results and drawing conclusions.

Good luck!



¹ As stated in the QIIME2 Forum Code of Conduct: "Your Work is Your Work. Technical challenges are real, software bugs happen, and the state of the discipline is constantly changing. This forum is here to help you navigate those (and other) challenges, but it is ultimately your responsibility to understand your data and your tools, perform your own analysis, and derive meaning from your data."

Ok! Thank u

Analyzing the data really is subjective. I will take your advice! :star_struck: