How do I perform the filtering step for a forward-read-only analysis?

Hi @colinbrislawn,

Thank you very much for taking the time to look into my query. I have gone through Justine's (@jwdebelius) advice and want to confirm that I have understood it correctly.

From what I understand, @jwdebelius is suggesting that the alpha-rarefaction depth can be used as a guide for setting the --p-min-frequency threshold when filtering samples (i.e., retaining only samples with sequencing depth ≥ rarefaction depth). Is that interpretation correct?
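
To check my understanding of what this sample filter does, here is a minimal Python sketch on a toy table. The counts, the sample names, and the depth of 1000 are all made up for illustration (not my real data), and the function is my own hypothetical helper, not QIIME 2 code.

```python
# Toy feature table (made-up counts): sample -> {ASV: count}
table = {
    "sample_A": {"ASV1": 900, "ASV2": 300, "ASV3": 0},   # total 1200
    "sample_B": {"ASV1": 40, "ASV2": 0, "ASV3": 10},     # total 50
    "sample_C": {"ASV1": 500, "ASV2": 700, "ASV3": 2},   # total 1202
}

def filter_samples_min_frequency(table, min_frequency):
    """Keep only samples whose total read count is >= min_frequency."""
    return {
        sample: counts
        for sample, counts in table.items()
        if sum(counts.values()) >= min_frequency
    }

rarefaction_depth = 1000  # hypothetical depth, chosen only for this example
kept = filter_samples_min_frequency(table, rarefaction_depth)
print(sorted(kept))  # ['sample_A', 'sample_C']
```

If this matches the intended behavior, then setting the threshold equal to the rarefaction depth would simply drop the samples that rarefaction would discard anyway.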

I have attached the alpha-rarefaction plot for my dataset. Based on this curve, what --p-min-frequency value would be reasonable for filtering? Or would you recommend a different threshold based on where the curves begin to plateau?

Also, I checked the DADA2 output (table.qzv) and found that:

  • There are no singleton ASVs in the dataset (min total frequency = 2).

  • Each ASV is present in at least 1 sample.
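
For clarity, these are the two per-feature summaries I am referring to, sketched in Python on a made-up toy table (the counts and names are hypothetical and only illustrate the definitions I am assuming):

```python
# Toy feature table (made-up counts): sample -> {ASV: count}
table = {
    "sample_A": {"ASV1": 10, "ASV2": 2, "ASV3": 0},
    "sample_B": {"ASV1": 7, "ASV2": 0, "ASV3": 4},
}

totals = {}      # total frequency of each ASV across all samples
prevalence = {}  # number of samples in which each ASV has a nonzero count

for counts in table.values():
    for asv, count in counts.items():
        totals[asv] = totals.get(asv, 0) + count
        prevalence[asv] = prevalence.get(asv, 0) + (1 if count > 0 else 0)

min_total = min(totals.values())           # smallest total frequency
min_prevalence = min(prevalence.values())  # smallest number of samples

print(min_total, min_prevalence)  # 2 1
```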

Given this, it seems that feature filtering based on minimum frequency may not be necessary, particularly since I am already removing mitochondrial, chloroplast, and eukaryotic reads.

However, I was considering applying a prevalence-based filter (e.g., --p-min-samples 2) to retain only those features observed in at least 2 samples. My concern is that this may remove rare taxa that could be biologically meaningful, especially given that my samples are heterogeneous (case-control).
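
To illustrate the concern, here is a small Python sketch of how I understand a prevalence filter to behave. The table is made up, the helper is hypothetical, and this is only my assumption about the logic, not the actual QIIME 2 implementation.

```python
# Toy case-control table (made-up counts): sample -> {ASV: count}
table = {
    "case_1":    {"ASV1": 120, "ASV2": 0, "ASV3": 5},
    "case_2":    {"ASV1": 80,  "ASV2": 0, "ASV3": 0},
    "control_1": {"ASV1": 95,  "ASV2": 2, "ASV3": 0},
}

def filter_features_min_samples(table, min_samples):
    """Keep only ASVs with a nonzero count in at least min_samples samples."""
    prevalence = {}
    for counts in table.values():
        for asv, count in counts.items():
            if count > 0:
                prevalence[asv] = prevalence.get(asv, 0) + 1
    keep = {asv for asv, n in prevalence.items() if n >= min_samples}
    return {
        sample: {asv: c for asv, c in counts.items() if asv in keep}
        for sample, counts in table.items()
    }

filtered = filter_features_min_samples(table, 2)
remaining = sorted({asv for counts in filtered.values() for asv in counts})
print(remaining)  # ['ASV1']
```

In this toy example, ASV2 (seen only in one control) and ASV3 (seen only in one case) are both removed, which is exactly the kind of group-specific rare feature I am worried about losing.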

I would appreciate your guidance on whether prevalence filtering is advisable in this context, or whether it is fine to skip frequency- and sample-based feature filtering altogether.

Thank you again for your time and guidance.