Hi @colinbrislawn,
Thank you very much for taking the time to look into my query. I have read through Justine's (@jwdebelius) advice and wanted to confirm whether my understanding is correct.
From what I understand, @jwdebelius is suggesting that the alpha-rarefaction depth can be used as a guide for setting the --p-min-frequency threshold when filtering samples (i.e., retaining only samples with sequencing depth ≥ rarefaction depth). Is that interpretation correct?
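To make sure I am describing the same thing, here is a toy Python sketch of how I understand depth-based sample filtering. It is not the QIIME 2 implementation; the sample names, counts, the `filter_samples_by_depth` helper, and the depth of 1000 are all made up for illustration.

```python
# Toy feature table: sample -> {ASV: read count}. All names and counts are hypothetical.
table = {
    "case_1": {"ASV_a": 5200, "ASV_b": 310, "ASV_c": 0},
    "case_2": {"ASV_a": 800, "ASV_b": 0, "ASV_c": 45},
    "ctrl_1": {"ASV_a": 4100, "ASV_b": 950, "ASV_c": 12},
}


def filter_samples_by_depth(table, min_frequency):
    """Keep only samples whose total read count is >= min_frequency."""
    return {
        sample: counts
        for sample, counts in table.items()
        if sum(counts.values()) >= min_frequency
    }


rarefaction_depth = 1000  # hypothetical depth, not read from my plot
kept = filter_samples_by_depth(table, rarefaction_depth)
print(sorted(kept))  # case_2 (845 reads in total) falls below the threshold
```

If this matches the intended behaviour, then choosing the threshold is a trade-off between keeping more samples and keeping more reads per sample.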
I have attached the alpha-rarefaction plot for my dataset. Based on this curve, what --p-min-frequency value would be reasonable for filtering, or would you recommend a different threshold based on where the curves begin to plateau?
Also, I checked the DADA2 output (table.qzv) and found that:
- There are no singleton ASVs in the dataset (min total frequency = 2).
- Each ASV is present in at least 1 sample.
Given this, it seems that feature filtering based on minimum frequency may not be necessary, particularly since I am already removing mitochondrial, chloroplast, and eukaryotic reads.
However, I was considering applying a prevalence-based filter (e.g., --p-min-samples 2) to retain only those features observed in at least 2 samples. My concern is that this may remove rare taxa that could be biologically meaningful, especially given that my samples are heterogeneous (case-control).
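To illustrate the concern, here is a small Python sketch of what I understand a prevalence filter to do. Again, this is a toy example and not the QIIME 2 code; the table and the `features_passing_prevalence` helper are hypothetical.

```python
# Toy feature table: sample -> {ASV: read count}. All names and counts are hypothetical.
table = {
    "case_1": {"ASV_a": 120, "ASV_b": 0, "ASV_c": 7},
    "case_2": {"ASV_a": 95, "ASV_b": 0, "ASV_c": 0},
    "ctrl_1": {"ASV_a": 110, "ASV_b": 40, "ASV_c": 0},
    "ctrl_2": {"ASV_a": 0, "ASV_b": 25, "ASV_c": 0},
}


def features_passing_prevalence(table, min_samples):
    """Return the ASVs observed (count > 0) in at least min_samples samples."""
    prevalence = {}
    for counts in table.values():
        for asv, count in counts.items():
            if count > 0:
                prevalence[asv] = prevalence.get(asv, 0) + 1
    return sorted(asv for asv, n in prevalence.items() if n >= min_samples)


kept_features = features_passing_prevalence(table, min_samples=2)
print(kept_features)  # ASV_c is dropped because it appears only in case_1
```

In this toy table, ASV_c is seen in a single case sample, so a threshold of 2 removes it even though it might be a genuine case-specific taxon.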
I would appreciate your guidance on whether prevalence filtering is advisable in this context, or whether it is fine to skip frequency- and sample-based feature filtering altogether.
Thank you again for your time and guidance.
