I chose a minimum frequency of 68 based on my Mean Frequency, which was 67,851. I multiplied this number by 0.001 to get 68, meaning an ASV is kept only if it is seen at least 68 times across my samples.
However, after this filtering step, I went from over 25,000 features to 4,000 features. That seems like a lot to remove, and I don't know if that's normal. These are seawater + oil samples. Is this step necessary?
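For reference, the threshold arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation; the variable names are made up for the example, and rounding to the nearest whole read is an assumption about how 67.851 became 68.

```python
# Values taken from the post above; the names are illustrative only.
mean_frequency = 67_851   # Mean Frequency reported in the table summary
fraction = 0.001          # 0.1% of the mean frequency

raw_threshold = mean_frequency * fraction   # roughly 67.851
min_frequency = round(raw_threshold)        # assumed: round to the nearest whole read

print(min_frequency)
```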
Hello @emmlemore,
Looking at the tutorial you linked, it seems like your math is good and that you just have a large number of low-frequency features. I don't think that is especially abnormal, and your command looks good to me.
This step is not strictly necessary. There are a lot of ways to go about filtering out contaminants. In general, selecting a minimum frequency threshold is a trade-off between minimizing the number of false positive features (i.e., features that appear due to noise or sequencing errors) and retaining real signals in your data.
You could look into lowering your --p-min-frequency and also using the --p-min-samples parameter. This would allow you to select for features that have a minimum total frequency and appear in a minimum number of samples.
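To make the effect of combining the two thresholds concrete, here is a minimal Python sketch on a toy table. It is not the QIIME 2 implementation; `filter_features`, `toy_table`, and the counts are hypothetical and only mimic the idea of keeping a feature when its total count and the number of samples it appears in both meet a minimum.

```python
def filter_features(table, min_frequency=0, min_samples=0):
    """Keep features whose total count and sample prevalence meet both minimums.

    table: dict mapping feature ID -> dict mapping sample ID -> count.
    """
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts.values())                           # total frequency across samples
        n_samples = sum(1 for c in counts.values() if c > 0)   # samples where the feature is observed
        if total >= min_frequency and n_samples >= min_samples:
            kept[feature_id] = counts
    return kept


# Hypothetical toy data, not real results.
toy_table = {
    "asv_a": {"s1": 50, "s2": 40, "s3": 0},  # total 90, seen in 2 samples
    "asv_b": {"s1": 70, "s2": 0, "s3": 0},   # total 70, seen in 1 sample
    "asv_c": {"s1": 5, "s2": 3, "s3": 2},    # total 10, seen in 3 samples
}

# A strict frequency cutoff alone drops the low-count but widespread asv_c.
by_frequency = filter_features(toy_table, min_frequency=68)

# A lower frequency cutoff plus a prevalence requirement keeps asv_c
# and instead drops asv_b, which appears in only one sample.
by_both = filter_features(toy_table, min_frequency=10, min_samples=2)

print(sorted(by_frequency))  # ['asv_a', 'asv_b']
print(sorted(by_both))       # ['asv_a', 'asv_c']
```

In this toy case, the prevalence requirement removes a feature seen in a single sample while retaining a rarer feature that shows up consistently, which is the kind of trade-off the two parameters let you tune.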
Thank you for your reply. I played around with the --p-min-frequency parameter and ended up retaining ~7000 features, which is double the previous ~3500.