However, after this filtering step, I went from over 25,000 features to 4,000 features. That seems like a lot to remove, and I don't know if that's normal. These are seawater + oil samples. Is this step necessary?
Looking at the tutorial you linked, your math looks right; you just have a large number of low-frequency features. I don't think that is especially abnormal, and your command looks good to me.
This step is not strictly necessary. There are a lot of ways to go about filtering out contaminants. In general, selecting a minimum frequency threshold is a trade-off between minimizing the number of false positive features (i.e., features that appear due to noise or sequencing errors) and retaining real signals in your data.
You could look into lowering your --p-min-frequency and also using the --p-min-samples parameter. Together, these let you keep only features that reach a minimum total frequency and also appear in a minimum number of samples.
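To make the interaction between the two thresholds concrete, here is a rough sketch in plain Python. It is not how QIIME 2 implements the filter; the toy table, the feature names, and the `filter_features` helper are all made up for illustration. It assumes a feature is kept when its total count across samples meets the frequency threshold and it has a nonzero count in enough samples.

```python
# Toy feature table: feature ID -> counts per sample (hypothetical data).
table = {
    "featA": [120, 98, 143, 77],
    "featB": [0, 0, 3, 0],
    "featC": [0, 15, 0, 0],
    "featD": [2, 1, 4, 3],
    "featE": [0, 0, 0, 1],
}


def filter_features(table, min_frequency=0, min_samples=0):
    """Keep features whose total count is >= min_frequency and that
    are observed (count > 0) in at least min_samples samples.

    Illustrative helper only; not part of any QIIME 2 API.
    """
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts)
        n_samples = sum(1 for c in counts if c > 0)
        if total >= min_frequency and n_samples >= min_samples:
            kept[feature_id] = counts
    return kept


# Frequency threshold alone: featC survives despite being in one sample.
strict = filter_features(table, min_frequency=10)

# Lower frequency threshold plus a sample-count requirement.
relaxed = filter_features(table, min_frequency=5, min_samples=2)

print(sorted(strict))   # ['featA', 'featC', 'featD']
print(sorted(relaxed))  # ['featA', 'featD']
```

In this toy case, the frequency-only filter keeps a feature that shows up in a single sample, while the combined filter drops it even though the frequency threshold is lower. Whether that is the behavior you want depends on how much you trust features seen in only one sample.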