Hello, good friends on the QIIME 2 developer team,
I want to filter my feature table based on total frequency across all samples. There are many low-abundance features in my table, but I'm not sure how to perform this filtering or which --p- parameter to use.
Could you please clarify this for me, based on the attached table.qzv file?
Hi @sajjad.sarikhan,
If I understand correctly what you're trying to do, you should provide the --p-min-frequency parameter to qiime feature-table filter-features to achieve this.
Here's an example using the table.qza file from the Moving Pictures tutorial. You can see from the feature table summary that the median feature frequency is 25. The feature frequencies presented in this summary are the total number of times each feature was observed across all of the samples, which, as I understand it, is what you want to filter on.
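To make the idea concrete, here is a minimal sketch in plain Python of what filtering on total feature frequency does. The table contents and the function name `filter_by_min_frequency` are hypothetical, and the inclusive comparison (`>=`) is an assumption about how the threshold behaves; this is a toy illustration, not QIIME 2's implementation.

```python
# Toy feature table: feature ID -> counts in each of three samples (made-up data).
table = {
    "feature-a": [120, 80, 45],
    "feature-b": [3, 0, 1],
    "feature-c": [0, 25, 0],
    "feature-d": [10, 2, 0],
}


def filter_by_min_frequency(table, min_frequency):
    """Keep features whose total count across all samples is >= min_frequency."""
    return {
        feature_id: counts
        for feature_id, counts in table.items()
        if sum(counts) >= min_frequency
    }


filtered = filter_by_min_frequency(table, min_frequency=25)

# Total frequency of each retained feature, summed over all samples.
retained_totals = {fid: sum(counts) for fid, counts in filtered.items()}
print(retained_totals)
```

With this toy data, only the features whose summed counts reach 25 survive, so the smallest total left in the table is the threshold itself.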
If I filter that table with a minimum frequency of 25 and then summarize the result, I can see that the minimum feature frequency is now 25. (I created the feature table summary locally with the command qiime feature-table summarize --i-table table-mc25.qza --o-visualization table-mc25.qzv.)
Hi @gregcaporaso
Thank you for the clear description. My main question was which --p-min-frequency value would be appropriate. You chose 25 based on the median frequency in the tutorial. Should we always choose the median frequency? May I choose 10 for this parameter?
Hi @sajjad.sarikhan, Thanks for clarifying, I understand your question now. There's not a "right" answer to this question. Our concern here is that the samples with low frequencies may not be a good representation of the underlying biology, so you have to think about how you want to set this value with that consideration in mind.
Based on the qzv that you shared, I would probably choose the lowest non-zero frequency, which in your case is 298, as the "min frequency" value. If you had a few others that were very low frequency (say less than 200), I might still choose 298 to drop those samples. When you generate taxonomy profiles (e.g., with qiime taxa barplot), if samples that have low frequencies seem to have unexpected compositions, you might consider setting this value a little higher.
When you get to steps like qiime diversity core-metrics, where you have to choose an even sampling depth, I think you'll want to go quite a bit higher, maybe 1000 based on your qzv.
Hi @gregcaporaso
Thank you so much for your kind explanation. But I think the description of choosing the median frequency was confusing. You stated that I should choose 298, but this is the minimum total frequency of all features per sample, whereas earlier you mentioned that the median frequency in the "frequency per feature" table should be used. I think the latter is correct. Am I right?
Hi @sajjad.sarikhan,
I chose 298 here because it was your lowest non-zero frequency, so it allows you to retain as many samples as possible. If you choose the median frequency, you'll remove half of your samples from all analyses that follow, which I expect is probably not what you want to do. For that reason I wouldn't recommend always choosing the median value.
Hi @gregcaporaso
Thank you for kindly following up on my issue and explaining the matter. But my main question was about which table you use to pick the minimum frequency. In your first description you used the "frequency per feature" table and its minimum frequency, but in my case you used the minimum of the total feature frequencies from the other table. This is my question.
Should I use the "frequency per feature" table or the table of total feature frequency per sample?
@sajjad.sarikhan, my apologies for the confusion. I got mixed up on what you were trying to filter.
There isn't a right answer for what value to use to filter low abundance features. I used the median frequency in my earlier example to illustrate how the filter works, but in practice I wouldn't use that value, as by definition it would throw away about half of the features.
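Since the two summaries are easy to mix up, here is a small sketch in plain Python of how "frequency per feature" and "frequency per sample" differ, and of what using the per-feature median as the threshold does. The data and variable names are hypothetical; this is a toy illustration rather than how QIIME 2 computes its summaries.

```python
from statistics import median

# Made-up table: each feature maps to its counts in samples s1, s2, s3.
sample_ids = ["s1", "s2", "s3"]
table = {
    "feature-a": [120, 80, 45],
    "feature-b": [3, 0, 1],
    "feature-c": [0, 25, 0],
    "feature-d": [10, 2, 0],
    "feature-e": [1, 0, 0],
}

# "Frequency per feature": total count of each feature across all samples.
# This is the quantity a minimum-frequency feature filter looks at.
frequency_per_feature = {fid: sum(counts) for fid, counts in table.items()}

# "Frequency per sample": total count of all features within each sample.
# This is the quantity that matters when dropping low-depth samples.
frequency_per_sample = {
    sid: sum(counts[i] for counts in table.values())
    for i, sid in enumerate(sample_ids)
}

# Using the per-feature median as the threshold keeps only the upper half
# of the features (assuming features equal to the threshold are retained).
threshold = median(frequency_per_feature.values())
retained = [fid for fid, total in frequency_per_feature.items() if total >= threshold]

print(frequency_per_feature)
print(frequency_per_sample)
print(threshold, retained)
```

In this toy table the median per-feature total is 12, so two of the five features are dropped, while the per-sample totals (134, 107, 46) are a separate set of numbers that the feature filter never consults.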
In my own studies, I don't consider filtering features to be an essential step, especially given how well the modern denoisers (such as DADA2 and Deblur) work. When I do filter features, I don't filter based on minimum frequency, but rather (a) by requiring them to show up in some minimum number of samples, or (b) by dropping them if they don't achieve some minimum level of taxonomy assignment.
Option (a) is achieved with qiime feature-table filter-features using --p-min-samples (a value of 2 is a good start). This would retain features only if they were observed in more than one sample, which is a good approach for dropping features that are likely sequencing errors. This is illustrated in the documentation here.
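As a rough sketch of what option (a) does, here is a plain Python version on toy data. The table and the function name `filter_by_min_samples` are hypothetical, and treating "observed" as a count greater than zero is an assumption; it is an illustration of the idea, not QIIME 2's code.

```python
# Toy feature table: feature ID -> counts in each of three samples (made-up data).
table = {
    "feature-a": [120, 80, 45],
    "feature-b": [3, 0, 1],
    "feature-c": [0, 25, 0],
    "feature-d": [10, 2, 0],
}


def filter_by_min_samples(table, min_samples):
    """Keep features observed (count > 0) in at least min_samples samples."""
    return {
        feature_id: counts
        for feature_id, counts in table.items()
        if sum(1 for count in counts if count > 0) >= min_samples
    }


kept = filter_by_min_samples(table, min_samples=2)
print(sorted(kept))
```

Note that this criterion is independent of abundance: feature-c has a total frequency of 25 but appears in only one sample, so it is dropped, while feature-b has a total of only 4 but is kept because it appears in two samples.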
Option (b) is achieved with qiime taxa filter-table. There are a couple of examples in the documentation here which show how to filter features that are not assigned at least to the phylum level, which is what I typically do.
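The idea behind option (b) can be sketched the same way. The taxonomy strings below are made-up Greengenes-style labels, the function name `filter_by_taxonomy` is hypothetical, and the plain substring match on "p__" is an assumption made for illustration; the documentation is the authority on how qiime taxa filter-table actually matches annotations.

```python
# Toy feature table and made-up taxonomy annotations for the same feature IDs.
table = {
    "feature-a": [120, 80, 45],
    "feature-b": [3, 0, 1],
    "feature-c": [0, 25, 0],
    "feature-d": [10, 2, 0],
}
taxonomy = {
    "feature-a": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "feature-b": "k__Bacteria",
    "feature-c": "Unassigned",
    "feature-d": "k__Bacteria; p__Proteobacteria",
}


def filter_by_taxonomy(table, taxonomy, include):
    """Keep features whose taxonomy annotation contains the given substring."""
    return {
        feature_id: counts
        for feature_id, counts in table.items()
        if include in taxonomy.get(feature_id, "")
    }


# Retain only features that carry a phylum-level ("p__") annotation.
with_phylum = filter_by_taxonomy(table, taxonomy, include="p__")
print(sorted(with_phylum))
```

Here feature-b (kingdom only) and feature-c (unassigned) are removed because their annotations stop short of the phylum level.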
I commonly apply one or both of those feature filters to my data, but I don't often filter by minimum frequency so I can't offer advice on a minimum value to provide for that type of filter.
Does this information help you figure out how to proceed? Let me know if not, and I can try to help more.
Hi @gregcaporaso
I am very grateful to you for kindly clarifying the filtering issue, which had confused me. That helped me a lot, and I get the picture now.