How to reproduce featureTable filtering by featureIDs?


After denoising the raw sequence data with dada2/deblur, we need to do some “housekeeping” before proceeding to downstream analyses, such as excluding contaminant features from the feature table by their unique feature IDs. My question is: how can we make this feature filtering reproducible if the raw sequence data are denoised again with the same commands? Providing the feature IDs used for the filtering won’t work, because these IDs are randomly generated during denoising: the same feature (ASV) may get entirely different feature IDs each time the raw sequence data are denoised. Filtering on the exact sequence of each feature should be reproducible. Is it possible to filter by the exact sequences of the features instead of by feature IDs?

Actually, the feature IDs aren’t random; they are generated by deterministically hashing the sequence. That means the same sequence, seen in multiple runs, will always get the same hashed ID value.
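To make the point concrete, here is a minimal sketch of why sequence-based filtering and ID-based filtering end up equivalent. It assumes the feature ID is the MD5 hex digest of the sequence string (the scheme QIIME 2's dada2 and deblur outputs are commonly described as using). The sequences, sample names, and the plain dict standing in for a feature table are all hypothetical. The idea is to keep the contaminant *sequences* as the reproducible record, recompute their IDs after any rerun, and then filter by those IDs.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Hash a sequence into a feature ID (assumed: MD5 hex digest of the exact string)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Hypothetical ASV sequences; seq_a is treated as a known contaminant.
seq_a = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
seq_b = "TACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAG"
contaminant_seqs = [seq_a]

# Hashing is deterministic: the same sequence yields the same ID in every run.
assert feature_id(seq_a) == feature_id(seq_a)

# Hypothetical feature table from a (re)denoising run: feature ID -> per-sample counts.
table = {
    feature_id(seq_a): {"S1": 120, "S2": 3},
    feature_id(seq_b): {"S1": 40, "S2": 88},
}

# Recompute the contaminant IDs from the stored sequences, then drop those features.
exclude = {feature_id(s) for s in contaminant_seqs}
filtered = {fid: counts for fid, counts in table.items() if fid not in exclude}

print(len(filtered))                   # 1
print(feature_id(seq_b) in filtered)   # True
```

One caveat of this design: the hash depends on the exact string, so a sequence that is trimmed to a different length (or differs in case) in a rerun hashes to a different ID. Keeping the denoising parameters identical between runs is what makes the IDs line up.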


Hi Matthew, thanks for clarifying! Good to know.

