I've been trying a range of --p-trunc-len-r parameters in dada2 denoise. (I wrote a handy bash script to run the step repeatedly and record the resulting numbers of features and read counts.) As expected, different values produce different numbers of features and of reads passing the filter. There are areas of parameter space that are clearly bad (i.e., just a handful of features), but I'm less certain about which criteria would be best for optimizing this step.
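A sweep of this kind could be sketched roughly as follows. This is a Python stand-in, not the bash script mentioned above. It assumes paired-end data and the `qiime dada2 denoise-paired` action (the one that takes --p-trunc-len-r); the input file name `demux.qza`, the output directory `sweep`, the fixed forward truncation length, and the range of values are all placeholders. By default it only builds the command lines; passing `run=True` executes them and assumes `qiime` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_denoise_command(trunc_len_r, demux="demux.qza", trunc_len_f=0, out_dir="sweep"):
    """Build the argument list for one denoise-paired run.

    All paths are hypothetical placeholders. trunc_len_f=0 leaves the
    forward reads untruncated; set it to whatever value you have fixed.
    """
    out = f"{out_dir}/trunc_r_{trunc_len_r}"
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--o-table", f"{out}/table.qza",
        "--o-representative-sequences", f"{out}/rep-seqs.qza",
        "--o-denoising-stats", f"{out}/stats.qza",
    ]


def run_sweep(values, run=False, **kwargs):
    """Return one command per truncation length; execute them only if run=True."""
    commands = [build_denoise_command(v, **kwargs) for v in values]
    if run:
        for cmd in commands:
            # Make sure the per-setting output directory exists before running.
            table_path = Path(cmd[cmd.index("--o-table") + 1])
            table_path.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


# Dry run: print the commands for a few placeholder truncation lengths.
for command in run_sweep(range(120, 161, 20)):
    print(" ".join(command))
```

Keeping each setting's outputs in its own directory makes it straightforward to collect the feature and read counts from the table and denoising stats afterwards.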
The settings that maximize the number of features/OTUs are not the same as the settings that maximize the number of reads passing the filter. Which of the two would be better to optimize for before proceeding with the analysis? In my case, I'm less interested in rare features than in getting an accurate estimate of differences among sample groups. I'd be interested in hearing people's thoughts on the best strategy.
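To make the trade-off concrete, the recorded sweep results can be tabulated so that the two candidate criteria sit side by side. The sketch below is only one possible way to lay this out, not a recommendation of either criterion. The function name `summarize_sweep`, the `min_features` cutoff used to flag the "clearly bad" region, and every number in the example are made up for illustration.

```python
def summarize_sweep(results, input_reads, min_features=10):
    """Summarize a truncation-length sweep.

    results: list of (trunc_len_r, n_features, reads_retained) tuples.
    input_reads: total reads going into the denoising step.
    min_features: arbitrary cutoff (an assumption) below which a setting
        is treated as collapsed and excluded from the comparison.
    """
    rows = []
    for trunc_len_r, n_features, reads_retained in results:
        rows.append({
            "trunc_len_r": trunc_len_r,
            "features": n_features,
            "fraction_retained": reads_retained / input_reads,
            "collapsed": n_features < min_features,
        })
    usable = [r for r in rows if not r["collapsed"]]
    # The two competing criteria from the question, reported separately.
    best_by_reads = max(usable, key=lambda r: r["fraction_retained"]) if usable else None
    best_by_features = max(usable, key=lambda r: r["features"]) if usable else None
    return rows, best_by_reads, best_by_features


# Placeholder numbers only; substitute the counts recorded by your own sweep.
example = [(120, 850, 61000), (140, 910, 54000), (160, 8, 3000)]
rows, by_reads, by_features = summarize_sweep(example, input_reads=100000)
for row in rows:
    print(row)
print("most reads retained:", by_reads["trunc_len_r"])
print("most features:", by_features["trunc_len_r"])
```

When the two criteria pick different settings, as in the placeholder data, the table shows how many reads are given up for the extra features, which is the quantity the question is really about.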