That’s actually a good question. I’m not sure what those negative spots are. They may just be there for aesthetics, to make the graphs easier to read, but they don’t have any boxes because they are not true positions and so have no quality scores. Until we get clarification on what they are, let’s just ignore them. As for the few spots where there is a line but no ‘blue box’, that’s just because all the quality scores at that position have the same value, so there is no variance and the ‘boxplot’ collapses to a line.
As for your question about sampling depth, the appropriate choice really depends on your data and the question being asked. See this topic discussed here and here for example, but I will say that all 3 scenarios you proposed for sampling depth above are too low.
You are right that the ideal situation is to retain the highest sampling depth while still including all your samples, but 160 sequences will not give you a true representation of a community. How do the rest of your samples look in terms of the number of sequences retained after dada2? Are they all pretty low, or are only a few of them that low? I suspect you may be losing many of your reads in dada2 because they are failing to merge. What is your target region, and what is the expected overlap for your primer pair?
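To make the trade-off concrete, here is a minimal sketch of how you could compare candidate sampling depths. The sample names and read counts are made up for illustration (they are not your data), and `summarize_depths` is just a hypothetical helper, not a QIIME 2 function. The idea is that a sample is kept only if it has at least as many sequences as the chosen depth, and every kept sample contributes exactly that many sequences.

```python
def summarize_depths(counts, depths):
    """For each candidate depth, count the samples and sequences that survive."""
    rows = []
    for depth in sorted(depths):
        # A sample survives only if it has at least `depth` sequences.
        kept = [name for name, n in counts.items() if n >= depth]
        rows.append({
            "depth": depth,
            "samples_kept": len(kept),
            "samples_dropped": len(counts) - len(kept),
            # Each surviving sample is subsampled down to exactly `depth`.
            "sequences_kept": depth * len(kept),
        })
    return rows


# Hypothetical per-sample sequence counts after denoising (illustration only).
counts = {"S1": 160, "S2": 4200, "S3": 5100, "S4": 980, "S5": 7300}

for row in summarize_depths(counts, [160, 980, 4200]):
    print(row)
```

With numbers like these, keeping every sample forces the depth down to 160, while dropping the one or two shallowest samples lets you keep far more sequences overall. Whether losing those samples is acceptable depends on your study design.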
Are you talking about the negative values on the x-axis? If so, they can be ignored. They show up when you zoom in on a region of interest (ROI) at the far left side of the plot. Rest assured though, there are no negative positions!
(Matthew Ryan Dillon)