Hi @timpiel, welcome to the QIIME 2 community!
I will start with a few basic observations about mock communities:
- we almost never see perfect replication of expected results!
- exogenous contamination (e.g., in reagents, library preps) and cross-contamination (e.g., from other samples in your sequencing run) are common issues leading to the first observation
- Index hopping and other technical errors can also produce false positives in your samples
- Primer bias and other issues can seriously skew the expected relative abundances (which fortunately does not seem to be an issue here).
Your data actually look pretty good: you have TDR=1.0 (i.e., all expected organisms were recovered) and pretty good R² values at level 6 (indicating that, at the genus level, your 10 mock community members are observed at more or less the expected abundances).
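To make those metrics concrete, here is a minimal sketch of how a detection rate and an R² value can be computed from an expected and an observed profile. It is not the q2-quality-control implementation: the function names and the toy genus-level profiles are hypothetical, and R² is computed here as the squared Pearson correlation over the union of taxa, which may differ in detail from what the plugin reports.

```python
def detection_rates(expected, observed):
    """Return (TDR, TAR) for two dicts mapping taxon -> relative abundance.

    TDR: fraction of expected taxa that were observed.
    TAR: fraction of observed taxa that were expected.
    """
    expected_taxa = {t for t, a in expected.items() if a > 0}
    observed_taxa = {t for t, a in observed.items() if a > 0}
    if not expected_taxa or not observed_taxa:
        raise ValueError("both profiles need at least one taxon with abundance > 0")
    true_positives = expected_taxa & observed_taxa
    tdr = len(true_positives) / len(expected_taxa)
    tar = len(true_positives) / len(observed_taxa)
    return tdr, tar


def r_squared(expected, observed):
    """Squared Pearson correlation of abundances over the union of taxa."""
    taxa = sorted(set(expected) | set(observed))
    x = [expected.get(t, 0.0) for t in taxa]  # missing taxa count as 0
    y = [observed.get(t, 0.0) for t in taxa]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("abundances must vary across taxa to compute R^2")
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy profiles (not real data): four expected genera plus a
# short tail of unexpected, low-abundance genera in the observed sample.
expected = {"g__A": 0.4, "g__B": 0.3, "g__C": 0.2, "g__D": 0.1}
observed = {"g__A": 0.38, "g__B": 0.31, "g__C": 0.19, "g__D": 0.10,
            "g__X": 0.015, "g__Y": 0.005}

tdr, tar = detection_rates(expected, observed)
print(f"TDR={tdr:.2f} TAR={tar:.2f} R2={r_squared(expected, observed):.3f}")
```

In a profile like this one, TDR stays at 1.0 while TAR drops, which is the same pattern as a perfect recovery of expected organisms accompanied by a tail of false positives.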
So the problem is, as you say, a long tail of low-abundance species.
I would bet cross-contamination and index hopping are the main causes in your case; possibly also some library prep/reagent contamination.
No, I would not filter these taxa out of your real samples! They very likely come from cross-contamination and index hopping, so the false positives you see are organisms from your real samples. Filtering them would severely skew your real samples.
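One quick way to sanity-check the cross-contamination idea is to ask which unexpected taxa in the mock also occur in your real samples. The sketch below assumes you have already exported per-sample genus-level profiles into plain dictionaries; the function name, sample IDs, and taxa are all hypothetical placeholders. Overlap is consistent with cross-talk but does not prove it, and taxa found only in the mock cannot be explained by cross-talk from those samples.

```python
def split_false_positives(expected_taxa, mock_observed, real_samples):
    """Split unexpected mock taxa by whether they appear in real samples.

    expected_taxa: iterable of taxa that should be in the mock community.
    mock_observed: dict mapping taxon -> abundance in the mock sample.
    real_samples:  dict mapping sample ID -> {taxon: abundance}.

    Returns (shared, mock_only):
      shared    -> {taxon: [IDs of real samples containing it]}
      mock_only -> sorted list of taxa seen in no real sample
    """
    false_positives = ({t for t, a in mock_observed.items() if a > 0}
                       - set(expected_taxa))
    shared = {}
    mock_only = []
    for taxon in sorted(false_positives):
        hits = sorted(sid for sid, profile in real_samples.items()
                      if profile.get(taxon, 0) > 0)
        if hits:
            shared[taxon] = hits
        else:
            mock_only.append(taxon)
    return shared, mock_only


# Hypothetical toy data (not real results).
expected_taxa = ["g__A", "g__B"]
mock_observed = {"g__A": 0.60, "g__B": 0.39, "g__X": 0.008, "g__Z": 0.002}
real_samples = {
    "sample1": {"g__X": 0.30, "g__A": 0.10},
    "sample2": {"g__Q": 0.50},
}

shared, mock_only = split_false_positives(expected_taxa, mock_observed, real_samples)
print("Also in real samples:", shared)
print("Only in the mock:", mock_only)
```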
QIIME 2 does not have any methods currently implemented to handle index hopping, but there are some methods out there, e.g., in R, that attempt to discover index hoppers — give those a look! We have a nice discussion of some of these on the forum:
You could also try the R package decontam (not yet implemented in QIIME 2, but will be one day soon). This will help identify any exogenous contaminants, though not cross-contaminants.
I am very glad to see that you are using q2-quality-control for this. It will give you a good comparison to demonstrate the magnitude of improvement after applying other filtering techniques.
Give those a spin and please do share your results!