Hi @SoilRotifer
Thanks for the continued help.
I always use cutadapt, as I think it is a nice form of quality control. I figure that if I am unable to find the primer in the sequence... then what else is wrong with the sequence?
Yes, it does make sense to use cutadapt over hardcoding cut-offs with dada2.
Regarding each of these:
Were all samples collected / stored similarly?
Was the DNA extracted from all samples similarly?
Was the same sequencing preparation and sequencing facility used?
Were the samples from these randomized across the different runs? If not, run-to-run biases can inflate differences, especially if the various sample types / treatment groups were run separately on their own run.
- Yes, to the best of our ability this was the case, given that we collect samples in (very) rural Ivory Coast.
- Yes, this was all done identically.
- Again all identical.
- I did not randomize across runs. However, some groups of samples did by chance cross over runs. To this point, when looking at ordination plots of the data we do see strong (expected) biological clustering:
I believe the ordination plots can still be trusted: even if, for example, a feature is split 20 times due to potentially differing lengths of something, that group of features will still be present in the same biological samples, if that makes sense?
Looking at the forum, notably here, I see that I can use q2-vsearch to mimic the function of DADA2's collapseNoMismatch(), which might alleviate, if not perfectly, some of this issue?
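To make concrete what I mean by collapsing length variants, here is a rough Python sketch of the idea. It is not DADA2's or q2-vsearch's actual implementation: the function name `collapse_length_variants` and the toy table are made up for illustration, and it only handles the simplest case, where one sequence is an exact prefix of another.

```python
def collapse_length_variants(table):
    """Merge features whose sequences differ only by trailing length.

    table: {sequence: [count_in_sample_0, count_in_sample_1, ...]}
    Simplifying assumption: two sequences are "the same" when one is an
    exact prefix of the other. Real tools also handle shifted starts.
    """
    # Visit the most abundant sequences first so they become representatives.
    ordered = sorted(table, key=lambda s: sum(table[s]), reverse=True)
    collapsed = {}
    for seq in ordered:
        for rep in collapsed:
            if rep.startswith(seq) or seq.startswith(rep):
                # Add this variant's per-sample counts to its representative.
                collapsed[rep] = [a + b for a, b in zip(collapsed[rep], table[seq])]
                break
        else:
            # No representative matched, so this sequence starts a new group.
            collapsed[seq] = list(table[seq])
    return collapsed


# Hypothetical toy table: three features across three samples.
toy = {
    "ACGTAC": [10, 0, 5],
    "ACGT": [2, 0, 1],    # shorter variant of the first feature
    "TTGCA": [0, 7, 0],
}
merged = collapse_length_variants(toy)
print(merged)
```

In this toy case the short variant is folded into the longer, more abundant sequence, and the unrelated feature is left alone.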
EDIT: I just tried this clustering and it barely reduces the feature count, and the post-decontam feature count is actually ever so slightly higher. So it would seem it is not a DADA2 thing?
One other option is to cluster at 99% or by genus from the taxa data?
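For the genus option, what I have in mind is simply summing counts of all features that share a genus label. A minimal Python sketch of that, assuming a table keyed by feature ID and a feature-to-genus lookup (the names `collapse_by_genus`, `table` and `genus_of`, and the toy values, are all hypothetical); in QIIME 2 itself I believe `qiime taxa collapse` is the action that does this on real artifacts:

```python
def collapse_by_genus(table, genus_of):
    """Sum per-sample counts of all features assigned to the same genus.

    table:    {feature_id: [count_in_sample_0, count_in_sample_1, ...]}
    genus_of: {feature_id: genus_label}; missing features go to "Unassigned".
    """
    collapsed = {}
    for feature, counts in table.items():
        genus = genus_of.get(feature, "Unassigned")
        if genus in collapsed:
            # Genus already seen: add this feature's counts sample by sample.
            collapsed[genus] = [a + b for a, b in zip(collapsed[genus], counts)]
        else:
            collapsed[genus] = list(counts)
    return collapsed


# Hypothetical toy data: four features, two samples.
table = {"f1": [5, 0], "f2": [3, 1], "f3": [0, 9], "f4": [2, 2]}
genus_of = {"f1": "Bacillus", "f2": "Bacillus", "f3": "Pseudomonas"}
by_genus = collapse_by_genus(table, genus_of)
print(by_genus)
```

The obvious trade-off is that any real variation below genus level is lost, so this would be more of a sanity check than a replacement for the feature-level table.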