I came across an unusual observation. I have two different data sets, but the controls are the same in both. When I analyzed the two data sets separately and checked the mean/median of the controls in each, they were different (a big difference). Why? I used the same control fastqs for both data sets and the same parameters/criteria. Does the diversity (alpha/beta) of one sample/group affect the others?
I am sure other members can give you a more complete response, but the first thing that comes to mind is that, as far as I know, denoising is dataset dependent. That means your control samples may end up with different ASVs depending on the context (the denoising run), which would cause the different observed diversities you reported. So it is possibly not enough to use the same control fastqs and parameters in both pipelines; maybe you should denoise all of your samples together and separate the datasets afterwards. Alternatively, if each dataset represents a different sequencing run, maybe you should denoise each sequencing run separately and merge/separate the samples as you like after denoising.
As @vheidrich suggests above, if you’re denoising multiple sequencing runs with DADA2, you should not merge your data prior to denoising; it can mess with the error model. Deblur is not similarly affected, AFAIK.
If you are using core-metrics, core-metrics-phylogenetic, or any other workflow in which you rarefy your data, you are randomly subsampling without replacement prior to calculating diversity. Depending on a few factors (e.g. sampling depth), variations in outcome based on this random subsampling can be significant. The deeper you sample, the more representative your outcomes will be.
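To make the depth effect concrete, here is a minimal sketch in plain Python (not QIIME 2 code). It assumes a made-up control sample with 5 abundant ASVs and 50 rare ones, and the helper name `rarefy_observed_features` is hypothetical. It subsamples reads without replacement at several depths and reports how many features survive.

```python
import random

def rarefy_observed_features(counts, depth, seed):
    """Draw `depth` reads without replacement and count the distinct features seen."""
    # Expand the count vector into one entry per read.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return len(set(rng.sample(reads, depth)))

# Hypothetical control sample: 5 abundant ASVs and 50 rare ones (2600 reads total).
control = {f"asv{i}": 500 for i in range(5)}
control.update({f"rare{i}": 2 for i in range(50)})

# Repeat the rarefaction 20 times per depth, changing only the random seed.
results = {}
for depth in (200, 1000, 2500):
    results[depth] = [rarefy_observed_features(control, depth, seed) for seed in range(20)]
    values = results[depth]
    print(depth, "min:", min(values), "max:", max(values), "mean:", sum(values) / len(values))
```

In this toy sample the shallow depths miss most of the rare features and the count changes from draw to draw, while depths close to the full library recover nearly all 55 features.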
Thanks for your suggestion. I don't have multiple sequencing runs. Now I am thinking of denoising all my fastqs together, then separating my samples into two different data sets while keeping the controls in both. Do you know how to separate the denoised output by sample?
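The logic of the split can be sketched in plain Python. This is only an illustration with made-up sample IDs and counts, and `subset_table` is a hypothetical helper; in QIIME 2 itself the usual route would be filtering the feature table by sample metadata (e.g. `qiime feature-table filter-samples`), so check the plugin documentation for the exact options.

```python
# Hypothetical denoised feature table: sample ID -> {ASV ID: read count}.
table = {
    "ctrl1": {"asvA": 120, "asvB": 30},
    "ctrl2": {"asvA": 90, "asvC": 15},
    "exp1_s1": {"asvA": 10, "asvD": 200},
    "exp1_s2": {"asvB": 75, "asvD": 60},
    "exp2_s1": {"asvC": 300, "asvE": 40},
}

# Hypothetical sample groupings; the controls belong to both data sets.
controls = {"ctrl1", "ctrl2"}
dataset1_samples = {"exp1_s1", "exp1_s2"}
dataset2_samples = {"exp2_s1"}

def subset_table(table, keep_ids):
    """Return a copy of the table restricted to the given sample IDs."""
    return {sid: dict(counts) for sid, counts in table.items() if sid in keep_ids}

# Each data set keeps its own samples plus the shared controls.
table1 = subset_table(table, dataset1_samples | controls)
table2 = subset_table(table, dataset2_samples | controls)

print(sorted(table1))
print(sorted(table2))
```

Because both subsets come from one denoising run, the controls carry identical ASV counts in `table1` and `table2` before any rarefaction is applied.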
Thanks for the suggestion. I filtered the table.qza and ran the diversity analyses. Still, I got a minor difference between the controls in the two data sets.
Now I don’t know what the reason for the different values would be. For example, observed features for the same controls differ between the two datasets.
Hi! Sorry for the delayed reply.
If you rarefied to a certain sequencing depth before calculating the diversity metrics, then minor differences in the metrics may be observed, since rarefaction is a random subsampling of reads from each sample.
Even repeating the analysis with the same dataset and rarefaction depth may produce minor differences in diversity metrics.
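A small sketch of that run-to-run variation, in plain Python rather than QIIME 2. The control sample and the helper `observed_features` are made up for illustration. The same sample is rarefied to the same depth ten times with different random seeds, and then twice with a fixed seed.

```python
import random

def observed_features(counts, depth, seed=None):
    """Rarefy one sample to `depth` reads and return the number of distinct features."""
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # seed=None means a fresh random state on every call
    return len(set(rng.sample(reads, depth)))

# Hypothetical control: 10 abundant ASVs and 40 rare ones (1120 reads total).
control = {f"asv{i}": 100 for i in range(10)}
control.update({f"rare{i}": 3 for i in range(40)})

# Same sample, same depth, different random draws.
repeats = [observed_features(control, 300, seed=s) for s in range(10)]
print("observed features per repeat:", repeats)

# Fixing the seed makes the subsample, and therefore the metric, reproducible.
print(observed_features(control, 300, seed=7) == observed_features(control, 300, seed=7))
```

The input never changes here, so any spread in `repeats` comes from the subsampling alone, which is the kind of minor difference described above.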