core metrics results paradox

Hello, I would like to ask about the core-metrics diversity results, for example the Bray-Curtis plots. I have samples from two categories.

To explain briefly: I import the samples of one category and process them until I have the feature table and the representative sequences (.qza). I then merge those with the corresponding artifacts from the other category, giving one table.qza and one repseq.qza for all samples. After I build a rooted tree from the merged table and sequences, the core-metrics results differ from what I get when I import and analyse all samples together. When I analyse all samples together, the categories do not cluster well in the Bray-Curtis Emperor plot. When I merge the separately produced tables from the two categories and then run core metrics, I get perfectly distinct clusters. Why is that? Nothing in the command documentation suggests that merging two tables keeps them separate inside the .qza, yet the results look as if the merged tables are still processed separately. Otherwise I cannot explain why clusters form in one case and not in the other.

Thank you very much for your time

Hello!

Could you provide more information regarding your steps when you process groups separately until merging feature tables?

For example, if you use different truncation/trimming values in DADA2 for each group, ASVs from the same organism can differ between the groups. In that case, the separation of samples in the Emperor plot is technical (a strong batch effect) and has nothing to do with biological differences.
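To make the batch-effect point concrete, here is a minimal sketch with made-up counts (the sample names and ASV identifiers are hypothetical, not taken from this thread). It assumes the same organisms are present in every sample, but that the second batch was processed so that they received different ASV identifiers. Bray-Curtis only compares counts under matching feature IDs, so the between-batch dissimilarity reaches its maximum even though the biology is identical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as {feature_id: count}."""
    features = set(a) | set(b)
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total


# Hypothetical counts: the same three organisms in every sample.
# In batch 2 the "same" organisms carry different ASV identifiers,
# as could happen if the reads were truncated differently.
batch1_s1 = {"asv_A": 50, "asv_B": 30, "asv_C": 20}
batch1_s2 = {"asv_A": 45, "asv_B": 35, "asv_C": 20}
batch2_s1 = {"asv_A_short": 50, "asv_B_short": 30, "asv_C_short": 20}

within = bray_curtis(batch1_s1, batch1_s2)
between = bray_curtis(batch1_s1, batch2_s1)
print(f"within batch:  {within:.2f}")
print(f"between batch: {between:.2f}")
```

With no shared feature IDs, every between-batch pair looks completely different, which would show up as two tight, well-separated clusters on a PCoA plot.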

Best,

I use the same trimming parameters for each group. Basically, I import the samples, run DADA2, merge table1 with table2 and repseq1 with repseq2, then build the rooted tree and run the core-metrics analysis to get the Bray-Curtis .qzv.

That’s strange. Are you sure that you apply exactly the same parameters to both runs, including the primer removal step (if any)? Results vary slightly each time you run DADA2 or core-metrics, but not by much.

The command I use for DADA2 is:

qiime dada2 denoise-paired \
  --p-trunc-len-f 298 \
  --p-trunc-len-r 280 \
  --p-min-overlap 12 \
  --p-n-threads 14 \
  --o-table \
  --o-representative-sequences \
  --o-denoising-stats

I use it right after the import, without any previous step, and with the same parameters for all groups. When I run it on each sample separately and then merge, the results are still different, perhaps somewhere in between processing all samples together and processing them in two groups. It troubles me because I did not expect this step to make such a big difference, and I cannot decide how to process the samples. The amplicons come from the standard Illumina V3-V4 primer protocol, and the runs were on a MiSeq and a NextSeq 2000. Can the sequencer make all that difference? If so, shouldn't analysing each sample separately and then merging remove any platform errors?

Can you post these two .qzv files? I think a direct comparison would be helpful!

Yeah, I really do see that split vs overlap you spoke about.

Can you upload the full .qzv files too? That will let us view the full provenance of the data too! Only share what you are able; for example, you might be able to post the .qzv files but not other parts of patient metadata.

Could you also compare the number of reads retained per sample after running DADA2 on all sequencing runs together and after denoising each run separately?
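One way to set up that comparison is sketched below. It assumes the per-sample `input` and `non-chimeric` read counts have already been copied out of the two sets of DADA2 denoising stats; the sample names and numbers are invented for illustration and do not come from this thread.

```python
def retained_fraction(stats):
    """Map sample -> fraction of input reads that survive as non-chimeric."""
    return {s: kept / total for s, (total, kept) in stats.items()}


def compare_retention(joint, separate):
    """Per-sample difference (separate minus joint) in retained fraction."""
    fj = retained_fraction(joint)
    fs = retained_fraction(separate)
    return {s: fs[s] - fj[s] for s in sorted(fj.keys() & fs.keys())}


# Hypothetical (input reads, non-chimeric reads) per sample.
joint = {"miseq_01": (40000, 22000), "nextseq_01": (60000, 45000)}
separate = {"miseq_01": (40000, 30000), "nextseq_01": (60000, 45600)}

diff = compare_retention(joint, separate)
for sample, delta in diff.items():
    print(f"{sample}: {delta:+.1%}")
```

A large gap that affects the samples of only one run would be consistent with the joint denoising treating that run more harshly.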

It is a very long shot on my part, but it may also be that when you combine different sequencing runs (especially MiSeq and NextSeq!), the error model gets biased towards one of the runs. That leads to stricter filtering of ASVs from one of the sequencing runs, so the surviving ASVs are more similar to those of the dominating run and the separation on the PCoA plots is weaker.

In any case, it is recommended to denoise each sequencing run separately with identical settings before merging the datasets.
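On the original question of whether merged tables stay "separate": as far as I know, q2-dada2 by default names each feature with an MD5 hash of its sequence, so an identical sequence found in two runs ends up under the same feature ID after merging. The sketch below illustrates that idea in plain Python with hypothetical sequences and sample names; it is not QIIME 2's implementation, and the function names are made up.

```python
import hashlib
from collections import Counter


def feature_id(sequence):
    """MD5 hex digest of the sequence, used here as the feature ID."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def to_table(samples):
    """Convert {sample: {sequence: count}} to {sample: Counter by feature ID}."""
    return {
        sample: Counter({feature_id(seq): n for seq, n in seqs.items()})
        for sample, seqs in samples.items()
    }


def merge_tables(*tables):
    """Union of samples and features; counts are summed per sample."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            merged.setdefault(sample, Counter()).update(counts)
    return merged


# Hypothetical per-run results; "ACGTACGT" was found in both runs.
run1 = {"s1": {"ACGTACGT": 10, "TTGACCA": 5}}
run2 = {"s2": {"ACGTACGT": 7}}

merged = merge_tables(to_table(run1), to_table(run2))
all_features = set().union(*merged.values())
print(len(all_features), "distinct features across", len(merged), "samples")
```

Nothing in the merged table records which run a feature came from, so any clustering by run has to come from the counts and feature IDs themselves.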

It is true that the two groups came from different runs for 9/10 of the samples in each group. I will stick to denoising each run separately. Thank you very much!
