Concern regarding merged denoised data


(Einar Marius Hjellestad Martinsen) #1

Hi everyone!

I have recently merged my denoised data, and want to share some thoughts and hear about others' experiences. I am analyzing fungal data from human samples. Samples were sequenced on three runs using Illumina HiSeq. Sample collection was done consecutively, and there should be no obvious biological difference between the three runs.
Data was denoised using qiime dada2 for each run separately. Then, qiime feature-table merge was used to merge data from the three runs. The number of features/ASVs from the three runs is as follows:
Run 1: 616. Run 2: 869. Run 3: 105.

I would expect most of the features/ASVs to be shared between the runs, so the merged table should contain something close to 800-900 features. However, the number of features/ASVs after merging is 1369, not far below the combined total of 1590 for the three runs. I don't like that. Or is it expected?

Furthermore, we filtered low-abundance features using
qiime feature-table filter-features \
  --p-min-samples 2 \
  --p-min-frequency 10

and we were down to 235 features. Let's think of a Venn diagram of the three runs. In an ideal world, I would expect the common area of the diagram to contain the features that are shared between the runs (or at least between runs 1 and 2 in this case, given the low number in run 3). Is there any way I can check this?
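One way to check the overlap is to compare the sets of feature IDs found in each run. The per-run counts above add up to 616 + 869 + 105 = 1590, and the merged table has 1369, so only 221 of those entries are repeats of a feature already seen in another run. The sketch below counts each of the seven Venn regions with plain Python sets. The feature IDs are toy values, and `venn_regions` is a hypothetical helper name; reading the real ID lists from each run's table is left out.

```python
def venn_regions(run1, run2, run3):
    """Count the feature IDs falling in each of the seven Venn regions."""
    a, b, c = set(run1), set(run2), set(run3)
    return {
        "run1 only": len(a - b - c),
        "run2 only": len(b - a - c),
        "run3 only": len(c - a - b),
        "run1+run2 only": len((a & b) - c),
        "run1+run3 only": len((a & c) - b),
        "run2+run3 only": len((b & c) - a),
        "all three": len(a & b & c),
    }


# Toy feature IDs (hypothetical); replace with the IDs from each run's table.
run1 = {"asv_a", "asv_b", "asv_c", "asv_d"}
run2 = {"asv_b", "asv_c", "asv_e"}
run3 = {"asv_c", "asv_f"}

regions = venn_regions(run1, run2, run3)
for name, count in regions.items():
    print(f"{name}: {count}")

# The seven regions partition the union, so their counts add up to the
# number of features a merged table would contain.
merged_total = sum(regions.values())
assert merged_total == len(run1 | run2 | run3)
```

Running the same comparison on the table after filtering would show whether the 235 remaining features sit mostly in the shared regions.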


(Justine) #2

Hi @einamart,

I’d recommend using your favorite distance metric (my preferred metrics are unweighted UniFrac, weighted UniFrac and Bray-Curtis; with ITS you might want to use Jaccard distance as your unweighted metric) and looking in PCoA space to see if you have separation between runs. Unfortunately, batch effects are a Thing™ in microbiome research, and far more common than you’d expect.
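As a rough numeric companion to the PCoA plot, one could compare the average Jaccard distance between samples from the same run with the average between samples from different runs. This is only a minimal sketch of that idea, not a PCoA and not a statistical test. The sample names, feature IDs, and the helper `mean_distance_by_group` are all made up for illustration.

```python
from itertools import combinations


def jaccard_distance(x, y):
    """Jaccard distance between two sets of observed feature IDs."""
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)


def mean_distance_by_group(samples, run_of):
    """Mean pairwise distance for same-run and for different-run pairs."""
    within, between = [], []
    for s1, s2 in combinations(sorted(samples), 2):
        d = jaccard_distance(samples[s1], samples[s2])
        if run_of[s1] == run_of[s2]:
            within.append(d)
        else:
            between.append(d)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Toy presence/absence data (hypothetical): features observed per sample.
samples = {
    "s1": {"asv_a", "asv_b", "asv_c"},
    "s2": {"asv_a", "asv_b", "asv_d"},
    "s3": {"asv_a", "asv_e", "asv_f"},
    "s4": {"asv_e", "asv_f", "asv_g"},
}
run_of = {"s1": "run1", "s2": "run1", "s3": "run2", "s4": "run2"}

within, between = mean_distance_by_group(samples, run_of)
print(f"mean within-run distance:  {within:.2f}")
print(f"mean between-run distance: {between:.2f}")
```

If the between-run mean is clearly larger than the within-run mean, that points the same way as visible separation by run in PCoA space.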

Best,
Justine