I recently merged my denoised data and would like to share some observations and hear about others' experiences. I am analyzing fungal data from human samples. The samples were sequenced on three Illumina HiSeq runs. Sample collection was done consecutively, so there should be no obvious biological difference between the three runs.
Data were denoised using qiime dada2 for each run separately. Then qiime feature-table merge was used to merge the data from the three runs. The numbers of features/ASVs from the three runs are as follows:
Run 1: 616. Run 2: 869. Run 3: 105.
I would expect most of the features/ASVs to be shared between the runs, so the merged total should be somewhere around 800-900. However, the number of features/ASVs after merging is 1369, which is not far below the simple sum of the three runs (1590). I don't like that. Or is it expected?
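To put numbers on this, here is a small sketch of the bookkeeping. It assumes that the merge collapses ASVs with identical feature IDs into one row, so the gap between the simple sum and the merged count tells how much overlap there is. The variable names are mine and only for illustration.

```python
# Per-run ASV counts and the merged count, as reported above.
run_counts = {"run1": 616, "run2": 869, "run3": 105}
merged = 1369

# Simple sum: every ASV is counted once for each run it appears in.
total = sum(run_counts.values())

# Counts that disappeared in the merge because an ID occurred in more than one run.
# An ASV seen in two runs removes 1 count, an ASV seen in all three removes 2.
collapsed = total - merged

# At most min(run_counts) ASVs can be present in all three runs.
max_in_all_three = min(run_counts.values())

# Bounds on the number of ASVs shared by at least two runs.
shared_max = collapsed                      # no ASV is in all three runs
shared_min = collapsed - max_in_all_three   # every run 3 ASV is in all three runs

print(total, collapsed, shared_min, shared_max)
```

Under that assumption, only somewhere between 116 and 221 of the 1369 merged ASVs occur in more than one run, which is far fewer than I expected.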
Furthermore, we filtered low-abundance features using
qiime feature-table filter-features \
  --p-min-samples 2 \
  --p-min-frequency 10
and we were down to 235 features. Let's think of a Venn diagram of the three runs. In an ideal world, I would expect the features that survive filtering to fall in the common area of the diagram, that is, to be shared between the runs (or at least between runs 1 and 2, given the low number in run 3). Is there any way I can check this?
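What I have in mind is roughly the following sketch. It assumes I can get the set of feature IDs for each run out of the per-run tables (for example after exporting them); the function name venn_regions and the toy IDs below are hypothetical and not taken from my data.

```python
def venn_regions(features_by_run):
    """Map each combination of runs to the feature IDs found in exactly those runs."""
    # Record, for every feature ID, the runs in which it was observed.
    membership = {}
    for run, features in features_by_run.items():
        for feature_id in features:
            membership.setdefault(feature_id, set()).add(run)

    # Group feature IDs by the exact set of runs they belong to.
    regions = {}
    for feature_id, runs in membership.items():
        regions.setdefault(tuple(sorted(runs)), set()).add(feature_id)
    return regions


# Hypothetical toy IDs standing in for real ASV feature IDs.
features_by_run = {
    "run1": {"asv_a", "asv_b", "asv_c", "asv_d"},
    "run2": {"asv_b", "asv_c", "asv_d", "asv_e"},
    "run3": {"asv_c", "asv_f"},
}

regions = venn_regions(features_by_run)
for runs, ids in sorted(regions.items()):
    print(" & ".join(runs), len(ids), sorted(ids))
```

Each key is one area of the Venn diagram, so the entry for all three runs would be the common area, and the entry for runs 1 and 2 would be what they share without run 3. The same grouping could be repeated on the IDs that remain after filtering to see which areas the 235 features come from.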