I have control and experiment groups in my study. In my preliminary experiment (60 samples), I found that several microbes differed significantly between the two groups (non-parametric t test).
However, I then sequenced 240 more samples and analysed them together with the previous ones, so all 300 samples went through QIIME in a single run. This time I found no difference between the two groups, even when I picked out only the preliminary-experiment samples for the non-parametric t test.
I wondered why the differences could no longer be detected once I had more samples. I doubted the stability of the QIIME results, so I compared the output of the two QIIME runs (60 samples and 300 samples). Some samples have totally different microbial communities in the earlier and the later analysis (the same sample, different results).
(In the above figure, I selected several samples with obvious differences.
q2 means QIIME2,
latter means the 300-sample analysis, and the color indicates the Pearson correlation coefficient between samples.)
So I took the 60 samples from the preliminary experiment, combined them with 40 other samples (randomly selected from the 240 new ones), and re-ran the QIIME2 analysis. When I compared this result with the preliminary analysis, the two runs were almost the same: the Pearson correlation coefficients between the preliminary analysis and this run were larger than 0.99 for all 60 samples.
(In the above figure, samples with no suffix come from the preliminary analysis and the suffix
z means the 100-sample analysis.)
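For anyone who wants to repeat this kind of check, here is a minimal sketch of the per-sample comparison in plain Python. It assumes each run has already been reduced to relative abundances per sample and taxon; the sample IDs, taxon names, abundance values, and the function names `pearson` and `compare_runs` are hypothetical and only for illustration. The 0.99 cutoff is the one mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one profile is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def compare_runs(run_a, run_b):
    """Correlate each shared sample's taxon profile between two runs.

    Each run is a dict: sample ID -> {taxon: relative abundance}.
    A taxon missing from one run is treated as abundance 0.
    """
    result = {}
    for sample in sorted(set(run_a) & set(run_b)):
        taxa = sorted(set(run_a[sample]) | set(run_b[sample]))
        a = [run_a[sample].get(t, 0.0) for t in taxa]
        b = [run_b[sample].get(t, 0.0) for t in taxa]
        result[sample] = pearson(a, b)
    return result


# Toy data (hypothetical values, not taken from my tables).
previous = {
    "S1": {"Ruminococcaceae": 0.5, "Bacteroides": 0.3, "Prevotella": 0.2},
    "S2": {"Ruminococcaceae": 0.1, "Bacteroides": 0.6, "Prevotella": 0.3},
}
latter = {
    "S1": {"Ruminococcaceae": 0.5, "Bacteroides": 0.3, "Prevotella": 0.2},
    "S2": {"Ruminococcaceae": 0.7, "Bacteroides": 0.1, "Prevotella": 0.2},
}

correlations = compare_runs(previous, latter)
# Samples whose profile changed between the two runs.
flagged = [s for s, r in correlations.items() if r < 0.99]
print(flagged)
```

Samples that end up in `flagged` are the ones worth inspecting taxon by taxon.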
So my guess is that when too many samples are analysed at once, some unknown factor affects the analysis and produces seriously wrong results. However, I don't know why this error occurs or what the sample-size threshold for one analysis is.
I repeated the above analysis in QIIME 1.9 and found that it has the same problem. When I analysed 300 samples in one batch with QIIME 1.9, some samples again deviated seriously from the results of the 60-sample batch.
I selected the 7 samples with the most serious deviation and merged their results from the different batches. I found that
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__ has the largest differences among batches.
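A rough sketch of how taxa can be ranked by how much they shift between two batches is below. It works on small in-memory dictionaries rather than on any particular file layout; the sample IDs, the abundance values, the two placeholder taxa, and the function name `rank_taxa_by_shift` are hypothetical. Only the Ruminococcaceae lineage string is taken from my results.

```python
RUMINO = "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__"


def rank_taxa_by_shift(run_a, run_b):
    """Rank taxa by mean absolute abundance change between two runs.

    Each run is a dict: sample ID -> {taxon: relative abundance}.
    Only samples present in both runs are used; a taxon missing
    from one run is treated as abundance 0.
    """
    samples = sorted(set(run_a) & set(run_b))
    totals = {}
    for s in samples:
        for taxon in set(run_a[s]) | set(run_b[s]):
            diff = abs(run_a[s].get(taxon, 0.0) - run_b[s].get(taxon, 0.0))
            totals[taxon] = totals.get(taxon, 0.0) + diff
    n = len(samples)
    # Largest mean shift first.
    return sorted(
        ((taxon, total / n) for taxon, total in totals.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Toy data (hypothetical values): 60-sample batch vs 300-sample batch.
previous_batch = {
    "S1": {RUMINO: 0.10, "taxon_B": 0.60, "taxon_C": 0.30},
    "S2": {RUMINO: 0.20, "taxon_B": 0.50, "taxon_C": 0.30},
}
latter_batch = {
    "S1": {RUMINO: 0.50, "taxon_B": 0.30, "taxon_C": 0.20},
    "S2": {RUMINO: 0.45, "taxon_B": 0.35, "taxon_C": 0.20},
}

ranked = rank_taxa_by_shift(previous_batch, latter_batch)
for taxon, shift in ranked:
    print(f"{shift:.3f}\t{taxon}")
```

The mean absolute difference is only one possible measure; it favours abundant taxa, so a relative or log-ratio change might be a better choice for rare ones.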
I attached the merged file here. In this file,
q1 means analysis by QIIME1.9,
q2 means analysis by QIIME2,
previous means 60 samples one batch,
latter means 300 samples one batch.
merge.txt (25.6 KB)
Has anyone seen the same problem? Any help would be appreciated. Thank you!