A larger sample size leads to fatal error when analysing with QIIME

1. The arising of the problem

I have control and experiment groups in my study. In my preliminary experiment (60 samples), I found several microbes has significant difference in the two groups (non-parametric t test).

However, after I sequenced more samples (240 samples) and analysed these samples together with the previous samples, which means that I analysed 300 samples one time by QIIME. I found no difference between the two groups, even I picked the preliminary experiment samples for non-parametric t test.

2. Different microbiota communities of the same sample in two runs

I was wondered why no differences could be detected after I have more sample. I have doubt about the stability of results from QIIME, so I compared the results from the previous two runs of QIIME (60 samples and 300 samples). I found some samples have totally different microbiota communities in previous and latter QIIME analyses (the same sample, different results).

(In the above figure, I selected several samples with obvious difference. q2 means QIIME2, latter means the 300 samples analysis, the color indicates the Pearson correlation coefficient between samples)

So I selected the previous 60 samples in preliminary experiment and combined with other 40 samples (random selected from the 240 samples) and re-run the QIIME2 analysis. When I compared the result with the preliminary analysis, I found this time the two runs almost the same, the Pearson correlation coefficients of all the 60 samples between the preliminary analysis and this time were larger than 0.99.

(in above figure, samples with no suffix came from preliminary analysis and suffix z means the 100 samples analysis)

So I guessed that if we analyse too many samples one time, some unknown factors may affected the analyzing process and resulted in fatal error. However, I don't know why I have this error and what's the sample size threshold for analyzing.

3. Which microbe leads to the fatal error

3.1 QIIME1.9 vs QIIME2

I repeated the above analysis in QIIME1.9 and found QIIME 1.9 have the same problem. If I analysed 300 samples by QIIME1.9 in one batch, the serious deviation also appeared in some samples when compared with 60 samples in one batch.

3.2 which microbe leads to the error

I selected 7 samples with the most serious deviation merged the results from different batches, I found that k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium and k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__ have the largest differences among batches.

I attached the merged file here. In this file, q1 means analysis by QIIME1.9, q2 means analysis by QIIME2, previous means 60 samples one batch, latter means 300 samples one batch.
merge.txt (25.6 KB)

Anyone have the same error? Any help would be appreciated. Thank you!


Hi @Yujun! Thanks for providing such a detailed post.

Since you’re seeing the same issue in QIIME 1.9 vs QIIME 2, let’s focus on solving your issues in QIIME 2 here (since this is the QIIME 2 forum and it’ll help simplify the problem for us).

To help you we’ll need some additional info:

  1. What version of QIIME 2 are you using? You can get this info by running qiime info.

  2. What are the exact commands you’re running? Please include commands you ran for the small and larger datasets, as well as how you’re comparing them.

  3. Were the 60 samples resequenced along with the 240 samples, or are the 60 samples and 240 samples from two different sequencing runs? If it was the latter case, you may be seeing a “batch/run effect” that is stronger than the control-vs-experiment differences you’re seeing within a batch.

Hi, @jairideout! OK, let's focus on QIIME2.

My QIIME2 version is 2017.2.0.


The exact commands I use in QIIME2 were:

qiime tools import --input-path ./01.data --type SampleData[SequencesWithQuality] --output-path demux.qza
qiime dada2 denoise-single --i-demultiplexed-seqs demux.qza --p-trim-left 0 --p-trunc-len 100 --o-representative-sequences rep-seqs.qza --o-table table.qza --verbose
qiime feature-classifier classify --i-classifier ../97_classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
qiime taxa collapse --i-table table.qza --i-taxonomy taxonomy.qza --p-level 2 --o-collapsed-table table-l2.qza
qiime tools export table-l2.qza --output-dir table-l2
biom convert --to-tsv -i ./table-l2/feature-table.biom -o ./table-l2/feature-table.tsv --table-type "Taxon table"
qiime taxa collapse --i-table table.qza --i-taxonomy taxonomy.qza --p-level 6 --o-collapsed-table table-l6.qza
qiime tools export table-l6.qza --output-dir table-l6
biom convert --to-tsv -i ./table-l6/feature-table.biom -o ./table-l6/feature-table.tsv --table-type "Taxon table"

I compared the level 6 taxon table in R, the R code was:

name <- c("OTU", "ZY006", "ZY055", "2004",  "2017",  "2036",  "2049" )
d <- read.table('./l6_q2_z7.tsv', header = T, sep = '\t', check.names = F)
#select the most unstable samples
d_sel <- d[, colnames(d) %in% name]
colnames(d_sel) <- paste0(colnames(d_sel), c('', rep('_q2', 6)))

e <- read.table('./feature-table_stand.tsv', header = T, sep = '\t', check.names = F)
#select the most unstable samples
e_sel <- e[, colnames(e) %in% name]
colnames(e_sel) <- paste0(colnames(e_sel), c('', rep('_q2_latter', 6)))
comb <- merge(d_sel, e_sel, by = 'OTU', all = T)
comb <- na.omit(comb)
comb_t <- comb[order(apply(comb[-1], 1, sd), decreasing = T),]
write.table( comb_t,'merge.txt', quote = F, sep = '\t', row.names = F)


The 60 samples and 240 samples were from two independent sequencing runs. So, I think you are right, the 'batch/run effect' may be the prime reason that no significance differences could be detect between the two groups. THANK YOU! Now, I am focus on the error that several samples have different taxon table from two QIIME2 analysis (60 samples and 300 samples). A larger sample size leads to more reads to analysis one time, I think larger number of reads may introduce some mistakes in OTU clustering. So, I am going to run more analysis with different sample size (like 150, 200 samples). Meantime, I trained the 97% classifier myself from the data I download from Greengene. The classifier may be unreliable. I will re-train a classifier based on SILVA database and then do the same compare.
Thank you very much!


Thanks for the info @Yujun!

How did you combine your 60 and 240 samples to produce the full 300 sample dataset? If you put all of the samples into the 01.data directory and imported them as a single .qza, you may receive different results from dada2 than if you had denoised each sequencing run separately and combined the feature tables and representative sequences afterwards. dada2 will perform best when run independently on each sequencing run, so I recommend trying that approach. The FMT tutorial shows how to combine multiple sequencing runs using qiime feature-table merge and qiime feature-table merge-seq-data.

After merging your data you can perform taxonomy classification and compare the results of the 300 sample dataset to the 60 sample dataset. You’ll still need to be wary of run/batch effect when interpreting your results.

Let us know how it goes!

1 Like

Thank you very much! @jairideout

As you referred, I just put all my samples into one directory and imported them. Maybe that’s the reason why the difference came out. I will try your method, run independently on each sample. Thank you very much!

1 Like

You should run dada2 independently on each sequencing run (which will contain multiple samples). You don’t need (and aren’t recommended) to run dada2 on each sample independently.

OK,Understand. Thank you.

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.