Getting a different number of sequences in denoising-stats and feature-table sums

Nayeli_Luis_Vargas · August 24, 2022, 3:55pm

I analyzed the microbiome in cancer vs. normal tissue, first with QIIME 2 and then with R, and I have a question about rarefaction.

First: after denoising I got the table stats.tsv, which contains the number of non-chimeric sequences per sample.

Here is an example:

sample-id	input	filtered	percentage of input passed filter	denoised	merged	percentage of input merged	non-chimeric
NT01	210543	28876	13.72	22526	15749	7.48	9172
NT02	212269	38087	17.94	33803	28788	13.56	8286
NT03	218688	63771	29.16	58613	50601	23.14	12628
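
A minimal sketch of how the non-chimeric column could be read from a table like this in Python, using only the three example rows above. The function name `non_chimeric_counts` is hypothetical, and the guard for a `#q2:types` row is an assumption about how an exported stats.tsv may look. The minimum it reports is for this excerpt only, not for the whole experiment.

```python
import csv
import io

# Excerpt of stats.tsv copied from the example above (three samples only).
STATS_TSV = (
    "sample-id\tinput\tfiltered\tpercentage of input passed filter\t"
    "denoised\tmerged\tpercentage of input merged\tnon-chimeric\n"
    "NT01\t210543\t28876\t13.72\t22526\t15749\t7.48\t9172\n"
    "NT02\t212269\t38087\t17.94\t33803\t28788\t13.56\t8286\n"
    "NT03\t218688\t63771\t29.16\t58613\t50601\t23.14\t12628\n"
)


def non_chimeric_counts(tsv_text):
    """Return {sample-id: non-chimeric read count} from stats-style TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {}
    for row in reader:
        # Assumption: an exported file may carry a '#q2:types' directive row.
        if row["sample-id"].startswith("#"):
            continue
        counts[row["sample-id"]] = int(row["non-chimeric"])
    return counts


counts = non_chimeric_counts(STATS_TSV)
# Sample with the fewest non-chimeric reads among the rows in this excerpt.
min_sample = min(counts, key=counts.get)
print(min_sample, counts[min_sample])
```

With the full stats.tsv in place of the excerpt, the same lookup would give the lowest per-sample depth for the experiment.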

In this experiment the minimum number of non-chimeric sequences was 2319, so I generated my rarefaction curve with qiime diversity alpha-rarefaction, using this command:

qiime diversity alpha-rarefaction \
  --i-table table.qza \
  --i-phylogeny rooted-tree.qza \
  --p-max-depth 2319 \
  --m-metadata-file metadata.tsv \
  --o-visualization alpha-rarefaction.qzv

I got my rarefaction curve and I was happy.

Then I imported my data into R in order to use phyloseq, so I built my phyloseq object from my metadata, the taxonomy from taxonomy.qza, the feature table from table.qza, and my phylogenetic tree. But when I sum the columns of the feature table, I don't get the same number of sequences as in the denoising stats; in fact, the numbers are very different. Obviously, when I perform the rarefaction with rarefy_even_depth, the result is also very different.
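
The comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature table has features as rows and samples as columns, the counts in `TOY_TABLE` are invented (they are not the contents of table.qza), and the helper names `sample_totals` and `compare_depths` are hypothetical. Only the non-chimeric values come from the stats example above.

```python
# Non-chimeric counts for the three example samples in the stats table.
STATS_NON_CHIMERIC = {"NT01": 9172, "NT02": 8286, "NT03": 12628}

# HYPOTHETICAL feature table: rows = features, columns = samples.
# These counts are made up for illustration; they are NOT the real table.qza.
TOY_TABLE = {
    "ASV_a": {"NT01": 5000, "NT02": 4000, "NT03": 7000},
    "ASV_b": {"NT01": 3000, "NT02": 4286, "NT03": 5000},
    "ASV_c": {"NT01": 1000, "NT02": 0, "NT03": 628},
}


def sample_totals(table):
    """Sum counts per sample (column) across all features (rows)."""
    totals = {}
    for per_sample in table.values():
        for sample, count in per_sample.items():
            totals[sample] = totals.get(sample, 0) + count
    return totals


def compare_depths(table, stats):
    """Return {sample: (table_total, stats_count, difference)}."""
    totals = sample_totals(table)
    return {
        sample: (totals.get(sample, 0), expected, totals.get(sample, 0) - expected)
        for sample, expected in stats.items()
    }


report = compare_depths(TOY_TABLE, STATS_NON_CHIMERIC)
# Samples whose feature-table total differs from the non-chimeric count.
mismatched = sorted(s for s, (_, _, diff) in report.items() if diff != 0)

for sample, (total, expected, diff) in report.items():
    print(f"{sample}\ttable={total}\tstats={expected}\tdiff={diff}")
```

A per-sample report like this makes it easy to see whether every sample is off or only some of them, and by how much.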

So the question is: why do I get a different number of sequences per sample? Shouldn't the result be the same whether you sum the feature-table columns (which are the samples) or read the non-chimeric column in the denoising-stats table?

Thanks in advance.