Inconsistent results

Hello,

I have been using qiime2-2022.2.
I have observed a large inconsistency in the results I obtained from QIIME 2, even though I used the same samples and the same commands both times. The feature tables differed greatly: the abundances reported in the two output files were very different. Why did this happen?

Thank you in advance!
Brigitta

Hi @Brigitta1,

Can you explain the analytical steps you used to generate the results? There are a handful of places in microbiome processing that have a random component. If you're willing to share the final artifacts you're comparing, that can be super helpful. Otherwise, could you share your scripts?

Thanks,
Justine

The commands I ran are as follows:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /home///manifestfile1.txt \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

qiime demux summarize \
  --i-data single-end-demux.qza \
  --o-visualization single-end-demux.qzv

qiime cutadapt trim-single \
  --i-demultiplexed-sequences single-end-demux.qza \
  --p-adapter CCATCTCATCCCTGCGTGTCTCCGACTCAG \
  --p-front CCTCTCTATGGGCAGTCGGTGATGTGCCAGCMGCCGCGGTAA \
  --p-error-rate 0.1 \
  --o-trimmed-sequences trimmed-seqs.qza \
  --verbose

qiime demux summarize \
  --i-data trimmed-seqs.qza \
  --o-visualization trimmed-seqs.qzv

qiime dada2 denoise-pyro \
  --i-demultiplexed-seqs trimmed-seqs.qza \
  --p-trim-left 15 \
  --p-trunc-len 240 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza

qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv

mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --o-visualization taxa-bar-plots.qzv

qiime tools export \
  --input-path table.qza \
  --output-path table

biom convert \
  --to-tsv \
  -i feature-table.biom \
  -o feature-table.tsv

I ran these commands twice on the same set of samples. The two "feature-table.tsv" files I obtained are not the same.

Hi @Brigitta1,

DADA2 has a stochastic learning step, in which it takes a subset of your total sequences and uses it to learn the error rates. So it's possible to get slightly different denoising results if you re-run the data. If you're not sure whether this alters the relationship between your samples, you might use a Procrustes analysis to compare the two datasets. I would also expect your abundant taxa to be more similar between runs; maybe exclude things with fewer than 10 counts per person?
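
As a quick first check before a full Procrustes analysis, something like the sketch below could line up the two exported tables feature by feature and ignore the rare ones. It is only an illustration: the function name `compare_tables` and the file paths in the usage comment are made up, it assumes both files are the tab-separated output of `biom convert --to-tsv` (header lines starting with `#`, feature ID in the first column), and it applies the threshold to each feature's total count across all samples in a run, which is a simplification of the per-person cutoff suggested above.

```shell
#!/bin/sh
# compare_tables RUN1.tsv RUN2.tsv [MIN_COUNT]
# Hypothetical helper, not part of QIIME 2 or biom.
# For every feature whose total count reaches MIN_COUNT (default 10) in at
# least one run, print: feature ID, total in run 1, total in run 2, and the
# difference (run 2 minus run 1), tab-separated and sorted by feature ID.
compare_tables() {
    awk -F '\t' -v min="${3:-10}" '
        /^#/ { next }                       # skip the biom TSV header lines
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i   # sum counts over samples
            if (FILENAME == ARGV[1]) run1[$1] = total
            else                     run2[$1] = total
            seen[$1] = 1
        }
        END {
            for (id in seen) {
                a = run1[id] + 0            # a feature missing from a run counts as 0
                b = run2[id] + 0
                if (a >= min || b >= min)
                    printf "%s\t%d\t%d\t%d\n", id, a, b, b - a
            }
        }
    ' "$1" "$2" | sort
}

# Example call (paths are placeholders for your two exports):
# compare_tables run1/feature-table.tsv run2/feature-table.tsv 10
```

Because DADA2 feature IDs are hashes of the sequences themselves, the same ASV should carry the same ID in both runs, so rows with a large difference or a zero on one side are the ones worth a closer look.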

I also noticed that you're denoising for pyrosequencing but using the EMP classifier, which is designed for a specific Illumina primer pair. (And not the primers you're using!) You might get better results with a region-specific classifier.

Best,
Justine