dada2 feature count not same

Yin_Hui_Cheok · May 25, 2020, 6:58am

Hi guys!

I have 16S bacterial DNA extracted from plant stems and leaves. While trying things out in QIIME 2, I ran into a strange situation. As shown in the screenshots below, my samples show different feature counts when they are imported into the QIIME 2 environment from different folders.

At first, I imported all FASTQ files from one folder. For one particular sample, BOKTC1.6, the feature count in the table.qzv file was 32122. Later on I was asked to analyse my samples separately (only stem bacteria etc.), and this time I got 32043 features instead, and 32068 in another case.
I used the same qiime command to run dada2 in all of these trials:
time qiime dada2 denoise-single --i-demultiplexed-seqs lfs1.qza --p-trim-left 0 --p-trunc-len 400 --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2.qza

So, my question here is:

What caused the differences in feature count here? Shouldn't DADA2 return the same number of features from the same sequences?
Will this affect the downstream analysis?

Thanks in advance. I hope I expressed my doubts clearly.

Mehrbod_Estaki · May 27, 2020, 5:54am

Hi @Yin_Hui_Cheok,

It is important to note that these differences are in the total number of reads per sample, not the number of unique features you have per sample. The differences are also very small. But why are they happening? When DADA2 trains its error model, it takes the first 1000000 reads (the default for --p-n-reads-learn) and trains the model on those. So if you remove some samples from your analysis, those reads will be drawn from different samples, and the model used to denoise the sequences is slightly different. This is causing the slight differences you are observing.
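
To make that concrete, here is a toy sketch of how the set of reads used for learning can shift when the sample set changes. It is not the DADA2 implementation: it simply assumes reads are taken sample by sample, in order, until the learning budget is used up. The read counts and every sample name other than BOKTC1.6 are hypothetical, and BOKTC1.6 is treated as a stem sample purely for illustration.

```python
# Toy illustration only: this is NOT the DADA2 implementation.
# Assumption: reads are consumed sample by sample, in order, until the
# learning budget (cf. --p-n-reads-learn) is exhausted.

def reads_used_for_learning(read_counts, n_reads_learn):
    """Return {sample: reads contributed} for the first n_reads_learn reads."""
    used = {}
    remaining = n_reads_learn
    for sample, count in read_counts.items():
        if remaining <= 0:
            break
        take = min(count, remaining)
        used[sample] = take
        remaining -= take
    return used

# Hypothetical per-sample read counts; only the name BOKTC1.6 is from the thread.
all_samples = {
    "leaf_A": 400_000,
    "leaf_B": 400_000,
    "BOKTC1.6": 400_000,
    "stem_B": 400_000,
}
# Hypothetical "stem only" import of the same data.
stem_only = {
    name: n for name, n in all_samples.items() if name in ("BOKTC1.6", "stem_B")
}

N_READS_LEARN = 1_000_000  # the default budget mentioned above

full_run = reads_used_for_learning(all_samples, N_READS_LEARN)
stem_run = reads_used_for_learning(stem_only, N_READS_LEARN)

print("all samples:", full_run)
print("stem only:  ", stem_run)
```

In this toy setup the two runs learn from different reads, so the fitted error model, and therefore the per-sample totals after denoising, can differ slightly even though the reads of BOKTC1.6 themselves are identical.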

As for the downstream analysis: not at all, especially when the differences are so small and your total reads per sample are quite high.
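
For a sense of scale, a quick calculation on the three totals quoted in the question (32122, 32043 and 32068 for BOKTC1.6) gives the relative size of the differences. The variable names are just for illustration.

```python
# Totals reported for sample BOKTC1.6 in the question above.
all_together = 32122           # all FASTQ files imported from one folder
separate_runs = [32043, 32068]  # the two separate analyses

# Relative difference of each separate run against the combined run, in percent.
relative_diffs = [
    abs(all_together - count) / all_together * 100 for count in separate_runs
]

for count, diff in zip(separate_runs, relative_diffs):
    print(f"{count} vs {all_together}: {diff:.2f}% difference")
```

Both differences come out well under 1% of the sample's total.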

One additional note: you don't need to denoise those samples separately unless they come from a different sequencing run/PCR. It would be just as appropriate to denoise them all together, since they would share the same error profile (again, if they came from the same run), and then filter them afterward based on your metadata of choice.
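
The "denoise together, filter afterward" idea can be sketched with plain Python dictionaries. This is only a conceptual illustration with made-up sample names, feature IDs, counts and a hypothetical `tissue` metadata column; in QIIME 2 itself the equivalent step is normally done with `qiime feature-table filter-samples` and a sample metadata file.

```python
# Toy sketch: a feature table as {sample: {feature_id: count}} and
# sample metadata as {sample: {column: value}}. All names are hypothetical.

def filter_samples(table, metadata, column, value):
    """Keep only the samples whose metadata[column] equals value."""
    return {
        sample: counts
        for sample, counts in table.items()
        if metadata.get(sample, {}).get(column) == value
    }

# One table produced by denoising every sample together.
table = {
    "stem_1": {"ASV_a": 120, "ASV_b": 30},
    "stem_2": {"ASV_a": 95, "ASV_c": 12},
    "leaf_1": {"ASV_b": 210, "ASV_c": 44},
}
metadata = {
    "stem_1": {"tissue": "stem"},
    "stem_2": {"tissue": "stem"},
    "leaf_1": {"tissue": "leaf"},
}

# Split by metadata after denoising instead of denoising each group separately.
stem_table = filter_samples(table, metadata, "tissue", "stem")
leaf_table = filter_samples(table, metadata, "tissue", "leaf")

print(sorted(stem_table))
print(sorted(leaf_table))
```

Because every subset is cut from the same denoised table, each sample keeps exactly the same counts no matter which subset it ends up in.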

Hope that makes sense.

Yin_Hui_Cheok · May 27, 2020, 6:10am

@Mehrbod_Estaki

Thank you for the clarification on the matter. Have a nice day!
