Trying to understand dada2 in qiime2-2022.2 with my data

ag1170 · August 23, 2023, 8:10pm

I am new to qiime2 and trying to work through some of my samples for practice. I have 27 samples from tree root nodules that I want to analyze for alpha and beta diversity, and I would also like to play around with some of the other analysis tools offered in qiime. I have the sequences imported into qiime2, but I cannot seem to find the right dada2 parameters to get past the denoising step. I have been reading many posts on the qiime2 forum about demux.qzv results and denoising parameters, but I feel like I still don't understand how they apply to my data.

The demux.qzv file is:
demux.qzv (309.2 KB)

You'll see that several of the samples have <1000 sequence counts, with the lowest being 48 (I don't know why this is so low since I gave the sequencing center double the DNA concentration requirement).

My understanding is that these sequences did have the adapters trimmed, which may explain why the quality plot in the demux file may look the way it does?

I have tried the following dada2 parameters:

qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 250 --p-trunc-len-r 249 --o-table dada5table.qza --o-representative-sequences dada5rep-seqs.qza --o-denoising-stats dada5denoising-stats.qza

dada5table.qzv (452.8 KB)

qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trunc-len-f 251 --p-trunc-len-r 251 --o-table dada6table.qza --o-representative-sequences dada6rep-seqs.qza --o-denoising-stats dada6denoising-stats.qza

dada6table.qzv (431.7 KB)

qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 170 --p-trunc-len-r 190 --o-table dada4table.qza --o-representative-sequences dada4rep-seqs.qza --o-denoising-stats dada4denoising-stats.qza

dada4table.qzv (402.7 KB)

As you can tell, the most aggressive truncation removes the most features (dada4table), yet no truncation or trimming still gives about 447 fewer features (dada6table) than trimming 20bp with little truncation (dada5table). I don't understand these discrepancies, nor whether my data is even usable, since an acceptable sampling depth seems unattainable with even the best dada2 results.
Any help or direction is greatly appreciated.

colinvwood · August 24, 2023, 7:56pm

Hello @ag1170,

You'll see that several of the samples have <1000 sequence counts, with the lowest being 48 (I don't know why this is so low since I gave the sequencing center double the DNA concentration requirement).

Unfortunately, there is nothing you can do about this now. During library prep, did you pool all samples at equal concentration? You will probably have to drop the low-read samples later when you choose a sampling depth.
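
If you would rather drop those samples explicitly instead of relying on the sampling depth, something along these lines should work once you have a feature table you are happy with. This is only a sketch: the 1000-read cutoff is an assumption taken from the counts you mentioned, and table.qza / table-filtered.qza are placeholder names, not files from your run.

```shell
# ASSUMPTION: 1000 is an illustrative cutoff, not a recommended value.
MIN_READS=1000

if command -v qiime >/dev/null 2>&1; then
  # Keep only samples whose total feature count is at least MIN_READS.
  # table.qza is a placeholder for your final DADA2 feature table.
  qiime feature-table filter-samples \
    --i-table table.qza \
    --p-min-frequency "$MIN_READS" \
    --o-filtered-table table-filtered.qza
else
  echo "qiime not found on PATH; activate your QIIME 2 environment first" >&2
fi
```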

My understanding is that these sequences did have the adapters trimmed, which may explain why the quality plot in the demux file may look the way it does?

If you're referring to the boxy look of these quality plots, that's not caused by adapter trimming but by the sequencing technology, which in your case outputs binned quality scores.

In fact, from looking at your read length distribution, I don't think any trimming has been performed. Was your sequencing run 2 x 250bp? You will want to trim your primers at some point. I would recommend trimming your primers by sequence, not by length, using qiime cutadapt trim-paired. Once you have that figured out, you can re-visualize your demux and choose dada2 parameters.
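
As a rough sketch of what that could look like: the primer sequences below are the 515F/806R (V4) pair and are purely an assumption, since I don't know which region you amplified, so substitute the primers your sequencing center actually used. The output names demux-trimmed.qza and demux-trimmed.qzv are placeholders as well.

```shell
# ASSUMPTION: 515F/806R (V4) primers, used here only as an example.
# Replace both with the primers from your own library prep.
FWD_PRIMER="GTGYCAGCMGCCGCGGTAA"
REV_PRIMER="GGACTACNVGGGTWTCTAAT"

if command -v qiime >/dev/null 2>&1; then
  # Remove the primers from the 5' end of each read by sequence.
  # --p-discard-untrimmed drops read pairs in which no primer was found.
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux.qza \
    --p-front-f "$FWD_PRIMER" \
    --p-front-r "$REV_PRIMER" \
    --p-discard-untrimmed \
    --o-trimmed-sequences demux-trimmed.qza

  # Re-visualize the trimmed reads before choosing dada2 parameters.
  qiime demux summarize \
    --i-data demux-trimmed.qza \
    --o-visualization demux-trimmed.qzv
else
  echo "qiime not found on PATH; activate your QIIME 2 environment first" >&2
fi
```

If most of your reads get discarded at this step, that usually means the primer sequences don't match what is in your data, so check them before going further.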

ag1170 · September 6, 2023, 7:56pm

Thanks for the reply. What confuses me is that when I check the fastqc report for the untrimmed, "raw" fastq.gz files I received from the sequencing center, no adapters are shown in the report, nor can I find any adapters in the fastq.gz files when searching manually. Do you know why this is?

colinvwood · September 6, 2023, 8:29pm

Hello @ag1170,

Depending on library prep and the 16S region you amplified, you may not expect to see any adapters at all. Or you may be searching for the wrong sequences. Primers are a different story and you should be finding those if your sequences are truly untrimmed.
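
One quick way to check for primers in the raw files is to count the reads that contain the primer sequence. The helper below is a minimal sketch: count_primer_reads is a made-up function name, and the pattern in the usage comment is the 515F primer written as a regular expression (Y becomes [CT], M becomes [AC]), which is an assumption about your primers.

```shell
# Count the reads in a gzipped FASTQ file that contain a primer.
#   $1 = path to a .fastq.gz file
#   $2 = primer written as an extended regular expression
count_primer_reads() {
  # Line 2 of every 4-line FASTQ record holds the sequence.
  gzip -dc "$1" | awk 'NR % 4 == 2' | grep -c -E "$2"
}

# Example usage (the file name is hypothetical):
#   count_primer_reads sample_R1.fastq.gz 'GTG[CT]CAGC[AC]GCCGCGGTAA'
```

If the count is close to the total number of reads in the file, the primers are still there and need trimming. If it is close to zero, either the reads were already trimmed or you are searching for the wrong primer.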
