selecting trim/trunc lengths with dada2

YuZhang · January 20, 2021, 6:10am

Dear all,

I'm doing 16S data analysis now, and I have some questions about DADA2.

The primers I used are:
515F: GTGCCAGCMGCCGCGGTAA
907R: CCGTCAATTCCTTTGAGTTT

The insert DNA length is 353 bp.
I used the code below:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 200 \
  --p-trunc-len-r 200 \
  --p-n-threads 40 \
  --o-representative-sequences rep-seq-dada2-3.qza \
  --o-table table-dada2-3.qza \
  --o-denoising-stats stats-dada2-3.qza

Now I have finished DADA2 and obtained the representative sequences.

Is this a good result? After all, only 2% of the sequences are shorter than the insert DNA.
But I'm not sure whether there is a standard range for representative sequence lengths.

cherman2 · January 20, 2021, 10:37pm

Hello @YuZhang,
Your data looks pretty good.
You truncated both reads at 200 and then merged the paired-end reads, so your merged sequences can be up to about 400 nt long. I say "around 400" because the forward and reverse reads overlap, which makes the merged sequence somewhat shorter than 400.
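To make that arithmetic concrete, here is a minimal shell sketch. The helper name `overlap` is made up for illustration. The truncation lengths of 200 and the 353 bp insert come from your first post, and I am assuming the reads span an amplicon of exactly that length (if the primers are still on the reads, the real figure is longer). The 12 nt minimum overlap is DADA2's documented default for merging pairs, which I am assuming applies to your run.

```shell
#!/usr/bin/env bash
# Values from the first post; min_overlap is an assumed default.
fwd=200          # --p-trunc-len-f
rev=200          # --p-trunc-len-r
amplicon=353     # stated insert length, assumed to be the amplicon length
min_overlap=12   # assumed DADA2 default minimum overlap for merging

# overlap FWD REV AMPLICON: bases shared by the two truncated reads
# once they are laid over an amplicon of the given length.
overlap() {
  echo $(( $1 + $2 - $3 ))
}

echo "overlap: $(overlap "$fwd" "$rev" "$amplicon") nt"
echo "longest mergeable amplicon: $(( fwd + rev - min_overlap )) nt"
```

Under those assumptions the reads overlap by 47 nt, and anything longer than 388 nt could not be merged. A merged read is as long as the amplicon itself, so the truncation lengths only set an upper bound on what you will see in the rep-seq file.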
I would like to take a look at your data though to confirm. Would you be able to upload the rep-seq-dada2-3.qzv, stats-dada2-3.qzv and demux.qzv so that I can take a look?
Chloe

YuZhang · January 21, 2021, 12:23am

Thanks cherman, here are the files you need.
Could you tell me how to check whether the data is good from these visualizations?
And could you help me take a look at my other post? It is also related to DADA2.
Thanks again.

rep-seq-dada2-3.qzv (7.1 MB) stats-dada2-3.qzv (1.2 MB) demux.qzv (288.3 KB)

cherman2 · January 21, 2021, 5:43pm

Hello @YuZhang,
First of all, your data at this point looks pretty normal.
Let's walk through these visualizations to get a better idea of what they are telling you. Sound good?

demux.qzv
The interactive quality plot lets you assess the quality of your reads. "Good data" is really hard to quantify, but a good rule of thumb is that a quality score of around 30 is pretty good. Why truncate at all? In part, so that dada2 doesn't get lost in the weeds of a lot of low-quality data. In your case there is really no "bad data" (that would be where you see a fall-off in quality; you can see this here), so you can truncate at the very end of your reads. If you wanted, you could truncate at 210, but 200 works just fine!
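If it helps to see why 30 is a comfortable number: on the standard Phred scale, a score Q corresponds to an error probability of 10^(-Q/10) per base. A small sketch of that conversion, where the function name `phred_to_error` is just something I made up for illustration:

```shell
#!/usr/bin/env bash
# phred_to_error Q: probability that a base call with Phred score Q is wrong,
# using the standard definition P = 10^(-Q/10).
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

phred_to_error 30   # prints 0.001
phred_to_error 20   # prints 0.01
```

So a score of 30 means roughly one wrong base call in a thousand, and 20 means one in a hundred.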

stats-dada2-3.qzv
This is an overview of the quality control that dada2 did on your data. The short and sweet of it is that you don't want to lose too many sequences at any one step of the quality control. The steps are the columns at the top. Looking at your data here, nothing is out of the ordinary.

rep-seq.qzv
The rep-seq file shows all your features and their sequences. The range of sequence lengths you have seems good to me, like I said previously. Most of your sequences are around 400 in length, which looks good.

I know that was a lot! I hope this helps not only with this dataset but with future analyses as well!
Chloe

YuZhang · January 22, 2021, 1:24am

Thanks very much. I don't quite understand this point.

Could you explain it for me, or give me an example?

cherman2 · January 22, 2021, 7:13pm

Hello @YuZhang,
Sorry for the confusion. All I was saying there was that the stats-dada2.qzv file is an overview of what happened in the dada2 step. If you look at stats-dada2.qzv, you will see that the column headers are the steps of dada2 (input, filtered, denoised, merged, and non-chimeric). For example, if I was looking at a stats-dada2.qzv and saw that I lost 75% of all my sequences at the non-chimeric step, then I would know that a large portion of my data was chimeras and that something went wrong in the data processing. If you don't see a huge loss for all of your samples at any one step, then your data has gone through the quality control step effectively! In your specific case, I don't see anything that would make me think that something is wrong with your data.
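Here is a rough sketch of how you could check that yourself from the read counts. It assumes a simplified tab-separated table with only the five count columns (a real exported stats table also carries percentage columns, so you would need to pick out the right fields). The function name `step_retention` and all the numbers below are made up for illustration and are not from your data.

```shell
#!/usr/bin/env bash
# step_retention: read tab-separated rows of
#   sample-id  input  filtered  denoised  merged  non-chimeric
# (header on the first line) and print, for each sample, the percentage
# of reads kept by each step relative to the step before it.
step_retention() {
  awk -F'\t' '
    NR == 1 { next }   # skip the header row
    {
      printf "%s", $1
      for (i = 3; i <= 6; i++)
        printf "\t%.1f", ($(i - 1) > 0 ? 100 * $i / $(i - 1) : 0)
      printf "\n"
    }'
}

# Made-up counts for illustration only.
{
  printf 'sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n'
  printf 'S1\t10000\t9000\t8800\t8000\t7600\n'
  printf 'S2\t10000\t9100\t8900\t8200\t2050\n'
} | step_retention
```

In this invented example S1 keeps at least 90% of its reads at every step, while S2 keeps only 25% at the non-chimeric step. That second pattern is the kind of thing I would want to investigate.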
Chloe

YuZhang · January 24, 2021, 3:39am

Thanks Chloe! I know now that my dada2 result is good, but I want to understand it clearly. Is there an exact number (like 75%) for evaluating this step?

And about rep-seq.qzv: if I have single-end ITS data, how do I determine whether the result is good or right?

cherman2 · January 25, 2021, 3:42pm

Hello @YuZhang,
Like many things in QIIME 2, there is no specific number. All I can clarify for you is that if there is a recognizable pattern of all sequences being thrown out at one dada2 step, then you should investigate that and make sure there isn't an issue with the sequencing data.

As for the ITS data, all the rules for how to make sure that the data made it through dada2 are the same.
Hope this helps!
Chloe