DADA2: trunc-len-f / trunc-len-r

Hello everyone,
I'm processing paired-end 16S V3-V4 reads with DADA2 and I have some questions about --p-trunc-len-f and --p-trunc-len-r.
Here is the quality plot: [image]

First, I watched a video about DADA2 denoising posted by QIIME 2. In this video, the instructor suggested truncating at the position where the 25th percentile of the quality score drops below 30. Given that, I set my parameters as follows:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 16s_paired_end_primer.qza \
  --p-trunc-len-f 234 \
  --p-trunc-len-r 172 \
  --o-table table_16s_primer.qza \
  --o-representative-sequences rep_seqs_16s_primer.qza \
  --o-denoising-stats stats_16s_primer.qza
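As a rough illustration of the rule from the video (truncate where the 25th percentile of the quality score first drops below 30), here is a minimal Python sketch. The function name `suggest_trunc_len` and the quality values are hypothetical and are not taken from the quality plot in this post; it is only meant to show the idea, not how QIIME 2 or DADA2 pick anything.

```python
def suggest_trunc_len(q25, threshold=30):
    """Return how many leading positions to keep.

    q25 is a list of 25th-percentile quality scores, one per read position.
    The result is the count of positions before the first one that falls
    below `threshold`. If no position falls below it, the full length is
    returned.
    """
    for i, q in enumerate(q25):
        if q < threshold:
            return i  # positions 1..i (1-based) are kept
    return len(q25)


# Hypothetical 25th-percentile scores for a 20-position read.
q25 = [36] * 10 + [32] * 5 + [28] * 5
print(suggest_trunc_len(q25))  # 15
```

A value picked this way only considers quality; it does not check whether the forward and reverse reads are still long enough to overlap.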
Here is the feature table summary for --p-trunc-len-f 234 and --p-trunc-len-r 172. [image]
But it seems like the number of features is too low. Is there a mistake somewhere? Meanwhile, I changed the parameters to --p-trunc-len-f 0 and --p-trunc-len-r 278, and the result seems to be better, but I don't truly understand the meaning of --p-trunc-len-f 0. Is setting 0 allowed and meaningful?
Here is the feature table for --p-trunc-len-f 0 and --p-trunc-len-r 278. [image]

Second, how do I judge whether the parameters are good or not in DADA2? Should I focus on the number of features, the percentage of reads that passed the input filter, or something else?
Thanks for the support!

Hi @yingying_qiu,
Welcome to the :qiime2: forum!

The feature counts definitely are low. There isn't a mistake, but it seems like your parameters might need modifying so you are not losing so many reads! Can you upload the dada2-stats for these 2 different dada2 runs? They will tell us all the juicy details of why sequences are getting filtered out! This will help us modify your command so that you can get more reads!

Yes! --p-trunc-len-f 0 means that you are not truncating your forward reads at all, and this seemed to give you better results!

Let's look at your stats so we can see how many are passing each filtering step! My general rule of thumb is that if one step (i.e. one column in the dada2-stats.qzv) loses more than 50% of reads for every sample, then something definitely went wrong! Otherwise it's about optimizing your command by looking at your dada2-stats and seeing if you can tweak parameters to get more sequences to pass.

Some examples:

  1. If you're losing a lot of sequences in filtering: it's probably because you truncate too late in the sequence, so reads are being filtered out because of the bad quality at the end.

  2. If you are losing a lot of sequences in merging: it's because your sequences aren't long enough to overlap and get merged. That can sometimes be fixed by increasing the trunc value (or setting it to zero).
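The 50% rule of thumb above can be sketched in a few lines of Python. The step names below (`input`, `filtered`, `denoised`, `merged`, `non-chimeric`) are assumed to match the count columns of the dada2-stats table, and the read counts are made up for illustration; they are not from the runs in this thread.

```python
# Assumed names of the count columns in the dada2-stats table, in order.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def step_losses(counts):
    """Fraction of reads lost at each step, relative to the previous step."""
    losses = {}
    for prev, cur in zip(STEPS, STEPS[1:]):
        before = counts[prev]
        losses[cur] = 0.0 if before == 0 else 1 - counts[cur] / before
    return losses


def steps_failing_everywhere(samples, max_loss=0.5):
    """Steps that lose more than `max_loss` of their reads in every sample."""
    flagged = []
    for step in STEPS[1:]:
        if all(step_losses(c)[step] > max_loss for c in samples.values()):
            flagged.append(step)
    return flagged


# Hypothetical counts for two samples.
samples = {
    "sample-1": {"input": 50000, "filtered": 42000, "denoised": 41000,
                 "merged": 9000, "non-chimeric": 8500},
    "sample-2": {"input": 30000, "filtered": 26000, "denoised": 25000,
                 "merged": 4000, "non-chimeric": 3900},
}
print(steps_failing_everywhere(samples))  # ['merged']
```

In this made-up case the merge step is flagged, which would point to the overlap problem described in example 2 rather than to a quality-filtering problem.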

I hope that helps!
:turtle:


Thank you very much for your reply! It helps me a lot.

Does trunc mean cutting the end of the sequence and trim mean cutting the head of the sequence? So --p-trunc-len-f 0 is equal to --p-trunc-len-f 300, is that right?

Thanks again! I changed my parameters again to --p-trunc-len-f 0 and --p-trunc-len-r 194 to get an overlap of 30 nt (300+194-(805-341)=30), and I got the result below.
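The overlap arithmetic above can be written as a small Python sketch. The amplicon length of 805-341=464 nt and the 300 nt forward read length are taken from this post; the function names are hypothetical, and the 12 nt minimum is the figure given in the QIIME 2 documentation quoted later in this thread.

```python
MIN_OVERLAP = 12  # minimum overlap stated in the denoise-paired docs quoted in this thread


def expected_overlap(len_f, len_r, amplicon_len):
    """Overlap left between forward and reverse reads after truncation."""
    return len_f + len_r - amplicon_len


def can_merge(len_f, len_r, amplicon_len, min_overlap=MIN_OVERLAP):
    """True if the truncated reads still overlap by at least `min_overlap`."""
    return expected_overlap(len_f, len_r, amplicon_len) >= min_overlap


amplicon_len = 805 - 341  # 464 nt, as estimated in the post

# Untruncated 300 nt forward read with the reverse read truncated at 194.
print(expected_overlap(300, 194, amplicon_len))  # 30
print(can_merge(300, 194, amplicon_len))         # True

# The first attempt: forward 234, reverse 172.
print(expected_overlap(234, 172, amplicon_len))  # -58
print(can_merge(234, 172, amplicon_len))         # False
```

Under the same amplicon-length assumption, the first pair of values (234 and 172) leaves no overlap at all, which would be consistent with the low feature counts from that run.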



I think the result is much better than the last two results, so I'd like to use these parameters. If you have better ideas, please feel free to tell me. Thank you very much.

Hi @yingying_qiu

Exactly!

These results do look better. I think you are good to keep going! I would however check that dada2-stats file you get out of dada2. It is a great place to check to see how many sequences are making it through filtering.

Thanks for your clear explanation.
But I still have a question; would you explain this to me, please? In the QIIME 2 docs for the DADA2 plugin (denoise-paired: Denoise and dereplicate paired-end sequences), the parameter description says:

"Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed."
What does "Reads that are shorter than this value will be discarded." mean? Does "this value" refer to the parameter or to the quality scores? Thank you very much.
Best wishes,
Qiu

Also, I am afraid that this statement (that --p-trunc-len-f 0 is equal to --p-trunc-len-f 300) is false, because I ran --p-trunc-len-f 300 --p-trunc-len-r 194 and the result is different from --p-trunc-len-f 0 --p-trunc-len-r 194. Why does this happen? :sob: :sob:

These are the results of parameter 300.


These are the results of parameter 0.


Hi @yingying_qiu

Could you explain your question again? I am not sure I understand.

It looks like 0 is significantly better than 300. What length are your sequences? Could you upload a demux interactive quality score plot for me to look at?

Thanks for your reply. I have resolved the last question that I asked you. I am sorry that I can't share the full file, because the data have not been published. I will provide some screenshots of the interactive quality plot here.




I think the different results may be due to the value of 300: at position 300 there are still sequences, so maybe I should set 301. That run is in progress, and I will get the result later.
Best wishes,
Qiu.

No problem! Your screenshots are more than enough!

Looking at this picture, some of your reads are 283 nt long, and some are 300. When you chose a trunc value of 300, all the sequences that are shorter than 300 nt were thrown out. When you chose a trunc value of 0, it does not truncate, which allows you to keep the 278 nt, 283 nt and 300 nt sequences!
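To make the difference between 0 and 300 concrete, here is a minimal Python sketch of the behaviour described in the docs quoted earlier in this thread (reads shorter than the value are discarded; 0 means no truncation or length filtering). The function `apply_trunc_len` is hypothetical and only mimics that description; it is not DADA2's actual code.

```python
def apply_trunc_len(reads, trunc_len):
    """Mimic the documented trunc-len behaviour on a list of read strings.

    trunc_len == 0: keep every read unchanged.
    trunc_len > 0: discard reads shorter than trunc_len, cut the rest to it.
    """
    if trunc_len == 0:
        return list(reads)
    return [r[:trunc_len] for r in reads if len(r) >= trunc_len]


# Placeholder reads with the three lengths mentioned above.
reads = ["A" * 278, "A" * 283, "A" * 300]

print([len(r) for r in apply_trunc_len(reads, 300)])  # [300]
print([len(r) for r in apply_trunc_len(reads, 0)])    # [278, 283, 300]
print([len(r) for r in apply_trunc_len(reads, 280)])  # [280, 280]
```

Under this reading of the docs, a value above the longest read length (for example 301) would discard every read, so raising the value past 300 would not recover the shorter reads.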

I would set your trunc value to 0, and keep moving forward with your analysis!

I got it! Thank you very much, my friend. I will keep moving forward with my analysis.
Best wishes,
Qiu.

