ITS (Fungi) DADA2 question - Not all sequences were the same length

SarahH · January 15, 2019, 12:47pm

Hi, I just wanted to double-check that it is OK to use reads of different lengths for ITS in DADA2. I get the output:

Learning Error Rates
Not all sequences were the same length.

But it seems to produce output anyway. I just wanted to check that the error rates will indeed have been learnt along the way.

I used:

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2.qza --verbose --p-n-threads 0 --p-trunc-len-f 0 --p-trunc-len-r 0 --p-trunc-q 10

I used --p-trunc-q 10 so that reads would be trimmed at the first fairly poor quality base, but there would still be enough errors for it to learn from. I can’t truncate the reads to the same length due to the variable size of the amplicon (~200-400 bp). I got a good number of reads using this method, although I may try varying the q value. Do you think this is a reasonable approach? I thought I might try ITSxpress also.
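As a rough illustration of why quality-based truncation leaves reads of different lengths (and so triggers the message above), here is a toy sketch of the rule, assuming the behaviour documented for DADA2's truncQ: each read is cut at the first base whose quality score is less than or equal to the threshold. The function name and the example read are made up for illustration; this is not DADA2's own code.

```python
def truncate_at_quality(seq, quals, trunc_q):
    """Cut a read just before the first base whose quality is <= trunc_q.

    Toy illustration only; the name and signature are hypothetical.
    """
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i], quals[:i]
    # No base at or below the threshold: the read is kept whole.
    return seq, quals


# Made-up read: the quality first drops to 10 or below at the sixth base.
seq = "ACGTACGTAC"
quals = [38, 37, 36, 35, 30, 8, 34, 33, 12, 9]
trimmed_seq, trimmed_quals = truncate_at_quality(seq, quals, trunc_q=10)
print(trimmed_seq, len(trimmed_seq))
```

Because the cut point depends on where each read's quality first drops, two reads from the same run will generally end up with different lengths.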

Many thanks

Nicholas_Bokulich · January 18, 2019, 1:09pm

Hi @SarahH,

Yes! That is okay — you can ignore that warning message (it is only a warning, after all — dada2 would fail if it could not learn the error rates).

Yes, that is usually what I do with ITS (use trunc-q). You will need to make sure you are getting appropriate yields after joining/merging. Check the stats file and look out for samples that lose lots of reads at the merging stage. If you are losing many reads there, it is probably a sign that you are losing the longer reads, which skews your results. If you get reasonable merge yields, you should be okay to proceed. If you do not, consider adjusting trimming parameters or using only forward reads in your analysis.
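The merge-yield check described above could be sketched roughly as follows, assuming the denoising stats have been exported to a tab-separated file with "sample-id", "denoised" and "merged" columns and a "#q2:types" directive row. Those column names, the 0.5 cutoff, and the example numbers are assumptions for illustration, so adjust them to whatever your own stats file contains.

```python
import csv
import io


def merge_yields(stats_tsv_text, min_fraction=0.5):
    """Return ({sample: merged/denoised}, [samples below min_fraction]).

    Column names and the default cutoff are assumptions, not fixed values.
    """
    reader = csv.DictReader(io.StringIO(stats_tsv_text), delimiter="\t")
    yields = {}
    for row in reader:
        sample_id = row["sample-id"]
        if sample_id.startswith("#"):  # skip the '#q2:types' directive row
            continue
        denoised = float(row["denoised"])
        merged = float(row["merged"])
        yields[sample_id] = merged / denoised if denoised else 0.0
    flagged = [s for s, y in yields.items() if y < min_fraction]
    return yields, flagged


# Made-up numbers: S2 loses most of its reads at the merging step.
example = (
    "sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n"
    "#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric\n"
    "S1\t10000\t9000\t8800\t8500\t8300\n"
    "S2\t10000\t9100\t8900\t3000\t2900\n"
)
yields, flagged = merge_yields(example)
print(flagged)
```

Samples that end up in the flagged list are the ones worth a closer look before deciding whether to change the trimming parameters or fall back to forward reads only.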

Definitely give ITSxpress a try. You would just use it to trim away non-ITS sequence, and then input the result to dada2.

Good luck!

SarahH · January 18, 2019, 4:36pm

Thanks Nicholas! Is there, or will there be, a minLen option in DADA2 like the one in the R script? For ITS, where we are trimming based on quality, it would let us still remove spurious sequences, say those under 100 bp.
Cheers

Nicholas_Bokulich · January 18, 2019, 4:51pm

Not currently! We do have an open issue tracking this feature request.
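For illustration only, the kind of minimum-length filter being asked about could be approximated after denoising. This is a minimal sketch assuming the representative sequences have been exported to plain FASTA text; the function names, the feature IDs, and the 100 bp cutoff are hypothetical, and this is not a QIIME 2 or DADA2 feature.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # join wrapped sequence lines
    return [(header, seq) for header, seq in records]


def drop_short(records, min_len=100):
    """Keep only records whose sequence has at least min_len bases."""
    return [(header, seq) for header, seq in records if len(seq) >= min_len]


# Made-up records: featA is 120 bp, featB is 40 bp.
fasta = ">featA\n" + "ACGT" * 30 + "\n>featB\n" + "ACGT" * 10 + "\n"
kept = drop_short(parse_fasta(fasta), min_len=100)
print([header for header, _ in kept])
```

If sequences are dropped this way, the same features would also need to be removed from the feature table so the two stay in sync.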
