ITS data analysis: How to determine which DADA2 analysis is best

YuZhang · January 20, 2021, 3:12am

Dear all,
I am somewhat confused about the DADA2 step of my ITS data analysis.

My primers are
The primers were removed.
My demux visualization looks like this:

The quality scores drop at position 168 in the forward reads (left) and 151 in the reverse reads (right).
My first try with DADA2 was:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 160 \
  --p-trunc-len-r 140 \
  --p-n-threads 40 \
  --o-representative-sequences rep-seq-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza

The result is:

I don't think the result is good, because Ascomycota makes up too large a share.
Then I tried a second time, using single-end denoising.

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 162 \
  --o-representative-sequences dada2-single-end-rep-seqs.qza \
  --o-table dada2-single-end-table.qza \
  --o-denoising-stats dada2-single-end-stats.qza

qiime metadata tabulate \
  --m-input-file dada2-single-end-stats.qza \
  --o-visualization dada2-single-end-stats.qzv
Results:

For the third attempt, I retained more length:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 167 \
  --p-trunc-len-r 150 \
  --p-n-threads 40 \
  --o-representative-sequences rep-seq-dada2-1.qza \
  --o-table table-dada2-1.qza \
  --o-denoising-stats stats-dada2-1.qza
Results:

Now I'm confused.
I don't think the differences among the three attempts are large.
So which one should I choose?
In fact, I have already done a lot of analysis using the first result. Is that OK?

Yu

ChrisKeefe · January 20, 2021, 5:32pm

@YuZhang, different forum users have widely varying levels of experience. How well do you understand what DADA2 does, and how to use it?

Best,
Chris

YuZhang · January 21, 2021, 12:11am

Because the length of amplified ITS fragments varies greatly, I don't know how to choose the DADA2 settings. So I made three attempts, and I don't know how to determine which one is good.

ChrisKeefe · January 22, 2021, 12:54am

This is an important consideration with ITS, and it's great that you're thinking about it. How much length variation is there in your sequences?

The impact of these varying lengths is often significant in cases of readthrough (where you capture non-target nucleotides on "the outside" of the primer), and in cases where sequences are kept or dropped unfairly due to length.

Readthrough can be managed by trimming properly with cutadapt. This is discussed at some length here. DADA2 could be involved in dropping sequences, but your DADA2 results show a high rate of sequence recovery, with no prominent bottlenecks where you're losing a lot of reads.
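To put a number on that recovery, here is a minimal sketch that computes the percentage of input reads surviving to the final non-chimeric step for each sample. It assumes the denoising stats have already been exported to a tab-separated file (for example with `qiime tools export`) whose header row contains columns named `input` and `non-chimeric`, and that any `#`-prefixed type line should be skipped. The function name `dada2_retention` and the file path in the usage comment are hypothetical.

```shell
# Print "sample-id <tab> percent of input reads retained" for each sample.
# Assumption: $1 is a TSV whose first line is a header containing the
# column names "input" and "non-chimeric".
dada2_retention() {
  awk -F '\t' '
    NR == 1 {
      # Locate the two columns by name rather than by fixed position,
      # since paired-end and single-end stats have different layouts.
      for (i = 1; i <= NF; i++) {
        if ($i == "input") in_col = i
        if ($i == "non-chimeric") out_col = i
      }
      next
    }
    /^#/ { next }   # skip comment/type lines such as "#q2:types"
    in_col && out_col && $in_col > 0 {
      printf "%s\t%.1f\n", $1, 100 * $out_col / $in_col
    }
  ' "$1"
}

# Usage (hypothetical path):
#   dada2_retention exported-stats/stats.tsv
```

Running this on the stats from each of the three attempts gives one retention figure per sample, which makes it easier to see whether any parameter set is losing noticeably more reads than the others.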

I agree. You seem to be getting similar results from single-end and paired end runs, with slightly different parameters.

You haven't answered my question above, and I'm not entirely clear on what your concern is with these results. Are you just uncertain about how to select DADA2 parameters? Do you think something problematic is happening in DADA2 specifically? Or just in general? If so, what do you think might be causing the unexpected results?

YuZhang · January 22, 2021, 1:45am

Thanks very much.

Sorry, sir, I'm a beginner, so I don't know how to check the sequence length variation.

I'm worried that the DADA2 parameters I set were wrong, and that my results are therefore not credible.

ChrisKeefe · January 22, 2021, 7:07pm

Learning how DADA2 works

If you haven't already, please spend some time with the DADA2 preprint, this walkthrough of DADA2 for ITS by DADA2's creator, and the ITS tutorial I linked above. There are also many in-depth discussions of how to choose DADA2 parameters on this forum, as well as many great discussions about fungal ITS workflows. The search tool is your friend.

Your approach with ITS data may be very different from your approach with 16S (e.g. you may not want to truncate with ITS). Understanding the tools will help you make good choices in both contexts.
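Whether truncation makes sense depends partly on how much your reads vary in length. As a rough sketch, the function below tallies read lengths in a single FASTQ file. It assumes standard four-line FASTQ records, and it is only informative on reads that have already had primers and readthrough trimmed, since raw reads from one run are usually all the same length. The function name `read_length_hist` and the file name in the usage comment are hypothetical.

```shell
# Print a read-length histogram as "length <tab> count", sorted by length.
# Assumption: $1 is a FASTQ file (optionally gzipped) with 4 lines per record.
read_length_hist() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # decompress gzipped input to stdout
    *)    cat "$1" ;;
  esac |
    # Line 2 of every 4-line record is the sequence itself.
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len "\t" n[len] }' |
    sort -n
}

# Usage (hypothetical file name):
#   read_length_hist trimmed_sample_R1.fastq.gz
```

Per-sample FASTQ files can be obtained from an artifact with `qiime tools export --input-path demux.qza --output-path exported-demux`. A histogram with a wide spread of lengths suggests that a fixed truncation length would discard or clip many reads.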

Sequence Lengths

@cherman2 hints at this in your other topic:

Next steps:

Once you have taken some time to learn about DADA2, and have developed more specific questions, please feel free to post them here.

Here are two questions that may help guide your exploration:

Why do you think your results are bad? (Not why could they be bad, but what evidence makes you think they are bad.)
Why do you suspect DADA2 is at fault?

It takes many steps to produce a taxonomic barplot. Your data has likely been preprocessed in some way, imported, trimmed, denoised, classified using a classifier built on some kind of database, etc. If your results are not what you expected, you will have to consider which of the many steps in the process might be at fault.

system · February 23, 2021, 1:15am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.