Hot Potato in the Middle of Analysing

Hello Everybody,

I am confused about the DADA2 output. I ran the command with these values, in particular --p-trunc-len 190:

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len 190 \
  --p-trim-left 35 \
  --p-chimera-method consensus \
  --p-min-fold-parent-over-abundance 6 \
  --p-n-threads 4 \
  --o-table TableDenoisedLibA.qza \
  --o-representative-sequences RepresenDenoisedLibA.qza \
  --o-denoising-stats stateDenoisedLibA.qza \
  --verbose

but the sequences in the --o-representative-sequences output are not what I expected: their lengths are 155 nt, whereas they should be ≥ 190 nt.

I checked the basic statistics of my FASTQ file; the read lengths range from 19 to 250 nt. In the command above, --p-trunc-len is set to 190, which means that reads shorter than 190 nt are removed, as described in the documentation linked below. So why are the remaining reads in the representative sequences file produced by the DADA2 step 155 nt long?

https://docs.qiime2.org/2018.11/plugins/available/dada2/denoise-single/

[Image: Screenshot from 2019-03-19 16-22-33]

I noticed this problem by accident! It's all Greek to me. :worried:

What do you think of it?

Hi @Mehrdad,

As described in the documentation you linked, the ‘--p-trunc-len 190’ parameter has two effects:
first, it treats every base after position 190 (in any sequence) as low quality and discards it, so with your options all reads are truncated to 190 bp;
second, any sequence shorter than 190 bp is discarded.

But you are also using ‘--p-trim-left 35’. This tells DADA2 to discard the first 35 bases of every sequence because of low quality (please see the same help page as above).

The result is 190 bp - 35 bp = 155 bp, which is the length you see in your representative sequences.
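
To make the arithmetic concrete, here is a minimal Python sketch of the two steps described above. It is not DADA2's actual code: the function name `truncate_and_trim` and the toy reads are made up for illustration, and it assumes that truncation at `trunc_len` is applied first and that `trim_left` bases are then removed from the start of the read.

```python
def truncate_and_trim(read, trunc_len=190, trim_left=35):
    """Illustrative only: mimic the combined effect of trunc_len and trim_left.

    Returns the processed read, or None if the read is shorter than trunc_len
    and would therefore be discarded.
    """
    if len(read) < trunc_len:
        return None                  # too short: discarded
    truncated = read[:trunc_len]     # keep positions 1..trunc_len
    return truncated[trim_left:]     # drop the first trim_left bases


# Toy reads (hypothetical): one 250 nt read and one 150 nt read.
long_read = "A" * 250
short_read = "A" * 150

kept = truncate_and_trim(long_read)
print(len(kept))                       # 190 - 35 = 155
print(truncate_and_trim(short_read))   # None: shorter than 190 nt
```

Under these assumptions every surviving read has the same length, trunc_len minus trim_left, which matches the 155 nt you observed.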

Hope it makes sense now,

Luca

Thanks a lot, dear @llenzi.

To be more flexible: if I want to keep reads up to, e.g., 240 nt long but remove reads shorter than 200 nt, I need another parameter, --p-trunc-q INTEGER, which has this description:
Reads are truncated at the first instance of
a quality score less than or equal to this
value. If the resulting read is then shorter
than trunc_len, it is discarded.
[default: 2]

Am I right?

Hi @Mehrdad,
The ‘--p-trunc-q’ option sets a minimum base quality threshold that is applied to the sequences. For example, with ‘--p-trunc-q 20’ each sequence is truncated at the first base with a quality score of 20 or less.

If you use ‘--p-trunc-len 240’, you are already discarding any sequence of 239 nt or fewer.
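
As a rough illustration of how the two options interact, here is a small Python sketch that follows the help text quoted above. It is only a sketch, not the DADA2 implementation: the function `apply_trunc_q` and the quality scores are made up, and it assumes that the low-quality base itself is cut off and that the surviving read is then truncated to `trunc_len`.

```python
def apply_trunc_q(qualities, trunc_q, trunc_len):
    """Illustrative only: return the final read length, or None if discarded.

    qualities is a list of per-base Phred quality scores.
    """
    cut = len(qualities)
    for position, score in enumerate(qualities):
        if score <= trunc_q:
            cut = position           # truncate at the first low-quality base
            break
    if cut < trunc_len:
        return None                  # shorter than trunc_len: discarded
    return min(cut, trunc_len)       # otherwise truncated to trunc_len


# Hypothetical 250 nt read with a single low-quality base at position 221.
qualities = [38] * 220 + [15] + [38] * 29

print(apply_trunc_q(qualities, trunc_q=20, trunc_len=240))  # None
print(apply_trunc_q(qualities, trunc_q=20, trunc_len=200))  # 200
print(apply_trunc_q(qualities, trunc_q=2, trunc_len=240))   # 240
```

In this sketch ‘--p-trunc-q’ does not add a separate minimum length; it only shortens reads, which makes them more likely to fall below trunc_len and be discarded.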

Luca

Hello @llenzi,

This time I increased the trunc-len value, but lost a lot of frequencies:

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len 245 \
  --p-trim-left 35 \
  --p-chimera-method consensus \
  --p-min-fold-parent-over-abundance 6 \
  --p-n-threads 4 \
  --o-table TableDenoisedLibA.qza \
  --o-representative-sequences RepresenDenoisedLibA.qza \
  --o-denoising-stats stateDenoisedLibA.qza \
  --verbose

The previous time, the frequencies were much higher:

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len 190 \
  --p-trim-left 35 \
  --p-chimera-method consensus \
  --p-min-fold-parent-over-abundance 6 \
  --p-n-threads 4 \
  --o-table TableDenoisedLibA.qza \
  --o-representative-sequences RepresenDenoisedLibA.qza \
  --o-denoising-stats stateDenoisedLibA.qza \
  --verbose

Now, which one is closer to the standard model?
The sequencing platform was Illumina HiSeq.

By the way, if I only want to filter the reads without denoising them, which command should I use? To put it another way, I would like to filter and denoise the reads with different values; which command should I use for that?

@llenzi
Why does the feature frequency also decrease when the sample frequency decreases?
Here the features are ASVs (the modern name for OTUs), so I understood the value to be the number of ASV (OTU) types, not their frequencies, if I am not mistaken. Please correct me if I am wrong!

Hi @Mehrdad,
(I’m not sure whether this question should now go under a different topic!)

I agree that your latest denoising statistics do not look great (but I would not trust your first run either…).
What do you mean by ‘standard model’?
It is difficult for me to say why you get fewer ASVs, but looking at ‘stateDenoisedLibA.qza’ may give you a clue. It can tell you where most of the sequences were discarded: at the filtering step, at denoising, or at chimera filtering.
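
One way to read such a table is to compute, for each step, the fraction of reads kept from the previous step. The Python sketch below does this for a single sample. The counts are made up and are not taken from this thread, and the step names (`input`, `filtered`, `denoised`, `non-chimeric`) are an assumption about the column names in the stats table, so check them against your own file.

```python
def fraction_retained(stats):
    """Return, for each step, the fraction of the previous step's reads kept."""
    steps = list(stats.items())
    report = {}
    for (_, before), (name, after) in zip(steps, steps[1:]):
        report[name] = after / before if before else 0.0
    return report


# Made-up counts for one sample, NOT real data from this analysis.
example = {
    "input": 10000,
    "filtered": 2000,
    "denoised": 1900,
    "non-chimeric": 1800,
}

for step, kept in fraction_retained(example).items():
    print(f"{step}: {kept:.0%} of the previous step kept")
```

With these invented numbers, most reads are lost at the filtering step, which would point at the truncation and trimming settings rather than at denoising or chimera removal.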

It would be useful if you could post the quality profile of your reads. I would also ask the following:
Are you sure you sequenced on a HiSeq? Even after demultiplexing you should have a really high number of sequences per sample.
Which region are you amplifying?
Have you trimmed the PCR primers? If not, is that why you are using ‘--p-trim-left 35’?
Why are you using only the forward reads at the denoising step?

On your second question: if you reduce the total number of sequences per sample, you also reduce the total counts of the identified ASVs. The point is to understand where you lost most of the data.
Also, I would not equate ASVs with OTUs so readily: they are two different concepts.
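
To separate the terms, here is a small Python sketch with a made-up feature table (not QIIME 2 output). The number of features is how many distinct ASVs there are, the frequency of a feature is its total read count across samples, and the frequency of a sample is its total read count across features. The table and the `summarize` helper are hypothetical.

```python
# Made-up feature table: rows are ASVs, columns are samples, values are read counts.
table = {
    "ASV_1": {"sampleA": 120, "sampleB": 80},
    "ASV_2": {"sampleA": 30, "sampleB": 0},
    "ASV_3": {"sampleA": 0, "sampleB": 5},
}


def summarize(table):
    """Return (number of features, frequency per feature, frequency per sample)."""
    per_feature = {asv: sum(counts.values()) for asv, counts in table.items()}
    per_sample = {}
    for counts in table.values():
        for sample, n in counts.items():
            per_sample[sample] = per_sample.get(sample, 0) + n
    n_features = sum(1 for total in per_feature.values() if total > 0)
    return n_features, per_feature, per_sample


n_features, per_feature, per_sample = summarize(table)
print(n_features)    # 3 distinct ASVs
print(per_feature)   # {'ASV_1': 200, 'ASV_2': 30, 'ASV_3': 5}
print(per_sample)    # {'sampleA': 150, 'sampleB': 85}
```

Because both kinds of frequency are sums over the same counts, losing reads lowers them together, and an ASV whose count drops to zero also disappears from the number of features.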

Luca
