Using the dada2 denoise-pyro plugin

Hello,

I have received some FASTQ files from an Ion Torrent sequencing machine. I used the dada2 denoise-pyro plugin as described on the QIIME 2 website. The command I used is as follows:
qiime dada2 denoise-pyro \
--i-demultiplexed-seqs single-end-demux.qza \
--p-trim-left 33 \
--p-trunc-len 825 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza

The parameter --p-trunc-len 825 was chosen because the median (50th percentile) quality score dropped below 27 beyond position 825. However, when I ran the command I got an error saying that no sequences passed the filter, even though my interactive quality plot spans positions 0 to 1800. I then changed the parameters and ran the command below.

qiime dada2 denoise-pyro \
--i-demultiplexed-seqs single-end-demux.qza \
--p-trim-left 15 \
--p-trunc-len 240 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza

With this command, all samples passed the filter. However, when I carried out the downstream analysis and generated an OTU table using the commands below, very few OTUs were detected (<100).
qiime tools export \
--input-path table.qza \
--output-path table
cd table
biom convert \
--to-tsv \
-i feature-table.biom \
-o feature-table.tsv

I doubt these results because a phylogenetic tree created using a Galaxy tool showed the presence of 400-1000 different species in the sample.

Before the denoising step, I used the cutadapt trim-single plugin to remove the adapters and primers from the sequences. The command I used for this step is given below:
qiime cutadapt trim-single \
--i-demultiplexed-sequences single-end-demux.qza \
--p-adapter CCATCTCATCCCTGCGTGTCTCCGACTCAG \
--p-front CCTCTCTATGGGCAGTCGGTGATGTGCCAGC \
--p-error-rate 0.1 \
--o-trimmed-sequences trimmed-seqs.qza \
--verbose
However, I am quite sure that my sequences do not contain adapter or primer sequences, because I searched the FASTQ files for them and found none. Correct me if I am wrong, but since the command specifies the sequences to be trimmed, I expect that when none of the reads in the FASTQ files contain those sequences, the samples (FASTQ files) remain unaltered.

Therefore, I think there is something wrong with the parameters I set in the denoising step, but I cannot figure out what it is. I would highly appreciate your help.

Thank you in advance,
Brigitta

Hi @Brigitta1,
If this is the same data as in your other post, it could be that the import warning is affecting which sequences are being imported.

Can you expand on how you created a phylogenetic tree in Galaxy? Also, when you say that you created an OTU table, is it the table that qiime dada2 denoise-pyro generated, or did you use another command to create the OTUs? If the phylogeny was based on ASVs and the table was based on OTUs, that might cause the difference you are seeing.

If you feel comfortable could you attach your demux.qzv and table-dada2.qza or DM me the file so I could investigate the data a little more?

Hope that helps!
:turtle:

No, I used another command for this.

This is the command I used. The table.qza file in the input-path was the output file from the qiime dada2 denoise-pyro.

Okay, Thank you so much.

Hi @Brigitta1,
I need a little bit more info before I can help.

  1. Can you elaborate on how you generated the phylogenetic tree?
  2. The table you generated with dada2 is an ASV table. Did you run another command to cluster your OTUs? Can you include the OTU clustering command you used?

I think the discrepancy might come from the difference between the number of ASVs and the number of OTUs in your data. ASVs are all the unique sequences in the data, whereas OTUs are clustered based on a similarity threshold, so it would make sense for there to be fewer OTUs than ASVs.
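As a rough way to compare such counts, the number of features (ASVs, or OTUs if the table was clustered) can be read from an exported feature-table.tsv. The sketch below assumes the layout that biom convert --to-tsv typically writes: a comment line, then a "#OTU ID" header row with the sample IDs, then one row per feature. The function name count_features_per_sample is made up for illustration.

```shell
# count_features_per_sample FEATURE_TABLE_TSV
# Prints the total number of features, then for each sample the number of
# features with a non-zero count. Assumes a "#OTU ID" header row (assumption).
count_features_per_sample() {
    awk -F '\t' '
        /^#OTU ID/ {                        # header row holding the sample IDs
            for (i = 2; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        /^#/ || NF < 2 { next }             # skip other comment lines and blanks
        {
            total++
            for (i = 2; i <= ncol; i++) if ($i + 0 > 0) seen[i]++
        }
        END {
            printf "total\t%d\n", total
            for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], seen[i]
        }
    ' "$1"
}
```

For example, count_features_per_sample table/feature-table.tsv would print one line with the total, followed by one line per sample.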

Hope that helps!
:turtle:

The phylogenetic tree was generated by the sequencing company itself, so I am unable to give any information about it. However, I know that it was made using tools on a Galaxy platform.

Yes, the commands I used are given below. The rep-seqs.qza and table.qza files are both outputs of the dada2 denoise-pyro step.

qiime feature-classifier classify-sklearn \
--i-classifier gg-13-8-99-515-806-nb-classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza

qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv

qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--o-visualization taxa-bar-plots.qzv

The problem is that the numbers of both ASVs and OTUs are really low. Only very few sequences pass through the denoising step.

Hi @Brigitta1,
Looking at your demux file, you are getting the same Phred warning about scores being out of range. I am not an Ion Torrent expert, but it seems that the quality scores are not exactly the same as Illumina's, and that is causing issues.

I think the best you can do is probably to increase --p-trim-left so that there are no really low-quality bases at the beginning of the reads.

Test that out and also look at the dada2 stats file to see where you are losing your reads.
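A minimal sketch of that check, assuming the denoising stats have been exported (for example with qiime tools export) to a tab-separated file whose header contains the columns input, filtered, denoised and non-chimeric, followed by a "#q2:types" row. Those column names and the function name summarize_dada2_stats are assumptions, so adjust them to match your file.

```shell
# summarize_dada2_stats STATS_TSV
# For each sample, prints the reads lost at filtering, denoising and chimera
# removal, plus the percentage of input reads that survive all three steps.
summarize_dada2_stats() {
    awk -F '\t' '
        NR == 1 {                           # map column names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            print "sample\tinput\tlost_filter\tlost_denoise\tlost_chimera\tpct_kept"
            next
        }
        /^#/ { next }                       # skip the "#q2:types" directive row
        {
            input    = $(col["input"])
            filtered = $(col["filtered"])
            denoised = $(col["denoised"])
            nonchim  = $(col["non-chimeric"])
            pct = (input > 0) ? 100 * nonchim / input : 0
            printf "%s\t%d\t%d\t%d\t%d\t%.1f\n", $1, input,
                   input - filtered, filtered - denoised, denoised - nonchim, pct
        }
    ' "$1"
}
```

The step with the largest loss is the one whose parameters are worth adjusting first.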

Hope that helps!
:turtle:

Hi @cherman2 ,
Thank you for the information!

According to the TSV file downloaded from the demux file, it seems that the more I trim, the more sequences I lose. Trimming reduces the number of sequences that pass the denoising step to less than 1%, which is very low.

Also, I have trimmed the adapters from my sequences using the cutadapt plugin. Is there a way for me to be certain that my adapters have been removed? Could it be the reason I am losing a lot of sequences in the denoising step?

Can you send me the denoising-stats file you get out of dada2 denoise pyro?

Can you send it to me for this command

But could you also send one where you increase trim-left to 40? The quality scores seem really low before that position.

--p-trim-left 40

You could search the rep-seqs you get out of dada2 for the adapters to check whether they are still in the sequences.
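One possible way to do that search, assuming the representative sequences have been exported to a FASTA file. This is only a sketch: the function name count_adapter_hits is hypothetical, and it looks for exact matches on the forward strand only, so partial or reverse-complemented adapters would be missed.

```shell
# count_adapter_hits FASTA ADAPTER
# Prints the number of FASTA records whose sequence contains ADAPTER
# (exact, case-insensitive match on the forward strand only).
count_adapter_hits() {
    awk -v adapter="$2" '
        BEGIN { adapter = toupper(adapter) }
        /^>/ {                              # header line: evaluate the previous record
            if (seq != "" && index(seq, adapter) > 0) hits++
            seq = ""
            next
        }
        { seq = seq toupper($0) }           # join wrapped sequence lines
        END {
            if (seq != "" && index(seq, adapter) > 0) hits++
            print hits + 0
        }
    ' "$1"
}
```

For example, count_adapter_hits rep-seqs.fasta CCATCTCATCCCTGCGTGTCTCCGACTCAG (with rep-seqs.fasta as a placeholder file name) prints how many sequences still contain that adapter; a result of 0 suggests it is no longer present.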

I don't think this would be the reason you are losing sequences; I think it is mostly an issue for joining sequences.

Okay, sure!

I noticed that I am losing a lot of sequences during the filtering step. What can I do about it?

Thank you in advance @cherman2

Hi @Brigitta1,
Looking at your stats.qzv, you are losing a lot of sequences in the filtering step. I also noticed that you retain more sequences in the run where trim-left is set to 40.

Here are the 2 steps I would try:

First, try trimming more from the front of your sequences and see whether that increases the number of sequences that pass filtering.

Second, try raising the --p-max-ee parameter. The default is 2, but increasing it might help get more sequences through filtering. However, the more you increase it, the more expected errors you allow per read.
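For context, the expected errors of a read are the sum of its per-base error probabilities, 10^(-Q/10) for a Phred score Q. The sketch below estimates how many reads would survive a given combination of trim-left, trunc-len and max-ee. It assumes four-line FASTQ records with Phred+33 encoding and ignores trunc_q and every other filter, so it only approximates what dada2 does; the function name count_passing_reads is hypothetical.

```shell
# count_passing_reads FASTQ TRIM_LEFT TRUNC_LEN MAX_EE
# Prints "<passed> <total>". Reads shorter than TRUNC_LEN are discarded
# (TRUNC_LEN=0 disables truncation); the rest pass if the expected errors
# over positions TRIM_LEFT+1..TRUNC_LEN do not exceed MAX_EE.
count_passing_reads() {
    awk -v trim="$2" -v trunc="$3" -v maxee="$4" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        NR % 4 == 0 {                       # every 4th line is a quality string
            total++
            if (trunc > 0 && length($0) < trunc) next   # too short: discarded
            stop = (trunc > 0) ? trunc : length($0)
            ee = 0
            for (p = trim + 1; p <= stop; p++) {
                q = ord[substr($0, p, 1)] - 33          # Phred+33 is an assumption
                ee += 10 ^ (-q / 10)
            }
            if (ee <= maxee) passed++
        }
        END { printf "%d %d\n", passed, total }
    ' "$1"
}
```

Running it with a few different settings, for example count_passing_reads sample.fastq 30 180 2 and then with 4 as the last argument, gives a quick feel for how much each parameter costs before starting a full denoising run.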

Hope that helps!
:turtle:

Hi @cherman2

Thank you so much for your suggestions. I tried the 2 steps you suggested and they did decrease the loss in the number of sequences in the filtering step. I set the trim-left to 30, trunc-len to 180 and --p-max-ee to 4. I have sent you the denoising-stats file for it. I am still not satisfied with the number of sequences I am left with. Is there anything else I can do about it?

The number of representative sequences I get is now around 1100, but only very few of them get assigned to taxonomic units, even with a taxonomic confidence level of 6.5. I have about 25 samples, and each of them has only about 40-60 different bacterial species, which is really low. What can I do about this?

Thank you in advance,
Brigitta

Hi @Brigitta1,
You are right, you are still losing a lot of sequences.
Can you send me your demux.qza so that I can try messing with your filtering parameters? I'll report back what I find.
:turtle:

Yes, thank you so much. I tried changing a few parameters myself.
trunc_len: 150
trim_left: 30
max_ee: 4.0
trunc_q: 2
max_len: 0
This increased the percentage input non-chimeric to about 40%-50% but now the sequences don't get classified into OTUs. Even the taxa bar plot shows only about 21 different species.

Hi @Brigitta1,
It sounds like you were able to evade some of the chimera filtering. However, you really need the chimeric sequences to be filtered out. I suspect that they are not getting classified into OTUs because they are chimeras and not "real" sequences.

For a little background, chimeras are artificial sequences formed when two or more sequences get "tangled" during PCR. When they are sequenced, the chimeric read combines the genetic information of two or more bacteria.

So you do want to filter out those sequences because they are essentially useless. When we are messing around with dada2 parameters we are trying to optimize the denoising and filtering steps but there is not really anything we can do computationally to make chimeric sequences usable.

My advice is to keep playing around with the parameters so that you get the most sequences past the denoising and filtering steps while still filtering out the chimeras.

I hope that helps!
