Unable to denoise my sequences using DADA2

dinul_anuka · January 13, 2024, 8:13pm

I tried to denoise my fastq.gz file using DADA2 denoise-pyro in the Galaxy environment. But when I set trunc_len to 1574 (where the quality starts to drop below 20 at the 50th percentile), it gives me this error.

error
An error occurred with this dataset:
This plugin encountered an error:
No reads passed the filter. trunc_len (1574) may be longer than read lengths, or other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.

Here is my visualization after running Cutadapt trim and demux summarize:
https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fcancer.usegalaxy.org%2Fdisplay_application%2F31a3451e5d1dd80b%2Fq2view%2Fmain_q2view%2Fc545c7623fd22aa1%2Fdata%2Fgalaxy_31a3451e5d1dd80b.qzv

I also analyzed the .tsv file with ChatGPT to get an idea of where to truncate, and it gave me this:

ChatGPT-
Summary of the Analysis of the First TSV File:

Data Characteristics: The file contained summary statistics for the quality scores of sequencing reads at each position. It was used to understand the quality distribution across the sequence lengths.
Quality Score Observation: The analysis revealed that the median quality score fell below a conservative threshold (initially set at 20, then adjusted to 15 for Ion Torrent data) right from the beginning of the sequences.
Challenge in Determining Truncation Point: Due to the consistently low quality scores from the start, it was challenging to determine an appropriate truncation point based on quality scores alone.

Summary of Truncation Parameters Suggestions:

Context of 16S rRNA Sequencing: Considering that you are working with 16S rRNA gene sequences from Ion Torrent data, the typical lengths of these regions (like V3-V4) range from approximately 400 to 500 base pairs.
Initial Trim (trimLeft): DADA2's recommendation for Ion Torrent data includes trimming the first 15 bases from each read (trimLeft=15) to remove low-quality bases at the start.
Suggested Truncation Lengths (truncLen):

Given the typical length of 16S rRNA regions, and after accounting for the initial trimming of 15 bases, I suggested experimenting with truncation points (truncLen) of 385, 435, and 485 bases. These lengths aim to capture the complete 16S rRNA regions while considering the quality profile of Ion Torrent sequencing data.

Iterative Adjustment Approach: Due to the low-quality scores and the nature of Ion Torrent data, it was advised to start with these truncation points and then iteratively adjust them based on the results, focusing on balancing the sequence quality and quantity.

Based on this, I tried 385 as the truncation length, but here are the results:
Input - 5200
Filtered - 473
% passed the filter - 0.91%
Denoised - 1
Non-chimeric - 1
This is not good.
Can anyone help me find a way out of this? (This analysis is for my university research and it is crucial to me.)

colinbrislawn · January 14, 2024, 7:39pm

Welcome to the forums!

Thank you for posting your cutadapt output and quality score plot.

These reads are longer than Illumina reads! Is this truly 454 pyrosequencing, or PacBio, or something else? Once we know more about how your reads were sequenced we can suggest what to do next.
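Because DADA2 discards reads shorter than trunc_len, one quick check is what fraction of reads is at least as long as a candidate truncation point. Here is a minimal Python sketch, assuming standard four-line FASTQ records (no wrapped sequences); the function names are made up for illustration and are not part of QIIME 2 or DADA2:

```python
import gzip


def read_lengths(lines):
    """Yield the length of each read from an iterable of FASTQ text lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the 2nd line of each 4-line record
            yield len(line.strip())


def fraction_at_least(lengths, trunc_len):
    """Fraction of reads whose length is >= trunc_len (0.0 if there are no reads)."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(n >= trunc_len for n in lengths) / len(lengths)


def fraction_for_file(path, trunc_len):
    """Convenience wrapper for a gzipped FASTQ file (the path is up to you)."""
    with gzip.open(path, "rt") as handle:
        return fraction_at_least(read_lengths(handle), trunc_len)
```

Running fraction_for_file on one sample with a few candidate values (for example 1574 and 385) gives a rough upper bound on how many reads could survive the length filter before quality filtering is even applied.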

Relatedly, I really like your use of ChatGPT to investigate further. I agree with its conclusion that the quality scores are low. Also note the warning from QIIME 2: "Some of the forward PHRED quality values are out of range." Check the PHRED offset setting used during import.
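One rough way to check the offset is to look at the lowest ASCII character in the quality lines of the raw FASTQ. This is only a heuristic sketch in Python, and the function name is hypothetical: characters below ';' (ASCII 59) can only occur with an offset of 33, while a file whose lowest character is '@' (ASCII 64) or above is probably offset 64, although a Phred+33 file containing only very high scores would look the same.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Returns None when there is no data or the range is ambiguous.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:
        # Characters this low cannot appear in a +64 encoding.
        return 33
    if lowest >= 64:
        # Consistent with +64, but also with +33 if every score is >= 31.
        return 64
    return None  # 59-63: ambiguous (e.g. older Solexa-style scores)
```

Pass it the fourth line of each record from a handful of reads. If the guess disagrees with the offset chosen at import, that would be consistent with the out-of-range warning.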

Let us know what you try next and if you have any questions.

dinul_anuka · January 15, 2024, 11:36am

Thanks @colinbrislawn for your prompt reply. These data were sequenced on Ion Torrent.

These are my import parameters:
Type of data to import - Sequence with quality
Format to import from - Casava one eight single lane per sample directory format
I arrived at these parameters by experimenting with different options; the other options gave me errors on import.

colinbrislawn · January 15, 2024, 7:22pm

Can you post the full DADA2 command you ran?

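Note this recommendation from the LLM:

"Initial Trim (trimLeft): DADA2's recommendation for Ion Torrent data includes trimming the first 15 bases from each read (trimLeft=15) to remove low-quality bases at the start."

That looks like the R API. For the QIIME 2 DADA2 plugin, see: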
https://docs.qiime2.org/2023.9/plugins/available/dada2/denoise-pyro/

Based on your quality scores, try a higher number, like --p-trim-left 50. I would hope to see >50% of the reads passing the filtering step.
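To make that concrete, here is a small Python sketch that assembles (but does not run) a denoise-pyro command line and computes the pass rate from a stats row. The file names (demux.qza, table.qza, and so on) and the helper names are placeholders, and the trunc-len of 385 is simply the value already tried in this thread, not a recommendation.

```python
import shlex


def denoise_pyro_command(demux_qza, trunc_len, trim_left=0):
    """Build the argument list for `qiime dada2 denoise-pyro` (not executed here)."""
    return [
        "qiime", "dada2", "denoise-pyro",
        "--i-demultiplexed-seqs", demux_qza,   # placeholder input artifact
        "--p-trim-left", str(trim_left),
        "--p-trunc-len", str(trunc_len),
        "--o-table", "table.qza",              # placeholder output names
        "--o-representative-sequences", "rep-seqs.qza",
        "--o-denoising-stats", "denoising-stats.qza",
    ]


def percent_passed(n_input, n_filtered):
    """Percentage of input reads that passed the filter."""
    return 100.0 * n_filtered / n_input


cmd = denoise_pyro_command("demux.qza", trunc_len=385, trim_left=50)
print(shlex.join(cmd))  # copy-pasteable command string
```

For the counts reported earlier in this thread (5200 input, 473 filtered), percent_passed(5200, 473) comes out at roughly 9.1%, which is still far below the 50% I would hope for.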
