Understanding min sequence length of F and R/ quality scores and how it is impacting dada2 downstream

KaraS · May 24, 2025, 3:04pm

Hello friends-
I have just been assigned the bioinformatics part of a study after someone left their position.
My end goal is to classify sequences in samples that used ANML primers for COI.

After trimming primers I looked at the output quality plots and min sequence length:

Is it weird that the min sequence length is so different between F and R? I've done this analysis twice with my own projects, and I have never had this issue.
I believe it is impacting my dada2 downstream- I am getting
Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

I have tried many different trunc lengths- but I suppose that is an issue to tackle later down the line?

I am using qiime2-2019.10 because that is what my classifier was trained on.

Thanks for help!

colinbrislawn · May 25, 2025, 3:34pm

Hello Kara,

Can you post the full error from DADA2? That should give us more context for what may have caused this error.

Yes? My best guess is that R2 has been trimmed by another program before importing into a Qiime2 artifact.

KaraS · May 26, 2025, 5:15am

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.4

Filtering The filter removed all reads: /var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/filt_f/428_138_L001_R1_001.fastq.gz and /var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/filt_r/428_139_L001_R2_001.fastq.gz not written.
The filter removed all reads: /var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/filt_f/296-2_96_L001_R1_001.fastq.gz and /var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/filt_r/296-2_97_L001_R2_001.fastq.gz not written.
Some input samples had no reads pass the filter.
.........................................................................................................................................x....................x.................................................................................................
Learning Error Rates
289560551 total bases in 1067489 reads from 4 samples will be used for learning the error rates.
244454981 total bases in 1067489 reads from 4 samples will be used for learning the error rates.
Denoise remaining samples ..................................................................................................................................................................................................................................................Error in dada(drpR, err = errR, multithread = multithread, verbose = FALSE) :
Invalid derep$quals matrix. Quality values must be positive integers.
Execution halted
Traceback (most recent call last):
File "/Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
run_commands([cmd])
File "/Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/forward', '/var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/reverse', '/var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/output.tsv.biom', '/var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/track.tsv', '/var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/filt_f', '/var/folders/yp/gmbd6_256wnbb6qyf3s3v_6r0000gn/T/tmpkg09sibn/filt_r', '0', '229', '0', '0', '2.0', '2.0', '2', 'consensus', '1.0', '0', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in call
results = action(**arguments)
File "</Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-459>", line 2, in denoise_paired
File "/Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
output_views = self._callable(**view_args)
File "/Users/karasnow/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

colinbrislawn · May 26, 2025, 3:07pm

Invalid derep$quals matrix. Quality values must be positive integers.

This is the core of the error!

Once I'm back at my computer, I can help you more with this error.

colinbrislawn · May 27, 2025, 1:47am

Here are two threads related to this error:

It sounds pretty rare! I'm not sure what's best here.

While you review the threads, let's see what the other mods have to say!

KaraS · May 28, 2025, 2:35am

hmm I am still trying to figure out my issue! I do hope I can get this figured out.
I am not sure how to figure out which samples might have the weird Phred scores
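One way to look for files with odd Phred scores is to decode the quality line of every record and report the range per file. This is only a minimal sketch: it assumes standard 4-line FASTQ records, files ending in .fastq.gz, and a Phred+33 offset, none of which is confirmed in this thread. The function names and the max_expected cutoff of 41 are hypothetical choices for illustration.

```python
import gzip
from pathlib import Path


def quality_range(fastq_path, offset=33):
    """Return (min, max) decoded Phred score seen in one FASTQ file.

    Assumes 4-line records and the given ASCII offset (33 is an assumption).
    Returns (None, None) if the file has no quality characters at all.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lowest, highest = None, None
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 3:  # quality string is the 4th line of a record
                continue
            scores = [ord(ch) - offset for ch in line.rstrip("\r\n")]
            if not scores:
                continue
            lowest = min(scores) if lowest is None else min(lowest, min(scores))
            highest = max(scores) if highest is None else max(highest, max(scores))
    return lowest, highest


def flag_suspect_files(fastq_dir, offset=33, max_expected=41):
    """List (file name, min, max) for files with scores outside 0..max_expected."""
    suspects = []
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        lowest, highest = quality_range(path, offset)
        if lowest is None or lowest < 0 or highest > max_expected:
            suspects.append((path.name, lowest, highest))
    return suspects
```

Calling flag_suspect_files on the folder of demultiplexed reads would return the files worth a closer look, including empty ones. Running it a second time with offset=64 can hint at whether a file uses the older encoding.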

KaraS · May 28, 2025, 12:04pm

Is there a way to remove samples after demultiplex? I saw that someone was able to get past this error by removing extremely low read samples. I would rather not have to go through demultiplex again.
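To decide which samples count as extremely low read, a quick per-file read count can help. A minimal sketch, assuming gzipped 4-line FASTQ files sitting in one folder; the function names and the min_reads cutoff of 1000 are hypothetical and should be adjusted to the data.

```python
import gzip
from pathlib import Path


def count_reads(fastq_gz):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4


def low_read_files(fastq_dir, min_reads=1000):
    """Return {file name: read count} for files with fewer than min_reads reads.

    min_reads is a hypothetical cutoff, not a recommended value.
    """
    counts = {
        path.name: count_reads(path)
        for path in sorted(Path(fastq_dir).glob("*.fastq.gz"))
    }
    return {name: n for name, n in counts.items() if n < min_reads}
```

The file names returned here can then be matched to sample IDs and left out of the next run.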

Nicholas_Bokulich · May 28, 2025, 12:07pm

Hi @KaraS ,

Yes, I think demux filter-samples is the action that you are looking for:

Good luck!

KaraS · June 2, 2025, 5:17am

Could I just get an opinion on what my trunc F and R should be? I have tried multiple lengths, but maybe I am just getting this wrong.
PrimerTrimmed_eDNA_Invas3.qzv (300.6 KB)

colinbrislawn · June 2, 2025, 3:33pm

Good afternoon,

Did we ever figure out if the files listed in eDNA_ManifestHighQual.csv contained raw fastq data or if it was already trimmed / processed in some way?
(I'm trying to track all the possibilities!)

Here's where I would try trimming.

Where did you try trimming when running DADA2?
What did the DADA2 stats tell you after using these settings?

KaraS · June 2, 2025, 5:12pm

Hi thank you for the reply!
So would this mean
for forward:
trim 257
trunc 275
and reverse
trunc 275?

Just from visually inspecting the fastq files, they look like raw data, with primers still attached. But I emailed the sequencing center to get more details.

KaraS · June 2, 2025, 5:13pm

Here are different trunc lengths I have tried:

--p-trunc-len-f 187
--p-trunc-len-r 187 \

--p-trunc-len-f 276
--p-trunc-len-r 276 \

--p-trunc-len-f 291
--p-trunc-len-r 273 \

--p-trunc-len-f 200
--p-trunc-len-r 240 \

colinbrislawn · June 2, 2025, 9:28pm

Yes!

Ah, I meant that as two different trunc length options. I'm not sure which one will work best.
(For reference, trim removes bases from the start of the read and I don't see any need for that in your data.)
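To make the trim versus trunc difference concrete, here is a small sketch on a plain string. It follows the behavior described in the DADA2 documentation (trim drops bases from the start, trunc cuts the read at a fixed position, and reads shorter than the trunc length are discarded), but the function itself is a hypothetical illustration and not DADA2 code.

```python
def trim_and_truncate(read, trim_left=0, trunc_len=0):
    """Apply trim/trunc to one read string.

    Returns the kept portion, or None if the read is shorter than trunc_len
    and would be discarded. trunc_len=0 means no truncation.
    """
    if trunc_len and len(read) < trunc_len:
        return None  # too short: dropped, not padded
    end = trunc_len if trunc_len else len(read)
    return read[trim_left:end]


# A read shorter than the trunc length is lost entirely.
kept = trim_and_truncate("ACGTACGT", trunc_len=5)
dropped = trim_and_truncate("ACG", trunc_len=5)
```

This is why a trunc length longer than most reads in a sample can leave that sample with no reads after filtering.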

That's great! Did you view the DADA2 stats file after? Want to post those here so I can take a look?

Let me know what they say! If they can give you unprocessed fastq files, that's ideal.

KaraS · June 3, 2025, 5:13am

Hiiii
with all of the trunc lengths I have tried I have gotten the same(ish) error code about negative quality scores. (Invalid derep$quals matrix. Quality values must be positive integers.)
I feel pretty confident that the fastq files are raw; they certainly contain primer sequences.

It seems like I have used trunc lengths very similar to what you suggested and still have gotten the error....
which leads me to believe there is something else going on here

colinbrislawn · June 3, 2025, 2:41pm

Totally!
(My apologies, I thought you had fixed the error and had moved on to optimizing settings.)

In the DADA2 GitHub repo, I found this thread: "Error in dada(derepFs, err = errF, multithread = TRUE) : Invalid derep$quals matrix. Quality values must be positive integers." (Issue #838, benjjneb/dada2)

I'm not sure we have that option in Qiime2, but we may be able to convert the quality scores when we import the data.
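For illustration, converting quality scores between encodings comes down to shifting each character by the difference in offsets. A minimal sketch, assuming the problem is a Phred+64 versus Phred+33 mismatch, which nothing in this thread confirms; the function name is hypothetical.

```python
def phred64_to_phred33(quality_line):
    """Re-encode one FASTQ quality string from offset 64 to offset 33.

    Raises ValueError if a character would decode to a negative score,
    which suggests the line was not Phred+64 to begin with.
    """
    converted = []
    for ch in quality_line:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(score + 33))
    return "".join(converted)
```

The ValueError branch shows how negative quality values can arise: characters that are fine under offset 33 decode to scores below zero when read with offset 64.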

Let's see what the devs recommend!
EDIT: The importing tutorial is here. Looks like it may be under construction.
